From 4bb41156b9b2b74c8b87d67835a5ba4e46d14b83 Mon Sep 17 00:00:00 2001
From: Paulo Zanoni <paulo.r.zanoni@intel.com>
Date: Fri, 18 Jul 2025 16:02:17 -0700
Subject: [PATCH] brw: mark 'volatile' sends as uncached on LSC messages

The residencyNonResidentStrict property requires that writes to
unbound memory be ignored and that reads from it return zero. We need
this property; otherwise, vkd3d will report that we don't support DX12.

If a shader writes to a variable associated with an unbound memory
region (i.e., one mapped to a null tile), reads it back in the same
shader, and expects the value to be 0 instead of what was written, it
has to apply the 'volatile' access qualifier to the variable used for
the access. Otherwise, the compiler is free to optimize the read away
and use the written non-zero value. This is explained in the
"Accessing Unbound Regions" section of the Vulkan spec.

Our hardware adds an extra problem on top of the above. BSpec page
"Overview of Memory Access" (47630, 57046) says:

  "If a read from a Null tile gets a cache-hit in a
   virtually-addressed GPU cache, then the read may not return
   zeroes."

So, when we detect this type of access (a 'volatile' load or store),
we have to turn off caching in both L1 and L3.

There's a proposed Vulkan CTS test that does exactly the above.

No shaders in shader_db seem to use 'volatile'.

v2:
 - Reorder commits
 - Rewrite commit message

v3:
 - Rework the patch after Caio pointed out the interaction with
   'coherent'.
 - Remove previous R-B tags due to the patch differences.

v4:
 - Rework the patch and commit message again after further
   discussions.

v5:
 - Check for atomic first so we don't regress DG2 atomic tests.

Fixes future test: dEQP-VK.sparse_resources.buffer.ssbo.read_write.sparse_residency_non_resident_strict

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36150>
---
 src/intel/compiler/brw_lower_logical_sends.cpp | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/src/intel/compiler/brw_lower_logical_sends.cpp b/src/intel/compiler/brw_lower_logical_sends.cpp
index 1aab7ca644d..a13ae0a5d90 100644
--- a/src/intel/compiler/brw_lower_logical_sends.cpp
+++ b/src/intel/compiler/brw_lower_logical_sends.cpp
@@ -1574,10 +1574,25 @@ lower_lsc_memory_logical_send(const brw_builder &bld, brw_inst *inst)
     *
     *    Atomic messages are always forced to "un-cacheable" in the L1
     *    cache.
+    *
+    * Bspec: Overview of memory Access:
+    *
+    *   If a read from a Null tile gets a cache-hit in a virtually-addressed
+    *   GPU cache, then the read may not return zeroes.
+    *
+    * If a shader writes to a null tile and wants to be able to read it back
+    * as zero, it will use the 'volatile' decoration for the access, otherwise
+    * the compiler may choose to optimize things out, breaking the
+    * residencyNonResidentStrict guarantees. Due to the above, we need to make
+    * these operations uncached.
     */
    unsigned cache_mode =
       lsc_opcode_is_atomic(op) ? (unsigned) LSC_CACHE(devinfo, STORE, L1UC_L3WB) :
-      lsc_opcode_is_store(op)  ? (unsigned) LSC_CACHE(devinfo, STORE, L1STATE_L3MOCS) :
+      volatile_access ?
+         (lsc_opcode_is_store(op) ?
+            (unsigned) LSC_CACHE(devinfo, STORE, L1UC_L3UC) :
+            (unsigned) LSC_CACHE(devinfo, LOAD, L1UC_L3UC)) :
+      lsc_opcode_is_store(op) ? (unsigned) LSC_CACHE(devinfo, STORE, L1STATE_L3MOCS) :
       (unsigned) LSC_CACHE(devinfo, LOAD, L1STATE_L3MOCS);
 
    /* If we're a fragment shader, we have to predicate with the sample mask to