From 4bb41156b9b2b74c8b87d67835a5ba4e46d14b83 Mon Sep 17 00:00:00 2001
From: Paulo Zanoni
Date: Fri, 18 Jul 2025 16:02:17 -0700
Subject: [PATCH] brw: mark 'volatile' sends as uncached on LSC messages

The residencyNonResidentStrict property requires that writes to unbound
memory be ignored and reads return zero. We need this property,
otherwise vkd3d will claim we don't support DX12.

If a shader writes to a variable associated with an unbound memory
region (i.e., mapped to a null tile), reads it back (in the same
shader) and expects the value to be 0 instead of what was written, it
has to use the 'volatile' access qualifier on the variable associated
with the access, otherwise the compiler will be allowed to optimize
things and use the non-zero value. This is explained in the "Accessing
Unbound Regions" section of the Vulkan spec.

Our hardware adds an extra problem on top of the above. BSpec page
"Overview of Memory Access" (47630, 57046) says:

  "If a read from a Null tile gets a cache-hit in a
   virtually-addressed GPU cache, then the read may not return
   zeroes."

So, when we detect this type of access, we have to turn off the
caching.

There's a proposed Vulkan CTS test that does exactly the above.

No shaders on shader_db seem to be using 'volatile'.

v2:
 - Reorder commit order
 - Rewrite commit message
v3:
 - Rework the patch after Caio pointed out the interaction with
   'coherent'.
 - Remove previous R-B tags due to the patch differences.
v4:
 - Rework the patch and commit message again after further
   discussions.
v5:
 - Check for atomic first so we don't regress DG2 atomic tests.

Fixes future test:
  dEQP-VK.sparse_resources.buffer.ssbo.read_write.sparse_residency_non_resident_strict

Reviewed-by: Caio Oliveira
Signed-off-by: Paulo Zanoni
Part-of:
---
 src/intel/compiler/brw_lower_logical_sends.cpp | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/src/intel/compiler/brw_lower_logical_sends.cpp b/src/intel/compiler/brw_lower_logical_sends.cpp
index 1aab7ca644d..a13ae0a5d90 100644
--- a/src/intel/compiler/brw_lower_logical_sends.cpp
+++ b/src/intel/compiler/brw_lower_logical_sends.cpp
@@ -1574,10 +1574,25 @@ lower_lsc_memory_logical_send(const brw_builder &bld, brw_inst *inst)
     *
     *    Atomic messages are always forced to "un-cacheable" in the L1
     *    cache.
+    *
+    * Bspec: Overview of memory Access:
+    *
+    *   If a read from a Null tile gets a cache-hit in a virtually-addressed
+    *   GPU cache, then the read may not return zeroes.
+    *
+    * If a shader writes to a null tile and wants to be able to read it back
+    * as zero, it will use the 'volatile' decoration for the access, otherwise
+    * the compiler may choose to optimize things out, breaking the
+    * residencyNonResidentStrict guarantees. Due to the above, we need to make
+    * these operations uncached.
     */
    unsigned cache_mode =
       lsc_opcode_is_atomic(op) ? (unsigned) LSC_CACHE(devinfo, STORE, L1UC_L3WB) :
-      lsc_opcode_is_store(op) ? (unsigned) LSC_CACHE(devinfo, STORE, L1STATE_L3MOCS) :
+      volatile_access ?
+         (lsc_opcode_is_store(op) ?
+            (unsigned) LSC_CACHE(devinfo, STORE, L1UC_L3UC) :
+            (unsigned) LSC_CACHE(devinfo, LOAD, L1UC_L3UC)) :
+      lsc_opcode_is_store(op) ? (unsigned) LSC_CACHE(devinfo, STORE, L1STATE_L3MOCS) :
       (unsigned) LSC_CACHE(devinfo, LOAD, L1STATE_L3MOCS);
 
    /* If we're a fragment shader, we have to predicate with the sample mask to
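
The priority order in the hunk above (atomic first, then volatile, then the regular store/load default) can be illustrated in isolation. The following is only a minimal standalone sketch, not Mesa code: the `CacheMode` enum and `select_cache_mode()` are hypothetical stand-ins for the `LSC_CACHE()` encodings and the inline ternary chain, and they assume the three inputs are already known as plain booleans.

```cpp
// Hypothetical stand-ins for the cache-control values named in the patch.
// These are NOT the real LSC_CACHE() encodings, only labels for the sketch.
enum class CacheMode {
   STORE_L1UC_L3WB,       // atomics: uncached in L1, write-back in L3
   STORE_L1UC_L3UC,       // volatile store: uncached in L1 and L3
   LOAD_L1UC_L3UC,        // volatile load: uncached in L1 and L3
   STORE_L1STATE_L3MOCS,  // default store
   LOAD_L1STATE_L3MOCS,   // default load
};

// Mirrors the order of the ternary chain in the patch:
//   1. atomics are checked first and keep their existing mode,
//   2. volatile accesses are made fully uncached,
//   3. everything else keeps the store/load default.
constexpr CacheMode
select_cache_mode(bool is_atomic, bool is_store, bool volatile_access)
{
   return is_atomic ? CacheMode::STORE_L1UC_L3WB :
          volatile_access ?
             (is_store ? CacheMode::STORE_L1UC_L3UC :
                         CacheMode::LOAD_L1UC_L3UC) :
          is_store ? CacheMode::STORE_L1STATE_L3MOCS :
                     CacheMode::LOAD_L1STATE_L3MOCS;
}

// A volatile atomic still takes the atomic path (the v5 ordering).
static_assert(select_cache_mode(true, false, true) ==
              CacheMode::STORE_L1UC_L3WB, "atomic is checked before volatile");
// A volatile load becomes fully uncached.
static_assert(select_cache_mode(false, false, true) ==
              CacheMode::LOAD_L1UC_L3UC, "volatile load is uncached");
```

Checking the atomic case before the volatile case means a volatile atomic keeps the L1UC_L3WB mode it had before this patch, which is the ordering the v5 note describes.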