From 1c243f4f8bac2aa5c740fc8e8ea098e77646fdbc Mon Sep 17 00:00:00 2001
From: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Date: Mon, 23 Jan 2023 11:58:50 +0200
Subject: [PATCH] intel/fs: avoid cmod optimization on instruction with
 different write_mask
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

I've been running into failures with tests like :

dEQP-VK.robustness.robustness2.bind.notemplate.rgba32i.unroll.nonvolatile.uniform_buffer_dynamic.no_fmt_qual.len_4.samples_1.1d.frag

With the load_global_const_block_intel NIR intrinsic, you can load a
vec8/vec16 with a predicate. The predicate is correctly uniformized to
feed into the SEND instruction's flag register.

The problem is that a series of optimization first remove the
find_live_channel and then changes the broadcast into a simple MOV
instruction, on the assumption that the first channel is always active
if there is not control flow. This is correct.

But after that the cmod optimzation will remove this instruction :

   mov.nz.f0.0(16) null:D, vgrf16+0.0<0>:D NoMask

because it seems to be equivalent to :

   cmp.g.f0.0(16) vgrf16:D, vgrf12:D, 63d

In this case vgrf16 is the predicate to the load block SEND
instruction. Since the execution mask is different between both, some
of the channels of the SEND instruction end up not being loaded or
loaded with the wrong predication and we end up with incorrect UBO
data.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20852>
(cherry picked from commit a50d2fdb4654984061bffb9293abb4178cbe435f)
---
 .pick_status.json                              | 2 +-
 src/intel/compiler/brw_fs_cmod_propagation.cpp | 4 ++++
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/.pick_status.json b/.pick_status.json
index 14d953bbcd4..dae95ebe379 100644
--- a/.pick_status.json
+++ b/.pick_status.json
@@ -1426,7 +1426,7 @@
         "description": "intel/fs: avoid cmod optimization on instruction with different write_mask",
         "nominated": true,
         "nomination_type": 0,
-        "resolution": 0,
+        "resolution": 1,
         "main_sha": null,
         "because_sha": null
     },
diff --git a/src/intel/compiler/brw_fs_cmod_propagation.cpp b/src/intel/compiler/brw_fs_cmod_propagation.cpp
index 3b38d0925a7..8f0ce321054 100644
--- a/src/intel/compiler/brw_fs_cmod_propagation.cpp
+++ b/src/intel/compiler/brw_fs_cmod_propagation.cpp
@@ -303,6 +303,10 @@ opt_cmod_propagation_local(const intel_device_info *devinfo, bblock_t *block)
                 scan_inst->exec_size != inst->exec_size)
                break;
 
+            /* If the write mask is different we can't propagate. */
+            if (scan_inst->force_writemask_all != inst->force_writemask_all)
+               break;
+
             /* CMP's result is the same regardless of dest type. */
             if (inst->conditional_mod == BRW_CONDITIONAL_NZ &&
                 scan_inst->opcode == BRW_OPCODE_CMP &&