From e3328dfa2fe565e9d490c576c551f989cfa06a3f Mon Sep 17 00:00:00 2001
From: Alyssa Rosenzweig
Date: Tue, 25 Nov 2025 16:13:39 -0500
Subject: [PATCH] brw: only initialize sample mask flag if needed
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This is a refinement of 7c129d93658 ("intel/brw/xe2+: Keep PS sample
mask in the f1.0 register whether or not kill is used."). Rather than
always insert this move, do so only when we'll actually read the
register: for memory writes and for discards.

This deletes an instruction from piles of fragment shaders.

shader-db on LNL:

total instructions in shared programs: 17134031 -> 17042706 (-0.53%)
instructions in affected programs: 9065743 -> 8974418 (-1.01%)
helped: 65045
HURT: 0
helped stats (abs) min: 1.0 max: 3.0 x̄: 1.40 x̃: 1
helped stats (rel) min: <.01% max: 50.00% x̄: 3.06% x̃: 1.64%
95% mean confidence interval for instructions value: -1.41 -1.40
95% mean confidence interval for instructions %-change: -3.10% -3.03%
Instructions are helped.

total cycles in shared programs: 885172098 -> 884835306 (-0.04%)
cycles in affected programs: 590294230 -> 589957438 (-0.06%)
helped: 53636
HURT: 4500
helped stats (abs) min: 2.0 max: 1126.0 x̄: 8.02 x̃: 4
helped stats (rel) min: <.01% max: 50.00% x̄: 1.24% x̃: 0.24%
HURT stats (abs) min: 2.0 max: 7706.0 x̄: 20.77 x̃: 6
HURT stats (rel) min: <.01% max: 82.06% x̄: 1.09% x̃: 0.54%
95% mean confidence interval for cycles value: -6.15 -5.43
95% mean confidence interval for cycles %-change: -1.10% -1.02%
Cycles are helped.

LOST: 385
GAINED: 47

Signed-off-by: Alyssa Rosenzweig
Reviewed-by: Lionel Landwerlin
Part-of:
---
 src/intel/compiler/brw/brw_compile_fs.cpp | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/src/intel/compiler/brw/brw_compile_fs.cpp b/src/intel/compiler/brw/brw_compile_fs.cpp
index b0430168636..cf42ebb570f 100644
--- a/src/intel/compiler/brw/brw_compile_fs.cpp
+++ b/src/intel/compiler/brw/brw_compile_fs.cpp
@@ -1372,9 +1372,12 @@ run_fs(brw_shader &s, bool allow_spilling, bool do_rep_send)
    }
 
    /* We handle discards by keeping track of the still-live pixels in f0.1.
-    * Initialize it with the dispatched pixels.
+    * On Xe2+, we also predicate stores with this mask. Initialize it with
+    * the dispatched pixels if we use discard or (on Xe2) memory stores.
     */
-   if (devinfo->ver >= 20 || wm_prog_data->uses_kill) {
+   if ((devinfo->ver >= 20 && nir->info.writes_memory) ||
+       wm_prog_data->uses_kill) {
+
       const unsigned lower_width = MIN2(s.dispatch_width, 16);
       for (unsigned i = 0; i < s.dispatch_width / lower_width; i++) {
          /* According to the "PS Thread Payload for Normal