brw: only initialize sample mask flag if needed

This is a refinement of 7c129d9365 ("intel/brw/xe2+: Keep PS sample mask in the
f1.0 register whether or not kill is used."). Rather than always insert this
move, do so only when we'll actually read the register: for memory writes and
for discards. This deletes an instruction from piles of fragment shaders.
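
As a rough standalone sketch of the new condition: the struct and function
names below are hypothetical and are not Mesa APIs. Only the three inputs
mirror the fields that run_fs() tests (devinfo->ver, nir->info.writes_memory
and wm_prog_data->uses_kill).

```cpp
// Hypothetical stand-in for the three values the real check reads.
struct ShaderFacts {
    int  ver;            // hardware generation; 20 and up is treated as Xe2+
    bool writes_memory;  // the shader performs memory writes
    bool uses_kill;      // the shader uses discard
};

// Returns true when the sample mask flag will be read later, so the
// initializing move is worth emitting.
constexpr bool needs_sample_mask_init(ShaderFacts f)
{
    return (f.ver >= 20 && f.writes_memory) || f.uses_kill;
}

// An Xe2 shader with neither memory writes nor discard no longer pays
// for the move.
static_assert(!needs_sample_mask_init(ShaderFacts{20, false, false}),
              "no reader of the flag, so no init is needed");
```

Under these assumptions, the only case that changes is ver >= 20 with neither
memory writes nor discard, which is where the instruction savings would come
from.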

shader-db on LNL:

total instructions in shared programs: 17134031 -> 17042706 (-0.53%)
instructions in affected programs: 9065743 -> 8974418 (-1.01%)
helped: 65045
HURT: 0
helped stats (abs) min: 1.0 max: 3.0 x̄: 1.40 x̃: 1
helped stats (rel) min: <.01% max: 50.00% x̄: 3.06% x̃: 1.64%
95% mean confidence interval for instructions value: -1.41 -1.40
95% mean confidence interval for instructions %-change: -3.10% -3.03%
Instructions are helped.

total cycles in shared programs: 885172098 -> 884835306 (-0.04%)
cycles in affected programs: 590294230 -> 589957438 (-0.06%)
helped: 53636
HURT: 4500
helped stats (abs) min: 2.0 max: 1126.0 x̄: 8.02 x̃: 4
helped stats (rel) min: <.01% max: 50.00% x̄: 1.24% x̃: 0.24%
HURT stats (abs)   min: 2.0 max: 7706.0 x̄: 20.77 x̃: 6
HURT stats (rel)   min: <.01% max: 82.06% x̄: 1.09% x̃: 0.54%
95% mean confidence interval for cycles value: -6.15 -5.43
95% mean confidence interval for cycles %-change: -1.10% -1.02%
Cycles are helped.

LOST:   385
GAINED: 47

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38665>
Alyssa Rosenzweig 2025-11-25 16:13:39 -05:00 committed by Marge Bot
parent aa9435f5d1
commit e3328dfa2f

@@ -1372,9 +1372,12 @@ run_fs(brw_shader &s, bool allow_spilling, bool do_rep_send)
    }
    /* We handle discards by keeping track of the still-live pixels in f0.1.
-    * Initialize it with the dispatched pixels.
+    * On Xe2+, we also predicate stores with this mask. Initialize it with
+    * the dispatched pixels if we use discard or (on Xe2) memory stores.
     */
-   if (devinfo->ver >= 20 || wm_prog_data->uses_kill) {
+   if ((devinfo->ver >= 20 && nir->info.writes_memory) ||
+       wm_prog_data->uses_kill) {
       const unsigned lower_width = MIN2(s.dispatch_width, 16);
       for (unsigned i = 0; i < s.dispatch_width / lower_width; i++) {
          /* According to the "PS Thread Payload for Normal