brw: Fully write temporary destinations
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run

Consider an innocuous instruction like:

    and(1) v250:UD, g0.3<0,1,0>:UD, 4294967264u NoMask group0

If register allocation decides to spill v250, it will see this
instruction and say, "Oh no! The other components of v250 aren't set, so
I'd better add a fill before that instruction!"

But it gets even worse than that... if register coalesce decided to
merge two of these, the live range gets massively extended because the
writes don't fully initialize the value. This causes the need to spill
these registers in the first place.

Changing that instruction to SIMD16 on Xe2 or SIMD8 on other platforms
alleviates these issues.

shader-db:

Lunar Lake
total instructions in shared programs: 17118324 -> 17113191 (-0.03%)
instructions in affected programs: 93701 -> 88568 (-5.48%)
helped: 42 / HURT: 6

total cycles in shared programs: 895422566 -> 895079488 (-0.04%)
cycles in affected programs: 30111338 -> 29768260 (-1.14%)
helped: 35 / HURT: 40

total spills in shared programs: 3588 -> 3304 (-7.92%)
spills in affected programs: 285 -> 1 (-99.65%)
helped: 10 / HURT: 0

total fills in shared programs: 2218 -> 1663 (-25.02%)
fills in affected programs: 556 -> 1 (-99.82%)
helped: 10 / HURT: 0

Meteor Lake, DG2, Tiger Lake, and Ice Lake had similar results.  (Meteor Lake shown)
total instructions in shared programs: 20059218 -> 20053563 (-0.03%)
instructions in affected programs: 96938 -> 91283 (-5.83%)
helped: 43 / HURT: 6

total cycles in shared programs: 884174588 -> 883536475 (-0.07%)
cycles in affected programs: 22105268 -> 21467155 (-2.89%)
helped: 35 / HURT: 27

total spills in shared programs: 5032 -> 4679 (-7.02%)
spills in affected programs: 355 -> 2 (-99.44%)
helped: 12 / HURT: 0

total fills in shared programs: 4782 -> 4113 (-13.99%)
fills in affected programs: 671 -> 2 (-99.70%)
helped: 12 / HURT: 0

Skylake
total instructions in shared programs: 19097658 -> 19097665 (<.01%)
instructions in affected programs: 14202 -> 14209 (0.05%)
helped: 0 / HURT: 5

total cycles in shared programs: 862058109 -> 862058267 (<.01%)
cycles in affected programs: 3450244 -> 3450402 (<.01%)
helped: 7 / HURT: 11

fossil-db:

Lunar Lake
Totals:
Cycle count: 31439652246 -> 31439652272 (+0.00%)

Totals from 2 (0.00% of 707091) affected shaders:
Cycle count: 2602 -> 2628 (+1.00%)

No other Intel platforms had any fossil-db changes.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35721>
This commit is contained in:
Ian Romanick 2025-06-20 12:39:36 -07:00 committed by Marge Bot
parent 8a2f43c9bd
commit b83f618fb2

View file

@ -878,10 +878,10 @@ lower_sampler_logical_send(const brw_builder &bld, brw_inst *inst,
* with the ones included in g0.3 bits 4:0. Mask them out.
*/
if (devinfo->ver >= 11) {
sampler_state_ptr = ubld1.vgrf(BRW_TYPE_UD);
ubld1.AND(sampler_state_ptr,
retype(brw_vec1_grf(0, 3), BRW_TYPE_UD),
brw_imm_ud(INTEL_MASK(31, 5)));
sampler_state_ptr = ubld.vgrf(BRW_TYPE_UD);
ubld.AND(sampler_state_ptr,
retype(brw_vec1_grf(0, 3), BRW_TYPE_UD),
brw_imm_ud(INTEL_MASK(31, 5)));
}
if (sampler.file == IMM) {
@ -891,9 +891,9 @@ lower_sampler_logical_send(const brw_builder &bld, brw_inst *inst,
ubld1.ADD(component(header, 3), sampler_state_ptr,
brw_imm_ud(16 * (sampler.ud / 16) * sampler_state_size));
} else {
brw_reg tmp = ubld1.vgrf(BRW_TYPE_UD);
ubld1.AND(tmp, sampler, brw_imm_ud(0x0f0));
ubld1.SHL(tmp, tmp, brw_imm_ud(4));
brw_reg tmp = ubld.vgrf(BRW_TYPE_UD);
ubld.AND(tmp, component(sampler, 0), brw_imm_ud(0x0f0));
ubld.SHL(tmp, tmp, brw_imm_ud(4));
ubld1.ADD(component(header, 3), sampler_state_ptr, tmp);
}
} else if (devinfo->ver >= 11 && shader_opcode_uses_sampler(op)) {