brw/reg_allocate: Require SIMD32 for destination / source interference on Xe2

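No platforms other than Lunar Lake were affected in shader-db or
fossil-db. The new size threshold, reg_unit(devinfo) * REG_SIZE, differs
from the old one only where reg_unit(devinfo) is greater than 1, which
among the tested platforms is only Lunar Lake (Xe2).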
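
The effect of the changed condition can be sketched with a little arithmetic. This is a minimal illustration, not the Mesa code: `reg_unit_for`, `packed_component_size`, `interferes_old` and `interferes_new` are hypothetical helpers, and the sketch assumes REG_SIZE is 32 bytes, that reg_unit(devinfo) is 2 on Xe2 and 1 on earlier platforms, and that the destination is a packed region so its component size is simply the execution width times the type size.

```cpp
// Assumption: 32 bytes per register unit, matching the REG_SIZE name in the diff.
constexpr unsigned REG_SIZE = 32;

// Hypothetical stand-in for reg_unit(devinfo): assumed 2 on Xe2, 1 otherwise.
constexpr unsigned reg_unit_for(bool is_xe2) { return is_xe2 ? 2u : 1u; }

// Simplification: bytes written by a packed destination at this SIMD width.
constexpr unsigned packed_component_size(unsigned exec_size, unsigned type_bytes)
{
   return exec_size * type_bytes;
}

// Condition before the change: anything larger than REG_SIZE interferes.
constexpr bool interferes_old(unsigned exec_size, unsigned type_bytes)
{
   return packed_component_size(exec_size, type_bytes) > REG_SIZE;
}

// Condition after the change: the threshold scales with the register unit.
constexpr bool interferes_new(bool is_xe2, unsigned exec_size, unsigned type_bytes)
{
   return packed_component_size(exec_size, type_bytes) >
          reg_unit_for(is_xe2) * REG_SIZE;
}

// Pre-Xe2: a SIMD16 32-bit destination (64 bytes) interferes before and after.
static_assert(interferes_old(16, 4) && interferes_new(false, 16, 4), "unchanged");
// Xe2: the same SIMD16 destination no longer forces interference...
static_assert(!interferes_new(true, 16, 4), "SIMD16 fits the Xe2 threshold");
// ...but a SIMD32 32-bit destination (128 bytes) still does.
static_assert(interferes_new(true, 32, 4), "SIMD32 exceeds the Xe2 threshold");
```

Under these assumptions the two conditions agree whenever the register unit is 1, which is consistent with only Lunar Lake showing changes in the results below.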

shader-db:

Lunar Lake
total instructions in shared programs: 17070074 -> 17069908 (<.01%)
instructions in affected programs: 151939 -> 151773 (-0.11%)
helped: 61 / HURT: 60

total cycles in shared programs: 891338314 -> 880188516 (-1.25%)
cycles in affected programs: 550482120 -> 539332322 (-2.03%)
helped: 8053 / HURT: 7183

total spills in shared programs: 3294 -> 3278 (-0.49%)
spills in affected programs: 138 -> 122 (-11.59%)
helped: 8 / HURT: 0

total fills in shared programs: 1653 -> 1632 (-1.27%)
fills in affected programs: 212 -> 191 (-9.91%)
helped: 8 / HURT: 0

LOST:   96
GAINED: 70

fossil-db:

Lunar Lake
Totals:
Instrs: 208555066 -> 208509387 (-0.02%); split: -0.03%, +0.00%
Cycle count: 31487691872 -> 31318442816 (-0.54%); split: -0.88%, +0.34%
Spill count: 508701 -> 504809 (-0.77%); split: -0.86%, +0.10%
Fill count: 612583 -> 607047 (-0.90%); split: -1.03%, +0.13%
Scratch Memory Size: 35311616 -> 35037184 (-0.78%); split: -0.81%, +0.04%

Totals from 214417 (30.33% of 706852) affected shaders:
Instrs: 123732970 -> 123687291 (-0.04%); split: -0.04%, +0.01%
Cycle count: 27410928904 -> 27241679848 (-0.62%); split: -1.01%, +0.39%
Spill count: 452458 -> 448566 (-0.86%); split: -0.97%, +0.11%
Fill count: 550991 -> 545455 (-1.00%); split: -1.15%, +0.14%
Scratch Memory Size: 31138816 -> 30864384 (-0.88%); split: -0.92%, +0.04%

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35903>
Ian Romanick 2025-07-01 16:21:03 -07:00 committed by Marge Bot
parent e9ae997ffc
commit 4e05de7c3d

@@ -536,16 +536,29 @@ brw_reg_alloc::setup_inst_interference(const brw_inst *inst)
 }
 /* A compressed instruction is actually two instructions executed
- * simultaneously. On most platforms, it ok to have the source and
- * destination registers be the same. In this case, each instruction
- * over-writes its own source and there's no problem. The real problem
- * here is if the source and destination registers are off by one. Then
- * you can end up in a scenario where the first instruction over-writes the
- * source of the second instruction. Since the compiler doesn't know about
- * this level of granularity, we simply make the source and destination
- * interfere.
+ * simultaneously. If the source and destination registers are the same,
+ * each instruction overwrites its own source, and there's no problem. The
+ * real problem here is if the source and destination registers are off by
+ * one. Then you can end up in a scenario where the first instruction
+ * overwrites the source of the second instruction. Consider this
+ * instruction:
+ *
+ *    and(16) g17<1>UD g16<1,1,0>UD g13<1,1,0>UD
+ *
+ * The EU processes this as
+ *
+ *    and(8) g17<1>UD g16<1,1,0>UD g13<1,1,0>UD
+ *    and(8) g18<1>UD g17<1,1,0>UD g14<1,1,0>UD
+ *
+ * The first SIMD8 part of the instruction overwrites the source used in
+ * the second SIMD8 part. Since there's no way to tell the register
+ * allocator "the destination register number can be src, but it can't be
+ * src+1," simply make the source and destination interfere.
+ *
+ * Theoretically, the register_coalesce passes should have done the dest ==
+ * src merging.
  */
-if (inst->dst.component_size(inst->exec_size) > REG_SIZE &&
+if (inst->dst.component_size(inst->exec_size) > (reg_unit(devinfo) * REG_SIZE) &&
     inst->dst.file == VGRF) {
    for (int i = 0; i < inst->sources; ++i) {
       if (inst->src[i].file == VGRF) {
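
The hazard described in the comment above can be reproduced with a toy model. This is only a sketch under stated assumptions: the register file, `and8`, `and16_split`, `and16_reference` and `split_is_safe` are hypothetical names invented for illustration, each register holds eight dwords, and the two SIMD8 halves are modelled as running strictly one after the other, which is the behaviour the comment describes rather than a statement about real hardware timing.

```cpp
#include <array>
#include <cstdint>

constexpr int LANES = 8;      // dwords per toy register (one SIMD8 slice)
constexpr int NUM_GRFS = 32;  // toy register file size
using Grf = std::array<uint32_t, LANES>;
using RegFile = std::array<Grf, NUM_GRFS>;

// One SIMD8 AND: dst = a & b, lane by lane.
void and8(RegFile &rf, int dst, int a, int b)
{
   for (int l = 0; l < LANES; ++l)
      rf[dst][l] = rf[a][l] & rf[b][l];
}

// SIMD16 AND modelled as two SIMD8 halves executed in order.
void and16_split(RegFile &rf, int dst, int a, int b)
{
   and8(rf, dst, a, b);
   and8(rf, dst + 1, a + 1, b + 1);
}

// Reference semantics: read every source before writing any destination.
void and16_reference(RegFile &rf, int dst, int a, int b)
{
   Grf lo, hi;
   for (int l = 0; l < LANES; ++l) {
      lo[l] = rf[a][l] & rf[b][l];
      hi[l] = rf[a + 1][l] & rf[b + 1][l];
   }
   rf[dst] = lo;
   rf[dst + 1] = hi;
}

// True when the split execution matches the reference for these registers.
// Register r starts as all ones with bit r cleared, so every AND result
// records exactly which registers were read.
bool split_is_safe(int dst, int a, int b)
{
   RegFile init{};
   for (int r = 0; r < NUM_GRFS; ++r)
      init[r].fill(~(1u << r));

   RegFile split = init, reference = init;
   and16_split(split, dst, a, b);
   and16_reference(reference, dst, a, b);
   return split == reference;
}
```

In this model `split_is_safe(17, 16, 13)` is false, matching the `and(16) g17 ... g16 ... g13` example, while `split_is_safe(16, 16, 13)` (dest == src) and `split_is_safe(20, 16, 13)` (no overlap) are true.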