brw: Fix register unit calculation in SIMD32 LOAD_PAYLOAD lowering

We wanted to check whether the destination region spanned multiple
registers, but we were dividing by REG_SIZE, when the register size
is now REG_SIZE * reg_unit(devinfo).

This meant that SIMD32 LOAD_PAYLOAD was always getting SIMD-split
on Xe2 platforms, generating a lot of unnecessary mess for compute
shaders.
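
The arithmetic behind the fix can be sketched as follows. This is a
standalone illustration, not the Mesa code: kRegSize = 32 bytes and a
register unit of 2 on Xe2 (1 elsewhere) are assumed values, and
reg_unit_for(), component_size() and needs_split() are hypothetical
helpers that model a contiguous, per-channel destination region.

```cpp
#include <cassert>

// Assumed for illustration: base register size in bytes (REG_SIZE in the patch).
constexpr unsigned kRegSize = 32;

// Hypothetical stand-in for reg_unit(devinfo): assumed 2 on Xe2, 1 otherwise.
unsigned reg_unit_for(bool is_xe2) { return is_xe2 ? 2 : 1; }

// Integer division rounding up, like DIV_ROUND_UP.
unsigned div_round_up(unsigned n, unsigned d) {
    assert(d != 0);
    return (n + d - 1) / d;
}

// Simplified component size: one contiguous element per channel.
unsigned component_size(unsigned exec_size, unsigned type_bytes) {
    return exec_size * type_bytes;
}

// Old calculation: always divides by the base register size.
unsigned reg_count_old(unsigned exec_size, unsigned type_bytes) {
    return div_round_up(component_size(exec_size, type_bytes), kRegSize);
}

// New calculation: divides by the platform's actual register size.
unsigned reg_count_new(unsigned exec_size, unsigned type_bytes, bool is_xe2) {
    return div_round_up(component_size(exec_size, type_bytes),
                        kRegSize * reg_unit_for(is_xe2));
}

// The lowering splits the instruction when it spans more than two registers.
bool needs_split(unsigned reg_count) { return reg_count > 2; }
```

Under these assumptions, a SIMD32 destination of 32-bit components
occupies 128 bytes. The old calculation gives 4 registers, so the
instruction is split. The new calculation gives 2 registers on Xe2, so
it is left alone, while platforms with a register unit of 1 still see 4
registers and are split as before.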

fossil-db results on Lunar Lake:

   Totals:
   Instrs: 146178614 -> 143291988 (-1.97%); split: -1.98%, +0.00%
   Subgroup size: 11089632 -> 11089376 (-0.00%); split: +0.00%, -0.00%
   Cycle count: 22528892444 -> 22507551650 (-0.09%); split: -0.12%, +0.03%
   Max live registers: 48834202 -> 48886685 (+0.11%); split: -0.09%, +0.20%

   Totals from 134306 (24.10% of 557327) affected shaders:
   Instrs: 28806335 -> 25919709 (-10.02%); split: -10.02%, +0.00%
   Subgroup size: 4297680 -> 4297424 (-0.01%); split: +0.00%, -0.01%
   Cycle count: 956867650 -> 935526856 (-2.23%); split: -2.84%, +0.61%
   Max live registers: 13085711 -> 13138194 (+0.40%); split: -0.33%, +0.73%

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32471>
Kenneth Graunke 2024-12-03 01:41:51 -08:00 committed by Marge Bot
parent dfa4c55a4f
commit 815236b417

@@ -438,7 +438,8 @@ brw_fs_get_lowered_simd_width(const fs_visitor *shader, const fs_inst *inst)
    case SHADER_OPCODE_LOAD_PAYLOAD: {
       const unsigned reg_count =
-         DIV_ROUND_UP(inst->dst.component_size(inst->exec_size), REG_SIZE);
+         DIV_ROUND_UP(inst->dst.component_size(inst->exec_size),
+                      REG_SIZE * reg_unit(devinfo));
       if (reg_count > 2) {
          /* Only LOAD_PAYLOAD instructions with per-channel destination region