From 815236b41788a3612a8bf37bcc692def8cf8957a Mon Sep 17 00:00:00 2001
From: Kenneth Graunke
Date: Tue, 3 Dec 2024 01:41:51 -0800
Subject: [PATCH] brw: Fix register unit calculation in SIMD32 LOAD_PAYLOAD lowering

We wanted to check whether the destination region spanned multiple
registers, but we were checking against REG_SIZE, when the register size
is now actually REG_SIZE * reg_unit(devinfo).

This meant that SIMD32 LOAD_PAYLOAD was always being SIMD-split on Xe2
platforms, generating a lot of unnecessary extra instructions for compute
shaders.

fossil-db results on Lunar Lake:

   Totals:
   Instrs: 146178614 -> 143291988 (-1.97%); split: -1.98%, +0.00%
   Subgroup size: 11089632 -> 11089376 (-0.00%); split: +0.00%, -0.00%
   Cycle count: 22528892444 -> 22507551650 (-0.09%); split: -0.12%, +0.03%
   Max live registers: 48834202 -> 48886685 (+0.11%); split: -0.09%, +0.20%

   Totals from 134306 (24.10% of 557327) affected shaders:
   Instrs: 28806335 -> 25919709 (-10.02%); split: -10.02%, +0.00%
   Subgroup size: 4297680 -> 4297424 (-0.01%); split: +0.00%, -0.01%
   Cycle count: 956867650 -> 935526856 (-2.23%); split: -2.84%, +0.61%
   Max live registers: 13085711 -> 13138194 (+0.40%); split: -0.33%, +0.73%

Reviewed-by: Ian Romanick
Part-of:
---
 src/intel/compiler/brw_fs_lower_simd_width.cpp | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/intel/compiler/brw_fs_lower_simd_width.cpp b/src/intel/compiler/brw_fs_lower_simd_width.cpp
index b751d56dca9..db09a3bb2cd 100644
--- a/src/intel/compiler/brw_fs_lower_simd_width.cpp
+++ b/src/intel/compiler/brw_fs_lower_simd_width.cpp
@@ -438,7 +438,8 @@ brw_fs_get_lowered_simd_width(const fs_visitor *shader, const fs_inst *inst)
 
    case SHADER_OPCODE_LOAD_PAYLOAD: {
       const unsigned reg_count =
-         DIV_ROUND_UP(inst->dst.component_size(inst->exec_size), REG_SIZE);
+         DIV_ROUND_UP(inst->dst.component_size(inst->exec_size),
+                      REG_SIZE * reg_unit(devinfo));
 
       if (reg_count > 2) {
          /* Only LOAD_PAYLOAD instructions with per-channel destination region