brw/reg_allocate: Optimize spill offset calculation using more SIMD8

Re-associate the calculation. The current calculation is

    ((lane + zero_or_8) << 2) + offset

The first addition is SIMD8, and the shift and second addition are
SIMD16. By switching to

    ((lane << 2) + offset) + zero_or_32

all operations are SIMD8.
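
The re-association holds because a left shift distributes over addition as
long as nothing overflows: (lane + 8) << 2 equals (lane << 2) + (8 << 2).
A minimal scalar sketch of the two forms follows. It is not the compiler's
code; the function names and the upper_half flag are made up for
illustration.

```cpp
#include <cassert>
#include <cstdint>

/* Original association: bias the lane index first, then scale to bytes.
 * upper_half selects lanes 8..15 of a SIMD16 spill (hypothetical flag).
 */
uint32_t offset_before(uint32_t lane, bool upper_half, uint32_t spill_offset)
{
   const uint32_t zero_or_8 = upper_half ? 8 : 0;
   return ((lane + zero_or_8) << 2) + spill_offset;
}

/* Re-associated form: scale and add the base first, then add the bias
 * already expressed in bytes (8 << 2 == 32).
 */
uint32_t offset_after(uint32_t lane, bool upper_half, uint32_t spill_offset)
{
   const uint32_t zero_or_32 = upper_half ? (8 << 2) : 0;
   return ((lane << 2) + spill_offset) + zero_or_32;
}

/* Assert that both forms agree for all 8 lane indices in both halves. */
void check_all_lanes(uint32_t spill_offset)
{
   for (uint32_t lane = 0; lane < 8; lane++) {
      assert(offset_before(lane, false, spill_offset) ==
             offset_after(lane, false, spill_offset));
      assert(offset_before(lane, true, spill_offset) ==
             offset_after(lane, true, spill_offset));
   }
}
```

Calling check_all_lanes() with a few base offsets (0, 64, 4096) exercises
every lane of a SIMD16 spill. The sketch says nothing about instruction
counts; those come from the execution widths described above.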

The SHL operates directly on the UW 0x76543210UV value, and that
eliminates the MOV to expand the UW to UD.
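
As a scalar illustration of why the widening MOV is redundant, the sketch
below unpacks the 0x76543210 immediate as eight 4-bit lane indices
(assuming the usual UV layout, with lane 0 in the low nibble) and compares
widening before the shift against shifting straight from the 16-bit value
into a 32-bit result. The helper names are hypothetical, and this models
only the arithmetic, not the hardware's region or type rules.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

/* Unpack a UV immediate: eight unsigned 4-bit values, assumed to hold
 * lane 0 in the low nibble.
 */
std::array<uint16_t, 8> unpack_uv(uint32_t imm)
{
   std::array<uint16_t, 8> lanes{};
   for (unsigned i = 0; i < 8; i++)
      lanes[i] = static_cast<uint16_t>((imm >> (4 * i)) & 0xf);
   return lanes;
}

/* Old shape: one pass widens UW to UD, a second pass shifts the UD value. */
std::array<uint32_t, 8> widen_then_shift(const std::array<uint16_t, 8> &uw)
{
   std::array<uint32_t, 8> ud{};
   for (unsigned i = 0; i < 8; i++)
      ud[i] = uw[i];
   for (unsigned i = 0; i < 8; i++)
      ud[i] <<= 2;
   return ud;
}

/* New shape: a single pass reads the UW value and writes the shifted UD. */
std::array<uint32_t, 8> shift_from_uw(const std::array<uint16_t, 8> &uw)
{
   std::array<uint32_t, 8> ud{};
   for (unsigned i = 0; i < 8; i++)
      ud[i] = static_cast<uint32_t>(uw[i]) << 2;
   return ud;
}

/* Assert that both shapes produce identical per-lane byte offsets. */
void check_direct_shift(uint32_t imm)
{
   const std::array<uint16_t, 8> uw = unpack_uv(imm);
   assert(widen_then_shift(uw) == shift_from_uw(uw));
}
```

check_direct_shift(0x76543210) passes because the lane indices are at most
7, so the shifted values fit either way and the separate widening step
contributes nothing.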

v2: Switch to alternate method. Update for SIMD32 on Xe2.

No shader-db or fossil-db changes on any pre-Gfx12.5 platforms.

shader-db:

Lunar Lake, Meteor Lake, and DG2 had similar results. (Lunar Lake shown)
total instructions in shared programs: 17121519 -> 17119962 (<.01%)
instructions in affected programs: 73208 -> 71651 (-2.13%)
helped: 36
HURT: 0
helped stats (abs) min: 1 max: 129 x̄: 43.25 x̃: 56
helped stats (rel) min: 0.05% max: 4.92% x̄: 2.50% x̃: 2.79%
95% mean confidence interval for instructions value: -56.02 -30.48
95% mean confidence interval for instructions %-change: -3.24% -1.75%
Instructions are helped.

total cycles in shared programs: 895450146 -> 895433316 (<.01%)
cycles in affected programs: 13709400 -> 13692570 (-0.12%)
helped: 31
HURT: 2
helped stats (abs) min: 26 max: 1654 x̄: 543.10 x̃: 672
helped stats (rel) min: <.01% max: 3.43% x̄: 0.43% x̃: 0.51%
HURT stats (abs)   min: 2 max: 4 x̄: 3.00 x̃: 3
HURT stats (rel)   min: <.01% max: <.01% x̄: <.01% x̃: <.01%
95% mean confidence interval for cycles value: -652.42 -367.58
95% mean confidence interval for cycles %-change: -0.61% -0.19%
Cycles are helped.

fossil-db:

Lunar Lake, Meteor Lake, and DG2 had similar results. (Lunar Lake shown)
Totals:
Instrs: 210566294 -> 210052706 (-0.24%)
Cycle count: 31582309052 -> 31486266412 (-0.30%); split: -0.30%, +0.00%

Totals from 7091 (1.00% of 707082) affected shaders:
Instrs: 17408115 -> 16894527 (-2.95%)
Cycle count: 6443785290 -> 6347742650 (-1.49%); split: -1.49%, +0.00%

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34886>
Ian Romanick 2023-06-19 16:35:25 -07:00 committed by Marge Bot
parent dbef8f1791
commit 3db8dbfdc3

@@ -736,22 +736,26 @@ brw_reg_alloc::build_lane_offsets(const brw_builder &bld, uint32_t spill_offset,
    inst = ubld.group(8, 0).MOV(retype(offset, BRW_TYPE_UW),
                                brw_imm_uv(0x76543210));
    _mesa_set_add(spill_insts, inst);
-   inst = ubld.group(8, 0).MOV(offset, retype(offset, BRW_TYPE_UW));
+   /* Make the offset a dword */
+   inst = ubld.group(8, 0).SHL(offset, retype(offset, BRW_TYPE_UW), brw_imm_uw(2));
    _mesa_set_add(spill_insts, inst);
+   /* Add the base offset */
+   if (spill_offset) {
+      inst = ubld.group(8, 0).ADD(offset, offset, brw_imm_ud(spill_offset));
+      _mesa_set_add(spill_insts, inst);
+   }
    /* Build offsets in the upper 8 lanes of SIMD16 */
    if (ubld.dispatch_width() > 8) {
       inst = ubld.group(8, 0).ADD(
          byte_offset(offset, REG_SIZE),
          byte_offset(offset, 0),
-         brw_imm_ud(8));
+         brw_imm_ud(8 << 2));
       _mesa_set_add(spill_insts, inst);
    }
-   /* Make the offset a dword */
-   inst = ubld.SHL(offset, offset, brw_imm_ud(2));
-   _mesa_set_add(spill_insts, inst);
    /* Build offsets in the upper 16 lanes of SIMD32 */
    if (ubld.dispatch_width() > 16) {
       inst = ubld.group(16, 0).ADD(
@@ -761,12 +765,6 @@ brw_reg_alloc::build_lane_offsets(const brw_builder &bld, uint32_t spill_offset,
       _mesa_set_add(spill_insts, inst);
    }
-   /* Add the base offset */
-   if (spill_offset) {
-      inst = ubld.ADD(offset, offset, brw_imm_ud(spill_offset));
-      _mesa_set_add(spill_insts, inst);
-   }
    return offset;
 }