aco/insert_NOPs: explicitly wait for sa_sdst to resolve SALU -> VALU hazards

The assumption that these waits are not required has been proven incorrect
in at least some cases.

Totals from 190 (0.24% of 79825) affected shaders: (Navi31)
Instrs: 499718 -> 500491 (+0.15%)
CodeSize: 2658228 -> 2661916 (+0.14%)
Latency: 5964632 -> 5965453 (+0.01%); split: -0.00%, +0.01%
InvThroughput: 794221 -> 794289 (+0.01%)

Totals from 17093 (21.41% of 79839) affected shaders: (Navi48)
Instrs: 22805214 -> 22854313 (+0.22%)
CodeSize: 121240428 -> 121432904 (+0.16%); split: -0.00%, +0.16%
Latency: 166500300 -> 166530529 (+0.02%); split: -0.00%, +0.02%
InvThroughput: 28770053 -> 28772870 (+0.01%); split: -0.00%, +0.01%

Fixes: 018f45f981 ("aco/insert_NOPs: remove redundant VALUReadSGPRHazard waits")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14516

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39252>
Georg Lehmann 2026-01-10 17:23:01 +01:00 committed by Marge Bot
parent 94f2d110a1
commit 3e10ab34e1

@@ -1498,8 +1498,8 @@ handle_instruction_gfx11(State& state, NOP_ctx_gfx11& ctx, aco_ptr<Instruction>&
 for (unsigned i = 0; i < op.size(); i++) {
    unsigned reg = op.physReg() + i;
-   /* s_waitcnt_depctr on sa_sdst */
-   if (ctx.sgpr_read_by_valu_as_lanemask_then_wr_by_salu[reg] && wait.sa_sdst > 0) {
+   /* s_waitcnt_depctr on sa_sdst, implicit wait.sa_sdst=0 is not enough. */
+   if (ctx.sgpr_read_by_valu_as_lanemask_then_wr_by_salu[reg]) {
       imm &= 0xfffe;
       wait.sa_sdst = 0;
    }
@@ -1619,8 +1619,8 @@ handle_instruction_gfx11(State& state, NOP_ctx_gfx11& ctx, aco_ptr<Instruction>&
 for (unsigned i = 0; i < op.size(); i++) {
    PhysReg reg = op.physReg().advance(i * 4);
-   if (ctx.sgpr_read_by_valu_then_wr_by_salu.get(reg) < expiry_count &&
-       wait.sa_sdst > 0) {
+   /* Implicit wait.sa_sdst=0 is not enough. */
+   if (ctx.sgpr_read_by_valu_then_wr_by_salu.get(reg) < expiry_count) {
       imm &= 0xfffe;
       wait.sa_sdst = 0;
    }
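
For readers skimming the two hunks, the behavioural change can be sketched in isolation. This is a simplified model, not ACO code: `DepctrWait`, `old_depctr_imm` and `new_depctr_imm` are hypothetical names, and the sketch assumes that `sa_sdst == 0` in the tracked wait state stands for a wait the instruction was believed to perform implicitly. Only the `imm &= 0xfffe` mask and the `wait.sa_sdst = 0` update are taken from the diff.

```cpp
#include <cstdint>

// Hypothetical, simplified stand-in for the tracked wait state (not ACO's type).
struct DepctrWait {
   // Assumption: 0 means "a wait on SALU SGPR writes is already implied",
   // any value > 0 means no such wait is pending.
   uint8_t sa_sdst = 1;
};

// Old behaviour: the explicit wait is skipped when an implicit
// wait.sa_sdst == 0 was assumed to cover the hazard.
inline uint16_t old_depctr_imm(bool hazard, DepctrWait& wait, uint16_t imm)
{
   if (hazard && wait.sa_sdst > 0) {
      imm &= 0xfffe; // clear bit 0, the mask used in the diff for sa_sdst
      wait.sa_sdst = 0;
   }
   return imm;
}

// New behaviour: any pending hazard forces the explicit wait,
// regardless of the implicit wait state.
inline uint16_t new_depctr_imm(bool hazard, DepctrWait& wait, uint16_t imm)
{
   if (hazard) {
      imm &= 0xfffe;
      wait.sa_sdst = 0;
   }
   return imm;
}
```

With a hazard pending and `wait.sa_sdst` already 0, the old helper returns the immediate unchanged while the new one clears bit 0. That difference is what accounts for the extra instructions in the shader statistics above.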