ir3/legalize: take wrmask into account for delay updates

When updating delays, we'd update all dst regs based on reg_elems.
However, when wrmask has gaps, this would update delays for regs that
aren't actually written. Fix this by skipping regs for which the
corresponding wrmask bit is zero.

Note that this wasn't just a performance issue but could result in
illegal code because the delay is reset to zero for tex/sfu
instructions. For example, the following (post-legalization) code was
observed in the wild:

(rpt1)add.f r1.w, (r)r2.w, (r)c3.z
sam.base0 (f32)(w)r2.x, r3.y, s#0, t#1
rcp r2.x, r2.x

Here, the add would result in a required delay for r2.x, which the sam
would then clear even though it only writes r2.w, resulting in
insufficient delay before the rcp.
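
The hazard can be replayed with a small, self-contained toy model. This is only a sketch under stated assumptions, not the actual ir3 code. Every `toy_*` name is hypothetical, the per-component `ready` array stands in for the real per-consumer delay slots, the ALU delay of 3 cycles and the issue cycles 0/1/2 are illustrative numbers rather than hardware values, and the second iteration of the (rpt1) add is modeled as a plain write to r2.x. The skip condition mirrors the one added by this change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_NUM_REGS 16 /* scalar components; r2.x is index 2 * 4 + 0 = 8 */

/* Toy stand-in for the legalize state: the cycle at which each scalar
 * register component becomes safe to read. Not the real ir3 layout. */
struct toy_ctx {
   unsigned ready[TOY_NUM_REGS];
};

/* Number of components spanned by the mask (highest set bit index + 1). */
static unsigned
toy_elems(uint32_t wrmask)
{
   unsigned n = 0;
   for (; wrmask; wrmask >>= 1)
      n++;
   return n;
}

/* Record that the components of a dst become ready at cycle + delay.
 * With honor_wrmask set, components whose wrmask bit is zero are skipped
 * unless the dst is relative, as in the fixed code. */
static void
toy_delay_update(struct toy_ctx *ctx, unsigned cycle, unsigned base,
                 uint32_t wrmask, bool relative, unsigned delay,
                 bool honor_wrmask)
{
   unsigned elems = toy_elems(wrmask);
   for (unsigned elem = 0; elem < elems; elem++) {
      if (honor_wrmask && !relative && !(wrmask & (1u << elem)))
         continue;
      ctx->ready[base + elem] = cycle + delay;
   }
}

/* Nops needed before an instruction at `cycle` may read `reg`. */
static unsigned
toy_nops_before_read(const struct toy_ctx *ctx, unsigned cycle, unsigned reg)
{
   return ctx->ready[reg] > cycle ? ctx->ready[reg] - cycle : 0;
}

/* Replay the add / sam / rcp sequence and return the nops the toy model
 * would insert before the rcp. */
static unsigned
toy_replay(bool honor_wrmask)
{
   struct toy_ctx ctx = {{0}};
   const unsigned r2_x = 8;

   /* add: writes r2.x, assumed ALU delay of 3 cycles */
   toy_delay_update(&ctx, 0, r2_x, 0x1, false, 3, honor_wrmask);
   /* sam: dst base r2.x, wrmask w only, delay 0 (tex) */
   toy_delay_update(&ctx, 1, r2_x, 0x8, false, 0, honor_wrmask);
   /* rcp: reads r2.x */
   return toy_nops_before_read(&ctx, 2, r2_x);
}
```

With these assumed numbers, toy_replay(false) returns 0 because the sam overwrites the pending delay on r2.x, while toy_replay(true) returns 1 because the zero wrmask bits for x, y and z are skipped.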

Signed-off-by: Job Noorman <jnoorman@igalia.com>
Fixes: 61b2bd861f ("ir3: Rewrite nop insertion")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34107>
Job Noorman 2025-03-21 08:58:22 +01:00 committed by Marge Bot
parent fb6d933827
commit 84dbd34332


@@ -255,6 +255,10 @@ delay_update(struct ir3_legalize_ctx *ctx,
continue;
for (unsigned elem = 0; elem < elems; elem++, num++) {
/* Don't update delays for registers that aren't actually written. */
if (!(dst->flags & IR3_REG_RELATIV) && !(dst->wrmask & (1 << elem)))
continue;
for (unsigned consumer_alu = 0; consumer_alu < 2; consumer_alu++) {
for (unsigned matching_size = 0; matching_size < 2; matching_size++) {
unsigned *ready_slot =