From cabc8c606f9e0ba5c1ff04e9a6671d32c40f91e6 Mon Sep 17 00:00:00 2001
From: Job Noorman
Date: Fri, 21 Mar 2025 08:58:22 +0100
Subject: [PATCH] ir3/legalize: take wrmask into account for delay updates

When updating delays, we'd update all dst regs based on reg_elems.
However, when wrmask has gaps, this would update delays for regs that
aren't actually written. Fix this by skipping regs for which the
corresponding wrmask bit is zero.

Note that this wasn't just a performance issue but could result in
illegal code because the delay is reset to zero for tex/sfu
instructions. For example, the following (post-legalization) code was
observed in the wild:

    (rpt1)add.f r1.w, (r)r2.w, (r)c3.z
    sam.base0 (f32)(w)r2.x, r3.y, s#0, t#1
    rcp r2.x, r2.x

Here, the second iteration of the (rpt1) add writes r2.x, so the add
would result in a required delay for r2.x, which would then be cleared
by the sam (even though it doesn't write to it), resulting in
insufficient delay before the rcp.

Signed-off-by: Job Noorman
Fixes: 61b2bd861f9 ("ir3: Rewrite nop insertion")
Part-of:
(cherry picked from commit 84dbd34332fe63169bb48ff7741032d4857c43b5)
---
 .pick_status.json                | 2 +-
 src/freedreno/ir3/ir3_legalize.c | 4 ++++
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/.pick_status.json b/.pick_status.json
index 018e41f8228..7e112fce65a 100644
--- a/.pick_status.json
+++ b/.pick_status.json
@@ -3844,7 +3844,7 @@
         "description": "ir3/legalize: take wrmask into account for delay updates",
         "nominated": true,
         "nomination_type": 2,
-        "resolution": 0,
+        "resolution": 1,
         "main_sha": null,
         "because_sha": "61b2bd861f97affbfdbe7e92eccb3dd4f7e65609",
         "notes": null
diff --git a/src/freedreno/ir3/ir3_legalize.c b/src/freedreno/ir3/ir3_legalize.c
index 3e6ddfd3ca4..043e2190ecc 100644
--- a/src/freedreno/ir3/ir3_legalize.c
+++ b/src/freedreno/ir3/ir3_legalize.c
@@ -247,6 +247,10 @@ delay_update(struct ir3_legalize_ctx *ctx,
          continue;
 
       for (unsigned elem = 0; elem < elems; elem++, num++) {
+         /* Don't update delays for registers that aren't actually written. */
+         if (!(dst->flags & IR3_REG_RELATIV) && !(dst->wrmask & (1 << elem)))
+            continue;
+
          for (unsigned consumer_alu = 0; consumer_alu < 2; consumer_alu++) {
             for (unsigned matching_size = 0; matching_size < 2; matching_size++) {
                unsigned *ready_slot =
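To make the failure mode concrete, here is a toy C model of the delay table. It is a hypothetical sketch, not ir3 code: `struct toy_dst`, `toy_delay_update()`, `TOY_NUM_REGS` and the flat scalar register numbering (r2.x = 8) are invented for illustration, and the real `delay_update()` keeps per-consumer ready cycles (the `consumer_alu`/`matching_size`/`ready_slot` loops above) rather than one counter per register. It also assumes that `(w)r2.x` in the disassembly means the sam's destination spans r2.x..r2.w with only the .w bit set in wrmask (0x8).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model, NOT ir3's data structures: scalar registers are
 * numbered flat (r<n>.<c> -> n * 4 + c), and each one records how many
 * cycles a consumer must still wait before reading it. */
#define TOY_NUM_REGS 16

struct toy_dst {
   unsigned num;    /* first scalar register covered by the destination */
   unsigned elems;  /* number of consecutive scalar registers covered */
   unsigned wrmask; /* bit i set => element i is actually written */
   bool relative;   /* stand-in for IR3_REG_RELATIV: wrmask is not checked */
};

/* Set the pending delay of every register written by dst to new_delay.
 * honor_wrmask == false reproduces the pre-fix behaviour, which touched
 * every element in [num, num + elems) regardless of wrmask. */
void
toy_delay_update(unsigned delay[TOY_NUM_REGS], const struct toy_dst *dst,
                 unsigned new_delay, bool honor_wrmask)
{
   assert(dst->num + dst->elems <= TOY_NUM_REGS);

   for (unsigned elem = 0; elem < dst->elems; elem++) {
      /* Mirrors the check added by the patch: skip unwritten elements,
       * except for relative destinations. */
      if (honor_wrmask && !dst->relative && !(dst->wrmask & (1u << elem)))
         continue;

      delay[dst->num + elem] = new_delay;
   }
}
```

Calling it for the sam from the example (num = 8, elems = 4, wrmask = 0x8) with new_delay = 0, as a tex instruction would, leaves the pending delay that the add put on r2.x intact when honor_wrmask is true, and wipes it when honor_wrmask is false, which is the rcp hazard described in the commit message.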