Commit graph

4298 commits

Author SHA1 Message Date
Rhys Perry
43603f9b1d amd: add ac_cu_info::local_invocation_ids_packed
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39992>
2026-02-26 15:49:15 +00:00
Rhys Perry
29f8237d30 amd: move various flags to ac_cu_info
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39992>
2026-02-26 15:49:14 +00:00
Georg Lehmann
8f4de30d05 aco/insert_fp_mode: don't skip setting round for fract
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
fract(-FLT_MIN) is < 1.0 with rtz but 1.0 with rtne.

Fixes: 7212a75c5e ("aco/insert_fp_mode: exclude some instructions that will never round")

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40078>
2026-02-26 00:20:02 +00:00
Rhys Perry
613b4fe407 aco: resolve hazards before calls
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Backport-to: 26.0
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39825>
2026-02-24 13:20:55 +00:00
Rhys Perry
dfda890ae8 aco: reset all vgpr_used_by_vmem_ in resolve_all_gfx11
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Backport-to: 26.0
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39825>
2026-02-24 13:20:55 +00:00
Rhys Perry
72923ad2f0 aco: fix VALUReadSGPRHazard with s_call_b64/s_swappc_b64
This probably doesn't do anything because sgpr_read_by_valu are all set
already for raytracing shaders.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39825>
2026-02-24 13:20:54 +00:00
Georg Lehmann
dd067088ef aco: allow dpp for fp8/bf8 dot4
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40003>
2026-02-24 08:55:53 +00:00
Georg Lehmann
a033cd95a4 aco: allow modifiers for fp16 dot
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40003>
2026-02-24 08:55:53 +00:00
Georg Lehmann
3238e64d3c aco/ra: create v_dot2c_f32_f16
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40003>
2026-02-24 08:55:53 +00:00
Georg Lehmann
237b8ca205 aco: mixed float dot product opcodes
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40003>
2026-02-24 08:55:52 +00:00
Rhys Perry
af27fb23f3 aco/ra: don't modify parallelcopies if get_reg_for_affinity fails
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Fixes baldurs_gate_3/60c8b7ff623fbb18 with vega10.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: 310f588f92 ("aco/ra: move variables from affinity register to avoid waitcnt")
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39986>
2026-02-20 08:40:55 +00:00
Rhys Perry
75722da909 aco: fix gfx6-8 store_scratch() with function calls
Might happen with radv_emulate_rt=true.

Fixes the_great_circle/a6079328b8df7712 with polaris10.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: e006f68b11 ("aco/isel: Don't add scratch offset as gfx8- soffset if no offsets exist")
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39986>
2026-02-20 08:40:55 +00:00
Rhys Perry
f81aaee7f1 aco/ra: create vectors for affinities of split definitions
For example:
a = ...
b = ...
if {
   c, d = split
}
phi(a, c)
phi(b, d)

This patch will allocate 'a' and 'b' as a vector.

fossil-db (navi31):
Totals from 2556 (3.20% of 79825) affected shaders:
MaxWaves: 59957 -> 59955 (-0.00%)
Instrs: 9170941 -> 9154954 (-0.17%); split: -0.19%, +0.02%
CodeSize: 48245956 -> 48182620 (-0.13%); split: -0.15%, +0.02%
VGPRs: 189372 -> 189900 (+0.28%); split: -0.04%, +0.32%
Latency: 85469322 -> 85262360 (-0.24%); split: -0.32%, +0.08%
InvThroughput: 14515911 -> 14486970 (-0.20%); split: -0.27%, +0.07%
VClause: 197980 -> 197959 (-0.01%); split: -0.02%, +0.01%
Copies: 787838 -> 774288 (-1.72%); split: -1.91%, +0.19%
Branches: 271810 -> 271799 (-0.00%); split: -0.01%, +0.01%
VALU: 5331813 -> 5318566 (-0.25%); split: -0.28%, +0.03%
SALU: 1133559 -> 1133054 (-0.04%); split: -0.05%, +0.01%
VOPD: 2435 -> 2418 (-0.70%); split: +0.12%, -0.82%

fossil-db (navi21):
Totals from 37513 (46.99% of 79825) affected shaders:
Instrs: 26734825 -> 26681225 (-0.20%); split: -0.23%, +0.03%
CodeSize: 141353284 -> 141144360 (-0.15%); split: -0.17%, +0.02%
VGPRs: 1556760 -> 1556384 (-0.02%); split: -0.21%, +0.18%
Latency: 146201548 -> 146156473 (-0.03%); split: -0.20%, +0.17%
InvThroughput: 33921803 -> 33867398 (-0.16%); split: -0.23%, +0.07%
VClause: 502263 -> 502209 (-0.01%); split: -0.27%, +0.26%
SClause: 593142 -> 593155 (+0.00%); split: -0.00%, +0.00%
Copies: 2600995 -> 2551257 (-1.91%); split: -2.16%, +0.25%
Branches: 857910 -> 857787 (-0.01%); split: -0.03%, +0.02%
VALU: 15674532 -> 15625013 (-0.32%); split: -0.35%, +0.04%
SALU: 4635548 -> 4634680 (-0.02%); split: -0.04%, +0.02%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38262>
2026-02-16 19:39:43 +00:00
Rhys Perry
86f0195f5c aco/ra: prefer phi operands which don't create waitcnt
fossil-db (navi31):
Totals from 89 (0.11% of 79825) affected shaders:
Instrs: 343443 -> 343384 (-0.02%); split: -0.10%, +0.09%
CodeSize: 1792948 -> 1792668 (-0.02%); split: -0.10%, +0.08%
Latency: 2656294 -> 2656490 (+0.01%); split: -0.02%, +0.02%
InvThroughput: 517696 -> 517691 (-0.00%); split: -0.01%, +0.01%
SClause: 9213 -> 9215 (+0.02%); split: -0.01%, +0.03%
Copies: 39138 -> 39089 (-0.13%); split: -0.84%, +0.71%
Branches: 10863 -> 10872 (+0.08%); split: -0.05%, +0.13%
SALU: 49185 -> 49136 (-0.10%); split: -0.67%, +0.57%

fossil-db (navi21):
Totals from 34490 (43.21% of 79825) affected shaders:
Instrs: 23005853 -> 22956529 (-0.21%); split: -0.25%, +0.04%
CodeSize: 120532004 -> 120341412 (-0.16%); split: -0.19%, +0.03%
VGPRs: 1396928 -> 1397520 (+0.04%); split: -0.07%, +0.11%
Latency: 108740068 -> 108499644 (-0.22%); split: -0.53%, +0.30%
InvThroughput: 25286526 -> 25358695 (+0.29%); split: -0.11%, +0.39%
VClause: 421179 -> 421132 (-0.01%); split: -0.29%, +0.27%
SClause: 446414 -> 446423 (+0.00%); split: -0.00%, +0.00%
Copies: 2242236 -> 2243168 (+0.04%); split: -0.42%, +0.46%
Branches: 724556 -> 724903 (+0.05%); split: -0.02%, +0.07%
VALU: 13321078 -> 13321940 (+0.01%); split: -0.07%, +0.08%
SALU: 4069929 -> 4070580 (+0.02%); split: -0.02%, +0.03%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38262>
2026-02-16 19:39:43 +00:00
Rhys Perry
310f588f92 aco/ra: move variables from affinity register to avoid waitcnt
If we don't use this affinity register, we're likely to end up moving the
temporary later. If it's a memory instruction destination, that's probably
more expensive than just copying the blocking variables.

fossil-db (navi31):
Totals from 504 (0.63% of 79825) affected shaders:
Instrs: 4108284 -> 4109026 (+0.02%); split: -0.01%, +0.03%
CodeSize: 21226764 -> 21229764 (+0.01%); split: -0.01%, +0.02%
Latency: 26931635 -> 26806989 (-0.46%); split: -0.47%, +0.00%
InvThroughput: 8443520 -> 8439235 (-0.05%); split: -0.06%, +0.01%
VClause: 99209 -> 99314 (+0.11%); split: -0.00%, +0.11%
SClause: 85089 -> 85085 (-0.00%)
Copies: 340323 -> 340993 (+0.20%); split: -0.06%, +0.26%
Branches: 117225 -> 117209 (-0.01%); split: -0.02%, +0.00%
VALU: 2421859 -> 2422529 (+0.03%); split: -0.01%, +0.04%
SALU: 503465 -> 503470 (+0.00%); split: -0.00%, +0.00%

fossil-db (navi21):
Totals from 582 (0.73% of 79825) affected shaders:
Instrs: 3714908 -> 3714990 (+0.00%); split: -0.02%, +0.02%
CodeSize: 19977880 -> 19973076 (-0.02%); split: -0.04%, +0.01%
VGPRs: 40480 -> 40496 (+0.04%)
Latency: 26028895 -> 25772711 (-0.98%); split: -0.99%, +0.00%
InvThroughput: 9827389 -> 9818194 (-0.09%); split: -0.10%, +0.01%
VClause: 103702 -> 103815 (+0.11%); split: -0.02%, +0.13%
SClause: 90861 -> 90857 (-0.00%)
Copies: 335276 -> 335992 (+0.21%); split: -0.09%, +0.30%
Branches: 123912 -> 123897 (-0.01%); split: -0.02%, +0.00%
VALU: 2466032 -> 2466748 (+0.03%); split: -0.01%, +0.04%
SALU: 533658 -> 533667 (+0.00%); split: -0.00%, +0.00%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38262>
2026-02-16 19:39:43 +00:00
Rhys Perry
681ec4cba7 aco/ra: track cost of moving variables
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38262>
2026-02-16 19:39:43 +00:00
Rhys Perry
69bc4efa37 aco/sched_ilp: improve scheduling with VMEM/DS->VALU WaW
This improves scheduling with one side of a divergent branch writing to a
VGPR using VMEM/DS, and the other writing using VALU. At the merge block,
it will properly consider that the VGPR was written by a VMEM/DS.

fossil-db (navi31):
Totals from 1224 (1.53% of 79825) affected shaders:
Instrs: 5264815 -> 5267604 (+0.05%); split: -0.00%, +0.06%
CodeSize: 27406404 -> 27422132 (+0.06%); split: -0.00%, +0.06%
Latency: 48325204 -> 48293975 (-0.06%); split: -0.09%, +0.03%
InvThroughput: 8923880 -> 8919191 (-0.05%); split: -0.07%, +0.02%

fossil-db (navi21):
Totals from 1267 (1.59% of 79825) affected shaders:
Instrs: 4628583 -> 4629190 (+0.01%); split: -0.00%, +0.01%
CodeSize: 24974672 -> 24977188 (+0.01%); split: -0.00%, +0.01%
Latency: 45080476 -> 44998120 (-0.18%); split: -0.20%, +0.02%
InvThroughput: 12288202 -> 12269634 (-0.15%); split: -0.16%, +0.01%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38262>
2026-02-16 19:39:43 +00:00
Rhys Perry
88b6b6db17 aco: only consider cost of memory loads at waitcnt
We don't run this code before waitcnt insertion, so this isn't necessary.

This change improves accuracy in these two situations, because the waitcnt
insertion pass is more aware of divergent control flow:

v0 = valu
if (divergent) {
    v0 = vmem
} else {
    use(v0)
}

v0 = vmem
if (divergent) {
    wait vmcnt(0)
} else {
    wait vmcnt(0)
}
use(v0)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38262>
2026-02-16 19:39:43 +00:00
Rhys Perry
6963c8dd80 radv,aco/gfx11: preserve s2 when NGG_WAVE_ID_EN=1
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
According to the ISA doc, this is needed for hang recovery.

This works by just avoiding putting temporaries in s0-3 unless they're
precolored there.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> (radv)
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39720>
2026-02-16 14:33:58 +00:00
Rhys Perry
b60bff0429 aco: consider 64-bit transcendental normal valu for s_delay_alu
https://github.com/llvm/llvm-project/pull/180940

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39851>
2026-02-13 17:03:34 +00:00
Marek Olšák
d1e6a5c1c8 ac: lower load_num_workgroups in NIR
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39638>
2026-02-13 15:33:19 +00:00
Marek Olšák
a9e47751d2 ac: lower load_subgroup_id for ACO in NIR
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39638>
2026-02-13 15:33:19 +00:00
Marek Olšák
0a9bdcac79 ac: lower load_workgroup_ids for ACO in NIR
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39638>
2026-02-13 15:33:19 +00:00
Daniel Schürmann
97f095f6e0 aco/lower_branches: Add try_rotate_latch_block() optimization
This optimization looks for unconditional back-edges and aims
to rotate the loop in a way that the final block is emitted
before the loop header, essentially turning

BB1:
  if ()
    goto BB3;
BB2:
  <loop body>
  goto BB1;
BB3:
  ...

into

  goto BB1;
BB2:
  <loop body>
BB1:
  if(!cond)
    goto BB2;
BB3:
  ...

Totals from 4969 (5.89% of 84383) affected shaders: (Navi48)

Instrs: 15253038 -> 15254019 (+0.01%); split: -0.00%, +0.01%
CodeSize: 81225300 -> 81227696 (+0.00%); split: -0.02%, +0.02%
Latency: 320796283 -> 320693480 (-0.03%); split: -0.03%, +0.00%
InvThroughput: 51395922 -> 51376156 (-0.04%); split: -0.04%, +0.00%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39519>
2026-02-13 14:49:44 +00:00
Daniel Schürmann
ade5e300ab aco/insert_delay_alu: handle loop latch block before loop body
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39519>
2026-02-13 14:49:44 +00:00
Daniel Schürmann
102aca9843 aco/assembler: emit block_kind_loop_latch before the loop header
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39519>
2026-02-13 14:49:44 +00:00
Daniel Schürmann
da1594f8bb aco: introduce notion of block_kind_loop_latch
A block annotated with block_kind_loop_latch denotes a block
the re-entry point for a loop back-edge. It is emitted after
the loop preheader and (potentially) before the loop header.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39519>
2026-02-13 14:49:44 +00:00
Daniel Schürmann
9887ce6709 aco/print_asm: Sort block markers by block offset
We are going to emit blocks in a different order.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39519>
2026-02-13 14:49:44 +00:00
Daniel Schürmann
800a4957bb aco/lower_branches: Consider branch target of nested conditional branches
Totals from 1470 (1.74% of 84383) affected shaders: (Navi48)

Instrs: 5128451 -> 5126842 (-0.03%)
CodeSize: 29359832 -> 29353656 (-0.02%); split: -0.02%, +0.00%
Latency: 41047203 -> 41040786 (-0.02%)
InvThroughput: 6040459 -> 6039619 (-0.01%); split: -0.01%, +0.00%
Branches: 146219 -> 144648 (-1.07%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39519>
2026-02-13 14:49:44 +00:00
Daniel Schürmann
fbf2083b8f aco/isel: Don't emit ELSE side of divergent branches which jump
Totals from 50 (0.06% of 84383) affected shaders: (Navi48)

Instrs: 402490 -> 402444 (-0.01%); split: -0.01%, +0.00%
CodeSize: 2239024 -> 2238864 (-0.01%); split: -0.01%, +0.00%
SpillSGPRs: 1493 -> 1496 (+0.20%)
Latency: 5836785 -> 5836747 (-0.00%); split: -0.00%, +0.00%
InvThroughput: 1120893 -> 1120909 (+0.00%); split: -0.00%, +0.00%
Copies: 46128 -> 46082 (-0.10%)
VALU: 222708 -> 222715 (+0.00%); split: -0.00%, +0.00%
SALU: 53039 -> 52993 (-0.09%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39519>
2026-02-13 14:49:44 +00:00
Daniel Schürmann
ba32219cf8 aco/isel: Don't emit ELSE side of uniform branches which jump
Totals from 4 (0.00% of 84383) affected shaders: (Navi48)

Instrs: 16473 -> 16468 (-0.03%)
CodeSize: 85276 -> 85300 (+0.03%)
SpillSGPRs: 175 -> 176 (+0.57%)
Latency: 267907 -> 267885 (-0.01%)
InvThroughput: 36302 -> 36298 (-0.01%)
Copies: 1353 -> 1345 (-0.59%)
VALU: 9025 -> 9029 (+0.04%)
SALU: 2635 -> 2627 (-0.30%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39519>
2026-02-13 14:49:44 +00:00
Daniel Schürmann
96a639918c aco: don't emit p_logical_start / p_logical_end after divergent branches
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39519>
2026-02-13 14:49:44 +00:00
Daniel Schürmann
3743230252 aco/isel: Do IF-simplification if that didn't happen during NIR optimizations
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39519>
2026-02-13 14:49:43 +00:00
Daniel Schürmann
50b093ec90 aco/builder: Fix v_add_co_u32 carry-out to VCC if post_ra
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39519>
2026-02-13 14:49:43 +00:00
Georg Lehmann
d7814bcad0 aco: remove redundant can_use_DPP declaration
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39801>
2026-02-11 11:34:29 +00:00
Georg Lehmann
fc7b5d7eed aco/opt_postRA: don't optimize across calls
Could do better by checking which registers are clobbered/preserved,
but that's unlikely to be useful anyway.

Backport-to: 26.0

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39801>
2026-02-11 11:34:29 +00:00
Georg Lehmann
10b12a6ee2 aco: handle all SALU that modifies PC in needs_exec_mask
Calls use swappc.

Backport-to: 26.0

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39801>
2026-02-11 11:34:29 +00:00
Georg Lehmann
421a4dacf0 aco/lower_branches: consider jump target of conditional branches based on vcc
Cc: mesa-stable

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39801>
2026-02-11 11:34:29 +00:00
Georg Lehmann
77d05ac1ba aco/optimizer: stop checking precise for med3
No Foz-DB changes.

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39641>
2026-02-10 18:42:03 +00:00
Georg Lehmann
e873b8764a aco/optimizer: use nan preserve flag to prevent incorrect med3
No Foz-DB changes.

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39641>
2026-02-10 18:42:02 +00:00
Georg Lehmann
0c46053c05 aco/optimzer: apply extract with any uses
Foz-DB Navi48:
Totals from 362 (0.44% of 82405) affected shaders:
MaxWaves: 5052 -> 5066 (+0.28%)
Instrs: 5297858 -> 5294009 (-0.07%); split: -0.09%, +0.01%
CodeSize: 30187188 -> 30177592 (-0.03%); split: -0.05%, +0.02%
VGPRs: 44280 -> 44172 (-0.24%)
Latency: 35632812 -> 35619796 (-0.04%); split: -0.05%, +0.01%
InvThroughput: 7050206 -> 7041058 (-0.13%); split: -0.14%, +0.01%
VClause: 137780 -> 137794 (+0.01%); split: -0.01%, +0.02%
SClause: 114821 -> 114781 (-0.03%)
Copies: 466018 -> 465150 (-0.19%); split: -0.24%, +0.05%
Branches: 171990 -> 171988 (-0.00%)
PreVGPRs: 39268 -> 39084 (-0.47%)
VALU: 2557456 -> 2554297 (-0.12%); split: -0.15%, +0.02%
SALU: 893170 -> 893192 (+0.00%); split: -0.00%, +0.01%
VOPD: 393760 -> 394427 (+0.17%); split: +0.39%, -0.22%

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39532>
2026-02-06 11:29:40 +00:00
Georg Lehmann
85c62f1515 aco/optimizer: only copy propagate p_split_vector if it can be eliminated
Foz-DB Navi48:
Totals from 402 (0.49% of 82405) affected shaders:
Instrs: 3078116 -> 3070117 (-0.26%); split: -0.28%, +0.02%
CodeSize: 17329444 -> 17240360 (-0.51%); split: -0.53%, +0.01%
VGPRs: 48960 -> 48924 (-0.07%); split: -0.12%, +0.05%
SpillVGPRs: 1683 -> 1687 (+0.24%)
Latency: 27758978 -> 27728451 (-0.11%); split: -0.17%, +0.06%
InvThroughput: 5748513 -> 5741761 (-0.12%); split: -0.18%, +0.06%
VClause: 69557 -> 69575 (+0.03%); split: -0.01%, +0.03%
SClause: 74850 -> 74866 (+0.02%)
Copies: 338241 -> 329400 (-2.61%); split: -2.71%, +0.10%
Branches: 118443 -> 118431 (-0.01%)
PreVGPRs: 44561 -> 44598 (+0.08%)
VALU: 1463081 -> 1455438 (-0.52%); split: -0.56%, +0.04%
SALU: 574113 -> 574013 (-0.02%); split: -0.03%, +0.01%
VMEM: 105789 -> 105797 (+0.01%)
VOPD: 140203 -> 139009 (-0.85%); split: +0.44%, -1.29%

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39532>
2026-02-06 11:29:39 +00:00
Georg Lehmann
5ecc800edd aco/optimizer: add second copy prop for pseudo instructions
Foz-DB Navi48:
Totals from 28 (0.03% of 82405) affected shaders:
Instrs: 144993 -> 144645 (-0.24%); split: -0.26%, +0.02%
CodeSize: 784668 -> 783604 (-0.14%); split: -0.19%, +0.05%
SpillVGPRs: 215 -> 209 (-2.79%)
Latency: 2529900 -> 2526895 (-0.12%); split: -0.12%, +0.00%
InvThroughput: 775379 -> 773859 (-0.20%); split: -0.20%, +0.00%
VClause: 2815 -> 2803 (-0.43%)
Copies: 23474 -> 23170 (-1.30%); split: -1.38%, +0.09%
Branches: 4638 -> 4632 (-0.13%)
VALU: 81924 -> 81620 (-0.37%); split: -0.40%, +0.03%
SALU: 23986 -> 23995 (+0.04%); split: -0.03%, +0.07%
VMEM: 3726 -> 3714 (-0.32%)

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39532>
2026-02-06 11:29:39 +00:00
Georg Lehmann
269007faf3 aco/optimizer: apply byte p_split_vector as extract
Foz-DB Navi48:
Totals from 80 (0.10% of 82405) affected shaders:
Instrs: 3022374 -> 3024178 (+0.06%); split: -0.00%, +0.06%
CodeSize: 17396984 -> 17403108 (+0.04%); split: -0.00%, +0.04%
Latency: 17685547 -> 17687073 (+0.01%); split: -0.01%, +0.02%
InvThroughput: 3622683 -> 3622618 (-0.00%); split: -0.02%, +0.02%
VClause: 83840 -> 83841 (+0.00%)
Copies: 242072 -> 242528 (+0.19%); split: -0.01%, +0.20%
Branches: 81582 -> 81578 (-0.00%)
PreVGPRs: 7536 -> 7527 (-0.12%)
VALU: 1520822 -> 1521762 (+0.06%); split: -0.01%, +0.07%
VOPD: 294392 -> 293908 (-0.16%); split: +0.03%, -0.20%

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39532>
2026-02-06 11:29:39 +00:00
Georg Lehmann
b21b36b6ab aco/optimizer: apply further extracts to v_cvt_f32_ubyte
Foz-DB Navi48:
Totals from 21 (0.03% of 82405) affected shaders:
Instrs: 2818255 -> 2817482 (-0.03%)
CodeSize: 16282360 -> 16273080 (-0.06%)
Latency: 14172672 -> 14172405 (-0.00%); split: -0.00%, +0.00%
InvThroughput: 2728551 -> 2728493 (-0.00%); split: -0.00%, +0.00%
Copies: 213703 -> 212973 (-0.34%)
VALU: 1407351 -> 1406585 (-0.05%)
VOPD: 291185 -> 291221 (+0.01%); split: +0.04%, -0.03%

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39532>
2026-02-06 11:29:39 +00:00
Georg Lehmann
08f9bad0b5 aco/isel: avoid extracts for continuous alu src components
Helps fp8 FSR4, hurts parallel_rdp.

Foz-DB Navi48:
Totals from 23 (0.03% of 82405) affected shaders:
MaxWaves: 380 -> 383 (+0.79%)
Instrs: 71228 -> 71487 (+0.36%); split: -0.26%, +0.62%
CodeSize: 411500 -> 415004 (+0.85%); split: -0.21%, +1.06%
VGPRs: 2856 -> 2784 (-2.52%)
Latency: 1654160 -> 1665555 (+0.69%); split: -0.14%, +0.83%
InvThroughput: 354145 -> 361122 (+1.97%); split: -0.10%, +2.07%
VClause: 1557 -> 1541 (-1.03%); split: -1.41%, +0.39%
Copies: 9857 -> 10059 (+2.05%); split: -1.76%, +3.80%
PreVGPRs: 2285 -> 2182 (-4.51%); split: -4.73%, +0.22%
VALU: 38873 -> 39066 (+0.50%); split: -0.47%, +0.96%
VOPD: 1237 -> 1246 (+0.73%); split: +1.13%, -0.40%

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39532>
2026-02-06 11:29:39 +00:00
Georg Lehmann
a0c663378c aco/isel: split vector into dwords/words first
Foz-DB Navi48:
Totals from 361 (0.44% of 82405) affected shaders:
MaxWaves: 5806 -> 5832 (+0.45%)
Instrs: 2343746 -> 2343762 (+0.00%); split: -0.04%, +0.04%
CodeSize: 13270504 -> 13267116 (-0.03%); split: -0.10%, +0.08%
VGPRs: 42008 -> 41708 (-0.71%)
SpillVGPRs: 308 -> 303 (-1.62%)
Scratch: 1574656 -> 1574400 (-0.02%)
Latency: 26571385 -> 22602486 (-14.94%); split: -14.95%, +0.01%
InvThroughput: 5474157 -> 4614777 (-15.70%); split: -15.70%, +0.00%
VClause: 57512 -> 57515 (+0.01%); split: -0.03%, +0.03%
SClause: 56313 -> 56319 (+0.01%)
Copies: 251626 -> 248707 (-1.16%); split: -1.24%, +0.08%
Branches: 89620 -> 89614 (-0.01%)
PreVGPRs: 37361 -> 36910 (-1.21%); split: -1.21%, +0.01%
VALU: 1111534 -> 1108507 (-0.27%); split: -0.29%, +0.02%
SALU: 443684 -> 443687 (+0.00%); split: -0.00%, +0.00%
VMEM: 85287 -> 85277 (-0.01%)
VOPD: 97987 -> 98091 (+0.11%); split: +0.30%, -0.20%

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39532>
2026-02-06 11:29:39 +00:00
Georg Lehmann
1a3e627223 aco: improve emit_extract_vector for vector of vecs
No Foz-DB changes, but nessecary for dword first splits.

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39532>
2026-02-06 11:29:39 +00:00
Georg Lehmann
1b491cc51a aco/optimizer: don't remove label_extract for splits
No Foz-DB changes, but will become nessecary with dword first splits.

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39532>
2026-02-06 11:29:39 +00:00
Georg Lehmann
66f2a35954 aco/optimizer: repeat vector of split opt
Foz-DB Navi48:
Totals from 13 (0.02% of 82405) affected shaders:
Instrs: 12071 -> 12119 (+0.40%); split: -0.07%, +0.46%
CodeSize: 86908 -> 86960 (+0.06%); split: -0.29%, +0.35%
Latency: 104959 -> 105385 (+0.41%); split: -0.60%, +1.00%
InvThroughput: 46518 -> 46598 (+0.17%); split: -0.03%, +0.20%
VClause: 515 -> 506 (-1.75%); split: -3.11%, +1.36%
SClause: 32 -> 30 (-6.25%)
Copies: 973 -> 1038 (+6.68%); split: -0.82%, +7.50%
PreVGPRs: 1185 -> 1191 (+0.51%)
VALU: 7126 -> 7166 (+0.56%); split: -0.08%, +0.65%
SALU: 1127 -> 1129 (+0.18%)
VOPD: 1516 -> 1539 (+1.52%); split: +1.78%, -0.26%

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39532>
2026-02-06 11:29:39 +00:00