Commit graph

4322 commits

Author SHA1 Message Date
Georg Lehmann
ae2968c4ec aco: allow spilling to LDS in RT shaders without stack pointer
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
No Foz-DB changes because most RT shaders use function calls now.

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36367>
2026-03-27 13:08:44 +00:00
Georg Lehmann
133ef9f94b aco: spill VGPRs to LDS if it doesn't further limit occupancy
Only use LDS for VGPR spilling if we can use addtid access, to avoid having a VGPR addr.
Limit to single wave workgroups, to avoid needing the wave_id for the offset.
If we have a scratch stack pointer, don't use LDS at all.

Limit LDS spilling to not reduce occupancy further.
Note that in theory, this can still limit occupancy of other shaders running
on the CU at the same time, but that's unlikely and impossible to know at this point.

Removes all scratch usage in emulated FSR4 and parallel_rdp.
Besides that, only a single GoW shader is affected.

Foz-DB Navi31:
Totals from 9 (0.01% of 114641) affected shaders:
Instrs: 68863 -> 68830 (-0.05%); split: -0.07%, +0.02%
CodeSize: 416108 -> 416000 (-0.03%); split: -0.05%, +0.02%
LDS: 2048 -> 45056 (+2100.00%)
Scratch: 261888 -> 220672 (-15.74%)
Latency: 727951 -> 657155 (-9.73%); split: -9.73%, +0.00%
InvThroughput: 418644 -> 383269 (-8.45%)
VClause: 1506 -> 1200 (-20.32%)
Copies: 10651 -> 10624 (-0.25%)
VALU: 48700 -> 48684 (-0.03%)
SALU: 6200 -> 6199 (-0.02%); split: -0.05%, +0.03%
VMEM: 4139 -> 3589 (-13.29%)
VOPD: 580 -> 574 (-1.03%)

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36367>
2026-03-27 13:08:44 +00:00
Georg Lehmann
17a9ee7152 aco/optimizer: apply dpp to v_dot before RA for gfx10.3
This is a bit unusual, as we otherwise only use the VOP2 codesize
optimization opcodes in the register allocator.

But unless we change the scheduler to not split v_mov_b32_dpp and
v_dot, we have no other choice.

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40510>
2026-03-24 09:05:40 +00:00
Emre Cecanpunar
c60e5df798 aco: drop optimizer peephole TODO comment
The remaining items are either handled elsewhere or unlikely to be
implemented in the optimizer.

Signed-off-by: Emre Cecanpunar <emreleno@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40497>
2026-03-23 11:03:59 +00:00
Georg Lehmann
559a35dcb3 aco: skip fract for sin/cos on gfx6-8 if the src is already in range
Foz-DB Polaris10:
Totals from 1301 (1.86% of 69950) affected shaders:
Instrs: 1447217 -> 1445610 (-0.11%); split: -0.11%, +0.00%
CodeSize: 7775988 -> 7769588 (-0.08%); split: -0.08%, +0.00%
SGPRs: 101712 -> 101776 (+0.06%)
SpillSGPRs: 931 -> 927 (-0.43%)
Latency: 16119433 -> 16115293 (-0.03%); split: -0.03%, +0.01%
InvThroughput: 9605952 -> 9577042 (-0.30%); split: -0.31%, +0.01%
VClause: 24591 -> 24593 (+0.01%); split: -0.01%, +0.02%
SClause: 29656 -> 29655 (-0.00%)
Copies: 133968 -> 134001 (+0.02%); split: -0.01%, +0.03%
VALU: 1157855 -> 1156235 (-0.14%)
SALU: 124626 -> 124639 (+0.01%); split: -0.00%, +0.01%

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40545>
2026-03-23 09:27:32 +00:00
Marek Olšák
2283244975 nir: change export_amd intrinsics to use target instead of base
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40415>
2026-03-23 06:10:49 +00:00
Marek Olšák
b75a3112fd nir: change export_amd intrinsics to use enabled_channels instead of write_mask
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40415>
2026-03-23 06:10:49 +00:00
Daniel Schürmann
4b238690cb aco/tests: add and lower loop continue constructs in all tests which use continues
We are going to disallow continue statements without
loop continue constructs.

Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39942>
2026-03-21 07:42:55 +00:00
Rhys Perry
e2ebcba11b aco/tests: fix assembler/isel tests with LLVM 23
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Backport-to: 26.0
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40513>
2026-03-20 10:24:06 +00:00
Rhys Perry
0826685f1b aco/tests: fix assembler tests with LLVM 22
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Backport-to: 26.0
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40513>
2026-03-20 10:24:06 +00:00
Samuel Pitoiset
639207701d aco,radv,radeonsi: remove debug report support in ACO
This doesn't seem very useful since ACO will abort and print the
error messages to stderr.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40379>
2026-03-16 11:55:45 +00:00
Georg Lehmann
d7348ea501 aco/ra: don't tie definition when the operand is in a preserved reg
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40225>
2026-03-10 14:21:56 +00:00
Georg Lehmann
444eb3dce5 aco/ra: try to allocate registers for dot2 to allow VOPD
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40225>
2026-03-10 14:21:56 +00:00
Georg Lehmann
788aafba2a aco/sched_vopd: create dot2acc from VOP3P dot2
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40225>
2026-03-10 14:21:56 +00:00
Georg Lehmann
47599b2c38 aco/opt_postRA: remove try_convert_fma_to_vop2
This is now done directly in the VOPD scheduler.

Foz-DB GFX1201:
Totals from 600 (0.52% of 114655) affected shaders:
no stats changed

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40225>
2026-03-10 14:21:56 +00:00
Georg Lehmann
6cef434478 aco/sched_vopd: convert fma with inline constants to fmamk/fmaak
This optimization was previously done in the post-RA optimizer,
but it is more fitting for the vopd scheduler.

Doing it here also has the benefit that we don't unnecessarily use
the constant bus when VOPD can't be used.

No Foz-DB changes on GFX12 until the next commit.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40225>
2026-03-10 14:21:56 +00:00
Georg Lehmann
1ae9931145 aco/scheld_vopd: make VOPDInfo more flexible by adding a swizzle
No Foz-DB changes on GFX1201.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40225>
2026-03-10 14:21:55 +00:00
Georg Lehmann
08cac48170 aco/isel: skip min/max for SALU fsat if possible
Foz-DB Navi48:
Totals from 789 (0.95% of 82636) affected shaders:
Instrs: 4144156 -> 4141345 (-0.07%); split: -0.07%, +0.00%
CodeSize: 23345212 -> 23333960 (-0.05%); split: -0.05%, +0.00%
Latency: 22988205 -> 22986666 (-0.01%); split: -0.01%, +0.00%
InvThroughput: 4378321 -> 4377874 (-0.01%); split: -0.01%, +0.00%
Copies: 302311 -> 302313 (+0.00%); split: -0.00%, +0.00%
SALU: 647622 -> 645901 (-0.27%); split: -0.27%, +0.00%

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39987>
2026-03-07 05:01:44 +00:00
Rhys Perry
82420ebc2c aco: fix PS epilog dual-source blending with only one color output
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40005>
2026-03-05 09:38:23 +00:00
Marek Olšák
fae7aef5ca ac: tidy up ac_hw_cache_flags
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40022>
2026-03-04 21:14:56 +00:00
Rhys Perry
5c3b5688a1 amd: rename ac_cu_info to ac_compiler_info
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40042>
2026-03-03 08:50:12 +00:00
Rhys Perry
8801ca188d ac/nir: don't pass radeon_info to ac_nir_set_options
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40042>
2026-03-03 08:50:10 +00:00
Rhys Perry
17b18496f6 aco: perform dce for blocks skipped for process_block()
We might need to DCE users of dead instructions removed by
process_block().

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: 9e8ba10447 ("aco/vn: remove dead instructions early")
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40091>
2026-03-02 13:38:16 +00:00
Marek Olšák
f22f117d1a amd: add meson variable idep_amd_generated_headers for all generated headers
group all generated header under the same variable

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40084>
2026-02-28 05:23:59 +00:00
Rhys Perry
43603f9b1d amd: add ac_cu_info::local_invocation_ids_packed
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39992>
2026-02-26 15:49:15 +00:00
Rhys Perry
29f8237d30 amd: move various flags to ac_cu_info
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39992>
2026-02-26 15:49:14 +00:00
Georg Lehmann
8f4de30d05 aco/insert_fp_mode: don't skip setting round for fract
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
fract(-FLT_MIN) is < 1.0 with rtz but 1.0 with rtne.

Fixes: 7212a75c5e ("aco/insert_fp_mode: exclude some instructions that will never round")

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40078>
2026-02-26 00:20:02 +00:00
Rhys Perry
613b4fe407 aco: resolve hazards before calls
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Backport-to: 26.0
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39825>
2026-02-24 13:20:55 +00:00
Rhys Perry
dfda890ae8 aco: reset all vgpr_used_by_vmem_ in resolve_all_gfx11
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Backport-to: 26.0
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39825>
2026-02-24 13:20:55 +00:00
Rhys Perry
72923ad2f0 aco: fix VALUReadSGPRHazard with s_call_b64/s_swappc_b64
This probably doesn't do anything because sgpr_read_by_valu are all set
already for raytracing shaders.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39825>
2026-02-24 13:20:54 +00:00
Georg Lehmann
dd067088ef aco: allow dpp for fp8/bf8 dot4
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40003>
2026-02-24 08:55:53 +00:00
Georg Lehmann
a033cd95a4 aco: allow modifiers for fp16 dot
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40003>
2026-02-24 08:55:53 +00:00
Georg Lehmann
3238e64d3c aco/ra: create v_dot2c_f32_f16
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40003>
2026-02-24 08:55:53 +00:00
Georg Lehmann
237b8ca205 aco: mixed float dot product opcodes
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40003>
2026-02-24 08:55:52 +00:00
Rhys Perry
af27fb23f3 aco/ra: don't modify parallelcopies if get_reg_for_affinity fails
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Fixes baldurs_gate_3/60c8b7ff623fbb18 with vega10.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: 310f588f92 ("aco/ra: move variables from affinity register to avoid waitcnt")
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39986>
2026-02-20 08:40:55 +00:00
Rhys Perry
75722da909 aco: fix gfx6-8 store_scratch() with function calls
Might happen with radv_emulate_rt=true.

Fixes the_great_circle/a6079328b8df7712 with polaris10.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: e006f68b11 ("aco/isel: Don't add scratch offset as gfx8- soffset if no offsets exist")
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39986>
2026-02-20 08:40:55 +00:00
Rhys Perry
f81aaee7f1 aco/ra: create vectors for affinities of split definitions
For example:
a = ...
b = ...
if {
   c, d = split
}
phi(a, c)
phi(b, d)

This patch will allocate 'a' and 'b' as a vector.

fossil-db (navi31):
Totals from 2556 (3.20% of 79825) affected shaders:
MaxWaves: 59957 -> 59955 (-0.00%)
Instrs: 9170941 -> 9154954 (-0.17%); split: -0.19%, +0.02%
CodeSize: 48245956 -> 48182620 (-0.13%); split: -0.15%, +0.02%
VGPRs: 189372 -> 189900 (+0.28%); split: -0.04%, +0.32%
Latency: 85469322 -> 85262360 (-0.24%); split: -0.32%, +0.08%
InvThroughput: 14515911 -> 14486970 (-0.20%); split: -0.27%, +0.07%
VClause: 197980 -> 197959 (-0.01%); split: -0.02%, +0.01%
Copies: 787838 -> 774288 (-1.72%); split: -1.91%, +0.19%
Branches: 271810 -> 271799 (-0.00%); split: -0.01%, +0.01%
VALU: 5331813 -> 5318566 (-0.25%); split: -0.28%, +0.03%
SALU: 1133559 -> 1133054 (-0.04%); split: -0.05%, +0.01%
VOPD: 2435 -> 2418 (-0.70%); split: +0.12%, -0.82%

fossil-db (navi21):
Totals from 37513 (46.99% of 79825) affected shaders:
Instrs: 26734825 -> 26681225 (-0.20%); split: -0.23%, +0.03%
CodeSize: 141353284 -> 141144360 (-0.15%); split: -0.17%, +0.02%
VGPRs: 1556760 -> 1556384 (-0.02%); split: -0.21%, +0.18%
Latency: 146201548 -> 146156473 (-0.03%); split: -0.20%, +0.17%
InvThroughput: 33921803 -> 33867398 (-0.16%); split: -0.23%, +0.07%
VClause: 502263 -> 502209 (-0.01%); split: -0.27%, +0.26%
SClause: 593142 -> 593155 (+0.00%); split: -0.00%, +0.00%
Copies: 2600995 -> 2551257 (-1.91%); split: -2.16%, +0.25%
Branches: 857910 -> 857787 (-0.01%); split: -0.03%, +0.02%
VALU: 15674532 -> 15625013 (-0.32%); split: -0.35%, +0.04%
SALU: 4635548 -> 4634680 (-0.02%); split: -0.04%, +0.02%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38262>
2026-02-16 19:39:43 +00:00
Rhys Perry
86f0195f5c aco/ra: prefer phi operands which don't create waitcnt
fossil-db (navi31):
Totals from 89 (0.11% of 79825) affected shaders:
Instrs: 343443 -> 343384 (-0.02%); split: -0.10%, +0.09%
CodeSize: 1792948 -> 1792668 (-0.02%); split: -0.10%, +0.08%
Latency: 2656294 -> 2656490 (+0.01%); split: -0.02%, +0.02%
InvThroughput: 517696 -> 517691 (-0.00%); split: -0.01%, +0.01%
SClause: 9213 -> 9215 (+0.02%); split: -0.01%, +0.03%
Copies: 39138 -> 39089 (-0.13%); split: -0.84%, +0.71%
Branches: 10863 -> 10872 (+0.08%); split: -0.05%, +0.13%
SALU: 49185 -> 49136 (-0.10%); split: -0.67%, +0.57%

fossil-db (navi21):
Totals from 34490 (43.21% of 79825) affected shaders:
Instrs: 23005853 -> 22956529 (-0.21%); split: -0.25%, +0.04%
CodeSize: 120532004 -> 120341412 (-0.16%); split: -0.19%, +0.03%
VGPRs: 1396928 -> 1397520 (+0.04%); split: -0.07%, +0.11%
Latency: 108740068 -> 108499644 (-0.22%); split: -0.53%, +0.30%
InvThroughput: 25286526 -> 25358695 (+0.29%); split: -0.11%, +0.39%
VClause: 421179 -> 421132 (-0.01%); split: -0.29%, +0.27%
SClause: 446414 -> 446423 (+0.00%); split: -0.00%, +0.00%
Copies: 2242236 -> 2243168 (+0.04%); split: -0.42%, +0.46%
Branches: 724556 -> 724903 (+0.05%); split: -0.02%, +0.07%
VALU: 13321078 -> 13321940 (+0.01%); split: -0.07%, +0.08%
SALU: 4069929 -> 4070580 (+0.02%); split: -0.02%, +0.03%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38262>
2026-02-16 19:39:43 +00:00
Rhys Perry
310f588f92 aco/ra: move variables from affinity register to avoid waitcnt
If we don't use this affinity register, we're likely to end up moving the
temporary later. If it's a memory instruction destination, that's probably
more expensive than just copying the blocking variables.

fossil-db (navi31):
Totals from 504 (0.63% of 79825) affected shaders:
Instrs: 4108284 -> 4109026 (+0.02%); split: -0.01%, +0.03%
CodeSize: 21226764 -> 21229764 (+0.01%); split: -0.01%, +0.02%
Latency: 26931635 -> 26806989 (-0.46%); split: -0.47%, +0.00%
InvThroughput: 8443520 -> 8439235 (-0.05%); split: -0.06%, +0.01%
VClause: 99209 -> 99314 (+0.11%); split: -0.00%, +0.11%
SClause: 85089 -> 85085 (-0.00%)
Copies: 340323 -> 340993 (+0.20%); split: -0.06%, +0.26%
Branches: 117225 -> 117209 (-0.01%); split: -0.02%, +0.00%
VALU: 2421859 -> 2422529 (+0.03%); split: -0.01%, +0.04%
SALU: 503465 -> 503470 (+0.00%); split: -0.00%, +0.00%

fossil-db (navi21):
Totals from 582 (0.73% of 79825) affected shaders:
Instrs: 3714908 -> 3714990 (+0.00%); split: -0.02%, +0.02%
CodeSize: 19977880 -> 19973076 (-0.02%); split: -0.04%, +0.01%
VGPRs: 40480 -> 40496 (+0.04%)
Latency: 26028895 -> 25772711 (-0.98%); split: -0.99%, +0.00%
InvThroughput: 9827389 -> 9818194 (-0.09%); split: -0.10%, +0.01%
VClause: 103702 -> 103815 (+0.11%); split: -0.02%, +0.13%
SClause: 90861 -> 90857 (-0.00%)
Copies: 335276 -> 335992 (+0.21%); split: -0.09%, +0.30%
Branches: 123912 -> 123897 (-0.01%); split: -0.02%, +0.00%
VALU: 2466032 -> 2466748 (+0.03%); split: -0.01%, +0.04%
SALU: 533658 -> 533667 (+0.00%); split: -0.00%, +0.00%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38262>
2026-02-16 19:39:43 +00:00
Rhys Perry
681ec4cba7 aco/ra: track cost of moving variables
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38262>
2026-02-16 19:39:43 +00:00
Rhys Perry
69bc4efa37 aco/sched_ilp: improve scheduling with VMEM/DS->VALU WaW
This improves scheduling with one side of a divergent branch writing to a
VGPR using VMEM/DS, and the other writing using VALU. At the merge block,
it will properly consider that the VGPR was written by a VMEM/DS.

fossil-db (navi31):
Totals from 1224 (1.53% of 79825) affected shaders:
Instrs: 5264815 -> 5267604 (+0.05%); split: -0.00%, +0.06%
CodeSize: 27406404 -> 27422132 (+0.06%); split: -0.00%, +0.06%
Latency: 48325204 -> 48293975 (-0.06%); split: -0.09%, +0.03%
InvThroughput: 8923880 -> 8919191 (-0.05%); split: -0.07%, +0.02%

fossil-db (navi21):
Totals from 1267 (1.59% of 79825) affected shaders:
Instrs: 4628583 -> 4629190 (+0.01%); split: -0.00%, +0.01%
CodeSize: 24974672 -> 24977188 (+0.01%); split: -0.00%, +0.01%
Latency: 45080476 -> 44998120 (-0.18%); split: -0.20%, +0.02%
InvThroughput: 12288202 -> 12269634 (-0.15%); split: -0.16%, +0.01%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38262>
2026-02-16 19:39:43 +00:00
Rhys Perry
88b6b6db17 aco: only consider cost of memory loads at waitcnt
We don't run this code before waitcnt insertion, so this isn't necessary.

This change improves accuracy in these two situations, because the waitcnt
insertion pass is more aware of divergent control flow:

v0 = valu
if (divergent) {
    v0 = vmem
} else {
    use(v0)
}

v0 = vmem
if (divergent) {
    wait vmcnt(0)
} else {
    wait vmcnt(0)
}
use(v0)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38262>
2026-02-16 19:39:43 +00:00
Rhys Perry
6963c8dd80 radv,aco/gfx11: preserve s2 when NGG_WAVE_ID_EN=1
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
According to the ISA doc, this is needed for hang recovery.

This works by just avoiding putting temporaries in s0-3 unless they're
precolored there.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> (radv)
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39720>
2026-02-16 14:33:58 +00:00
Rhys Perry
b60bff0429 aco: consider 64-bit transcendental normal valu for s_delay_alu
https://github.com/llvm/llvm-project/pull/180940

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39851>
2026-02-13 17:03:34 +00:00
Marek Olšák
d1e6a5c1c8 ac: lower load_num_workgroups in NIR
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39638>
2026-02-13 15:33:19 +00:00
Marek Olšák
a9e47751d2 ac: lower load_subgroup_id for ACO in NIR
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39638>
2026-02-13 15:33:19 +00:00
Marek Olšák
0a9bdcac79 ac: lower load_workgroup_ids for ACO in NIR
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39638>
2026-02-13 15:33:19 +00:00
Daniel Schürmann
97f095f6e0 aco/lower_branches: Add try_rotate_latch_block() optimization
This optimization looks for unconditional back-edges and aims
to rotate the loop in a way that the final block is emitted
before the loop header, essentially turning

BB1:
  if ()
    goto BB3;
BB2:
  <loop body>
  goto BB1;
BB3:
  ...

into

  goto BB1;
BB2:
  <loop body>
BB1:
  if(!cond)
    goto BB2;
BB3:
  ...

Totals from 4969 (5.89% of 84383) affected shaders: (Navi48)

Instrs: 15253038 -> 15254019 (+0.01%); split: -0.00%, +0.01%
CodeSize: 81225300 -> 81227696 (+0.00%); split: -0.02%, +0.02%
Latency: 320796283 -> 320693480 (-0.03%); split: -0.03%, +0.00%
InvThroughput: 51395922 -> 51376156 (-0.04%); split: -0.04%, +0.00%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39519>
2026-02-13 14:49:44 +00:00
Daniel Schürmann
ade5e300ab aco/insert_delay_alu: handle loop latch block before loop body
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39519>
2026-02-13 14:49:44 +00:00
Daniel Schürmann
102aca9843 aco/assembler: emit block_kind_loop_latch before the loop header
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39519>
2026-02-13 14:49:44 +00:00