Commit graph

4331 commits

Author SHA1 Message Date
Rhys Perry
463e3643f2 nir: add and use block predecessor helpers
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40242>
2026-04-08 15:06:32 +00:00
Georg Lehmann
d1ed4e1774 aco/optimizer: do not try to create 3 byte constant operands
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Operand::get_const will assert.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/work_items/15239
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40828>
2026-04-08 09:17:26 +00:00
Georg Lehmann
792ce7ddf6 aco/isel: optimize 16/64bit non constant valu bit test
By using the constant path we can combine the v_and and the v_cmp.

Foz-DB GFX1201:
Totals from 2 (0.00% of 205032) affected shaders:
Instrs: 2833 -> 2831 (-0.07%)
Latency: 27385 -> 27367 (-0.07%)
InvThroughput: 1712 -> 1710 (-0.12%)
VALU: 1301 -> 1299 (-0.15%)

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40705>
2026-04-08 08:44:20 +00:00
Natalie Vock
fded5e321d aco: Nuke ACO-side prolog selection
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40008>
2026-04-07 11:28:05 +00:00
Natalie Vock
b53dc3f052 aco/lower_to_hw_instr: Run p_init_scratch if the program has a call
Callees may use scratch even if the caller doesn't.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40008>
2026-04-07 11:28:05 +00:00
Natalie Vock
378c9536de aco/isel: Fix stack_ptr synthesis
info.stack_ptr.is_reg is always true. We have a stack pointer to use
if and only if the program is a callee.

Also, apply_scratch_offset needs to be true in a few more places.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40008>
2026-04-07 11:28:05 +00:00
Natalie Vock
31e08322d7 aco/spill_preserved: Only compute preserved registers if in a callee
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40008>
2026-04-07 11:28:05 +00:00
Georg Lehmann
5453419086 aco/isel: use s_bitcmp1 for 1bit ubfe
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Avoid the s_pack at the cost of having to use scc.

Foz-DB GFX1201:
Totals from 1514 (0.74% of 205032) affected shaders:
Instrs: 3443431 -> 3434096 (-0.27%); split: -0.27%, +0.00%
CodeSize: 19062100 -> 19024320 (-0.20%); split: -0.20%, +0.00%
Latency: 22343329 -> 22342802 (-0.00%); split: -0.01%, +0.01%
InvThroughput: 4471707 -> 4471632 (-0.00%); split: -0.00%, +0.00%
Copies: 280191 -> 279645 (-0.19%); split: -0.21%, +0.01%
PreSGPRs: 71333 -> 71327 (-0.01%)
VALU: 1598064 -> 1598058 (-0.00%); split: -0.00%, +0.00%
SALU: 691458 -> 686437 (-0.73%)

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40707>
2026-03-31 10:42:33 +00:00
Karol Herbst
5bb3c9f69c nir: rename fsin_amd and fcos_amd to a more generic name
Nvidia implements both the same way as AMD does, so it makes sense to
allow for code sharing here.

Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40541>
2026-03-31 01:47:29 +02:00
Georg Lehmann
ae2968c4ec aco: allow spilling to LDS in RT shaders without stack pointer
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
No Foz-DB changes because most RT shaders use function calls now.

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36367>
2026-03-27 13:08:44 +00:00
Georg Lehmann
133ef9f94b aco: spill VGPRs to LDS if it doesn't further limit occupancy
Only use LDS for VGPR spilling if we can use addtid access, to avoid having a VGPR addr.
Limit to single wave workgroups, to avoid needing the wave_id for the offset.
If we have a scratch stack pointer, don't use LDS at all.

Limit LDS spilling to not reduce occupancy further.
Note that in theory, this can still limit occupancy of other shaders running
on the CU at the same time, but that's unlikely and impossible to know at this point.

Removes all scratch usage in emulated FSR4 and parallel_rdp.
Besides that, only a single GoW shader is affected.

Foz-DB Navi31:
Totals from 9 (0.01% of 114641) affected shaders:
Instrs: 68863 -> 68830 (-0.05%); split: -0.07%, +0.02%
CodeSize: 416108 -> 416000 (-0.03%); split: -0.05%, +0.02%
LDS: 2048 -> 45056 (+2100.00%)
Scratch: 261888 -> 220672 (-15.74%)
Latency: 727951 -> 657155 (-9.73%); split: -9.73%, +0.00%
InvThroughput: 418644 -> 383269 (-8.45%)
VClause: 1506 -> 1200 (-20.32%)
Copies: 10651 -> 10624 (-0.25%)
VALU: 48700 -> 48684 (-0.03%)
SALU: 6200 -> 6199 (-0.02%); split: -0.05%, +0.03%
VMEM: 4139 -> 3589 (-13.29%)
VOPD: 580 -> 574 (-1.03%)

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36367>
2026-03-27 13:08:44 +00:00
Georg Lehmann
17a9ee7152 aco/optimizer: apply dpp to v_dot before RA for gfx10.3
This is a bit unusual, as we otherwise only use the VOP2 codesize
optimization opcodes in the register allocator.

But unless we change the scheduler to not split v_mov_b32_dpp and
v_dot, we have no other choice.

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40510>
2026-03-24 09:05:40 +00:00
Emre Cecanpunar
c60e5df798 aco: drop optimizer peephole TODO comment
The remaining items are either handled elsewhere or unlikely to be
implemented in the optimizer.

Signed-off-by: Emre Cecanpunar <emreleno@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40497>
2026-03-23 11:03:59 +00:00
Georg Lehmann
559a35dcb3 aco: skip fract for sin/cos on gfx6-8 if the src is already in range
Foz-DB Polaris10:
Totals from 1301 (1.86% of 69950) affected shaders:
Instrs: 1447217 -> 1445610 (-0.11%); split: -0.11%, +0.00%
CodeSize: 7775988 -> 7769588 (-0.08%); split: -0.08%, +0.00%
SGPRs: 101712 -> 101776 (+0.06%)
SpillSGPRs: 931 -> 927 (-0.43%)
Latency: 16119433 -> 16115293 (-0.03%); split: -0.03%, +0.01%
InvThroughput: 9605952 -> 9577042 (-0.30%); split: -0.31%, +0.01%
VClause: 24591 -> 24593 (+0.01%); split: -0.01%, +0.02%
SClause: 29656 -> 29655 (-0.00%)
Copies: 133968 -> 134001 (+0.02%); split: -0.01%, +0.03%
VALU: 1157855 -> 1156235 (-0.14%)
SALU: 124626 -> 124639 (+0.01%); split: -0.00%, +0.01%

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40545>
2026-03-23 09:27:32 +00:00
Marek Olšák
2283244975 nir: change export_amd intrinsics to use target instead of base
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40415>
2026-03-23 06:10:49 +00:00
Marek Olšák
b75a3112fd nir: change export_amd intrinsics to use enabled_channels instead of write_mask
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40415>
2026-03-23 06:10:49 +00:00
Daniel Schürmann
4b238690cb aco/tests: add and lower loop continue constructs in all tests which use continues
We are going to disallow continue statements without
loop continue constructs.

Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39942>
2026-03-21 07:42:55 +00:00
Rhys Perry
e2ebcba11b aco/tests: fix assembler/isel tests with LLVM 23
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Backport-to: 26.0
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40513>
2026-03-20 10:24:06 +00:00
Rhys Perry
0826685f1b aco/tests: fix assembler tests with LLVM 22
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Backport-to: 26.0
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40513>
2026-03-20 10:24:06 +00:00
Samuel Pitoiset
639207701d aco,radv,radeonsi: remove debug report support in ACO
This doesn't seem very useful since ACO will abort and print the
error messages to stderr.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40379>
2026-03-16 11:55:45 +00:00
Georg Lehmann
d7348ea501 aco/ra: don't tie definition when the operand is in a preserved reg
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40225>
2026-03-10 14:21:56 +00:00
Georg Lehmann
444eb3dce5 aco/ra: try to allocate registers for dot2 to allow VOPD
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40225>
2026-03-10 14:21:56 +00:00
Georg Lehmann
788aafba2a aco/sched_vopd: create dot2acc from VOP3P dot2
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40225>
2026-03-10 14:21:56 +00:00
Georg Lehmann
47599b2c38 aco/opt_postRA: remove try_convert_fma_to_vop2
This is now done directly in the VOPD scheduler.

Foz-DB GFX1201:
Totals from 600 (0.52% of 114655) affected shaders:
no stats changed

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40225>
2026-03-10 14:21:56 +00:00
Georg Lehmann
6cef434478 aco/sched_vopd: convert fma with inline constants to fmamk/fmaak
This optimization was previously done in the post-RA optimizer,
but it is more fitting for the vopd scheduler.

Doing it here also has the benefit that we don't unnecessarily use
the constant bus when VOPD can't be used.

No Foz-DB changes on GFX12 until the next commit.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40225>
2026-03-10 14:21:56 +00:00
Georg Lehmann
1ae9931145 aco/scheld_vopd: make VOPDInfo more flexible by adding a swizzle
No Foz-DB changes on GFX1201.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40225>
2026-03-10 14:21:55 +00:00
Georg Lehmann
08cac48170 aco/isel: skip min/max for SALU fsat if possible
Foz-DB Navi48:
Totals from 789 (0.95% of 82636) affected shaders:
Instrs: 4144156 -> 4141345 (-0.07%); split: -0.07%, +0.00%
CodeSize: 23345212 -> 23333960 (-0.05%); split: -0.05%, +0.00%
Latency: 22988205 -> 22986666 (-0.01%); split: -0.01%, +0.00%
InvThroughput: 4378321 -> 4377874 (-0.01%); split: -0.01%, +0.00%
Copies: 302311 -> 302313 (+0.00%); split: -0.00%, +0.00%
SALU: 647622 -> 645901 (-0.27%); split: -0.27%, +0.00%

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39987>
2026-03-07 05:01:44 +00:00
Rhys Perry
82420ebc2c aco: fix PS epilog dual-source blending with only one color output
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40005>
2026-03-05 09:38:23 +00:00
Marek Olšák
fae7aef5ca ac: tidy up ac_hw_cache_flags
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40022>
2026-03-04 21:14:56 +00:00
Rhys Perry
5c3b5688a1 amd: rename ac_cu_info to ac_compiler_info
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40042>
2026-03-03 08:50:12 +00:00
Rhys Perry
8801ca188d ac/nir: don't pass radeon_info to ac_nir_set_options
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40042>
2026-03-03 08:50:10 +00:00
Rhys Perry
17b18496f6 aco: perform dce for blocks skipped for process_block()
We might need to DCE users of dead instructions removed by
process_block().

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: 9e8ba10447 ("aco/vn: remove dead instructions early")
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40091>
2026-03-02 13:38:16 +00:00
Marek Olšák
f22f117d1a amd: add meson variable idep_amd_generated_headers for all generated headers
group all generated header under the same variable

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40084>
2026-02-28 05:23:59 +00:00
Rhys Perry
43603f9b1d amd: add ac_cu_info::local_invocation_ids_packed
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39992>
2026-02-26 15:49:15 +00:00
Rhys Perry
29f8237d30 amd: move various flags to ac_cu_info
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39992>
2026-02-26 15:49:14 +00:00
Georg Lehmann
8f4de30d05 aco/insert_fp_mode: don't skip setting round for fract
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
fract(-FLT_MIN) is < 1.0 with rtz but 1.0 with rtne.

Fixes: 7212a75c5e ("aco/insert_fp_mode: exclude some instructions that will never round")

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40078>
2026-02-26 00:20:02 +00:00
Rhys Perry
613b4fe407 aco: resolve hazards before calls
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Backport-to: 26.0
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39825>
2026-02-24 13:20:55 +00:00
Rhys Perry
dfda890ae8 aco: reset all vgpr_used_by_vmem_ in resolve_all_gfx11
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Backport-to: 26.0
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39825>
2026-02-24 13:20:55 +00:00
Rhys Perry
72923ad2f0 aco: fix VALUReadSGPRHazard with s_call_b64/s_swappc_b64
This probably doesn't do anything because sgpr_read_by_valu are all set
already for raytracing shaders.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39825>
2026-02-24 13:20:54 +00:00
Georg Lehmann
dd067088ef aco: allow dpp for fp8/bf8 dot4
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40003>
2026-02-24 08:55:53 +00:00
Georg Lehmann
a033cd95a4 aco: allow modifiers for fp16 dot
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40003>
2026-02-24 08:55:53 +00:00
Georg Lehmann
3238e64d3c aco/ra: create v_dot2c_f32_f16
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40003>
2026-02-24 08:55:53 +00:00
Georg Lehmann
237b8ca205 aco: mixed float dot product opcodes
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40003>
2026-02-24 08:55:52 +00:00
Rhys Perry
af27fb23f3 aco/ra: don't modify parallelcopies if get_reg_for_affinity fails
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Fixes baldurs_gate_3/60c8b7ff623fbb18 with vega10.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: 310f588f92 ("aco/ra: move variables from affinity register to avoid waitcnt")
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39986>
2026-02-20 08:40:55 +00:00
Rhys Perry
75722da909 aco: fix gfx6-8 store_scratch() with function calls
Might happen with radv_emulate_rt=true.

Fixes the_great_circle/a6079328b8df7712 with polaris10.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: e006f68b11 ("aco/isel: Don't add scratch offset as gfx8- soffset if no offsets exist")
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39986>
2026-02-20 08:40:55 +00:00
Rhys Perry
f81aaee7f1 aco/ra: create vectors for affinities of split definitions
For example:
a = ...
b = ...
if {
   c, d = split
}
phi(a, c)
phi(b, d)

This patch will allocate 'a' and 'b' as a vector.

fossil-db (navi31):
Totals from 2556 (3.20% of 79825) affected shaders:
MaxWaves: 59957 -> 59955 (-0.00%)
Instrs: 9170941 -> 9154954 (-0.17%); split: -0.19%, +0.02%
CodeSize: 48245956 -> 48182620 (-0.13%); split: -0.15%, +0.02%
VGPRs: 189372 -> 189900 (+0.28%); split: -0.04%, +0.32%
Latency: 85469322 -> 85262360 (-0.24%); split: -0.32%, +0.08%
InvThroughput: 14515911 -> 14486970 (-0.20%); split: -0.27%, +0.07%
VClause: 197980 -> 197959 (-0.01%); split: -0.02%, +0.01%
Copies: 787838 -> 774288 (-1.72%); split: -1.91%, +0.19%
Branches: 271810 -> 271799 (-0.00%); split: -0.01%, +0.01%
VALU: 5331813 -> 5318566 (-0.25%); split: -0.28%, +0.03%
SALU: 1133559 -> 1133054 (-0.04%); split: -0.05%, +0.01%
VOPD: 2435 -> 2418 (-0.70%); split: +0.12%, -0.82%

fossil-db (navi21):
Totals from 37513 (46.99% of 79825) affected shaders:
Instrs: 26734825 -> 26681225 (-0.20%); split: -0.23%, +0.03%
CodeSize: 141353284 -> 141144360 (-0.15%); split: -0.17%, +0.02%
VGPRs: 1556760 -> 1556384 (-0.02%); split: -0.21%, +0.18%
Latency: 146201548 -> 146156473 (-0.03%); split: -0.20%, +0.17%
InvThroughput: 33921803 -> 33867398 (-0.16%); split: -0.23%, +0.07%
VClause: 502263 -> 502209 (-0.01%); split: -0.27%, +0.26%
SClause: 593142 -> 593155 (+0.00%); split: -0.00%, +0.00%
Copies: 2600995 -> 2551257 (-1.91%); split: -2.16%, +0.25%
Branches: 857910 -> 857787 (-0.01%); split: -0.03%, +0.02%
VALU: 15674532 -> 15625013 (-0.32%); split: -0.35%, +0.04%
SALU: 4635548 -> 4634680 (-0.02%); split: -0.04%, +0.02%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38262>
2026-02-16 19:39:43 +00:00
Rhys Perry
86f0195f5c aco/ra: prefer phi operands which don't create waitcnt
fossil-db (navi31):
Totals from 89 (0.11% of 79825) affected shaders:
Instrs: 343443 -> 343384 (-0.02%); split: -0.10%, +0.09%
CodeSize: 1792948 -> 1792668 (-0.02%); split: -0.10%, +0.08%
Latency: 2656294 -> 2656490 (+0.01%); split: -0.02%, +0.02%
InvThroughput: 517696 -> 517691 (-0.00%); split: -0.01%, +0.01%
SClause: 9213 -> 9215 (+0.02%); split: -0.01%, +0.03%
Copies: 39138 -> 39089 (-0.13%); split: -0.84%, +0.71%
Branches: 10863 -> 10872 (+0.08%); split: -0.05%, +0.13%
SALU: 49185 -> 49136 (-0.10%); split: -0.67%, +0.57%

fossil-db (navi21):
Totals from 34490 (43.21% of 79825) affected shaders:
Instrs: 23005853 -> 22956529 (-0.21%); split: -0.25%, +0.04%
CodeSize: 120532004 -> 120341412 (-0.16%); split: -0.19%, +0.03%
VGPRs: 1396928 -> 1397520 (+0.04%); split: -0.07%, +0.11%
Latency: 108740068 -> 108499644 (-0.22%); split: -0.53%, +0.30%
InvThroughput: 25286526 -> 25358695 (+0.29%); split: -0.11%, +0.39%
VClause: 421179 -> 421132 (-0.01%); split: -0.29%, +0.27%
SClause: 446414 -> 446423 (+0.00%); split: -0.00%, +0.00%
Copies: 2242236 -> 2243168 (+0.04%); split: -0.42%, +0.46%
Branches: 724556 -> 724903 (+0.05%); split: -0.02%, +0.07%
VALU: 13321078 -> 13321940 (+0.01%); split: -0.07%, +0.08%
SALU: 4069929 -> 4070580 (+0.02%); split: -0.02%, +0.03%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38262>
2026-02-16 19:39:43 +00:00
Rhys Perry
310f588f92 aco/ra: move variables from affinity register to avoid waitcnt
If we don't use this affinity register, we're likely to end up moving the
temporary later. If it's a memory instruction destination, that's probably
more expensive than just copying the blocking variables.

fossil-db (navi31):
Totals from 504 (0.63% of 79825) affected shaders:
Instrs: 4108284 -> 4109026 (+0.02%); split: -0.01%, +0.03%
CodeSize: 21226764 -> 21229764 (+0.01%); split: -0.01%, +0.02%
Latency: 26931635 -> 26806989 (-0.46%); split: -0.47%, +0.00%
InvThroughput: 8443520 -> 8439235 (-0.05%); split: -0.06%, +0.01%
VClause: 99209 -> 99314 (+0.11%); split: -0.00%, +0.11%
SClause: 85089 -> 85085 (-0.00%)
Copies: 340323 -> 340993 (+0.20%); split: -0.06%, +0.26%
Branches: 117225 -> 117209 (-0.01%); split: -0.02%, +0.00%
VALU: 2421859 -> 2422529 (+0.03%); split: -0.01%, +0.04%
SALU: 503465 -> 503470 (+0.00%); split: -0.00%, +0.00%

fossil-db (navi21):
Totals from 582 (0.73% of 79825) affected shaders:
Instrs: 3714908 -> 3714990 (+0.00%); split: -0.02%, +0.02%
CodeSize: 19977880 -> 19973076 (-0.02%); split: -0.04%, +0.01%
VGPRs: 40480 -> 40496 (+0.04%)
Latency: 26028895 -> 25772711 (-0.98%); split: -0.99%, +0.00%
InvThroughput: 9827389 -> 9818194 (-0.09%); split: -0.10%, +0.01%
VClause: 103702 -> 103815 (+0.11%); split: -0.02%, +0.13%
SClause: 90861 -> 90857 (-0.00%)
Copies: 335276 -> 335992 (+0.21%); split: -0.09%, +0.30%
Branches: 123912 -> 123897 (-0.01%); split: -0.02%, +0.00%
VALU: 2466032 -> 2466748 (+0.03%); split: -0.01%, +0.04%
SALU: 533658 -> 533667 (+0.00%); split: -0.00%, +0.00%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38262>
2026-02-16 19:39:43 +00:00
Rhys Perry
681ec4cba7 aco/ra: track cost of moving variables
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38262>
2026-02-16 19:39:43 +00:00
Rhys Perry
69bc4efa37 aco/sched_ilp: improve scheduling with VMEM/DS->VALU WaW
This improves scheduling with one side of a divergent branch writing to a
VGPR using VMEM/DS, and the other writing using VALU. At the merge block,
it will properly consider that the VGPR was written by a VMEM/DS.

fossil-db (navi31):
Totals from 1224 (1.53% of 79825) affected shaders:
Instrs: 5264815 -> 5267604 (+0.05%); split: -0.00%, +0.06%
CodeSize: 27406404 -> 27422132 (+0.06%); split: -0.00%, +0.06%
Latency: 48325204 -> 48293975 (-0.06%); split: -0.09%, +0.03%
InvThroughput: 8923880 -> 8919191 (-0.05%); split: -0.07%, +0.02%

fossil-db (navi21):
Totals from 1267 (1.59% of 79825) affected shaders:
Instrs: 4628583 -> 4629190 (+0.01%); split: -0.00%, +0.01%
CodeSize: 24974672 -> 24977188 (+0.01%); split: -0.00%, +0.01%
Latency: 45080476 -> 44998120 (-0.18%); split: -0.20%, +0.02%
InvThroughput: 12288202 -> 12269634 (-0.15%); split: -0.16%, +0.01%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38262>
2026-02-16 19:39:43 +00:00