Commit graph

3905 commits

Author SHA1 Message Date
Daniel Schürmann
219c53e6fc aco/ra: don't clear lateKill operands in get_reg_create_vector()
Fixes: 08f088479a ('aco/ra: set late-kill for operands of temporary p_create_vector')
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36871>
2025-08-22 15:25:04 +00:00
Marek Olšák
3aadae22ad nir: make nir_block::predecessors & dom_frontier sets non-malloc'd
We can just place the set structures inside nir_block.

This reduces the number of ralloc calls by 6.7% when compiling Heaven
shaders with radeonsi+ACO using a release build (i.e. not including
nir_validate set allocations, which are also removed).

Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36728>
2025-08-21 06:13:48 +00:00
Georg Lehmann
639b91bb48 aco/isel: fix vectorized i2i16 with 8bit vec8 source
The extract index is in dwords, not bytes.

Fixes: 92d433c54a ("aco: vectorize conversions from 8bit to 16bit")
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36869>
2025-08-20 10:13:22 +00:00
Daniel Schürmann
0546ecfadb aco/scheduler: small refactor of schedule_VMEM()
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36599>
2025-08-19 16:59:12 +00:00
Daniel Schürmann
0c590eb903 aco/scheduler: schedule VMEM store clauses during the regular forward pass
Totals from 1456 (1.82% of 79839) affected shaders: (Navi48)

MaxWaves: 37780 -> 37128 (-1.73%); split: +0.15%, -1.87%
Instrs: 3788175 -> 3788435 (+0.01%); split: -0.04%, +0.04%
CodeSize: 20468648 -> 20467432 (-0.01%); split: -0.04%, +0.03%
VGPRs: 86820 -> 91440 (+5.32%); split: -0.10%, +5.42%
Latency: 26866232 -> 26858867 (-0.03%); split: -0.04%, +0.01%
InvThroughput: 3491741 -> 3828339 (+9.64%); split: -0.02%, +9.66%
VClause: 90413 -> 89426 (-1.09%); split: -1.27%, +0.18%
SClause: 130532 -> 130530 (-0.00%); split: -0.00%, +0.00%
Copies: 347397 -> 347806 (+0.12%); split: -0.11%, +0.23%
Branches: 117476 -> 117496 (+0.02%)
VALU: 1897427 -> 1897830 (+0.02%); split: -0.02%, +0.04%
SALU: 602365 -> 602379 (+0.00%)
VOPD: 1259 -> 1251 (-0.64%); split: +0.24%, -0.87%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36599>
2025-08-19 16:59:12 +00:00
Daniel Schürmann
f601eb8555 aco/scheduler: move clauses as batch
Totals from 391 (0.49% of 79839) affected shaders:

Instrs: 612478 -> 612515 (+0.01%); split: -0.06%, +0.06%
CodeSize: 3342896 -> 3343228 (+0.01%); split: -0.04%, +0.05%
Latency: 6909794 -> 6909938 (+0.00%); split: -0.03%, +0.03%
VClause: 10752 -> 10167 (-5.44%); split: -5.46%, +0.02%
Copies: 26623 -> 26627 (+0.02%); split: -0.00%, +0.02%
VALU: 377494 -> 377499 (+0.00%); split: -0.00%, +0.00%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36599>
2025-08-19 16:59:12 +00:00
Daniel Schürmann
70f0c065e8 aco/scheduler: ignore potential SMEM stalls when forming clauses
Totals from 4190 (5.25% of 79839) affected shaders: (Navi48)

MaxWaves: 117020 -> 117014 (-0.01%)
Instrs: 4801892 -> 4801547 (-0.01%); split: -0.06%, +0.05%
CodeSize: 25327632 -> 25325500 (-0.01%); split: -0.05%, +0.04%
VGPRs: 236452 -> 236488 (+0.02%)
Latency: 30569070 -> 30539464 (-0.10%); split: -0.13%, +0.04%
InvThroughput: 4891650 -> 4891062 (-0.01%); split: -0.03%, +0.01%
VClause: 119615 -> 118763 (-0.71%); split: -1.02%, +0.31%
SClause: 100482 -> 100297 (-0.18%); split: -0.44%, +0.26%
Copies: 326644 -> 326756 (+0.03%); split: -0.19%, +0.22%
Branches: 98982 -> 98980 (-0.00%)
VALU: 2712397 -> 2712534 (+0.01%); split: -0.02%, +0.03%
SALU: 591836 -> 591817 (-0.00%); split: -0.00%, +0.00%
VOPD: 993 -> 987 (-0.60%); split: +0.20%, -0.81%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36599>
2025-08-19 16:59:11 +00:00
Daniel Schürmann
d3a0f268b9 aco/scheduler: short-cut downwards_move_clause() when no movement is done
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36599>
2025-08-19 16:59:11 +00:00
Daniel Schürmann
8543b6cf2e aco/scheduler: remove DownwardsCursor::clause_demand
As we stop scheduling after forming clauses, this value
is not needed anymore.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36599>
2025-08-19 16:59:10 +00:00
Daniel Schürmann
5ae30deffb aco/scheduler: remove DownwardsCursor::insert_demand_clause
This partially reverts 93872270f0 ('aco/scheduler: keep track of RegisterDemand at DownwardsCursor::insert_idx{_clause}').

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36599>
2025-08-19 16:59:10 +00:00
Daniel Schürmann
e95d728a98 aco/scheduler: split downwards_move_clause() from downwards_move()
We will do batched moves for clauses with the next commit.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36599>
2025-08-19 16:59:09 +00:00
Daniel Schürmann
37299a8d1a aco/scheduler: Stop downwards scheduling after encountering the first clause
Totals from 9899 (12.40% of 79839) affected shaders: (Navi48)

MaxWaves: 276355 -> 276317 (-0.01%); split: +0.01%, -0.02%
Instrs: 8781768 -> 8766504 (-0.17%); split: -0.25%, +0.07%
CodeSize: 46297556 -> 46236104 (-0.13%); split: -0.19%, +0.06%
VGPRs: 574680 -> 574800 (+0.02%); split: -0.00%, +0.03%
Latency: 54261324 -> 54357916 (+0.18%); split: -0.14%, +0.32%
InvThroughput: 9122700 -> 9121115 (-0.02%); split: -0.07%, +0.05%
VClause: 222062 -> 218499 (-1.60%); split: -2.33%, +0.73%
SClause: 167138 -> 163233 (-2.34%); split: -2.43%, +0.09%
Copies: 602395 -> 598560 (-0.64%); split: -1.21%, +0.57%
Branches: 161939 -> 161932 (-0.00%); split: -0.01%, +0.00%
VALU: 5063999 -> 5060199 (-0.08%); split: -0.14%, +0.07%
SALU: 988254 -> 988285 (+0.00%); split: -0.02%, +0.02%
VOPD: 2478 -> 2443 (-1.41%); split: +0.40%, -1.82%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36599>
2025-08-19 16:59:09 +00:00
Daniel Schürmann
fb6b95517e aco/scheduler: check dependencies of entire clause upfront
and bail if any instruction of the clause can't be moved.

Totals from 4310 (5.40% of 79839) affected shaders:

MaxWaves: 115826 -> 115834 (+0.01%)
Instrs: 6256436 -> 6257599 (+0.02%); split: -0.05%, +0.07%
CodeSize: 32816488 -> 32820768 (+0.01%); split: -0.04%, +0.05%
VGPRs: 260184 -> 260172 (-0.00%)
Latency: 41207213 -> 41052150 (-0.38%); split: -0.45%, +0.07%
InvThroughput: 6822608 -> 6815208 (-0.11%); split: -0.14%, +0.03%
VClause: 148412 -> 147133 (-0.86%); split: -1.03%, +0.17%
SClause: 120854 -> 120856 (+0.00%); split: -0.01%, +0.01%
Copies: 425910 -> 427276 (+0.32%); split: -0.25%, +0.57%
VALU: 3572293 -> 3573647 (+0.04%); split: -0.03%, +0.07%
VOPD: 2803 -> 2816 (+0.46%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36599>
2025-08-19 16:59:08 +00:00
Daniel Schürmann
7e63251d1f aco/isel: refactor store_shared() by directly matching NIR intrinsics to ACO opcodes
Totals from 1435 (1.80% of 79839) affected shaders: (Navi48)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36133>
2025-08-19 14:28:15 +00:00
Daniel Schürmann
1fde289539 aco/isel: refactor load_shared() by directly matching NIR intrinsics to ACO opcodes
Totals from 3 (0.00% of 79839) affected shaders: (Navi48)

Instrs: 700 -> 698 (-0.29%)
CodeSize: 3860 -> 3852 (-0.21%)
Latency: 2351 -> 2349 (-0.09%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36133>
2025-08-19 14:28:15 +00:00
Daniel Schürmann
4632ee4c37 aco/isel: rename emit_readfirstlane() -> emit_vector_as_uniform()
Also allow to use p_as_uniform and improve vector splitting.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36133>
2025-08-19 14:28:14 +00:00
Daniel Schürmann
26595577b3 aco/isel: allow for large 8-bit vectors in extract_8_16_bit_sgpr_element()
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36133>
2025-08-19 14:28:14 +00:00
Georg Lehmann
9ed94371f7 amd: stop using custom gl_access_qualifier for access type
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36764>
2025-08-15 08:26:10 +00:00
Georg Lehmann
f17cb6b714 amd: replace ACCESS_TYPE_SMEM with ACCESS_SMEM_AMD
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36764>
2025-08-15 08:26:10 +00:00
Georg Lehmann
6ba462bf26 aco/disable_wqm: optimize local mask creation
Foz-DB Navi48:
Totals from 7861 (9.79% of 80287) affected shaders:
Instrs: 13276809 -> 13183483 (-0.70%)
CodeSize: 71221260 -> 70852500 (-0.52%); split: -0.52%, +0.00%
Latency: 124001421 -> 123976480 (-0.02%); split: -0.02%, +0.00%
InvThroughput: 17820119 -> 17817551 (-0.01%); split: -0.01%, +0.00%
SALU: 1736356 -> 1666673 (-4.01%)

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35970>
2025-08-15 07:03:47 +00:00
Georg Lehmann
fc53cf146c aco: disable wqm for sampled buffer loads when not needed
Foz-DB GFX1201:
Totals from 318 (0.40% of 80287) affected shaders:
Instrs: 313039 -> 314064 (+0.33%); split: -0.00%, +0.33%
CodeSize: 1684104 -> 1688212 (+0.24%); split: -0.00%, +0.24%
VGPRs: 15120 -> 15144 (+0.16%)
Latency: 2515023 -> 2518610 (+0.14%); split: -0.06%, +0.20%
InvThroughput: 447468 -> 447615 (+0.03%); split: -0.02%, +0.05%
VClause: 4866 -> 4914 (+0.99%)
SClause: 6564 -> 6559 (-0.08%); split: -0.09%, +0.02%
Copies: 23577 -> 23673 (+0.41%); split: -0.04%, +0.45%
PreSGPRs: 16019 -> 16029 (+0.06%)
VALU: 172157 -> 172143 (-0.01%)
SALU: 52816 -> 53867 (+1.99%)

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35970>
2025-08-15 07:03:47 +00:00
Georg Lehmann
883b1ca364 aco: disable wqm for tex loads when not needed
By only executing VMEM loads for lanes where the result is used, we can save
bandwidth.

The NIR pass only handles tex for now, but those are most common anyway.
We can extend it handle image/ssbo/ubo/global loads in the future.

Foz-DB GFX1201:
Totals from 32633 (40.66% of 80251) affected shaders:
Instrs: 22635910 -> 23193509 (+2.46%); split: -0.00%, +2.46%
CodeSize: 122880044 -> 125093428 (+1.80%); split: -0.00%, +1.81%
VGPRs: 1481868 -> 1481712 (-0.01%)
SpillSGPRs: 3877 -> 4301 (+10.94%); split: -0.52%, +11.45%
Latency: 171480552 -> 171685219 (+0.12%); split: -0.18%, +0.30%
InvThroughput: 24364743 -> 24373441 (+0.04%); split: -0.08%, +0.12%
VClause: 388318 -> 388557 (+0.06%); split: -0.06%, +0.13%
SClause: 774781 -> 776492 (+0.22%); split: -0.29%, +0.51%
Copies: 1416586 -> 1541199 (+8.80%); split: -0.16%, +8.96%
Branches: 419591 -> 419673 (+0.02%); split: -0.02%, +0.04%
PreSGPRs: 1330303 -> 1416540 (+6.48%)
PreVGPRs: 964864 -> 964863 (-0.00%)
VALU: 12919601 -> 12920254 (+0.01%); split: -0.01%, +0.01%
SALU: 2685402 -> 3224147 (+20.06%); split: -0.00%, +20.07%

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35970>
2025-08-15 07:03:46 +00:00
Georg Lehmann
7159fd21f8 aco: don't restrict vmem load scheduling by inserting p_end_wqm early
Foz-DB GFX1201:
Totals from 7 (0.01% of 80251) affected shaders:
Instrs: 703 -> 729 (+3.70%)
CodeSize: 4032 -> 4136 (+2.58%)
Latency: 5840 -> 4715 (-19.26%)
InvThroughput: 441 -> 405 (-8.16%)
Copies: 61 -> 67 (+9.84%)
PreSGPRs: 216 -> 218 (+0.93%)
SALU: 93 -> 113 (+21.51%)

When reordered after the next commit:
Foz-DB GFX1201:
Totals from 1609 (2.00% of 80251) affected shaders:
MaxWaves: 47984 -> 47986 (+0.00%)
Instrs: 1326847 -> 1332797 (+0.45%); split: -0.05%, +0.50%
CodeSize: 7248720 -> 7275364 (+0.37%); split: -0.04%, +0.41%
VGPRs: 74968 -> 75148 (+0.24%); split: -0.06%, +0.30%
SpillSGPRs: 182 -> 184 (+1.10%)
Latency: 10370602 -> 10172524 (-1.91%); split: -2.06%, +0.15%
InvThroughput: 1446508 -> 1445920 (-0.04%); split: -0.11%, +0.06%
VClause: 23567 -> 23559 (-0.03%); split: -0.35%, +0.32%
SClause: 43143 -> 43203 (+0.14%); split: -0.52%, +0.66%
Copies: 80948 -> 81622 (+0.83%); split: -0.32%, +1.16%
Branches: 21599 -> 21727 (+0.59%)
PreSGPRs: 69963 -> 70732 (+1.10%)
VALU: 778968 -> 779024 (+0.01%); split: -0.02%, +0.03%
SALU: 159797 -> 165329 (+3.46%); split: -0.01%, +3.47%

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35970>
2025-08-15 07:03:46 +00:00
Georg Lehmann
c1b29174b4 aco: use a smaller wqm section for strict_wqm sampling
It's only important that the coordinate is created in WQM,
the sample itself doesn't care.

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35970>
2025-08-15 07:03:46 +00:00
Georg Lehmann
de4b345949 aco/insert_exec: remove per instruction wqm/exact exec handling
No Foz-DB changes on GFX1201.

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35970>
2025-08-15 07:03:46 +00:00
Georg Lehmann
11cee3d634 aco: use new disable_wqm for p_dual_src_export_gfx11
No Foz-DB changes on GFX1201.

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35970>
2025-08-15 07:03:46 +00:00
Georg Lehmann
8e53ba9a0a aco: use new disable_wqm for exp
No Foz-DB changes on GFX1201.

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35970>
2025-08-15 07:03:46 +00:00
Georg Lehmann
bd6647e21e aco/builder: support new disable_wqm
Create the additional undef operands that are filled by insert_exec.

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35970>
2025-08-15 07:03:46 +00:00
Georg Lehmann
0e66f2b2cc aco: use new disable_wqm for mimg
Foz-DB GFX1201:
Totals from 88 (0.11% of 80251) affected shaders:
Instrs: 81954 -> 82218 (+0.32%); split: -0.02%, +0.34%
CodeSize: 451824 -> 452880 (+0.23%); split: -0.02%, +0.25%
Latency: 308818 -> 308746 (-0.02%); split: -0.05%, +0.02%
VClause: 1324 -> 1318 (-0.45%)
Copies: 2795 -> 2784 (-0.39%)
PreSGPRs: 4029 -> 4035 (+0.15%)
SALU: 6563 -> 6809 (+3.75%); split: -0.15%, +3.90%

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35970>
2025-08-15 07:03:46 +00:00
Georg Lehmann
922f559c3c aco: use new disable_wqm for flatlike
No Foz-DB changes on GFX1201.

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35970>
2025-08-15 07:03:46 +00:00
Georg Lehmann
a4c537c5b3 aco: use new disable_wqm for mubuf/mtbuf
Foz-DB GFX1201:
Totals from 66 (0.08% of 80251) affected shaders:
Instrs: 45373 -> 45663 (+0.64%); split: -0.01%, +0.65%
CodeSize: 251708 -> 252900 (+0.47%); split: -0.00%, +0.48%
Latency: 278977 -> 278652 (-0.12%); split: -0.14%, +0.02%
InvThroughput: 38259 -> 38245 (-0.04%); split: -0.05%, +0.02%
VClause: 982 -> 962 (-2.04%)
Copies: 2882 -> 2808 (-2.57%)
PreSGPRs: 2564 -> 2599 (+1.37%)
SALU: 4748 -> 5010 (+5.52%)

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35970>
2025-08-15 07:03:46 +00:00
Georg Lehmann
63af48ae2e aco/insert_exec: new way to handle instructions that need wqm disabled
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35970>
2025-08-15 07:03:46 +00:00
Georg Lehmann
ca25553b92 aco: add a post-RA pass to disable wqm
By disabling WQM post-RA, we don't have RA/spilling mov issues with image_sample
operands that need to be computed in WQM.

We also don't restrict scheduling by inserting exec writes.
The only downside is more scalar ALU usage, but the SALU is almost always underutilized.

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35970>
2025-08-15 07:03:46 +00:00
Georg Lehmann
34b154866f aco/insert_exec: remove p_jump_to_epilog from needs exact
p_end_wqm will always be emitted before it by isel.

No Foz-DB changes on GFX1201.

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35970>
2025-08-15 07:03:46 +00:00
Yonggang Luo
fc1b26f4dc aco: Fixes warning note: ambiguity is between a regular call to this operator and a call with the argument order reversed
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
../../src/amd/compiler/aco_util.h:300:9: note: ambiguity is between a regular call to this operator and a call with the argument order reversed
  300 |    bool operator==(const monotonic_buffer_resource& other) { return buffer == other.buffer; }

Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36722>
2025-08-13 19:49:37 +00:00
Rhys Perry
08f088479a aco/ra: set late-kill for operands of temporary p_create_vector
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13543
Fixes: c279dd6e61 ("aco: Support vector-aligned ops fixed to defs")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36469>
2025-08-06 09:44:01 +00:00
Daniel Schürmann
d3743dd7ba aco/scheduler: improve scheduling heuristic
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
The heuristic we are currently using still stems from the GCN era
with the only adjustments being made for RDNA was to double (or triple)
the wave count.

This rewrite aims to detangle some concepts and provide more consistent results.

 - wave_factor: The purpose of this value is to reflect that RDNA SIMDs can
                accomodate twice as many waves as GCN SIMDs.
 - reg_file_multiple: This value accounts for the larger register file of wave32
                      and some RDNA3 families.
 - wave_minimum: Below this value, we don't sacrifice any waves. It corresponds
                 to a register demand of 64 VGPRs in wave64.
 - occupancy_factor: Depending on target_waves and wave_factor, this controls
                     the scheduling window sizes and number of moves.

The main differences from the previous heuristic is a lower wave minimum and
a slightly less aggressive reduction of waves.
It also increases SMEM_MAX_MOVES in order to mitigate some of the changes
from targeting less waves.

Totals from 62777 (78.63% of 79839) affected shaders: (Navi48)
MaxWaves: 1880983 -> 1848028 (-1.75%); split: +0.01%, -1.76%
Instrs: 40904711 -> 40800797 (-0.25%); split: -0.39%, +0.14%
CodeSize: 217132208 -> 216748832 (-0.18%); split: -0.29%, +0.12%
VGPRs: 3019304 -> 3099596 (+2.66%); split: -0.11%, +2.77%
Latency: 268857129 -> 265951122 (-1.08%); split: -1.33%, +0.25%
InvThroughput: 40960938 -> 41044533 (+0.20%); split: -0.18%, +0.39%
VClause: 794000 -> 782913 (-1.40%); split: -2.24%, +0.84%
SClause: 1192476 -> 1150831 (-3.49%); split: -3.94%, +0.45%
Copies: 2720470 -> 2700148 (-0.75%); split: -1.84%, +1.09%
Branches: 785926 -> 785951 (+0.00%); split: -0.01%, +0.01%
VALU: 22918411 -> 22890189 (-0.12%); split: -0.19%, +0.06%
SALU: 5281201 -> 5289486 (+0.16%); split: -0.21%, +0.36%
VOPD: 8790 -> 8685 (-1.19%); split: +1.08%, -2.28%

Totals from 62081 (77.77% of 79825) affected shaders: (Navi31)
MaxWaves: 1848555 -> 1812347 (-1.96%); split: +0.01%, -1.97%
Instrs: 39794460 -> 39704180 (-0.23%); split: -0.39%, +0.16%
CodeSize: 208987052 -> 208621524 (-0.17%); split: -0.31%, +0.13%
VGPRs: 3046284 -> 3135156 (+2.92%); split: -0.11%, +3.03%
Latency: 268863465 -> 265218186 (-1.36%); split: -1.59%, +0.23%
InvThroughput: 41101515 -> 41167075 (+0.16%); split: -0.22%, +0.38%
VClause: 795316 -> 774899 (-2.57%); split: -3.17%, +0.61%
SClause: 1177294 -> 1135451 (-3.55%); split: -4.06%, +0.51%
Copies: 2743254 -> 2725127 (-0.66%); split: -1.90%, +1.24%
Branches: 801395 -> 801428 (+0.00%); split: -0.01%, +0.02%
VALU: 23898938 -> 23871294 (-0.12%); split: -0.20%, +0.08%
SALU: 3908807 -> 3919130 (+0.26%); split: -0.23%, +0.50%
VOPD: 8529 -> 8500 (-0.34%); split: +1.29%, -1.63%

Totals from 44996 (71.01% of 63370) affected shaders: (Vega10)

MaxWaves: 307074 -> 304808 (-0.74%); split: +0.63%, -1.37%
Instrs: 22743534 -> 22716240 (-0.12%); split: -0.22%, +0.10%
CodeSize: 117284856 -> 117173212 (-0.10%); split: -0.19%, +0.09%
SGPRs: 3249008 -> 3330480 (+2.51%); split: -0.36%, +2.87%
VGPRs: 1901400 -> 1943880 (+2.23%); split: -0.60%, +2.83%
Latency: 224839126 -> 222878477 (-0.87%); split: -1.19%, +0.31%
InvThroughput: 114389570 -> 114316559 (-0.06%); split: -0.17%, +0.11%
VClause: 482012 -> 473304 (-1.81%); split: -2.86%, +1.05%
SClause: 757799 -> 717092 (-5.37%); split: -5.64%, +0.27%
Copies: 2182735 -> 2183598 (+0.04%); split: -1.17%, +1.21%
Branches: 396026 -> 395996 (-0.01%); split: -0.03%, +0.02%
VALU: 16740283 -> 16728098 (-0.07%); split: -0.14%, +0.07%
SALU: 2133575 -> 2145863 (+0.58%); split: -0.29%, +0.86%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30720>
2025-08-06 09:16:33 +00:00
Qiang Yu
196569b1a4 all: rename gl_shader_stage to mesa_shader_stage
It's not only for GL, change to a generic name.

Use command:
  find . -type f -not -path '*/.git/*' -exec sed -i 's/\bgl_shader_stage\b/mesa_shader_stage/g' {} +

Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Yonggang Luo <luoyonggang@gmail.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36569>
2025-08-06 10:28:40 +08:00
Rhys Perry
76c96bf558 aco: fix possible scratch offset overflow
We split vector load/store, so consider that we might add to the constant
offset.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36406>
2025-08-04 15:06:44 +00:00
Rhys Perry
44ab4ad732 aco: align scratch size after isel
Make it safe for VGPR spilling if it's not a multiple of 4.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36406>
2025-08-04 15:06:43 +00:00
Rhys Perry
ab10604924 aco/gfx12: fix printing of temporal hints
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36406>
2025-08-04 15:06:41 +00:00
Rhys Perry
cec845079e ac/nir/lower_ps: remove barrier for end_invocation_interlock
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
SPIR-V->NIR now inserts this barrier itself.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36513>
2025-08-04 09:30:06 +00:00
Daniel Schürmann
4ca3cc5a1a aco/ra: propagate precolor affinities through parallelcopies and tied definitions
Totals from 214 (0.27% of 79839) affected shaders: (Navi48)

Instrs: 65339 -> 65311 (-0.04%); split: -0.05%, +0.00%
CodeSize: 352616 -> 350952 (-0.47%); split: -0.55%, +0.07%
VGPRs: 9984 -> 9960 (-0.24%)
Latency: 207556 -> 207508 (-0.02%); split: -0.03%, +0.01%
InvThroughput: 40422 -> 40397 (-0.06%)
Copies: 3180 -> 3155 (-0.79%)
VALU: 38347 -> 38322 (-0.07%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36345>
2025-08-01 17:15:54 +00:00
Daniel Schürmann
a667d9a68d aco/ra: propagate precolor affinities through phis
Totals from 917 (1.15% of 79839) affected shaders: (Navi48)

Instrs: 3217861 -> 3216947 (-0.03%); split: -0.04%, +0.01%
CodeSize: 17427204 -> 17432264 (+0.03%); split: -0.06%, +0.09%
VGPRs: 65328 -> 65316 (-0.02%)
Latency: 35336268 -> 35335528 (-0.00%); split: -0.01%, +0.01%
InvThroughput: 7305032 -> 7302187 (-0.04%); split: -0.04%, +0.00%
SClause: 120537 -> 120553 (+0.01%); split: -0.01%, +0.02%
Copies: 307257 -> 306852 (-0.13%); split: -0.21%, +0.08%
Branches: 115744 -> 115743 (-0.00%)
VALU: 1572522 -> 1572183 (-0.02%); split: -0.02%, +0.00%
SALU: 574229 -> 574155 (-0.01%); split: -0.05%, +0.04%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36345>
2025-08-01 17:15:54 +00:00
Daniel Schürmann
2ddd8ef0a3 aco/ra: don't optimize encodings on precolor affinity mismatch
Totals from 238 (0.30% of 79839) affected shaders: (Navi48)
Instrs: 137836 -> 137176 (-0.48%); split: -0.50%, +0.02%
CodeSize: 728616 -> 728668 (+0.01%); split: -0.06%, +0.07%
Latency: 1503248 -> 1500202 (-0.20%); split: -0.56%, +0.36%
InvThroughput: 297725 -> 296715 (-0.34%); split: -0.70%, +0.36%
Copies: 9390 -> 8825 (-6.02%); split: -6.33%, +0.31%
VALU: 89861 -> 89296 (-0.63%); split: -0.66%, +0.03%
SALU: 13166 -> 13167 (+0.01%); split: -0.05%, +0.06%

Suggested-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36345>
2025-08-01 17:15:54 +00:00
Daniel Schürmann
93606a19c6 aco/ra: collect register affinities for all precolored operands.
Totals from 1280 (1.60% of 79839) affected shaders: (Navi48)

Instrs: 817363 -> 812639 (-0.58%); split: -0.58%, +0.00%
CodeSize: 4262644 -> 4243540 (-0.45%); split: -0.45%, +0.00%
VGPRs: 61692 -> 61668 (-0.04%)
Latency: 4354318 -> 4347818 (-0.15%); split: -0.15%, +0.00%
InvThroughput: 711914 -> 707698 (-0.59%); split: -0.59%, +0.00%
VClause: 14685 -> 14677 (-0.05%); split: -0.09%, +0.03%
SClause: 25623 -> 25621 (-0.01%)
Copies: 50663 -> 46242 (-8.73%); split: -8.73%, +0.00%
VALU: 427744 -> 423323 (-1.03%); split: -1.03%, +0.00%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36345>
2025-08-01 17:15:54 +00:00
Daniel Schürmann
e32eec52f0 aco/ra: generalize register affinities
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36345>
2025-08-01 17:15:54 +00:00
Daniel Schürmann
caa2c22d8b aco/tests: Fix p_startpgm definitions to registers
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36345>
2025-08-01 17:15:54 +00:00
Alyssa Rosenzweig
cc6e3b84cb treewide: use nir_def_as_*
Via Coccinelle patch:

    @@
    expression definition;
    @@

    -nir_instr_as_alu(definition->parent_instr)
    +nir_def_as_alu(definition)

    @@
    expression definition;
    @@

    -nir_instr_as_intrinsic(definition->parent_instr)
    +nir_def_as_intrinsic(definition)

    @@
    expression definition;
    @@

    -nir_instr_as_phi(definition->parent_instr)
    +nir_def_as_phi(definition)

    @@
    expression definition;
    @@

    -nir_instr_as_load_const(definition->parent_instr)
    +nir_def_as_load_const(definition)

    @@
    expression definition;
    @@

    -nir_instr_as_deref(definition->parent_instr)
    +nir_def_as_deref(definition)

    @@
    expression definition;
    @@

    -nir_instr_as_tex(definition->parent_instr)
    +nir_def_as_tex(definition)

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Acked-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36489>
2025-08-01 15:34:24 +00:00
Antonio Ospite
ddf2aa3a4d build: avoid redefining unreachable() which is standard in C23
In the C23 standard unreachable() is now a predefined function-like
macro in <stddef.h>

See https://android.googlesource.com/platform/bionic/+/HEAD/docs/c23.md#is-now-a-predefined-function_like-macro-in

And this causes build errors when building for C23:

-----------------------------------------------------------------------
In file included from ../src/util/log.h:30,
                 from ../src/util/log.c:30:
../src/util/macros.h:123:9: warning: "unreachable" redefined
  123 | #define unreachable(str)    \
      |         ^~~~~~~~~~~
In file included from ../src/util/macros.h:31:
/usr/lib/gcc/x86_64-linux-gnu/14/include/stddef.h:456:9: note: this is the location of the previous definition
  456 | #define unreachable() (__builtin_unreachable ())
      |         ^~~~~~~~~~~
-----------------------------------------------------------------------

So don't redefine it with the same name, but use the name UNREACHABLE()
to also signify it's a macro.

Using a different name also makes sense because the behavior of the
macro was extending the one of __builtin_unreachable() anyway, and it
also had a different signature, accepting one argument, compared to the
standard unreachable() with no arguments.

This change improves the chances of building mesa with the C23 standard,
which for instance is the default in recent AOSP versions.

All the instances of the macro, including the definition, were updated
with the following command line:

  git grep -l '[^_]unreachable(' -- "src/**" | sort | uniq | \
  while read file; \
  do \
    sed -e 's/\([^_]\)unreachable(/\1UNREACHABLE(/g' -i "$file"; \
  done && \
  sed -e 's/#undef unreachable/#undef UNREACHABLE/g' -i src/intel/isl/isl_aux_info.c

Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36437>
2025-07-31 17:49:42 +00:00