Commit graph

3918 commits

Author SHA1 Message Date
Marek Olšák
4c87d002e3 aco,radeonsi: expand 32-bit shader arg pointers to 64 bits for ACO
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37101>
2025-08-30 15:04:32 -04:00
Marek Olšák
7d5288b5b7 aco: check that global addresses are 64bit, apply_nuw_to_ssa to global_amd/smem
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37101>
2025-08-30 15:04:32 -04:00
Georg Lehmann
38e32e39a9 aco: never end wqm early for vmem
The remaining cases where disable_wqm isn't set are either uniform loads
or loads that influence control flow. In the first case, not ending WQM early
is free, and in the second case it's likely still better to not block scheduling.

Foz-DB GFX1201:
Totals from 483 (0.60% of 80287) affected shaders:
MaxWaves: 12654 -> 12642 (-0.09%)
Instrs: 485234 -> 484830 (-0.08%); split: -0.19%, +0.11%
CodeSize: 2630876 -> 2629184 (-0.06%); split: -0.15%, +0.08%
VGPRs: 29980 -> 30004 (+0.08%)
Latency: 4908015 -> 4813167 (-1.93%); split: -1.95%, +0.02%
InvThroughput: 751059 -> 748582 (-0.33%); split: -0.35%, +0.02%
VClause: 8723 -> 8705 (-0.21%); split: -0.30%, +0.09%
SClause: 11085 -> 10986 (-0.89%); split: -1.45%, +0.56%
Copies: 25155 -> 25183 (+0.11%); split: -0.26%, +0.37%
Branches: 6203 -> 6204 (+0.02%)
PreSGPRs: 23763 -> 23780 (+0.07%)
VALU: 296576 -> 296593 (+0.01%); split: -0.01%, +0.02%
SALU: 49095 -> 49416 (+0.65%); split: -0.04%, +0.69%

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36785>
2025-08-28 06:29:04 +00:00
Georg Lehmann
3d190f2e9c aco: implement skip_helpers for load_global_amd
Foz-DB GFX1201:
Totals from 119 (0.15% of 80287) affected shaders:
Instrs: 212449 -> 213452 (+0.47%)
CodeSize: 1120656 -> 1124708 (+0.36%)
Latency: 2854370 -> 2855772 (+0.05%); split: -0.02%, +0.07%
InvThroughput: 586142 -> 586210 (+0.01%); split: -0.00%, +0.01%
VClause: 3556 -> 3656 (+2.81%)
SClause: 2708 -> 2710 (+0.07%)
Copies: 14410 -> 14509 (+0.69%)
PreSGPRs: 6810 -> 6850 (+0.59%); split: -0.12%, +0.70%
VALU: 135945 -> 135942 (-0.00%); split: -0.01%, +0.01%
SALU: 22147 -> 23121 (+4.40%)

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36785>
2025-08-28 06:29:04 +00:00
Georg Lehmann
ee7069f875 aco: implement skip_helpers for load_scratch
Foz-DB GFX1201:
Totals from 2 (0.00% of 80287) affected shaders:
Instrs: 4016 -> 4054 (+0.95%)
CodeSize: 22104 -> 22256 (+0.69%)
Latency: 17123 -> 17129 (+0.04%)
Copies: 406 -> 415 (+2.22%)
SALU: 323 -> 353 (+9.29%)

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36785>
2025-08-28 06:29:04 +00:00
Georg Lehmann
2bfd8918a5 aco: implement skip_helpers for load_ssbo/ubo/constant
Foz-DB GFX1201:
Totals from 6676 (8.32% of 80287) affected shaders:
Instrs: 8786161 -> 8829091 (+0.49%); split: -0.01%, +0.50%
CodeSize: 47141800 -> 47320480 (+0.38%); split: -0.01%, +0.39%
VGPRs: 376624 -> 376600 (-0.01%)
SpillSGPRs: 1251 -> 1250 (-0.08%)
Latency: 99716626 -> 99642361 (-0.07%); split: -0.11%, +0.04%
InvThroughput: 14893179 -> 14898323 (+0.03%); split: -0.01%, +0.04%
VClause: 149425 -> 153539 (+2.75%); split: -0.04%, +2.79%
SClause: 251247 -> 251842 (+0.24%); split: -0.06%, +0.30%
Copies: 580304 -> 586424 (+1.05%); split: -0.21%, +1.26%
Branches: 163014 -> 163013 (-0.00%); split: -0.00%, +0.00%
PreSGPRs: 356548 -> 357109 (+0.16%); split: -0.18%, +0.33%
VALU: 5149733 -> 5149797 (+0.00%); split: -0.00%, +0.00%
SALU: 1082176 -> 1122718 (+3.75%); split: -0.06%, +3.80%

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36785>
2025-08-28 06:29:03 +00:00
Georg Lehmann
bdae511b18 aco: implement skip_helpers for image loads
Foz-DB GFX1201:
Totals from 5 (0.01% of 80287) affected shaders:
Instrs: 1406 -> 1417 (+0.78%)
CodeSize: 8012 -> 8056 (+0.55%)
Latency: 7279 -> 7282 (+0.04%)
Copies: 84 -> 85 (+1.19%)
SALU: 170 -> 180 (+5.88%)

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36785>
2025-08-28 06:29:02 +00:00
Georg Lehmann
bf453a7c6a aco/isel: add init_disable_wqm helper
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36785>
2025-08-28 06:29:01 +00:00
Georg Lehmann
635ac758c9 aco/optimizer: don't create undef copies from p_create_vector
p_create_vector allows undef operands, p_parallelcopy doesn't.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13765
Fixes: 01d20680e2 ("aco/optimizer: generalize p_create_vector of split vector opt")
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36963>
2025-08-25 16:47:38 +00:00
Georg Lehmann
8903bb4618 aco/optimizer: don't apply packed clamp to v_fma_mix
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13758
Fixes: 345bf8a2f2 ("aco/optimizer: remove label_vop3p")
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36963>
2025-08-25 16:47:38 +00:00
Georg Lehmann
791a57805c aco: fix ra validation for flat/global/scratch/ds load sbyte_d16
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Fixes: 18a53230eb ("aco: don't check dst_bitsize in apply_load_extract")
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36964>
2025-08-25 10:09:16 +00:00
Konstantin Seurer
951b187b95 nir: Use nir_def_block in more places
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36746>
2025-08-24 14:03:10 +00:00
Konstantin Seurer
9df7b48d2f nir: Use nir_def_as_* in more places
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36746>
2025-08-24 14:03:09 +00:00
Daniel Schürmann
219c53e6fc aco/ra: don't clear lateKill operands in get_reg_create_vector()
Fixes: 08f088479a ('aco/ra: set late-kill for operands of temporary p_create_vector')
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36871>
2025-08-22 15:25:04 +00:00
Marek Olšák
3aadae22ad nir: make nir_block::predecessors & dom_frontier sets non-malloc'd
We can just place the set structures inside nir_block.

This reduces the number of ralloc calls by 6.7% when compiling Heaven
shaders with radeonsi+ACO using a release build (i.e. not including
nir_validate set allocations, which are also removed).

Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36728>
2025-08-21 06:13:48 +00:00
Georg Lehmann
639b91bb48 aco/isel: fix vectorized i2i16 with 8bit vec8 source
The extract index is in dwords, not bytes.

Fixes: 92d433c54a ("aco: vectorize conversions from 8bit to 16bit")
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36869>
2025-08-20 10:13:22 +00:00
Daniel Schürmann
0546ecfadb aco/scheduler: small refactor of schedule_VMEM()
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36599>
2025-08-19 16:59:12 +00:00
Daniel Schürmann
0c590eb903 aco/scheduler: schedule VMEM store clauses during the regular forward pass
Totals from 1456 (1.82% of 79839) affected shaders: (Navi48)

MaxWaves: 37780 -> 37128 (-1.73%); split: +0.15%, -1.87%
Instrs: 3788175 -> 3788435 (+0.01%); split: -0.04%, +0.04%
CodeSize: 20468648 -> 20467432 (-0.01%); split: -0.04%, +0.03%
VGPRs: 86820 -> 91440 (+5.32%); split: -0.10%, +5.42%
Latency: 26866232 -> 26858867 (-0.03%); split: -0.04%, +0.01%
InvThroughput: 3491741 -> 3828339 (+9.64%); split: -0.02%, +9.66%
VClause: 90413 -> 89426 (-1.09%); split: -1.27%, +0.18%
SClause: 130532 -> 130530 (-0.00%); split: -0.00%, +0.00%
Copies: 347397 -> 347806 (+0.12%); split: -0.11%, +0.23%
Branches: 117476 -> 117496 (+0.02%)
VALU: 1897427 -> 1897830 (+0.02%); split: -0.02%, +0.04%
SALU: 602365 -> 602379 (+0.00%)
VOPD: 1259 -> 1251 (-0.64%); split: +0.24%, -0.87%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36599>
2025-08-19 16:59:12 +00:00
Daniel Schürmann
f601eb8555 aco/scheduler: move clauses as batch
Totals from 391 (0.49% of 79839) affected shaders:

Instrs: 612478 -> 612515 (+0.01%); split: -0.06%, +0.06%
CodeSize: 3342896 -> 3343228 (+0.01%); split: -0.04%, +0.05%
Latency: 6909794 -> 6909938 (+0.00%); split: -0.03%, +0.03%
VClause: 10752 -> 10167 (-5.44%); split: -5.46%, +0.02%
Copies: 26623 -> 26627 (+0.02%); split: -0.00%, +0.02%
VALU: 377494 -> 377499 (+0.00%); split: -0.00%, +0.00%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36599>
2025-08-19 16:59:12 +00:00
Daniel Schürmann
70f0c065e8 aco/scheduler: ignore potential SMEM stalls when forming clauses
Totals from 4190 (5.25% of 79839) affected shaders: (Navi48)

MaxWaves: 117020 -> 117014 (-0.01%)
Instrs: 4801892 -> 4801547 (-0.01%); split: -0.06%, +0.05%
CodeSize: 25327632 -> 25325500 (-0.01%); split: -0.05%, +0.04%
VGPRs: 236452 -> 236488 (+0.02%)
Latency: 30569070 -> 30539464 (-0.10%); split: -0.13%, +0.04%
InvThroughput: 4891650 -> 4891062 (-0.01%); split: -0.03%, +0.01%
VClause: 119615 -> 118763 (-0.71%); split: -1.02%, +0.31%
SClause: 100482 -> 100297 (-0.18%); split: -0.44%, +0.26%
Copies: 326644 -> 326756 (+0.03%); split: -0.19%, +0.22%
Branches: 98982 -> 98980 (-0.00%)
VALU: 2712397 -> 2712534 (+0.01%); split: -0.02%, +0.03%
SALU: 591836 -> 591817 (-0.00%); split: -0.00%, +0.00%
VOPD: 993 -> 987 (-0.60%); split: +0.20%, -0.81%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36599>
2025-08-19 16:59:11 +00:00
Daniel Schürmann
d3a0f268b9 aco/scheduler: short-cut downwards_move_clause() when no movement is done
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36599>
2025-08-19 16:59:11 +00:00
Daniel Schürmann
8543b6cf2e aco/scheduler: remove DownwardsCursor::clause_demand
As we stop scheduling after forming clauses, this value
is not needed anymore.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36599>
2025-08-19 16:59:10 +00:00
Daniel Schürmann
5ae30deffb aco/scheduler: remove DownwardsCursor::insert_demand_clause
This partially reverts 93872270f0 ('aco/scheduler: keep track of RegisterDemand at DownwardsCursor::insert_idx{_clause}').

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36599>
2025-08-19 16:59:10 +00:00
Daniel Schürmann
e95d728a98 aco/scheduler: split downwards_move_clause() from downwards_move()
We will do batched moves for clauses with the next commit.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36599>
2025-08-19 16:59:09 +00:00
Daniel Schürmann
37299a8d1a aco/scheduler: Stop downwards scheduling after encountering the first clause
Totals from 9899 (12.40% of 79839) affected shaders: (Navi48)

MaxWaves: 276355 -> 276317 (-0.01%); split: +0.01%, -0.02%
Instrs: 8781768 -> 8766504 (-0.17%); split: -0.25%, +0.07%
CodeSize: 46297556 -> 46236104 (-0.13%); split: -0.19%, +0.06%
VGPRs: 574680 -> 574800 (+0.02%); split: -0.00%, +0.03%
Latency: 54261324 -> 54357916 (+0.18%); split: -0.14%, +0.32%
InvThroughput: 9122700 -> 9121115 (-0.02%); split: -0.07%, +0.05%
VClause: 222062 -> 218499 (-1.60%); split: -2.33%, +0.73%
SClause: 167138 -> 163233 (-2.34%); split: -2.43%, +0.09%
Copies: 602395 -> 598560 (-0.64%); split: -1.21%, +0.57%
Branches: 161939 -> 161932 (-0.00%); split: -0.01%, +0.00%
VALU: 5063999 -> 5060199 (-0.08%); split: -0.14%, +0.07%
SALU: 988254 -> 988285 (+0.00%); split: -0.02%, +0.02%
VOPD: 2478 -> 2443 (-1.41%); split: +0.40%, -1.82%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36599>
2025-08-19 16:59:09 +00:00
Daniel Schürmann
fb6b95517e aco/scheduler: check dependencies of entire clause upfront
and bail if any instruction of the clause can't be moved.

Totals from 4310 (5.40% of 79839) affected shaders:

MaxWaves: 115826 -> 115834 (+0.01%)
Instrs: 6256436 -> 6257599 (+0.02%); split: -0.05%, +0.07%
CodeSize: 32816488 -> 32820768 (+0.01%); split: -0.04%, +0.05%
VGPRs: 260184 -> 260172 (-0.00%)
Latency: 41207213 -> 41052150 (-0.38%); split: -0.45%, +0.07%
InvThroughput: 6822608 -> 6815208 (-0.11%); split: -0.14%, +0.03%
VClause: 148412 -> 147133 (-0.86%); split: -1.03%, +0.17%
SClause: 120854 -> 120856 (+0.00%); split: -0.01%, +0.01%
Copies: 425910 -> 427276 (+0.32%); split: -0.25%, +0.57%
VALU: 3572293 -> 3573647 (+0.04%); split: -0.03%, +0.07%
VOPD: 2803 -> 2816 (+0.46%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36599>
2025-08-19 16:59:08 +00:00
Daniel Schürmann
7e63251d1f aco/isel: refactor store_shared() by directly matching NIR intrinsics to ACO opcodes
Totals from 1435 (1.80% of 79839) affected shaders: (Navi48)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36133>
2025-08-19 14:28:15 +00:00
Daniel Schürmann
1fde289539 aco/isel: refactor load_shared() by directly matching NIR intrinsics to ACO opcodes
Totals from 3 (0.00% of 79839) affected shaders: (Navi48)

Instrs: 700 -> 698 (-0.29%)
CodeSize: 3860 -> 3852 (-0.21%)
Latency: 2351 -> 2349 (-0.09%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36133>
2025-08-19 14:28:15 +00:00
Daniel Schürmann
4632ee4c37 aco/isel: rename emit_readfirstlane() -> emit_vector_as_uniform()
Also allow to use p_as_uniform and improve vector splitting.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36133>
2025-08-19 14:28:14 +00:00
Daniel Schürmann
26595577b3 aco/isel: allow for large 8-bit vectors in extract_8_16_bit_sgpr_element()
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36133>
2025-08-19 14:28:14 +00:00
Georg Lehmann
9ed94371f7 amd: stop using custom gl_access_qualifier for access type
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36764>
2025-08-15 08:26:10 +00:00
Georg Lehmann
f17cb6b714 amd: replace ACCESS_TYPE_SMEM with ACCESS_SMEM_AMD
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36764>
2025-08-15 08:26:10 +00:00
Georg Lehmann
6ba462bf26 aco/disable_wqm: optimize local mask creation
Foz-DB Navi48:
Totals from 7861 (9.79% of 80287) affected shaders:
Instrs: 13276809 -> 13183483 (-0.70%)
CodeSize: 71221260 -> 70852500 (-0.52%); split: -0.52%, +0.00%
Latency: 124001421 -> 123976480 (-0.02%); split: -0.02%, +0.00%
InvThroughput: 17820119 -> 17817551 (-0.01%); split: -0.01%, +0.00%
SALU: 1736356 -> 1666673 (-4.01%)

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35970>
2025-08-15 07:03:47 +00:00
Georg Lehmann
fc53cf146c aco: disable wqm for sampled buffer loads when not needed
Foz-DB GFX1201:
Totals from 318 (0.40% of 80287) affected shaders:
Instrs: 313039 -> 314064 (+0.33%); split: -0.00%, +0.33%
CodeSize: 1684104 -> 1688212 (+0.24%); split: -0.00%, +0.24%
VGPRs: 15120 -> 15144 (+0.16%)
Latency: 2515023 -> 2518610 (+0.14%); split: -0.06%, +0.20%
InvThroughput: 447468 -> 447615 (+0.03%); split: -0.02%, +0.05%
VClause: 4866 -> 4914 (+0.99%)
SClause: 6564 -> 6559 (-0.08%); split: -0.09%, +0.02%
Copies: 23577 -> 23673 (+0.41%); split: -0.04%, +0.45%
PreSGPRs: 16019 -> 16029 (+0.06%)
VALU: 172157 -> 172143 (-0.01%)
SALU: 52816 -> 53867 (+1.99%)

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35970>
2025-08-15 07:03:47 +00:00
Georg Lehmann
883b1ca364 aco: disable wqm for tex loads when not needed
By only executing VMEM loads for lanes where the result is used, we can save
bandwidth.

The NIR pass only handles tex for now, but those are most common anyway.
We can extend it handle image/ssbo/ubo/global loads in the future.

Foz-DB GFX1201:
Totals from 32633 (40.66% of 80251) affected shaders:
Instrs: 22635910 -> 23193509 (+2.46%); split: -0.00%, +2.46%
CodeSize: 122880044 -> 125093428 (+1.80%); split: -0.00%, +1.81%
VGPRs: 1481868 -> 1481712 (-0.01%)
SpillSGPRs: 3877 -> 4301 (+10.94%); split: -0.52%, +11.45%
Latency: 171480552 -> 171685219 (+0.12%); split: -0.18%, +0.30%
InvThroughput: 24364743 -> 24373441 (+0.04%); split: -0.08%, +0.12%
VClause: 388318 -> 388557 (+0.06%); split: -0.06%, +0.13%
SClause: 774781 -> 776492 (+0.22%); split: -0.29%, +0.51%
Copies: 1416586 -> 1541199 (+8.80%); split: -0.16%, +8.96%
Branches: 419591 -> 419673 (+0.02%); split: -0.02%, +0.04%
PreSGPRs: 1330303 -> 1416540 (+6.48%)
PreVGPRs: 964864 -> 964863 (-0.00%)
VALU: 12919601 -> 12920254 (+0.01%); split: -0.01%, +0.01%
SALU: 2685402 -> 3224147 (+20.06%); split: -0.00%, +20.07%

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35970>
2025-08-15 07:03:46 +00:00
Georg Lehmann
7159fd21f8 aco: don't restrict vmem load scheduling by inserting p_end_wqm early
Foz-DB GFX1201:
Totals from 7 (0.01% of 80251) affected shaders:
Instrs: 703 -> 729 (+3.70%)
CodeSize: 4032 -> 4136 (+2.58%)
Latency: 5840 -> 4715 (-19.26%)
InvThroughput: 441 -> 405 (-8.16%)
Copies: 61 -> 67 (+9.84%)
PreSGPRs: 216 -> 218 (+0.93%)
SALU: 93 -> 113 (+21.51%)

When reordered after the next commit:
Foz-DB GFX1201:
Totals from 1609 (2.00% of 80251) affected shaders:
MaxWaves: 47984 -> 47986 (+0.00%)
Instrs: 1326847 -> 1332797 (+0.45%); split: -0.05%, +0.50%
CodeSize: 7248720 -> 7275364 (+0.37%); split: -0.04%, +0.41%
VGPRs: 74968 -> 75148 (+0.24%); split: -0.06%, +0.30%
SpillSGPRs: 182 -> 184 (+1.10%)
Latency: 10370602 -> 10172524 (-1.91%); split: -2.06%, +0.15%
InvThroughput: 1446508 -> 1445920 (-0.04%); split: -0.11%, +0.06%
VClause: 23567 -> 23559 (-0.03%); split: -0.35%, +0.32%
SClause: 43143 -> 43203 (+0.14%); split: -0.52%, +0.66%
Copies: 80948 -> 81622 (+0.83%); split: -0.32%, +1.16%
Branches: 21599 -> 21727 (+0.59%)
PreSGPRs: 69963 -> 70732 (+1.10%)
VALU: 778968 -> 779024 (+0.01%); split: -0.02%, +0.03%
SALU: 159797 -> 165329 (+3.46%); split: -0.01%, +3.47%

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35970>
2025-08-15 07:03:46 +00:00
Georg Lehmann
c1b29174b4 aco: use a smaller wqm section for strict_wqm sampling
It's only important that the coordinate is created in WQM,
the sample itself doesn't care.

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35970>
2025-08-15 07:03:46 +00:00
Georg Lehmann
de4b345949 aco/insert_exec: remove per instruction wqm/exact exec handling
No Foz-DB changes on GFX1201.

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35970>
2025-08-15 07:03:46 +00:00
Georg Lehmann
11cee3d634 aco: use new disable_wqm for p_dual_src_export_gfx11
No Foz-DB changes on GFX1201.

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35970>
2025-08-15 07:03:46 +00:00
Georg Lehmann
8e53ba9a0a aco: use new disable_wqm for exp
No Foz-DB changes on GFX1201.

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35970>
2025-08-15 07:03:46 +00:00
Georg Lehmann
bd6647e21e aco/builder: support new disable_wqm
Create the additional undef operands that are filled by insert_exec.

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35970>
2025-08-15 07:03:46 +00:00
Georg Lehmann
0e66f2b2cc aco: use new disable_wqm for mimg
Foz-DB GFX1201:
Totals from 88 (0.11% of 80251) affected shaders:
Instrs: 81954 -> 82218 (+0.32%); split: -0.02%, +0.34%
CodeSize: 451824 -> 452880 (+0.23%); split: -0.02%, +0.25%
Latency: 308818 -> 308746 (-0.02%); split: -0.05%, +0.02%
VClause: 1324 -> 1318 (-0.45%)
Copies: 2795 -> 2784 (-0.39%)
PreSGPRs: 4029 -> 4035 (+0.15%)
SALU: 6563 -> 6809 (+3.75%); split: -0.15%, +3.90%

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35970>
2025-08-15 07:03:46 +00:00
Georg Lehmann
922f559c3c aco: use new disable_wqm for flatlike
No Foz-DB changes on GFX1201.

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35970>
2025-08-15 07:03:46 +00:00
Georg Lehmann
a4c537c5b3 aco: use new disable_wqm for mubuf/mtbuf
Foz-DB GFX1201:
Totals from 66 (0.08% of 80251) affected shaders:
Instrs: 45373 -> 45663 (+0.64%); split: -0.01%, +0.65%
CodeSize: 251708 -> 252900 (+0.47%); split: -0.00%, +0.48%
Latency: 278977 -> 278652 (-0.12%); split: -0.14%, +0.02%
InvThroughput: 38259 -> 38245 (-0.04%); split: -0.05%, +0.02%
VClause: 982 -> 962 (-2.04%)
Copies: 2882 -> 2808 (-2.57%)
PreSGPRs: 2564 -> 2599 (+1.37%)
SALU: 4748 -> 5010 (+5.52%)

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35970>
2025-08-15 07:03:46 +00:00
Georg Lehmann
63af48ae2e aco/insert_exec: new way to handle instructions that need wqm disabled
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35970>
2025-08-15 07:03:46 +00:00
Georg Lehmann
ca25553b92 aco: add a post-RA pass to disable wqm
By disabling WQM post-RA, we don't have RA/spilling mov issues with image_sample
operands that need to be computed in WQM.

We also don't restrict scheduling by inserting exec writes.
The only downside is more scalar ALU usage, but the SALU is almost always underutilized.

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35970>
2025-08-15 07:03:46 +00:00
Georg Lehmann
34b154866f aco/insert_exec: remove p_jump_to_epilog from needs exact
p_end_wqm will always be emitted before it by isel.

No Foz-DB changes on GFX1201.

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35970>
2025-08-15 07:03:46 +00:00
Yonggang Luo
fc1b26f4dc aco: Fixes warning note: ambiguity is between a regular call to this operator and a call with the argument order reversed
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
../../src/amd/compiler/aco_util.h:300:9: note: ambiguity is between a regular call to this operator and a call with the argument order reversed
  300 |    bool operator==(const monotonic_buffer_resource& other) { return buffer == other.buffer; }

Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36722>
2025-08-13 19:49:37 +00:00
Rhys Perry
08f088479a aco/ra: set late-kill for operands of temporary p_create_vector
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13543
Fixes: c279dd6e61 ("aco: Support vector-aligned ops fixed to defs")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36469>
2025-08-06 09:44:01 +00:00
Daniel Schürmann
d3743dd7ba aco/scheduler: improve scheduling heuristic
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
The heuristic we are currently using still stems from the GCN era
with the only adjustments being made for RDNA was to double (or triple)
the wave count.

This rewrite aims to detangle some concepts and provide more consistent results.

 - wave_factor: The purpose of this value is to reflect that RDNA SIMDs can
                accomodate twice as many waves as GCN SIMDs.
 - reg_file_multiple: This value accounts for the larger register file of wave32
                      and some RDNA3 families.
 - wave_minimum: Below this value, we don't sacrifice any waves. It corresponds
                 to a register demand of 64 VGPRs in wave64.
 - occupancy_factor: Depending on target_waves and wave_factor, this controls
                     the scheduling window sizes and number of moves.

The main differences from the previous heuristic is a lower wave minimum and
a slightly less aggressive reduction of waves.
It also increases SMEM_MAX_MOVES in order to mitigate some of the changes
from targeting less waves.

Totals from 62777 (78.63% of 79839) affected shaders: (Navi48)
MaxWaves: 1880983 -> 1848028 (-1.75%); split: +0.01%, -1.76%
Instrs: 40904711 -> 40800797 (-0.25%); split: -0.39%, +0.14%
CodeSize: 217132208 -> 216748832 (-0.18%); split: -0.29%, +0.12%
VGPRs: 3019304 -> 3099596 (+2.66%); split: -0.11%, +2.77%
Latency: 268857129 -> 265951122 (-1.08%); split: -1.33%, +0.25%
InvThroughput: 40960938 -> 41044533 (+0.20%); split: -0.18%, +0.39%
VClause: 794000 -> 782913 (-1.40%); split: -2.24%, +0.84%
SClause: 1192476 -> 1150831 (-3.49%); split: -3.94%, +0.45%
Copies: 2720470 -> 2700148 (-0.75%); split: -1.84%, +1.09%
Branches: 785926 -> 785951 (+0.00%); split: -0.01%, +0.01%
VALU: 22918411 -> 22890189 (-0.12%); split: -0.19%, +0.06%
SALU: 5281201 -> 5289486 (+0.16%); split: -0.21%, +0.36%
VOPD: 8790 -> 8685 (-1.19%); split: +1.08%, -2.28%

Totals from 62081 (77.77% of 79825) affected shaders: (Navi31)
MaxWaves: 1848555 -> 1812347 (-1.96%); split: +0.01%, -1.97%
Instrs: 39794460 -> 39704180 (-0.23%); split: -0.39%, +0.16%
CodeSize: 208987052 -> 208621524 (-0.17%); split: -0.31%, +0.13%
VGPRs: 3046284 -> 3135156 (+2.92%); split: -0.11%, +3.03%
Latency: 268863465 -> 265218186 (-1.36%); split: -1.59%, +0.23%
InvThroughput: 41101515 -> 41167075 (+0.16%); split: -0.22%, +0.38%
VClause: 795316 -> 774899 (-2.57%); split: -3.17%, +0.61%
SClause: 1177294 -> 1135451 (-3.55%); split: -4.06%, +0.51%
Copies: 2743254 -> 2725127 (-0.66%); split: -1.90%, +1.24%
Branches: 801395 -> 801428 (+0.00%); split: -0.01%, +0.02%
VALU: 23898938 -> 23871294 (-0.12%); split: -0.20%, +0.08%
SALU: 3908807 -> 3919130 (+0.26%); split: -0.23%, +0.50%
VOPD: 8529 -> 8500 (-0.34%); split: +1.29%, -1.63%

Totals from 44996 (71.01% of 63370) affected shaders: (Vega10)

MaxWaves: 307074 -> 304808 (-0.74%); split: +0.63%, -1.37%
Instrs: 22743534 -> 22716240 (-0.12%); split: -0.22%, +0.10%
CodeSize: 117284856 -> 117173212 (-0.10%); split: -0.19%, +0.09%
SGPRs: 3249008 -> 3330480 (+2.51%); split: -0.36%, +2.87%
VGPRs: 1901400 -> 1943880 (+2.23%); split: -0.60%, +2.83%
Latency: 224839126 -> 222878477 (-0.87%); split: -1.19%, +0.31%
InvThroughput: 114389570 -> 114316559 (-0.06%); split: -0.17%, +0.11%
VClause: 482012 -> 473304 (-1.81%); split: -2.86%, +1.05%
SClause: 757799 -> 717092 (-5.37%); split: -5.64%, +0.27%
Copies: 2182735 -> 2183598 (+0.04%); split: -1.17%, +1.21%
Branches: 396026 -> 395996 (-0.01%); split: -0.03%, +0.02%
VALU: 16740283 -> 16728098 (-0.07%); split: -0.14%, +0.07%
SALU: 2133575 -> 2145863 (+0.58%); split: -0.29%, +0.86%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30720>
2025-08-06 09:16:33 +00:00