Daniel Schürmann
37299a8d1a
aco/scheduler: Stop downwards scheduling after encountering the first clause
...
Totals from 9899 (12.40% of 79839) affected shaders: (Navi48)
MaxWaves: 276355 -> 276317 (-0.01%); split: +0.01%, -0.02%
Instrs: 8781768 -> 8766504 (-0.17%); split: -0.25%, +0.07%
CodeSize: 46297556 -> 46236104 (-0.13%); split: -0.19%, +0.06%
VGPRs: 574680 -> 574800 (+0.02%); split: -0.00%, +0.03%
Latency: 54261324 -> 54357916 (+0.18%); split: -0.14%, +0.32%
InvThroughput: 9122700 -> 9121115 (-0.02%); split: -0.07%, +0.05%
VClause: 222062 -> 218499 (-1.60%); split: -2.33%, +0.73%
SClause: 167138 -> 163233 (-2.34%); split: -2.43%, +0.09%
Copies: 602395 -> 598560 (-0.64%); split: -1.21%, +0.57%
Branches: 161939 -> 161932 (-0.00%); split: -0.01%, +0.00%
VALU: 5063999 -> 5060199 (-0.08%); split: -0.14%, +0.07%
SALU: 988254 -> 988285 (+0.00%); split: -0.02%, +0.02%
VOPD: 2478 -> 2443 (-1.41%); split: +0.40%, -1.82%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36599 >
2025-08-19 16:59:09 +00:00
Daniel Schürmann
fb6b95517e
aco/scheduler: check dependencies of entire clause upfront
...
and bail if any instruction of the clause can't be moved.
Totals from 4310 (5.40% of 79839) affected shaders:
MaxWaves: 115826 -> 115834 (+0.01%)
Instrs: 6256436 -> 6257599 (+0.02%); split: -0.05%, +0.07%
CodeSize: 32816488 -> 32820768 (+0.01%); split: -0.04%, +0.05%
VGPRs: 260184 -> 260172 (-0.00%)
Latency: 41207213 -> 41052150 (-0.38%); split: -0.45%, +0.07%
InvThroughput: 6822608 -> 6815208 (-0.11%); split: -0.14%, +0.03%
VClause: 148412 -> 147133 (-0.86%); split: -1.03%, +0.17%
SClause: 120854 -> 120856 (+0.00%); split: -0.01%, +0.01%
Copies: 425910 -> 427276 (+0.32%); split: -0.25%, +0.57%
VALU: 3572293 -> 3573647 (+0.04%); split: -0.03%, +0.07%
VOPD: 2803 -> 2816 (+0.46%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36599 >
2025-08-19 16:59:08 +00:00
Daniel Schürmann
7e63251d1f
aco/isel: refactor store_shared() by directly matching NIR intrinsics to ACO opcodes
...
Totals from 1435 (1.80% of 79839) affected shaders: (Navi48)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36133 >
2025-08-19 14:28:15 +00:00
Daniel Schürmann
1fde289539
aco/isel: refactor load_shared() by directly matching NIR intrinsics to ACO opcodes
...
Totals from 3 (0.00% of 79839) affected shaders: (Navi48)
Instrs: 700 -> 698 (-0.29%)
CodeSize: 3860 -> 3852 (-0.21%)
Latency: 2351 -> 2349 (-0.09%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36133 >
2025-08-19 14:28:15 +00:00
Daniel Schürmann
4632ee4c37
aco/isel: rename emit_readfirstlane() -> emit_vector_as_uniform()
...
Also allow to use p_as_uniform and improve vector splitting.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36133 >
2025-08-19 14:28:14 +00:00
Daniel Schürmann
26595577b3
aco/isel: allow for large 8-bit vectors in extract_8_16_bit_sgpr_element()
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36133 >
2025-08-19 14:28:14 +00:00
Georg Lehmann
9ed94371f7
amd: stop using custom gl_access_qualifier for access type
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36764 >
2025-08-15 08:26:10 +00:00
Georg Lehmann
f17cb6b714
amd: replace ACCESS_TYPE_SMEM with ACCESS_SMEM_AMD
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36764 >
2025-08-15 08:26:10 +00:00
Georg Lehmann
6ba462bf26
aco/disable_wqm: optimize local mask creation
...
Foz-DB Navi48:
Totals from 7861 (9.79% of 80287) affected shaders:
Instrs: 13276809 -> 13183483 (-0.70%)
CodeSize: 71221260 -> 70852500 (-0.52%); split: -0.52%, +0.00%
Latency: 124001421 -> 123976480 (-0.02%); split: -0.02%, +0.00%
InvThroughput: 17820119 -> 17817551 (-0.01%); split: -0.01%, +0.00%
SALU: 1736356 -> 1666673 (-4.01%)
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35970 >
2025-08-15 07:03:47 +00:00
Georg Lehmann
fc53cf146c
aco: disable wqm for sampled buffer loads when not needed
...
Foz-DB GFX1201:
Totals from 318 (0.40% of 80287) affected shaders:
Instrs: 313039 -> 314064 (+0.33%); split: -0.00%, +0.33%
CodeSize: 1684104 -> 1688212 (+0.24%); split: -0.00%, +0.24%
VGPRs: 15120 -> 15144 (+0.16%)
Latency: 2515023 -> 2518610 (+0.14%); split: -0.06%, +0.20%
InvThroughput: 447468 -> 447615 (+0.03%); split: -0.02%, +0.05%
VClause: 4866 -> 4914 (+0.99%)
SClause: 6564 -> 6559 (-0.08%); split: -0.09%, +0.02%
Copies: 23577 -> 23673 (+0.41%); split: -0.04%, +0.45%
PreSGPRs: 16019 -> 16029 (+0.06%)
VALU: 172157 -> 172143 (-0.01%)
SALU: 52816 -> 53867 (+1.99%)
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35970 >
2025-08-15 07:03:47 +00:00
Georg Lehmann
883b1ca364
aco: disable wqm for tex loads when not needed
...
By only executing VMEM loads for lanes where the result is used, we can save
bandwidth.
The NIR pass only handles tex for now, but those are most common anyway.
We can extend it handle image/ssbo/ubo/global loads in the future.
Foz-DB GFX1201:
Totals from 32633 (40.66% of 80251) affected shaders:
Instrs: 22635910 -> 23193509 (+2.46%); split: -0.00%, +2.46%
CodeSize: 122880044 -> 125093428 (+1.80%); split: -0.00%, +1.81%
VGPRs: 1481868 -> 1481712 (-0.01%)
SpillSGPRs: 3877 -> 4301 (+10.94%); split: -0.52%, +11.45%
Latency: 171480552 -> 171685219 (+0.12%); split: -0.18%, +0.30%
InvThroughput: 24364743 -> 24373441 (+0.04%); split: -0.08%, +0.12%
VClause: 388318 -> 388557 (+0.06%); split: -0.06%, +0.13%
SClause: 774781 -> 776492 (+0.22%); split: -0.29%, +0.51%
Copies: 1416586 -> 1541199 (+8.80%); split: -0.16%, +8.96%
Branches: 419591 -> 419673 (+0.02%); split: -0.02%, +0.04%
PreSGPRs: 1330303 -> 1416540 (+6.48%)
PreVGPRs: 964864 -> 964863 (-0.00%)
VALU: 12919601 -> 12920254 (+0.01%); split: -0.01%, +0.01%
SALU: 2685402 -> 3224147 (+20.06%); split: -0.00%, +20.07%
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35970 >
2025-08-15 07:03:46 +00:00
Georg Lehmann
7159fd21f8
aco: don't restrict vmem load scheduling by inserting p_end_wqm early
...
Foz-DB GFX1201:
Totals from 7 (0.01% of 80251) affected shaders:
Instrs: 703 -> 729 (+3.70%)
CodeSize: 4032 -> 4136 (+2.58%)
Latency: 5840 -> 4715 (-19.26%)
InvThroughput: 441 -> 405 (-8.16%)
Copies: 61 -> 67 (+9.84%)
PreSGPRs: 216 -> 218 (+0.93%)
SALU: 93 -> 113 (+21.51%)
When reordered after the next commit:
Foz-DB GFX1201:
Totals from 1609 (2.00% of 80251) affected shaders:
MaxWaves: 47984 -> 47986 (+0.00%)
Instrs: 1326847 -> 1332797 (+0.45%); split: -0.05%, +0.50%
CodeSize: 7248720 -> 7275364 (+0.37%); split: -0.04%, +0.41%
VGPRs: 74968 -> 75148 (+0.24%); split: -0.06%, +0.30%
SpillSGPRs: 182 -> 184 (+1.10%)
Latency: 10370602 -> 10172524 (-1.91%); split: -2.06%, +0.15%
InvThroughput: 1446508 -> 1445920 (-0.04%); split: -0.11%, +0.06%
VClause: 23567 -> 23559 (-0.03%); split: -0.35%, +0.32%
SClause: 43143 -> 43203 (+0.14%); split: -0.52%, +0.66%
Copies: 80948 -> 81622 (+0.83%); split: -0.32%, +1.16%
Branches: 21599 -> 21727 (+0.59%)
PreSGPRs: 69963 -> 70732 (+1.10%)
VALU: 778968 -> 779024 (+0.01%); split: -0.02%, +0.03%
SALU: 159797 -> 165329 (+3.46%); split: -0.01%, +3.47%
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35970 >
2025-08-15 07:03:46 +00:00
Georg Lehmann
c1b29174b4
aco: use a smaller wqm section for strict_wqm sampling
...
It's only important that the coordinate is created in WQM,
the sample itself doesn't care.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35970 >
2025-08-15 07:03:46 +00:00
Georg Lehmann
de4b345949
aco/insert_exec: remove per instruction wqm/exact exec handling
...
No Foz-DB changes on GFX1201.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35970 >
2025-08-15 07:03:46 +00:00
Georg Lehmann
11cee3d634
aco: use new disable_wqm for p_dual_src_export_gfx11
...
No Foz-DB changes on GFX1201.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35970 >
2025-08-15 07:03:46 +00:00
Georg Lehmann
8e53ba9a0a
aco: use new disable_wqm for exp
...
No Foz-DB changes on GFX1201.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35970 >
2025-08-15 07:03:46 +00:00
Georg Lehmann
bd6647e21e
aco/builder: support new disable_wqm
...
Create the additional undef operands that are filled by insert_exec.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35970 >
2025-08-15 07:03:46 +00:00
Georg Lehmann
0e66f2b2cc
aco: use new disable_wqm for mimg
...
Foz-DB GFX1201:
Totals from 88 (0.11% of 80251) affected shaders:
Instrs: 81954 -> 82218 (+0.32%); split: -0.02%, +0.34%
CodeSize: 451824 -> 452880 (+0.23%); split: -0.02%, +0.25%
Latency: 308818 -> 308746 (-0.02%); split: -0.05%, +0.02%
VClause: 1324 -> 1318 (-0.45%)
Copies: 2795 -> 2784 (-0.39%)
PreSGPRs: 4029 -> 4035 (+0.15%)
SALU: 6563 -> 6809 (+3.75%); split: -0.15%, +3.90%
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35970 >
2025-08-15 07:03:46 +00:00
Georg Lehmann
922f559c3c
aco: use new disable_wqm for flatlike
...
No Foz-DB changes on GFX1201.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35970 >
2025-08-15 07:03:46 +00:00
Georg Lehmann
a4c537c5b3
aco: use new disable_wqm for mubuf/mtbuf
...
Foz-DB GFX1201:
Totals from 66 (0.08% of 80251) affected shaders:
Instrs: 45373 -> 45663 (+0.64%); split: -0.01%, +0.65%
CodeSize: 251708 -> 252900 (+0.47%); split: -0.00%, +0.48%
Latency: 278977 -> 278652 (-0.12%); split: -0.14%, +0.02%
InvThroughput: 38259 -> 38245 (-0.04%); split: -0.05%, +0.02%
VClause: 982 -> 962 (-2.04%)
Copies: 2882 -> 2808 (-2.57%)
PreSGPRs: 2564 -> 2599 (+1.37%)
SALU: 4748 -> 5010 (+5.52%)
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35970 >
2025-08-15 07:03:46 +00:00
Georg Lehmann
63af48ae2e
aco/insert_exec: new way to handle instructions that need wqm disabled
...
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35970 >
2025-08-15 07:03:46 +00:00
Georg Lehmann
ca25553b92
aco: add a post-RA pass to disable wqm
...
By disabling WQM post-RA, we don't have RA/spilling mov issues with image_sample
operands that need to be computed in WQM.
We also don't restrict scheduling by inserting exec writes.
The only downside is more scalar ALU usage, but the SALU is almost always underutilized.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35970 >
2025-08-15 07:03:46 +00:00
Georg Lehmann
34b154866f
aco/insert_exec: remove p_jump_to_epilog from needs exact
...
p_end_wqm will always be emitted before it by isel.
No Foz-DB changes on GFX1201.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35970 >
2025-08-15 07:03:46 +00:00
Yonggang Luo
fc1b26f4dc
aco: Fixes warning note: ambiguity is between a regular call to this operator and a call with the argument order reversed
...
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
../../src/amd/compiler/aco_util.h:300:9: note: ambiguity is between a regular call to this operator and a call with the argument order reversed
300 | bool operator==(const monotonic_buffer_resource& other) { return buffer == other.buffer; }
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36722 >
2025-08-13 19:49:37 +00:00
Rhys Perry
08f088479a
aco/ra: set late-kill for operands of temporary p_create_vector
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13543
Fixes: c279dd6e61 ("aco: Support vector-aligned ops fixed to defs")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36469 >
2025-08-06 09:44:01 +00:00
Daniel Schürmann
d3743dd7ba
aco/scheduler: improve scheduling heuristic
...
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
The heuristic we are currently using still stems from the GCN era
with the only adjustments being made for RDNA was to double (or triple)
the wave count.
This rewrite aims to detangle some concepts and provide more consistent results.
- wave_factor: The purpose of this value is to reflect that RDNA SIMDs can
accomodate twice as many waves as GCN SIMDs.
- reg_file_multiple: This value accounts for the larger register file of wave32
and some RDNA3 families.
- wave_minimum: Below this value, we don't sacrifice any waves. It corresponds
to a register demand of 64 VGPRs in wave64.
- occupancy_factor: Depending on target_waves and wave_factor, this controls
the scheduling window sizes and number of moves.
The main differences from the previous heuristic is a lower wave minimum and
a slightly less aggressive reduction of waves.
It also increases SMEM_MAX_MOVES in order to mitigate some of the changes
from targeting less waves.
Totals from 62777 (78.63% of 79839) affected shaders: (Navi48)
MaxWaves: 1880983 -> 1848028 (-1.75%); split: +0.01%, -1.76%
Instrs: 40904711 -> 40800797 (-0.25%); split: -0.39%, +0.14%
CodeSize: 217132208 -> 216748832 (-0.18%); split: -0.29%, +0.12%
VGPRs: 3019304 -> 3099596 (+2.66%); split: -0.11%, +2.77%
Latency: 268857129 -> 265951122 (-1.08%); split: -1.33%, +0.25%
InvThroughput: 40960938 -> 41044533 (+0.20%); split: -0.18%, +0.39%
VClause: 794000 -> 782913 (-1.40%); split: -2.24%, +0.84%
SClause: 1192476 -> 1150831 (-3.49%); split: -3.94%, +0.45%
Copies: 2720470 -> 2700148 (-0.75%); split: -1.84%, +1.09%
Branches: 785926 -> 785951 (+0.00%); split: -0.01%, +0.01%
VALU: 22918411 -> 22890189 (-0.12%); split: -0.19%, +0.06%
SALU: 5281201 -> 5289486 (+0.16%); split: -0.21%, +0.36%
VOPD: 8790 -> 8685 (-1.19%); split: +1.08%, -2.28%
Totals from 62081 (77.77% of 79825) affected shaders: (Navi31)
MaxWaves: 1848555 -> 1812347 (-1.96%); split: +0.01%, -1.97%
Instrs: 39794460 -> 39704180 (-0.23%); split: -0.39%, +0.16%
CodeSize: 208987052 -> 208621524 (-0.17%); split: -0.31%, +0.13%
VGPRs: 3046284 -> 3135156 (+2.92%); split: -0.11%, +3.03%
Latency: 268863465 -> 265218186 (-1.36%); split: -1.59%, +0.23%
InvThroughput: 41101515 -> 41167075 (+0.16%); split: -0.22%, +0.38%
VClause: 795316 -> 774899 (-2.57%); split: -3.17%, +0.61%
SClause: 1177294 -> 1135451 (-3.55%); split: -4.06%, +0.51%
Copies: 2743254 -> 2725127 (-0.66%); split: -1.90%, +1.24%
Branches: 801395 -> 801428 (+0.00%); split: -0.01%, +0.02%
VALU: 23898938 -> 23871294 (-0.12%); split: -0.20%, +0.08%
SALU: 3908807 -> 3919130 (+0.26%); split: -0.23%, +0.50%
VOPD: 8529 -> 8500 (-0.34%); split: +1.29%, -1.63%
Totals from 44996 (71.01% of 63370) affected shaders: (Vega10)
MaxWaves: 307074 -> 304808 (-0.74%); split: +0.63%, -1.37%
Instrs: 22743534 -> 22716240 (-0.12%); split: -0.22%, +0.10%
CodeSize: 117284856 -> 117173212 (-0.10%); split: -0.19%, +0.09%
SGPRs: 3249008 -> 3330480 (+2.51%); split: -0.36%, +2.87%
VGPRs: 1901400 -> 1943880 (+2.23%); split: -0.60%, +2.83%
Latency: 224839126 -> 222878477 (-0.87%); split: -1.19%, +0.31%
InvThroughput: 114389570 -> 114316559 (-0.06%); split: -0.17%, +0.11%
VClause: 482012 -> 473304 (-1.81%); split: -2.86%, +1.05%
SClause: 757799 -> 717092 (-5.37%); split: -5.64%, +0.27%
Copies: 2182735 -> 2183598 (+0.04%); split: -1.17%, +1.21%
Branches: 396026 -> 395996 (-0.01%); split: -0.03%, +0.02%
VALU: 16740283 -> 16728098 (-0.07%); split: -0.14%, +0.07%
SALU: 2133575 -> 2145863 (+0.58%); split: -0.29%, +0.86%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30720 >
2025-08-06 09:16:33 +00:00
Qiang Yu
196569b1a4
all: rename gl_shader_stage to mesa_shader_stage
...
It's not only for GL, change to a generic name.
Use command:
find . -type f -not -path '*/.git/*' -exec sed -i 's/\bgl_shader_stage\b/mesa_shader_stage/g' {} +
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Yonggang Luo <luoyonggang@gmail.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36569 >
2025-08-06 10:28:40 +08:00
Rhys Perry
76c96bf558
aco: fix possible scratch offset overflow
...
We split vector load/store, so consider that we might add to the constant
offset.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36406 >
2025-08-04 15:06:44 +00:00
Rhys Perry
44ab4ad732
aco: align scratch size after isel
...
Make it safe for VGPR spilling if it's not a multiple of 4.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36406 >
2025-08-04 15:06:43 +00:00
Rhys Perry
ab10604924
aco/gfx12: fix printing of temporal hints
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36406 >
2025-08-04 15:06:41 +00:00
Rhys Perry
cec845079e
ac/nir/lower_ps: remove barrier for end_invocation_interlock
...
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
SPIR-V->NIR now inserts this barrier itself.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36513 >
2025-08-04 09:30:06 +00:00
Daniel Schürmann
4ca3cc5a1a
aco/ra: propagate precolor affinities through parallelcopies and tied definitions
...
Totals from 214 (0.27% of 79839) affected shaders: (Navi48)
Instrs: 65339 -> 65311 (-0.04%); split: -0.05%, +0.00%
CodeSize: 352616 -> 350952 (-0.47%); split: -0.55%, +0.07%
VGPRs: 9984 -> 9960 (-0.24%)
Latency: 207556 -> 207508 (-0.02%); split: -0.03%, +0.01%
InvThroughput: 40422 -> 40397 (-0.06%)
Copies: 3180 -> 3155 (-0.79%)
VALU: 38347 -> 38322 (-0.07%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36345 >
2025-08-01 17:15:54 +00:00
Daniel Schürmann
a667d9a68d
aco/ra: propagate precolor affinities through phis
...
Totals from 917 (1.15% of 79839) affected shaders: (Navi48)
Instrs: 3217861 -> 3216947 (-0.03%); split: -0.04%, +0.01%
CodeSize: 17427204 -> 17432264 (+0.03%); split: -0.06%, +0.09%
VGPRs: 65328 -> 65316 (-0.02%)
Latency: 35336268 -> 35335528 (-0.00%); split: -0.01%, +0.01%
InvThroughput: 7305032 -> 7302187 (-0.04%); split: -0.04%, +0.00%
SClause: 120537 -> 120553 (+0.01%); split: -0.01%, +0.02%
Copies: 307257 -> 306852 (-0.13%); split: -0.21%, +0.08%
Branches: 115744 -> 115743 (-0.00%)
VALU: 1572522 -> 1572183 (-0.02%); split: -0.02%, +0.00%
SALU: 574229 -> 574155 (-0.01%); split: -0.05%, +0.04%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36345 >
2025-08-01 17:15:54 +00:00
Daniel Schürmann
2ddd8ef0a3
aco/ra: don't optimize encodings on precolor affinity mismatch
...
Totals from 238 (0.30% of 79839) affected shaders: (Navi48)
Instrs: 137836 -> 137176 (-0.48%); split: -0.50%, +0.02%
CodeSize: 728616 -> 728668 (+0.01%); split: -0.06%, +0.07%
Latency: 1503248 -> 1500202 (-0.20%); split: -0.56%, +0.36%
InvThroughput: 297725 -> 296715 (-0.34%); split: -0.70%, +0.36%
Copies: 9390 -> 8825 (-6.02%); split: -6.33%, +0.31%
VALU: 89861 -> 89296 (-0.63%); split: -0.66%, +0.03%
SALU: 13166 -> 13167 (+0.01%); split: -0.05%, +0.06%
Suggested-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36345 >
2025-08-01 17:15:54 +00:00
Daniel Schürmann
93606a19c6
aco/ra: collect register affinities for all precolored operands.
...
Totals from 1280 (1.60% of 79839) affected shaders: (Navi48)
Instrs: 817363 -> 812639 (-0.58%); split: -0.58%, +0.00%
CodeSize: 4262644 -> 4243540 (-0.45%); split: -0.45%, +0.00%
VGPRs: 61692 -> 61668 (-0.04%)
Latency: 4354318 -> 4347818 (-0.15%); split: -0.15%, +0.00%
InvThroughput: 711914 -> 707698 (-0.59%); split: -0.59%, +0.00%
VClause: 14685 -> 14677 (-0.05%); split: -0.09%, +0.03%
SClause: 25623 -> 25621 (-0.01%)
Copies: 50663 -> 46242 (-8.73%); split: -8.73%, +0.00%
VALU: 427744 -> 423323 (-1.03%); split: -1.03%, +0.00%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36345 >
2025-08-01 17:15:54 +00:00
Daniel Schürmann
e32eec52f0
aco/ra: generalize register affinities
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36345 >
2025-08-01 17:15:54 +00:00
Daniel Schürmann
caa2c22d8b
aco/tests: Fix p_startpgm definitions to registers
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36345 >
2025-08-01 17:15:54 +00:00
Alyssa Rosenzweig
cc6e3b84cb
treewide: use nir_def_as_*
...
Via Coccinelle patch:
@@
expression definition;
@@
-nir_instr_as_alu(definition->parent_instr)
+nir_def_as_alu(definition)
@@
expression definition;
@@
-nir_instr_as_intrinsic(definition->parent_instr)
+nir_def_as_intrinsic(definition)
@@
expression definition;
@@
-nir_instr_as_phi(definition->parent_instr)
+nir_def_as_phi(definition)
@@
expression definition;
@@
-nir_instr_as_load_const(definition->parent_instr)
+nir_def_as_load_const(definition)
@@
expression definition;
@@
-nir_instr_as_deref(definition->parent_instr)
+nir_def_as_deref(definition)
@@
expression definition;
@@
-nir_instr_as_tex(definition->parent_instr)
+nir_def_as_tex(definition)
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Acked-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36489 >
2025-08-01 15:34:24 +00:00
Antonio Ospite
ddf2aa3a4d
build: avoid redefining unreachable() which is standard in C23
...
In the C23 standard unreachable() is now a predefined function-like
macro in <stddef.h>
See https://android.googlesource.com/platform/bionic/+/HEAD/docs/c23.md#is-now-a-predefined-function_like-macro-in
And this causes build errors when building for C23:
-----------------------------------------------------------------------
In file included from ../src/util/log.h:30,
from ../src/util/log.c:30:
../src/util/macros.h:123:9: warning: "unreachable" redefined
123 | #define unreachable(str) \
| ^~~~~~~~~~~
In file included from ../src/util/macros.h:31:
/usr/lib/gcc/x86_64-linux-gnu/14/include/stddef.h:456:9: note: this is the location of the previous definition
456 | #define unreachable() (__builtin_unreachable ())
| ^~~~~~~~~~~
-----------------------------------------------------------------------
So don't redefine it with the same name, but use the name UNREACHABLE()
to also signify it's a macro.
Using a different name also makes sense because the behavior of the
macro was extending the one of __builtin_unreachable() anyway, and it
also had a different signature, accepting one argument, compared to the
standard unreachable() with no arguments.
This change improves the chances of building mesa with the C23 standard,
which for instance is the default in recent AOSP versions.
All the instances of the macro, including the definition, were updated
with the following command line:
git grep -l '[^_]unreachable(' -- "src/**" | sort | uniq | \
while read file; \
do \
sed -e 's/\([^_]\)unreachable(/\1UNREACHABLE(/g' -i "$file"; \
done && \
sed -e 's/#undef unreachable/#undef UNREACHABLE/g' -i src/intel/isl/isl_aux_info.c
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36437 >
2025-07-31 17:49:42 +00:00
Georg Lehmann
a6a6c2f691
aco/ra: convert bitwise instruction to gfx11+ 16bit on demand
...
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
The 32bit versions are smaller, allow more optimizations and VOPD,
so only use the 16bit opcodes if nessecary.
Foz-DB Navi31:
Totals from 84 (0.10% of 80237) affected shaders:
Instrs: 176673 -> 176347 (-0.18%); split: -0.20%, +0.01%
CodeSize: 970148 -> 969716 (-0.04%); split: -0.08%, +0.03%
VGPRs: 5876 -> 5864 (-0.20%)
Latency: 2805974 -> 2805674 (-0.01%); split: -0.02%, +0.01%
InvThroughput: 769007 -> 768738 (-0.03%); split: -0.04%, +0.01%
VClause: 2593 -> 2597 (+0.15%)
Copies: 23749 -> 23487 (-1.10%); split: -1.11%, +0.00%
VALU: 107124 -> 106862 (-0.24%); split: -0.25%, +0.00%
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35919 >
2025-07-31 12:07:07 +00:00
Georg Lehmann
404e1f13e8
aco/print_asm: use real true16 instr on gfx11+
...
Fake16 doesn't print opsel on v_cndmask_b16, so it looks really broken.
Restrict to LLVM20+ because older versions have incomplete tru16 support.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35919 >
2025-07-31 12:07:07 +00:00
Georg Lehmann
b12db991eb
aco/gfx10: optimize subgroupRotate(x, 32) and subgroupShuffleXor(x, 32)
...
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
We don't have v_permlane64_b32 yet, but we can still optimize it using
shared vgprs. Using the DPP16 row mask, we can even avoid writing exec.
With v0 input/output and v24/v25 as shared vgprs, this results in:
v_mov_b32_dpp v24, v0 quad_perm:[0,1,2,3] row_mask:0x3 bank_mask:0xf
v_mov_b32_dpp v25, v0 quad_perm:[0,1,2,3] row_mask:0xc bank_mask:0xf
v_mov_b32_dpp v0, v24 quad_perm:[0,1,2,3] row_mask:0xc bank_mask:0xf
v_mov_b32_dpp v0, v25 quad_perm:[0,1,2,3] row_mask:0x3 bank_mask:0xf
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36390 >
2025-07-29 06:33:20 +00:00
Georg Lehmann
eb4df58a3d
aco/isel: refactor shared vgpr usage
...
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36390 >
2025-07-29 06:33:20 +00:00
Georg Lehmann
8a2aca8d6f
aco/select_alu: avoid vector get_alu_src for instructions with scalar operands
...
Foz-DB Navi21:
Totals from 1 (0.00% of 80237) affected shaders:
Instrs: 22 -> 21 (-4.55%)
CodeSize: 112 -> 108 (-3.57%)
Latency: 392 -> 386 (-1.53%)
InvThroughput: 25 -> 24 (-4.00%)
Copies: 4 -> 3 (-25.00%)
PreVGPRs: 8 -> 4 (-50.00%)
VALU: 10 -> 9 (-10.00%)
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35728 >
2025-07-29 06:07:15 +00:00
Georg Lehmann
ad9c340d86
aco: insert VALU s_delay_alu for WMMA
...
This should avoid some SIMD stalls.
I think this special case was added to try to handle this case:
First Instruction: WMMA
Second Instruction: WMMA instruction with same VGPR of previous WMMA instruction’s Matrix D as Matrix C
Stall if the first and second instruction are not the same type of WMMA or use ABS/NEG on SRC2 of the second instruction
If I read it correctly, we shouldn't need a delay if the type is the same and no
modifier is used. That's kind of complex to handle, so leave it for now.
Not inserting any delays likely hurts more than this.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36328 >
2025-07-29 05:48:29 +00:00
Georg Lehmann
413d0d2ec8
aco/statistics: update GFX12 WMMA cost
...
Based on marketing numbers, but they seem to match RGP.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36328 >
2025-07-29 05:48:29 +00:00
Georg Lehmann
8f61c85880
aco/statistics: add latency to WMMA
...
Assume the normal VALU latency of 4 cycles.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36328 >
2025-07-29 05:48:29 +00:00
Georg Lehmann
004f8aa2f4
aco: optimize get_alu_src with constant source and size > 1
...
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Emulated FSR4, Navi31:
Totals from 14 (100.00% of 14) affected shaders:
MaxWaves: 130 -> 131 (+0.77%)
Instrs: 67887 -> 67470 (-0.61%); split: -0.70%, +0.09%
CodeSize: 464428 -> 461668 (-0.59%); split: -0.67%, +0.07%
VGPRs: 2544 -> 2520 (-0.94%)
SpillVGPRs: 92 -> 89 (-3.26%)
Latency: 256823 -> 257574 (+0.29%); split: -0.37%, +0.66%
InvThroughput: 253895 -> 252929 (-0.38%); split: -0.40%, +0.02%
VClause: 997 -> 984 (-1.30%); split: -2.11%, +0.80%
Copies: 4501 -> 3788 (-15.84%); split: -17.35%, +1.51%
PreSGPRs: 504 -> 519 (+2.98%)
PreVGPRs: 2460 -> 2448 (-0.49%)
VALU: 57202 -> 56726 (-0.83%); split: -0.88%, +0.05%
SALU: 1231 -> 1384 (+12.43%)
VMEM: 3807 -> 3801 (-0.16%)
VOPD: 2693 -> 2303 (-14.48%); split: +1.19%, -15.67%
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36090 >
2025-07-25 11:33:00 +00:00
Alyssa Rosenzweig
8a1a410389
treewide: use SWAP macro
...
Via Coccinelle patch + manual clean up:
@@
identifier temporary, a, b;
type T;
@@
-T temporary = a;
-a = b;
-b = temporary;
+SWAP(a, b);
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36297 >
2025-07-23 19:49:47 +00:00
Georg Lehmann
c80daf934c
aco: supported 64bit or vectorized bitfield_select
...
No Foz-DB changes.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36141 >
2025-07-21 20:42:32 +00:00