Rhys Perry
3e9921f52e
radv: only call radv_should_use_wgp_mode() once
...
This will let the compiler choose between CU and WGP mode.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37791 >
2025-10-15 13:37:48 +01:00
Daniel Schürmann
eecd1c020d
amd: keep ac_shader_config::lds_size unaligned
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37577 >
2025-10-15 11:20:09 +00:00
Daniel Schürmann
fe6ff6d1ef
aco: remove DeviceInfo::lds_encoding_granule and DeviceInfo::lds_alloc_granule
...
Use utility functions instead.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37577 >
2025-10-15 11:20:08 +00:00
Daniel Schürmann
11db02d5d9
radv: calculate LDS allocation requirements independently from the compiler
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37577 >
2025-10-15 11:20:07 +00:00
Daniel Schürmann
b651234414
amd: change ac_shader_config::lds_size to bytes
...
We still keep it aligned to allocation granularity.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37577 >
2025-10-15 11:20:07 +00:00
Daniel Schürmann
d0b87a0d5f
ac/nir_flag_smem_for_loads: call divergence analysis internally
...
Also don't flag more SMEM instructions (in ACO) after the last
call to ac_nir_lower_mem_access_bit_sizes().
Totals from 75 (0.09% of 79839) affected shaders: (Navi48)
Instrs: 191246 -> 189960 (-0.67%)
CodeSize: 996840 -> 985976 (-1.09%)
Latency: 3066184 -> 2945500 (-3.94%)
InvThroughput: 355373 -> 353106 (-0.64%); split: -0.66%, +0.02%
SClause: 4848 -> 4699 (-3.07%)
Copies: 13827 -> 13925 (+0.71%); split: -0.07%, +0.78%
Branches: 5176 -> 5003 (-3.34%)
PreSGPRs: 6222 -> 6272 (+0.80%)
VALU: 108934 -> 108993 (+0.05%); split: -0.00%, +0.06%
SALU: 31679 -> 31210 (-1.48%); split: -1.51%, +0.03%
SMEM: 7158 -> 6739 (-5.85%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37843 >
2025-10-14 16:33:12 +00:00
Daniel Schürmann
8ff44f17ef
amd/lower_mem_access_bit_sizes: also use SMEM for subdword loads
...
We can simply extract from the loaded dwords as per
nir_lower_mem_access_bit_sizes() lowering.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37843 >
2025-10-14 16:33:11 +00:00
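The extraction idea mentioned in commit 8ff44f17ef above can be illustrated with a minimal sketch. This is not the NIR lowering itself: the function names, the byte-array memory model, and the little-endian layout are assumptions made for illustration. A naturally aligned 8- or 16-bit value is read by loading the aligned dword that contains it, then shifting and masking.

```python
import struct

def load_dword(mem: bytes, addr: int) -> int:
    """Load a little-endian 32-bit word; addr must be dword-aligned."""
    assert addr % 4 == 0
    return struct.unpack_from("<I", mem, addr)[0]

def load_subdword(mem: bytes, addr: int, bits: int) -> int:
    """Load an 8- or 16-bit value using only an aligned dword load.

    Hypothetical helper: a naturally aligned sub-dword value never
    straddles a dword boundary, so one dword load is enough.
    """
    assert bits in (8, 16) and addr % (bits // 8) == 0
    dword = load_dword(mem, addr & ~3)   # round the address down to a dword
    shift = (addr & 3) * 8               # bit position inside that dword
    return (dword >> shift) & ((1 << bits) - 1)

mem = bytes([0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88])
byte_val = load_subdword(mem, 1, 8)      # second byte of the first dword
short_val = load_subdword(mem, 6, 16)    # upper half of the second dword
```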
Samuel Pitoiset
bc32286e5b
radv: declare a new user SGPR for dynamic descriptors
...
To move them out of push constants.
fossils-db (GFX1201):
Totals from 20700 (25.99% of 79646) affected shaders:
Instrs: 14375624 -> 14370051 (-0.04%); split: -0.07%, +0.03%
CodeSize: 76746128 -> 76723772 (-0.03%); split: -0.05%, +0.02%
Latency: 74103586 -> 74113651 (+0.01%); split: -0.01%, +0.02%
InvThroughput: 11908817 -> 11908798 (-0.00%); split: -0.00%, +0.00%
VClause: 249605 -> 249607 (+0.00%); split: -0.00%, +0.00%
SClause: 337914 -> 337772 (-0.04%); split: -0.08%, +0.04%
Copies: 843585 -> 839233 (-0.52%); split: -0.62%, +0.10%
PreSGPRs: 836283 -> 837260 (+0.12%)
SALU: 1790713 -> 1786374 (-0.24%); split: -0.29%, +0.05%
Co-authored-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37768 >
2025-10-14 15:34:43 +00:00
Georg Lehmann
f1cbac7a8e
aco/optimizer: unify constant labels
...
Foz-DB Navi21:
Totals from 14 (0.02% of 79789) affected shaders:
Instrs: 44868 -> 44867 (-0.00%)
CodeSize: 279132 -> 279124 (-0.00%)
Copies: 11692 -> 11691 (-0.01%)
VALU: 30353 -> 30352 (-0.00%)
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35272 >
2025-10-14 08:33:43 +00:00
Georg Lehmann
2d410cf18e
aco/optimizer: apply f2f16 conversion with the new helpers
...
Foz-DB Navi21:
Totals from 183 (0.23% of 79789) affected shaders:
Instrs: 158014 -> 157170 (-0.53%); split: -0.54%, +0.01%
CodeSize: 836444 -> 830148 (-0.75%); split: -0.76%, +0.01%
Latency: 593790 -> 592580 (-0.20%); split: -0.39%, +0.19%
InvThroughput: 150243 -> 148783 (-0.97%); split: -0.98%, +0.00%
VClause: 1301 -> 1312 (+0.85%); split: -0.31%, +1.15%
SClause: 2608 -> 2606 (-0.08%)
PreVGPRs: 8706 -> 8674 (-0.37%)
VALU: 102042 -> 101067 (-0.96%); split: -0.97%, +0.01%
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35272 >
2025-10-14 08:33:43 +00:00
Georg Lehmann
2572528d31
aco/optimizer: remove can_apply_extract
...
Foz-DB Navi21:
Totals from 10 (0.01% of 79789) affected shaders:
Latency: 426254 -> 426256 (+0.00%); split: -0.00%, +0.00%
InvThroughput: 81782 -> 81784 (+0.00%)
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35272 >
2025-10-14 08:33:42 +00:00
Georg Lehmann
d90976bbad
aco/optimizer: delete apply_extract
...
Foz-DB Navi21:
Totals from 35 (0.04% of 79789) affected shaders:
Instrs: 166213 -> 166149 (-0.04%); split: -0.05%, +0.01%
CodeSize: 946604 -> 945856 (-0.08%); split: -0.08%, +0.01%
Latency: 2770103 -> 2769799 (-0.01%); split: -0.01%, +0.00%
InvThroughput: 1312839 -> 1312685 (-0.01%)
Copies: 31100 -> 31033 (-0.22%); split: -0.26%, +0.05%
VALU: 114493 -> 114393 (-0.09%); split: -0.09%, +0.00%
SALU: 27249 -> 27285 (+0.13%)
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35272 >
2025-10-14 08:33:42 +00:00
Georg Lehmann
26da5cf8d9
aco/optimizer: apply sgprs/extract with new helpers
...
Foz-DB Navi21:
Totals from 387 (0.49% of 79789) affected shaders:
MaxWaves: 7332 -> 7324 (-0.11%)
Instrs: 3156365 -> 3155691 (-0.02%); split: -0.02%, +0.00%
CodeSize: 17013948 -> 17014456 (+0.00%); split: -0.01%, +0.01%
VGPRs: 24768 -> 24776 (+0.03%)
Latency: 28569179 -> 28568183 (-0.00%); split: -0.00%, +0.00%
InvThroughput: 6530832 -> 6530566 (-0.00%); split: -0.00%, +0.00%
VClause: 90988 -> 90989 (+0.00%); split: -0.00%, +0.00%
Copies: 269074 -> 269060 (-0.01%); split: -0.01%, +0.01%
PreSGPRs: 22503 -> 22499 (-0.02%)
PreVGPRs: 22928 -> 22935 (+0.03%)
VALU: 2100245 -> 2099560 (-0.03%)
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35272 >
2025-10-14 08:33:41 +00:00
Georg Lehmann
58163f65f0
aco/optimizer: rework packed fneg opt
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35272 >
2025-10-14 08:33:40 +00:00
Georg Lehmann
1f2a9da69c
aco/optimizer: use new helpers to propagate constants/neg/abs
...
Foz-DB Navi21:
Totals from 9128 (11.44% of 79789) affected shaders:
MaxWaves: 184074 -> 184078 (+0.00%)
Instrs: 11886941 -> 11886873 (-0.00%); split: -0.00%, +0.00%
CodeSize: 64908236 -> 64894864 (-0.02%); split: -0.02%, +0.00%
VGPRs: 535280 -> 535216 (-0.01%)
Latency: 119569574 -> 119570019 (+0.00%); split: -0.00%, +0.00%
InvThroughput: 30393122 -> 30387378 (-0.02%); split: -0.02%, +0.00%
VClause: 214075 -> 214014 (-0.03%)
SClause: 338202 -> 338177 (-0.01%); split: -0.01%, +0.00%
Copies: 888682 -> 888592 (-0.01%); split: -0.01%, +0.00%
PreSGPRs: 506996 -> 506992 (-0.00%)
PreVGPRs: 460117 -> 460115 (-0.00%); split: -0.00%, +0.00%
VALU: 8595840 -> 8595758 (-0.00%); split: -0.00%, +0.00%
SALU: 1184728 -> 1184720 (-0.00%)
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35272 >
2025-10-14 08:33:40 +00:00
Georg Lehmann
859505d95a
aco/optimizer: use new helpers to apply literals
...
Foz-DB Navi21:
Totals from 21009 (26.33% of 79789) affected shaders:
MaxWaves: 495342 -> 495414 (+0.01%)
Instrs: 22345587 -> 22335371 (-0.05%); split: -0.05%, +0.00%
CodeSize: 122095820 -> 121795112 (-0.25%); split: -0.25%, +0.00%
VGPRs: 1025800 -> 1025480 (-0.03%)
Latency: 202876235 -> 203076272 (+0.10%); split: -0.04%, +0.14%
InvThroughput: 47599930 -> 47596113 (-0.01%); split: -0.03%, +0.02%
VClause: 475271 -> 475439 (+0.04%); split: -0.02%, +0.05%
SClause: 700679 -> 700629 (-0.01%); split: -0.01%, +0.01%
Copies: 1628498 -> 1618165 (-0.63%); split: -0.64%, +0.01%
Branches: 567199 -> 567216 (+0.00%); split: -0.00%, +0.00%
PreSGPRs: 952134 -> 952043 (-0.01%); split: -0.01%, +0.00%
PreVGPRs: 846614 -> 846272 (-0.04%)
VALU: 15572374 -> 15564050 (-0.05%); split: -0.05%, +0.00%
SALU: 2423329 -> 2421319 (-0.08%); split: -0.08%, +0.00%
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Foz-DB Navi31:
Totals from 13049 (16.44% of 79395) affected shaders:
MaxWaves: 357242 -> 357268 (+0.01%)
Instrs: 19955572 -> 19944106 (-0.06%); split: -0.06%, +0.00%
CodeSize: 105689464 -> 105454348 (-0.22%); split: -0.23%, +0.00%
VGPRs: 765744 -> 764952 (-0.10%); split: -0.11%, +0.00%
Latency: 179063640 -> 179141591 (+0.04%); split: -0.02%, +0.07%
InvThroughput: 27978134 -> 27971318 (-0.02%); split: -0.03%, +0.01%
VClause: 386791 -> 386826 (+0.01%); split: -0.02%, +0.03%
SClause: 598113 -> 598106 (-0.00%); split: -0.01%, +0.01%
Copies: 1393111 -> 1383102 (-0.72%); split: -0.73%, +0.01%
Branches: 498533 -> 498535 (+0.00%); split: -0.00%, +0.00%
PreSGPRs: 573310 -> 573236 (-0.01%); split: -0.01%, +0.00%
PreVGPRs: 591459 -> 591043 (-0.07%)
VALU: 11623734 -> 11615755 (-0.07%); split: -0.07%, +0.00%
SALU: 1962055 -> 1960005 (-0.10%); split: -0.11%, +0.00%
VOPD: 3544 -> 3566 (+0.62%); split: +0.73%, -0.11%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35272 >
2025-10-14 08:33:39 +00:00
Georg Lehmann
8de89f4ffb
aco/optimizer: add alu_opt_info helpers
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35272 >
2025-10-14 08:33:38 +00:00
Georg Lehmann
f436844f39
aco/optimizer: add a new dce helper
...
Will be more correct for future constant use cases.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35272 >
2025-10-14 08:33:38 +00:00
Georg Lehmann
0d8219f367
aco/tests: allow even more literals
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35272 >
2025-10-14 08:33:37 +00:00
Georg Lehmann
6eac72088c
aco/gfx10+: only work around split execution of uniform LDS in WGP mode
...
LDS instructions from one CU won't split the execution of other LDS instructions
on the same CU.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31630 >
2025-10-13 10:22:22 +00:00
Georg Lehmann
c13caa5e5f
aco: fix global_atomic_swap offset overflow check
...
Fixes: d7dcd81c77 ("aco/gfx6: allow both constant and gpr offset for global with sgpr address")
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37821 >
2025-10-13 09:41:41 +00:00
Marek Olšák
3fe651f607
nir: remove load_smem_amd
...
replaced by load_global_amd + ACCESS_SMEM_AMD
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36936 >
2025-10-08 08:54:11 +00:00
Rhys Perry
20af16b4d8
aco: use MTBUF for 64-bit atomic load/store
...
A 64-bit atomic load/store should be considered entirely out-of-bounds if
any part of it is out-of-bounds. Since we implemented these as 32-bit vec2
load/store, it would have been possible for the first half to be in-bounds
while the second half is out-of-bounds.
From 9.6.1. Robust Buffer Access of Vulkan 1.4.324 specification:
> Any non-atomic access to a uniform, storage, uniform texel, or storage
> texel buffer wider than 32-bits may be treated as multiple 32-bit
> accesses that are separately bounds checked.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36602 >
2025-10-07 17:41:31 +00:00
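The bounds-checking difference described in commit 20af16b4d8 above can be shown with a small model. This is only a sketch under assumptions: the helper names are hypothetical, offsets and sizes are in bytes, and how MTBUF instructions achieve the single check in hardware is not modeled here.

```python
def in_bounds(offset: int, size: int, buffer_size: int) -> bool:
    """True if the byte range [offset, offset + size) lies inside the buffer."""
    return offset >= 0 and offset + size <= buffer_size

def split_check(offset: int, buffer_size: int) -> list:
    """64-bit access treated as two separately bounds-checked 32-bit halves."""
    return [in_bounds(offset, 4, buffer_size),
            in_bounds(offset + 4, 4, buffer_size)]

def atomic_check(offset: int, buffer_size: int) -> bool:
    """64-bit atomic access: out-of-bounds as a whole if any part is."""
    return in_bounds(offset, 8, buffer_size)

# A 12-byte buffer and a 64-bit access at byte offset 8:
halves = split_check(8, 12)    # first half passes, second half fails
whole = atomic_check(8, 12)    # the access as a whole is out-of-bounds
```

With the split model the first half would still be performed, which is acceptable for non-atomic accesses under the quoted specification text but not for atomic ones.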
Rhys Perry
f905acfada
aco: remove barrier acquire/release workaround
...
This existed since ccfe9813fb because NIR
had no atomic loads/stores. This is no longer the case.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36602 >
2025-10-07 17:41:31 +00:00
Rhys Perry
271b135b03
aco: set atomic semantic for atomic load/store
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36602 >
2025-10-07 17:41:30 +00:00
Rhys Perry
74b807cf58
aco: only workaround load tearing for atomic loads
...
For non-atomic loads, this situation would require a data race.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36602 >
2025-10-07 17:41:30 +00:00
Georg Lehmann
d514696a0c
aco/isel: support nir_op_atomic_isub
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37702 >
2025-10-07 14:07:56 +00:00
Georg Lehmann
a173e51541
aco/insert_waitcnt: don't merge waitcnts for LDS clauses
...
We form LDS clauses because heavily interleaving LDS and VALU leads to false
dependencies. But LDS is completely uncached, so splitting the clause with
waitcnts shouldn't hurt. It might even be beneficial, because the first
LDS store can start earlier.
Foz-DB Navi48:
Totals from 170 (0.21% of 80287) affected shaders:
Instrs: 239633 -> 240148 (+0.21%)
CodeSize: 1276584 -> 1278532 (+0.15%)
Latency: 3788507 -> 3789876 (+0.04%); split: -0.01%, +0.04%
InvThroughput: 841637 -> 841694 (+0.01%); split: -0.01%, +0.02%
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37701 >
2025-10-07 13:12:45 +00:00
Rhys Perry
dfa8ac6b91
aco: remove buffer_load_lds instructions
...
They don't exist.
See https://github.com/llvm/llvm-project/pull/132916
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14041
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37716 >
2025-10-07 09:50:26 +00:00
abdelhadi
5c82a3e114
aco: fix debug info offset
...
Signed-off-by: abdelhadi <abdelhadims@icloud.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37244 >
2025-10-02 13:38:56 +00:00
Georg Lehmann
9533e7cdae
aco/optimizer: fix incorrect operand order assumption for neg(mul) opt
...
The code that labels instructions doesn't care about the order either.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14013
Cc: mesa-stable
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37643 >
2025-10-01 20:52:12 +00:00
Georg Lehmann
8343e45467
aco/lower_branches: update branch hints after changing jump targets
...
Fixes: 13ad3db43f ("aco/lower_branches: implement try_remove_simple_block() in lower_branches()")
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37552 >
2025-09-26 15:11:26 +00:00
Georg Lehmann
cc08786689
aco: use maximum RT vgpr_limit that doesn't reduce wave count
...
144 instead of 132 with 5 waves, in practice.
Foz-DB Navi31:
Totals from 33 (0.04% of 80273) affected shaders:
Instrs: 3266241 -> 3261329 (-0.15%)
CodeSize: 16885356 -> 16860088 (-0.15%)
VGPRs: 4356 -> 4752 (+9.09%)
SpillVGPRs: 2504 -> 1535 (-38.70%)
Scratch: 264704 -> 216320 (-18.28%)
Latency: 18445909 -> 18395904 (-0.27%)
InvThroughput: 3689182 -> 3679182 (-0.27%)
VClause: 85171 -> 84595 (-0.68%)
SClause: 59365 -> 59320 (-0.08%); split: -0.08%, +0.01%
Copies: 260528 -> 259113 (-0.54%); split: -0.59%, +0.05%
Branches: 92537 -> 92519 (-0.02%)
VALU: 1937426 -> 1935925 (-0.08%); split: -0.08%, +0.01%
SALU: 393075 -> 393047 (-0.01%); split: -0.01%, +0.01%
VMEM: 147914 -> 146003 (-1.29%)
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37548 >
2025-09-26 08:45:05 +00:00
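The "144 instead of 132 with 5 waves" figure in commit cc08786689 above can be reproduced with simple occupancy arithmetic. This is a sketch, not ACO's code: the register file size (768 VGPRs per lane) and the allocation granule (12) are assumed values picked because they reproduce the commit's numbers, the function names are hypothetical, and any hardware cap on the wave count is ignored.

```python
def waves_for(vgprs: int, total_vgprs: int, granule: int) -> int:
    """Waves that fit when each wave allocates `vgprs`, rounded up to the granule."""
    alloc = -(-vgprs // granule) * granule   # ceiling to a multiple of granule
    return total_vgprs // alloc

def max_limit_for_same_waves(limit: int, total_vgprs: int, granule: int) -> int:
    """Largest VGPR limit that keeps the wave count of `limit`."""
    waves = waves_for(limit, total_vgprs, granule)
    per_wave = total_vgprs // waves          # budget per wave at that occupancy
    return per_wave // granule * granule     # round down to the granule

TOTAL_VGPRS = 768   # assumption, not taken from the commit
GRANULE = 12        # assumption, not taken from the commit

old_waves = waves_for(132, TOTAL_VGPRS, GRANULE)
new_limit = max_limit_for_same_waves(132, TOTAL_VGPRS, GRANULE)
new_waves = waves_for(new_limit, TOTAL_VGPRS, GRANULE)
```

Under these assumptions both limits give the same occupancy, so the larger one only adds registers, which is consistent with the reduced spilling in the statistics above.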
Georg Lehmann
8e03505782
aco: don't insert s_sendmsg dealloc_vgprs with little vgprs allocated
...
Reduces message bus traffic when the benefit is small.
Foz-DB Navi31:
Totals from 3752 (4.67% of 80273) affected shaders:
Instrs: 1999755 -> 1992249 (-0.38%)
CodeSize: 10531824 -> 10501800 (-0.29%)
Latency: 14935247 -> 14935147 (-0.00%)
InvThroughput: 5976053 -> 5975262 (-0.01%)
Foz-DB Navi33:
Totals from 2614 (3.26% of 80273) affected shaders:
Instrs: 969475 -> 964247 (-0.54%)
CodeSize: 5171240 -> 5150328 (-0.40%)
Latency: 7891519 -> 7891434 (-0.00%)
InvThroughput: 4815008 -> 4814287 (-0.01%); split: -0.01%, +0.00%
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37508 >
2025-09-26 07:51:02 +00:00
Georg Lehmann
27cc6317f9
aco: dealloc vgprs if there is a pending non scratch store and no pending export
...
Because s_sendmsg dealloc_vgprs waits for every counter except vscnt,
and the message bus has limited throughput, we should only insert the dealloc
when we know that it's beneficial.
Foz-DB Navi31:
Totals from 5280 (6.58% of 80273) affected shaders:
Instrs: 4186851 -> 4197416 (+0.25%)
CodeSize: 21910004 -> 21952264 (+0.19%)
Latency: 31679067 -> 31679173 (+0.00%)
InvThroughput: 9182625 -> 9183417 (+0.01%)
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37508 >
2025-09-26 07:51:02 +00:00
Georg Lehmann
26e041e821
aco: remove existing dealloc_vgprs use
...
We didn't consider that s_sendmsg dealloc_vgprs waits for all counters
except vscnt.
Foz-DB Navi31:
Totals from 74090 (92.52% of 80084) affected shaders:
Instrs: 36031071 -> 35853573 (-0.49%)
CodeSize: 189233756 -> 188523764 (-0.38%)
Latency: 222378318 -> 222374890 (-0.00%)
InvThroughput: 33366893 -> 33362457 (-0.01%)
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37508 >
2025-09-26 07:51:02 +00:00
Georg Lehmann
cf30742a66
radv,aco: don't end monolithic ray tracing with unconditional terminate
...
The terminate requires more code and blocks us from deallocating VGPRs early.
Foz-DB Navi31:
Totals from 63 (0.08% of 80273) affected shaders:
Instrs: 3372702 -> 3372467 (-0.01%)
CodeSize: 17441676 -> 17440736 (-0.01%)
Latency: 19763447 -> 19763288 (-0.00%)
InvThroughput: 3860502 -> 3860478 (-0.00%)
Branches: 96204 -> 96141 (-0.07%)
SALU: 406648 -> 406549 (-0.02%)
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37542 >
2025-09-25 15:35:55 +00:00
Daniel Schürmann
d041640b88
aco: remove excess offset handling for load/store_shared
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37453 >
2025-09-24 14:28:25 +00:00
Daniel Schürmann
dbb20a4e23
aco/optimizer: remove DS offset optimization
...
No fossil changes.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37453 >
2025-09-24 14:28:24 +00:00
Natalie Vock
f0d3d0ad21
aco/scheduler: Bail early on unreorderable instructions
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37212 >
2025-09-22 11:13:50 +00:00
Rhys Perry
d6ed68212c
aco: fix SGPR 8-bit nir_op_vec with mixed constant and non-constant
...
For example, vec2(non_const, const)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: 04e3d7ad93 ("aco: improve nir_op_vec with constant operands")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13911
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37405 >
2025-09-18 12:37:19 +00:00
Rhys Perry
8931672eef
aco: workaround load tearing for load_shared2_amd
...
This probably has the same issue as load_shared.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: 04956d54ce ("aco: force uniform result for LDS load with uniform address if it can be non uniform")
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37417 >
2025-09-17 11:29:21 +00:00
Rhys Perry
81df517553
aco: avoid unaligned offsets when selecting load_global_amd
...
SMEM instructions mask off the low bits for the base and offset sources
both before and after they're added. However, NIR expects ACO to only
care about the alignment of the final address.
fossil-db (gfx1201):
Totals from 21 (0.03% of 79839) affected shaders:
Instrs: 229780 -> 229876 (+0.04%)
CodeSize: 1267724 -> 1268080 (+0.03%)
Latency: 2800924 -> 2800978 (+0.00%)
InvThroughput: 520250 -> 520256 (+0.00%)
Copies: 27878 -> 27876 (-0.01%); split: -0.01%, +0.00%
SALU: 29591 -> 29643 (+0.18%)
fossil-db (polaris10):
Totals from 3 (0.00% of 62201) affected shaders:
Latency: 2651 -> 2652 (+0.04%)
InvThroughput: 662 -> 663 (+0.15%)
PreSGPRs: 51 -> 54 (+5.88%)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37301 >
2025-09-17 09:15:46 +00:00
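The masking behavior described in commit 81df517553 above can be made concrete with a small arithmetic model. This is a sketch under assumptions: the commit only says "low bits", so the two-bit (dword) mask, the 64-bit address width, and the function names are all chosen here for illustration.

```python
ADDR_BITS = 0xFFFFFFFFFFFFFFFF   # model 64-bit addresses
LOW_MASK = ~3 & ADDR_BITS        # assumption: the low two bits are dropped

def smem_address(base: int, offset: int) -> int:
    """Model of the hardware: mask both sources, add, then mask the sum."""
    return ((base & LOW_MASK) + (offset & LOW_MASK)) & LOW_MASK

def expected_address(base: int, offset: int) -> int:
    """What the IR means: only the final address needs to be aligned."""
    return (base + offset) & ADDR_BITS

# Both sources are misaligned, but their sum is dword-aligned.
base, offset = 0x1002, 0x2
hw = smem_address(base, offset)        # each source loses its low bits first
want = expected_address(base, offset)  # the aligned address actually intended
```

The two results differ by one dword here, which is why unaligned sources have to be avoided (for example by adding them beforehand) even when the final address is aligned.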
Rhys Perry
6d71521ecd
aco: avoid wraparound for smem global loads with both offsets
...
fossil-db (gfx1201):
Totals from 296 (0.37% of 79839) affected shaders:
Instrs: 382593 -> 380149 (-0.64%)
CodeSize: 1981452 -> 1970988 (-0.53%); split: -0.53%, +0.00%
Latency: 1575286 -> 1574252 (-0.07%)
InvThroughput: 215839 -> 215818 (-0.01%)
SClause: 8679 -> 8677 (-0.02%); split: -0.03%, +0.01%
Copies: 19642 -> 19641 (-0.01%); split: -0.03%, +0.02%
PreSGPRs: 14521 -> 14515 (-0.04%)
SALU: 57097 -> 55718 (-2.42%)
fossil-db (polaris10):
Totals from 30 (0.05% of 62201) affected shaders:
Instrs: 23341 -> 23379 (+0.16%); split: -0.01%, +0.18%
CodeSize: 121316 -> 121516 (+0.16%); split: -0.01%, +0.17%
SGPRs: 2368 -> 2384 (+0.68%)
Latency: 235153 -> 235374 (+0.09%); split: -0.01%, +0.11%
InvThroughput: 92582 -> 92566 (-0.02%)
SClause: 616 -> 619 (+0.49%)
Copies: 2717 -> 2720 (+0.11%)
PreSGPRs: 1204 -> 1213 (+0.75%)
SALU: 3654 -> 3692 (+1.04%); split: -0.08%, +1.12%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Backport-to: 25.2
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37301 >
2025-09-17 09:15:46 +00:00
Georg Lehmann
714a149396
nir: remove unsigned upper bound config
...
All config information is now either in nir->info or nir->options.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37361 >
2025-09-16 09:24:04 +00:00
Georg Lehmann
bb67dae12d
nir/uub: remove max_workgroup_size from config
...
For most hardware, this is the same as max invocations in the workgroup.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37361 >
2025-09-16 09:24:04 +00:00
Georg Lehmann
f3c08c9d27
nir/uub: use shader_info subgroup size
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37361 >
2025-09-16 09:24:04 +00:00
Georg Lehmann
d029686e20
aco/isel: fix output args init stack buffer overflow
...
BITSET range functions include the end of the range.
Fixes: eb249bb18e ("aco: Only fix used variables to registers")
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37361 >
2025-09-16 09:24:03 +00:00
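The off-by-one behind commit d029686e20 above can be sketched with a toy model of an inclusive range setter. This is not Mesa's BITSET implementation: a Python list stands in for the fixed-size bitset, the function name is hypothetical, and Python raises an exception where C would silently write past the buffer.

```python
def set_range_inclusive(bits: list, start: int, end: int) -> None:
    """Set bits start..end, where `end` itself is part of the range."""
    for i in range(start, end + 1):
        bits[i] = True   # IndexError here models the out-of-bounds write

# Correct use: the last valid index is size - 1.
size = 8
ok_bits = [False] * size
set_range_inclusive(ok_bits, 0, size - 1)

# Buggy use: passing the element count as the end touches one bit too many.
overflowed = False
try:
    set_range_inclusive([False] * size, 0, size)
except IndexError:
    overflowed = True
```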
nihui
849344dc08
aco: set program->dev.fused_mad_mix=true for GFX940
...
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35655 >
2025-09-16 07:02:32 +00:00
nihui
8c4f0b1353
aco: gfx940 has no mad f32 instruction
...
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35655 >
2025-09-16 07:02:32 +00:00