Daniel Schürmann
eecd1c020d
amd: keep ac_shader_config::lds_size unaligned
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37577>
2025-10-15 11:20:09 +00:00
Daniel Schürmann
fe6ff6d1ef
aco: remove DeviceInfo::lds_encoding_granule and DeviceInfo::lds_alloc_granule
Use utility functions instead.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37577>
2025-10-15 11:20:08 +00:00
Daniel Schürmann
11db02d5d9
radv: calculate LDS allocation requirements independently from the compiler
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37577>
2025-10-15 11:20:07 +00:00
Daniel Schürmann
b651234414
amd: change ac_shader_config::lds_size to bytes
We still keep it aligned to allocation granularity.
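As a rough illustration of what "in bytes, but aligned to the allocation granularity" means, the sketch below rounds a byte count up to a power-of-two granule and derives the number of allocation units a hardware field would encode. The function names are hypothetical and are not Mesa's utility functions; the granule is a parameter because its per-generation value is not stated here.

```c
#include <assert.h>
#include <stdint.h>

/* Round `bytes` up to the next multiple of `granule` (must be a power of two). */
static uint32_t align_to_granule(uint32_t bytes, uint32_t granule)
{
   assert(granule != 0 && (granule & (granule - 1)) == 0);
   return (bytes + granule - 1) & ~(granule - 1);
}

/* Number of allocation units needed for `bytes` of LDS (hypothetical helper). */
static uint32_t lds_encoded_units(uint32_t bytes, uint32_t granule)
{
   return align_to_granule(bytes, granule) / granule;
}
```

For example, with an assumed 512-byte granule, a 1000-byte request would be stored as 1024 bytes and encoded as 2 units.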
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37577>
2025-10-15 11:20:07 +00:00
Daniel Schürmann
d0b87a0d5f
ac/nir_flag_smem_for_loads: call divergence analysis internally
Also don't flag more SMEM instructions (in ACO) after the last
call to ac_nir_lower_mem_access_bit_sizes().
Totals from 75 (0.09% of 79839) affected shaders: (Navi48)
Instrs: 191246 -> 189960 (-0.67%)
CodeSize: 996840 -> 985976 (-1.09%)
Latency: 3066184 -> 2945500 (-3.94%)
InvThroughput: 355373 -> 353106 (-0.64%); split: -0.66%, +0.02%
SClause: 4848 -> 4699 (-3.07%)
Copies: 13827 -> 13925 (+0.71%); split: -0.07%, +0.78%
Branches: 5176 -> 5003 (-3.34%)
PreSGPRs: 6222 -> 6272 (+0.80%)
VALU: 108934 -> 108993 (+0.05%); split: -0.00%, +0.06%
SALU: 31679 -> 31210 (-1.48%); split: -1.51%, +0.03%
SMEM: 7158 -> 6739 (-5.85%)
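The percentages in these fossil-db summaries appear to be plain relative changes, (after - before) / before * 100. A minimal C sketch of that arithmetic, assuming that formula (the "split" figures are not modeled):

```c
#include <assert.h>

/* Relative change from `before` to `after`, in percent. */
static double pct_change(double before, double after)
{
   assert(before != 0.0);
   return (after - before) / before * 100.0;
}
```

For the Instrs line above, pct_change(191246, 189960) is about -0.672, which rounds to the reported -0.67%.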
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37843>
2025-10-14 16:33:12 +00:00
Daniel Schürmann
8ff44f17ef
amd/lower_mem_access_bit_sizes: also use SMEM for subdword loads
We can simply extract from the loaded dwords as per
nir_lower_mem_access_bit_sizes() lowering.
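The idea of serving a subdword load from whole loaded dwords can be sketched in plain C: load the containing 32-bit word, then shift and mask. This is an illustration of the extraction step only, not the NIR lowering itself. It assumes little-endian byte order within a dword and 2-byte alignment for 16-bit values so they never straddle two dwords; the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Extract an 8- or 16-bit value at `byte_offset` from dwords that were
 * loaded as whole 32-bit words. */
static uint32_t extract_subdword(const uint32_t *dwords, uint32_t byte_offset,
                                 unsigned bit_size)
{
   assert(bit_size == 8 || bit_size == 16);
   /* 16-bit values are assumed 2-byte aligned, so they stay in one dword. */
   assert(bit_size == 8 || byte_offset % 2 == 0);
   uint32_t dword = dwords[byte_offset / 4];
   uint32_t shift = (byte_offset % 4) * 8;
   uint32_t mask = (1u << bit_size) - 1;
   return (dword >> shift) & mask;
}
```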
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37843>
2025-10-14 16:33:11 +00:00
Samuel Pitoiset
bc32286e5b
radv: declare a new user SGPR for dynamic descriptors
To move them out of push constants.
fossil-db (GFX1201):
Totals from 20700 (25.99% of 79646) affected shaders:
Instrs: 14375624 -> 14370051 (-0.04%); split: -0.07%, +0.03%
CodeSize: 76746128 -> 76723772 (-0.03%); split: -0.05%, +0.02%
Latency: 74103586 -> 74113651 (+0.01%); split: -0.01%, +0.02%
InvThroughput: 11908817 -> 11908798 (-0.00%); split: -0.00%, +0.00%
VClause: 249605 -> 249607 (+0.00%); split: -0.00%, +0.00%
SClause: 337914 -> 337772 (-0.04%); split: -0.08%, +0.04%
Copies: 843585 -> 839233 (-0.52%); split: -0.62%, +0.10%
PreSGPRs: 836283 -> 837260 (+0.12%)
SALU: 1790713 -> 1786374 (-0.24%); split: -0.29%, +0.05%
Co-authored-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37768>
2025-10-14 15:34:43 +00:00
Georg Lehmann
58163f65f0
aco/optimizer: rework packed fneg opt
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35272>
2025-10-14 08:33:40 +00:00
Georg Lehmann
6eac72088c
aco/gfx10+: only work around split execution of uniform LDS in WGP mode
LDS instructions from one CU won't split the execution of other LDS instructions
on the same CU.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31630>
2025-10-13 10:22:22 +00:00
Georg Lehmann
c13caa5e5f
aco: fix global_atomic_swap offset overflow check
Fixes: d7dcd81c77 ("aco/gfx6: allow both constant and gpr offset for global with sgpr address")
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37821>
2025-10-13 09:41:41 +00:00
Marek Olšák
3fe651f607
nir: remove load_smem_amd
replaced by load_global_amd + ACCESS_SMEM_AMD
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36936>
2025-10-08 08:54:11 +00:00
Rhys Perry
20af16b4d8
aco: use MTBUF for 64-bit atomic load/store
A 64-bit atomic load/store should be considered entirely out-of-bounds if
any part of it is out-of-bounds. Since we implemented these as 32-bit vec2
load/store, it would have been possible for the first half to be in-bounds
while the second half is out-of-bounds.
From section 9.6.1, "Robust Buffer Access", of the Vulkan 1.4.324 specification:
> Any non-atomic access to a uniform, storage, uniform texel, or storage
> texel buffer wider than 32-bits may be treated as multiple 32-bit
> accesses that are separately bounds checked.
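The difference between the two bounds-checking models can be sketched as follows: treating an 8-byte access as two separately checked dwords lets the first half pass while the second fails, whereas an atomic access must be judged as a whole. The function names and the byte-offset/buffer-size model are hypothetical simplifications, not the hardware's exact bounds-check rules.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Per-dword check: each 32-bit half of a wider access is tested on its own. */
static bool dword_in_bounds(uint64_t offset, uint64_t buffer_size)
{
   assert(offset % 4 == 0);
   return offset + 4 <= buffer_size;
}

/* Whole-access check: an 8-byte access is in bounds only if all of it is. */
static bool qword_in_bounds(uint64_t offset, uint64_t buffer_size)
{
   assert(offset % 8 == 0);
   return offset + 8 <= buffer_size;
}
```

With a 12-byte buffer and an access at offset 8, the first dword (offset 8) passes, the second (offset 12) fails, and the whole 8-byte access fails: the split model yields the half-in-bounds result that the commit avoids for atomics.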
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36602>
2025-10-07 17:41:31 +00:00
Rhys Perry
f905acfada
aco: remove barrier acquire/release workaround
This has existed since ccfe9813fb because NIR
had no atomic loads/stores. This is no longer the case.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36602>
2025-10-07 17:41:31 +00:00
Rhys Perry
271b135b03
aco: set atomic semantic for atomic load/store
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36602>
2025-10-07 17:41:30 +00:00
Rhys Perry
74b807cf58
aco: only workaround load tearing for atomic loads
For non-atomic loads, this situation would require a data race.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36602>
2025-10-07 17:41:30 +00:00
Georg Lehmann
d514696a0c
aco/isel: support nir_op_atomic_isub
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37702>
2025-10-07 14:07:56 +00:00
Georg Lehmann
cf30742a66
radv,aco: don't end monolithic ray tracing with unconditional terminate
The terminate requires more code and blocks us from deallocating VGPRs early.
Foz-DB Navi31:
Totals from 63 (0.08% of 80273) affected shaders:
Instrs: 3372702 -> 3372467 (-0.01%)
CodeSize: 17441676 -> 17440736 (-0.01%)
Latency: 19763447 -> 19763288 (-0.00%)
InvThroughput: 3860502 -> 3860478 (-0.00%)
Branches: 96204 -> 96141 (-0.07%)
SALU: 406648 -> 406549 (-0.02%)
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37542>
2025-09-25 15:35:55 +00:00
Daniel Schürmann
d041640b88
aco: remove excess offset handling for load/store_shared
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37453>
2025-09-24 14:28:25 +00:00
Rhys Perry
d6ed68212c
aco: fix SGPR 8-bit nir_op_vec with mixed constant and non-constant
For example, vec2(non_const, const)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: 04e3d7ad93 ("aco: improve nir_op_vec with constant operands")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13911
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37405>
2025-09-18 12:37:19 +00:00
Rhys Perry
8931672eef
aco: workaround load tearing for load_shared2_amd
This probably has the same issue as load_shared.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: 04956d54ce ("aco: force uniform result for LDS load with uniform address if it can be non uniform")
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37417>
2025-09-17 11:29:21 +00:00
Rhys Perry
81df517553
aco: avoid unaligned offsets when selecting load_global_amd
SMEM instructions mask off the low bits for the base and offset sources
both before and after they're added. However, NIR expects ACO to only
care about the alignment of the final address.
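A small model of the mismatch: if SMEM clears the low bits of base and offset before adding them, and again on the sum, a base and offset that are each unaligned but add up to an aligned address give a different result from masking only the final address. The commit only says "the low bits"; this sketch assumes the low two bits (dword alignment) and uses hypothetical function names.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed SMEM model: low 2 bits cleared on both sources and on the sum. */
static uint64_t smem_effective_addr(uint64_t base, uint32_t offset)
{
   return ((base & ~(uint64_t)3) + (offset & ~3u)) & ~(uint64_t)3;
}

/* What NIR expects: only the final address is aligned down. */
static uint64_t expected_addr(uint64_t base, uint32_t offset)
{
   return (base + offset) & ~(uint64_t)3;
}
```

With base 0x1002 and offset 2, the final address 0x1004 is aligned, yet the assumed SMEM model reads from 0x1000. When both sources are already aligned, the two agree.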
fossil-db (gfx1201):
Totals from 21 (0.03% of 79839) affected shaders:
Instrs: 229780 -> 229876 (+0.04%)
CodeSize: 1267724 -> 1268080 (+0.03%)
Latency: 2800924 -> 2800978 (+0.00%)
InvThroughput: 520250 -> 520256 (+0.00%)
Copies: 27878 -> 27876 (-0.01%); split: -0.01%, +0.00%
SALU: 29591 -> 29643 (+0.18%)
fossil-db (polaris10):
Totals from 3 (0.00% of 62201) affected shaders:
Latency: 2651 -> 2652 (+0.04%)
InvThroughput: 662 -> 663 (+0.15%)
PreSGPRs: 51 -> 54 (+5.88%)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37301>
2025-09-17 09:15:46 +00:00
Rhys Perry
6d71521ecd
aco: avoid wraparound for smem global loads with both offsets
fossil-db (gfx1201):
Totals from 296 (0.37% of 79839) affected shaders:
Instrs: 382593 -> 380149 (-0.64%)
CodeSize: 1981452 -> 1970988 (-0.53%); split: -0.53%, +0.00%
Latency: 1575286 -> 1574252 (-0.07%)
InvThroughput: 215839 -> 215818 (-0.01%)
SClause: 8679 -> 8677 (-0.02%); split: -0.03%, +0.01%
Copies: 19642 -> 19641 (-0.01%); split: -0.03%, +0.02%
PreSGPRs: 14521 -> 14515 (-0.04%)
SALU: 57097 -> 55718 (-2.42%)
fossil-db (polaris10):
Totals from 30 (0.05% of 62201) affected shaders:
Instrs: 23341 -> 23379 (+0.16%); split: -0.01%, +0.18%
CodeSize: 121316 -> 121516 (+0.16%); split: -0.01%, +0.17%
SGPRs: 2368 -> 2384 (+0.68%)
Latency: 235153 -> 235374 (+0.09%); split: -0.01%, +0.11%
InvThroughput: 92582 -> 92566 (-0.02%)
SClause: 616 -> 619 (+0.49%)
Copies: 2717 -> 2720 (+0.11%)
PreSGPRs: 1204 -> 1213 (+0.75%)
SALU: 3654 -> 3692 (+1.04%); split: -0.08%, +1.12%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Backport-to: 25.2
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37301>
2025-09-17 09:15:46 +00:00
Georg Lehmann
714a149396
nir: remove unsigned upper bound config
All config information is now either in nir->info or nir->options.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37361>
2025-09-16 09:24:04 +00:00
Georg Lehmann
bb67dae12d
nir/uub: remove max_workgroup_size from config
For most hardware, this is the same as max invocations in the workgroup.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37361>
2025-09-16 09:24:04 +00:00
Georg Lehmann
f3c08c9d27
nir/uub: use shader_info subgroup size
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37361>
2025-09-16 09:24:04 +00:00
Georg Lehmann
d029686e20
aco/isel: fix output args init stack buffer overflow
BITSET range functions include the end of the range.
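Why an inclusive range end can overflow a buffer: with an inclusive "set bits start..end" helper, marking the first N bits must pass N - 1 as the end; passing N touches one bit past the range, which can land in the next word. The helper below is hypothetical and only modeled on the commit's description, not Mesa's BITSET macros.

```c
#include <assert.h>
#include <stdint.h>

#define WORD_BITS 32u

/* Hypothetical helper: sets bits start..end, *including* end. */
static void set_range_inclusive(uint32_t *words, unsigned start, unsigned end)
{
   assert(start <= end);
   for (unsigned i = start; i <= end; i++)
      words[i / WORD_BITS] |= 1u << (i % WORD_BITS);
}

/* Mark the first `count` bits: the last index is count - 1, not count. */
static void mark_first_n(uint32_t *words, unsigned count)
{
   if (count > 0)
      set_range_inclusive(words, 0, count - 1);
}
```

Marking 32 bits in a single-word array stays in bounds only because the end passed is 31; passing 32 would write to words[1].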
Fixes: eb249bb18e ("aco: Only fix used variables to registers")
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37361>
2025-09-16 09:24:03 +00:00
Natalie Vock
3667a7b687
aco: Add call info
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34531>
2025-09-15 17:16:20 +00:00
Samuel Pitoiset
decf9af472
radv/rt: only use one user SGPR for the traversal shader addr
All shaders are allocated in the 32-bit addr space. To avoid an issue
with alignment, and also for future work, there is an unused user SGPR.
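If every shader lives in a 32-bit address range, one user SGPR holding the low 32 bits is enough, and the full 64-bit address can be rebuilt from a known high half. A minimal sketch, assuming the high 32 bits are a value the driver already knows; both the function name and the example high half are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Rebuild a 64-bit shader address from the 32-bit value passed in one user
 * SGPR, given the (assumed known) upper 32 bits of the address range. */
static uint64_t shader_addr_from_lo(uint32_t addr_lo, uint32_t addr_hi)
{
   return ((uint64_t)addr_hi << 32) | addr_lo;
}
```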
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37133>
2025-09-03 05:53:41 +00:00
Marek Olšák
4c87d002e3
aco,radeonsi: expand 32-bit shader arg pointers to 64 bits for ACO
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37101>
2025-08-30 15:04:32 -04:00
Marek Olšák
7d5288b5b7
aco: check that global addresses are 64bit, apply_nuw_to_ssa to global_amd/smem
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37101>
2025-08-30 15:04:32 -04:00
Georg Lehmann
38e32e39a9
aco: never end wqm early for vmem
The remaining cases where disable_wqm isn't set are either uniform loads
or loads that influence control flow. In the first case, not ending WQM early
is free, and in the second case it's likely still better to not block scheduling.
Foz-DB GFX1201:
Totals from 483 (0.60% of 80287) affected shaders:
MaxWaves: 12654 -> 12642 (-0.09%)
Instrs: 485234 -> 484830 (-0.08%); split: -0.19%, +0.11%
CodeSize: 2630876 -> 2629184 (-0.06%); split: -0.15%, +0.08%
VGPRs: 29980 -> 30004 (+0.08%)
Latency: 4908015 -> 4813167 (-1.93%); split: -1.95%, +0.02%
InvThroughput: 751059 -> 748582 (-0.33%); split: -0.35%, +0.02%
VClause: 8723 -> 8705 (-0.21%); split: -0.30%, +0.09%
SClause: 11085 -> 10986 (-0.89%); split: -1.45%, +0.56%
Copies: 25155 -> 25183 (+0.11%); split: -0.26%, +0.37%
Branches: 6203 -> 6204 (+0.02%)
PreSGPRs: 23763 -> 23780 (+0.07%)
VALU: 296576 -> 296593 (+0.01%); split: -0.01%, +0.02%
SALU: 49095 -> 49416 (+0.65%); split: -0.04%, +0.69%
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36785>
2025-08-28 06:29:04 +00:00
Georg Lehmann
3d190f2e9c
aco: implement skip_helpers for load_global_amd
Foz-DB GFX1201:
Totals from 119 (0.15% of 80287) affected shaders:
Instrs: 212449 -> 213452 (+0.47%)
CodeSize: 1120656 -> 1124708 (+0.36%)
Latency: 2854370 -> 2855772 (+0.05%); split: -0.02%, +0.07%
InvThroughput: 586142 -> 586210 (+0.01%); split: -0.00%, +0.01%
VClause: 3556 -> 3656 (+2.81%)
SClause: 2708 -> 2710 (+0.07%)
Copies: 14410 -> 14509 (+0.69%)
PreSGPRs: 6810 -> 6850 (+0.59%); split: -0.12%, +0.70%
VALU: 135945 -> 135942 (-0.00%); split: -0.01%, +0.01%
SALU: 22147 -> 23121 (+4.40%)
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36785>
2025-08-28 06:29:04 +00:00
Georg Lehmann
ee7069f875
aco: implement skip_helpers for load_scratch
Foz-DB GFX1201:
Totals from 2 (0.00% of 80287) affected shaders:
Instrs: 4016 -> 4054 (+0.95%)
CodeSize: 22104 -> 22256 (+0.69%)
Latency: 17123 -> 17129 (+0.04%)
Copies: 406 -> 415 (+2.22%)
SALU: 323 -> 353 (+9.29%)
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36785>
2025-08-28 06:29:04 +00:00
Georg Lehmann
2bfd8918a5
aco: implement skip_helpers for load_ssbo/ubo/constant
Foz-DB GFX1201:
Totals from 6676 (8.32% of 80287) affected shaders:
Instrs: 8786161 -> 8829091 (+0.49%); split: -0.01%, +0.50%
CodeSize: 47141800 -> 47320480 (+0.38%); split: -0.01%, +0.39%
VGPRs: 376624 -> 376600 (-0.01%)
SpillSGPRs: 1251 -> 1250 (-0.08%)
Latency: 99716626 -> 99642361 (-0.07%); split: -0.11%, +0.04%
InvThroughput: 14893179 -> 14898323 (+0.03%); split: -0.01%, +0.04%
VClause: 149425 -> 153539 (+2.75%); split: -0.04%, +2.79%
SClause: 251247 -> 251842 (+0.24%); split: -0.06%, +0.30%
Copies: 580304 -> 586424 (+1.05%); split: -0.21%, +1.26%
Branches: 163014 -> 163013 (-0.00%); split: -0.00%, +0.00%
PreSGPRs: 356548 -> 357109 (+0.16%); split: -0.18%, +0.33%
VALU: 5149733 -> 5149797 (+0.00%); split: -0.00%, +0.00%
SALU: 1082176 -> 1122718 (+3.75%); split: -0.06%, +3.80%
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36785>
2025-08-28 06:29:03 +00:00
Georg Lehmann
bdae511b18
aco: implement skip_helpers for image loads
Foz-DB GFX1201:
Totals from 5 (0.01% of 80287) affected shaders:
Instrs: 1406 -> 1417 (+0.78%)
CodeSize: 8012 -> 8056 (+0.55%)
Latency: 7279 -> 7282 (+0.04%)
Copies: 84 -> 85 (+1.19%)
SALU: 170 -> 180 (+5.88%)
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36785>
2025-08-28 06:29:02 +00:00
Georg Lehmann
bf453a7c6a
aco/isel: add init_disable_wqm helper
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36785>
2025-08-28 06:29:01 +00:00
Konstantin Seurer
9df7b48d2f
nir: Use nir_def_as_* in more places
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36746>
2025-08-24 14:03:09 +00:00
Marek Olšák
3aadae22ad
nir: make nir_block::predecessors & dom_frontier sets non-malloc'd
We can just place the set structures inside nir_block.
This reduces the number of ralloc calls by 6.7% when compiling Heaven
shaders with radeonsi+ACO using a release build (i.e. not including
nir_validate set allocations, which are also removed).
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36728>
2025-08-21 06:13:48 +00:00
Georg Lehmann
639b91bb48
aco/isel: fix vectorized i2i16 with 8bit vec8 source
The extract index is in dwords, not bytes.
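To see why the index unit matters: an 8-bit vec8 occupies two dwords, so elements 4..7 live in dword 1, not "dword 4". The sketch below sign-extends a pair of adjacent 8-bit elements to 16 bits and packs them into one dword, roughly what a vectorized i2i16 produces. The packed-vector layout (little-endian, element 0 in the lowest byte) and all names are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Portable 8-bit to 16-bit sign extension. */
static int16_t sext8(uint32_t v)
{
   v &= 0xffu;
   return (int16_t)((int32_t)v - ((v & 0x80u) ? 0x100 : 0));
}

/* Sign-extend 8-bit elements `elem` and `elem + 1` of a packed u8 vector to
 * 16 bits each and pack them into one dword. The array index is in dwords
 * (elem / 4); the byte position inside that dword only affects the shift. */
static uint32_t i2i16_pair(const uint32_t *packed, unsigned elem)
{
   assert(elem % 2 == 0);
   uint32_t dword = packed[elem / 4];
   unsigned shift = (elem % 4) * 8;
   int16_t lo = sext8(dword >> shift);
   int16_t hi = sext8(dword >> (shift + 8));
   return (uint32_t)(uint16_t)lo | ((uint32_t)(uint16_t)hi << 16);
}
```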
Fixes: 92d433c54a ("aco: vectorize conversions from 8bit to 16bit")
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36869>
2025-08-20 10:13:22 +00:00
Daniel Schürmann
7e63251d1f
aco/isel: refactor store_shared() by directly matching NIR intrinsics to ACO opcodes
Totals from 1435 (1.80% of 79839) affected shaders: (Navi48)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36133>
2025-08-19 14:28:15 +00:00
Daniel Schürmann
1fde289539
aco/isel: refactor load_shared() by directly matching NIR intrinsics to ACO opcodes
Totals from 3 (0.00% of 79839) affected shaders: (Navi48)
Instrs: 700 -> 698 (-0.29%)
CodeSize: 3860 -> 3852 (-0.21%)
Latency: 2351 -> 2349 (-0.09%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36133>
2025-08-19 14:28:15 +00:00
Daniel Schürmann
4632ee4c37
aco/isel: rename emit_readfirstlane() -> emit_vector_as_uniform()
Also allow using p_as_uniform and improve vector splitting.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36133>
2025-08-19 14:28:14 +00:00
Daniel Schürmann
26595577b3
aco/isel: allow for large 8-bit vectors in extract_8_16_bit_sgpr_element()
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36133>
2025-08-19 14:28:14 +00:00
Georg Lehmann
9ed94371f7
amd: stop using custom gl_access_qualifier for access type
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36764>
2025-08-15 08:26:10 +00:00
Georg Lehmann
f17cb6b714
amd: replace ACCESS_TYPE_SMEM with ACCESS_SMEM_AMD
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36764>
2025-08-15 08:26:10 +00:00
Georg Lehmann
fc53cf146c
aco: disable wqm for sampled buffer loads when not needed
Foz-DB GFX1201:
Totals from 318 (0.40% of 80287) affected shaders:
Instrs: 313039 -> 314064 (+0.33%); split: -0.00%, +0.33%
CodeSize: 1684104 -> 1688212 (+0.24%); split: -0.00%, +0.24%
VGPRs: 15120 -> 15144 (+0.16%)
Latency: 2515023 -> 2518610 (+0.14%); split: -0.06%, +0.20%
InvThroughput: 447468 -> 447615 (+0.03%); split: -0.02%, +0.05%
VClause: 4866 -> 4914 (+0.99%)
SClause: 6564 -> 6559 (-0.08%); split: -0.09%, +0.02%
Copies: 23577 -> 23673 (+0.41%); split: -0.04%, +0.45%
PreSGPRs: 16019 -> 16029 (+0.06%)
VALU: 172157 -> 172143 (-0.01%)
SALU: 52816 -> 53867 (+1.99%)
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35970>
2025-08-15 07:03:47 +00:00
Georg Lehmann
883b1ca364
aco: disable wqm for tex loads when not needed
By only executing VMEM loads for lanes where the result is used, we can save
bandwidth.
The NIR pass only handles tex for now, but those are the most common anyway.
We can extend it to handle image/ssbo/ubo/global loads in the future.
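The bandwidth saving comes from the gap between the whole-quad-mode (WQM) mask and the exact mask: in WQM, every lane of a 2x2 quad runs if any lane of that quad is active, so the extra helper lanes also issue loads. A minimal sketch of that mask computation, similar in spirit to what the s_wqm instruction computes, assuming wave64 (one bit per lane, 16 quads of 4 consecutive lanes); the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* WQM mask: a quad (4 consecutive lanes) is fully enabled if any of its
 * lanes is set in `exec`. Assumes wave64. */
static uint64_t wqm_mask(uint64_t exec)
{
   uint64_t wqm = 0;
   for (unsigned quad = 0; quad < 16; quad++) {
      uint64_t quad_bits = 0xfull << (quad * 4);
      if (exec & quad_bits)
         wqm |= quad_bits;
   }
   return wqm;
}

/* Helper lanes: enabled only because of WQM, not because they are needed. */
static uint64_t helper_lanes(uint64_t exec)
{
   return wqm_mask(exec) & ~exec;
}
```

With lanes 0 and 5 active, WQM enables lanes 0-7, and the six helper lanes are exactly the loads that running with the exact mask would skip.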
Foz-DB GFX1201:
Totals from 32633 (40.66% of 80251) affected shaders:
Instrs: 22635910 -> 23193509 (+2.46%); split: -0.00%, +2.46%
CodeSize: 122880044 -> 125093428 (+1.80%); split: -0.00%, +1.81%
VGPRs: 1481868 -> 1481712 (-0.01%)
SpillSGPRs: 3877 -> 4301 (+10.94%); split: -0.52%, +11.45%
Latency: 171480552 -> 171685219 (+0.12%); split: -0.18%, +0.30%
InvThroughput: 24364743 -> 24373441 (+0.04%); split: -0.08%, +0.12%
VClause: 388318 -> 388557 (+0.06%); split: -0.06%, +0.13%
SClause: 774781 -> 776492 (+0.22%); split: -0.29%, +0.51%
Copies: 1416586 -> 1541199 (+8.80%); split: -0.16%, +8.96%
Branches: 419591 -> 419673 (+0.02%); split: -0.02%, +0.04%
PreSGPRs: 1330303 -> 1416540 (+6.48%)
PreVGPRs: 964864 -> 964863 (-0.00%)
VALU: 12919601 -> 12920254 (+0.01%); split: -0.01%, +0.01%
SALU: 2685402 -> 3224147 (+20.06%); split: -0.00%, +20.07%
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35970>
2025-08-15 07:03:46 +00:00
Georg Lehmann
7159fd21f8
aco: don't restrict vmem load scheduling by inserting p_end_wqm early
Foz-DB GFX1201:
Totals from 7 (0.01% of 80251) affected shaders:
Instrs: 703 -> 729 (+3.70%)
CodeSize: 4032 -> 4136 (+2.58%)
Latency: 5840 -> 4715 (-19.26%)
InvThroughput: 441 -> 405 (-8.16%)
Copies: 61 -> 67 (+9.84%)
PreSGPRs: 216 -> 218 (+0.93%)
SALU: 93 -> 113 (+21.51%)
When reordered after the next commit:
Foz-DB GFX1201:
Totals from 1609 (2.00% of 80251) affected shaders:
MaxWaves: 47984 -> 47986 (+0.00%)
Instrs: 1326847 -> 1332797 (+0.45%); split: -0.05%, +0.50%
CodeSize: 7248720 -> 7275364 (+0.37%); split: -0.04%, +0.41%
VGPRs: 74968 -> 75148 (+0.24%); split: -0.06%, +0.30%
SpillSGPRs: 182 -> 184 (+1.10%)
Latency: 10370602 -> 10172524 (-1.91%); split: -2.06%, +0.15%
InvThroughput: 1446508 -> 1445920 (-0.04%); split: -0.11%, +0.06%
VClause: 23567 -> 23559 (-0.03%); split: -0.35%, +0.32%
SClause: 43143 -> 43203 (+0.14%); split: -0.52%, +0.66%
Copies: 80948 -> 81622 (+0.83%); split: -0.32%, +1.16%
Branches: 21599 -> 21727 (+0.59%)
PreSGPRs: 69963 -> 70732 (+1.10%)
VALU: 778968 -> 779024 (+0.01%); split: -0.02%, +0.03%
SALU: 159797 -> 165329 (+3.46%); split: -0.01%, +3.47%
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35970>
2025-08-15 07:03:46 +00:00
Georg Lehmann
c1b29174b4
aco: use a smaller wqm section for strict_wqm sampling
It's only important that the coordinate is created in WQM;
the sample itself doesn't care.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35970>
2025-08-15 07:03:46 +00:00
Georg Lehmann
11cee3d634
aco: use new disable_wqm for p_dual_src_export_gfx11
No Foz-DB changes on GFX1201.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35970>
2025-08-15 07:03:46 +00:00