Commit graph

145 commits

Author SHA1 Message Date
Natalie Vock
8bc5fdef53 aco: Remove unused p_reload_preserved def
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38281>
2025-12-08 19:12:52 +00:00
Marek Olšák
2c9995a94f ac/nir: move aco_nir_op_supports_packed_math_16bit here
aco_nir_op_supports_packed_math_16bit currently can't be used by amd/common
because tests don't link with ACO, so linking would fail, but we want
to move the nir_opt_vectorize callback here that uses it.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38603>
2025-11-28 20:16:10 +00:00
Georg Lehmann
0f7a1ce23e aco/optimizer: some more mul opts
Foz-DB Navi48:
Totals from 1650 (2.00% of 82419) affected shaders:
Instrs: 975716 -> 970609 (-0.52%); split: -0.53%, +0.00%
CodeSize: 4986260 -> 4982916 (-0.07%); split: -0.09%, +0.02%
Latency: 2795394 -> 2793211 (-0.08%); split: -0.09%, +0.01%
InvThroughput: 620892 -> 620914 (+0.00%); split: -0.00%, +0.01%
VClause: 18773 -> 18729 (-0.23%)
SClause: 13219 -> 13218 (-0.01%)
Copies: 53619 -> 53620 (+0.00%); split: -0.01%, +0.01%
VALU: 592094 -> 592096 (+0.00%); split: -0.00%, +0.00%
SALU: 96586 -> 93532 (-3.16%); split: -3.17%, +0.00%

Foz-DB Navi21:
Totals from 1647 (2.00% of 82387) affected shaders:
Instrs: 1104100 -> 1100149 (-0.36%); split: -0.36%, +0.00%
CodeSize: 5631092 -> 5637668 (+0.12%); split: -0.00%, +0.12%
Latency: 3503029 -> 3501621 (-0.04%); split: -0.05%, +0.01%
InvThroughput: 1088494 -> 1088495 (+0.00%); split: -0.00%, +0.00%
VClause: 20898 -> 20885 (-0.06%)
Copies: 72641 -> 72635 (-0.01%); split: -0.02%, +0.01%
VALU: 725593 -> 725592 (-0.00%); split: -0.00%, +0.00%
SALU: 139046 -> 135175 (-2.78%)

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38530>
2025-11-25 11:49:17 +00:00
Georg Lehmann
3a175b54a4 aco,nir: support subdword v_permlane_b16
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38389>
2025-11-17 23:33:59 +00:00
Marek Olšák
e372365cf4 nir: rename nir_copy_prop -> nir_opt_copy_prop
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38411>
2025-11-15 02:16:38 +00:00
Konstantin Seurer
de32f9275f treewide: add & use parent instr helpers
We add a bunch of new helpers to avoid the need to touch >parent_instr,
including the full set of:

* nir_def_is_*
* nir_def_as_*_or_null
* nir_def_as_* [assumes the right instr type]
* nir_src_is_*
* nir_src_as_*
* nir_scalar_is_*
* nir_scalar_as_*

Plus nir_def_instr() where there's no more suitable helper.

Also an existing helper is renamed to unify all the names, while we're
churning the tree:

* nir_src_as_alu_instr -> nir_src_as_alu

..and then we port the tree to use the helpers as much as possible, using
nir_def_instr() where that does not work.

Acked-by: Marek Olšák <maraeo@gmail.com>

---

To eliminate nir_def::parent_instr we need to churn the tree anyway, so I'm
taking this opportunity to clean up a lot of NIR patterns.

Co-authored-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38313>
2025-11-12 21:22:13 +00:00
Daniel Schürmann
5682e39e6b amd: enable load/store_shared2_amd for GFX6
Totals from 1509 (2.43% of 62200) affected shaders: (Pitcairn)

MaxWaves: 8078 -> 8057 (-0.26%); split: +0.09%, -0.35%
Instrs: 977182 -> 951746 (-2.60%); split: -2.62%, +0.02%
CodeSize: 4951468 -> 4758192 (-3.90%); split: -3.92%, +0.01%
SGPRs: 76704 -> 76696 (-0.01%)
VGPRs: 81092 -> 81068 (-0.03%); split: -0.34%, +0.31%
Latency: 11663237 -> 11526070 (-1.18%); split: -1.19%, +0.01%
InvThroughput: 6198904 -> 6114851 (-1.36%); split: -1.43%, +0.07%
VClause: 26656 -> 26655 (-0.00%); split: -0.05%, +0.05%
SClause: 22304 -> 22307 (+0.01%); split: -0.03%, +0.04%
Copies: 107503 -> 109564 (+1.92%); split: -0.23%, +2.15%
Branches: 22917 -> 22918 (+0.00%)
PreSGPRs: 42246 -> 42242 (-0.01%); split: -0.01%, +0.00%
PreVGPRs: 64561 -> 64761 (+0.31%); split: -0.01%, +0.32%
VALU: 600285 -> 601139 (+0.14%); split: -0.26%, +0.40%
SALU: 130622 -> 130851 (+0.18%); split: -0.16%, +0.33%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37682>
2025-11-11 17:12:17 +00:00
Natalie Vock
f0c613765c aco: Add preload_preserved pseudo instruction
These are helper instructions for the spill_preserved pass to insert
reloads for registers that are preserved by the ABI, yet
clobbered by the callee shader.

There is one p_reload_preserved instruction at the end of each block.
This allows us to insert reloads early, to alleviate the high latency of
scratch reloads.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37381>
2025-11-06 12:09:39 +00:00
Samuel Pitoiset
a0d607bfdb radv,aco: wait for all VMEM loads when the prolog loads large 64-bit attributes
Not the most optimal solution but 64-bit vertex attributes are rarely
used. Could still revisit if we find a real use case that matters.

This fixes recent VKCTS coverage:

dEQP-VK.pipeline.fast_linked_library.vertex_input.component_mismatch.r64g64b64.*_to_dvec2
dEQP-VK.pipeline.shader_object_.*.vertex_input.component_mismatch.r64g64b64.*_to_dvec2

Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14243
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38237>
2025-11-05 07:26:45 +00:00
Samuel Pitoiset
ba5bf81aa2 aco: fix reserving VGPRs for 64-bit attributes in VS prologs
Otherwise the fetch index would be overwritten if the attribute format
is 64-bit and more than 2 components are loaded.

Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14242
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38237>
2025-11-05 07:26:45 +00:00
Georg Lehmann
0f54136730 aco/isel: emit vop2 v_lshlrev_b64 for gfx12+
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38156>
2025-10-31 08:31:03 +00:00
Georg Lehmann
7ac67e2711 aco/isel: emit vop2 v_max_f64 for gfx12+
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38156>
2025-10-31 08:31:03 +00:00
Georg Lehmann
8397b91934 aco/isel: emit vop2 v_min_f64 for gfx12+
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38156>
2025-10-31 08:31:02 +00:00
Georg Lehmann
2e120d4e26 aco/isel: emit vop2 v_mul_f64 for gfx12+
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38156>
2025-10-31 08:31:01 +00:00
Georg Lehmann
86ea462f4d aco/isel: emit vop2 v_fadd_f64 for gfx12+
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38156>
2025-10-31 08:31:01 +00:00
Georg Lehmann
0c8b885e21 aco/isel: emit v_mul_f64 for fp64 fsat
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38011>
2025-10-29 17:57:52 +00:00
Georg Lehmann
9ece74ce79 aco/isel: emit v_mul_f64 with modifiers for fneg/fabs
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38011>
2025-10-29 17:57:52 +00:00
Konstantin Seurer
47ffe2ecd4 aco: Fixup out_launch_size_y in the RT prolog for 1D dispatch
launch_size_y is set to ACO_RT_CONVERTED_2D_LAUNCH_SIZE for 1D
dispatches. The prolog needs to set it to 1 so that the app shader
loads the correct value.

cc: mesa-stable

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37974>
2025-10-23 07:56:35 +00:00
Daniel Schürmann
eecd1c020d amd: keep ac_shader_config::lds_size unaligned
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37577>
2025-10-15 11:20:09 +00:00
Daniel Schürmann
fe6ff6d1ef aco: remove DeviceInfo::lds_encoding_granule and DeviceInfo::lds_alloc_granule
Use utility functions instead.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37577>
2025-10-15 11:20:08 +00:00
Daniel Schürmann
11db02d5d9 radv: calculate LDS allocation requirements independently from the compiler
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37577>
2025-10-15 11:20:07 +00:00
Daniel Schürmann
b651234414 amd: change ac_shader_config::lds_size to bytes
We still keep it aligned to allocation granularity.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37577>
2025-10-15 11:20:07 +00:00
Daniel Schürmann
d0b87a0d5f ac/nir_flag_smem_for_loads: call divergence analysis internally
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Also don't flag more SMEM instructions (in ACO) after the last
call to ac_nir_lower_mem_access_bit_sizes().

Totals from 75 (0.09% of 79839) affected shaders: (Navi48)

Instrs: 191246 -> 189960 (-0.67%)
CodeSize: 996840 -> 985976 (-1.09%)
Latency: 3066184 -> 2945500 (-3.94%)
InvThroughput: 355373 -> 353106 (-0.64%); split: -0.66%, +0.02%
SClause: 4848 -> 4699 (-3.07%)
Copies: 13827 -> 13925 (+0.71%); split: -0.07%, +0.78%
Branches: 5176 -> 5003 (-3.34%)
PreSGPRs: 6222 -> 6272 (+0.80%)
VALU: 108934 -> 108993 (+0.05%); split: -0.00%, +0.06%
SALU: 31679 -> 31210 (-1.48%); split: -1.51%, +0.03%
SMEM: 7158 -> 6739 (-5.85%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37843>
2025-10-14 16:33:12 +00:00
Daniel Schürmann
8ff44f17ef amd/lower_mem_access_bit_sizes: also use SMEM for subdword loads
We can simply extract from the loaded dwords as per
nir_lower_mem_access_bit_sizes() lowering.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37843>
2025-10-14 16:33:11 +00:00
Samuel Pitoiset
bc32286e5b radv: declare a new user SGPR for dynamic descriptors
To move them out of push constants.

fossils-db (GFX1201):
Totals from 20700 (25.99% of 79646) affected shaders:
Instrs: 14375624 -> 14370051 (-0.04%); split: -0.07%, +0.03%
CodeSize: 76746128 -> 76723772 (-0.03%); split: -0.05%, +0.02%
Latency: 74103586 -> 74113651 (+0.01%); split: -0.01%, +0.02%
InvThroughput: 11908817 -> 11908798 (-0.00%); split: -0.00%, +0.00%
VClause: 249605 -> 249607 (+0.00%); split: -0.00%, +0.00%
SClause: 337914 -> 337772 (-0.04%); split: -0.08%, +0.04%
Copies: 843585 -> 839233 (-0.52%); split: -0.62%, +0.10%
PreSGPRs: 836283 -> 837260 (+0.12%)
SALU: 1790713 -> 1786374 (-0.24%); split: -0.29%, +0.05%

Co-authored-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37768>
2025-10-14 15:34:43 +00:00
Georg Lehmann
58163f65f0 aco/optimizer: rework packed fneg opt
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35272>
2025-10-14 08:33:40 +00:00
Georg Lehmann
6eac72088c aco/gfx10+: only work around split execution of uniform LDS in WGP mode
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
LDS instructions from one CU won't split the execution of other LDS instruction
on the same CU.

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31630>
2025-10-13 10:22:22 +00:00
Georg Lehmann
c13caa5e5f aco: fix global_atomic_swap offset overflow check
Fixes: d7dcd81c77 ("aco/gfx6: allow both constant and gpr offset for global with sgpr address")

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37821>
2025-10-13 09:41:41 +00:00
Marek Olšák
3fe651f607 nir: remove load_smem_amd
replaced by load_global_amd + ACCESS_SMEM_AMD

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36936>
2025-10-08 08:54:11 +00:00
Rhys Perry
20af16b4d8 aco: use MTBUF for 64-bit atomic load/store
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
A 64-bit atomic load/store should be considered entirely out-of-bounds if
any part of it is out-of-bounds. Since we implemented these as 32-bit vec2
load/store, it would have been possible for the first half to be in-bounds
while the second half is out-of-bounds.

From 9.6.1. Robust Buffer Access of Vulkan 1.4.324 specification:
> Any non-atomic access to a uniform, storage, uniform texel, or storage
> texel buffer wider than 32-bits may be treated as multiple 32-bit
> accesses that are separately bounds checked.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36602>
2025-10-07 17:41:31 +00:00
Rhys Perry
f905acfada aco: remove barrier acquire/release workaround
This existed since ccfe9813fb because NIR
had no atomic loads/stores. This is no longer the case.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36602>
2025-10-07 17:41:31 +00:00
Rhys Perry
271b135b03 aco: set atomic semantic for atomic load/store
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36602>
2025-10-07 17:41:30 +00:00
Rhys Perry
74b807cf58 aco: only workaround load tearing for atomic loads
For non-atomic loads, this situation would require a data race.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36602>
2025-10-07 17:41:30 +00:00
Georg Lehmann
d514696a0c aco/isel: support nir_op_atomic_isub
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37702>
2025-10-07 14:07:56 +00:00
Georg Lehmann
cf30742a66 radv,aco: don't end monolithic ray tracing with unconditional terminate
The terminate requires more code and blocks us from deallocating VGPRs early.

Foz-DB Navi31:
Totals from 63 (0.08% of 80273) affected shaders:
Instrs: 3372702 -> 3372467 (-0.01%)
CodeSize: 17441676 -> 17440736 (-0.01%)
Latency: 19763447 -> 19763288 (-0.00%)
InvThroughput: 3860502 -> 3860478 (-0.00%)
Branches: 96204 -> 96141 (-0.07%)
SALU: 406648 -> 406549 (-0.02%)

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37542>
2025-09-25 15:35:55 +00:00
Daniel Schürmann
d041640b88 aco: remove excess offset handling for load/store_shared
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37453>
2025-09-24 14:28:25 +00:00
Rhys Perry
d6ed68212c aco: fix SGPR 8-bit nir_op_vec with mixed constant and non-constant
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
For example, vec2(non_const, const)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: 04e3d7ad93 ("aco: improve nir_op_vec with constant operands")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13911
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37405>
2025-09-18 12:37:19 +00:00
Rhys Perry
8931672eef aco: workaround load tearing for load_shared2_amd
This probably has the same issue as load_shared.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: 04956d54ce ("aco: force uniform result for LDS load with uniform address if it can be non uniform")
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37417>
2025-09-17 11:29:21 +00:00
Rhys Perry
81df517553 aco: avoid unaligned offsets when selecting load_global_amd
SMEM instructions mask off the low bits for the base and offset sources
both before and after they're added. However, NIR expects ACO to only
care about the alignment of the final address.

fossil-db (gfx1201):
Totals from 21 (0.03% of 79839) affected shaders:
Instrs: 229780 -> 229876 (+0.04%)
CodeSize: 1267724 -> 1268080 (+0.03%)
Latency: 2800924 -> 2800978 (+0.00%)
InvThroughput: 520250 -> 520256 (+0.00%)
Copies: 27878 -> 27876 (-0.01%); split: -0.01%, +0.00%
SALU: 29591 -> 29643 (+0.18%)

fossil-db (polaris10):
Totals from 3 (0.00% of 62201) affected shaders:
Latency: 2651 -> 2652 (+0.04%)
InvThroughput: 662 -> 663 (+0.15%)
PreSGPRs: 51 -> 54 (+5.88%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37301>
2025-09-17 09:15:46 +00:00
Rhys Perry
6d71521ecd aco: avoid wraparound for smem global loads with both offsets
fossil-db (gfx1201):
Totals from 296 (0.37% of 79839) affected shaders:
Instrs: 382593 -> 380149 (-0.64%)
CodeSize: 1981452 -> 1970988 (-0.53%); split: -0.53%, +0.00%
Latency: 1575286 -> 1574252 (-0.07%)
InvThroughput: 215839 -> 215818 (-0.01%)
SClause: 8679 -> 8677 (-0.02%); split: -0.03%, +0.01%
Copies: 19642 -> 19641 (-0.01%); split: -0.03%, +0.02%
PreSGPRs: 14521 -> 14515 (-0.04%)
SALU: 57097 -> 55718 (-2.42%)

fossil-db (polaris10):
Totals from 30 (0.05% of 62201) affected shaders:
Instrs: 23341 -> 23379 (+0.16%); split: -0.01%, +0.18%
CodeSize: 121316 -> 121516 (+0.16%); split: -0.01%, +0.17%
SGPRs: 2368 -> 2384 (+0.68%)
Latency: 235153 -> 235374 (+0.09%); split: -0.01%, +0.11%
InvThroughput: 92582 -> 92566 (-0.02%)
SClause: 616 -> 619 (+0.49%)
Copies: 2717 -> 2720 (+0.11%)
PreSGPRs: 1204 -> 1213 (+0.75%)
SALU: 3654 -> 3692 (+1.04%); split: -0.08%, +1.12%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Backport-to: 25.2
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37301>
2025-09-17 09:15:46 +00:00
Georg Lehmann
714a149396 nir: remove unsigned upper bound config
All config information is now either in nir->info or nir->options.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37361>
2025-09-16 09:24:04 +00:00
Georg Lehmann
bb67dae12d nir/uub: remove max_workgroup_size from config
For most hardware, this is the same as max invocations in the workgroup.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37361>
2025-09-16 09:24:04 +00:00
Georg Lehmann
f3c08c9d27 nir/uub: use shader_info subgroup size
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37361>
2025-09-16 09:24:04 +00:00
Georg Lehmann
d029686e20 aco/isel: fix output args init stack buffer overflow
BITSET range functions include the end of the range.

Fixes: eb249bb18e ("aco: Only fix used variables to registers")
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37361>
2025-09-16 09:24:03 +00:00
Natalie Vock
3667a7b687 aco: Add call info
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34531>
2025-09-15 17:16:20 +00:00
Samuel Pitoiset
decf9af472 radv/rt: only use one user SGPR for the traversal shader addr
All shaders are allocated in the 32-bit addr space. To avoid an issue
with alignment, and also for future work, there is an unused user SGPR.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37133>
2025-09-03 05:53:41 +00:00
Marek Olšák
4c87d002e3 aco,radeonsi: expand 32-bit shader arg pointers to 64 bits for ACO
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37101>
2025-08-30 15:04:32 -04:00
Marek Olšák
7d5288b5b7 aco: check that global addresses are 64bit, apply_nuw_to_ssa to global_amd/smem
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37101>
2025-08-30 15:04:32 -04:00
Georg Lehmann
38e32e39a9 aco: never end wqm early for vmem
The remaining cases where disable_wqm isn't set are either uniform loads
or loads that influence control flow. In the first case, not ending WQM early
is free, and in the second case it's likely still better to not block scheduling.

Foz-DB GFX1201:
Totals from 483 (0.60% of 80287) affected shaders:
MaxWaves: 12654 -> 12642 (-0.09%)
Instrs: 485234 -> 484830 (-0.08%); split: -0.19%, +0.11%
CodeSize: 2630876 -> 2629184 (-0.06%); split: -0.15%, +0.08%
VGPRs: 29980 -> 30004 (+0.08%)
Latency: 4908015 -> 4813167 (-1.93%); split: -1.95%, +0.02%
InvThroughput: 751059 -> 748582 (-0.33%); split: -0.35%, +0.02%
VClause: 8723 -> 8705 (-0.21%); split: -0.30%, +0.09%
SClause: 11085 -> 10986 (-0.89%); split: -1.45%, +0.56%
Copies: 25155 -> 25183 (+0.11%); split: -0.26%, +0.37%
Branches: 6203 -> 6204 (+0.02%)
PreSGPRs: 23763 -> 23780 (+0.07%)
VALU: 296576 -> 296593 (+0.01%); split: -0.01%, +0.02%
SALU: 49095 -> 49416 (+0.65%); split: -0.04%, +0.69%

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36785>
2025-08-28 06:29:04 +00:00
Georg Lehmann
3d190f2e9c aco: implement skip_helpers for load_global_amd
Foz-DB GFX1201:
Totals from 119 (0.15% of 80287) affected shaders:
Instrs: 212449 -> 213452 (+0.47%)
CodeSize: 1120656 -> 1124708 (+0.36%)
Latency: 2854370 -> 2855772 (+0.05%); split: -0.02%, +0.07%
InvThroughput: 586142 -> 586210 (+0.01%); split: -0.00%, +0.01%
VClause: 3556 -> 3656 (+2.81%)
SClause: 2708 -> 2710 (+0.07%)
Copies: 14410 -> 14509 (+0.69%)
PreSGPRs: 6810 -> 6850 (+0.59%); split: -0.12%, +0.70%
VALU: 135945 -> 135942 (-0.00%); split: -0.01%, +0.01%
SALU: 22147 -> 23121 (+4.40%)

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36785>
2025-08-28 06:29:04 +00:00