Commit graph

1331 commits

Author SHA1 Message Date
Rhys Perry
208d62430f aco/gfx12: use s_load_dwordx3 to load ray launch sizes
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34162>
2025-05-08 13:30:49 +00:00
Rhys Perry
cbd718506b aco: add smem opcode helper
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34162>
2025-05-08 13:30:49 +00:00
Rhys Perry
752f5f317e aco: replace max_const_offset_plus_one with max_const_offset
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34766>
2025-05-01 09:19:02 +00:00
Rhys Perry
a85ebe16b3 aco: fix max_const_offset_plus_one overflow
smem_offset_max is UINT32_MAX on GFX7 and setting
max_const_offset_plus_one to 0 causes divisions by zero later.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: c26851b80b ("aco: increase max_const_offset_plus_one for SMEM load_global")
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Tested-by: Michel Dänzer <mdaenzer@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34766>
2025-05-01 09:19:02 +00:00
Rhys Perry
6338ed44c5 aco/gfx12: increase maximum vbuffer offset
fossil-db (gfx1201):
Totals from 301 (0.38% of 79377) affected shaders:
Instrs: 2734478 -> 2728816 (-0.21%); split: -0.21%, +0.00%
CodeSize: 14347476 -> 14306568 (-0.29%)
Latency: 15508055 -> 15502202 (-0.04%); split: -0.04%, +0.00%
InvThroughput: 2846419 -> 2842387 (-0.14%); split: -0.14%, +0.00%
VClause: 68286 -> 68101 (-0.27%); split: -0.30%, +0.03%
SClause: 49487 -> 49500 (+0.03%)
Copies: 207179 -> 206093 (-0.52%); split: -0.57%, +0.04%
Branches: 72941 -> 72942 (+0.00%); split: -0.00%, +0.00%
VALU: 1549156 -> 1544727 (-0.29%); split: -0.29%, +0.00%
SALU: 339620 -> 338989 (-0.19%); split: -0.19%, +0.00%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34730>
2025-04-29 17:44:41 +00:00
Rhys Perry
c26851b80b aco: increase max_const_offset_plus_one for SMEM load_global
fossil-db (gfx1201):
Totals from 1115 (1.40% of 79377) affected shaders:
Instrs: 1473805 -> 1467571 (-0.42%); split: -0.43%, +0.01%
CodeSize: 7852972 -> 7819656 (-0.42%); split: -0.44%, +0.02%
SpillSGPRs: 1632 -> 1460 (-10.54%); split: -11.27%, +0.74%
Latency: 11975762 -> 11971915 (-0.03%); split: -0.05%, +0.02%
InvThroughput: 2496961 -> 2496448 (-0.02%); split: -0.03%, +0.01%
VClause: 25213 -> 25218 (+0.02%); split: -0.00%, +0.02%
SClause: 28822 -> 28565 (-0.89%); split: -1.41%, +0.52%
Copies: 106377 -> 105715 (-0.62%); split: -1.23%, +0.61%
Branches: 27497 -> 27473 (-0.09%)
PreSGPRs: 52071 -> 51310 (-1.46%)
VALU: 871051 -> 870694 (-0.04%); split: -0.04%, +0.00%
SALU: 186090 -> 181811 (-2.30%); split: -2.32%, +0.02%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34730>
2025-04-29 17:44:41 +00:00
Rhys Perry
f390893a64 aco/gfx12: use s_sub_u64
fossil-db (gfx1201):
Totals from 2 (0.00% of 79377) affected shaders:
Instrs: 243999 -> 243993 (-0.00%)
CodeSize: 1288176 -> 1288152 (-0.00%)
Latency: 1894093 -> 1894091 (-0.00%)
InvThroughput: 378819 -> 378818 (-0.00%)
SALU: 33048 -> 33044 (-0.01%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34730>
2025-04-29 17:44:41 +00:00
Rhys Perry
5b4813c4f0 aco/gfx12: use s_add_u64
fossil-db (gfx1201):
Totals from 122 (0.15% of 79377) affected shaders:
Instrs: 3640138 -> 3637577 (-0.07%); split: -0.07%, +0.00%
CodeSize: 19133796 -> 19120080 (-0.07%); split: -0.08%, +0.01%
SpillSGPRs: 666 -> 650 (-2.40%)
SpillVGPRs: 2147 -> 2159 (+0.56%)
Scratch: 254208 -> 255232 (+0.40%)
Latency: 21529337 -> 21522317 (-0.03%); split: -0.04%, +0.00%
InvThroughput: 4048519 -> 4047233 (-0.03%); split: -0.03%, +0.00%
VClause: 90453 -> 90455 (+0.00%)
SClause: 67846 -> 67674 (-0.25%); split: -0.28%, +0.03%
Copies: 287449 -> 287476 (+0.01%); split: -0.04%, +0.05%
Branches: 104526 -> 104530 (+0.00%); split: -0.00%, +0.01%
PreSGPRs: 9795 -> 9723 (-0.74%); split: -0.78%, +0.04%
VALU: 2004219 -> 2003031 (-0.06%); split: -0.06%, +0.00%
SALU: 492651 -> 491737 (-0.19%); split: -0.19%, +0.00%
VMEM: 161317 -> 161341 (+0.01%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34730>
2025-04-29 17:44:41 +00:00
Georg Lehmann
8f3489f351 aco/isel: create WMMA with constant C matrix if possible
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34396>
2025-04-22 16:08:57 +00:00
Georg Lehmann
6d7e67d986 nir,amd: add neg_lo/hi modifiers to cmat_matmul_amd
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34396>
2025-04-22 16:08:55 +00:00
Georg Lehmann
b0c8f31600 aco: set opsel_hi to 1 for WMMA
This is ignored by the hardware but LLVM requires it to disassemble GFX12 WMMA.

Cc: mesa-stable
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34396>
2025-04-22 16:08:54 +00:00
Konstantin Seurer
978e9b670e aco,nir: Add support for new GFX12 ray tracing instructions
Adds image_bvh_dual_intersect_ray and image_bvh8_intersect_ray which can
handle the new BVH format. Both instructions write up to 10 VGPRs so
they need to use a vec16 definition in nir.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34273>
2025-04-17 20:20:40 +00:00
Natalie Vock
b9e506afd4 aco: Add support for multiple definitions in emit_mimg
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34273>
2025-04-17 20:20:40 +00:00
Natalie Vock
3d8db3cbbb aco: Make private_segment_buffer/scratch_offset per-resume
We need different Temps for each resume shader, because registers aren't
preserved across resume boundaries.

This was likely fine in practice because arg registers are the same for
each shader, but resulted in invalid IR and asserts.

Fixes crashes in Indiana Jones RT with assertions enabled on GFX8.

Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34114>
2025-04-09 14:21:37 +00:00
Georg Lehmann
c70dcd1451 aco/gfx9+: use d16 global/scratch/buffer loads
Full register loads are not nessecary and prevent packing optimizations.
Global/Scratch is GFX9+ so D16 loads are always supported.
We already used LDS D16 loads.

Foz-DB Navi31(mostly RA noise):
Totals from 716 (0.90% of 79789) affected shaders:
Instrs: 3854176 -> 3854238 (+0.00%); split: -0.00%, +0.00%
CodeSize: 20034440 -> 20035220 (+0.00%); split: -0.00%, +0.00%
Latency: 24410951 -> 24411120 (+0.00%)
InvThroughput: 5181276 -> 5181301 (+0.00%)
Copies: 320258 -> 320317 (+0.02%)
VALU: 2207307 -> 2207366 (+0.00%)

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34346>
2025-04-04 16:20:39 +00:00
Georg Lehmann
7631b10984 aco: implement mul24_relaxed
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33871>
2025-03-27 06:24:16 +00:00
Daniel Schürmann
afc605bc9b aco: Remove empty exec skipping after demote
Totals from 858 (1.08% of 79377) affected shaders: (Navi31)

Instrs: 678713 -> 677694 (-0.15%); split: -0.15%, +0.00%
CodeSize: 3732576 -> 3729104 (-0.09%); split: -0.10%, +0.01%
Latency: 4199397 -> 4198632 (-0.02%); split: -0.06%, +0.04%
InvThroughput: 691391 -> 691122 (-0.04%); split: -0.04%, +0.00%
SClause: 14593 -> 14605 (+0.08%)
Copies: 41279 -> 41288 (+0.02%); split: -0.04%, +0.06%
Branches: 13575 -> 13452 (-0.91%)
PreSGPRs: 29069 -> 29039 (-0.10%)
VALU: 426261 -> 426215 (-0.01%); split: -0.01%, +0.00%
SALU: 60458 -> 60471 (+0.02%); split: -0.02%, +0.04%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33619>
2025-03-26 08:45:12 +00:00
Daniel Schürmann
90faadae72 aco/insert_exec_mask: don't disable dead quads on demote in divergent CF
Also force-enalbe helpers in case of demote in divergent CF.

Totals from 1305 (1.64% of 79377) affected shaders: (Navi31)

Instrs: 926923 -> 922516 (-0.48%); split: -0.48%, +0.00%
CodeSize: 5045292 -> 5027408 (-0.35%); split: -0.36%, +0.00%
Latency: 6176577 -> 6174708 (-0.03%); split: -0.03%, +0.00%
InvThroughput: 931603 -> 931583 (-0.00%); split: -0.00%, +0.00%
SClause: 22816 -> 22855 (+0.17%); split: -0.17%, +0.34%
Copies: 57347 -> 55170 (-3.80%); split: -3.81%, +0.01%
Branches: 18990 -> 18974 (-0.08%)
PreSGPRs: 42734 -> 43248 (+1.20%)
SALU: 90511 -> 86153 (-4.81%); split: -4.85%, +0.04%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33619>
2025-03-26 08:45:12 +00:00
Daniel Schürmann
69dcd5be3a aco: don't assume that demote doesn't cause an empty exec mask
Totals from 188 (0.24% of 79377) affected shaders: (Navi31)
Instrs: 209239 -> 209473 (+0.11%); split: -0.01%, +0.12%
CodeSize: 1101124 -> 1101744 (+0.06%); split: -0.02%, +0.07%
Latency: 1672182 -> 1672748 (+0.03%); split: -0.11%, +0.14%
InvThroughput: 237276 -> 237546 (+0.11%); split: -0.00%, +0.12%
SClause: 5694 -> 5690 (-0.07%); split: -0.28%, +0.21%
Copies: 21685 -> 21682 (-0.01%); split: -0.12%, +0.10%
Branches: 5740 -> 5863 (+2.14%)
PreSGPRs: 7004 -> 7034 (+0.43%)
VALU: 123595 -> 123641 (+0.04%); split: -0.00%, +0.04%
SALU: 28418 -> 28411 (-0.02%); split: -0.09%, +0.06%

Fixes: f35e229fae ('aco: skip code if exec is empty')
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33619>
2025-03-26 08:45:12 +00:00
Georg Lehmann
3b5e537b09 aco/gfx11.5: remove vinterp ddx/ddy path
While the idea to take advantage of the higher throughput wasn't bad,
the hardware wasn't design with this in mind and doesn't behave like expected
with constant sources.

Fixes: bee487df48 ("aco/gfx11.5+: use vinterp for fddx/fddy")
Acked-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33969>
2025-03-12 11:31:54 +00:00
Georg Lehmann
20dd6dfa12 aco/isel: use s_mul_i32 instead of s_cselect_b32 for a ? b : 0
It doesn't require SCC and this is more consistent with b2f.

Foz-DB Navi21:
Totals from 2107 (2.64% of 79789) affected shaders:
Instrs: 6619774 -> 6619280 (-0.01%); split: -0.01%, +0.00%
CodeSize: 36754448 -> 36752396 (-0.01%); split: -0.01%, +0.00%
Latency: 62207779 -> 62206422 (-0.00%); split: -0.00%, +0.00%
InvThroughput: 13090494 -> 13090204 (-0.00%); split: -0.00%, +0.00%
VClause: 171572 -> 171573 (+0.00%)
SClause: 257528 -> 257530 (+0.00%)
Copies: 607680 -> 607204 (-0.08%); split: -0.10%, +0.02%
VALU: 4189422 -> 4189418 (-0.00%)
SALU: 1001750 -> 1001264 (-0.05%); split: -0.07%, +0.02%

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33734>
2025-03-04 21:36:17 +00:00
Ivan Avdeev
ff6504d4c0 radv: add experimental support for AMD BC-250 board
AMD BC-250 is a mining board based on an AMD APU with an integrated GPU
that kernel recognizes as Cyan Skillfish.

It is basically RDNA1/GFX10, but with added hardware ray tracing
support. LLVM calls it GFX1013, see
https://llvm.org/docs/AMDGPU/AMDGPUAsmGFX1013.html

Support for this GPU hasn't been extensively tested. Some games are
known to work, some non-trivial ray query compute and ray tracing
pipeline rendering works too. Q2RTX works.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33116>
2025-03-04 08:07:31 +00:00
Daniel Schürmann
52253da783 aco: unify get_addr_sgpr_from_waves() and get_addr_vgpr_from_waves() into one function
which returns the limit as RegisterDemand.

Also remove the unused get_extra_sgprs() from aco_ir.h.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33644>
2025-02-21 13:49:41 +00:00
Rhys Perry
539f9b4ba6 nir,aco,radv: add align_mul/offset to buffer_amd intrinsics
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29242>
2025-02-07 13:52:57 +00:00
Daniel Schürmann
1a8a643bbd aco/isel: track control flow divergence in loops more accurately
We introduce two new variables, cf_context::in_divergent_cf and
cf_context::parent_loop.has_divergent_break, in order to determine
whether there is any other invocations on a different CF path.

Totals from 1305 (1.64% of 79395) affected shaders: (Navi31)

Instrs: 659211 -> 657815 (-0.21%); split: -0.22%, +0.01%
CodeSize: 3483228 -> 3477960 (-0.15%); split: -0.16%, +0.01%
VGPRs: 68820 -> 48048 (-30.18%)
Latency: 14197750 -> 14170767 (-0.19%); split: -0.26%, +0.07%
InvThroughput: 1619103 -> 1619826 (+0.04%); split: -0.02%, +0.07%
VClause: 12384 -> 12350 (-0.27%)
SClause: 26693 -> 26844 (+0.57%); split: -0.01%, +0.57%
Copies: 44994 -> 43535 (-3.24%); split: -3.26%, +0.02%
PreSGPRs: 49007 -> 48907 (-0.20%)
PreVGPRs: 32171 -> 32121 (-0.16%)
VALU: 349984 -> 349857 (-0.04%); split: -0.04%, +0.00%
SALU: 84252 -> 83988 (-0.31%); split: -0.32%, +0.00%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33206>
2025-02-05 10:54:21 +00:00
Daniel Schürmann
583c3586fe aco/isel: remove loop nest information from exec_info
Since we never enter loops with an empty exec mask, and the
control flow is structured, we don't need to consider the
loop nest depth.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33206>
2025-02-05 10:54:21 +00:00
Daniel Schürmann
a77258346c aco/isel: fix assumptions about potential empty exec mask in nested control flow
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33206>
2025-02-05 10:54:21 +00:00
Daniel Schürmann
44216e035f aco/isel: add and use exec_info::empty() helper
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33206>
2025-02-05 10:54:21 +00:00
Daniel Schürmann
8e8398832c aco/isel: use cf_context in loop_context to restore cf information
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33206>
2025-02-05 10:54:21 +00:00
Daniel Schürmann
8b9c9fb904 aco/isel: use cf_context in if_context to restore cf information
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33206>
2025-02-05 10:54:21 +00:00
Daniel Schürmann
c2bfc05d71 aco/isel: rename cf_context::has_divergent_branch
Make it more consistent with cf_context::has_branch.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33206>
2025-02-05 10:54:21 +00:00
Daniel Schürmann
0c5a91b9f2 aco/isel: move cf_info into separate struct cf_context
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33206>
2025-02-05 10:54:21 +00:00
Daniel Schürmann
61fa007e48 aco/isel: fix empty exec tracking for uniform branches
Totals from 5 (0.01% of 79395) affected shaders: (Navi31)

Instrs: 54730 -> 54715 (-0.03%)
CodeSize: 276928 -> 276852 (-0.03%)
Latency: 215212 -> 214874 (-0.16%)
InvThroughput: 40154 -> 40150 (-0.01%)
Copies: 6824 -> 6821 (-0.04%); split: -0.06%, +0.01%
Branches: 1625 -> 1615 (-0.62%)
SALU: 5682 -> 5678 (-0.07%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33206>
2025-02-05 10:54:21 +00:00
Konstantin Seurer
60a20bcf3d nir: Stop using instructions for debug info
Annotating ssa defs without affecting compilation is impossible with
debug info instructions since referencing a nir_def from the debug info
instr will add uses.

The old approach also stops worrking if passes reorder instructions.

This patch proposes a solution which should not regress performance just
like the old approach. The difference is that this one allocates a bit
more space for debug info instead of adding a new instruction for it.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33141>
2025-01-30 20:14:01 +00:00
Marek Olšák
f98613d47c aco: implement replacement of sample_mask_in with helper_invocation in PS prolog
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33024>
2025-01-25 12:20:26 -05:00
Marek Olšák
a842f198d7 aco: simplify how broadcast_last_cbuf is implemented in PS epilog
So PS epilogs only need a single bool flag that determines whether all
enabled color buffers should be written.

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33024>
2025-01-25 12:20:26 -05:00
Marek Olšák
5c4f737b84 aco: implement replacing frag_coord with pixel_coord in PS prolog
This adds an option to replace frag_coord.xy with pixel_coord when sample
shading is disabled, which is most of the time. This reduces the number of
input VGPRs.

It's already implement in ac_nir_lower_ps_early for monolithic shaders.

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33024>
2025-01-25 12:20:26 -05:00
Marek Olšák
d7d4d56f5b ac,aco,radeonsi: replace SampleMaskIn with 1 << SampleID if full sample shading
Since the sample mask is always 1 << sample_id with full sample shading,
just use that instead of loading sample_mask_in. Set it to 0 if it's
a helper invocation. This removes the sample mask input VGPR.

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33024>
2025-01-25 12:20:25 -05:00
Marek Olšák
d160252270 ac: use Z_EXPORT_FORMAT=32_AR for Z + Alpha mrtz exports
This should be faster than 32_ABGR.

Also, stencil exports are changed from UINT16_ABGR to 32_GR,
which should have no effect on performance.

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33046>
2025-01-16 02:58:03 +00:00
Timur Kristóf
50035f0316 ac/nir: Move all ac_nir_* files to a new folder.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32966>
2025-01-14 13:46:30 +01:00
Samuel Pitoiset
10e424f586 aco: always use ds_bpermute for shuffle/rotate on GFX12
ds_bpermute supports both 32 and 64 lanes now.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32974>
2025-01-13 08:33:38 +00:00
Samuel Pitoiset
f94bd67b82 aco: fix VS prologs on GFX12
MTBUF/MUBUF instructions must use zero for SOFFSET, use const_offset
instead.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32904>
2025-01-07 13:44:32 +00:00
Marek Olšák
0d5b03f2b9 ac/nir: split local_invocation_ids to 3 separate VGPR inputs
so that we can set the upper range per VGPR.

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>
2025-01-02 17:36:55 +00:00
Marek Olšák
ceb6f8fc32 amd: lower load_tess_rel_patch_id/primitive_id/tess_coord and overwrite.. in NIR
The overwrite instruction complicates it a little, which is why these
intrinsics are lowered together.

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>
2025-01-02 17:36:55 +00:00
Marek Olšák
61bfb4fa06 amd: lower load_subgroup_invocation in NIR
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>
2025-01-02 17:36:55 +00:00
Marek Olšák
e69f47faee amd: lower load_local_invocation_index in NIR
This is the last intrinsic that needed the LS VGPR bug workaround in ACO
and ac_nir_to_llvm.

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>
2025-01-02 17:36:55 +00:00
Marek Olšák
342dcbdc8b amd: lower load_vertex_id/instance_id and overwrite_vs_arguments in NIR
2 things complicate this:
- overwrite_vs_arguments_amd
- the LS VGPR bug workaround

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>
2025-01-02 17:36:55 +00:00
Marek Olšák
66dd70adc5 amd: lower load_gs_wave_id_amd in NIR
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>
2025-01-02 17:36:55 +00:00
Marek Olšák
923f59c971 amd: lower load_barycentric_at_offset in NIR
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>
2025-01-02 17:36:55 +00:00
Marek Olšák
16ab05fad1 amd: lower load_barycentric_pixel/centroid/sample in NIR
radeonsi needs to preserve interp_mode in the arg load.

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>
2025-01-02 17:36:55 +00:00