Commit graph

4369 commits

Author SHA1 Message Date
Rhys Perry
6e06012825 radv,ac: make rembrandt and vangogh cache compatible
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41340>
2026-05-06 17:41:31 +00:00
Rhys Perry
ec59b59b97 nir: rename nir_src_parent_instr to nir_src_use_instr
sed -i "s/nir_src_parent_instr/nir_src_use_instr/" `find ./ -type f`
sed -i "s/nir_src_parent_if/nir_src_use_if/" `find ./ -type f`
sed -i "s/nir_src_set_parent/nir_src_set_use/" `find ./ -type f`

There are two kinds of "parent" in relation to a src/def:
- the instruction where the def or src's def is defined
- the instruction which the src is a part of and where the def is used

Clarify that the parent here is where the src's def is used, not where
it's defined.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41344>
2026-05-06 17:09:22 +00:00
Rhys Perry
6f50dda648 aco/gfx11.7: fix v_pk_min_f16/v_pk_max_f16 opcode numbers
Apparently the opcode numbers in LLVM were wrong:
https://github.com/llvm/llvm-project/pull/195180

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: 58debf726c ("aco/gfx11.7: add opcode numbers")
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41311>
2026-05-04 11:11:52 +00:00
Marek Olšák
1b409ff681 aco/tests: update ACO tests for ac_nir_lower_tex_coords refactoring
For: ac/nir/lower_tex_coords: move input loads instead of cloning them

Suggested by: Georg Lehmann

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41173>
2026-05-01 02:37:16 +00:00
Marek Olšák
0684976de8 ac/nir: add ac_nir_assign_fs_input_locations to set PS input locations in stone
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
No intended functional change.

This prevents possible breakage due to DCE removing input loads followed
by nir_shader_gather_info updating input masks and changing the result of
ac_nir_get_io_driver_location after PS input register contents are already
determined.

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41175>
2026-04-27 21:05:53 +00:00
Marek Olšák
bfb6c41b64 amd: remove unnecessary and transitive #includes
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Reported by clang tools.
See: https://clangd.llvm.org/guides/include-cleaner

struct ac_cmdbuf had to be moved to ac_cmdbuf_base.h because we can't
include ac_cmdbuf.h->sid.h->amdgfxregs.h in radeon_winsys.h for r300.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41091>
2026-04-24 21:53:07 +00:00
Liu, Mengyang
40fa195cd0 aco: fix broken VGPRs reservation for 64-bit attributes in VS prologs
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
After 8e6bff4caa, the large attribute counts as two slots in
`num_attributes` if the vertex shader consumes more than two
channels of it, even though `misaligned_mask` marks only the
lower slot.

Fixes: 8e6bff4caa ("radv: Lower 64-bit VS inputs to 32-bit")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41071>
2026-04-23 08:20:47 +00:00
Rhys Perry
bddd8b36a6 aco: use RegisterDemand::operator[] more
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39690>
2026-04-21 11:16:26 +00:00
Rhys Perry
176b075129 aco: prefer spilling smaller temporaries if it finishes spilling
fossil-db stats seemed less positive when updating process_block() too.

fossil-db (navi31):
Totals from 41 (0.05% of 84369) affected shaders:
Instrs: 294758 -> 294694 (-0.02%); split: -0.11%, +0.09%
CodeSize: 1566136 -> 1564392 (-0.11%); split: -0.21%, +0.10%
SpillSGPRs: 2306 -> 2143 (-7.07%); split: -8.37%, +1.30%
Latency: 3877251 -> 3868194 (-0.23%); split: -0.29%, +0.05%
InvThroughput: 881747 -> 882352 (+0.07%); split: -0.01%, +0.08%
SClause: 6498 -> 6494 (-0.06%); split: -0.09%, +0.03%
Copies: 33582 -> 33900 (+0.95%); split: -0.23%, +1.18%
Branches: 6799 -> 6801 (+0.03%)
VALU: 192977 -> 192646 (-0.17%); split: -0.21%, +0.04%
SALU: 28082 -> 28395 (+1.11%); split: -0.27%, +1.39%
VOPD: 1939 -> 1959 (+1.03%); split: +1.19%, -0.15%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39690>
2026-04-21 11:16:26 +00:00
Rhys Perry
0ffbc30d7f aco: refactor spiller to use spills_needed variable
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39690>
2026-04-21 11:16:26 +00:00
Rhys Perry
ee78bea393 aco/gfx11.7: claim support
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40917>
2026-04-17 11:33:00 +00:00
Rhys Perry
f67b861f78 aco/gfx11.7: allow any src VGPR for VOPD with two v_dual_mov_B32
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40917>
2026-04-17 11:32:59 +00:00
Rhys Perry
ad5be681bd aco/gfx11.7: don't use v_pack_b32_f16 in do_pack_2x16
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40917>
2026-04-17 11:32:58 +00:00
Rhys Perry
a1d9fec91f aco/gfx11.7: don't create v_dot2c_f32_f16
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40917>
2026-04-17 11:32:57 +00:00
Rhys Perry
efb863173e aco: adjust some gfx_level checks for gfx11.7
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40917>
2026-04-17 11:32:56 +00:00
Rhys Perry
58debf726c aco/gfx11.7: add opcode numbers
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40917>
2026-04-17 11:32:56 +00:00
Rhys Perry
7b1a1fcf5e ac: add gfx11.7 enums
This is just enough to compile future patches and run tests.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40917>
2026-04-17 11:32:55 +00:00
Marek Olšák
a96c854234 ac,radv,radeonsi: don't use nir_intrinsic_base for FS inputs
This all is needed to switch to the new helper.

Acked-by: Pierre-Eric
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40556>
2026-04-15 18:12:11 +00:00
Rhys Perry
984aa6e085 aco/ra: create s_bitset
fossil-db (navi31):
Totals from 33092 (16.35% of 202426) affected shaders:
Instrs: 16722717 -> 16696250 (-0.16%); split: -0.16%, +0.00%
CodeSize: 90003664 -> 89779940 (-0.25%); split: -0.25%, +0.00%
Latency: 123990480 -> 123934891 (-0.04%); split: -0.05%, +0.00%
InvThroughput: 20972033 -> 20971140 (-0.00%); split: -0.01%, +0.00%

fossil-db (navi21):
Totals from 6776 (3.35% of 202427) affected shaders:
Instrs: 11167123 -> 11166438 (-0.01%)
CodeSize: 62605436 -> 62573220 (-0.05%); split: -0.05%, +0.00%
Latency: 238610061 -> 238603224 (-0.00%); split: -0.00%, +0.00%
InvThroughput: 55148639 -> 55148624 (-0.00%); split: -0.00%, +0.00%
Copies: 1211216 -> 1210612 (-0.05%); split: -0.05%, +0.00%
SALU: 1436679 -> 1435997 (-0.05%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40889>
2026-04-14 13:19:55 +00:00
Rhys Perry
88dcda1078 aco: support s_bitset
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40889>
2026-04-14 13:19:55 +00:00
Samuel Pitoiset
cebac5a427 radv/rt: declare shader arguments for resource/sampler heaps
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39483>
2026-04-14 10:10:24 +00:00
Georg Lehmann
51ba9b956a aco: use no contract/reassoc instead of exact
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40872>
2026-04-12 17:10:29 +00:00
Georg Lehmann
b0194d0416 aco/tests: fix med3 NaN tests
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40872>
2026-04-12 17:10:28 +00:00
Rhys Perry
1619288a19 aco: ignore copykill+latekill operands in get_temp_reg_changes
This is possible with two vectors which share a temporary, though I don't
think it currently happens in practice.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40825>
2026-04-10 10:34:45 +00:00
Daniel Schürmann
8cb8c710fb aco: remove remaining occurences of block_kind_continue
It has no purpose anymore.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40628>
2026-04-10 08:51:39 +00:00
Daniel Schürmann
74661ccec2 aco/lower_branches: remove handling of block_kind_continue
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40628>
2026-04-10 08:51:39 +00:00
Daniel Schürmann
7f0709cff5 aco/opt_value_numbering: remove handling of block_kind_continue
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40628>
2026-04-10 08:51:39 +00:00
Daniel Schürmann
a8c4b9f100 aco/lower_phis: remove handling of block_kind_continue
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40628>
2026-04-10 08:51:39 +00:00
Daniel Schürmann
16396f2ce6 aco/insert_exec_mask: remove handling of loop continues
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40628>
2026-04-10 08:51:39 +00:00
Daniel Schürmann
495c7271a3 aco/isel: remove handling of nir_jump_continue
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40628>
2026-04-10 08:51:39 +00:00
Daniel Schürmann
5e89be331f aco/lower_branches: Fix try_rotate_latch_block()
Found by inspection.

Fixes: 97f095f6e0 ('aco/lower_branches: Add try_rotate_latch_block() optimization')
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40628>
2026-04-10 08:51:39 +00:00
Daniel Schürmann
60b3e5b3f0 aco/lower_branches: Don't remove branches which jump over loops
Entering a loop with empty exec mask might lead to
not be able to execute the break condition and
lead to infinite loops.

Totals from 81 (0.04% of 202440) affected shaders: (Navi48)
Instrs: 3040566 -> 3040716 (+0.00%)
CodeSize: 17506768 -> 17507188 (+0.00%)
Latency: 16342966 -> 16345166 (+0.01%)
InvThroughput: 3112932 -> 3113286 (+0.01%)
Branches: 82229 -> 82365 (+0.17%)

Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40628>
2026-04-10 08:51:39 +00:00
Olle Lögdahl
c69da756d1 aco/isel: added test-case for iterative cf visitor
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
isel.cf.deep_traversal is a new ACO test that verifies
that the iterative nir cf visitor allows arbitrary depth.
A depth of 10000 would cause a stack overflow on x86-64 linux
(4096 kB stack) for the old recursive code. This test
is by default not enabled.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40364>
2026-04-09 13:46:23 +00:00
Olle Lögdahl
aa49d69ea0 aco/isel: use iterative visitor during traversal
When iterating control-flow recursively, we always run the
risk of causing a stack overflow if the control-flow
depth is too large. This patch resolves this by visiting
control-flow nodes in an iterative way, managing an explicit
stack on the heap.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40364>
2026-04-09 13:46:23 +00:00
Daniel Schürmann
37e2deab74 aco/isel: Remove if_context* parameter from begin_if() / end_if() helper functions
We can transparently create the context inside the functions, now.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40364>
2026-04-09 13:46:23 +00:00
Daniel Schürmann
53836320a9 aco/isel: Remove loop_context* parameter from begin_loop() / end_loop() helper functions
We can transparently create the context inside the functions, now.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40364>
2026-04-09 13:46:22 +00:00
Olle Lögdahl
5c1dea7ee4 aco/isel: move if_context and loop_context to heap
if_context and loop_context are large structs and may cause
stack overflows during CF traversal. This fix moves them to
the heap.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40364>
2026-04-09 13:46:22 +00:00
Georg Lehmann
44a061a034 aco/spill: fix mixed lds+scratch spill/reload
We shouldn't increment the scratch offset while accessing LDS.

Fixes: 133ef9f94b ("aco: spill VGPRs to LDS if it doesn't further limit occupancy")
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40855>
2026-04-09 07:51:52 +00:00
Rhys Perry
463e3643f2 nir: add and use block predecessor helpers
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40242>
2026-04-08 15:06:32 +00:00
Georg Lehmann
d1ed4e1774 aco/optimizer: do not try to create 3 byte constant operands
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Operand::get_const will assert.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/work_items/15239
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40828>
2026-04-08 09:17:26 +00:00
Georg Lehmann
792ce7ddf6 aco/isel: optimize 16/64bit non constant valu bit test
By using the constant path we can combine the v_and and the v_cmp.

Foz-DB GFX1201:
Totals from 2 (0.00% of 205032) affected shaders:
Instrs: 2833 -> 2831 (-0.07%)
Latency: 27385 -> 27367 (-0.07%)
InvThroughput: 1712 -> 1710 (-0.12%)
VALU: 1301 -> 1299 (-0.15%)

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40705>
2026-04-08 08:44:20 +00:00
Natalie Vock
fded5e321d aco: Nuke ACO-side prolog selection
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40008>
2026-04-07 11:28:05 +00:00
Natalie Vock
b53dc3f052 aco/lower_to_hw_instr: Run p_init_scratch if the program has a call
Callees may use scratch even if the caller doesn't.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40008>
2026-04-07 11:28:05 +00:00
Natalie Vock
378c9536de aco/isel: Fix stack_ptr synthesis
info.stack_ptr.is_reg is always true. We have a stack pointer to use
if and only if the program is a callee.

Also, apply_scratch_offset needs to be true in a few more places.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40008>
2026-04-07 11:28:05 +00:00
Natalie Vock
31e08322d7 aco/spill_preserved: Only compute preserved registers if in a callee
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40008>
2026-04-07 11:28:05 +00:00
Georg Lehmann
5453419086 aco/isel: use s_bitcmp1 for 1bit ubfe
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Avoid the s_pack at the cost of having to use scc.

Foz-DB GFX1201:
Totals from 1514 (0.74% of 205032) affected shaders:
Instrs: 3443431 -> 3434096 (-0.27%); split: -0.27%, +0.00%
CodeSize: 19062100 -> 19024320 (-0.20%); split: -0.20%, +0.00%
Latency: 22343329 -> 22342802 (-0.00%); split: -0.01%, +0.01%
InvThroughput: 4471707 -> 4471632 (-0.00%); split: -0.00%, +0.00%
Copies: 280191 -> 279645 (-0.19%); split: -0.21%, +0.01%
PreSGPRs: 71333 -> 71327 (-0.01%)
VALU: 1598064 -> 1598058 (-0.00%); split: -0.00%, +0.00%
SALU: 691458 -> 686437 (-0.73%)

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40707>
2026-03-31 10:42:33 +00:00
Karol Herbst
5bb3c9f69c nir: rename fsin_amd and fcos_amd to a more generic name
Nvidia implements both the same way as AMD does, so it makes sense to
allow for code sharing here.

Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40541>
2026-03-31 01:47:29 +02:00
Georg Lehmann
ae2968c4ec aco: allow spilling to LDS in RT shaders without stack pointer
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
No Foz-DB changes because most RT shaders use function calls now.

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36367>
2026-03-27 13:08:44 +00:00
Georg Lehmann
133ef9f94b aco: spill VGPRs to LDS if it doesn't further limit occupancy
Only use LDS for VGPR spilling if we can use addtid access, to avoid having a VGPR addr.
Limit to single wave workgroups, to avoid needing the wave_id for the offset.
If we have a scratch stack pointer, don't use LDS at all.

Limit LDS spilling to not reduce occupancy further.
Note that in theory, this can still limit occupancy of other shaders running
on the CU at the same time, but that's unlikely and impossible to know at this point.

Removes all scratch usage in emulated FSR4 and parallel_rdp.
Besides that, only a single GoW shader is affected.

Foz-DB Navi31:
Totals from 9 (0.01% of 114641) affected shaders:
Instrs: 68863 -> 68830 (-0.05%); split: -0.07%, +0.02%
CodeSize: 416108 -> 416000 (-0.03%); split: -0.05%, +0.02%
LDS: 2048 -> 45056 (+2100.00%)
Scratch: 261888 -> 220672 (-15.74%)
Latency: 727951 -> 657155 (-9.73%); split: -9.73%, +0.00%
InvThroughput: 418644 -> 383269 (-8.45%)
VClause: 1506 -> 1200 (-20.32%)
Copies: 10651 -> 10624 (-0.25%)
VALU: 48700 -> 48684 (-0.03%)
SALU: 6200 -> 6199 (-0.02%); split: -0.05%, +0.03%
VMEM: 4139 -> 3589 (-13.29%)
VOPD: 580 -> 574 (-1.03%)

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36367>
2026-03-27 13:08:44 +00:00
Georg Lehmann
17a9ee7152 aco/optimizer: apply dpp to v_dot before RA for gfx10.3
This is a bit unusual, as we otherwise only use the VOP2 codesize
optimization opcodes in the register allocator.

But unless we change the scheduler to not split v_mov_b32_dpp and
v_dot, we have no other choice.

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40510>
2026-03-24 09:05:40 +00:00