Rhys Perry
de896234d8
aco: improve spilling of clobbered operands
...
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
We can ignore live_changes for these.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34699 >
2025-04-28 10:04:43 +00:00
Rhys Perry
7fe84024cb
aco: fix get_temp_reg_changes with clobbered operands
...
The spiller might have tried to spill a live-through first or second
s_fmac_f32 operand, but this wouldn't have reduced the SGPRs if the third
operand wasn't killed
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13038
Fixes: d6cb45dbb0 ("aco/spill: Allow spilling live-through operands")
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34699 >
2025-04-28 10:04:43 +00:00
Rhys Perry
3c021b79b4
aco/ra: use a correct stride for subdword get_reg_impl
...
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
fossil-db (gfx1201):
Totals from 3 (0.00% of 79377) affected shaders:
Instrs: 1312 -> 1308 (-0.30%)
CodeSize: 7112 -> 7096 (-0.22%)
Latency: 5381 -> 5382 (+0.02%)
InvThroughput: 753 -> 752 (-0.13%)
Copies: 69 -> 68 (-1.45%)
VALU: 836 -> 835 (-0.12%)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34679 >
2025-04-25 14:43:41 +00:00
Rhys Perry
ae6d4f1195
aco/ra: update_renames() before add_subdword_definition()
...
The register file tests here should be done after update_renames().
Normally, get_reg() wouldn't have to move anything to make space for a 1-3
byte definition. This was encountered with skip_optimistic_path=true and a
get_reg_impl() bug (fixed in a later commit) which caused suboptimal
register assignment.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34679 >
2025-04-25 14:43:41 +00:00
Rhys Perry
b03e071583
aco/gfx11: create waitcnt for workgroup vmem barriers
...
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
It seems this is necessary on GFX11.
Similar to 576a2e798c
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Backport-to: 25.0
Backport-to: 25.1
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34634 >
2025-04-25 10:41:52 +00:00
Georg Lehmann
65411350ac
aco/insert_exec: disable empty quads when leaving divergent control, even if not top level
...
We don't restore exec after uniform top level branches, so nothing disabled
empty quads after a demote in divergent control flow in a uniform branch.
Foz-DB Navi31:
Totals from 17 (0.02% of 79789) affected shaders:
Instrs: 34573 -> 34572 (-0.00%)
CodeSize: 186876 -> 186872 (-0.00%)
Latency: 324145 -> 324141 (-0.00%)
Copies: 1467 -> 1458 (-0.61%)
PreSGPRs: 802 -> 800 (-0.25%)
SALU: 2531 -> 2530 (-0.04%)
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34670 >
2025-04-24 15:43:09 +00:00
Rhys Perry
62e50de5d0
aco: use v_perm_b32 for byte swaps within a VGPR on gfx10
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34636 >
2025-04-23 18:23:18 +00:00
Rhys Perry
a43783fd76
aco: use v_perm_b32 for do_pack_2x16 on gfx10+
...
fossil-db (gfx1201);
Totals from 93 (0.12% of 79377) affected shaders:
Instrs: 373212 -> 372761 (-0.12%)
CodeSize: 2062752 -> 2063704 (+0.05%); split: -0.00%, +0.05%
Latency: 4172059 -> 4171993 (-0.00%); split: -0.00%, +0.00%
InvThroughput: 1299144 -> 1299093 (-0.00%)
Copies: 51268 -> 50831 (-0.85%)
Branches: 10980 -> 10979 (-0.01%)
VALU: 220192 -> 219756 (-0.20%)
VOPD: 48 -> 47 (-2.08%)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34636 >
2025-04-23 18:23:18 +00:00
Georg Lehmann
dd3e1190a2
aco/insert_exec: reset temporary when recreating wqm mask from exact mask
...
The old, now incorrect temporary was still used for invert blocks and loop masks.
Foz-DB Navi31:
Totals from 379 (0.48% of 79789) affected shaders:
Instrs: 399471 -> 399897 (+0.11%); split: -0.00%, +0.11%
CodeSize: 2197292 -> 2198908 (+0.07%); split: -0.00%, +0.08%
Latency: 2500636 -> 2500895 (+0.01%); split: -0.00%, +0.01%
SClause: 7912 -> 7918 (+0.08%); split: -0.04%, +0.11%
Copies: 25687 -> 26068 (+1.48%); split: -0.04%, +1.53%
PreSGPRs: 15648 -> 15562 (-0.55%)
SALU: 35125 -> 35517 (+1.12%)
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12901
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13019
Fixes: b872ff6ef2 ("aco/insert_exec_mask: if applicable, use s_wqm to restore exec after divergent CF")
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34659 >
2025-04-23 09:37:50 +00:00
Georg Lehmann
13f6be262a
aco/insert_exec: only restore wqm mask after control flow if necessary
...
The next commit will make this not free, so we should avoid it if possible.
Foz-DB Navi31:
Totals from 3933 (4.93% of 79789) affected shaders:
Instrs: 5726914 -> 5727295 (+0.01%); split: -0.00%, +0.01%
CodeSize: 31307100 -> 31308884 (+0.01%); split: -0.00%, +0.01%
SpillSGPRs: 1797 -> 1793 (-0.22%); split: -0.33%, +0.11%
Latency: 58973929 -> 58974343 (+0.00%); split: -0.00%, +0.00%
InvThroughput: 8591893 -> 8591911 (+0.00%); split: -0.00%, +0.00%
SClause: 209074 -> 209115 (+0.02%); split: -0.00%, +0.02%
Copies: 423965 -> 432420 (+1.99%)
Branches: 149976 -> 149979 (+0.00%); split: -0.00%, +0.00%
PreSGPRs: 200175 -> 200663 (+0.24%)
VALU: 3440165 -> 3440156 (-0.00%); split: -0.00%, +0.00%
SALU: 555727 -> 556143 (+0.07%); split: -0.00%, +0.08%
Fixes: b872ff6ef2 ("aco/insert_exec_mask: if applicable, use s_wqm to restore exec after divergent CF")
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34659 >
2025-04-23 09:37:50 +00:00
Georg Lehmann
8f3489f351
aco/isel: create WMMA with constant C matrix if possible
...
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34396 >
2025-04-22 16:08:57 +00:00
Georg Lehmann
4fa3fb87c7
aco/insert_NOPs: allow WMMA with constant C matrix
...
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34396 >
2025-04-22 16:08:56 +00:00
Georg Lehmann
6d7e67d986
nir,amd: add neg_lo/hi modifiers to cmat_matmul_amd
...
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34396 >
2025-04-22 16:08:55 +00:00
Georg Lehmann
b0c8f31600
aco: set opsel_hi to 1 for WMMA
...
This is ignored by the hardware but LLVM requires it to disassemble GFX12 WMMA.
Cc: mesa-stable
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34396 >
2025-04-22 16:08:54 +00:00
Eric Engestrom
2bcb55f3f6
aco: help clang 20 do some additions and subtractions
...
clang 20 complains:
../src/amd/compiler/aco_assembler.cpp:837:28: error: writing 1 byte into a region of size 0 [-Werror=stringop-overflow=]
837 | vaddr[num_vaddr + i] = reg(ctx, instr->operands.back(), 8) + i + 1;
| ~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../src/amd/compiler/aco_assembler.cpp:832:12: note: at offset 5 into destination object ‘vaddr’ of size 5
832 | uint8_t vaddr[5] = {0, 0, 0, 0, 0};
| ^~~~~
../src/amd/compiler/aco_assembler.cpp:837:28: error: writing 1 byte into a region of size 0 [-Werror=stringop-overflow=]
837 | vaddr[num_vaddr + i] = reg(ctx, instr->operands.back(), 8) + i + 1;
| ~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../src/amd/compiler/aco_assembler.cpp:832:12: note: at offset 6 into destination object ‘vaddr’ of size 5
832 | uint8_t vaddr[5] = {0, 0, 0, 0, 0};
| ^~~~~
../src/amd/compiler/aco_assembler.cpp:837:28: error: writing 1 byte into a region of size 0 [-Werror=stringop-overflow=]
837 | vaddr[num_vaddr + i] = reg(ctx, instr->operands.back(), 8) + i + 1;
| ~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../src/amd/compiler/aco_assembler.cpp:832:12: note: at offset 7 into destination object ‘vaddr’ of size 5
832 | uint8_t vaddr[5] = {0, 0, 0, 0, 0};
| ^~~~~
But `i < MIN2(instr->operands.back().size() - 1, 5 - num_vaddr)` means `i` is
at most `5 - num_vaddr - 1`, which means `vaddr[num_vaddr + i]` =>
`vaddr[num_vaddr + 5 - num_vaddr - 1]` => `vaddr[5 - 1]` => `vaddr[4]` which
is within the valid indices.
For some reason, using signed `int` instead allows clang to figure this
out, so let's do that since we don't need the extra range.
While at it, use ARRAY_SIZE(vaddr) instead of hard-coding the same `5`
in several places.
Backport-to: 25.0
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34625 >
2025-04-21 15:16:02 +00:00
Konstantin Seurer
978e9b670e
aco,nir: Add support for new GFX12 ray tracing instructions
...
Adds image_bvh_dual_intersect_ray and image_bvh8_intersect_ray which can
handle the new BVH format. Both instructions write up to 10 VGPRs so
they need to use a vec16 definition in nir.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34273 >
2025-04-17 20:20:40 +00:00
Natalie Vock
ee0f784858
aco/ra: Don't consider precolored ops/defs in get_reg_impl
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34273 >
2025-04-17 20:20:40 +00:00
Natalie Vock
b9e506afd4
aco: Add support for multiple definitions in emit_mimg
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34273 >
2025-04-17 20:20:40 +00:00
Natalie Vock
f309d76aab
aco: Add support for multiple ops fixed to defs
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34273 >
2025-04-17 20:20:40 +00:00
Rhys Perry
427479c040
aco: remove va_vdst/vm_vsrc/sa_sdst variables
...
Use the "wait" variable instead.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34529 >
2025-04-17 17:28:22 +00:00
Rhys Perry
3d6fa6996c
aco: init vm_vsrc/sa_sdst from depctr_wait
...
fossil-db (navi31):
Totals from 5805 (7.31% of 79377) affected shaders:
Instrs: 14229621 -> 14207115 (-0.16%); split: -0.16%, +0.00%
CodeSize: 75358724 -> 75268624 (-0.12%); split: -0.12%, +0.00%
Latency: 133637034 -> 133624262 (-0.01%); split: -0.01%, +0.00%
InvThroughput: 22067819 -> 22066213 (-0.01%); split: -0.01%, +0.00%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34529 >
2025-04-17 17:28:22 +00:00
Rhys Perry
ce2be5ab8e
aco: combine VALU lanemask hazard into VALUMaskWriteHazard
...
This is now basically the same as the original VALUMaskWriteHazard, except
it now considers both VALU and SALU writes.
Now that it's a part of VALUMaskWriteHazard, differences from the original
VALU lanemask workaround are:
- it includes SALU reads after the write
- it includes VALU writes and SALU/VALU reads after the write which are
not lanemasks
- it combines s_waitcnt_depctr instructions when it's a read after both a
SALU write and a VALU write
- non-exec VALU SGPR reads reset the SGPRs read by VALU as a lanemask
- exec SGPRs are ignored
resolve_all_gfx11() is also finished.
fossil-db (navi31):
Totals from 21538 (27.13% of 79377) affected shaders:
Instrs: 27628855 -> 27552972 (-0.27%); split: -0.30%, +0.03%
CodeSize: 145968448 -> 145667616 (-0.21%); split: -0.23%, +0.02%
Latency: 209537805 -> 209509519 (-0.01%); split: -0.02%, +0.00%
InvThroughput: 36304270 -> 36301624 (-0.01%); split: -0.01%, +0.00%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12623
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11480
Backport-to: 25.0
Backport-to: 25.1
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34529 >
2025-04-17 17:28:22 +00:00
Rhys Perry
4fcf2eb1d7
aco/gfx12: VOPD src0/1 are src bank compatible if they are the same vgpr
...
fossil-db (gfx1201):
Totals from 66518 (83.80% of 79377) affected shaders:
Instrs: 36939667 -> 36656685 (-0.77%); split: -0.79%, +0.02%
CodeSize: 220575208 -> 220201764 (-0.17%); split: -0.21%, +0.04%
Latency: 258919732 -> 258137974 (-0.30%); split: -0.35%, +0.05%
InvThroughput: 49911351 -> 49643836 (-0.54%); split: -0.55%, +0.02%
VClause: 788661 -> 788430 (-0.03%); split: -0.04%, +0.01%
SClause: 1176416 -> 1176263 (-0.01%); split: -0.02%, +0.01%
VALU: 18014058 -> 17818119 (-1.09%); split: -1.10%, +0.01%
VOPD: 4926983 -> 5122922 (+3.98%); split: +4.01%, -0.04%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34246 >
2025-04-17 14:00:29 +00:00
Rhys Perry
3446f2059d
aco/gfx12: assume VOPD with two v_mov_b32 are src bank compatible
...
fossil-db (gfx1201):
Totals from 10576 (13.32% of 79377) affected shaders:
(no stats changed)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34246 >
2025-04-17 14:00:29 +00:00
Rhys Perry
1bd5ae7b14
aco: refactor can_use_vopd so that it returns flags
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34246 >
2025-04-17 14:00:29 +00:00
Rhys Perry
d4b418bbb9
aco: add are_src_banks_compatible helper for VOPD creation
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34246 >
2025-04-17 14:00:29 +00:00
Rhys Perry
4b0da5b51f
aco: rename is_opy_only to can_be_opx
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34246 >
2025-04-17 14:00:29 +00:00
Rhys Perry
408fa33c09
aco/gfx12: don't use second VALU for VOPD's OPX if there is a WaR
...
fossil-db (gfx1201):
Totals from 38908 (49.02% of 79377) affected shaders:
Instrs: 30268107 -> 30268131 (+0.00%); split: -0.00%, +0.00%
CodeSize: 180843648 -> 180843640 (-0.00%); split: -0.00%, +0.00%
Latency: 224905962 -> 224906072 (+0.00%); split: -0.00%, +0.00%
InvThroughput: 44322988 -> 44323004 (+0.00%)
VALU: 15124145 -> 15124167 (+0.00%)
VOPD: 4018504 -> 4018482 (-0.00%)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Backport-to: 25.0
Backport-to: 25.1
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34246 >
2025-04-17 14:00:29 +00:00
Natalie Vock
3d8db3cbbb
aco: Make private_segment_buffer/scratch_offset per-resume
...
We need different Temps for each resume shader, because registers aren't
preserved across resume boundaries.
This was likely fine in practice because arg registers are the same for
each shader, but resulted in invalid IR and asserts.
Fixes crashes in Indiana Jones RT with assertions enabled on GFX8.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34114 >
2025-04-09 14:21:37 +00:00
Natalie Vock
d1ff9e951a
aco: Fix RT VGPR limit on Navi31/32, GFX11.5, GFX12
...
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Since 128 is not a multiple of the VGPR allocation granule, we will
actually allocate 134 VGPRs. No reason not to use the extra 6.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34265 >
2025-04-09 10:02:52 +00:00
Georg Lehmann
64cae5c48d
aco: form mixed MTBUF/MUBUF clauses
...
This should be one clause (all of the instructions load from the same vertex buffer)
s_clause 0x2 ; bfa10002
tbuffer_load_format_xyzw v[8:11], v5, s[4:7], 0 format:[BUF_FMT_8_8_8_8_UNORM] idxen offset:36 ; e9c32024 80010805
tbuffer_load_format_xyzw v[12:15], v5, s[4:7], 0 format:[BUF_FMT_8_8_8_8_UNORM] idxen offset:16 ; e9c32010 80010c05
tbuffer_load_format_xyzw v[16:19], v5, s[4:7], 0 format:[BUF_FMT_8_8_8_8_UNORM] idxen offset:12 ; e9c3200c 80011005
s_clause 0x2 ; bfa10002
buffer_load_dwordx3 v[20:22], v5, s[4:7], 0 idxen ; e03c2000 80011405
buffer_load_dwordx3 v[23:25], v5, s[4:7], 0 idxen offset:20 ; e03c2014 80011705
buffer_load_dwordx4 v[28:31], v5, s[4:7], 0 idxen offset:48 ; e0382030 80011c05
tbuffer_load_format_xy v[0:1], v5, s[4:7], 0 format:[BUF_FMT_8_8_UNORM] idxen offset:32 ; e8712020 80010005
Foz-DB Navi21:
Totals from 5624 (7.08% of 79395) affected shaders:
MaxWaves: 149894 -> 149898 (+0.00%)
Instrs: 3032697 -> 3034853 (+0.07%); split: -0.05%, +0.12%
CodeSize: 15907852 -> 15915752 (+0.05%); split: -0.05%, +0.10%
VGPRs: 216248 -> 216144 (-0.05%)
Latency: 10955137 -> 11008760 (+0.49%); split: -0.22%, +0.70%
InvThroughput: 2032857 -> 2033916 (+0.05%); split: -0.03%, +0.08%
VClause: 50120 -> 41778 (-16.64%); split: -16.66%, +0.02%
SClause: 62034 -> 62004 (-0.05%); split: -0.33%, +0.29%
Copies: 253836 -> 254505 (+0.26%); split: -0.17%, +0.43%
VALU: 1621606 -> 1622274 (+0.04%); split: -0.03%, +0.07%
SALU: 653251 -> 653252 (+0.00%)
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34379 >
2025-04-08 09:22:04 +00:00
Georg Lehmann
babe7f3e12
aco/gfx10: simpler solution to avoid store instructions in clauses
...
Foz-DB Navi21 has no changes.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34379 >
2025-04-08 09:22:04 +00:00
Georg Lehmann
c70dcd1451
aco/gfx9+: use d16 global/scratch/buffer loads
...
Full register loads are not nessecary and prevent packing optimizations.
Global/Scratch is GFX9+ so D16 loads are always supported.
We already used LDS D16 loads.
Foz-DB Navi31(mostly RA noise):
Totals from 716 (0.90% of 79789) affected shaders:
Instrs: 3854176 -> 3854238 (+0.00%); split: -0.00%, +0.00%
CodeSize: 20034440 -> 20035220 (+0.00%); split: -0.00%, +0.00%
Latency: 24410951 -> 24411120 (+0.00%)
InvThroughput: 5181276 -> 5181301 (+0.00%)
Copies: 320258 -> 320317 (+0.02%)
VALU: 2207307 -> 2207366 (+0.00%)
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34346 >
2025-04-04 16:20:39 +00:00
Georg Lehmann
de45676efd
aco/insert_exec: reset exec temporary after combined p_demote + p_end_wqm
...
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Otherwise the next divergent merge block might re-enable demoted invocations.
Fixes: 90faadae72 ("aco/insert_exec_mask: don't disable dead quads on demote in divergent CF")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12898
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12912
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34278 >
2025-03-31 06:43:22 +00:00
Georg Lehmann
7631b10984
aco: implement mul24_relaxed
...
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33871 >
2025-03-27 06:24:16 +00:00
Natalie Vock
d6cb45dbb0
aco/spill: Allow spilling live-through operands
...
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29730 >
2025-03-26 19:18:30 +00:00
Natalie Vock
416a016127
aco: Add RegisterDemand(Temp) constructor
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29730 >
2025-03-26 19:18:30 +00:00
Natalie Vock
ca7ce1fb33
aco/spill: Invert reloads map
...
So we can quickly look up if an operand was reloaded without having to
check renames.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29730 >
2025-03-26 19:18:30 +00:00
Natalie Vock
39413ef78f
aco: Add get_temp_reg_changes helper
...
Similar to get_live_changes, but considers live temporary registers
as well.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29730 >
2025-03-26 19:18:30 +00:00
Daniel Schürmann
afc605bc9b
aco: Remove empty exec skipping after demote
...
Totals from 858 (1.08% of 79377) affected shaders: (Navi31)
Instrs: 678713 -> 677694 (-0.15%); split: -0.15%, +0.00%
CodeSize: 3732576 -> 3729104 (-0.09%); split: -0.10%, +0.01%
Latency: 4199397 -> 4198632 (-0.02%); split: -0.06%, +0.04%
InvThroughput: 691391 -> 691122 (-0.04%); split: -0.04%, +0.00%
SClause: 14593 -> 14605 (+0.08%)
Copies: 41279 -> 41288 (+0.02%); split: -0.04%, +0.06%
Branches: 13575 -> 13452 (-0.91%)
PreSGPRs: 29069 -> 29039 (-0.10%)
VALU: 426261 -> 426215 (-0.01%); split: -0.01%, +0.00%
SALU: 60458 -> 60471 (+0.02%); split: -0.02%, +0.04%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33619 >
2025-03-26 08:45:12 +00:00
Daniel Schürmann
90faadae72
aco/insert_exec_mask: don't disable dead quads on demote in divergent CF
...
Also force-enalbe helpers in case of demote in divergent CF.
Totals from 1305 (1.64% of 79377) affected shaders: (Navi31)
Instrs: 926923 -> 922516 (-0.48%); split: -0.48%, +0.00%
CodeSize: 5045292 -> 5027408 (-0.35%); split: -0.36%, +0.00%
Latency: 6176577 -> 6174708 (-0.03%); split: -0.03%, +0.00%
InvThroughput: 931603 -> 931583 (-0.00%); split: -0.00%, +0.00%
SClause: 22816 -> 22855 (+0.17%); split: -0.17%, +0.34%
Copies: 57347 -> 55170 (-3.80%); split: -3.81%, +0.01%
Branches: 18990 -> 18974 (-0.08%)
PreSGPRs: 42734 -> 43248 (+1.20%)
SALU: 90511 -> 86153 (-4.81%); split: -4.85%, +0.04%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33619 >
2025-03-26 08:45:12 +00:00
Daniel Schürmann
b872ff6ef2
aco/insert_exec_mask: if applicable, use s_wqm to restore exec after divergent CF
...
Totals from 4740 (5.97% of 79377) affected shaders: (Navi31)
Instrs: 6273963 -> 6273410 (-0.01%); split: -0.01%, +0.00%
CodeSize: 34306560 -> 34304284 (-0.01%); split: -0.01%, +0.00%
SpillSGPRs: 1793 -> 1797 (+0.22%); split: -0.11%, +0.33%
Latency: 62599300 -> 62598714 (-0.00%); split: -0.00%, +0.00%
InvThroughput: 9117199 -> 9117189 (-0.00%); split: -0.00%, +0.00%
SClause: 223548 -> 223529 (-0.01%); split: -0.02%, +0.01%
Copies: 464248 -> 454711 (-2.05%); split: -2.06%, +0.00%
Branches: 161446 -> 161443 (-0.00%); split: -0.00%, +0.00%
PreSGPRs: 226278 -> 225608 (-0.30%)
VALU: 3793235 -> 3793244 (+0.00%); split: -0.00%, +0.00%
SALU: 606184 -> 605759 (-0.07%); split: -0.08%, +0.01%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33619 >
2025-03-26 08:45:12 +00:00
Daniel Schürmann
69dcd5be3a
aco: don't assume that demote doesn't cause an empty exec mask
...
Totals from 188 (0.24% of 79377) affected shaders: (Navi31)
Instrs: 209239 -> 209473 (+0.11%); split: -0.01%, +0.12%
CodeSize: 1101124 -> 1101744 (+0.06%); split: -0.02%, +0.07%
Latency: 1672182 -> 1672748 (+0.03%); split: -0.11%, +0.14%
InvThroughput: 237276 -> 237546 (+0.11%); split: -0.00%, +0.12%
SClause: 5694 -> 5690 (-0.07%); split: -0.28%, +0.21%
Copies: 21685 -> 21682 (-0.01%); split: -0.12%, +0.10%
Branches: 5740 -> 5863 (+2.14%)
PreSGPRs: 7004 -> 7034 (+0.43%)
VALU: 123595 -> 123641 (+0.04%); split: -0.00%, +0.04%
SALU: 28418 -> 28411 (-0.02%); split: -0.09%, +0.06%
Fixes: f35e229fae ('aco: skip code if exec is empty')
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33619 >
2025-03-26 08:45:12 +00:00
Daniel Schürmann
c1b124ab6c
aco/lower_branches: properly consider exec mask needs of branch targets
...
No fossil changes.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33619 >
2025-03-26 08:45:11 +00:00
Rhys Perry
80fef30531
aco/ra: fix free register counting when moving variables
...
info.bounds might be smaller than the bounds available for the moved
variables.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Fixes: 626aa7b648 ("aco: workaround GFX9 hardware bug for D16 image instructions")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34158 >
2025-03-25 15:14:16 +00:00
Georg Lehmann
d1dca26941
aco/ra: disallow vcc definitions for pseudo scalar trans instrs
...
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Foz-DB GFX1201:
Totals from 30 (0.04% of 79600) affected shaders:
Instrs: 58843 -> 58820 (-0.04%); split: -0.10%, +0.06%
CodeSize: 302228 -> 301944 (-0.09%); split: -0.13%, +0.04%
Latency: 204566 -> 204432 (-0.07%); split: -0.09%, +0.02%
InvThroughput: 136918 -> 136919 (+0.00%); split: -0.00%, +0.00%
SClause: 1241 -> 1249 (+0.64%); split: -0.56%, +1.21%
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34006 >
2025-03-14 13:53:55 +00:00
Samuel Pitoiset
f46830912e
aco: do not apply OMOD/CLAMP for pseudo scalar trans instrs
...
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
This optimization seems broken because eg. v_s_log_f32 uses SGPRs
for both the source and destination but applying OMOD seems to require
VGPRs.
This fixes a GPU hang when launching Enshrouded on GFX1201.
No fossils db changes on GFX1201.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34027 >
2025-03-13 11:22:10 +00:00
Georg Lehmann
cac4287aab
aco/validate: fix scalar source validation for DPP and gfx11+ VINTERP
...
Acked-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33969 >
2025-03-12 11:31:54 +00:00
Georg Lehmann
3b5e537b09
aco/gfx11.5: remove vinterp ddx/ddy path
...
While the idea to take advantage of the higher throughput wasn't bad,
the hardware wasn't design with this in mind and doesn't behave like expected
with constant sources.
Fixes: bee487df48 ("aco/gfx11.5+: use vinterp for fddx/fddy")
Acked-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33969 >
2025-03-12 11:31:54 +00:00
Georg Lehmann
5bfd1547d2
aco: don't assume that v_interp_mov_f32 flushes denorms
...
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Foz-DB Navi21:
Totals from 3 (0.00% of 79789) affected shaders:
Instrs: 1708 -> 1722 (+0.82%)
CodeSize: 9416 -> 9460 (+0.47%)
Latency: 12094 -> 12371 (+2.29%); split: -0.02%, +2.31%
InvThroughput: 1967 -> 1992 (+1.27%)
Copies: 105 -> 106 (+0.95%)
PreVGPRs: 131 -> 132 (+0.76%)
VALU: 1155 -> 1169 (+1.21%)
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33974 >
2025-03-11 09:51:39 +00:00