Rhys Perry
05364ea153
aco/ra: fix repeated compact_linear_vgprs() in get_reg()
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: b7738de4f9 ("aco/ra: rework linear VGPR allocation")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13431
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35838 >
(cherry picked from commit dce1d4ad4c )
2025-07-16 16:23:07 +02:00
Rhys Perry
27d61a6fd2
aco: update ctx.block when inserting discard block
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13432
Backport-to: 25.1
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35833 >
(cherry picked from commit 21c4400278 )
2025-07-02 16:19:32 +02:00
Rhys Perry
be6ede15eb
aco/lower_branches: keep blocks with multiple logical successors
...
It might be the case that both the branch and exec mask write in a
divergent branch block are removed. try_remove_simple_block() might then
try to remove it, but fail because it has multiple logical successors.
Instead, just skip these blocks.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Backport-to: 25.1
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35202 >
(cherry picked from commit 5344abbc56 )
2025-07-02 16:19:29 +02:00
Georg Lehmann
fc89f9e60d
aco: do not use v_cvt_pk_u8_f32 for f2u8
...
The ISA docs don't mention this, but instead of always truncating
like other integer conversions, this opcode actually uses the single
precision rounding mode.
We could continue to use the opcode and set the rounding mode to rtz
in lower_to_hw_instrs, but I think I should just concede that f2u8
isn't worth the effort.
Fixes: 9bb10b58 ("aco: use v_cvt_pk_u8_f32 for f2u8")
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35391 >
(cherry picked from commit d95e90ab5f )
2025-06-18 17:55:45 +02:00
Rhys Perry
b03ee0a308
aco/gfx12: fix VALUReadSGPRHazard with carry-out
...
fossil-db (gfx1201):
Totals from 370 (0.46% of 79653) affected shaders:
Instrs: 3933639 -> 3935914 (+0.06%)
CodeSize: 20743448 -> 20752068 (+0.04%); split: -0.00%, +0.04%
Latency: 26261246 -> 26261921 (+0.00%); split: -0.00%, +0.00%
InvThroughput: 5363675 -> 5363760 (+0.00%); split: -0.00%, +0.00%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: 65f95ae74e ("aco/insert_NOPs: implement VALU -> VALU case for VALUReadSGPRHazard on GFX12")
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35387 >
(cherry picked from commit a714a19e16 )
2025-06-18 17:55:45 +02:00
Rhys Perry
aad33936b6
aco: set vmem_types for args_pending_vmem
...
fossil-db (gfx1201):
Totals from 0 (0.00% of 79653) affected shaders:
fossil-db (navi31):
Totals from 11 (0.01% of 79653) affected shaders:
Instrs: 4543 -> 4554 (+0.24%)
CodeSize: 23256 -> 23300 (+0.19%)
fossil-db (navi21):
Totals from 8 (0.01% of 79653) affected shaders:
Instrs: 2333 -> 2341 (+0.34%)
CodeSize: 12328 -> 12360 (+0.26%)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Backport-to: 25.0
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34978 >
(cherry picked from commit 62a9b4b976 )
2025-06-18 17:55:44 +02:00
Georg Lehmann
250f423546
aco: clamp exponent of 16bit ldexp
...
The hw uses only a 16bit int, but NIR's src is 32bit.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34073 >
(cherry picked from commit a6675f35b2 )
2025-06-04 15:52:49 +02:00
Georg Lehmann
4fa2dbdf55
aco: assume sram ecc is enabled on Vega20
...
There are D16 load issues on Vega20 that are expected if sram ecc is enabled.
It's a professional class chip and I found mentions of it supporting ecc,
so assume it's enabled.
Maybe this could be improved by querying ecc info from the kernel, but
I'm not sure which query should be used.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13189
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12393
Cc: mesa-stable
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35045 >
(cherry picked from commit 0257644130 )
2025-05-20 20:18:08 +02:00
Rhys Perry
799685659d
aco/gfx115: consider point sample acceleration
...
Like 15428e0d786939a5c7629a9978947c8a9112ce96 in LLVM.
fossil-db (gfx1150):
Totals from 909 (1.14% of 79653) affected shaders:
Instrs: 5840489 -> 5840705 (+0.00%); split: -0.00%, +0.00%
CodeSize: 31133460 -> 31134296 (+0.00%); split: -0.00%, +0.00%
Latency: 52982280 -> 53438577 (+0.86%); split: -0.00%, +0.86%
InvThroughput: 10841454 -> 10942682 (+0.93%); split: -0.00%, +0.93%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Backport-to: 25.0
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34935 >
(cherry picked from commit 171920ceed )
2025-05-20 20:18:07 +02:00
Rhys Perry
77e7fd0dee
aco: swap the correct v_mov_b32 if there are two of them
...
Previously, this function tried to swap the instruction which is not
v_mov_b32, so that it doesn't introduce any new OPY-only instructions. If
both were v_mov_b32, it swapped Y. Since this makes Y opy-only, this can't
be done if X is also opy-only.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: 408fa33c09 ("aco/gfx12: don't use second VALU for VOPD's OPX if there is a WaR")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13101
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34841 >
(cherry picked from commit 9ca71b52aa )
2025-05-07 09:04:39 +02:00
Rhys Perry
87902dca71
aco: fix get_temp_reg_changes with clobbered operands
...
The spiller might have tried to spill a live-through first or second
s_fmac_f32 operand, but this wouldn't have reduced the SGPRs if the third
operand wasn't killed
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13038
Fixes: d6cb45dbb0 ("aco/spill: Allow spilling live-through operands")
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34699 >
(cherry picked from commit 7fe84024cb )
2025-04-30 14:15:47 +02:00
Rhys Perry
e1f06788f5
aco/gfx11: create waitcnt for workgroup vmem barriers
...
It seems this is necessary on GFX11.
Similar to 576a2e798c
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Backport-to: 25.0
Backport-to: 25.1
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34634 >
(cherry picked from commit b03e071583 )
2025-04-27 11:45:27 +02:00
Georg Lehmann
3d9ac270e2
aco/insert_exec: reset temporary when recreating wqm mask from exact mask
...
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
The old, now incorrect temporary was still used for invert blocks and loop masks.
Foz-DB Navi31:
Totals from 379 (0.48% of 79789) affected shaders:
Instrs: 399471 -> 399897 (+0.11%); split: -0.00%, +0.11%
CodeSize: 2197292 -> 2198908 (+0.07%); split: -0.00%, +0.08%
Latency: 2500636 -> 2500895 (+0.01%); split: -0.00%, +0.01%
SClause: 7912 -> 7918 (+0.08%); split: -0.04%, +0.11%
Copies: 25687 -> 26068 (+1.48%); split: -0.04%, +1.53%
PreSGPRs: 15648 -> 15562 (-0.55%)
SALU: 35125 -> 35517 (+1.12%)
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12901
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13019
Fixes: b872ff6ef2 ("aco/insert_exec_mask: if applicable, use s_wqm to restore exec after divergent CF")
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34659 >
(cherry picked from commit dd3e1190a2 )
2025-04-23 12:21:56 +02:00
Georg Lehmann
4fb4880183
aco/insert_exec: only restore wqm mask after control flow if necessary
...
The next commit will make this not free, so we should avoid it if possible.
Foz-DB Navi31:
Totals from 3933 (4.93% of 79789) affected shaders:
Instrs: 5726914 -> 5727295 (+0.01%); split: -0.00%, +0.01%
CodeSize: 31307100 -> 31308884 (+0.01%); split: -0.00%, +0.01%
SpillSGPRs: 1797 -> 1793 (-0.22%); split: -0.33%, +0.11%
Latency: 58973929 -> 58974343 (+0.00%); split: -0.00%, +0.00%
InvThroughput: 8591893 -> 8591911 (+0.00%); split: -0.00%, +0.00%
SClause: 209074 -> 209115 (+0.02%); split: -0.00%, +0.02%
Copies: 423965 -> 432420 (+1.99%)
Branches: 149976 -> 149979 (+0.00%); split: -0.00%, +0.00%
PreSGPRs: 200175 -> 200663 (+0.24%)
VALU: 3440165 -> 3440156 (-0.00%); split: -0.00%, +0.00%
SALU: 555727 -> 556143 (+0.07%); split: -0.00%, +0.08%
Fixes: b872ff6ef2 ("aco/insert_exec_mask: if applicable, use s_wqm to restore exec after divergent CF")
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34659 >
(cherry picked from commit 13f6be262a )
2025-04-23 12:21:56 +02:00
Georg Lehmann
d3285fe971
aco: set opsel_hi to 1 for WMMA
...
This is ignored by the hardware but LLVM requires it to disassemble GFX12 WMMA.
Cc: mesa-stable
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34396 >
(cherry picked from commit b0c8f31600 )
2025-04-23 12:21:56 +02:00
Rhys Perry
76db8496a9
aco: combine VALU lanemask hazard into VALUMaskWriteHazard
...
This is now basically the same as the original VALUMaskWriteHazard, except
it now considers both VALU and SALU writes.
Now that it's a part of VALUMaskWriteHazard, differences from the original
VALU lanemask workaround are:
- it includes SALU reads after the write
- it includes VALU writes and SALU/VALU reads after the write which are
not lanemasks
- it combines s_waitcnt_depctr instructions when it's a read after both a
SALU write and a VALU write
- non-exec VALU SGPR reads reset the SGPRs read by VALU as a lanemask
- exec SGPRs are ignored
resolve_all_gfx11() is also finished.
fossil-db (navi31):
Totals from 21538 (27.13% of 79377) affected shaders:
Instrs: 27628855 -> 27552972 (-0.27%); split: -0.30%, +0.03%
CodeSize: 145968448 -> 145667616 (-0.21%); split: -0.23%, +0.02%
Latency: 209537805 -> 209509519 (-0.01%); split: -0.02%, +0.00%
InvThroughput: 36304270 -> 36301624 (-0.01%); split: -0.01%, +0.00%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12623
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11480
Backport-to: 25.0
Backport-to: 25.1
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34529 >
(cherry picked from commit ce2be5ab8e )
2025-04-22 01:24:39 +02:00
Rhys Perry
dd304bfd80
aco/gfx12: don't use second VALU for VOPD's OPX if there is a WaR
...
fossil-db (gfx1201):
Totals from 38908 (49.02% of 79377) affected shaders:
Instrs: 30268107 -> 30268131 (+0.00%); split: -0.00%, +0.00%
CodeSize: 180843648 -> 180843640 (-0.00%); split: -0.00%, +0.00%
Latency: 224905962 -> 224906072 (+0.00%); split: -0.00%, +0.00%
InvThroughput: 44322988 -> 44323004 (+0.00%)
VALU: 15124145 -> 15124167 (+0.00%)
VOPD: 4018504 -> 4018482 (-0.00%)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Backport-to: 25.0
Backport-to: 25.1
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34246 >
(cherry picked from commit 408fa33c09 )
2025-04-22 01:24:31 +02:00
Natalie Vock
3d8db3cbbb
aco: Make private_segment_buffer/scratch_offset per-resume
...
We need different Temps for each resume shader, because registers aren't
preserved across resume boundaries.
This was likely fine in practice because arg registers are the same for
each shader, but resulted in invalid IR and asserts.
Fixes crashes in Indiana Jones RT with assertions enabled on GFX8.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34114 >
2025-04-09 14:21:37 +00:00
Natalie Vock
d1ff9e951a
aco: Fix RT VGPR limit on Navi31/32, GFX11.5, GFX12
...
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Since 128 is not a multiple of the VGPR allocation granule, we will
actually allocate 134 VGPRs. No reason not to use the extra 6.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34265 >
2025-04-09 10:02:52 +00:00
Georg Lehmann
64cae5c48d
aco: form mixed MTBUF/MUBUF clauses
...
This should be one clause (all of the instructions load from the same vertex buffer)
s_clause 0x2 ; bfa10002
tbuffer_load_format_xyzw v[8:11], v5, s[4:7], 0 format:[BUF_FMT_8_8_8_8_UNORM] idxen offset:36 ; e9c32024 80010805
tbuffer_load_format_xyzw v[12:15], v5, s[4:7], 0 format:[BUF_FMT_8_8_8_8_UNORM] idxen offset:16 ; e9c32010 80010c05
tbuffer_load_format_xyzw v[16:19], v5, s[4:7], 0 format:[BUF_FMT_8_8_8_8_UNORM] idxen offset:12 ; e9c3200c 80011005
s_clause 0x2 ; bfa10002
buffer_load_dwordx3 v[20:22], v5, s[4:7], 0 idxen ; e03c2000 80011405
buffer_load_dwordx3 v[23:25], v5, s[4:7], 0 idxen offset:20 ; e03c2014 80011705
buffer_load_dwordx4 v[28:31], v5, s[4:7], 0 idxen offset:48 ; e0382030 80011c05
tbuffer_load_format_xy v[0:1], v5, s[4:7], 0 format:[BUF_FMT_8_8_UNORM] idxen offset:32 ; e8712020 80010005
Foz-DB Navi21:
Totals from 5624 (7.08% of 79395) affected shaders:
MaxWaves: 149894 -> 149898 (+0.00%)
Instrs: 3032697 -> 3034853 (+0.07%); split: -0.05%, +0.12%
CodeSize: 15907852 -> 15915752 (+0.05%); split: -0.05%, +0.10%
VGPRs: 216248 -> 216144 (-0.05%)
Latency: 10955137 -> 11008760 (+0.49%); split: -0.22%, +0.70%
InvThroughput: 2032857 -> 2033916 (+0.05%); split: -0.03%, +0.08%
VClause: 50120 -> 41778 (-16.64%); split: -16.66%, +0.02%
SClause: 62034 -> 62004 (-0.05%); split: -0.33%, +0.29%
Copies: 253836 -> 254505 (+0.26%); split: -0.17%, +0.43%
VALU: 1621606 -> 1622274 (+0.04%); split: -0.03%, +0.07%
SALU: 653251 -> 653252 (+0.00%)
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34379 >
2025-04-08 09:22:04 +00:00
Georg Lehmann
babe7f3e12
aco/gfx10: simpler solution to avoid store instructions in clauses
...
Foz-DB Navi21 has no changes.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34379 >
2025-04-08 09:22:04 +00:00
Georg Lehmann
c70dcd1451
aco/gfx9+: use d16 global/scratch/buffer loads
...
Full register loads are not nessecary and prevent packing optimizations.
Global/Scratch is GFX9+ so D16 loads are always supported.
We already used LDS D16 loads.
Foz-DB Navi31(mostly RA noise):
Totals from 716 (0.90% of 79789) affected shaders:
Instrs: 3854176 -> 3854238 (+0.00%); split: -0.00%, +0.00%
CodeSize: 20034440 -> 20035220 (+0.00%); split: -0.00%, +0.00%
Latency: 24410951 -> 24411120 (+0.00%)
InvThroughput: 5181276 -> 5181301 (+0.00%)
Copies: 320258 -> 320317 (+0.02%)
VALU: 2207307 -> 2207366 (+0.00%)
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34346 >
2025-04-04 16:20:39 +00:00
Georg Lehmann
de45676efd
aco/insert_exec: reset exec temporary after combined p_demote + p_end_wqm
...
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Otherwise the next divergent merge block might re-enable demoted invocations.
Fixes: 90faadae72 ("aco/insert_exec_mask: don't disable dead quads on demote in divergent CF")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12898
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12912
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34278 >
2025-03-31 06:43:22 +00:00
Georg Lehmann
7631b10984
aco: implement mul24_relaxed
...
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33871 >
2025-03-27 06:24:16 +00:00
Natalie Vock
d6cb45dbb0
aco/spill: Allow spilling live-through operands
...
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29730 >
2025-03-26 19:18:30 +00:00
Natalie Vock
416a016127
aco: Add RegisterDemand(Temp) constructor
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29730 >
2025-03-26 19:18:30 +00:00
Natalie Vock
ca7ce1fb33
aco/spill: Invert reloads map
...
So we can quickly look up if an operand was reloaded without having to
check renames.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29730 >
2025-03-26 19:18:30 +00:00
Natalie Vock
39413ef78f
aco: Add get_temp_reg_changes helper
...
Similar to get_live_changes, but considers live temporary registers
as well.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29730 >
2025-03-26 19:18:30 +00:00
Daniel Schürmann
afc605bc9b
aco: Remove empty exec skipping after demote
...
Totals from 858 (1.08% of 79377) affected shaders: (Navi31)
Instrs: 678713 -> 677694 (-0.15%); split: -0.15%, +0.00%
CodeSize: 3732576 -> 3729104 (-0.09%); split: -0.10%, +0.01%
Latency: 4199397 -> 4198632 (-0.02%); split: -0.06%, +0.04%
InvThroughput: 691391 -> 691122 (-0.04%); split: -0.04%, +0.00%
SClause: 14593 -> 14605 (+0.08%)
Copies: 41279 -> 41288 (+0.02%); split: -0.04%, +0.06%
Branches: 13575 -> 13452 (-0.91%)
PreSGPRs: 29069 -> 29039 (-0.10%)
VALU: 426261 -> 426215 (-0.01%); split: -0.01%, +0.00%
SALU: 60458 -> 60471 (+0.02%); split: -0.02%, +0.04%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33619 >
2025-03-26 08:45:12 +00:00
Daniel Schürmann
90faadae72
aco/insert_exec_mask: don't disable dead quads on demote in divergent CF
...
Also force-enalbe helpers in case of demote in divergent CF.
Totals from 1305 (1.64% of 79377) affected shaders: (Navi31)
Instrs: 926923 -> 922516 (-0.48%); split: -0.48%, +0.00%
CodeSize: 5045292 -> 5027408 (-0.35%); split: -0.36%, +0.00%
Latency: 6176577 -> 6174708 (-0.03%); split: -0.03%, +0.00%
InvThroughput: 931603 -> 931583 (-0.00%); split: -0.00%, +0.00%
SClause: 22816 -> 22855 (+0.17%); split: -0.17%, +0.34%
Copies: 57347 -> 55170 (-3.80%); split: -3.81%, +0.01%
Branches: 18990 -> 18974 (-0.08%)
PreSGPRs: 42734 -> 43248 (+1.20%)
SALU: 90511 -> 86153 (-4.81%); split: -4.85%, +0.04%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33619 >
2025-03-26 08:45:12 +00:00
Daniel Schürmann
b872ff6ef2
aco/insert_exec_mask: if applicable, use s_wqm to restore exec after divergent CF
...
Totals from 4740 (5.97% of 79377) affected shaders: (Navi31)
Instrs: 6273963 -> 6273410 (-0.01%); split: -0.01%, +0.00%
CodeSize: 34306560 -> 34304284 (-0.01%); split: -0.01%, +0.00%
SpillSGPRs: 1793 -> 1797 (+0.22%); split: -0.11%, +0.33%
Latency: 62599300 -> 62598714 (-0.00%); split: -0.00%, +0.00%
InvThroughput: 9117199 -> 9117189 (-0.00%); split: -0.00%, +0.00%
SClause: 223548 -> 223529 (-0.01%); split: -0.02%, +0.01%
Copies: 464248 -> 454711 (-2.05%); split: -2.06%, +0.00%
Branches: 161446 -> 161443 (-0.00%); split: -0.00%, +0.00%
PreSGPRs: 226278 -> 225608 (-0.30%)
VALU: 3793235 -> 3793244 (+0.00%); split: -0.00%, +0.00%
SALU: 606184 -> 605759 (-0.07%); split: -0.08%, +0.01%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33619 >
2025-03-26 08:45:12 +00:00
Daniel Schürmann
69dcd5be3a
aco: don't assume that demote doesn't cause an empty exec mask
...
Totals from 188 (0.24% of 79377) affected shaders: (Navi31)
Instrs: 209239 -> 209473 (+0.11%); split: -0.01%, +0.12%
CodeSize: 1101124 -> 1101744 (+0.06%); split: -0.02%, +0.07%
Latency: 1672182 -> 1672748 (+0.03%); split: -0.11%, +0.14%
InvThroughput: 237276 -> 237546 (+0.11%); split: -0.00%, +0.12%
SClause: 5694 -> 5690 (-0.07%); split: -0.28%, +0.21%
Copies: 21685 -> 21682 (-0.01%); split: -0.12%, +0.10%
Branches: 5740 -> 5863 (+2.14%)
PreSGPRs: 7004 -> 7034 (+0.43%)
VALU: 123595 -> 123641 (+0.04%); split: -0.00%, +0.04%
SALU: 28418 -> 28411 (-0.02%); split: -0.09%, +0.06%
Fixes: f35e229fae ('aco: skip code if exec is empty')
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33619 >
2025-03-26 08:45:12 +00:00
Daniel Schürmann
c1b124ab6c
aco/lower_branches: properly consider exec mask needs of branch targets
...
No fossil changes.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33619 >
2025-03-26 08:45:11 +00:00
Rhys Perry
80fef30531
aco/ra: fix free register counting when moving variables
...
info.bounds might be smaller than the bounds available for the moved
variables.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Fixes: 626aa7b648 ("aco: workaround GFX9 hardware bug for D16 image instructions")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34158 >
2025-03-25 15:14:16 +00:00
Georg Lehmann
d1dca26941
aco/ra: disallow vcc definitions for pseudo scalar trans instrs
...
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Foz-DB GFX1201:
Totals from 30 (0.04% of 79600) affected shaders:
Instrs: 58843 -> 58820 (-0.04%); split: -0.10%, +0.06%
CodeSize: 302228 -> 301944 (-0.09%); split: -0.13%, +0.04%
Latency: 204566 -> 204432 (-0.07%); split: -0.09%, +0.02%
InvThroughput: 136918 -> 136919 (+0.00%); split: -0.00%, +0.00%
SClause: 1241 -> 1249 (+0.64%); split: -0.56%, +1.21%
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34006 >
2025-03-14 13:53:55 +00:00
Samuel Pitoiset
f46830912e
aco: do not apply OMOD/CLAMP for pseudo scalar trans instrs
...
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
This optimization seems broken because eg. v_s_log_f32 uses SGPRs
for both the source and destination but applying OMOD seems to require
VGPRs.
This fixes a GPU hang when launching Enshrouded on GFX1201.
No fossils db changes on GFX1201.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34027 >
2025-03-13 11:22:10 +00:00
Georg Lehmann
cac4287aab
aco/validate: fix scalar source validation for DPP and gfx11+ VINTERP
...
Acked-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33969 >
2025-03-12 11:31:54 +00:00
Georg Lehmann
3b5e537b09
aco/gfx11.5: remove vinterp ddx/ddy path
...
While the idea to take advantage of the higher throughput wasn't bad,
the hardware wasn't design with this in mind and doesn't behave like expected
with constant sources.
Fixes: bee487df48 ("aco/gfx11.5+: use vinterp for fddx/fddy")
Acked-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33969 >
2025-03-12 11:31:54 +00:00
Georg Lehmann
5bfd1547d2
aco: don't assume that v_interp_mov_f32 flushes denorms
...
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Foz-DB Navi21:
Totals from 3 (0.00% of 79789) affected shaders:
Instrs: 1708 -> 1722 (+0.82%)
CodeSize: 9416 -> 9460 (+0.47%)
Latency: 12094 -> 12371 (+2.29%); split: -0.02%, +2.31%
InvThroughput: 1967 -> 1992 (+1.27%)
Copies: 105 -> 106 (+0.95%)
PreVGPRs: 131 -> 132 (+0.76%)
VALU: 1155 -> 1169 (+1.21%)
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33974 >
2025-03-11 09:51:39 +00:00
Samuel Pitoiset
dd2e9c11af
aco/tests: use GFX1201 instead of GFX1200
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33970 >
2025-03-11 06:50:49 +00:00
Rhys Perry
0ec174afd5
aco: insert dependency waits in certain situations
...
This seems to fix some artifacts, but we're not sure why, so it might not
be a correct or optimal solution.
fossil-db (navi31):
Totals from 28424 (35.81% of 79377) affected shaders:
Instrs: 30112910 -> 30348977 (+0.78%); split: -0.00%, +0.78%
CodeSize: 159542980 -> 160485336 (+0.59%); split: -0.00%, +0.59%
Latency: 221438396 -> 221500856 (+0.03%); split: -0.00%, +0.03%
InvThroughput: 38154231 -> 38159984 (+0.02%); split: -0.00%, +0.02%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Backport-to: 25.0
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33853 >
2025-03-05 16:22:54 +00:00
Georg Lehmann
20dd6dfa12
aco/isel: use s_mul_i32 instead of s_cselect_b32 for a ? b : 0
...
It doesn't require SCC and this is more consistent with b2f.
Foz-DB Navi21:
Totals from 2107 (2.64% of 79789) affected shaders:
Instrs: 6619774 -> 6619280 (-0.01%); split: -0.01%, +0.00%
CodeSize: 36754448 -> 36752396 (-0.01%); split: -0.01%, +0.00%
Latency: 62207779 -> 62206422 (-0.00%); split: -0.00%, +0.00%
InvThroughput: 13090494 -> 13090204 (-0.00%); split: -0.00%, +0.00%
VClause: 171572 -> 171573 (+0.00%)
SClause: 257528 -> 257530 (+0.00%)
Copies: 607680 -> 607204 (-0.08%); split: -0.10%, +0.02%
VALU: 4189422 -> 4189418 (-0.00%)
SALU: 1001750 -> 1001264 (-0.05%); split: -0.07%, +0.02%
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33734 >
2025-03-04 21:36:17 +00:00
Georg Lehmann
2d68efd9f3
aco/opt_postRA: remove scc == 0 for more opcodes
...
Convert special case to s_cselect
Foz-DB Navi21:
Totals from 42 (0.05% of 79789) affected shaders:
Instrs: 91826 -> 91690 (-0.15%)
CodeSize: 496304 -> 495680 (-0.13%)
Latency: 1631974 -> 1631948 (-0.00%); split: -0.00%, +0.00%
InvThroughput: 278772 -> 278766 (-0.00%)
SALU: 10627 -> 10491 (-1.28%)
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33734 >
2025-03-04 21:36:17 +00:00
Georg Lehmann
83247ffa30
aco/opt_postRA: remove scc != 0 with multiple uses
...
These can always be removed.
Foz-DB Navi21:
Totals from 39 (0.05% of 79789) affected shaders:
Instrs: 138352 -> 138299 (-0.04%)
CodeSize: 710424 -> 710272 (-0.02%)
Latency: 468276 -> 468254 (-0.00%); split: -0.01%, +0.00%
InvThroughput: 108970 -> 108973 (+0.00%)
SALU: 18785 -> 18732 (-0.28%)
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33734 >
2025-03-04 21:36:17 +00:00
Georg Lehmann
6445ba0f05
aco/opt_postRA: allow try_optimize_scc_nocompare for all instructions
...
If the old SCC source worked, the new one will too.
Foz-DB Navi21:
Totals from 106 (0.13% of 79789) affected shaders:
Instrs: 255233 -> 254825 (-0.16%)
CodeSize: 1337308 -> 1335692 (-0.12%)
Latency: 1455208 -> 1454524 (-0.05%); split: -0.05%, +0.00%
InvThroughput: 385624 -> 385612 (-0.00%); split: -0.00%, +0.00%
SALU: 53976 -> 53568 (-0.76%)
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33734 >
2025-03-04 21:36:17 +00:00
Georg Lehmann
3386ea09d4
aco/opt_postRA: split try_optimize_scc_nocompare in two functions
...
These are two independent steps, no real reason why they should be in the same
function.
No FOZ-DB changes.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33734 >
2025-03-04 21:36:17 +00:00
Ivan Avdeev
ff6504d4c0
radv: add experimental support for AMD BC-250 board
...
AMD BC-250 is a mining board based on an AMD APU with an integrated GPU
that kernel recognizes as Cyan Skillfish.
It is basically RDNA1/GFX10, but with added hardware ray tracing
support. LLVM calls it GFX1013, see
https://llvm.org/docs/AMDGPU/AMDGPUAsmGFX1013.html
Support for this GPU hasn't been extensively tested. Some games are
known to work, some non-trivial ray query compute and ray tracing
pipeline rendering works too. Q2RTX works.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33116 >
2025-03-04 08:07:31 +00:00
Georg Lehmann
7eb43c3b1c
aco/optimizer: delete combine_and_subbrev
...
This is now done in NIR. No Foz-DB changes on Navi21.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33761 >
2025-03-01 07:49:28 +00:00
Natalie Vock
d5a2666ad9
aco/ra: Assert operands only clear their own id
...
This is useful for debugging register assignment, as this case would
usually result in RA silently assigning the same register to multiple
temps at the same time.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29576 >
2025-02-28 16:00:48 +00:00
Natalie Vock
1967b0f0c4
aco/tests: Add tests for precolored operands in different regs
...
The first test verifies that, if possible, we don't emit unnecessary
renames/copies for temporaries where it's possible for them to stay
in their current register (if an operand is precolored to the register
the temporary is currently residing in).
The second test verifies that we correctly choose a non-clobbered
operand even if there is one fixed to the temporary's current register.
To minimize copies, we'll want to have the live copy of
%tmp0 in v[2] there, because v[0-1] gets overwritten.
The third test verifies that we add a copy to another free register and
rename if all possible precolored operands are clobbered.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29576 >
2025-02-28 16:00:48 +00:00