Rhys Perry
dd304bfd80
aco/gfx12: don't use second VALU for VOPD's OPX if there is a WaR
...
fossil-db (gfx1201):
Totals from 38908 (49.02% of 79377) affected shaders:
Instrs: 30268107 -> 30268131 (+0.00%); split: -0.00%, +0.00%
CodeSize: 180843648 -> 180843640 (-0.00%); split: -0.00%, +0.00%
Latency: 224905962 -> 224906072 (+0.00%); split: -0.00%, +0.00%
InvThroughput: 44322988 -> 44323004 (+0.00%)
VALU: 15124145 -> 15124167 (+0.00%)
VOPD: 4018504 -> 4018482 (-0.00%)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Backport-to: 25.0
Backport-to: 25.1
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34246 >
(cherry picked from commit 408fa33c09 )
2025-04-22 01:24:31 +02:00
Natalie Vock
3d8db3cbbb
aco: Make private_segment_buffer/scratch_offset per-resume
...
We need different Temps for each resume shader, because registers aren't
preserved across resume boundaries.
This was likely fine in practice because arg registers are the same for
each shader, but resulted in invalid IR and asserts.
Fixes crashes in Indiana Jones RT with assertions enabled on GFX8.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34114 >
2025-04-09 14:21:37 +00:00
Natalie Vock
d1ff9e951a
aco: Fix RT VGPR limit on Navi31/32, GFX11.5, GFX12
...
Since 128 is not a multiple of the VGPR allocation granule, we will
actually allocate 134 VGPRs. No reason not to use the extra 6.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34265 >
2025-04-09 10:02:52 +00:00
Georg Lehmann
64cae5c48d
aco: form mixed MTBUF/MUBUF clauses
...
This should be one clause (all of the instructions load from the same vertex buffer)
s_clause 0x2 ; bfa10002
tbuffer_load_format_xyzw v[8:11], v5, s[4:7], 0 format:[BUF_FMT_8_8_8_8_UNORM] idxen offset:36 ; e9c32024 80010805
tbuffer_load_format_xyzw v[12:15], v5, s[4:7], 0 format:[BUF_FMT_8_8_8_8_UNORM] idxen offset:16 ; e9c32010 80010c05
tbuffer_load_format_xyzw v[16:19], v5, s[4:7], 0 format:[BUF_FMT_8_8_8_8_UNORM] idxen offset:12 ; e9c3200c 80011005
s_clause 0x2 ; bfa10002
buffer_load_dwordx3 v[20:22], v5, s[4:7], 0 idxen ; e03c2000 80011405
buffer_load_dwordx3 v[23:25], v5, s[4:7], 0 idxen offset:20 ; e03c2014 80011705
buffer_load_dwordx4 v[28:31], v5, s[4:7], 0 idxen offset:48 ; e0382030 80011c05
tbuffer_load_format_xy v[0:1], v5, s[4:7], 0 format:[BUF_FMT_8_8_UNORM] idxen offset:32 ; e8712020 80010005
Foz-DB Navi21:
Totals from 5624 (7.08% of 79395) affected shaders:
MaxWaves: 149894 -> 149898 (+0.00%)
Instrs: 3032697 -> 3034853 (+0.07%); split: -0.05%, +0.12%
CodeSize: 15907852 -> 15915752 (+0.05%); split: -0.05%, +0.10%
VGPRs: 216248 -> 216144 (-0.05%)
Latency: 10955137 -> 11008760 (+0.49%); split: -0.22%, +0.70%
InvThroughput: 2032857 -> 2033916 (+0.05%); split: -0.03%, +0.08%
VClause: 50120 -> 41778 (-16.64%); split: -16.66%, +0.02%
SClause: 62034 -> 62004 (-0.05%); split: -0.33%, +0.29%
Copies: 253836 -> 254505 (+0.26%); split: -0.17%, +0.43%
VALU: 1621606 -> 1622274 (+0.04%); split: -0.03%, +0.07%
SALU: 653251 -> 653252 (+0.00%)
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34379 >
2025-04-08 09:22:04 +00:00
Georg Lehmann
babe7f3e12
aco/gfx10: simpler solution to avoid store instructions in clauses
...
Foz-DB Navi21 has no changes.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34379 >
2025-04-08 09:22:04 +00:00
Georg Lehmann
c70dcd1451
aco/gfx9+: use d16 global/scratch/buffer loads
...
Full register loads are not necessary and prevent packing optimizations.
Global/Scratch is GFX9+ so D16 loads are always supported.
We already used LDS D16 loads.
Foz-DB Navi31 (mostly RA noise):
Totals from 716 (0.90% of 79789) affected shaders:
Instrs: 3854176 -> 3854238 (+0.00%); split: -0.00%, +0.00%
CodeSize: 20034440 -> 20035220 (+0.00%); split: -0.00%, +0.00%
Latency: 24410951 -> 24411120 (+0.00%)
InvThroughput: 5181276 -> 5181301 (+0.00%)
Copies: 320258 -> 320317 (+0.02%)
VALU: 2207307 -> 2207366 (+0.00%)
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34346 >
2025-04-04 16:20:39 +00:00
Georg Lehmann
de45676efd
aco/insert_exec: reset exec temporary after combined p_demote + p_end_wqm
...
Otherwise the next divergent merge block might re-enable demoted invocations.
Fixes: 90faadae72 ("aco/insert_exec_mask: don't disable dead quads on demote in divergent CF")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12898
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12912
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34278 >
2025-03-31 06:43:22 +00:00
Georg Lehmann
7631b10984
aco: implement mul24_relaxed
...
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33871 >
2025-03-27 06:24:16 +00:00
Natalie Vock
d6cb45dbb0
aco/spill: Allow spilling live-through operands
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29730 >
2025-03-26 19:18:30 +00:00
Natalie Vock
416a016127
aco: Add RegisterDemand(Temp) constructor
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29730 >
2025-03-26 19:18:30 +00:00
Natalie Vock
ca7ce1fb33
aco/spill: Invert reloads map
...
So we can quickly look up if an operand was reloaded without having to
check renames.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29730 >
2025-03-26 19:18:30 +00:00
Natalie Vock
39413ef78f
aco: Add get_temp_reg_changes helper
...
Similar to get_live_changes, but considers live temporary registers
as well.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29730 >
2025-03-26 19:18:30 +00:00
Daniel Schürmann
afc605bc9b
aco: Remove empty exec skipping after demote
...
Totals from 858 (1.08% of 79377) affected shaders: (Navi31)
Instrs: 678713 -> 677694 (-0.15%); split: -0.15%, +0.00%
CodeSize: 3732576 -> 3729104 (-0.09%); split: -0.10%, +0.01%
Latency: 4199397 -> 4198632 (-0.02%); split: -0.06%, +0.04%
InvThroughput: 691391 -> 691122 (-0.04%); split: -0.04%, +0.00%
SClause: 14593 -> 14605 (+0.08%)
Copies: 41279 -> 41288 (+0.02%); split: -0.04%, +0.06%
Branches: 13575 -> 13452 (-0.91%)
PreSGPRs: 29069 -> 29039 (-0.10%)
VALU: 426261 -> 426215 (-0.01%); split: -0.01%, +0.00%
SALU: 60458 -> 60471 (+0.02%); split: -0.02%, +0.04%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33619 >
2025-03-26 08:45:12 +00:00
Daniel Schürmann
90faadae72
aco/insert_exec_mask: don't disable dead quads on demote in divergent CF
...
Also force-enable helpers in case of demote in divergent CF.
Totals from 1305 (1.64% of 79377) affected shaders: (Navi31)
Instrs: 926923 -> 922516 (-0.48%); split: -0.48%, +0.00%
CodeSize: 5045292 -> 5027408 (-0.35%); split: -0.36%, +0.00%
Latency: 6176577 -> 6174708 (-0.03%); split: -0.03%, +0.00%
InvThroughput: 931603 -> 931583 (-0.00%); split: -0.00%, +0.00%
SClause: 22816 -> 22855 (+0.17%); split: -0.17%, +0.34%
Copies: 57347 -> 55170 (-3.80%); split: -3.81%, +0.01%
Branches: 18990 -> 18974 (-0.08%)
PreSGPRs: 42734 -> 43248 (+1.20%)
SALU: 90511 -> 86153 (-4.81%); split: -4.85%, +0.04%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33619 >
2025-03-26 08:45:12 +00:00
Daniel Schürmann
b872ff6ef2
aco/insert_exec_mask: if applicable, use s_wqm to restore exec after divergent CF
...
Totals from 4740 (5.97% of 79377) affected shaders: (Navi31)
Instrs: 6273963 -> 6273410 (-0.01%); split: -0.01%, +0.00%
CodeSize: 34306560 -> 34304284 (-0.01%); split: -0.01%, +0.00%
SpillSGPRs: 1793 -> 1797 (+0.22%); split: -0.11%, +0.33%
Latency: 62599300 -> 62598714 (-0.00%); split: -0.00%, +0.00%
InvThroughput: 9117199 -> 9117189 (-0.00%); split: -0.00%, +0.00%
SClause: 223548 -> 223529 (-0.01%); split: -0.02%, +0.01%
Copies: 464248 -> 454711 (-2.05%); split: -2.06%, +0.00%
Branches: 161446 -> 161443 (-0.00%); split: -0.00%, +0.00%
PreSGPRs: 226278 -> 225608 (-0.30%)
VALU: 3793235 -> 3793244 (+0.00%); split: -0.00%, +0.00%
SALU: 606184 -> 605759 (-0.07%); split: -0.08%, +0.01%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33619 >
2025-03-26 08:45:12 +00:00
Daniel Schürmann
69dcd5be3a
aco: don't assume that demote doesn't cause an empty exec mask
...
Totals from 188 (0.24% of 79377) affected shaders: (Navi31)
Instrs: 209239 -> 209473 (+0.11%); split: -0.01%, +0.12%
CodeSize: 1101124 -> 1101744 (+0.06%); split: -0.02%, +0.07%
Latency: 1672182 -> 1672748 (+0.03%); split: -0.11%, +0.14%
InvThroughput: 237276 -> 237546 (+0.11%); split: -0.00%, +0.12%
SClause: 5694 -> 5690 (-0.07%); split: -0.28%, +0.21%
Copies: 21685 -> 21682 (-0.01%); split: -0.12%, +0.10%
Branches: 5740 -> 5863 (+2.14%)
PreSGPRs: 7004 -> 7034 (+0.43%)
VALU: 123595 -> 123641 (+0.04%); split: -0.00%, +0.04%
SALU: 28418 -> 28411 (-0.02%); split: -0.09%, +0.06%
Fixes: f35e229fae ('aco: skip code if exec is empty')
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33619 >
2025-03-26 08:45:12 +00:00
Daniel Schürmann
c1b124ab6c
aco/lower_branches: properly consider exec mask needs of branch targets
...
No fossil changes.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33619 >
2025-03-26 08:45:11 +00:00
Rhys Perry
80fef30531
aco/ra: fix free register counting when moving variables
...
info.bounds might be smaller than the bounds available for the moved
variables.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Fixes: 626aa7b648 ("aco: workaround GFX9 hardware bug for D16 image instructions")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34158 >
2025-03-25 15:14:16 +00:00
Georg Lehmann
d1dca26941
aco/ra: disallow vcc definitions for pseudo scalar trans instrs
...
Foz-DB GFX1201:
Totals from 30 (0.04% of 79600) affected shaders:
Instrs: 58843 -> 58820 (-0.04%); split: -0.10%, +0.06%
CodeSize: 302228 -> 301944 (-0.09%); split: -0.13%, +0.04%
Latency: 204566 -> 204432 (-0.07%); split: -0.09%, +0.02%
InvThroughput: 136918 -> 136919 (+0.00%); split: -0.00%, +0.00%
SClause: 1241 -> 1249 (+0.64%); split: -0.56%, +1.21%
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34006 >
2025-03-14 13:53:55 +00:00
Samuel Pitoiset
f46830912e
aco: do not apply OMOD/CLAMP for pseudo scalar trans instrs
...
This optimization seems broken because, e.g., v_s_log_f32 uses SGPRs
for both the source and destination but applying OMOD seems to require
VGPRs.
This fixes a GPU hang when launching Enshrouded on GFX1201.
No fossil-db changes on GFX1201.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34027 >
2025-03-13 11:22:10 +00:00
Georg Lehmann
cac4287aab
aco/validate: fix scalar source validation for DPP and gfx11+ VINTERP
...
Acked-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33969 >
2025-03-12 11:31:54 +00:00
Georg Lehmann
3b5e537b09
aco/gfx11.5: remove vinterp ddx/ddy path
...
While the idea to take advantage of the higher throughput wasn't bad,
the hardware wasn't designed with this in mind and doesn't behave as expected
with constant sources.
Fixes: bee487df48 ("aco/gfx11.5+: use vinterp for fddx/fddy")
Acked-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33969 >
2025-03-12 11:31:54 +00:00
Georg Lehmann
5bfd1547d2
aco: don't assume that v_interp_mov_f32 flushes denorms
...
Foz-DB Navi21:
Totals from 3 (0.00% of 79789) affected shaders:
Instrs: 1708 -> 1722 (+0.82%)
CodeSize: 9416 -> 9460 (+0.47%)
Latency: 12094 -> 12371 (+2.29%); split: -0.02%, +2.31%
InvThroughput: 1967 -> 1992 (+1.27%)
Copies: 105 -> 106 (+0.95%)
PreVGPRs: 131 -> 132 (+0.76%)
VALU: 1155 -> 1169 (+1.21%)
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33974 >
2025-03-11 09:51:39 +00:00
Samuel Pitoiset
dd2e9c11af
aco/tests: use GFX1201 instead of GFX1200
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33970 >
2025-03-11 06:50:49 +00:00
Rhys Perry
0ec174afd5
aco: insert dependency waits in certain situations
...
This seems to fix some artifacts, but we're not sure why, so it might not
be a correct or optimal solution.
fossil-db (navi31):
Totals from 28424 (35.81% of 79377) affected shaders:
Instrs: 30112910 -> 30348977 (+0.78%); split: -0.00%, +0.78%
CodeSize: 159542980 -> 160485336 (+0.59%); split: -0.00%, +0.59%
Latency: 221438396 -> 221500856 (+0.03%); split: -0.00%, +0.03%
InvThroughput: 38154231 -> 38159984 (+0.02%); split: -0.00%, +0.02%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Backport-to: 25.0
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33853 >
2025-03-05 16:22:54 +00:00
Georg Lehmann
20dd6dfa12
aco/isel: use s_mul_i32 instead of s_cselect_b32 for a ? b : 0
...
It doesn't require SCC and this is more consistent with b2f.
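The equivalence behind this change can be illustrated on the host side. This is a minimal sketch, assuming the condition is a uniform boolean that is already materialized as exactly 0 or 1; the function names are hypothetical and are not taken from ACO's instruction selection code.

```cpp
#include <cassert>
#include <cstdint>

// Reference semantics of the select: a ? b : 0.
// This is the shape that s_cselect_b32 implements, which reads SCC.
uint32_t select_or_zero(bool a, uint32_t b)
{
   return a ? b : 0u;
}

// Same result via a multiply, assuming a01 is exactly 0 or 1.
// This mirrors the s_mul_i32 form, which does not need SCC.
uint32_t mul_or_zero(uint32_t a01, uint32_t b)
{
   assert(a01 <= 1u); // the trick is only valid for 0/1 booleans
   return a01 * b;
}
```

For any 32-bit b, mul_or_zero(1, b) returns b and mul_or_zero(0, b) returns 0, matching the select.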
Foz-DB Navi21:
Totals from 2107 (2.64% of 79789) affected shaders:
Instrs: 6619774 -> 6619280 (-0.01%); split: -0.01%, +0.00%
CodeSize: 36754448 -> 36752396 (-0.01%); split: -0.01%, +0.00%
Latency: 62207779 -> 62206422 (-0.00%); split: -0.00%, +0.00%
InvThroughput: 13090494 -> 13090204 (-0.00%); split: -0.00%, +0.00%
VClause: 171572 -> 171573 (+0.00%)
SClause: 257528 -> 257530 (+0.00%)
Copies: 607680 -> 607204 (-0.08%); split: -0.10%, +0.02%
VALU: 4189422 -> 4189418 (-0.00%)
SALU: 1001750 -> 1001264 (-0.05%); split: -0.07%, +0.02%
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33734 >
2025-03-04 21:36:17 +00:00
Georg Lehmann
2d68efd9f3
aco/opt_postRA: remove scc == 0 for more opcodes
...
Convert special case to s_cselect
Foz-DB Navi21:
Totals from 42 (0.05% of 79789) affected shaders:
Instrs: 91826 -> 91690 (-0.15%)
CodeSize: 496304 -> 495680 (-0.13%)
Latency: 1631974 -> 1631948 (-0.00%); split: -0.00%, +0.00%
InvThroughput: 278772 -> 278766 (-0.00%)
SALU: 10627 -> 10491 (-1.28%)
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33734 >
2025-03-04 21:36:17 +00:00
Georg Lehmann
83247ffa30
aco/opt_postRA: remove scc != 0 with multiple uses
...
These can always be removed.
Foz-DB Navi21:
Totals from 39 (0.05% of 79789) affected shaders:
Instrs: 138352 -> 138299 (-0.04%)
CodeSize: 710424 -> 710272 (-0.02%)
Latency: 468276 -> 468254 (-0.00%); split: -0.01%, +0.00%
InvThroughput: 108970 -> 108973 (+0.00%)
SALU: 18785 -> 18732 (-0.28%)
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33734 >
2025-03-04 21:36:17 +00:00
Georg Lehmann
6445ba0f05
aco/opt_postRA: allow try_optimize_scc_nocompare for all instructions
...
If the old SCC source worked, the new one will too.
Foz-DB Navi21:
Totals from 106 (0.13% of 79789) affected shaders:
Instrs: 255233 -> 254825 (-0.16%)
CodeSize: 1337308 -> 1335692 (-0.12%)
Latency: 1455208 -> 1454524 (-0.05%); split: -0.05%, +0.00%
InvThroughput: 385624 -> 385612 (-0.00%); split: -0.00%, +0.00%
SALU: 53976 -> 53568 (-0.76%)
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33734 >
2025-03-04 21:36:17 +00:00
Georg Lehmann
3386ea09d4
aco/opt_postRA: split try_optimize_scc_nocompare in two functions
...
These are two independent steps, no real reason why they should be in the same
function.
No FOZ-DB changes.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33734 >
2025-03-04 21:36:17 +00:00
Ivan Avdeev
ff6504d4c0
radv: add experimental support for AMD BC-250 board
...
AMD BC-250 is a mining board based on an AMD APU with an integrated GPU
that the kernel recognizes as Cyan Skillfish.
It is basically RDNA1/GFX10, but with added hardware ray tracing
support. LLVM calls it GFX1013, see
https://llvm.org/docs/AMDGPU/AMDGPUAsmGFX1013.html
Support for this GPU hasn't been extensively tested. Some games are
known to work, some non-trivial ray query compute and ray tracing
pipeline rendering works too. Q2RTX works.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33116 >
2025-03-04 08:07:31 +00:00
Georg Lehmann
7eb43c3b1c
aco/optimizer: delete combine_and_subbrev
...
This is now done in NIR. No Foz-DB changes on Navi21.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33761 >
2025-03-01 07:49:28 +00:00
Natalie Vock
d5a2666ad9
aco/ra: Assert operands only clear their own id
...
This is useful for debugging register assignment, as this case would
usually result in RA silently assigning the same register to multiple
temps at the same time.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29576 >
2025-02-28 16:00:48 +00:00
Natalie Vock
1967b0f0c4
aco/tests: Add tests for precolored operands in different regs
...
The first test verifies that, if possible, we don't emit unnecessary
renames/copies for temporaries where it's possible for them to stay
in their current register (if an operand is precolored to the register
the temporary is currently residing in).
The second test verifies that we correctly choose a non-clobbered
operand even if there is one fixed to the temporary's current register.
To minimize copies, we'll want to have the live copy of
%tmp0 in v[2] there, because v[0-1] gets overwritten.
The third test verifies that we add a copy to another free register and
rename if all possible precolored operands are clobbered.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29576 >
2025-02-28 16:00:48 +00:00
Natalie Vock
b8bcc8e5c5
aco/ra: Handle temps fixed to different regs in different operands
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29576 >
2025-02-28 16:00:48 +00:00
Natalie Vock
7a4775b396
aco/ra: Add option to skip renaming for parallelcopies
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29576 >
2025-02-28 16:00:48 +00:00
Natalie Vock
b339bcfa38
aco/ra: Use struct for parallelcopies
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29576 >
2025-02-28 16:00:48 +00:00
Natalie Vock
3f182bc1fa
aco/ra: Use iterators for linear VGPR copy extraction
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29576 >
2025-02-28 16:00:48 +00:00
Daniel Schürmann
3c27a9f0e2
aco/tests: add more tests for chained branches
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33762 >
2025-02-27 10:40:01 +00:00
Daniel Schürmann
713396ec8e
aco/assembler: Don't insert chained branches into otherwise empty blocks
...
No fossil changes, but keeps block offsets of the empty blocks intact.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33762 >
2025-02-27 10:40:01 +00:00
Daniel Schürmann
6659db285a
aco/assembler: Fix short jumps over chained branches
...
If we insert
   s_branch 1
   s_branch Target
at the end of some block, and later hide an additional chained branch
after the existing one, then we have to update the 's_branch 1' to
also jump over the newly added branch.
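The bookkeeping described above could look roughly like the following. This is only a sketch under assumptions: it assumes each s_branch occupies one dword and that the branch immediate counts dwords from the following instruction. ChainSite and add_chained_branch are hypothetical names and do not exist in the ACO assembler.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of one "skip + chained branches" site at the end of a block.
struct ChainSite {
   int16_t skip_imm = 0;          // immediate of the leading s_branch, in dwords
   std::vector<uint32_t> targets; // one chained s_branch per far target
};

// Append one more chained branch and keep the leading skip in sync,
// so that fall-through execution jumps over every chained branch.
void add_chained_branch(ChainSite &site, uint32_t target_block)
{
   site.targets.push_back(target_block);
   site.skip_imm = static_cast<int16_t>(site.targets.size());
}
```

In this model the bug corresponds to pushing a second target without updating skip_imm, which would leave "s_branch 1" falling into the newly added branch.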
Fixes: cab5639a09 ('aco/assembler: chain branches instead of emitting long jumps')
Closes: #12673
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33762 >
2025-02-27 10:40:01 +00:00
Alyssa Rosenzweig
9a58a8257e
treewide: Switch to nir_progress
...
Via the Coccinelle patch at the end of the commit message, followed by
sed -ie 's/progress = progress | /progress |=/g' $(git grep -l 'progress = prog')
ninja -C ~/mesa/build clang-format
cd ~/mesa/src/compiler/nir && clang-format -i *.c
agxfmt
@@
identifier prog;
expression impl, metadata;
@@
-if (prog) {
-nir_metadata_preserve(impl, metadata);
-} else {
-nir_metadata_preserve(impl, nir_metadata_all);
-}
-return prog;
+return nir_progress(prog, impl, metadata);
@@
expression prog_expr, impl, metadata;
@@
-if (prog_expr) {
-nir_metadata_preserve(impl, metadata);
-return true;
-} else {
-nir_metadata_preserve(impl, nir_metadata_all);
-return false;
-}
+bool progress = prog_expr;
+return nir_progress(progress, impl, metadata);
@@
identifier prog;
expression impl, metadata;
@@
-nir_metadata_preserve(impl, prog ? (metadata) : nir_metadata_all);
-return prog;
+return nir_progress(prog, impl, metadata);
@@
identifier prog;
expression impl, metadata;
@@
-nir_metadata_preserve(impl, prog ? (metadata) : nir_metadata_all);
+nir_progress(prog, impl, metadata);
@@
expression impl, metadata;
@@
-nir_metadata_preserve(impl, metadata);
-return true;
+return nir_progress(true, impl, metadata);
@@
expression impl;
@@
-nir_metadata_preserve(impl, nir_metadata_all);
-return false;
+return nir_no_progress(impl);
@@
identifier other_prog, prog;
expression impl, metadata;
@@
-if (prog) {
-nir_metadata_preserve(impl, metadata);
-} else {
-nir_metadata_preserve(impl, nir_metadata_all);
-}
-other_prog |= prog;
+other_prog = other_prog | nir_progress(prog, impl, metadata);
@@
identifier prog;
expression impl, metadata;
@@
-if (prog) {
-nir_metadata_preserve(impl, metadata);
-} else {
-nir_metadata_preserve(impl, nir_metadata_all);
-}
+nir_progress(prog, impl, metadata);
@@
identifier other_prog, prog;
expression impl, metadata;
@@
-if (prog) {
-nir_metadata_preserve(impl, metadata);
-other_prog = true;
-} else {
-nir_metadata_preserve(impl, nir_metadata_all);
-}
+other_prog = other_prog | nir_progress(prog, impl, metadata);
@@
expression prog_expr, impl, metadata;
identifier prog;
@@
-if (prog_expr) {
-nir_metadata_preserve(impl, metadata);
-prog = true;
-} else {
-nir_metadata_preserve(impl, nir_metadata_all);
-}
+bool impl_progress = prog_expr;
+prog = prog | nir_progress(impl_progress, impl, metadata);
@@
identifier other_prog, prog;
expression impl, metadata;
@@
-if (prog) {
-other_prog = true;
-nir_metadata_preserve(impl, metadata);
-} else {
-nir_metadata_preserve(impl, nir_metadata_all);
-}
+other_prog = other_prog | nir_progress(prog, impl, metadata);
@@
expression prog_expr, impl, metadata;
identifier prog;
@@
-if (prog_expr) {
-prog = true;
-nir_metadata_preserve(impl, metadata);
-} else {
-nir_metadata_preserve(impl, nir_metadata_all);
-}
+bool impl_progress = prog_expr;
+prog = prog | nir_progress(impl_progress, impl, metadata);
@@
expression prog_expr, impl, metadata;
@@
-if (prog_expr) {
-nir_metadata_preserve(impl, metadata);
-} else {
-nir_metadata_preserve(impl, nir_metadata_all);
-}
+bool impl_progress = prog_expr;
+nir_progress(impl_progress, impl, metadata);
@@
identifier prog;
expression impl, metadata;
@@
-nir_metadata_preserve(impl, metadata);
-prog = true;
+prog = nir_progress(true, impl, metadata);
@@
identifier prog;
expression impl, metadata;
@@
-if (prog) {
-nir_metadata_preserve(impl, metadata);
-}
-return prog;
+return nir_progress(prog, impl, metadata);
@@
identifier prog;
expression impl, metadata;
@@
-if (prog) {
-nir_metadata_preserve(impl, metadata);
-}
+nir_progress(prog, impl, metadata);
@@
expression impl;
@@
-nir_metadata_preserve(impl, nir_metadata_all);
+nir_no_progress(impl);
@@
expression impl, metadata;
@@
-nir_metadata_preserve(impl, metadata);
+nir_progress(true, impl, metadata);
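The behaviour that the rewrite rules above rely on can be mocked up as follows. This is a sketch with stand-in types, not the NIR headers: fake_impl, metadata_mask, preserve_sketch, progress_sketch and no_progress_sketch are hypothetical names, and the assumption that preserving metadata narrows a validity mask is a simplification made for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in types; the real nir_function_impl and nir_metadata live in NIR.
using metadata_mask = uint32_t;
constexpr metadata_mask metadata_all = ~0u;

struct fake_impl {
   metadata_mask valid_metadata = metadata_all;
};

// Stand-in for nir_metadata_preserve(): keep only the listed metadata valid.
void preserve_sketch(fake_impl *impl, metadata_mask preserved)
{
   impl->valid_metadata &= preserved;
}

// Behaviour implied by the first rule: preserve `preserved` on progress,
// preserve everything otherwise, and hand the progress flag back.
bool progress_sketch(bool progress, fake_impl *impl, metadata_mask preserved)
{
   preserve_sketch(impl, progress ? preserved : metadata_all);
   return progress;
}

// Behaviour implied by the nir_no_progress rule: nothing changed, return false.
bool no_progress_sketch(fake_impl *impl)
{
   return progress_sketch(false, impl, metadata_all);
}
```

Returning the flag is what lets the patch collapse the if/else plus return into a single return statement.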
squashme! sed -ie 's/progress = progress | /progress |=/g' $(git grep -l 'progress = prog')
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33722 >
2025-02-26 15:19:53 +00:00
Georg Lehmann
c249556bf4
aco/insert_exec: fix continue_or_break on gfx6-7
...
s_cmp_lg_u64 is gfx8+
Fixes: 115ff5f95b ("aco/insert_exec_mask: don't restore exec in continue_or_break blocks")
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33715 >
2025-02-24 20:41:17 +00:00
Daniel Schürmann
ea765162c3
aco/ssa_elimination: create a single parallelcopy instruction for linear and logical phis
...
Totals from 6651 (8.38% of 79377) affected shaders: (Navi31)
Instrs: 14722896 -> 14722290 (-0.00%); split: -0.01%, +0.00%
CodeSize: 77992072 -> 77989284 (-0.00%); split: -0.01%, +0.00%
Latency: 160542885 -> 160541215 (-0.00%); split: -0.00%, +0.00%
InvThroughput: 24543177 -> 24542710 (-0.00%); split: -0.00%, +0.00%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33527 >
2025-02-24 13:11:20 +00:00
Daniel Schürmann
0e98388614
aco/ssa_elimination: refactor scratch_sgpr handling
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33527 >
2025-02-24 13:11:20 +00:00
Daniel Schürmann
302678df91
aco/ssa_elimination: insert parallelcopies for p_phi immediately before branch
...
Totals from 2499 (3.15% of 79377) affected shaders: (Navi31)
Instrs: 6011729 -> 6011761 (+0.00%); split: -0.00%, +0.00%
CodeSize: 31573216 -> 31574236 (+0.00%); split: -0.00%, +0.00%
Latency: 83364734 -> 83365781 (+0.00%); split: -0.00%, +0.00%
InvThroughput: 13545643 -> 13545783 (+0.00%); split: -0.00%, +0.00%
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33527 >
2025-02-24 13:11:20 +00:00
Daniel Schürmann
794c2b7e2f
aco/lower_branches: allow other instructions after s_andn2 in break blocks
...
We are about to insert parallelcopies from phis there.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33527 >
2025-02-24 13:11:20 +00:00
Daniel Schürmann
115ff5f95b
aco/insert_exec_mask: don't restore exec in continue_or_break blocks
...
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33527 >
2025-02-24 13:11:20 +00:00
Daniel Schürmann
7f7c1d463a
aco/insert_exec_mask: Don't immediately set exec to zero in break/continue blocks
...
Instead, only indicate that exec should be zero and actually set it
in the subsequent helper block. This allows inserting
the parallelcopies from logical phis directly before the
branch in break and continue blocks.
Totals from 56 (0.07% of 79377) affected shaders: (Navi31)
Latency: 2472367 -> 2472422 (+0.00%); split: -0.00%, +0.00%
InvThroughput: 253053 -> 253055 (+0.00%); split: -0.00%, +0.00%
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33527 >
2025-02-24 13:11:20 +00:00
Daniel Schürmann
df2697c9ab
aco/scheduler: remove unused include of unordered_set
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33644 >
2025-02-21 13:49:41 +00:00