Commit graph

3620 commits

Author SHA1 Message Date
Rhys Perry
dd304bfd80 aco/gfx12: don't use second VALU for VOPD's OPX if there is a WaR
fossil-db (gfx1201):
Totals from 38908 (49.02% of 79377) affected shaders:
Instrs: 30268107 -> 30268131 (+0.00%); split: -0.00%, +0.00%
CodeSize: 180843648 -> 180843640 (-0.00%); split: -0.00%, +0.00%
Latency: 224905962 -> 224906072 (+0.00%); split: -0.00%, +0.00%
InvThroughput: 44322988 -> 44323004 (+0.00%)
VALU: 15124145 -> 15124167 (+0.00%)
VOPD: 4018504 -> 4018482 (-0.00%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Backport-to: 25.0
Backport-to: 25.1
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34246>
(cherry picked from commit 408fa33c09)
2025-04-22 01:24:31 +02:00
Natalie Vock
3d8db3cbbb aco: Make private_segment_buffer/scratch_offset per-resume
We need different Temps for each resume shader, because registers aren't
preserved across resume boundaries.

This was likely fine in practice because arg registers are the same for
each shader, but resulted in invalid IR and asserts.

Fixes crashes in Indiana Jones RT with assertions enabled on GFX8.

Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34114>
2025-04-09 14:21:37 +00:00
Natalie Vock
d1ff9e951a aco: Fix RT VGPR limit on Navi31/32, GFX11.5, GFX12
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Since 128 is not a multiple of the VGPR allocation granule, we will
actually allocate 134 VGPRs. No reason not to use the extra 6.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34265>
2025-04-09 10:02:52 +00:00
Georg Lehmann
64cae5c48d aco: form mixed MTBUF/MUBUF clauses
This should be one clause (all of the instructions load from the same vertex buffer)

s_clause 0x2                                                ; bfa10002
tbuffer_load_format_xyzw v[8:11], v5, s[4:7], 0 format:[BUF_FMT_8_8_8_8_UNORM] idxen offset:36 ; e9c32024 80010805
tbuffer_load_format_xyzw v[12:15], v5, s[4:7], 0 format:[BUF_FMT_8_8_8_8_UNORM] idxen offset:16 ; e9c32010 80010c05
tbuffer_load_format_xyzw v[16:19], v5, s[4:7], 0 format:[BUF_FMT_8_8_8_8_UNORM] idxen offset:12 ; e9c3200c 80011005
s_clause 0x2                                                ; bfa10002
buffer_load_dwordx3 v[20:22], v5, s[4:7], 0 idxen           ; e03c2000 80011405
buffer_load_dwordx3 v[23:25], v5, s[4:7], 0 idxen offset:20 ; e03c2014 80011705
buffer_load_dwordx4 v[28:31], v5, s[4:7], 0 idxen offset:48 ; e0382030 80011c05
tbuffer_load_format_xy v[0:1], v5, s[4:7], 0 format:[BUF_FMT_8_8_UNORM] idxen offset:32 ; e8712020 80010005

Foz-DB Navi21:
Totals from 5624 (7.08% of 79395) affected shaders:
MaxWaves: 149894 -> 149898 (+0.00%)
Instrs: 3032697 -> 3034853 (+0.07%); split: -0.05%, +0.12%
CodeSize: 15907852 -> 15915752 (+0.05%); split: -0.05%, +0.10%
VGPRs: 216248 -> 216144 (-0.05%)
Latency: 10955137 -> 11008760 (+0.49%); split: -0.22%, +0.70%
InvThroughput: 2032857 -> 2033916 (+0.05%); split: -0.03%, +0.08%
VClause: 50120 -> 41778 (-16.64%); split: -16.66%, +0.02%
SClause: 62034 -> 62004 (-0.05%); split: -0.33%, +0.29%
Copies: 253836 -> 254505 (+0.26%); split: -0.17%, +0.43%
VALU: 1621606 -> 1622274 (+0.04%); split: -0.03%, +0.07%
SALU: 653251 -> 653252 (+0.00%)

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34379>
2025-04-08 09:22:04 +00:00
Georg Lehmann
babe7f3e12 aco/gfx10: simpler solution to avoid store instructions in clauses
Foz-DB Navi21 has no changes.

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34379>
2025-04-08 09:22:04 +00:00
Georg Lehmann
c70dcd1451 aco/gfx9+: use d16 global/scratch/buffer loads
Full register loads are not nessecary and prevent packing optimizations.
Global/Scratch is GFX9+ so D16 loads are always supported.
We already used LDS D16 loads.

Foz-DB Navi31(mostly RA noise):
Totals from 716 (0.90% of 79789) affected shaders:
Instrs: 3854176 -> 3854238 (+0.00%); split: -0.00%, +0.00%
CodeSize: 20034440 -> 20035220 (+0.00%); split: -0.00%, +0.00%
Latency: 24410951 -> 24411120 (+0.00%)
InvThroughput: 5181276 -> 5181301 (+0.00%)
Copies: 320258 -> 320317 (+0.02%)
VALU: 2207307 -> 2207366 (+0.00%)

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34346>
2025-04-04 16:20:39 +00:00
Georg Lehmann
de45676efd aco/insert_exec: reset exec temporary after combined p_demote + p_end_wqm
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Otherwise the next divergent merge block might re-enable demoted invocations.

Fixes: 90faadae72 ("aco/insert_exec_mask: don't disable dead quads on demote in divergent CF")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12898
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12912
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34278>
2025-03-31 06:43:22 +00:00
Georg Lehmann
7631b10984 aco: implement mul24_relaxed
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33871>
2025-03-27 06:24:16 +00:00
Natalie Vock
d6cb45dbb0 aco/spill: Allow spilling live-through operands
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29730>
2025-03-26 19:18:30 +00:00
Natalie Vock
416a016127 aco: Add RegisterDemand(Temp) constructor
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29730>
2025-03-26 19:18:30 +00:00
Natalie Vock
ca7ce1fb33 aco/spill: Invert reloads map
So we can quickly look up if an operand was reloaded without having to
check renames.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29730>
2025-03-26 19:18:30 +00:00
Natalie Vock
39413ef78f aco: Add get_temp_reg_changes helper
Similar to get_live_changes, but considers live temporary registers
as well.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29730>
2025-03-26 19:18:30 +00:00
Daniel Schürmann
afc605bc9b aco: Remove empty exec skipping after demote
Totals from 858 (1.08% of 79377) affected shaders: (Navi31)

Instrs: 678713 -> 677694 (-0.15%); split: -0.15%, +0.00%
CodeSize: 3732576 -> 3729104 (-0.09%); split: -0.10%, +0.01%
Latency: 4199397 -> 4198632 (-0.02%); split: -0.06%, +0.04%
InvThroughput: 691391 -> 691122 (-0.04%); split: -0.04%, +0.00%
SClause: 14593 -> 14605 (+0.08%)
Copies: 41279 -> 41288 (+0.02%); split: -0.04%, +0.06%
Branches: 13575 -> 13452 (-0.91%)
PreSGPRs: 29069 -> 29039 (-0.10%)
VALU: 426261 -> 426215 (-0.01%); split: -0.01%, +0.00%
SALU: 60458 -> 60471 (+0.02%); split: -0.02%, +0.04%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33619>
2025-03-26 08:45:12 +00:00
Daniel Schürmann
90faadae72 aco/insert_exec_mask: don't disable dead quads on demote in divergent CF
Also force-enalbe helpers in case of demote in divergent CF.

Totals from 1305 (1.64% of 79377) affected shaders: (Navi31)

Instrs: 926923 -> 922516 (-0.48%); split: -0.48%, +0.00%
CodeSize: 5045292 -> 5027408 (-0.35%); split: -0.36%, +0.00%
Latency: 6176577 -> 6174708 (-0.03%); split: -0.03%, +0.00%
InvThroughput: 931603 -> 931583 (-0.00%); split: -0.00%, +0.00%
SClause: 22816 -> 22855 (+0.17%); split: -0.17%, +0.34%
Copies: 57347 -> 55170 (-3.80%); split: -3.81%, +0.01%
Branches: 18990 -> 18974 (-0.08%)
PreSGPRs: 42734 -> 43248 (+1.20%)
SALU: 90511 -> 86153 (-4.81%); split: -4.85%, +0.04%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33619>
2025-03-26 08:45:12 +00:00
Daniel Schürmann
b872ff6ef2 aco/insert_exec_mask: if applicable, use s_wqm to restore exec after divergent CF
Totals from 4740 (5.97% of 79377) affected shaders: (Navi31)

Instrs: 6273963 -> 6273410 (-0.01%); split: -0.01%, +0.00%
CodeSize: 34306560 -> 34304284 (-0.01%); split: -0.01%, +0.00%
SpillSGPRs: 1793 -> 1797 (+0.22%); split: -0.11%, +0.33%
Latency: 62599300 -> 62598714 (-0.00%); split: -0.00%, +0.00%
InvThroughput: 9117199 -> 9117189 (-0.00%); split: -0.00%, +0.00%
SClause: 223548 -> 223529 (-0.01%); split: -0.02%, +0.01%
Copies: 464248 -> 454711 (-2.05%); split: -2.06%, +0.00%
Branches: 161446 -> 161443 (-0.00%); split: -0.00%, +0.00%
PreSGPRs: 226278 -> 225608 (-0.30%)
VALU: 3793235 -> 3793244 (+0.00%); split: -0.00%, +0.00%
SALU: 606184 -> 605759 (-0.07%); split: -0.08%, +0.01%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33619>
2025-03-26 08:45:12 +00:00
Daniel Schürmann
69dcd5be3a aco: don't assume that demote doesn't cause an empty exec mask
Totals from 188 (0.24% of 79377) affected shaders: (Navi31)
Instrs: 209239 -> 209473 (+0.11%); split: -0.01%, +0.12%
CodeSize: 1101124 -> 1101744 (+0.06%); split: -0.02%, +0.07%
Latency: 1672182 -> 1672748 (+0.03%); split: -0.11%, +0.14%
InvThroughput: 237276 -> 237546 (+0.11%); split: -0.00%, +0.12%
SClause: 5694 -> 5690 (-0.07%); split: -0.28%, +0.21%
Copies: 21685 -> 21682 (-0.01%); split: -0.12%, +0.10%
Branches: 5740 -> 5863 (+2.14%)
PreSGPRs: 7004 -> 7034 (+0.43%)
VALU: 123595 -> 123641 (+0.04%); split: -0.00%, +0.04%
SALU: 28418 -> 28411 (-0.02%); split: -0.09%, +0.06%

Fixes: f35e229fae ('aco: skip code if exec is empty')
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33619>
2025-03-26 08:45:12 +00:00
Daniel Schürmann
c1b124ab6c aco/lower_branches: properly consider exec mask needs of branch targets
No fossil changes.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33619>
2025-03-26 08:45:11 +00:00
Rhys Perry
80fef30531 aco/ra: fix free register counting when moving variables
info.bounds might be smaller than the bounds available for the moved
variables.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Fixes: 626aa7b648 ("aco: workaround GFX9 hardware bug for D16 image instructions")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34158>
2025-03-25 15:14:16 +00:00
Georg Lehmann
d1dca26941 aco/ra: disallow vcc definitions for pseudo scalar trans instrs
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Foz-DB GFX1201:
Totals from 30 (0.04% of 79600) affected shaders:
Instrs: 58843 -> 58820 (-0.04%); split: -0.10%, +0.06%
CodeSize: 302228 -> 301944 (-0.09%); split: -0.13%, +0.04%
Latency: 204566 -> 204432 (-0.07%); split: -0.09%, +0.02%
InvThroughput: 136918 -> 136919 (+0.00%); split: -0.00%, +0.00%
SClause: 1241 -> 1249 (+0.64%); split: -0.56%, +1.21%

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>

Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34006>
2025-03-14 13:53:55 +00:00
Samuel Pitoiset
f46830912e aco: do not apply OMOD/CLAMP for pseudo scalar trans instrs
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
This optimization seems broken because eg. v_s_log_f32 uses SGPRs
for both the source and destination but applying OMOD seems to require
VGPRs.

This fixes a GPU hang when launching Enshrouded on GFX1201.

No fossils db changes on GFX1201.

Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34027>
2025-03-13 11:22:10 +00:00
Georg Lehmann
cac4287aab aco/validate: fix scalar source validation for DPP and gfx11+ VINTERP
Acked-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33969>
2025-03-12 11:31:54 +00:00
Georg Lehmann
3b5e537b09 aco/gfx11.5: remove vinterp ddx/ddy path
While the idea to take advantage of the higher throughput wasn't bad,
the hardware wasn't design with this in mind and doesn't behave like expected
with constant sources.

Fixes: bee487df48 ("aco/gfx11.5+: use vinterp for fddx/fddy")
Acked-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33969>
2025-03-12 11:31:54 +00:00
Georg Lehmann
5bfd1547d2 aco: don't assume that v_interp_mov_f32 flushes denorms
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Foz-DB Navi21:
Totals from 3 (0.00% of 79789) affected shaders:
Instrs: 1708 -> 1722 (+0.82%)
CodeSize: 9416 -> 9460 (+0.47%)
Latency: 12094 -> 12371 (+2.29%); split: -0.02%, +2.31%
InvThroughput: 1967 -> 1992 (+1.27%)
Copies: 105 -> 106 (+0.95%)
PreVGPRs: 131 -> 132 (+0.76%)
VALU: 1155 -> 1169 (+1.21%)

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33974>
2025-03-11 09:51:39 +00:00
Samuel Pitoiset
dd2e9c11af aco/tests: use GFX1201 instead of GFX1200
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33970>
2025-03-11 06:50:49 +00:00
Rhys Perry
0ec174afd5 aco: insert dependency waits in certain situations
This seems to fix some artifacts, but we're not sure why, so it might not
be a correct or optimal solution.

fossil-db (navi31):
Totals from 28424 (35.81% of 79377) affected shaders:
Instrs: 30112910 -> 30348977 (+0.78%); split: -0.00%, +0.78%
CodeSize: 159542980 -> 160485336 (+0.59%); split: -0.00%, +0.59%
Latency: 221438396 -> 221500856 (+0.03%); split: -0.00%, +0.03%
InvThroughput: 38154231 -> 38159984 (+0.02%); split: -0.00%, +0.02%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Backport-to: 25.0
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33853>
2025-03-05 16:22:54 +00:00
Georg Lehmann
20dd6dfa12 aco/isel: use s_mul_i32 instead of s_cselect_b32 for a ? b : 0
It doesn't require SCC and this is more consistent with b2f.

Foz-DB Navi21:
Totals from 2107 (2.64% of 79789) affected shaders:
Instrs: 6619774 -> 6619280 (-0.01%); split: -0.01%, +0.00%
CodeSize: 36754448 -> 36752396 (-0.01%); split: -0.01%, +0.00%
Latency: 62207779 -> 62206422 (-0.00%); split: -0.00%, +0.00%
InvThroughput: 13090494 -> 13090204 (-0.00%); split: -0.00%, +0.00%
VClause: 171572 -> 171573 (+0.00%)
SClause: 257528 -> 257530 (+0.00%)
Copies: 607680 -> 607204 (-0.08%); split: -0.10%, +0.02%
VALU: 4189422 -> 4189418 (-0.00%)
SALU: 1001750 -> 1001264 (-0.05%); split: -0.07%, +0.02%

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33734>
2025-03-04 21:36:17 +00:00
Georg Lehmann
2d68efd9f3 aco/opt_postRA: remove scc == 0 for more opcodes
Convert special case to s_cselect

Foz-DB Navi21:
Totals from 42 (0.05% of 79789) affected shaders:
Instrs: 91826 -> 91690 (-0.15%)
CodeSize: 496304 -> 495680 (-0.13%)
Latency: 1631974 -> 1631948 (-0.00%); split: -0.00%, +0.00%
InvThroughput: 278772 -> 278766 (-0.00%)
SALU: 10627 -> 10491 (-1.28%)

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33734>
2025-03-04 21:36:17 +00:00
Georg Lehmann
83247ffa30 aco/opt_postRA: remove scc != 0 with multiple uses
These can always be removed.

Foz-DB Navi21:
Totals from 39 (0.05% of 79789) affected shaders:
Instrs: 138352 -> 138299 (-0.04%)
CodeSize: 710424 -> 710272 (-0.02%)
Latency: 468276 -> 468254 (-0.00%); split: -0.01%, +0.00%
InvThroughput: 108970 -> 108973 (+0.00%)
SALU: 18785 -> 18732 (-0.28%)

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33734>
2025-03-04 21:36:17 +00:00
Georg Lehmann
6445ba0f05 aco/opt_postRA: allow try_optimize_scc_nocompare for all instructions
If the old SCC source worked, the new one will too.

Foz-DB Navi21:
Totals from 106 (0.13% of 79789) affected shaders:
Instrs: 255233 -> 254825 (-0.16%)
CodeSize: 1337308 -> 1335692 (-0.12%)
Latency: 1455208 -> 1454524 (-0.05%); split: -0.05%, +0.00%
InvThroughput: 385624 -> 385612 (-0.00%); split: -0.00%, +0.00%
SALU: 53976 -> 53568 (-0.76%)

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33734>
2025-03-04 21:36:17 +00:00
Georg Lehmann
3386ea09d4 aco/opt_postRA: split try_optimize_scc_nocompare in two functions
These are two independent steps, no real reason why they should be in the same
function.

No FOZ-DB changes.

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33734>
2025-03-04 21:36:17 +00:00
Ivan Avdeev
ff6504d4c0 radv: add experimental support for AMD BC-250 board
AMD BC-250 is a mining board based on an AMD APU with an integrated GPU
that kernel recognizes as Cyan Skillfish.

It is basically RDNA1/GFX10, but with added hardware ray tracing
support. LLVM calls it GFX1013, see
https://llvm.org/docs/AMDGPU/AMDGPUAsmGFX1013.html

Support for this GPU hasn't been extensively tested. Some games are
known to work, some non-trivial ray query compute and ray tracing
pipeline rendering works too. Q2RTX works.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33116>
2025-03-04 08:07:31 +00:00
Georg Lehmann
7eb43c3b1c aco/optimizer: delete combine_and_subbrev
This is now done in NIR. No Foz-DB changes on Navi21.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33761>
2025-03-01 07:49:28 +00:00
Natalie Vock
d5a2666ad9 aco/ra: Assert operands only clear their own id
This is useful for debugging register assignment, as this case would
usually result in RA silently assigning the same register to multiple
temps at the same time.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29576>
2025-02-28 16:00:48 +00:00
Natalie Vock
1967b0f0c4 aco/tests: Add tests for precolored operands in different regs
The first test verifies that, if possible, we don't emit unnecessary
renames/copies for temporaries where it's possible for them to stay
in their current register (if an operand is precolored to the register
the temporary is currently residing in).

The second test verifies that we correctly choose a non-clobbered
operand even if there is one fixed to the temporary's current register.
To minimize copies, we'll want to have the live copy of
%tmp0 in v[2] there, because v[0-1] gets overwritten.

The third test verifies that we add a copy to another free register and
rename if all possible precolored operands are clobbered.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29576>
2025-02-28 16:00:48 +00:00
Natalie Vock
b8bcc8e5c5 aco/ra: Handle temps fixed to different regs in different operands
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29576>
2025-02-28 16:00:48 +00:00
Natalie Vock
7a4775b396 aco/ra: Add option to skip renaming for parallelcopies
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29576>
2025-02-28 16:00:48 +00:00
Natalie Vock
b339bcfa38 aco/ra: Use struct for parallelcopies
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29576>
2025-02-28 16:00:48 +00:00
Natalie Vock
3f182bc1fa aco/ra: Use iterators for linear VGPR copy extraction
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29576>
2025-02-28 16:00:48 +00:00
Daniel Schürmann
3c27a9f0e2 aco/tests: add more tests for chained branches
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33762>
2025-02-27 10:40:01 +00:00
Daniel Schürmann
713396ec8e aco/assembler: Don't insert chained branches into otherwise empty blocks
No fossil changes, but keeps block offsets of the empty blocks intact.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33762>
2025-02-27 10:40:01 +00:00
Daniel Schürmann
6659db285a aco/assembler: Fix short jumps over chained branches
If we insert

   <code>
   s_branch 1
   s_branch Target

at the end of some block, and later hide an additional chained branch
after the existing one, then we have to update the 's_branch 1' to
also jump over the newly added branch.

Fixes: cab5639a09 ('aco/assembler: chain branches instead of emitting long jumps')
Closes: #12673
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33762>
2025-02-27 10:40:01 +00:00
Alyssa Rosenzweig
9a58a8257e treewide: Switch to nir_progress
Via the Coccinelle patch at the end of the commit message, followed by

sed -ie 's/progress = progress | /progress |=/g' $(git grep -l 'progress = prog')
ninja -C ~/mesa/build clang-format
cd ~/mesa/src/compiler/nir && clang-format -i *.c
agxfmt

    @@
    identifier prog;
    expression impl, metadata;
    @@

    -if (prog) {
    -nir_metadata_preserve(impl, metadata);
    -} else {
    -nir_metadata_preserve(impl, nir_metadata_all);
    -}
    -return prog;
    +return nir_progress(prog, impl, metadata);

    @@
    expression prog_expr, impl, metadata;
    @@

    -if (prog_expr) {
    -nir_metadata_preserve(impl, metadata);
    -return true;
    -} else {
    -nir_metadata_preserve(impl, nir_metadata_all);
    -return false;
    -}
    +bool progress = prog_expr;
    +return nir_progress(progress, impl, metadata);

    @@
    identifier prog;
    expression impl, metadata;
    @@

    -nir_metadata_preserve(impl, prog ? (metadata) : nir_metadata_all);
    -return prog;
    +return nir_progress(prog, impl, metadata);

    @@
    identifier prog;
    expression impl, metadata;
    @@

    -nir_metadata_preserve(impl, prog ? (metadata) : nir_metadata_all);
    +nir_progress(prog, impl, metadata);

    @@
    expression impl, metadata;
    @@

    -nir_metadata_preserve(impl, metadata);
    -return true;
    +return nir_progress(true, impl, metadata);

    @@
    expression impl;
    @@

    -nir_metadata_preserve(impl, nir_metadata_all);
    -return false;
    +return nir_no_progress(impl);

    @@
    identifier other_prog, prog;
    expression impl, metadata;
    @@

    -if (prog) {
    -nir_metadata_preserve(impl, metadata);
    -} else {
    -nir_metadata_preserve(impl, nir_metadata_all);
    -}
    -other_prog |= prog;
    +other_prog = other_prog | nir_progress(prog, impl, metadata);

    @@
    identifier prog;
    expression impl, metadata;
    @@

    -if (prog) {
    -nir_metadata_preserve(impl, metadata);
    -} else {
    -nir_metadata_preserve(impl, nir_metadata_all);
    -}
    +nir_progress(prog, impl, metadata);

    @@
    identifier other_prog, prog;
    expression impl, metadata;
    @@

    -if (prog) {
    -nir_metadata_preserve(impl, metadata);
    -other_prog = true;
    -} else {
    -nir_metadata_preserve(impl, nir_metadata_all);
    -}
    +other_prog = other_prog | nir_progress(prog, impl, metadata);

    @@
    expression prog_expr, impl, metadata;
    identifier prog;
    @@

    -if (prog_expr) {
    -nir_metadata_preserve(impl, metadata);
    -prog = true;
    -} else {
    -nir_metadata_preserve(impl, nir_metadata_all);
    -}
    +bool impl_progress = prog_expr;
    +prog = prog | nir_progress(impl_progress, impl, metadata);

    @@
    identifier other_prog, prog;
    expression impl, metadata;
    @@

    -if (prog) {
    -other_prog = true;
    -nir_metadata_preserve(impl, metadata);
    -} else {
    -nir_metadata_preserve(impl, nir_metadata_all);
    -}
    +other_prog = other_prog | nir_progress(prog, impl, metadata);

    @@
    expression prog_expr, impl, metadata;
    identifier prog;
    @@

    -if (prog_expr) {
    -prog = true;
    -nir_metadata_preserve(impl, metadata);
    -} else {
    -nir_metadata_preserve(impl, nir_metadata_all);
    -}
    +bool impl_progress = prog_expr;
    +prog = prog | nir_progress(impl_progress, impl, metadata);

    @@
    expression prog_expr, impl, metadata;
    @@

    -if (prog_expr) {
    -nir_metadata_preserve(impl, metadata);
    -} else {
    -nir_metadata_preserve(impl, nir_metadata_all);
    -}
    +bool impl_progress = prog_expr;
    +nir_progress(impl_progress, impl, metadata);

    @@
    identifier prog;
    expression impl, metadata;
    @@

    -nir_metadata_preserve(impl, metadata);
    -prog = true;
    +prog = nir_progress(true, impl, metadata);

    @@
    identifier prog;
    expression impl, metadata;
    @@

    -if (prog) {
    -nir_metadata_preserve(impl, metadata);
    -}
    -return prog;
    +return nir_progress(prog, impl, metadata);

    @@
    identifier prog;
    expression impl, metadata;
    @@

    -if (prog) {
    -nir_metadata_preserve(impl, metadata);
    -}
    +nir_progress(prog, impl, metadata);

    @@
    expression impl;
    @@

    -nir_metadata_preserve(impl, nir_metadata_all);
    +nir_no_progress(impl);

    @@
    expression impl, metadata;
    @@

    -nir_metadata_preserve(impl, metadata);
    +nir_progress(true, impl, metadata);

squashme! sed -ie 's/progress = progress | /progress |=/g' $(git grep -l 'progress = prog')

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33722>
2025-02-26 15:19:53 +00:00
Georg Lehmann
c249556bf4 aco/insert_exec: fix continue_or_break on gfx6-7
s_cmp_lg_u64 is gfx8+

Fixes: 115ff5f95b ("aco/insert_exec_mask: don't restore exec in continue_or_break blocks")

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33715>
2025-02-24 20:41:17 +00:00
Daniel Schürmann
ea765162c3 aco/ssa_elimination: create a single parallelcopy instruction for linear and logical phis
Totals from 6651 (8.38% of 79377) affected shaders: (Navi31)

Instrs: 14722896 -> 14722290 (-0.00%); split: -0.01%, +0.00%
CodeSize: 77992072 -> 77989284 (-0.00%); split: -0.01%, +0.00%
Latency: 160542885 -> 160541215 (-0.00%); split: -0.00%, +0.00%
InvThroughput: 24543177 -> 24542710 (-0.00%); split: -0.00%, +0.00%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33527>
2025-02-24 13:11:20 +00:00
Daniel Schürmann
0e98388614 aco/ssa_elimination: refactor scratch_sgpr handling
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33527>
2025-02-24 13:11:20 +00:00
Daniel Schürmann
302678df91 aco/ssa_elimination: insert parallelcopies for p_phi immediately before branch
Totals from 2499 (3.15% of 79377) affected shaders: (Navi31)
Instrs: 6011729 -> 6011761 (+0.00%); split: -0.00%, +0.00%
CodeSize: 31573216 -> 31574236 (+0.00%); split: -0.00%, +0.00%
Latency: 83364734 -> 83365781 (+0.00%); split: -0.00%, +0.00%
InvThroughput: 13545643 -> 13545783 (+0.00%); split: -0.00%, +0.00%

Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33527>
2025-02-24 13:11:20 +00:00
Daniel Schürmann
794c2b7e2f aco/lower_branches: allow other instructions after s_andn2 in break blocks
We are about to insert parallelcopies from phis there.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33527>
2025-02-24 13:11:20 +00:00
Daniel Schürmann
115ff5f95b aco/insert_exec_mask: don't restore exec in continue_or_break blocks
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33527>
2025-02-24 13:11:20 +00:00
Daniel Schürmann
7f7c1d463a aco/insert_exec_mask: Don't immediately set exec to zero in break/continue blocks
Instead, only indicate that exec should be zero and do
so in the successive helper block. This allows to insert
the parallelcopies from logical phis directly before the
branch in break and continue blocks.

Totals from 56 (0.07% of 79377) affected shaders: (Navi31)
Latency: 2472367 -> 2472422 (+0.00%); split: -0.00%, +0.00%
InvThroughput: 253053 -> 253055 (+0.00%); split: -0.00%, +0.00%

Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33527>
2025-02-24 13:11:20 +00:00
Daniel Schürmann
df2697c9ab aco/scheduler: remove unused include of unordered_set
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33644>
2025-02-21 13:49:41 +00:00