Commit graph

395 commits

Author SHA1 Message Date
Natalie Vock
c515f1fd58 aco: Use vector-aligned operands for image_bvh8_intersect_ray
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35269>
2025-07-15 21:34:38 +00:00
Rhys Perry
0094e6c32a aco: optimize lds-only or vmem-only flat access
fossil-db (polaris10):
Totals from 138 (0.22% of 62070) affected shaders:
Instrs: 233452 -> 234436 (+0.42%)
CodeSize: 1209392 -> 1213220 (+0.32%)
Latency: 3934496 -> 3928089 (-0.16%); split: -0.17%, +0.00%
InvThroughput: 3040782 -> 3038562 (-0.07%); split: -0.07%, +0.00%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35465>
2025-07-11 12:15:08 +00:00
Rhys Perry
d705b6198c aco: simplify waitcnt insertion for flat access
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35465>
2025-07-11 12:15:08 +00:00
Rhys Perry
34f1a8f707 aco: handle FPAtomicToDenormModeHazard
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
This is quite unlikely to happen, but I guess it might be possible and
it's relatively simple to work around.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35884>
2025-07-07 13:02:43 +00:00
Rhys Perry
2cfd2d3b1d aco/tests: add lower_branches tests
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35202>
2025-06-19 10:58:39 +00:00
Rhys Perry
86ccceb4de aco: don't consider gfx1153 to have point sample acceleration
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34978>
2025-06-06 11:55:13 +01:00
Rhys Perry
f10b49781d aco: make all wait entries linear
If we remove exec skips, then we can wait for an entry on all paths in the
linear cfg, but not the logical cfg.

fossil-db (gfx1201):
Totals from 0 (0.00% of 79653) affected shaders:

fossil-db (navi31):
Totals from 0 (0.00% of 79653) affected shaders:

fossil-db (navi21):
Totals from 1586 (1.99% of 79653) affected shaders:
Instrs: 5118897 -> 5113206 (-0.11%); split: -0.11%, +0.00%
CodeSize: 28365852 -> 28343696 (-0.08%); split: -0.08%, +0.00%
Latency: 47820341 -> 47799532 (-0.04%); split: -0.09%, +0.05%
InvThroughput: 9904391 -> 9908653 (+0.04%); split: -0.02%, +0.06%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34978>
2025-06-06 11:55:13 +01:00
Rhys Perry
1088ac49db aco: sometimes join linear wait entries on logical edges
fossil-db (gfx1201):
Totals from 1303 (1.64% of 79653) affected shaders:
Instrs: 6920949 -> 6917692 (-0.05%); split: -0.06%, +0.01%
CodeSize: 37112404 -> 37095728 (-0.04%); split: -0.05%, +0.01%
Latency: 70471343 -> 70365986 (-0.15%); split: -0.15%, +0.00%
InvThroughput: 11515673 -> 11504666 (-0.10%); split: -0.10%, +0.01%

fossil-db (navi31):
Totals from 1293 (1.62% of 79653) affected shaders:
Instrs: 6500186 -> 6496761 (-0.05%); split: -0.06%, +0.01%
CodeSize: 34562712 -> 34549236 (-0.04%); split: -0.04%, +0.01%
Latency: 68604746 -> 68666532 (+0.09%); split: -0.15%, +0.24%
InvThroughput: 11276591 -> 11284914 (+0.07%); split: -0.10%, +0.17%

fossil-db (navi21):
Totals from 811 (1.02% of 79653) affected shaders:
Instrs: 4110953 -> 4108788 (-0.05%); split: -0.05%, +0.00%
CodeSize: 22955984 -> 22948064 (-0.03%); split: -0.03%, +0.00%
Latency: 35070231 -> 35064448 (-0.02%); split: -0.02%, +0.00%
InvThroughput: 6945610 -> 6945053 (-0.01%); split: -0.01%, +0.00%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34978>
2025-06-06 11:51:08 +01:00
Rhys Perry
c1f8537131 aco: skip waitcnt between two vmem writing different lanes
fossil-db (gfx1201):
Totals from 1382 (1.74% of 79653) affected shaders:
Instrs: 6531704 -> 6523935 (-0.12%); split: -0.12%, +0.00%
CodeSize: 34992076 -> 34933568 (-0.17%); split: -0.17%, +0.01%
Latency: 70183360 -> 69616066 (-0.81%); split: -0.81%, +0.00%
InvThroughput: 11155445 -> 11068667 (-0.78%); split: -0.78%, +0.00%

fossil-db (navi31):
Totals from 46 (0.06% of 79653) affected shaders:
Instrs: 1833768 -> 1833732 (-0.00%)
CodeSize: 9468788 -> 9468716 (-0.00%)
Latency: 11683092 -> 11667865 (-0.13%)
InvThroughput: 2274377 -> 2272872 (-0.07%)

fossil-db (navi21):
Totals from 0 (0.00% of 79653) affected shaders:

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34978>
2025-06-06 11:51:08 +01:00
Rhys Perry
9649deb50e aco: skip waitcnt between two vmem writing different halves
fossil-db (gfx1201):
Totals from 4 (0.01% of 79653) affected shaders:
Instrs: 41374 -> 41380 (+0.01%); split: -0.01%, +0.02%
CodeSize: 238912 -> 238924 (+0.01%); split: -0.01%, +0.01%
Latency: 706714 -> 706410 (-0.04%)
InvThroughput: 352269 -> 352118 (-0.04%)
VClause: 803 -> 798 (-0.62%)

fossil-db (navi31):
Totals from 0 (0.00% of 79653) affected shaders:

fossil-db (navi21):
Totals from 0 (0.00% of 79653) affected shaders:

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13028
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34978>
2025-06-06 11:51:08 +01:00
Rhys Perry
c50f9541e4 aco/tests: Add tests for vector-aligned operands
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34359>
2025-05-28 09:24:17 +00:00
Daniel Schürmann
6aabcb02a1 aco/print_ir: only print 'lateKill' if requested via print_kill flag
Also only print lateKill for actually killed operands.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34359>
2025-05-28 09:24:16 +00:00
Rhys Perry
072e6d1ab5 aco/tests: add tests for tied definitions
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Some of these would have failed before the rewrite.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34700>
2025-05-20 15:40:47 +00:00
Rhys Perry
171920ceed aco/gfx115: consider point sample acceleration
Like 15428e0d786939a5c7629a9978947c8a9112ce96 in LLVM.

fossil-db (gfx1150):
Totals from 909 (1.14% of 79653) affected shaders:
Instrs: 5840489 -> 5840705 (+0.00%); split: -0.00%, +0.00%
CodeSize: 31133460 -> 31134296 (+0.00%); split: -0.00%, +0.00%
Latency: 52982280 -> 53438577 (+0.86%); split: -0.00%, +0.86%
InvThroughput: 10841454 -> 10942682 (+0.93%); split: -0.00%, +0.93%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Backport-to: 25.0
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34935>
2025-05-14 11:22:13 +00:00
Daniel Schürmann
2b0536e921 aco: remove block_kind_continue_or_break workaround and tests
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33479>
2025-05-09 17:20:29 +00:00
Rhys Perry
20279c28c8 aco/tests: add pseudo-scalar transcendental and fallback path RA tests
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34343>
2025-04-29 15:15:11 +00:00
Rhys Perry
62e50de5d0 aco: use v_perm_b32 for byte swaps within a VGPR on gfx10
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34636>
2025-04-23 18:23:18 +00:00
Rhys Perry
a43783fd76 aco: use v_perm_b32 for do_pack_2x16 on gfx10+
fossil-db (gfx1201);
Totals from 93 (0.12% of 79377) affected shaders:
Instrs: 373212 -> 372761 (-0.12%)
CodeSize: 2062752 -> 2063704 (+0.05%); split: -0.00%, +0.05%
Latency: 4172059 -> 4171993 (-0.00%); split: -0.00%, +0.00%
InvThroughput: 1299144 -> 1299093 (-0.00%)
Copies: 51268 -> 50831 (-0.85%)
Branches: 10980 -> 10979 (-0.01%)
VALU: 220192 -> 219756 (-0.20%)
VOPD: 48 -> 47 (-2.08%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34636>
2025-04-23 18:23:18 +00:00
Rhys Perry
3d6fa6996c aco: init vm_vsrc/sa_sdst from depctr_wait
fossil-db (navi31):
Totals from 5805 (7.31% of 79377) affected shaders:
Instrs: 14229621 -> 14207115 (-0.16%); split: -0.16%, +0.00%
CodeSize: 75358724 -> 75268624 (-0.12%); split: -0.12%, +0.00%
Latency: 133637034 -> 133624262 (-0.01%); split: -0.01%, +0.00%
InvThroughput: 22067819 -> 22066213 (-0.01%); split: -0.01%, +0.00%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34529>
2025-04-17 17:28:22 +00:00
Rhys Perry
4fcf2eb1d7 aco/gfx12: VOPD src0/1 are src bank compatible if they are the same vgpr
fossil-db (gfx1201):
Totals from 66518 (83.80% of 79377) affected shaders:
Instrs: 36939667 -> 36656685 (-0.77%); split: -0.79%, +0.02%
CodeSize: 220575208 -> 220201764 (-0.17%); split: -0.21%, +0.04%
Latency: 258919732 -> 258137974 (-0.30%); split: -0.35%, +0.05%
InvThroughput: 49911351 -> 49643836 (-0.54%); split: -0.55%, +0.02%
VClause: 788661 -> 788430 (-0.03%); split: -0.04%, +0.01%
SClause: 1176416 -> 1176263 (-0.01%); split: -0.02%, +0.01%
VALU: 18014058 -> 17818119 (-1.09%); split: -1.10%, +0.01%
VOPD: 4926983 -> 5122922 (+3.98%); split: +4.01%, -0.04%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34246>
2025-04-17 14:00:29 +00:00
Rhys Perry
3446f2059d aco/gfx12: assume VOPD with two v_mov_b32 are src bank compatible
fossil-db (gfx1201):
Totals from 10576 (13.32% of 79377) affected shaders:
(no stats changed)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34246>
2025-04-17 14:00:29 +00:00
Rhys Perry
408fa33c09 aco/gfx12: don't use second VALU for VOPD's OPX if there is a WaR
fossil-db (gfx1201):
Totals from 38908 (49.02% of 79377) affected shaders:
Instrs: 30268107 -> 30268131 (+0.00%); split: -0.00%, +0.00%
CodeSize: 180843648 -> 180843640 (-0.00%); split: -0.00%, +0.00%
Latency: 224905962 -> 224906072 (+0.00%); split: -0.00%, +0.00%
InvThroughput: 44322988 -> 44323004 (+0.00%)
VALU: 15124145 -> 15124167 (+0.00%)
VOPD: 4018504 -> 4018482 (-0.00%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Backport-to: 25.0
Backport-to: 25.1
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34246>
2025-04-17 14:00:29 +00:00
Georg Lehmann
64cae5c48d aco: form mixed MTBUF/MUBUF clauses
This should be one clause (all of the instructions load from the same vertex buffer)

s_clause 0x2                                                ; bfa10002
tbuffer_load_format_xyzw v[8:11], v5, s[4:7], 0 format:[BUF_FMT_8_8_8_8_UNORM] idxen offset:36 ; e9c32024 80010805
tbuffer_load_format_xyzw v[12:15], v5, s[4:7], 0 format:[BUF_FMT_8_8_8_8_UNORM] idxen offset:16 ; e9c32010 80010c05
tbuffer_load_format_xyzw v[16:19], v5, s[4:7], 0 format:[BUF_FMT_8_8_8_8_UNORM] idxen offset:12 ; e9c3200c 80011005
s_clause 0x2                                                ; bfa10002
buffer_load_dwordx3 v[20:22], v5, s[4:7], 0 idxen           ; e03c2000 80011405
buffer_load_dwordx3 v[23:25], v5, s[4:7], 0 idxen offset:20 ; e03c2014 80011705
buffer_load_dwordx4 v[28:31], v5, s[4:7], 0 idxen offset:48 ; e0382030 80011c05
tbuffer_load_format_xy v[0:1], v5, s[4:7], 0 format:[BUF_FMT_8_8_UNORM] idxen offset:32 ; e8712020 80010005

Foz-DB Navi21:
Totals from 5624 (7.08% of 79395) affected shaders:
MaxWaves: 149894 -> 149898 (+0.00%)
Instrs: 3032697 -> 3034853 (+0.07%); split: -0.05%, +0.12%
CodeSize: 15907852 -> 15915752 (+0.05%); split: -0.05%, +0.10%
VGPRs: 216248 -> 216144 (-0.05%)
Latency: 10955137 -> 11008760 (+0.49%); split: -0.22%, +0.70%
InvThroughput: 2032857 -> 2033916 (+0.05%); split: -0.03%, +0.08%
VClause: 50120 -> 41778 (-16.64%); split: -16.66%, +0.02%
SClause: 62034 -> 62004 (-0.05%); split: -0.33%, +0.29%
Copies: 253836 -> 254505 (+0.26%); split: -0.17%, +0.43%
VALU: 1621606 -> 1622274 (+0.04%); split: -0.03%, +0.07%
SALU: 653251 -> 653252 (+0.00%)

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34379>
2025-04-08 09:22:04 +00:00
Samuel Pitoiset
f46830912e aco: do not apply OMOD/CLAMP for pseudo scalar trans instrs
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
This optimization seems broken because eg. v_s_log_f32 uses SGPRs
for both the source and destination but applying OMOD seems to require
VGPRs.

This fixes a GPU hang when launching Enshrouded on GFX1201.

No fossils db changes on GFX1201.

Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34027>
2025-03-13 11:22:10 +00:00
Samuel Pitoiset
dd2e9c11af aco/tests: use GFX1201 instead of GFX1200
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33970>
2025-03-11 06:50:49 +00:00
Georg Lehmann
7eb43c3b1c aco/optimizer: delete combine_and_subbrev
This is now done in NIR. No Foz-DB changes on Navi21.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33761>
2025-03-01 07:49:28 +00:00
Natalie Vock
1967b0f0c4 aco/tests: Add tests for precolored operands in different regs
The first test verifies that, if possible, we don't emit unnecessary
renames/copies for temporaries where it's possible for them to stay
in their current register (if an operand is precolored to the register
the temporary is currently residing in).

The second test verifies that we correctly choose a non-clobbered
operand even if there is one fixed to the temporary's current register.
To minimize copies, we'll want to have the live copy of
%tmp0 in v[2] there, because v[0-1] gets overwritten.

The third test verifies that we add a copy to another free register and
rename if all possible precolored operands are clobbered.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29576>
2025-02-28 16:00:48 +00:00
Daniel Schürmann
3c27a9f0e2 aco/tests: add more tests for chained branches
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33762>
2025-02-27 10:40:01 +00:00
Daniel Schürmann
1a8a643bbd aco/isel: track control flow divergence in loops more accurately
We introduce two new variables, cf_context::in_divergent_cf and
cf_context::parent_loop.has_divergent_break, in order to determine
whether there is any other invocations on a different CF path.

Totals from 1305 (1.64% of 79395) affected shaders: (Navi31)

Instrs: 659211 -> 657815 (-0.21%); split: -0.22%, +0.01%
CodeSize: 3483228 -> 3477960 (-0.15%); split: -0.16%, +0.01%
VGPRs: 68820 -> 48048 (-30.18%)
Latency: 14197750 -> 14170767 (-0.19%); split: -0.26%, +0.07%
InvThroughput: 1619103 -> 1619826 (+0.04%); split: -0.02%, +0.07%
VClause: 12384 -> 12350 (-0.27%)
SClause: 26693 -> 26844 (+0.57%); split: -0.01%, +0.57%
Copies: 44994 -> 43535 (-3.24%); split: -3.26%, +0.02%
PreSGPRs: 49007 -> 48907 (-0.20%)
PreVGPRs: 32171 -> 32121 (-0.16%)
VALU: 349984 -> 349857 (-0.04%); split: -0.04%, +0.00%
SALU: 84252 -> 83988 (-0.31%); split: -0.32%, +0.00%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33206>
2025-02-05 10:54:21 +00:00
Daniel Schürmann
61fa007e48 aco/isel: fix empty exec tracking for uniform branches
Totals from 5 (0.01% of 79395) affected shaders: (Navi31)

Instrs: 54730 -> 54715 (-0.03%)
CodeSize: 276928 -> 276852 (-0.03%)
Latency: 215212 -> 214874 (-0.16%)
InvThroughput: 40154 -> 40150 (-0.01%)
Copies: 6824 -> 6821 (-0.04%); split: -0.06%, +0.01%
Branches: 1625 -> 1615 (-0.62%)
SALU: 5682 -> 5678 (-0.07%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33206>
2025-02-05 10:54:21 +00:00
Daniel Schürmann
65f95ae74e aco/insert_NOPs: implement VALU -> VALU case for VALUReadSGPRHazard on GFX12
Totals from 36918 (46.50% of 79395) affected shaders: (GFX1200)

Instrs: 34997889 -> 35296429 (+0.85%); split: -0.00%, +0.85%
CodeSize: 186161112 -> 187334364 (+0.63%); split: -0.00%, +0.63%
Latency: 250265551 -> 250330784 (+0.03%); split: -0.00%, +0.03%
InvThroughput: 41185298 -> 41192503 (+0.02%); split: -0.00%, +0.02%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32682>
2025-01-30 03:13:16 +00:00
Georg Lehmann
b23ff87db4 aco/sched_ilp: base latency and issue cycles on aco_statistics
This matters for trans and scalar fpu instructions.

Foz-DB GFX1150:
Totals from 53894 (67.90% of 79377) affected shaders:
Instrs: 38528421 -> 38481337 (-0.12%); split: -0.16%, +0.04%
CodeSize: 200206016 -> 200023916 (-0.09%); split: -0.12%, +0.03%
Latency: 265011734 -> 264303762 (-0.27%); split: -0.28%, +0.02%
InvThroughput: 53804490 -> 53696097 (-0.20%); split: -0.21%, +0.01%
VClause: 736996 -> 736988 (-0.00%); split: -0.00%, +0.00%
SClause: 1118494 -> 1118474 (-0.00%); split: -0.01%, +0.01%
VALU: 21982349 -> 21982358 (+0.00%); split: -0.00%, +0.00%

Foz-DB Navi31:
Totals from 50791 (63.99% of 79377) affected shaders:
Instrs: 37511862 -> 37495712 (-0.04%); split: -0.11%, +0.07%
CodeSize: 197990892 -> 197925104 (-0.03%); split: -0.09%, +0.06%
Latency: 261929261 -> 261273534 (-0.25%); split: -0.27%, +0.01%
InvThroughput: 43978329 -> 43921618 (-0.13%); split: -0.14%, +0.01%
VClause: 727683 -> 727695 (+0.00%); split: -0.00%, +0.00%
SClause: 1092527 -> 1092544 (+0.00%); split: -0.01%, +0.01%
VALU: 22646553 -> 22646566 (+0.00%)

Foz-DB Navi21:
Totals from 43899 (55.30% of 79377) affected shaders:
Instrs: 35649081 -> 35649110 (+0.00%); split: -0.00%, +0.00%
CodeSize: 192336212 -> 192337276 (+0.00%); split: -0.00%, +0.00%
Latency: 270621538 -> 270221431 (-0.15%); split: -0.16%, +0.02%
InvThroughput: 66757841 -> 66715918 (-0.06%); split: -0.07%, +0.01%
VClause: 734884 -> 734867 (-0.00%); split: -0.01%, +0.01%
SClause: 1072956 -> 1072951 (-0.00%); split: -0.01%, +0.01%

Foz-DB Vega10:
Totals from 52687 (83.60% of 63026) affected shaders:
Instrs: 24595280 -> 24595693 (+0.00%); split: -0.01%, +0.01%
CodeSize: 127199836 -> 127200164 (+0.00%); split: -0.01%, +0.01%
Latency: 252281578 -> 252497934 (+0.09%); split: -0.03%, +0.12%
InvThroughput: 136551527 -> 136577609 (+0.02%); split: -0.01%, +0.03%
VClause: 536798 -> 536718 (-0.01%); split: -0.04%, +0.03%
SClause: 819978 -> 819693 (-0.03%); split: -0.04%, +0.01%

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33222>
2025-01-28 17:00:45 +00:00
Georg Lehmann
819938d2fa aco/sched_ilp: new latency heuristic
The main train of thought is that we should consider latency after
the write was scheduled. This means we rely a lot less on the input
order of instructions for good results.

Foz-DB GFX1150:
Totals from 75606 (95.25% of 79377) affected shaders:
Instrs: 43274326 -> 42129011 (-2.65%); split: -2.65%, +0.01%
CodeSize: 223049932 -> 218465796 (-2.06%); split: -2.06%, +0.00%
Latency: 297614199 -> 292317054 (-1.78%); split: -1.84%, +0.06%
InvThroughput: 57020160 -> 56336213 (-1.20%); split: -1.21%, +0.02%
VClause: 841775 -> 841861 (+0.01%); split: -0.06%, +0.07%
SClause: 1253516 -> 1253798 (+0.02%); split: -0.03%, +0.05%
VALU: 23893837 -> 23893828 (-0.00%); split: -0.00%, +0.00%

Foz-DB Navi31:
Totals from 75606 (95.25% of 79377) affected shaders:
Instrs: 42717592 -> 41531696 (-2.78%); split: -2.78%, +0.00%
CodeSize: 223582476 -> 218866196 (-2.11%); split: -2.11%, +0.00%
Latency: 297736383 -> 292450493 (-1.78%); split: -1.83%, +0.05%
InvThroughput: 47298730 -> 46934084 (-0.77%); split: -0.78%, +0.01%
VClause: 844982 -> 844892 (-0.01%); split: -0.07%, +0.06%
SClause: 1248433 -> 1248693 (+0.02%); split: -0.03%, +0.05%
VALU: 24819703 -> 24819704 (+0.00%); split: -0.00%, +0.00%

Foz-DB Navi21:
Totals from 76224 (96.03% of 79377) affected shaders:
Instrs: 46019515 -> 46015691 (-0.01%); split: -0.03%, +0.03%
CodeSize: 246992544 -> 246977404 (-0.01%); split: -0.03%, +0.02%
Latency: 324647457 -> 318661132 (-1.84%); split: -1.90%, +0.05%
InvThroughput: 74834800 -> 74269723 (-0.76%); split: -0.76%, +0.01%
VClause: 927601 -> 927579 (-0.00%); split: -0.04%, +0.04%
SClause: 1302666 -> 1303178 (+0.04%); split: -0.02%, +0.06%

Foz-DB Vega10:
Totals from 60142 (95.42% of 63026) affected shaders:
Instrs: 25117688 -> 25098175 (-0.08%); split: -0.10%, +0.02%
CodeSize: 129847464 -> 129769456 (-0.06%); split: -0.08%, +0.02%
Latency: 261606546 -> 262407481 (+0.31%); split: -0.12%, +0.43%
InvThroughput: 138422594 -> 138500401 (+0.06%); split: -0.03%, +0.09%
VClause: 555424 -> 555321 (-0.02%); split: -0.11%, +0.09%
SClause: 851219 -> 851620 (+0.05%); split: -0.03%, +0.08%

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33222>
2025-01-28 17:00:44 +00:00
Georg Lehmann
df1de388a3 aco/sched_ilp: reorder VINTRP
VINTRP(gfx6-gfx10.3) is mostly just VALU, but we treated it like memory
instructions as an afterthought. This had issues as VINTRP was never reordered
with itself, or other memory instructions. Reordering VINTRP in clauses
increases ILP. We don't really need collect_clause_dependencies for VINTRP
either, because they ususally have the same dependencies already. That means
we can still form VINTRP clauses by selecting preferably VINTRP after a
previous one.

Foz-DB Navi21:
Totals from 34184 (43.16% of 79206) affected shaders:
Instrs: 18811270 -> 18812046 (+0.00%); split: -0.01%, +0.02%
CodeSize: 103627276 -> 103630056 (+0.00%); split: -0.01%, +0.01%
Latency: 188379364 -> 187936731 (-0.23%); split: -0.27%, +0.03%
InvThroughput: 42600163 -> 42590608 (-0.02%); split: -0.03%, +0.00%
VClause: 378960 -> 378912 (-0.01%); split: -0.02%, +0.00%
SClause: 727560 -> 720573 (-0.96%); split: -1.08%, +0.12%

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Daniel Schürmann <None>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33111>
2025-01-27 11:59:45 +00:00
Rhys Perry
0eb5f66660 nir/validate: validate ssa dominance by default
This no longer modifies dominance metadata, so enable it by default.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32005>
2025-01-23 23:35:44 +00:00
Timur Kristóf
50035f0316 ac/nir: Move all ac_nir_* files to a new folder.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32966>
2025-01-14 13:46:30 +01:00
Timur Kristóf
305fdfddb5 ac/nir: Move ac_set_nir_options to ac_nir.c
And rename it to ac_nir_set_options to match other functions.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32966>
2025-01-14 13:45:34 +01:00
Rhys Perry
2b10930b48 aco: use VOP3 v_mov_b16 if necessary
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Backport-to: 24.3
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32944>
2025-01-10 15:05:00 +00:00
Rhys Perry
8ac4744706 aco/tests: fix skip_lines=True with remaining characters in matches
If the remaining character check fails, we should try a later line if
skip_lines=True. So the check has to be done earlier.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32902>
2025-01-08 15:28:37 +00:00
Marek Olšák
61bfb4fa06 amd: lower load_subgroup_invocation in NIR
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>
2025-01-02 17:36:55 +00:00
Georg Lehmann
3da2d96bc5 aco/optimizer: fix signed extract of sub dword temps with SDWA
If an instruction didn't already use SDWA convert_to_SDWA in apply_extract
will add ubyte0/uword0 selections for v1b/v2b operands. This loses information
that the instruction doesn't care about the high bits and makes the next
apply_extract_twice fail.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>

Fixes: 6cb9d39bc2 ("aco: combine extracts with sub-dword definitions")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32803>
2025-01-02 09:33:18 +00:00
Georg Lehmann
42512208d8 aco/insert_exec: exit shader using exec for top level discard
Totals from 14538 (18.31% of 79395) affected shaders:
no changes

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <None>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32731>
2024-12-30 13:05:23 +00:00
Georg Lehmann
c279e63a79 aco: rename p_early_exit_if to if_not
It exits the shaders if the condition is false, not true.

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <None>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32731>
2024-12-30 13:05:23 +00:00
Daniel Schürmann
b64fff7731 aco: remove definition from Pseudo branch instructions
They are not needed anymore.

Totals from 7019 (8.84% of 79395) affected shaders: (Navi31)

Instrs: 14805400 -> 14824196 (+0.13%); split: -0.00%, +0.13%
CodeSize: 78079972 -> 78132932 (+0.07%); split: -0.01%, +0.08%
SpillSGPRs: 4485 -> 4515 (+0.67%); split: -0.76%, +1.43%
Latency: 165862000 -> 165836134 (-0.02%); split: -0.02%, +0.00%
InvThroughput: 30061764 -> 30057781 (-0.01%); split: -0.01%, +0.00%
SClause: 392323 -> 392286 (-0.01%); split: -0.01%, +0.00%
Copies: 1012262 -> 1012234 (-0.00%); split: -0.04%, +0.04%
Branches: 365910 -> 365909 (-0.00%); split: -0.00%, +0.00%
PreSGPRs: 360167 -> 355363 (-1.33%)
VALU: 8837197 -> 8837276 (+0.00%); split: -0.00%, +0.00%
SALU: 1402593 -> 1402621 (+0.00%); split: -0.03%, +0.03%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32037>
2024-12-06 14:34:03 +00:00
Daniel Schürmann
7e4687fd04 aco: remove definition from SOPP branch instructions
Totals from 17942 (22.60% of 79395) affected shaders: (Navi31)

Instrs: 20334063 -> 20312676 (-0.11%); split: -0.11%, +0.00%
CodeSize: 108458732 -> 108377540 (-0.07%); split: -0.08%, +0.00%
Latency: 180510540 -> 180479666 (-0.02%); split: -0.02%, +0.00%
InvThroughput: 28079325 -> 28077938 (-0.00%); split: -0.01%, +0.00%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32037>
2024-12-06 14:34:03 +00:00
Daniel Schürmann
cab5639a09 aco/assembler: chain branches instead of emitting long jumps
As regular branch instructions cannot jump further than
32768 dwords, previously we used long jumps as fallback
solution. The disadvantage of that is that an extra SGPR
pair must be provided in order to temporarily store the PC.

This patch changes that to chained branch instructions by
inserting an artificial extra block into the code to be
targeted by the original branch. This block contains a
single branch instruction jumping to the original target.
Before the block, if necessary, we insert a <branch 1>
instruction for the existing code in order to jump over
the newly inserted block.

Only a few RT shaders are affected.

Totals from 29 (0.04% of 79395) affected shaders: (Navi31)

CodeSize: 17281176 -> 17276332 (-0.03%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32037>
2024-12-06 14:34:03 +00:00
Rhys Perry
86c63b29bc aco/gfx12: insert wait between VMEM WaW
https://github.com/llvm/llvm-project/pull/105549

fossil-db (gfx1200):
Totals from 1783 (2.25% of 79395) affected shaders:
Instrs: 7398391 -> 7404566 (+0.08%); split: -0.00%, +0.08%
CodeSize: 38862456 -> 38886364 (+0.06%); split: -0.00%, +0.06%
Latency: 83191513 -> 84211504 (+1.23%); split: -0.00%, +1.23%
InvThroughput: 15185936 -> 15345744 (+1.05%); split: -0.01%, +1.06%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32373>
2024-12-02 10:13:39 +00:00
Rhys Perry
63b0692eac aco: don't use uniform continues if exec might be empty
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31143>
2024-11-25 10:32:59 +00:00
Rhys Perry
aa0ede751d aco/tests: add tests for empty exec masks
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31143>
2024-11-25 10:32:59 +00:00
Rhys Perry
f35e229fae aco: skip code if exec is empty
This is safer and potentially faster.

fossil-db (navi21):
Totals from 690 (0.87% of 79395) affected shaders:
Instrs: 4534778 -> 4535916 (+0.03%)
CodeSize: 25268516 -> 25272080 (+0.01%); split: -0.00%, +0.01%
Latency: 48482721 -> 48513907 (+0.06%); split: -0.00%, +0.07%
InvThroughput: 13213965 -> 13217828 (+0.03%); split: -0.00%, +0.03%
Copies: 432307 -> 432295 (-0.00%); split: -0.05%, +0.04%
Branches: 187305 -> 188249 (+0.50%)
VALU: 2904490 -> 2904508 (+0.00%); split: -0.00%, +0.00%
SALU: 674962 -> 675133 (+0.03%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31143>
2024-11-25 10:32:59 +00:00