Commit graph

408 commits

Author SHA1 Message Date
Georg Lehmann
859505d95a aco/optimizer: use new helpers to apply literals
Foz-DB Navi21:
Totals from 21009 (26.33% of 79789) affected shaders:
MaxWaves: 495342 -> 495414 (+0.01%)
Instrs: 22345587 -> 22335371 (-0.05%); split: -0.05%, +0.00%
CodeSize: 122095820 -> 121795112 (-0.25%); split: -0.25%, +0.00%
VGPRs: 1025800 -> 1025480 (-0.03%)
Latency: 202876235 -> 203076272 (+0.10%); split: -0.04%, +0.14%
InvThroughput: 47599930 -> 47596113 (-0.01%); split: -0.03%, +0.02%
VClause: 475271 -> 475439 (+0.04%); split: -0.02%, +0.05%
SClause: 700679 -> 700629 (-0.01%); split: -0.01%, +0.01%
Copies: 1628498 -> 1618165 (-0.63%); split: -0.64%, +0.01%
Branches: 567199 -> 567216 (+0.00%); split: -0.00%, +0.00%
PreSGPRs: 952134 -> 952043 (-0.01%); split: -0.01%, +0.00%
PreVGPRs: 846614 -> 846272 (-0.04%)
VALU: 15572374 -> 15564050 (-0.05%); split: -0.05%, +0.00%
SALU: 2423329 -> 2421319 (-0.08%); split: -0.08%, +0.00%

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>

Foz-DB Navi31:
Totals from 13049 (16.44% of 79395) affected shaders:

MaxWaves: 357242 -> 357268 (+0.01%)
Instrs: 19955572 -> 19944106 (-0.06%); split: -0.06%, +0.00%
CodeSize: 105689464 -> 105454348 (-0.22%); split: -0.23%, +0.00%
VGPRs: 765744 -> 764952 (-0.10%); split: -0.11%, +0.00%
Latency: 179063640 -> 179141591 (+0.04%); split: -0.02%, +0.07%
InvThroughput: 27978134 -> 27971318 (-0.02%); split: -0.03%, +0.01%
VClause: 386791 -> 386826 (+0.01%); split: -0.02%, +0.03%
SClause: 598113 -> 598106 (-0.00%); split: -0.01%, +0.01%
Copies: 1393111 -> 1383102 (-0.72%); split: -0.73%, +0.01%
Branches: 498533 -> 498535 (+0.00%); split: -0.00%, +0.00%
PreSGPRs: 573310 -> 573236 (-0.01%); split: -0.01%, +0.00%
PreVGPRs: 591459 -> 591043 (-0.07%)
VALU: 11623734 -> 11615755 (-0.07%); split: -0.07%, +0.00%
SALU: 1962055 -> 1960005 (-0.10%); split: -0.11%, +0.00%
VOPD: 3544 -> 3566 (+0.62%); split: +0.73%, -0.11%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35272>
2025-10-14 08:33:39 +00:00
Georg Lehmann
0d8219f367 aco/tests: allow even more literals
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35272>
2025-10-14 08:33:37 +00:00
Rhys Perry
dfa8ac6b91 aco: remove buffer_load_lds instructions
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
They don't exist

See https://github.com/llvm/llvm-project/pull/132916

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14041
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37716>
2025-10-07 09:50:26 +00:00
Georg Lehmann
26e041e821 aco: remove existing dealloc_vgprs use
We didn't consider that s_sendmsg dealloc_vgpr waits for all counters
expect vscnt.

Foz-DB Navi31:
Totals from 74090 (92.52% of 80084) affected shaders:
Instrs: 36031071 -> 35853573 (-0.49%)
CodeSize: 189233756 -> 188523764 (-0.38%)
Latency: 222378318 -> 222374890 (-0.00%)
InvThroughput: 33366893 -> 33362457 (-0.01%)

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37508>
2025-09-26 07:51:02 +00:00
Rhys Perry
e2181744c2 aco/tests: add barrier-to-waitcnt tests
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36491>
2025-09-09 12:34:40 +00:00
Rhys Perry
20cd5cf5f7 aco: delay barrier waitcnt until they are needed
fossil-db (navi21):
Totals from 44 (0.06% of 79825) affected shaders:
Instrs: 16001 -> 15932 (-0.43%); split: -0.46%, +0.02%
CodeSize: 85800 -> 85548 (-0.29%); split: -0.30%, +0.01%
Latency: 190124 -> 173458 (-8.77%)
InvThroughput: 23605 -> 22756 (-3.60%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36491>
2025-09-09 12:34:40 +00:00
Georg Lehmann
8903bb4618 aco/optimizer: don't apply packed clamp to v_fma_mix
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13758
Fixes: 345bf8a2f2 ("aco/optimizer: remove label_vop3p")
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36963>
2025-08-25 16:47:38 +00:00
Konstantin Seurer
951b187b95 nir: Use nir_def_block in more places
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36746>
2025-08-24 14:03:10 +00:00
Georg Lehmann
883b1ca364 aco: disable wqm for tex loads when not needed
By only executing VMEM loads for lanes where the result is used, we can save
bandwidth.

The NIR pass only handles tex for now, but those are most common anyway.
We can extend it handle image/ssbo/ubo/global loads in the future.

Foz-DB GFX1201:
Totals from 32633 (40.66% of 80251) affected shaders:
Instrs: 22635910 -> 23193509 (+2.46%); split: -0.00%, +2.46%
CodeSize: 122880044 -> 125093428 (+1.80%); split: -0.00%, +1.81%
VGPRs: 1481868 -> 1481712 (-0.01%)
SpillSGPRs: 3877 -> 4301 (+10.94%); split: -0.52%, +11.45%
Latency: 171480552 -> 171685219 (+0.12%); split: -0.18%, +0.30%
InvThroughput: 24364743 -> 24373441 (+0.04%); split: -0.08%, +0.12%
VClause: 388318 -> 388557 (+0.06%); split: -0.06%, +0.13%
SClause: 774781 -> 776492 (+0.22%); split: -0.29%, +0.51%
Copies: 1416586 -> 1541199 (+8.80%); split: -0.16%, +8.96%
Branches: 419591 -> 419673 (+0.02%); split: -0.02%, +0.04%
PreSGPRs: 1330303 -> 1416540 (+6.48%)
PreVGPRs: 964864 -> 964863 (-0.00%)
VALU: 12919601 -> 12920254 (+0.01%); split: -0.01%, +0.01%
SALU: 2685402 -> 3224147 (+20.06%); split: -0.00%, +20.07%

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35970>
2025-08-15 07:03:46 +00:00
Georg Lehmann
a4c537c5b3 aco: use new disable_wqm for mubuf/mtbuf
Foz-DB GFX1201:
Totals from 66 (0.08% of 80251) affected shaders:
Instrs: 45373 -> 45663 (+0.64%); split: -0.01%, +0.65%
CodeSize: 251708 -> 252900 (+0.47%); split: -0.00%, +0.48%
Latency: 278977 -> 278652 (-0.12%); split: -0.14%, +0.02%
InvThroughput: 38259 -> 38245 (-0.04%); split: -0.05%, +0.02%
VClause: 982 -> 962 (-2.04%)
Copies: 2882 -> 2808 (-2.57%)
PreSGPRs: 2564 -> 2599 (+1.37%)
SALU: 4748 -> 5010 (+5.52%)

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35970>
2025-08-15 07:03:46 +00:00
Qiang Yu
196569b1a4 all: rename gl_shader_stage to mesa_shader_stage
It's not only for GL, change to a generic name.

Use command:
  find . -type f -not -path '*/.git/*' -exec sed -i 's/\bgl_shader_stage\b/mesa_shader_stage/g' {} +

Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Yonggang Luo <luoyonggang@gmail.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36569>
2025-08-06 10:28:40 +08:00
Daniel Schürmann
caa2c22d8b aco/tests: Fix p_startpgm definitions to registers
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36345>
2025-08-01 17:15:54 +00:00
Georg Lehmann
404e1f13e8 aco/print_asm: use real true16 instr on gfx11+
Fake16 doesn't print opsel on v_cndmask_b16, so it looks really broken.
Restrict to LLVM20+ because older versions have incomplete tru16 support.

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35919>
2025-07-31 12:07:07 +00:00
Natalie Vock
c515f1fd58 aco: Use vector-aligned operands for image_bvh8_intersect_ray
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35269>
2025-07-15 21:34:38 +00:00
Rhys Perry
0094e6c32a aco: optimize lds-only or vmem-only flat access
fossil-db (polaris10):
Totals from 138 (0.22% of 62070) affected shaders:
Instrs: 233452 -> 234436 (+0.42%)
CodeSize: 1209392 -> 1213220 (+0.32%)
Latency: 3934496 -> 3928089 (-0.16%); split: -0.17%, +0.00%
InvThroughput: 3040782 -> 3038562 (-0.07%); split: -0.07%, +0.00%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35465>
2025-07-11 12:15:08 +00:00
Rhys Perry
d705b6198c aco: simplify waitcnt insertion for flat access
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35465>
2025-07-11 12:15:08 +00:00
Rhys Perry
34f1a8f707 aco: handle FPAtomicToDenormModeHazard
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
This is quite unlikely to happen, but I guess it might be possible and
it's relatively simple to work around.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35884>
2025-07-07 13:02:43 +00:00
Rhys Perry
2cfd2d3b1d aco/tests: add lower_branches tests
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35202>
2025-06-19 10:58:39 +00:00
Rhys Perry
86ccceb4de aco: don't consider gfx1153 to have point sample acceleration
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34978>
2025-06-06 11:55:13 +01:00
Rhys Perry
f10b49781d aco: make all wait entries linear
If we remove exec skips, then we can wait for an entry on all paths in the
linear cfg, but not the logical cfg.

fossil-db (gfx1201):
Totals from 0 (0.00% of 79653) affected shaders:

fossil-db (navi31):
Totals from 0 (0.00% of 79653) affected shaders:

fossil-db (navi21):
Totals from 1586 (1.99% of 79653) affected shaders:
Instrs: 5118897 -> 5113206 (-0.11%); split: -0.11%, +0.00%
CodeSize: 28365852 -> 28343696 (-0.08%); split: -0.08%, +0.00%
Latency: 47820341 -> 47799532 (-0.04%); split: -0.09%, +0.05%
InvThroughput: 9904391 -> 9908653 (+0.04%); split: -0.02%, +0.06%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34978>
2025-06-06 11:55:13 +01:00
Rhys Perry
1088ac49db aco: sometimes join linear wait entries on logical edges
fossil-db (gfx1201):
Totals from 1303 (1.64% of 79653) affected shaders:
Instrs: 6920949 -> 6917692 (-0.05%); split: -0.06%, +0.01%
CodeSize: 37112404 -> 37095728 (-0.04%); split: -0.05%, +0.01%
Latency: 70471343 -> 70365986 (-0.15%); split: -0.15%, +0.00%
InvThroughput: 11515673 -> 11504666 (-0.10%); split: -0.10%, +0.01%

fossil-db (navi31):
Totals from 1293 (1.62% of 79653) affected shaders:
Instrs: 6500186 -> 6496761 (-0.05%); split: -0.06%, +0.01%
CodeSize: 34562712 -> 34549236 (-0.04%); split: -0.04%, +0.01%
Latency: 68604746 -> 68666532 (+0.09%); split: -0.15%, +0.24%
InvThroughput: 11276591 -> 11284914 (+0.07%); split: -0.10%, +0.17%

fossil-db (navi21):
Totals from 811 (1.02% of 79653) affected shaders:
Instrs: 4110953 -> 4108788 (-0.05%); split: -0.05%, +0.00%
CodeSize: 22955984 -> 22948064 (-0.03%); split: -0.03%, +0.00%
Latency: 35070231 -> 35064448 (-0.02%); split: -0.02%, +0.00%
InvThroughput: 6945610 -> 6945053 (-0.01%); split: -0.01%, +0.00%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34978>
2025-06-06 11:51:08 +01:00
Rhys Perry
c1f8537131 aco: skip waitcnt between two vmem writing different lanes
fossil-db (gfx1201):
Totals from 1382 (1.74% of 79653) affected shaders:
Instrs: 6531704 -> 6523935 (-0.12%); split: -0.12%, +0.00%
CodeSize: 34992076 -> 34933568 (-0.17%); split: -0.17%, +0.01%
Latency: 70183360 -> 69616066 (-0.81%); split: -0.81%, +0.00%
InvThroughput: 11155445 -> 11068667 (-0.78%); split: -0.78%, +0.00%

fossil-db (navi31):
Totals from 46 (0.06% of 79653) affected shaders:
Instrs: 1833768 -> 1833732 (-0.00%)
CodeSize: 9468788 -> 9468716 (-0.00%)
Latency: 11683092 -> 11667865 (-0.13%)
InvThroughput: 2274377 -> 2272872 (-0.07%)

fossil-db (navi21):
Totals from 0 (0.00% of 79653) affected shaders:

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34978>
2025-06-06 11:51:08 +01:00
Rhys Perry
9649deb50e aco: skip waitcnt between two vmem writing different halves
fossil-db (gfx1201):
Totals from 4 (0.01% of 79653) affected shaders:
Instrs: 41374 -> 41380 (+0.01%); split: -0.01%, +0.02%
CodeSize: 238912 -> 238924 (+0.01%); split: -0.01%, +0.01%
Latency: 706714 -> 706410 (-0.04%)
InvThroughput: 352269 -> 352118 (-0.04%)
VClause: 803 -> 798 (-0.62%)

fossil-db (navi31):
Totals from 0 (0.00% of 79653) affected shaders:

fossil-db (navi21):
Totals from 0 (0.00% of 79653) affected shaders:

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13028
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34978>
2025-06-06 11:51:08 +01:00
Rhys Perry
c50f9541e4 aco/tests: Add tests for vector-aligned operands
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34359>
2025-05-28 09:24:17 +00:00
Daniel Schürmann
6aabcb02a1 aco/print_ir: only print 'lateKill' if requested via print_kill flag
Also only print lateKill for actually killed operands.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34359>
2025-05-28 09:24:16 +00:00
Rhys Perry
072e6d1ab5 aco/tests: add tests for tied definitions
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Some of these would have failed before the rewrite.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34700>
2025-05-20 15:40:47 +00:00
Rhys Perry
171920ceed aco/gfx115: consider point sample acceleration
Like 15428e0d786939a5c7629a9978947c8a9112ce96 in LLVM.

fossil-db (gfx1150):
Totals from 909 (1.14% of 79653) affected shaders:
Instrs: 5840489 -> 5840705 (+0.00%); split: -0.00%, +0.00%
CodeSize: 31133460 -> 31134296 (+0.00%); split: -0.00%, +0.00%
Latency: 52982280 -> 53438577 (+0.86%); split: -0.00%, +0.86%
InvThroughput: 10841454 -> 10942682 (+0.93%); split: -0.00%, +0.93%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Backport-to: 25.0
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34935>
2025-05-14 11:22:13 +00:00
Daniel Schürmann
2b0536e921 aco: remove block_kind_continue_or_break workaround and tests
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33479>
2025-05-09 17:20:29 +00:00
Rhys Perry
20279c28c8 aco/tests: add pseudo-scalar transcendental and fallback path RA tests
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34343>
2025-04-29 15:15:11 +00:00
Rhys Perry
62e50de5d0 aco: use v_perm_b32 for byte swaps within a VGPR on gfx10
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34636>
2025-04-23 18:23:18 +00:00
Rhys Perry
a43783fd76 aco: use v_perm_b32 for do_pack_2x16 on gfx10+
fossil-db (gfx1201);
Totals from 93 (0.12% of 79377) affected shaders:
Instrs: 373212 -> 372761 (-0.12%)
CodeSize: 2062752 -> 2063704 (+0.05%); split: -0.00%, +0.05%
Latency: 4172059 -> 4171993 (-0.00%); split: -0.00%, +0.00%
InvThroughput: 1299144 -> 1299093 (-0.00%)
Copies: 51268 -> 50831 (-0.85%)
Branches: 10980 -> 10979 (-0.01%)
VALU: 220192 -> 219756 (-0.20%)
VOPD: 48 -> 47 (-2.08%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34636>
2025-04-23 18:23:18 +00:00
Rhys Perry
3d6fa6996c aco: init vm_vsrc/sa_sdst from depctr_wait
fossil-db (navi31):
Totals from 5805 (7.31% of 79377) affected shaders:
Instrs: 14229621 -> 14207115 (-0.16%); split: -0.16%, +0.00%
CodeSize: 75358724 -> 75268624 (-0.12%); split: -0.12%, +0.00%
Latency: 133637034 -> 133624262 (-0.01%); split: -0.01%, +0.00%
InvThroughput: 22067819 -> 22066213 (-0.01%); split: -0.01%, +0.00%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34529>
2025-04-17 17:28:22 +00:00
Rhys Perry
4fcf2eb1d7 aco/gfx12: VOPD src0/1 are src bank compatible if they are the same vgpr
fossil-db (gfx1201):
Totals from 66518 (83.80% of 79377) affected shaders:
Instrs: 36939667 -> 36656685 (-0.77%); split: -0.79%, +0.02%
CodeSize: 220575208 -> 220201764 (-0.17%); split: -0.21%, +0.04%
Latency: 258919732 -> 258137974 (-0.30%); split: -0.35%, +0.05%
InvThroughput: 49911351 -> 49643836 (-0.54%); split: -0.55%, +0.02%
VClause: 788661 -> 788430 (-0.03%); split: -0.04%, +0.01%
SClause: 1176416 -> 1176263 (-0.01%); split: -0.02%, +0.01%
VALU: 18014058 -> 17818119 (-1.09%); split: -1.10%, +0.01%
VOPD: 4926983 -> 5122922 (+3.98%); split: +4.01%, -0.04%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34246>
2025-04-17 14:00:29 +00:00
Rhys Perry
3446f2059d aco/gfx12: assume VOPD with two v_mov_b32 are src bank compatible
fossil-db (gfx1201):
Totals from 10576 (13.32% of 79377) affected shaders:
(no stats changed)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34246>
2025-04-17 14:00:29 +00:00
Rhys Perry
408fa33c09 aco/gfx12: don't use second VALU for VOPD's OPX if there is a WaR
fossil-db (gfx1201):
Totals from 38908 (49.02% of 79377) affected shaders:
Instrs: 30268107 -> 30268131 (+0.00%); split: -0.00%, +0.00%
CodeSize: 180843648 -> 180843640 (-0.00%); split: -0.00%, +0.00%
Latency: 224905962 -> 224906072 (+0.00%); split: -0.00%, +0.00%
InvThroughput: 44322988 -> 44323004 (+0.00%)
VALU: 15124145 -> 15124167 (+0.00%)
VOPD: 4018504 -> 4018482 (-0.00%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Backport-to: 25.0
Backport-to: 25.1
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34246>
2025-04-17 14:00:29 +00:00
Georg Lehmann
64cae5c48d aco: form mixed MTBUF/MUBUF clauses
This should be one clause (all of the instructions load from the same vertex buffer)

s_clause 0x2                                                ; bfa10002
tbuffer_load_format_xyzw v[8:11], v5, s[4:7], 0 format:[BUF_FMT_8_8_8_8_UNORM] idxen offset:36 ; e9c32024 80010805
tbuffer_load_format_xyzw v[12:15], v5, s[4:7], 0 format:[BUF_FMT_8_8_8_8_UNORM] idxen offset:16 ; e9c32010 80010c05
tbuffer_load_format_xyzw v[16:19], v5, s[4:7], 0 format:[BUF_FMT_8_8_8_8_UNORM] idxen offset:12 ; e9c3200c 80011005
s_clause 0x2                                                ; bfa10002
buffer_load_dwordx3 v[20:22], v5, s[4:7], 0 idxen           ; e03c2000 80011405
buffer_load_dwordx3 v[23:25], v5, s[4:7], 0 idxen offset:20 ; e03c2014 80011705
buffer_load_dwordx4 v[28:31], v5, s[4:7], 0 idxen offset:48 ; e0382030 80011c05
tbuffer_load_format_xy v[0:1], v5, s[4:7], 0 format:[BUF_FMT_8_8_UNORM] idxen offset:32 ; e8712020 80010005

Foz-DB Navi21:
Totals from 5624 (7.08% of 79395) affected shaders:
MaxWaves: 149894 -> 149898 (+0.00%)
Instrs: 3032697 -> 3034853 (+0.07%); split: -0.05%, +0.12%
CodeSize: 15907852 -> 15915752 (+0.05%); split: -0.05%, +0.10%
VGPRs: 216248 -> 216144 (-0.05%)
Latency: 10955137 -> 11008760 (+0.49%); split: -0.22%, +0.70%
InvThroughput: 2032857 -> 2033916 (+0.05%); split: -0.03%, +0.08%
VClause: 50120 -> 41778 (-16.64%); split: -16.66%, +0.02%
SClause: 62034 -> 62004 (-0.05%); split: -0.33%, +0.29%
Copies: 253836 -> 254505 (+0.26%); split: -0.17%, +0.43%
VALU: 1621606 -> 1622274 (+0.04%); split: -0.03%, +0.07%
SALU: 653251 -> 653252 (+0.00%)

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34379>
2025-04-08 09:22:04 +00:00
Samuel Pitoiset
f46830912e aco: do not apply OMOD/CLAMP for pseudo scalar trans instrs
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
This optimization seems broken because eg. v_s_log_f32 uses SGPRs
for both the source and destination but applying OMOD seems to require
VGPRs.

This fixes a GPU hang when launching Enshrouded on GFX1201.

No fossils db changes on GFX1201.

Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34027>
2025-03-13 11:22:10 +00:00
Samuel Pitoiset
dd2e9c11af aco/tests: use GFX1201 instead of GFX1200
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33970>
2025-03-11 06:50:49 +00:00
Georg Lehmann
7eb43c3b1c aco/optimizer: delete combine_and_subbrev
This is now done in NIR. No Foz-DB changes on Navi21.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33761>
2025-03-01 07:49:28 +00:00
Natalie Vock
1967b0f0c4 aco/tests: Add tests for precolored operands in different regs
The first test verifies that, if possible, we don't emit unnecessary
renames/copies for temporaries where it's possible for them to stay
in their current register (if an operand is precolored to the register
the temporary is currently residing in).

The second test verifies that we correctly choose a non-clobbered
operand even if there is one fixed to the temporary's current register.
To minimize copies, we'll want to have the live copy of
%tmp0 in v[2] there, because v[0-1] gets overwritten.

The third test verifies that we add a copy to another free register and
rename if all possible precolored operands are clobbered.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29576>
2025-02-28 16:00:48 +00:00
Daniel Schürmann
3c27a9f0e2 aco/tests: add more tests for chained branches
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33762>
2025-02-27 10:40:01 +00:00
Daniel Schürmann
1a8a643bbd aco/isel: track control flow divergence in loops more accurately
We introduce two new variables, cf_context::in_divergent_cf and
cf_context::parent_loop.has_divergent_break, in order to determine
whether there is any other invocations on a different CF path.

Totals from 1305 (1.64% of 79395) affected shaders: (Navi31)

Instrs: 659211 -> 657815 (-0.21%); split: -0.22%, +0.01%
CodeSize: 3483228 -> 3477960 (-0.15%); split: -0.16%, +0.01%
VGPRs: 68820 -> 48048 (-30.18%)
Latency: 14197750 -> 14170767 (-0.19%); split: -0.26%, +0.07%
InvThroughput: 1619103 -> 1619826 (+0.04%); split: -0.02%, +0.07%
VClause: 12384 -> 12350 (-0.27%)
SClause: 26693 -> 26844 (+0.57%); split: -0.01%, +0.57%
Copies: 44994 -> 43535 (-3.24%); split: -3.26%, +0.02%
PreSGPRs: 49007 -> 48907 (-0.20%)
PreVGPRs: 32171 -> 32121 (-0.16%)
VALU: 349984 -> 349857 (-0.04%); split: -0.04%, +0.00%
SALU: 84252 -> 83988 (-0.31%); split: -0.32%, +0.00%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33206>
2025-02-05 10:54:21 +00:00
Daniel Schürmann
61fa007e48 aco/isel: fix empty exec tracking for uniform branches
Totals from 5 (0.01% of 79395) affected shaders: (Navi31)

Instrs: 54730 -> 54715 (-0.03%)
CodeSize: 276928 -> 276852 (-0.03%)
Latency: 215212 -> 214874 (-0.16%)
InvThroughput: 40154 -> 40150 (-0.01%)
Copies: 6824 -> 6821 (-0.04%); split: -0.06%, +0.01%
Branches: 1625 -> 1615 (-0.62%)
SALU: 5682 -> 5678 (-0.07%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33206>
2025-02-05 10:54:21 +00:00
Daniel Schürmann
65f95ae74e aco/insert_NOPs: implement VALU -> VALU case for VALUReadSGPRHazard on GFX12
Totals from 36918 (46.50% of 79395) affected shaders: (GFX1200)

Instrs: 34997889 -> 35296429 (+0.85%); split: -0.00%, +0.85%
CodeSize: 186161112 -> 187334364 (+0.63%); split: -0.00%, +0.63%
Latency: 250265551 -> 250330784 (+0.03%); split: -0.00%, +0.03%
InvThroughput: 41185298 -> 41192503 (+0.02%); split: -0.00%, +0.02%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32682>
2025-01-30 03:13:16 +00:00
Georg Lehmann
b23ff87db4 aco/sched_ilp: base latency and issue cycles on aco_statistics
This matters for trans and scalar fpu instructions.

Foz-DB GFX1150:
Totals from 53894 (67.90% of 79377) affected shaders:
Instrs: 38528421 -> 38481337 (-0.12%); split: -0.16%, +0.04%
CodeSize: 200206016 -> 200023916 (-0.09%); split: -0.12%, +0.03%
Latency: 265011734 -> 264303762 (-0.27%); split: -0.28%, +0.02%
InvThroughput: 53804490 -> 53696097 (-0.20%); split: -0.21%, +0.01%
VClause: 736996 -> 736988 (-0.00%); split: -0.00%, +0.00%
SClause: 1118494 -> 1118474 (-0.00%); split: -0.01%, +0.01%
VALU: 21982349 -> 21982358 (+0.00%); split: -0.00%, +0.00%

Foz-DB Navi31:
Totals from 50791 (63.99% of 79377) affected shaders:
Instrs: 37511862 -> 37495712 (-0.04%); split: -0.11%, +0.07%
CodeSize: 197990892 -> 197925104 (-0.03%); split: -0.09%, +0.06%
Latency: 261929261 -> 261273534 (-0.25%); split: -0.27%, +0.01%
InvThroughput: 43978329 -> 43921618 (-0.13%); split: -0.14%, +0.01%
VClause: 727683 -> 727695 (+0.00%); split: -0.00%, +0.00%
SClause: 1092527 -> 1092544 (+0.00%); split: -0.01%, +0.01%
VALU: 22646553 -> 22646566 (+0.00%)

Foz-DB Navi21:
Totals from 43899 (55.30% of 79377) affected shaders:
Instrs: 35649081 -> 35649110 (+0.00%); split: -0.00%, +0.00%
CodeSize: 192336212 -> 192337276 (+0.00%); split: -0.00%, +0.00%
Latency: 270621538 -> 270221431 (-0.15%); split: -0.16%, +0.02%
InvThroughput: 66757841 -> 66715918 (-0.06%); split: -0.07%, +0.01%
VClause: 734884 -> 734867 (-0.00%); split: -0.01%, +0.01%
SClause: 1072956 -> 1072951 (-0.00%); split: -0.01%, +0.01%

Foz-DB Vega10:
Totals from 52687 (83.60% of 63026) affected shaders:
Instrs: 24595280 -> 24595693 (+0.00%); split: -0.01%, +0.01%
CodeSize: 127199836 -> 127200164 (+0.00%); split: -0.01%, +0.01%
Latency: 252281578 -> 252497934 (+0.09%); split: -0.03%, +0.12%
InvThroughput: 136551527 -> 136577609 (+0.02%); split: -0.01%, +0.03%
VClause: 536798 -> 536718 (-0.01%); split: -0.04%, +0.03%
SClause: 819978 -> 819693 (-0.03%); split: -0.04%, +0.01%

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33222>
2025-01-28 17:00:45 +00:00
Georg Lehmann
819938d2fa aco/sched_ilp: new latency heuristic
The main train of thought is that we should consider latency after
the write was scheduled. This means we rely a lot less on the input
order of instructions for good results.

Foz-DB GFX1150:
Totals from 75606 (95.25% of 79377) affected shaders:
Instrs: 43274326 -> 42129011 (-2.65%); split: -2.65%, +0.01%
CodeSize: 223049932 -> 218465796 (-2.06%); split: -2.06%, +0.00%
Latency: 297614199 -> 292317054 (-1.78%); split: -1.84%, +0.06%
InvThroughput: 57020160 -> 56336213 (-1.20%); split: -1.21%, +0.02%
VClause: 841775 -> 841861 (+0.01%); split: -0.06%, +0.07%
SClause: 1253516 -> 1253798 (+0.02%); split: -0.03%, +0.05%
VALU: 23893837 -> 23893828 (-0.00%); split: -0.00%, +0.00%

Foz-DB Navi31:
Totals from 75606 (95.25% of 79377) affected shaders:
Instrs: 42717592 -> 41531696 (-2.78%); split: -2.78%, +0.00%
CodeSize: 223582476 -> 218866196 (-2.11%); split: -2.11%, +0.00%
Latency: 297736383 -> 292450493 (-1.78%); split: -1.83%, +0.05%
InvThroughput: 47298730 -> 46934084 (-0.77%); split: -0.78%, +0.01%
VClause: 844982 -> 844892 (-0.01%); split: -0.07%, +0.06%
SClause: 1248433 -> 1248693 (+0.02%); split: -0.03%, +0.05%
VALU: 24819703 -> 24819704 (+0.00%); split: -0.00%, +0.00%

Foz-DB Navi21:
Totals from 76224 (96.03% of 79377) affected shaders:
Instrs: 46019515 -> 46015691 (-0.01%); split: -0.03%, +0.03%
CodeSize: 246992544 -> 246977404 (-0.01%); split: -0.03%, +0.02%
Latency: 324647457 -> 318661132 (-1.84%); split: -1.90%, +0.05%
InvThroughput: 74834800 -> 74269723 (-0.76%); split: -0.76%, +0.01%
VClause: 927601 -> 927579 (-0.00%); split: -0.04%, +0.04%
SClause: 1302666 -> 1303178 (+0.04%); split: -0.02%, +0.06%

Foz-DB Vega10:
Totals from 60142 (95.42% of 63026) affected shaders:
Instrs: 25117688 -> 25098175 (-0.08%); split: -0.10%, +0.02%
CodeSize: 129847464 -> 129769456 (-0.06%); split: -0.08%, +0.02%
Latency: 261606546 -> 262407481 (+0.31%); split: -0.12%, +0.43%
InvThroughput: 138422594 -> 138500401 (+0.06%); split: -0.03%, +0.09%
VClause: 555424 -> 555321 (-0.02%); split: -0.11%, +0.09%
SClause: 851219 -> 851620 (+0.05%); split: -0.03%, +0.08%

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33222>
2025-01-28 17:00:44 +00:00
Georg Lehmann
df1de388a3 aco/sched_ilp: reorder VINTRP
VINTRP(gfx6-gfx10.3) is mostly just VALU, but we treated it like memory
instructions as an afterthought. This had issues as VINTRP was never reordered
with itself, or other memory instructions. Reordering VINTRP in clauses
increases ILP. We don't really need collect_clause_dependencies for VINTRP
either, because they ususally have the same dependencies already. That means
we can still form VINTRP clauses by selecting preferably VINTRP after a
previous one.

Foz-DB Navi21:
Totals from 34184 (43.16% of 79206) affected shaders:
Instrs: 18811270 -> 18812046 (+0.00%); split: -0.01%, +0.02%
CodeSize: 103627276 -> 103630056 (+0.00%); split: -0.01%, +0.01%
Latency: 188379364 -> 187936731 (-0.23%); split: -0.27%, +0.03%
InvThroughput: 42600163 -> 42590608 (-0.02%); split: -0.03%, +0.00%
VClause: 378960 -> 378912 (-0.01%); split: -0.02%, +0.00%
SClause: 727560 -> 720573 (-0.96%); split: -1.08%, +0.12%

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Daniel Schürmann <None>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33111>
2025-01-27 11:59:45 +00:00
Rhys Perry
0eb5f66660 nir/validate: validate ssa dominance by default
This no longer modifies dominance metadata, so enable it by default.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32005>
2025-01-23 23:35:44 +00:00
Timur Kristóf
50035f0316 ac/nir: Move all ac_nir_* files to a new folder.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32966>
2025-01-14 13:46:30 +01:00
Timur Kristóf
305fdfddb5 ac/nir: Move ac_set_nir_options to ac_nir.c
And rename it to ac_nir_set_options to match other functions.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32966>
2025-01-14 13:45:34 +01:00