Georg Lehmann
859505d95a
aco/optimizer: use new helpers to apply literals
Foz-DB Navi21:
Totals from 21009 (26.33% of 79789) affected shaders:
MaxWaves: 495342 -> 495414 (+0.01%)
Instrs: 22345587 -> 22335371 (-0.05%); split: -0.05%, +0.00%
CodeSize: 122095820 -> 121795112 (-0.25%); split: -0.25%, +0.00%
VGPRs: 1025800 -> 1025480 (-0.03%)
Latency: 202876235 -> 203076272 (+0.10%); split: -0.04%, +0.14%
InvThroughput: 47599930 -> 47596113 (-0.01%); split: -0.03%, +0.02%
VClause: 475271 -> 475439 (+0.04%); split: -0.02%, +0.05%
SClause: 700679 -> 700629 (-0.01%); split: -0.01%, +0.01%
Copies: 1628498 -> 1618165 (-0.63%); split: -0.64%, +0.01%
Branches: 567199 -> 567216 (+0.00%); split: -0.00%, +0.00%
PreSGPRs: 952134 -> 952043 (-0.01%); split: -0.01%, +0.00%
PreVGPRs: 846614 -> 846272 (-0.04%)
VALU: 15572374 -> 15564050 (-0.05%); split: -0.05%, +0.00%
SALU: 2423329 -> 2421319 (-0.08%); split: -0.08%, +0.00%
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Foz-DB Navi31:
Totals from 13049 (16.44% of 79395) affected shaders:
MaxWaves: 357242 -> 357268 (+0.01%)
Instrs: 19955572 -> 19944106 (-0.06%); split: -0.06%, +0.00%
CodeSize: 105689464 -> 105454348 (-0.22%); split: -0.23%, +0.00%
VGPRs: 765744 -> 764952 (-0.10%); split: -0.11%, +0.00%
Latency: 179063640 -> 179141591 (+0.04%); split: -0.02%, +0.07%
InvThroughput: 27978134 -> 27971318 (-0.02%); split: -0.03%, +0.01%
VClause: 386791 -> 386826 (+0.01%); split: -0.02%, +0.03%
SClause: 598113 -> 598106 (-0.00%); split: -0.01%, +0.01%
Copies: 1393111 -> 1383102 (-0.72%); split: -0.73%, +0.01%
Branches: 498533 -> 498535 (+0.00%); split: -0.00%, +0.00%
PreSGPRs: 573310 -> 573236 (-0.01%); split: -0.01%, +0.00%
PreVGPRs: 591459 -> 591043 (-0.07%)
VALU: 11623734 -> 11615755 (-0.07%); split: -0.07%, +0.00%
SALU: 1962055 -> 1960005 (-0.10%); split: -0.11%, +0.00%
VOPD: 3544 -> 3566 (+0.62%); split: +0.73%, -0.11%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35272>
2025-10-14 08:33:39 +00:00
Georg Lehmann
0d8219f367
aco/tests: allow even more literals
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35272>
2025-10-14 08:33:37 +00:00
Rhys Perry
dfa8ac6b91
aco: remove buffer_load_lds instructions
They don't exist
See https://github.com/llvm/llvm-project/pull/132916
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14041
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37716>
2025-10-07 09:50:26 +00:00
Georg Lehmann
26e041e821
aco: remove existing dealloc_vgprs use
We didn't consider that s_sendmsg dealloc_vgpr waits for all counters
except vscnt.
Foz-DB Navi31:
Totals from 74090 (92.52% of 80084) affected shaders:
Instrs: 36031071 -> 35853573 (-0.49%)
CodeSize: 189233756 -> 188523764 (-0.38%)
Latency: 222378318 -> 222374890 (-0.00%)
InvThroughput: 33366893 -> 33362457 (-0.01%)
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37508>
2025-09-26 07:51:02 +00:00
Rhys Perry
e2181744c2
aco/tests: add barrier-to-waitcnt tests
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36491>
2025-09-09 12:34:40 +00:00
Rhys Perry
20cd5cf5f7
aco: delay barrier waitcnt until they are needed
fossil-db (navi21):
Totals from 44 (0.06% of 79825) affected shaders:
Instrs: 16001 -> 15932 (-0.43%); split: -0.46%, +0.02%
CodeSize: 85800 -> 85548 (-0.29%); split: -0.30%, +0.01%
Latency: 190124 -> 173458 (-8.77%)
InvThroughput: 23605 -> 22756 (-3.60%)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36491>
2025-09-09 12:34:40 +00:00
Georg Lehmann
8903bb4618
aco/optimizer: don't apply packed clamp to v_fma_mix
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13758
Fixes: 345bf8a2f2 ("aco/optimizer: remove label_vop3p")
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36963>
2025-08-25 16:47:38 +00:00
Konstantin Seurer
951b187b95
nir: Use nir_def_block in more places
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36746>
2025-08-24 14:03:10 +00:00
Georg Lehmann
883b1ca364
aco: disable wqm for tex loads when not needed
By only executing VMEM loads for lanes where the result is used, we can save
bandwidth.
The NIR pass only handles tex for now, but those are the most common anyway.
We can extend it to handle image/ssbo/ubo/global loads in the future.
Foz-DB GFX1201:
Totals from 32633 (40.66% of 80251) affected shaders:
Instrs: 22635910 -> 23193509 (+2.46%); split: -0.00%, +2.46%
CodeSize: 122880044 -> 125093428 (+1.80%); split: -0.00%, +1.81%
VGPRs: 1481868 -> 1481712 (-0.01%)
SpillSGPRs: 3877 -> 4301 (+10.94%); split: -0.52%, +11.45%
Latency: 171480552 -> 171685219 (+0.12%); split: -0.18%, +0.30%
InvThroughput: 24364743 -> 24373441 (+0.04%); split: -0.08%, +0.12%
VClause: 388318 -> 388557 (+0.06%); split: -0.06%, +0.13%
SClause: 774781 -> 776492 (+0.22%); split: -0.29%, +0.51%
Copies: 1416586 -> 1541199 (+8.80%); split: -0.16%, +8.96%
Branches: 419591 -> 419673 (+0.02%); split: -0.02%, +0.04%
PreSGPRs: 1330303 -> 1416540 (+6.48%)
PreVGPRs: 964864 -> 964863 (-0.00%)
VALU: 12919601 -> 12920254 (+0.01%); split: -0.01%, +0.01%
SALU: 2685402 -> 3224147 (+20.06%); split: -0.00%, +20.07%
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35970>
2025-08-15 07:03:46 +00:00
Georg Lehmann
a4c537c5b3
aco: use new disable_wqm for mubuf/mtbuf
Foz-DB GFX1201:
Totals from 66 (0.08% of 80251) affected shaders:
Instrs: 45373 -> 45663 (+0.64%); split: -0.01%, +0.65%
CodeSize: 251708 -> 252900 (+0.47%); split: -0.00%, +0.48%
Latency: 278977 -> 278652 (-0.12%); split: -0.14%, +0.02%
InvThroughput: 38259 -> 38245 (-0.04%); split: -0.05%, +0.02%
VClause: 982 -> 962 (-2.04%)
Copies: 2882 -> 2808 (-2.57%)
PreSGPRs: 2564 -> 2599 (+1.37%)
SALU: 4748 -> 5010 (+5.52%)
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35970>
2025-08-15 07:03:46 +00:00
Qiang Yu
196569b1a4
all: rename gl_shader_stage to mesa_shader_stage
It's not only for GL, so change it to a generic name.
Use command:
find . -type f -not -path '*/.git/*' -exec sed -i 's/\bgl_shader_stage\b/mesa_shader_stage/g' {} +
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Yonggang Luo <luoyonggang@gmail.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36569>
2025-08-06 10:28:40 +08:00
Daniel Schürmann
caa2c22d8b
aco/tests: Fix p_startpgm definitions to registers
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36345>
2025-08-01 17:15:54 +00:00
Georg Lehmann
404e1f13e8
aco/print_asm: use real true16 instr on gfx11+
Fake16 doesn't print opsel on v_cndmask_b16, so it looks really broken.
Restrict to LLVM 20+ because older versions have incomplete true16 support.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35919>
2025-07-31 12:07:07 +00:00
Natalie Vock
c515f1fd58
aco: Use vector-aligned operands for image_bvh8_intersect_ray
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35269>
2025-07-15 21:34:38 +00:00
Rhys Perry
0094e6c32a
aco: optimize lds-only or vmem-only flat access
fossil-db (polaris10):
Totals from 138 (0.22% of 62070) affected shaders:
Instrs: 233452 -> 234436 (+0.42%)
CodeSize: 1209392 -> 1213220 (+0.32%)
Latency: 3934496 -> 3928089 (-0.16%); split: -0.17%, +0.00%
InvThroughput: 3040782 -> 3038562 (-0.07%); split: -0.07%, +0.00%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35465>
2025-07-11 12:15:08 +00:00
Rhys Perry
d705b6198c
aco: simplify waitcnt insertion for flat access
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35465>
2025-07-11 12:15:08 +00:00
Rhys Perry
34f1a8f707
aco: handle FPAtomicToDenormModeHazard
This is quite unlikely to happen, but I guess it might be possible and
it's relatively simple to work around.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35884>
2025-07-07 13:02:43 +00:00
Rhys Perry
2cfd2d3b1d
aco/tests: add lower_branches tests
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35202>
2025-06-19 10:58:39 +00:00
Rhys Perry
86ccceb4de
aco: don't consider gfx1153 to have point sample acceleration
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34978>
2025-06-06 11:55:13 +01:00
Rhys Perry
f10b49781d
aco: make all wait entries linear
If we remove exec skips, then we can wait for an entry on all paths in the
linear cfg, but not the logical cfg.
fossil-db (gfx1201):
Totals from 0 (0.00% of 79653) affected shaders:
fossil-db (navi31):
Totals from 0 (0.00% of 79653) affected shaders:
fossil-db (navi21):
Totals from 1586 (1.99% of 79653) affected shaders:
Instrs: 5118897 -> 5113206 (-0.11%); split: -0.11%, +0.00%
CodeSize: 28365852 -> 28343696 (-0.08%); split: -0.08%, +0.00%
Latency: 47820341 -> 47799532 (-0.04%); split: -0.09%, +0.05%
InvThroughput: 9904391 -> 9908653 (+0.04%); split: -0.02%, +0.06%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34978>
2025-06-06 11:55:13 +01:00
Rhys Perry
1088ac49db
aco: sometimes join linear wait entries on logical edges
fossil-db (gfx1201):
Totals from 1303 (1.64% of 79653) affected shaders:
Instrs: 6920949 -> 6917692 (-0.05%); split: -0.06%, +0.01%
CodeSize: 37112404 -> 37095728 (-0.04%); split: -0.05%, +0.01%
Latency: 70471343 -> 70365986 (-0.15%); split: -0.15%, +0.00%
InvThroughput: 11515673 -> 11504666 (-0.10%); split: -0.10%, +0.01%
fossil-db (navi31):
Totals from 1293 (1.62% of 79653) affected shaders:
Instrs: 6500186 -> 6496761 (-0.05%); split: -0.06%, +0.01%
CodeSize: 34562712 -> 34549236 (-0.04%); split: -0.04%, +0.01%
Latency: 68604746 -> 68666532 (+0.09%); split: -0.15%, +0.24%
InvThroughput: 11276591 -> 11284914 (+0.07%); split: -0.10%, +0.17%
fossil-db (navi21):
Totals from 811 (1.02% of 79653) affected shaders:
Instrs: 4110953 -> 4108788 (-0.05%); split: -0.05%, +0.00%
CodeSize: 22955984 -> 22948064 (-0.03%); split: -0.03%, +0.00%
Latency: 35070231 -> 35064448 (-0.02%); split: -0.02%, +0.00%
InvThroughput: 6945610 -> 6945053 (-0.01%); split: -0.01%, +0.00%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34978>
2025-06-06 11:51:08 +01:00
Rhys Perry
c1f8537131
aco: skip waitcnt between two vmem writing different lanes
fossil-db (gfx1201):
Totals from 1382 (1.74% of 79653) affected shaders:
Instrs: 6531704 -> 6523935 (-0.12%); split: -0.12%, +0.00%
CodeSize: 34992076 -> 34933568 (-0.17%); split: -0.17%, +0.01%
Latency: 70183360 -> 69616066 (-0.81%); split: -0.81%, +0.00%
InvThroughput: 11155445 -> 11068667 (-0.78%); split: -0.78%, +0.00%
fossil-db (navi31):
Totals from 46 (0.06% of 79653) affected shaders:
Instrs: 1833768 -> 1833732 (-0.00%)
CodeSize: 9468788 -> 9468716 (-0.00%)
Latency: 11683092 -> 11667865 (-0.13%)
InvThroughput: 2274377 -> 2272872 (-0.07%)
fossil-db (navi21):
Totals from 0 (0.00% of 79653) affected shaders:
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34978>
2025-06-06 11:51:08 +01:00
Rhys Perry
9649deb50e
aco: skip waitcnt between two vmem writing different halves
fossil-db (gfx1201):
Totals from 4 (0.01% of 79653) affected shaders:
Instrs: 41374 -> 41380 (+0.01%); split: -0.01%, +0.02%
CodeSize: 238912 -> 238924 (+0.01%); split: -0.01%, +0.01%
Latency: 706714 -> 706410 (-0.04%)
InvThroughput: 352269 -> 352118 (-0.04%)
VClause: 803 -> 798 (-0.62%)
fossil-db (navi31):
Totals from 0 (0.00% of 79653) affected shaders:
fossil-db (navi21):
Totals from 0 (0.00% of 79653) affected shaders:
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13028
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34978>
2025-06-06 11:51:08 +01:00
Rhys Perry
c50f9541e4
aco/tests: Add tests for vector-aligned operands
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34359>
2025-05-28 09:24:17 +00:00
Daniel Schürmann
6aabcb02a1
aco/print_ir: only print 'lateKill' if requested via print_kill flag
Also only print lateKill for actually killed operands.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34359>
2025-05-28 09:24:16 +00:00
Rhys Perry
072e6d1ab5
aco/tests: add tests for tied definitions
Some of these would have failed before the rewrite.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34700>
2025-05-20 15:40:47 +00:00
Rhys Perry
171920ceed
aco/gfx115: consider point sample acceleration
Like 15428e0d786939a5c7629a9978947c8a9112ce96 in LLVM.
fossil-db (gfx1150):
Totals from 909 (1.14% of 79653) affected shaders:
Instrs: 5840489 -> 5840705 (+0.00%); split: -0.00%, +0.00%
CodeSize: 31133460 -> 31134296 (+0.00%); split: -0.00%, +0.00%
Latency: 52982280 -> 53438577 (+0.86%); split: -0.00%, +0.86%
InvThroughput: 10841454 -> 10942682 (+0.93%); split: -0.00%, +0.93%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Backport-to: 25.0
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34935>
2025-05-14 11:22:13 +00:00
Daniel Schürmann
2b0536e921
aco: remove block_kind_continue_or_break workaround and tests
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33479>
2025-05-09 17:20:29 +00:00
Rhys Perry
20279c28c8
aco/tests: add pseudo-scalar transcendental and fallback path RA tests
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34343>
2025-04-29 15:15:11 +00:00
Rhys Perry
62e50de5d0
aco: use v_perm_b32 for byte swaps within a VGPR on gfx10
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34636>
2025-04-23 18:23:18 +00:00
Rhys Perry
a43783fd76
aco: use v_perm_b32 for do_pack_2x16 on gfx10+
fossil-db (gfx1201):
Totals from 93 (0.12% of 79377) affected shaders:
Instrs: 373212 -> 372761 (-0.12%)
CodeSize: 2062752 -> 2063704 (+0.05%); split: -0.00%, +0.05%
Latency: 4172059 -> 4171993 (-0.00%); split: -0.00%, +0.00%
InvThroughput: 1299144 -> 1299093 (-0.00%)
Copies: 51268 -> 50831 (-0.85%)
Branches: 10980 -> 10979 (-0.01%)
VALU: 220192 -> 219756 (-0.20%)
VOPD: 48 -> 47 (-2.08%)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34636>
2025-04-23 18:23:18 +00:00
Rhys Perry
3d6fa6996c
aco: init vm_vsrc/sa_sdst from depctr_wait
fossil-db (navi31):
Totals from 5805 (7.31% of 79377) affected shaders:
Instrs: 14229621 -> 14207115 (-0.16%); split: -0.16%, +0.00%
CodeSize: 75358724 -> 75268624 (-0.12%); split: -0.12%, +0.00%
Latency: 133637034 -> 133624262 (-0.01%); split: -0.01%, +0.00%
InvThroughput: 22067819 -> 22066213 (-0.01%); split: -0.01%, +0.00%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34529>
2025-04-17 17:28:22 +00:00
Rhys Perry
4fcf2eb1d7
aco/gfx12: VOPD src0/1 are src bank compatible if they are the same vgpr
fossil-db (gfx1201):
Totals from 66518 (83.80% of 79377) affected shaders:
Instrs: 36939667 -> 36656685 (-0.77%); split: -0.79%, +0.02%
CodeSize: 220575208 -> 220201764 (-0.17%); split: -0.21%, +0.04%
Latency: 258919732 -> 258137974 (-0.30%); split: -0.35%, +0.05%
InvThroughput: 49911351 -> 49643836 (-0.54%); split: -0.55%, +0.02%
VClause: 788661 -> 788430 (-0.03%); split: -0.04%, +0.01%
SClause: 1176416 -> 1176263 (-0.01%); split: -0.02%, +0.01%
VALU: 18014058 -> 17818119 (-1.09%); split: -1.10%, +0.01%
VOPD: 4926983 -> 5122922 (+3.98%); split: +4.01%, -0.04%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34246>
2025-04-17 14:00:29 +00:00
Rhys Perry
3446f2059d
aco/gfx12: assume VOPD with two v_mov_b32 are src bank compatible
fossil-db (gfx1201):
Totals from 10576 (13.32% of 79377) affected shaders:
(no stats changed)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34246>
2025-04-17 14:00:29 +00:00
Rhys Perry
408fa33c09
aco/gfx12: don't use second VALU for VOPD's OPX if there is a WaR
fossil-db (gfx1201):
Totals from 38908 (49.02% of 79377) affected shaders:
Instrs: 30268107 -> 30268131 (+0.00%); split: -0.00%, +0.00%
CodeSize: 180843648 -> 180843640 (-0.00%); split: -0.00%, +0.00%
Latency: 224905962 -> 224906072 (+0.00%); split: -0.00%, +0.00%
InvThroughput: 44322988 -> 44323004 (+0.00%)
VALU: 15124145 -> 15124167 (+0.00%)
VOPD: 4018504 -> 4018482 (-0.00%)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Backport-to: 25.0
Backport-to: 25.1
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34246>
2025-04-17 14:00:29 +00:00
Georg Lehmann
64cae5c48d
aco: form mixed MTBUF/MUBUF clauses
This should be one clause (all of the instructions load from the same vertex buffer):
s_clause 0x2 ; bfa10002
tbuffer_load_format_xyzw v[8:11], v5, s[4:7], 0 format:[BUF_FMT_8_8_8_8_UNORM] idxen offset:36 ; e9c32024 80010805
tbuffer_load_format_xyzw v[12:15], v5, s[4:7], 0 format:[BUF_FMT_8_8_8_8_UNORM] idxen offset:16 ; e9c32010 80010c05
tbuffer_load_format_xyzw v[16:19], v5, s[4:7], 0 format:[BUF_FMT_8_8_8_8_UNORM] idxen offset:12 ; e9c3200c 80011005
s_clause 0x2 ; bfa10002
buffer_load_dwordx3 v[20:22], v5, s[4:7], 0 idxen ; e03c2000 80011405
buffer_load_dwordx3 v[23:25], v5, s[4:7], 0 idxen offset:20 ; e03c2014 80011705
buffer_load_dwordx4 v[28:31], v5, s[4:7], 0 idxen offset:48 ; e0382030 80011c05
tbuffer_load_format_xy v[0:1], v5, s[4:7], 0 format:[BUF_FMT_8_8_UNORM] idxen offset:32 ; e8712020 80010005
Foz-DB Navi21:
Totals from 5624 (7.08% of 79395) affected shaders:
MaxWaves: 149894 -> 149898 (+0.00%)
Instrs: 3032697 -> 3034853 (+0.07%); split: -0.05%, +0.12%
CodeSize: 15907852 -> 15915752 (+0.05%); split: -0.05%, +0.10%
VGPRs: 216248 -> 216144 (-0.05%)
Latency: 10955137 -> 11008760 (+0.49%); split: -0.22%, +0.70%
InvThroughput: 2032857 -> 2033916 (+0.05%); split: -0.03%, +0.08%
VClause: 50120 -> 41778 (-16.64%); split: -16.66%, +0.02%
SClause: 62034 -> 62004 (-0.05%); split: -0.33%, +0.29%
Copies: 253836 -> 254505 (+0.26%); split: -0.17%, +0.43%
VALU: 1621606 -> 1622274 (+0.04%); split: -0.03%, +0.07%
SALU: 653251 -> 653252 (+0.00%)
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34379>
2025-04-08 09:22:04 +00:00
Samuel Pitoiset
f46830912e
aco: do not apply OMOD/CLAMP for pseudo scalar trans instrs
This optimization seems broken because e.g. v_s_log_f32 uses SGPRs
for both the source and destination, but applying OMOD seems to require
VGPRs.
This fixes a GPU hang when launching Enshrouded on GFX1201.
No fossil-db changes on GFX1201.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34027>
2025-03-13 11:22:10 +00:00
Samuel Pitoiset
dd2e9c11af
aco/tests: use GFX1201 instead of GFX1200
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33970>
2025-03-11 06:50:49 +00:00
Georg Lehmann
7eb43c3b1c
aco/optimizer: delete combine_and_subbrev
This is now done in NIR. No Foz-DB changes on Navi21.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33761>
2025-03-01 07:49:28 +00:00
Natalie Vock
1967b0f0c4
aco/tests: Add tests for precolored operands in different regs
The first test verifies that we don't emit unnecessary renames/copies
for temporaries that can stay in their current register (because an
operand is precolored to the register the temporary currently resides
in).
The second test verifies that we correctly choose a non-clobbered
operand even if there is one fixed to the temporary's current register.
To minimize copies, we want the live copy of %tmp0 to end up in v[2],
because v[0-1] is overwritten.
The third test verifies that we add a copy to another free register and
rename if all possible precolored operands are clobbered.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29576>
2025-02-28 16:00:48 +00:00
Daniel Schürmann
3c27a9f0e2
aco/tests: add more tests for chained branches
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33762>
2025-02-27 10:40:01 +00:00
Daniel Schürmann
1a8a643bbd
aco/isel: track control flow divergence in loops more accurately
We introduce two new variables, cf_context::in_divergent_cf and
cf_context::parent_loop.has_divergent_break, in order to determine
whether there are any other invocations on a different CF path.
Totals from 1305 (1.64% of 79395) affected shaders: (Navi31)
Instrs: 659211 -> 657815 (-0.21%); split: -0.22%, +0.01%
CodeSize: 3483228 -> 3477960 (-0.15%); split: -0.16%, +0.01%
VGPRs: 68820 -> 48048 (-30.18%)
Latency: 14197750 -> 14170767 (-0.19%); split: -0.26%, +0.07%
InvThroughput: 1619103 -> 1619826 (+0.04%); split: -0.02%, +0.07%
VClause: 12384 -> 12350 (-0.27%)
SClause: 26693 -> 26844 (+0.57%); split: -0.01%, +0.57%
Copies: 44994 -> 43535 (-3.24%); split: -3.26%, +0.02%
PreSGPRs: 49007 -> 48907 (-0.20%)
PreVGPRs: 32171 -> 32121 (-0.16%)
VALU: 349984 -> 349857 (-0.04%); split: -0.04%, +0.00%
SALU: 84252 -> 83988 (-0.31%); split: -0.32%, +0.00%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33206>
2025-02-05 10:54:21 +00:00
Daniel Schürmann
61fa007e48
aco/isel: fix empty exec tracking for uniform branches
Totals from 5 (0.01% of 79395) affected shaders: (Navi31)
Instrs: 54730 -> 54715 (-0.03%)
CodeSize: 276928 -> 276852 (-0.03%)
Latency: 215212 -> 214874 (-0.16%)
InvThroughput: 40154 -> 40150 (-0.01%)
Copies: 6824 -> 6821 (-0.04%); split: -0.06%, +0.01%
Branches: 1625 -> 1615 (-0.62%)
SALU: 5682 -> 5678 (-0.07%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33206>
2025-02-05 10:54:21 +00:00
Daniel Schürmann
65f95ae74e
aco/insert_NOPs: implement VALU -> VALU case for VALUReadSGPRHazard on GFX12
Totals from 36918 (46.50% of 79395) affected shaders: (GFX1200)
Instrs: 34997889 -> 35296429 (+0.85%); split: -0.00%, +0.85%
CodeSize: 186161112 -> 187334364 (+0.63%); split: -0.00%, +0.63%
Latency: 250265551 -> 250330784 (+0.03%); split: -0.00%, +0.03%
InvThroughput: 41185298 -> 41192503 (+0.02%); split: -0.00%, +0.02%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32682>
2025-01-30 03:13:16 +00:00
Georg Lehmann
b23ff87db4
aco/sched_ilp: base latency and issue cycles on aco_statistics
This matters for trans and scalar fpu instructions.
Foz-DB GFX1150:
Totals from 53894 (67.90% of 79377) affected shaders:
Instrs: 38528421 -> 38481337 (-0.12%); split: -0.16%, +0.04%
CodeSize: 200206016 -> 200023916 (-0.09%); split: -0.12%, +0.03%
Latency: 265011734 -> 264303762 (-0.27%); split: -0.28%, +0.02%
InvThroughput: 53804490 -> 53696097 (-0.20%); split: -0.21%, +0.01%
VClause: 736996 -> 736988 (-0.00%); split: -0.00%, +0.00%
SClause: 1118494 -> 1118474 (-0.00%); split: -0.01%, +0.01%
VALU: 21982349 -> 21982358 (+0.00%); split: -0.00%, +0.00%
Foz-DB Navi31:
Totals from 50791 (63.99% of 79377) affected shaders:
Instrs: 37511862 -> 37495712 (-0.04%); split: -0.11%, +0.07%
CodeSize: 197990892 -> 197925104 (-0.03%); split: -0.09%, +0.06%
Latency: 261929261 -> 261273534 (-0.25%); split: -0.27%, +0.01%
InvThroughput: 43978329 -> 43921618 (-0.13%); split: -0.14%, +0.01%
VClause: 727683 -> 727695 (+0.00%); split: -0.00%, +0.00%
SClause: 1092527 -> 1092544 (+0.00%); split: -0.01%, +0.01%
VALU: 22646553 -> 22646566 (+0.00%)
Foz-DB Navi21:
Totals from 43899 (55.30% of 79377) affected shaders:
Instrs: 35649081 -> 35649110 (+0.00%); split: -0.00%, +0.00%
CodeSize: 192336212 -> 192337276 (+0.00%); split: -0.00%, +0.00%
Latency: 270621538 -> 270221431 (-0.15%); split: -0.16%, +0.02%
InvThroughput: 66757841 -> 66715918 (-0.06%); split: -0.07%, +0.01%
VClause: 734884 -> 734867 (-0.00%); split: -0.01%, +0.01%
SClause: 1072956 -> 1072951 (-0.00%); split: -0.01%, +0.01%
Foz-DB Vega10:
Totals from 52687 (83.60% of 63026) affected shaders:
Instrs: 24595280 -> 24595693 (+0.00%); split: -0.01%, +0.01%
CodeSize: 127199836 -> 127200164 (+0.00%); split: -0.01%, +0.01%
Latency: 252281578 -> 252497934 (+0.09%); split: -0.03%, +0.12%
InvThroughput: 136551527 -> 136577609 (+0.02%); split: -0.01%, +0.03%
VClause: 536798 -> 536718 (-0.01%); split: -0.04%, +0.03%
SClause: 819978 -> 819693 (-0.03%); split: -0.04%, +0.01%
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33222>
2025-01-28 17:00:45 +00:00
Georg Lehmann
819938d2fa
aco/sched_ilp: new latency heuristic
The main train of thought is that we should consider latency after
the write was scheduled. This means we rely a lot less on the input
order of instructions for good results.
Foz-DB GFX1150:
Totals from 75606 (95.25% of 79377) affected shaders:
Instrs: 43274326 -> 42129011 (-2.65%); split: -2.65%, +0.01%
CodeSize: 223049932 -> 218465796 (-2.06%); split: -2.06%, +0.00%
Latency: 297614199 -> 292317054 (-1.78%); split: -1.84%, +0.06%
InvThroughput: 57020160 -> 56336213 (-1.20%); split: -1.21%, +0.02%
VClause: 841775 -> 841861 (+0.01%); split: -0.06%, +0.07%
SClause: 1253516 -> 1253798 (+0.02%); split: -0.03%, +0.05%
VALU: 23893837 -> 23893828 (-0.00%); split: -0.00%, +0.00%
Foz-DB Navi31:
Totals from 75606 (95.25% of 79377) affected shaders:
Instrs: 42717592 -> 41531696 (-2.78%); split: -2.78%, +0.00%
CodeSize: 223582476 -> 218866196 (-2.11%); split: -2.11%, +0.00%
Latency: 297736383 -> 292450493 (-1.78%); split: -1.83%, +0.05%
InvThroughput: 47298730 -> 46934084 (-0.77%); split: -0.78%, +0.01%
VClause: 844982 -> 844892 (-0.01%); split: -0.07%, +0.06%
SClause: 1248433 -> 1248693 (+0.02%); split: -0.03%, +0.05%
VALU: 24819703 -> 24819704 (+0.00%); split: -0.00%, +0.00%
Foz-DB Navi21:
Totals from 76224 (96.03% of 79377) affected shaders:
Instrs: 46019515 -> 46015691 (-0.01%); split: -0.03%, +0.03%
CodeSize: 246992544 -> 246977404 (-0.01%); split: -0.03%, +0.02%
Latency: 324647457 -> 318661132 (-1.84%); split: -1.90%, +0.05%
InvThroughput: 74834800 -> 74269723 (-0.76%); split: -0.76%, +0.01%
VClause: 927601 -> 927579 (-0.00%); split: -0.04%, +0.04%
SClause: 1302666 -> 1303178 (+0.04%); split: -0.02%, +0.06%
Foz-DB Vega10:
Totals from 60142 (95.42% of 63026) affected shaders:
Instrs: 25117688 -> 25098175 (-0.08%); split: -0.10%, +0.02%
CodeSize: 129847464 -> 129769456 (-0.06%); split: -0.08%, +0.02%
Latency: 261606546 -> 262407481 (+0.31%); split: -0.12%, +0.43%
InvThroughput: 138422594 -> 138500401 (+0.06%); split: -0.03%, +0.09%
VClause: 555424 -> 555321 (-0.02%); split: -0.11%, +0.09%
SClause: 851219 -> 851620 (+0.05%); split: -0.03%, +0.08%
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33222>
2025-01-28 17:00:44 +00:00
Georg Lehmann
df1de388a3
aco/sched_ilp: reorder VINTRP
VINTRP (gfx6-gfx10.3) is mostly just VALU, but we treated it like memory
instructions as an afterthought. This caused issues because VINTRP was never
reordered with itself or with other memory instructions. Reordering VINTRP in
clauses increases ILP. We don't really need collect_clause_dependencies for
VINTRP either, because they usually have the same dependencies already. That
means we can still form VINTRP clauses by preferably selecting a VINTRP after
a previous one.
Foz-DB Navi21:
Totals from 34184 (43.16% of 79206) affected shaders:
Instrs: 18811270 -> 18812046 (+0.00%); split: -0.01%, +0.02%
CodeSize: 103627276 -> 103630056 (+0.00%); split: -0.01%, +0.01%
Latency: 188379364 -> 187936731 (-0.23%); split: -0.27%, +0.03%
InvThroughput: 42600163 -> 42590608 (-0.02%); split: -0.03%, +0.00%
VClause: 378960 -> 378912 (-0.01%); split: -0.02%, +0.00%
SClause: 727560 -> 720573 (-0.96%); split: -1.08%, +0.12%
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33111>
2025-01-27 11:59:45 +00:00
Rhys Perry
0eb5f66660
nir/validate: validate ssa dominance by default
This no longer modifies dominance metadata, so enable it by default.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32005>
2025-01-23 23:35:44 +00:00
Timur Kristóf
50035f0316
ac/nir: Move all ac_nir_* files to a new folder.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32966>
2025-01-14 13:46:30 +01:00
Timur Kristóf
305fdfddb5
ac/nir: Move ac_set_nir_options to ac_nir.c
And rename it to ac_nir_set_options to match other functions.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32966>
2025-01-14 13:45:34 +01:00