Daniel Schürmann
d3743dd7ba
aco/scheduler: improve scheduling heuristic
...
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
The heuristic we are currently using still stems from the GCN era
with the only adjustments being made for RDNA was to double (or triple)
the wave count.
This rewrite aims to detangle some concepts and provide more consistent results.
- wave_factor: The purpose of this value is to reflect that RDNA SIMDs can
accomodate twice as many waves as GCN SIMDs.
- reg_file_multiple: This value accounts for the larger register file of wave32
and some RDNA3 families.
- wave_minimum: Below this value, we don't sacrifice any waves. It corresponds
to a register demand of 64 VGPRs in wave64.
- occupancy_factor: Depending on target_waves and wave_factor, this controls
the scheduling window sizes and number of moves.
The main differences from the previous heuristic is a lower wave minimum and
a slightly less aggressive reduction of waves.
It also increases SMEM_MAX_MOVES in order to mitigate some of the changes
from targeting less waves.
Totals from 62777 (78.63% of 79839) affected shaders: (Navi48)
MaxWaves: 1880983 -> 1848028 (-1.75%); split: +0.01%, -1.76%
Instrs: 40904711 -> 40800797 (-0.25%); split: -0.39%, +0.14%
CodeSize: 217132208 -> 216748832 (-0.18%); split: -0.29%, +0.12%
VGPRs: 3019304 -> 3099596 (+2.66%); split: -0.11%, +2.77%
Latency: 268857129 -> 265951122 (-1.08%); split: -1.33%, +0.25%
InvThroughput: 40960938 -> 41044533 (+0.20%); split: -0.18%, +0.39%
VClause: 794000 -> 782913 (-1.40%); split: -2.24%, +0.84%
SClause: 1192476 -> 1150831 (-3.49%); split: -3.94%, +0.45%
Copies: 2720470 -> 2700148 (-0.75%); split: -1.84%, +1.09%
Branches: 785926 -> 785951 (+0.00%); split: -0.01%, +0.01%
VALU: 22918411 -> 22890189 (-0.12%); split: -0.19%, +0.06%
SALU: 5281201 -> 5289486 (+0.16%); split: -0.21%, +0.36%
VOPD: 8790 -> 8685 (-1.19%); split: +1.08%, -2.28%
Totals from 62081 (77.77% of 79825) affected shaders: (Navi31)
MaxWaves: 1848555 -> 1812347 (-1.96%); split: +0.01%, -1.97%
Instrs: 39794460 -> 39704180 (-0.23%); split: -0.39%, +0.16%
CodeSize: 208987052 -> 208621524 (-0.17%); split: -0.31%, +0.13%
VGPRs: 3046284 -> 3135156 (+2.92%); split: -0.11%, +3.03%
Latency: 268863465 -> 265218186 (-1.36%); split: -1.59%, +0.23%
InvThroughput: 41101515 -> 41167075 (+0.16%); split: -0.22%, +0.38%
VClause: 795316 -> 774899 (-2.57%); split: -3.17%, +0.61%
SClause: 1177294 -> 1135451 (-3.55%); split: -4.06%, +0.51%
Copies: 2743254 -> 2725127 (-0.66%); split: -1.90%, +1.24%
Branches: 801395 -> 801428 (+0.00%); split: -0.01%, +0.02%
VALU: 23898938 -> 23871294 (-0.12%); split: -0.20%, +0.08%
SALU: 3908807 -> 3919130 (+0.26%); split: -0.23%, +0.50%
VOPD: 8529 -> 8500 (-0.34%); split: +1.29%, -1.63%
Totals from 44996 (71.01% of 63370) affected shaders: (Vega10)
MaxWaves: 307074 -> 304808 (-0.74%); split: +0.63%, -1.37%
Instrs: 22743534 -> 22716240 (-0.12%); split: -0.22%, +0.10%
CodeSize: 117284856 -> 117173212 (-0.10%); split: -0.19%, +0.09%
SGPRs: 3249008 -> 3330480 (+2.51%); split: -0.36%, +2.87%
VGPRs: 1901400 -> 1943880 (+2.23%); split: -0.60%, +2.83%
Latency: 224839126 -> 222878477 (-0.87%); split: -1.19%, +0.31%
InvThroughput: 114389570 -> 114316559 (-0.06%); split: -0.17%, +0.11%
VClause: 482012 -> 473304 (-1.81%); split: -2.86%, +1.05%
SClause: 757799 -> 717092 (-5.37%); split: -5.64%, +0.27%
Copies: 2182735 -> 2183598 (+0.04%); split: -1.17%, +1.21%
Branches: 396026 -> 395996 (-0.01%); split: -0.03%, +0.02%
VALU: 16740283 -> 16728098 (-0.07%); split: -0.14%, +0.07%
SALU: 2133575 -> 2145863 (+0.58%); split: -0.29%, +0.86%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30720 >
2025-08-06 09:16:33 +00:00
Qiang Yu
196569b1a4
all: rename gl_shader_stage to mesa_shader_stage
...
It's not only for GL, change to a generic name.
Use command:
find . -type f -not -path '*/.git/*' -exec sed -i 's/\bgl_shader_stage\b/mesa_shader_stage/g' {} +
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Yonggang Luo <luoyonggang@gmail.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36569 >
2025-08-06 10:28:40 +08:00
Rhys Perry
76c96bf558
aco: fix possible scratch offset overflow
...
We split vector load/store, so consider that we might add to the constant
offset.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36406 >
2025-08-04 15:06:44 +00:00
Rhys Perry
44ab4ad732
aco: align scratch size after isel
...
Make it safe for VGPR spilling if it's not a multiple of 4.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36406 >
2025-08-04 15:06:43 +00:00
Rhys Perry
ab10604924
aco/gfx12: fix printing of temporal hints
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36406 >
2025-08-04 15:06:41 +00:00
Rhys Perry
cec845079e
ac/nir/lower_ps: remove barrier for end_invocation_interlock
...
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
SPIR-V->NIR now inserts this barrier itself.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36513 >
2025-08-04 09:30:06 +00:00
Daniel Schürmann
4ca3cc5a1a
aco/ra: propagate precolor affinities through parallelcopies and tied definitions
...
Totals from 214 (0.27% of 79839) affected shaders: (Navi48)
Instrs: 65339 -> 65311 (-0.04%); split: -0.05%, +0.00%
CodeSize: 352616 -> 350952 (-0.47%); split: -0.55%, +0.07%
VGPRs: 9984 -> 9960 (-0.24%)
Latency: 207556 -> 207508 (-0.02%); split: -0.03%, +0.01%
InvThroughput: 40422 -> 40397 (-0.06%)
Copies: 3180 -> 3155 (-0.79%)
VALU: 38347 -> 38322 (-0.07%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36345 >
2025-08-01 17:15:54 +00:00
Daniel Schürmann
a667d9a68d
aco/ra: propagate precolor affinities through phis
...
Totals from 917 (1.15% of 79839) affected shaders: (Navi48)
Instrs: 3217861 -> 3216947 (-0.03%); split: -0.04%, +0.01%
CodeSize: 17427204 -> 17432264 (+0.03%); split: -0.06%, +0.09%
VGPRs: 65328 -> 65316 (-0.02%)
Latency: 35336268 -> 35335528 (-0.00%); split: -0.01%, +0.01%
InvThroughput: 7305032 -> 7302187 (-0.04%); split: -0.04%, +0.00%
SClause: 120537 -> 120553 (+0.01%); split: -0.01%, +0.02%
Copies: 307257 -> 306852 (-0.13%); split: -0.21%, +0.08%
Branches: 115744 -> 115743 (-0.00%)
VALU: 1572522 -> 1572183 (-0.02%); split: -0.02%, +0.00%
SALU: 574229 -> 574155 (-0.01%); split: -0.05%, +0.04%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36345 >
2025-08-01 17:15:54 +00:00
Daniel Schürmann
2ddd8ef0a3
aco/ra: don't optimize encodings on precolor affinity mismatch
...
Totals from 238 (0.30% of 79839) affected shaders: (Navi48)
Instrs: 137836 -> 137176 (-0.48%); split: -0.50%, +0.02%
CodeSize: 728616 -> 728668 (+0.01%); split: -0.06%, +0.07%
Latency: 1503248 -> 1500202 (-0.20%); split: -0.56%, +0.36%
InvThroughput: 297725 -> 296715 (-0.34%); split: -0.70%, +0.36%
Copies: 9390 -> 8825 (-6.02%); split: -6.33%, +0.31%
VALU: 89861 -> 89296 (-0.63%); split: -0.66%, +0.03%
SALU: 13166 -> 13167 (+0.01%); split: -0.05%, +0.06%
Suggested-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36345 >
2025-08-01 17:15:54 +00:00
Daniel Schürmann
93606a19c6
aco/ra: collect register affinities for all precolored operands.
...
Totals from 1280 (1.60% of 79839) affected shaders: (Navi48)
Instrs: 817363 -> 812639 (-0.58%); split: -0.58%, +0.00%
CodeSize: 4262644 -> 4243540 (-0.45%); split: -0.45%, +0.00%
VGPRs: 61692 -> 61668 (-0.04%)
Latency: 4354318 -> 4347818 (-0.15%); split: -0.15%, +0.00%
InvThroughput: 711914 -> 707698 (-0.59%); split: -0.59%, +0.00%
VClause: 14685 -> 14677 (-0.05%); split: -0.09%, +0.03%
SClause: 25623 -> 25621 (-0.01%)
Copies: 50663 -> 46242 (-8.73%); split: -8.73%, +0.00%
VALU: 427744 -> 423323 (-1.03%); split: -1.03%, +0.00%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36345 >
2025-08-01 17:15:54 +00:00
Daniel Schürmann
e32eec52f0
aco/ra: generalize register affinities
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36345 >
2025-08-01 17:15:54 +00:00
Daniel Schürmann
caa2c22d8b
aco/tests: Fix p_startpgm definitions to registers
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36345 >
2025-08-01 17:15:54 +00:00
Alyssa Rosenzweig
cc6e3b84cb
treewide: use nir_def_as_*
...
Via Coccinelle patch:
@@
expression definition;
@@
-nir_instr_as_alu(definition->parent_instr)
+nir_def_as_alu(definition)
@@
expression definition;
@@
-nir_instr_as_intrinsic(definition->parent_instr)
+nir_def_as_intrinsic(definition)
@@
expression definition;
@@
-nir_instr_as_phi(definition->parent_instr)
+nir_def_as_phi(definition)
@@
expression definition;
@@
-nir_instr_as_load_const(definition->parent_instr)
+nir_def_as_load_const(definition)
@@
expression definition;
@@
-nir_instr_as_deref(definition->parent_instr)
+nir_def_as_deref(definition)
@@
expression definition;
@@
-nir_instr_as_tex(definition->parent_instr)
+nir_def_as_tex(definition)
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Acked-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36489 >
2025-08-01 15:34:24 +00:00
Antonio Ospite
ddf2aa3a4d
build: avoid redefining unreachable() which is standard in C23
...
In the C23 standard unreachable() is now a predefined function-like
macro in <stddef.h>
See https://android.googlesource.com/platform/bionic/+/HEAD/docs/c23.md#is-now-a-predefined-function_like-macro-in
And this causes build errors when building for C23:
-----------------------------------------------------------------------
In file included from ../src/util/log.h:30,
from ../src/util/log.c:30:
../src/util/macros.h:123:9: warning: "unreachable" redefined
123 | #define unreachable(str) \
| ^~~~~~~~~~~
In file included from ../src/util/macros.h:31:
/usr/lib/gcc/x86_64-linux-gnu/14/include/stddef.h:456:9: note: this is the location of the previous definition
456 | #define unreachable() (__builtin_unreachable ())
| ^~~~~~~~~~~
-----------------------------------------------------------------------
So don't redefine it with the same name, but use the name UNREACHABLE()
to also signify it's a macro.
Using a different name also makes sense because the behavior of the
macro was extending the one of __builtin_unreachable() anyway, and it
also had a different signature, accepting one argument, compared to the
standard unreachable() with no arguments.
This change improves the chances of building mesa with the C23 standard,
which for instance is the default in recent AOSP versions.
All the instances of the macro, including the definition, were updated
with the following command line:
git grep -l '[^_]unreachable(' -- "src/**" | sort | uniq | \
while read file; \
do \
sed -e 's/\([^_]\)unreachable(/\1UNREACHABLE(/g' -i "$file"; \
done && \
sed -e 's/#undef unreachable/#undef UNREACHABLE/g' -i src/intel/isl/isl_aux_info.c
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36437 >
2025-07-31 17:49:42 +00:00
Georg Lehmann
a6a6c2f691
aco/ra: convert bitwise instruction to gfx11+ 16bit on demand
...
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
The 32bit versions are smaller, allow more optimizations and VOPD,
so only use the 16bit opcodes if nessecary.
Foz-DB Navi31:
Totals from 84 (0.10% of 80237) affected shaders:
Instrs: 176673 -> 176347 (-0.18%); split: -0.20%, +0.01%
CodeSize: 970148 -> 969716 (-0.04%); split: -0.08%, +0.03%
VGPRs: 5876 -> 5864 (-0.20%)
Latency: 2805974 -> 2805674 (-0.01%); split: -0.02%, +0.01%
InvThroughput: 769007 -> 768738 (-0.03%); split: -0.04%, +0.01%
VClause: 2593 -> 2597 (+0.15%)
Copies: 23749 -> 23487 (-1.10%); split: -1.11%, +0.00%
VALU: 107124 -> 106862 (-0.24%); split: -0.25%, +0.00%
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35919 >
2025-07-31 12:07:07 +00:00
Georg Lehmann
404e1f13e8
aco/print_asm: use real true16 instr on gfx11+
...
Fake16 doesn't print opsel on v_cndmask_b16, so it looks really broken.
Restrict to LLVM20+ because older versions have incomplete tru16 support.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35919 >
2025-07-31 12:07:07 +00:00
Georg Lehmann
b12db991eb
aco/gfx10: optimize subgroupRotate(x, 32) and subgroupShuffleXor(x, 32)
...
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
We don't have v_permlane64_b32 yet, but we can still optimize it using
shared vgprs. Using the DPP16 row mask, we can even avoid writing exec.
With v0 input/output and v24/v25 as shared vgprs, this results in:
v_mov_b32_dpp v24, v0 quad_perm:[0,1,2,3] row_mask:0x3 bank_mask:0xf
v_mov_b32_dpp v25, v0 quad_perm:[0,1,2,3] row_mask:0xc bank_mask:0xf
v_mov_b32_dpp v0, v24 quad_perm:[0,1,2,3] row_mask:0xc bank_mask:0xf
v_mov_b32_dpp v0, v25 quad_perm:[0,1,2,3] row_mask:0x3 bank_mask:0xf
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36390 >
2025-07-29 06:33:20 +00:00
Georg Lehmann
eb4df58a3d
aco/isel: refactor shared vgpr usage
...
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36390 >
2025-07-29 06:33:20 +00:00
Georg Lehmann
8a2aca8d6f
aco/select_alu: avoid vector get_alu_src for instructions with scalar operands
...
Foz-DB Navi21:
Totals from 1 (0.00% of 80237) affected shaders:
Instrs: 22 -> 21 (-4.55%)
CodeSize: 112 -> 108 (-3.57%)
Latency: 392 -> 386 (-1.53%)
InvThroughput: 25 -> 24 (-4.00%)
Copies: 4 -> 3 (-25.00%)
PreVGPRs: 8 -> 4 (-50.00%)
VALU: 10 -> 9 (-10.00%)
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35728 >
2025-07-29 06:07:15 +00:00
Georg Lehmann
ad9c340d86
aco: insert VALU s_delay_alu for WMMA
...
This should avoid some SIMD stalls.
I think this special case was added to try to handle this case:
First Instruction: WMMA
Second Instruction: WMMA instruction with same VGPR of previous WMMA instruction’s Matrix D as Matrix C
Stall if the first and second instruction are not the same type of WMMA or use ABS/NEG on SRC2 of the second instruction
If I read it correctly, we shouldn't need a delay if the type is the same and no
modifier is used. That's kind of complex to handle, so leave it for now.
Not inserting any delays likely hurts more than this.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36328 >
2025-07-29 05:48:29 +00:00
Georg Lehmann
413d0d2ec8
aco/statistics: update GFX12 WMMA cost
...
Based on marketing numbers, but they seem to match RGP.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36328 >
2025-07-29 05:48:29 +00:00
Georg Lehmann
8f61c85880
aco/statistics: add latency to WMMA
...
Assume the normal VALU latency of 4 cycles.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36328 >
2025-07-29 05:48:29 +00:00
Georg Lehmann
004f8aa2f4
aco: optimize get_alu_src with constant source and size > 1
...
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Emulated FSR4, Navi31:
Totals from 14 (100.00% of 14) affected shaders:
MaxWaves: 130 -> 131 (+0.77%)
Instrs: 67887 -> 67470 (-0.61%); split: -0.70%, +0.09%
CodeSize: 464428 -> 461668 (-0.59%); split: -0.67%, +0.07%
VGPRs: 2544 -> 2520 (-0.94%)
SpillVGPRs: 92 -> 89 (-3.26%)
Latency: 256823 -> 257574 (+0.29%); split: -0.37%, +0.66%
InvThroughput: 253895 -> 252929 (-0.38%); split: -0.40%, +0.02%
VClause: 997 -> 984 (-1.30%); split: -2.11%, +0.80%
Copies: 4501 -> 3788 (-15.84%); split: -17.35%, +1.51%
PreSGPRs: 504 -> 519 (+2.98%)
PreVGPRs: 2460 -> 2448 (-0.49%)
VALU: 57202 -> 56726 (-0.83%); split: -0.88%, +0.05%
SALU: 1231 -> 1384 (+12.43%)
VMEM: 3807 -> 3801 (-0.16%)
VOPD: 2693 -> 2303 (-14.48%); split: +1.19%, -15.67%
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36090 >
2025-07-25 11:33:00 +00:00
Alyssa Rosenzweig
8a1a410389
treewide: use SWAP macro
...
Via Coccinelle patch + manual clean up:
@@
identifier temporary, a, b;
type T;
@@
-T temporary = a;
-a = b;
-b = temporary;
+SWAP(a, b);
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36297 >
2025-07-23 19:49:47 +00:00
Georg Lehmann
c80daf934c
aco: supported 64bit or vectorized bitfield_select
...
No Foz-DB changes.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36141 >
2025-07-21 20:42:32 +00:00
Georg Lehmann
14b36fb790
aco/isel: don't create literal operands for SALU bitfield_select
...
Let the optimizer handle this.
No Foz-DB changes.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36141 >
2025-07-21 20:42:32 +00:00
Rhys Perry
256a7cc4f0
aco/isel: optimize uniform vote
...
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
fossil-db (navi21):
Totals from 21 (0.03% of 79825) affected shaders:
Instrs: 44939 -> 44913 (-0.06%)
CodeSize: 236612 -> 236504 (-0.05%)
Latency: 509496 -> 509349 (-0.03%)
Copies: 3624 -> 3620 (-0.11%)
SALU: 5458 -> 5432 (-0.48%)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36177 >
2025-07-21 14:19:58 +00:00
Rhys Perry
0bceb07c03
aco: optimize uniform s_not
...
fossil-db (navi21):
Totals from 1442 (1.81% of 79825) affected shaders:
Instrs: 2224425 -> 2220624 (-0.17%); split: -0.18%, +0.01%
CodeSize: 11778260 -> 11763264 (-0.13%); split: -0.14%, +0.01%
Latency: 13396254 -> 13392346 (-0.03%); split: -0.03%, +0.00%
InvThroughput: 3145007 -> 3144982 (-0.00%); split: -0.00%, +0.00%
SClause: 53037 -> 53035 (-0.00%); split: -0.01%, +0.01%
Copies: 185852 -> 184777 (-0.58%); split: -0.71%, +0.13%
Branches: 60799 -> 60805 (+0.01%)
PreSGPRs: 62940 -> 62954 (+0.02%); split: -0.01%, +0.03%
SALU: 298564 -> 294761 (-1.27%); split: -1.34%, +0.06%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36177 >
2025-07-21 14:19:58 +00:00
Rhys Perry
85b31c9c4d
aco/opt: add some comments
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36177 >
2025-07-21 14:19:58 +00:00
Rhys Perry
2fff1db5c8
aco: don't both flip s_cselect and label uniform_bool
...
Otherwise, the uniform_bool could point to a temporary which might be
DCE'd, since it's not used by the s_cselect.
fossil-db (navi21):
Totals from 1 (0.00% of 79825) affected shaders:
Instrs: 1267 -> 1269 (+0.16%)
Latency: 91071 -> 91103 (+0.04%)
SALU: 283 -> 285 (+0.71%)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36177 >
2025-07-21 14:19:58 +00:00
Rhys Perry
2239a5e9ae
aco: stop labeling first def of and(uniform_bool/uniform_bitwise, exec)
...
The optimizer shouldn't consider a lanemask to be a uniform boolean unless
it's either 0 or -1. Optimizations involving s_not/s_xor might not work
properly otherwise.
No fossil-db changes (navi21).
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36177 >
2025-07-21 14:19:58 +00:00
Rhys Perry
d81add3dda
aco: optimize s_and(s_cselect, exec)
...
fossil-db (navi21):
Totals from 62 (0.08% of 79825) affected shaders:
Instrs: 178887 -> 178745 (-0.08%)
CodeSize: 942980 -> 942328 (-0.07%)
Latency: 1274513 -> 1273653 (-0.07%)
InvThroughput: 213862 -> 213774 (-0.04%); split: -0.04%, +0.00%
SALU: 26446 -> 26301 (-0.55%)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36005 >
2025-07-21 08:27:01 +00:00
Rhys Perry
b2e5fc9451
aco/lower_phis: add bld_before_logical_end helper
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36005 >
2025-07-21 08:27:01 +00:00
Georg Lehmann
d672737372
nir,aco: add byte_perm_amd
...
Acked-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36115 >
2025-07-16 11:46:52 +00:00
Natalie Vock
ac96594b86
aco/isel: Use vector-aligned operands for ds_stack_push8_pop1_rtn_b32
...
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35269 >
2025-07-15 21:34:40 +00:00
Natalie Vock
b2a95d2133
aco/ra: Add affinities for DS vector-aligned operands
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35269 >
2025-07-15 21:34:40 +00:00
Natalie Vock
df5495b934
aco/assembler: Support vector-aligned operands on DS instructions
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35269 >
2025-07-15 21:34:40 +00:00
Natalie Vock
ea66a8d1c5
aco,nir: Add support for GFX12 ds_bvh_stack_push8_pop1_rtn_b32 instruction
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35269 >
2025-07-15 21:34:40 +00:00
Natalie Vock
9707b30965
nir,aco: Add ds_bvh_stack_rtn
...
This is a ds instruction that also overwrites its first input, so
introduce a new ds format with two outputs.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35269 >
2025-07-15 21:34:39 +00:00
Natalie Vock
c515f1fd58
aco: Use vector-aligned operands for image_bvh8_intersect_ray
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35269 >
2025-07-15 21:34:38 +00:00
Natalie Vock
c279dd6e61
aco: Support vector-aligned ops fixed to defs
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35269 >
2025-07-15 21:34:38 +00:00
Natalie Vock
f17fe05e32
aco/isel: Improve vector splits for image_bvh8_intersect_ray
...
Using split_vector to split everything into scalars allows copy-prop to
eliminate the final p_create_vector. Considerably reduces copies and
register thrashing.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35269 >
2025-07-15 21:34:38 +00:00
Marek Olšák
d12bc87dda
aco: implement upcasting 16-bit types for 32-bit color buffers in PS epilog
...
This was missed when implementing the change for LLVM.
Fixes: fbbf029529 - radeonsi: enable 16-bit mediump IO for PS outputs only, and VS->PS with env var
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36112 >
2025-07-15 18:28:30 +00:00
Daniel Schürmann
47ef60cbf1
aco/ra: always use bytes for register stride requirements
...
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
instead of a mixture between dwords and bytes.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36053 >
2025-07-14 08:45:29 +00:00
Marek Olšák
5ded4f3c7d
aco: remove unused aco_symbol_lds_ngg_gs_out_vertex_base
...
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35529 >
2025-07-12 10:28:21 +00:00
Georg Lehmann
92d433c54a
aco: vectorize conversions from 8bit to 16bit
...
Massively helps emulated fp8 performance.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35854 >
2025-07-12 08:39:15 +00:00
Georg Lehmann
7fece5592c
aco: vectorize 16bit extracts
...
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35854 >
2025-07-12 08:39:14 +00:00
Rhys Perry
3b9a1ce4ca
aco: remove RegClass::as_subdword
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35465 >
2025-07-11 12:15:09 +00:00
Rhys Perry
9c55b0ca20
aco: use MUBUF for global access with SGPR address on GFX7/8
...
This should be better than using FLAT, which only supports a VGPR address.
fossil-db (polaris10):
Totals from 159 (0.26% of 62070) affected shaders:
MaxWaves: 789 -> 803 (+1.77%)
Instrs: 234284 -> 230557 (-1.59%); split: -1.71%, +0.12%
CodeSize: 1212324 -> 1186716 (-2.11%); split: -2.23%, +0.11%
SGPRs: 10504 -> 10712 (+1.98%)
VGPRs: 10556 -> 10236 (-3.03%); split: -3.37%, +0.34%
SpillSGPRs: 579 -> 577 (-0.35%)
Latency: 3903056 -> 3875625 (-0.70%); split: -0.87%, +0.16%
InvThroughput: 3139443 -> 3114426 (-0.80%); split: -0.86%, +0.07%
VClause: 4205 -> 4433 (+5.42%); split: -0.43%, +5.85%
SClause: 4461 -> 4445 (-0.36%); split: -0.43%, +0.07%
Copies: 30889 -> 31507 (+2.00%); split: -0.29%, +2.29%
PreSGPRs: 7370 -> 7609 (+3.24%)
PreVGPRs: 8339 -> 8193 (-1.75%)
VALU: 175025 -> 170232 (-2.74%); split: -2.77%, +0.03%
SALU: 27269 -> 28532 (+4.63%); split: -0.01%, +4.64%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35465 >
2025-07-11 12:15:08 +00:00
Rhys Perry
0094e6c32a
aco: optimize lds-only or vmem-only flat access
...
fossil-db (polaris10):
Totals from 138 (0.22% of 62070) affected shaders:
Instrs: 233452 -> 234436 (+0.42%)
CodeSize: 1209392 -> 1213220 (+0.32%)
Latency: 3934496 -> 3928089 (-0.16%); split: -0.17%, +0.00%
InvThroughput: 3040782 -> 3038562 (-0.07%); split: -0.07%, +0.00%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35465 >
2025-07-11 12:15:08 +00:00