Georg Lehmann
883b1ca364
aco: disable wqm for tex loads when not needed
...
By only executing VMEM loads for lanes where the result is used, we can save
bandwidth.
The NIR pass only handles tex for now, but those are most common anyway.
We can extend it handle image/ssbo/ubo/global loads in the future.
Foz-DB GFX1201:
Totals from 32633 (40.66% of 80251) affected shaders:
Instrs: 22635910 -> 23193509 (+2.46%); split: -0.00%, +2.46%
CodeSize: 122880044 -> 125093428 (+1.80%); split: -0.00%, +1.81%
VGPRs: 1481868 -> 1481712 (-0.01%)
SpillSGPRs: 3877 -> 4301 (+10.94%); split: -0.52%, +11.45%
Latency: 171480552 -> 171685219 (+0.12%); split: -0.18%, +0.30%
InvThroughput: 24364743 -> 24373441 (+0.04%); split: -0.08%, +0.12%
VClause: 388318 -> 388557 (+0.06%); split: -0.06%, +0.13%
SClause: 774781 -> 776492 (+0.22%); split: -0.29%, +0.51%
Copies: 1416586 -> 1541199 (+8.80%); split: -0.16%, +8.96%
Branches: 419591 -> 419673 (+0.02%); split: -0.02%, +0.04%
PreSGPRs: 1330303 -> 1416540 (+6.48%)
PreVGPRs: 964864 -> 964863 (-0.00%)
VALU: 12919601 -> 12920254 (+0.01%); split: -0.01%, +0.01%
SALU: 2685402 -> 3224147 (+20.06%); split: -0.00%, +20.07%
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35970 >
2025-08-15 07:03:46 +00:00
Georg Lehmann
7159fd21f8
aco: don't restrict vmem load scheduling by inserting p_end_wqm early
...
Foz-DB GFX1201:
Totals from 7 (0.01% of 80251) affected shaders:
Instrs: 703 -> 729 (+3.70%)
CodeSize: 4032 -> 4136 (+2.58%)
Latency: 5840 -> 4715 (-19.26%)
InvThroughput: 441 -> 405 (-8.16%)
Copies: 61 -> 67 (+9.84%)
PreSGPRs: 216 -> 218 (+0.93%)
SALU: 93 -> 113 (+21.51%)
When reordered after the next commit:
Foz-DB GFX1201:
Totals from 1609 (2.00% of 80251) affected shaders:
MaxWaves: 47984 -> 47986 (+0.00%)
Instrs: 1326847 -> 1332797 (+0.45%); split: -0.05%, +0.50%
CodeSize: 7248720 -> 7275364 (+0.37%); split: -0.04%, +0.41%
VGPRs: 74968 -> 75148 (+0.24%); split: -0.06%, +0.30%
SpillSGPRs: 182 -> 184 (+1.10%)
Latency: 10370602 -> 10172524 (-1.91%); split: -2.06%, +0.15%
InvThroughput: 1446508 -> 1445920 (-0.04%); split: -0.11%, +0.06%
VClause: 23567 -> 23559 (-0.03%); split: -0.35%, +0.32%
SClause: 43143 -> 43203 (+0.14%); split: -0.52%, +0.66%
Copies: 80948 -> 81622 (+0.83%); split: -0.32%, +1.16%
Branches: 21599 -> 21727 (+0.59%)
PreSGPRs: 69963 -> 70732 (+1.10%)
VALU: 778968 -> 779024 (+0.01%); split: -0.02%, +0.03%
SALU: 159797 -> 165329 (+3.46%); split: -0.01%, +3.47%
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35970 >
2025-08-15 07:03:46 +00:00
Georg Lehmann
c1b29174b4
aco: use a smaller wqm section for strict_wqm sampling
...
It's only important that the coordinate is created in WQM,
the sample itself doesn't care.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35970 >
2025-08-15 07:03:46 +00:00
Georg Lehmann
11cee3d634
aco: use new disable_wqm for p_dual_src_export_gfx11
...
No Foz-DB changes on GFX1201.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35970 >
2025-08-15 07:03:46 +00:00
Georg Lehmann
8e53ba9a0a
aco: use new disable_wqm for exp
...
No Foz-DB changes on GFX1201.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35970 >
2025-08-15 07:03:46 +00:00
Georg Lehmann
0e66f2b2cc
aco: use new disable_wqm for mimg
...
Foz-DB GFX1201:
Totals from 88 (0.11% of 80251) affected shaders:
Instrs: 81954 -> 82218 (+0.32%); split: -0.02%, +0.34%
CodeSize: 451824 -> 452880 (+0.23%); split: -0.02%, +0.25%
Latency: 308818 -> 308746 (-0.02%); split: -0.05%, +0.02%
VClause: 1324 -> 1318 (-0.45%)
Copies: 2795 -> 2784 (-0.39%)
PreSGPRs: 4029 -> 4035 (+0.15%)
SALU: 6563 -> 6809 (+3.75%); split: -0.15%, +3.90%
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35970 >
2025-08-15 07:03:46 +00:00
Georg Lehmann
922f559c3c
aco: use new disable_wqm for flatlike
...
No Foz-DB changes on GFX1201.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35970 >
2025-08-15 07:03:46 +00:00
Georg Lehmann
a4c537c5b3
aco: use new disable_wqm for mubuf/mtbuf
...
Foz-DB GFX1201:
Totals from 66 (0.08% of 80251) affected shaders:
Instrs: 45373 -> 45663 (+0.64%); split: -0.01%, +0.65%
CodeSize: 251708 -> 252900 (+0.47%); split: -0.00%, +0.48%
Latency: 278977 -> 278652 (-0.12%); split: -0.14%, +0.02%
InvThroughput: 38259 -> 38245 (-0.04%); split: -0.05%, +0.02%
VClause: 982 -> 962 (-2.04%)
Copies: 2882 -> 2808 (-2.57%)
PreSGPRs: 2564 -> 2599 (+1.37%)
SALU: 4748 -> 5010 (+5.52%)
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35970 >
2025-08-15 07:03:46 +00:00
Rhys Perry
44ab4ad732
aco: align scratch size after isel
...
Make it safe for VGPR spilling if it's not a multiple of 4.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36406 >
2025-08-04 15:06:43 +00:00
Rhys Perry
cec845079e
ac/nir/lower_ps: remove barrier for end_invocation_interlock
...
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
SPIR-V->NIR now inserts this barrier itself.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36513 >
2025-08-04 09:30:06 +00:00
Alyssa Rosenzweig
cc6e3b84cb
treewide: use nir_def_as_*
...
Via Coccinelle patch:
@@
expression definition;
@@
-nir_instr_as_alu(definition->parent_instr)
+nir_def_as_alu(definition)
@@
expression definition;
@@
-nir_instr_as_intrinsic(definition->parent_instr)
+nir_def_as_intrinsic(definition)
@@
expression definition;
@@
-nir_instr_as_phi(definition->parent_instr)
+nir_def_as_phi(definition)
@@
expression definition;
@@
-nir_instr_as_load_const(definition->parent_instr)
+nir_def_as_load_const(definition)
@@
expression definition;
@@
-nir_instr_as_deref(definition->parent_instr)
+nir_def_as_deref(definition)
@@
expression definition;
@@
-nir_instr_as_tex(definition->parent_instr)
+nir_def_as_tex(definition)
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Acked-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36489 >
2025-08-01 15:34:24 +00:00
Antonio Ospite
ddf2aa3a4d
build: avoid redefining unreachable() which is standard in C23
...
In the C23 standard unreachable() is now a predefined function-like
macro in <stddef.h>
See https://android.googlesource.com/platform/bionic/+/HEAD/docs/c23.md#is-now-a-predefined-function_like-macro-in
And this causes build errors when building for C23:
-----------------------------------------------------------------------
In file included from ../src/util/log.h:30,
from ../src/util/log.c:30:
../src/util/macros.h:123:9: warning: "unreachable" redefined
123 | #define unreachable(str) \
| ^~~~~~~~~~~
In file included from ../src/util/macros.h:31:
/usr/lib/gcc/x86_64-linux-gnu/14/include/stddef.h:456:9: note: this is the location of the previous definition
456 | #define unreachable() (__builtin_unreachable ())
| ^~~~~~~~~~~
-----------------------------------------------------------------------
So don't redefine it with the same name, but use the name UNREACHABLE()
to also signify it's a macro.
Using a different name also makes sense because the behavior of the
macro was extending the one of __builtin_unreachable() anyway, and it
also had a different signature, accepting one argument, compared to the
standard unreachable() with no arguments.
This change improves the chances of building mesa with the C23 standard,
which for instance is the default in recent AOSP versions.
All the instances of the macro, including the definition, were updated
with the following command line:
git grep -l '[^_]unreachable(' -- "src/**" | sort | uniq | \
while read file; \
do \
sed -e 's/\([^_]\)unreachable(/\1UNREACHABLE(/g' -i "$file"; \
done && \
sed -e 's/#undef unreachable/#undef UNREACHABLE/g' -i src/intel/isl/isl_aux_info.c
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36437 >
2025-07-31 17:49:42 +00:00
Georg Lehmann
b12db991eb
aco/gfx10: optimize subgroupRotate(x, 32) and subgroupShuffleXor(x, 32)
...
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
We don't have v_permlane64_b32 yet, but we can still optimize it using
shared vgprs. Using the DPP16 row mask, we can even avoid writing exec.
With v0 input/output and v24/v25 as shared vgprs, this results in:
v_mov_b32_dpp v24, v0 quad_perm:[0,1,2,3] row_mask:0x3 bank_mask:0xf
v_mov_b32_dpp v25, v0 quad_perm:[0,1,2,3] row_mask:0xc bank_mask:0xf
v_mov_b32_dpp v0, v24 quad_perm:[0,1,2,3] row_mask:0xc bank_mask:0xf
v_mov_b32_dpp v0, v25 quad_perm:[0,1,2,3] row_mask:0x3 bank_mask:0xf
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36390 >
2025-07-29 06:33:20 +00:00
Georg Lehmann
eb4df58a3d
aco/isel: refactor shared vgpr usage
...
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36390 >
2025-07-29 06:33:20 +00:00
Georg Lehmann
8a2aca8d6f
aco/select_alu: avoid vector get_alu_src for instructions with scalar operands
...
Foz-DB Navi21:
Totals from 1 (0.00% of 80237) affected shaders:
Instrs: 22 -> 21 (-4.55%)
CodeSize: 112 -> 108 (-3.57%)
Latency: 392 -> 386 (-1.53%)
InvThroughput: 25 -> 24 (-4.00%)
Copies: 4 -> 3 (-25.00%)
PreVGPRs: 8 -> 4 (-50.00%)
VALU: 10 -> 9 (-10.00%)
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35728 >
2025-07-29 06:07:15 +00:00
Georg Lehmann
004f8aa2f4
aco: optimize get_alu_src with constant source and size > 1
...
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Emulated FSR4, Navi31:
Totals from 14 (100.00% of 14) affected shaders:
MaxWaves: 130 -> 131 (+0.77%)
Instrs: 67887 -> 67470 (-0.61%); split: -0.70%, +0.09%
CodeSize: 464428 -> 461668 (-0.59%); split: -0.67%, +0.07%
VGPRs: 2544 -> 2520 (-0.94%)
SpillVGPRs: 92 -> 89 (-3.26%)
Latency: 256823 -> 257574 (+0.29%); split: -0.37%, +0.66%
InvThroughput: 253895 -> 252929 (-0.38%); split: -0.40%, +0.02%
VClause: 997 -> 984 (-1.30%); split: -2.11%, +0.80%
Copies: 4501 -> 3788 (-15.84%); split: -17.35%, +1.51%
PreSGPRs: 504 -> 519 (+2.98%)
PreVGPRs: 2460 -> 2448 (-0.49%)
VALU: 57202 -> 56726 (-0.83%); split: -0.88%, +0.05%
SALU: 1231 -> 1384 (+12.43%)
VMEM: 3807 -> 3801 (-0.16%)
VOPD: 2693 -> 2303 (-14.48%); split: +1.19%, -15.67%
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36090 >
2025-07-25 11:33:00 +00:00
Alyssa Rosenzweig
8a1a410389
treewide: use SWAP macro
...
Via Coccinelle patch + manual clean up:
@@
identifier temporary, a, b;
type T;
@@
-T temporary = a;
-a = b;
-b = temporary;
+SWAP(a, b);
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36297 >
2025-07-23 19:49:47 +00:00
Georg Lehmann
c80daf934c
aco: supported 64bit or vectorized bitfield_select
...
No Foz-DB changes.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36141 >
2025-07-21 20:42:32 +00:00
Georg Lehmann
14b36fb790
aco/isel: don't create literal operands for SALU bitfield_select
...
Let the optimizer handle this.
No Foz-DB changes.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36141 >
2025-07-21 20:42:32 +00:00
Rhys Perry
256a7cc4f0
aco/isel: optimize uniform vote
...
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
fossil-db (navi21):
Totals from 21 (0.03% of 79825) affected shaders:
Instrs: 44939 -> 44913 (-0.06%)
CodeSize: 236612 -> 236504 (-0.05%)
Latency: 509496 -> 509349 (-0.03%)
Copies: 3624 -> 3620 (-0.11%)
SALU: 5458 -> 5432 (-0.48%)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36177 >
2025-07-21 14:19:58 +00:00
Georg Lehmann
d672737372
nir,aco: add byte_perm_amd
...
Acked-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36115 >
2025-07-16 11:46:52 +00:00
Natalie Vock
ac96594b86
aco/isel: Use vector-aligned operands for ds_stack_push8_pop1_rtn_b32
...
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35269 >
2025-07-15 21:34:40 +00:00
Natalie Vock
ea66a8d1c5
aco,nir: Add support for GFX12 ds_bvh_stack_push8_pop1_rtn_b32 instruction
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35269 >
2025-07-15 21:34:40 +00:00
Natalie Vock
9707b30965
nir,aco: Add ds_bvh_stack_rtn
...
This is a ds instruction that also overwrites its first input, so
introduce a new ds format with two outputs.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35269 >
2025-07-15 21:34:39 +00:00
Natalie Vock
c515f1fd58
aco: Use vector-aligned operands for image_bvh8_intersect_ray
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35269 >
2025-07-15 21:34:38 +00:00
Natalie Vock
f17fe05e32
aco/isel: Improve vector splits for image_bvh8_intersect_ray
...
Using split_vector to split everything into scalars allows copy-prop to
eliminate the final p_create_vector. Considerably reduces copies and
register thrashing.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35269 >
2025-07-15 21:34:38 +00:00
Marek Olšák
d12bc87dda
aco: implement upcasting 16-bit types for 32-bit color buffers in PS epilog
...
This was missed when implementing the change for LLVM.
Fixes: fbbf029529 - radeonsi: enable 16-bit mediump IO for PS outputs only, and VS->PS with env var
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36112 >
2025-07-15 18:28:30 +00:00
Marek Olšák
5ded4f3c7d
aco: remove unused aco_symbol_lds_ngg_gs_out_vertex_base
...
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35529 >
2025-07-12 10:28:21 +00:00
Georg Lehmann
92d433c54a
aco: vectorize conversions from 8bit to 16bit
...
Massively helps emulated fp8 performance.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35854 >
2025-07-12 08:39:15 +00:00
Georg Lehmann
7fece5592c
aco: vectorize 16bit extracts
...
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35854 >
2025-07-12 08:39:14 +00:00
Rhys Perry
3b9a1ce4ca
aco: remove RegClass::as_subdword
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35465 >
2025-07-11 12:15:09 +00:00
Rhys Perry
9c55b0ca20
aco: use MUBUF for global access with SGPR address on GFX7/8
...
This should be better than using FLAT, which only supports a VGPR address.
fossil-db (polaris10):
Totals from 159 (0.26% of 62070) affected shaders:
MaxWaves: 789 -> 803 (+1.77%)
Instrs: 234284 -> 230557 (-1.59%); split: -1.71%, +0.12%
CodeSize: 1212324 -> 1186716 (-2.11%); split: -2.23%, +0.11%
SGPRs: 10504 -> 10712 (+1.98%)
VGPRs: 10556 -> 10236 (-3.03%); split: -3.37%, +0.34%
SpillSGPRs: 579 -> 577 (-0.35%)
Latency: 3903056 -> 3875625 (-0.70%); split: -0.87%, +0.16%
InvThroughput: 3139443 -> 3114426 (-0.80%); split: -0.86%, +0.07%
VClause: 4205 -> 4433 (+5.42%); split: -0.43%, +5.85%
SClause: 4461 -> 4445 (-0.36%); split: -0.43%, +0.07%
Copies: 30889 -> 31507 (+2.00%); split: -0.29%, +2.29%
PreSGPRs: 7370 -> 7609 (+3.24%)
PreVGPRs: 8339 -> 8193 (-1.75%)
VALU: 175025 -> 170232 (-2.74%); split: -2.77%, +0.03%
SALU: 27269 -> 28532 (+4.63%); split: -0.01%, +4.64%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35465 >
2025-07-11 12:15:08 +00:00
Rhys Perry
6396a82695
aco: return a format in lower_global_address
...
No fossil-db changes (navi10, pitcairn).
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35465 >
2025-07-11 12:15:07 +00:00
Rhys Perry
89c2c94147
aco: increase global constant offset limit slightly
...
Before, this wasn't actually the maximum value plus one.
fossil-db (navi10):
Totals from 4 (0.01% of 63207) affected shaders:
(no stat changes)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35465 >
2025-07-11 12:15:07 +00:00
Rhys Perry
d7dcd81c77
aco/gfx6: allow both constant and gpr offset for global with sgpr address
...
fossil-db (pitcairn):
Totals from 81 (0.13% of 62069) affected shaders:
MaxWaves: 332 -> 335 (+0.90%)
Instrs: 150087 -> 149737 (-0.23%); split: -0.30%, +0.06%
CodeSize: 754636 -> 752712 (-0.25%); split: -0.31%, +0.06%
SGPRs: 6128 -> 6184 (+0.91%)
VGPRs: 7220 -> 7208 (-0.17%); split: -0.28%, +0.11%
SpillSGPRs: 288 -> 287 (-0.35%)
Latency: 2199197 -> 2198338 (-0.04%); split: -0.20%, +0.17%
InvThroughput: 1613474 -> 1614303 (+0.05%); split: -0.07%, +0.12%
VClause: 2905 -> 2862 (-1.48%); split: -2.34%, +0.86%
SClause: 2366 -> 2378 (+0.51%); split: -0.17%, +0.68%
Copies: 17312 -> 17264 (-0.28%); split: -1.03%, +0.76%
PreSGPRs: 5080 -> 5004 (-1.50%)
PreVGPRs: 5656 -> 5640 (-0.28%)
VALU: 114097 -> 113831 (-0.23%); split: -0.31%, +0.07%
SALU: 16004 -> 15944 (-0.37%); split: -0.41%, +0.04%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35465 >
2025-07-11 12:15:06 +00:00
Rhys Perry
684943bd1f
aco/gfx6: allow vgpr offset for global access with sgpr address
...
No reason why we can't use offen like normal buffer loads.
fossil-db (pitcairn):
Totals from 122 (0.20% of 62069) affected shaders:
MaxWaves: 521 -> 525 (+0.77%)
Instrs: 238341 -> 237228 (-0.47%); split: -0.57%, +0.10%
CodeSize: 1196260 -> 1188076 (-0.68%); split: -0.78%, +0.09%
SGPRs: 8752 -> 8760 (+0.09%); split: -0.64%, +0.73%
VGPRs: 10456 -> 10440 (-0.15%); split: -0.88%, +0.73%
Latency: 3958385 -> 3946186 (-0.31%); split: -0.38%, +0.07%
InvThroughput: 3097193 -> 3084417 (-0.41%); split: -0.42%, +0.01%
VClause: 4058 -> 4500 (+10.89%); split: -0.02%, +10.92%
SClause: 4511 -> 4500 (-0.24%); split: -0.42%, +0.18%
Copies: 31228 -> 31718 (+1.57%); split: -0.38%, +1.95%
PreSGPRs: 7211 -> 7461 (+3.47%)
PreVGPRs: 8174 -> 8147 (-0.33%); split: -0.34%, +0.01%
VALU: 174779 -> 173294 (-0.85%); split: -0.87%, +0.02%
SALU: 29138 -> 29641 (+1.73%); split: -0.09%, +1.82%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35465 >
2025-07-11 12:15:05 +00:00
Rhys Perry
09a5af121f
aco: simplify the load callback
...
We can put these parameters in the LoadEmitInfo instead.
No fossil-db changes (navi10, pitcairn).
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35465 >
2025-07-11 12:15:04 +00:00
Rhys Perry
101d0b791f
aco: add too-large constant offset to the address instead of the offset
...
In case the addition with the offset overflows.
No fossil-db changes (navi10, pitcairn).
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35465 >
2025-07-11 12:15:03 +00:00
Rhys Perry
bd9a9a77fe
aco: use addition helper in emit_load
...
No fossil-db changes (navi10, pitcairn).
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35465 >
2025-07-11 12:15:03 +00:00
Rhys Perry
8defd1bc16
aco/gfx6: disallow global access with sgpr address and two offsets
...
No fossil-db changes (navi10, pitcairn).
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35465 >
2025-07-11 12:15:03 +00:00
Georg Lehmann
d45f375a9d
aco: only insert fp mode when needed
...
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35746 >
2025-07-10 13:48:50 +00:00
Daniel Schürmann
610a19cf31
aco/isel: allow to select SGPR defs for vectorized bcsel and logical operations
...
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
No fossil changes.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35784 >
2025-07-09 14:10:37 +00:00
Daniel Schürmann
d7477111d2
aco: split vectorized bcsel and bitwise logic VGPR definitions
...
This has a slightly negative effect on parallel-rdp, but positively affects FSR4.
Totals from 14 (0.02% of 79839) affected shaders: (Navi48)
Instrs: 63543 -> 63646 (+0.16%); split: -0.01%, +0.17%
CodeSize: 352888 -> 353608 (+0.20%); split: -0.02%, +0.23%
Latency: 1822354 -> 1825036 (+0.15%)
InvThroughput: 364683 -> 365738 (+0.29%); split: -0.04%, +0.32%
Copies: 9299 -> 9363 (+0.69%); split: -0.11%, +0.80%
PreVGPRs: 1381 -> 1394 (+0.94%)
VALU: 34511 -> 34575 (+0.19%); split: -0.03%, +0.21%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35784 >
2025-07-09 14:10:36 +00:00
Daniel Schürmann
764ee3a834
radv: don't lower subdword phis to scalar
...
Totals from 193 (0.24% of 79839) affected shaders: (Navi48)
MaxWaves: 6004 -> 6024 (+0.33%)
Instrs: 169276 -> 166784 (-1.47%); split: -3.01%, +1.53%
CodeSize: 940608 -> 915768 (-2.64%); split: -4.29%, +1.64%
VGPRs: 8012 -> 7716 (-3.69%); split: -3.99%, +0.30%
SpillVGPRs: 185 -> 0 (-inf%)
Scratch: 13568 -> 0 (-inf%)
Latency: 2159787 -> 2147084 (-0.59%); split: -2.86%, +2.28%
InvThroughput: 664022 -> 395859 (-40.38%); split: -42.59%, +2.21%
VClause: 2998 -> 2880 (-3.94%); split: -4.27%, +0.33%
SClause: 3117 -> 3120 (+0.10%)
Copies: 21290 -> 16278 (-23.54%); split: -24.74%, +1.20%
Branches: 4757 -> 4760 (+0.06%); split: -0.34%, +0.40%
PreSGPRs: 7369 -> 7378 (+0.12%); split: -0.11%, +0.23%
PreVGPRs: 4257 -> 3859 (-9.35%); split: -9.94%, +0.59%
VALU: 83173 -> 79804 (-4.05%); split: -5.68%, +1.63%
SALU: 36672 -> 37318 (+1.76%); split: -0.02%, +1.78%
VMEM: 4012 -> 3762 (-6.23%); split: -6.83%, +0.60%
SMEM: 4300 -> 4303 (+0.07%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35784 >
2025-07-09 14:10:36 +00:00
Daniel Schürmann
fc2fcac04e
aco: allow vectorized nir_op_mov
...
nir_lower_phis_to_scalar() can create these with the next commit.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35784 >
2025-07-09 14:10:36 +00:00
Daniel Schürmann
3f35b1329e
aco: allow subdword vector-definitions on some VALU instructions
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35784 >
2025-07-09 14:10:36 +00:00
Daniel Schürmann
025306a95d
aco/isel: refactor emission of bitwise logical operations
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35784 >
2025-07-09 14:10:36 +00:00
Georg Lehmann
82af226690
aco: remove unused swap_srcs from emit_vop3p_instruction
...
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35825 >
2025-07-09 07:23:09 +00:00
Georg Lehmann
96793fb0c1
aco/isel: implement 16bit vec2 shifts
...
The source bit size mismatch is a bit annoying, but it's still worth it to
vectorize these.
Foz-DB Navi48:
Totals from 85 (0.11% of 80251) affected shaders:
Instrs: 119073 -> 118827 (-0.21%); split: -0.21%, +0.00%
CodeSize: 669604 -> 667552 (-0.31%); split: -0.31%, +0.00%
VGPRs: 4796 -> 4736 (-1.25%)
Latency: 1907685 -> 1901983 (-0.30%); split: -0.32%, +0.02%
InvThroughput: 642603 -> 640680 (-0.30%); split: -0.33%, +0.03%
VClause: 2088 -> 2091 (+0.14%)
Copies: 18300 -> 18394 (+0.51%); split: -0.01%, +0.52%
Branches: 3452 -> 3440 (-0.35%)
VALU: 63378 -> 63144 (-0.37%); split: -0.37%, +0.00%
SALU: 23065 -> 23076 (+0.05%); split: -0.00%, +0.05%
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35825 >
2025-07-09 07:23:08 +00:00
Daniel Schürmann
2c51a8870d
nir: add nir_vectorize_cb callback parameter to nir_lower_phis_to_scalar()
...
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Similar to nir_lower_alu_width(), the callback can return the
desired number of components for a phi, or 0 for no lowering.
The previous behavior of nir_lower_phis_to_scalar() with lower_all=true
can be elicited via nir_lower_all_phis_to_scalar() while the previous
behavior with lower_all=false now corresponds to nir_lower_phis_to_scalar()
with NULL callback.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35783 >
2025-07-08 15:33:59 +00:00