Georg Lehmann
eb4df58a3d
aco/isel: refactor shared vgpr usage
...
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36390 >
2025-07-29 06:33:20 +00:00
Georg Lehmann
8a2aca8d6f
aco/select_alu: avoid vector get_alu_src for instructions with scalar operands
...
Foz-DB Navi21:
Totals from 1 (0.00% of 80237) affected shaders:
Instrs: 22 -> 21 (-4.55%)
CodeSize: 112 -> 108 (-3.57%)
Latency: 392 -> 386 (-1.53%)
InvThroughput: 25 -> 24 (-4.00%)
Copies: 4 -> 3 (-25.00%)
PreVGPRs: 8 -> 4 (-50.00%)
VALU: 10 -> 9 (-10.00%)
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35728 >
2025-07-29 06:07:15 +00:00
Georg Lehmann
ad9c340d86
aco: insert VALU s_delay_alu for WMMA
...
This should avoid some SIMD stalls.
I think this special case was added to try to handle this case:
First Instruction: WMMA
Second Instruction: WMMA instruction with same VGPR of previous WMMA instruction’s Matrix D as Matrix C
Stall if the first and second instruction are not the same type of WMMA or use ABS/NEG on SRC2 of the second instruction
If I read it correctly, we shouldn't need a delay if the type is the same and no
modifier is used. That's kind of complex to handle, so leave it for now.
Not inserting any delays likely hurts more than this.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36328 >
2025-07-29 05:48:29 +00:00
Georg Lehmann
413d0d2ec8
aco/statistics: update GFX12 WMMA cost
...
Based on marketing numbers, but they seem to match RGP.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36328 >
2025-07-29 05:48:29 +00:00
Georg Lehmann
8f61c85880
aco/statistics: add latency to WMMA
...
Assume the normal VALU latency of 4 cycles.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36328 >
2025-07-29 05:48:29 +00:00
Georg Lehmann
004f8aa2f4
aco: optimize get_alu_src with constant source and size > 1
...
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Emulated FSR4, Navi31:
Totals from 14 (100.00% of 14) affected shaders:
MaxWaves: 130 -> 131 (+0.77%)
Instrs: 67887 -> 67470 (-0.61%); split: -0.70%, +0.09%
CodeSize: 464428 -> 461668 (-0.59%); split: -0.67%, +0.07%
VGPRs: 2544 -> 2520 (-0.94%)
SpillVGPRs: 92 -> 89 (-3.26%)
Latency: 256823 -> 257574 (+0.29%); split: -0.37%, +0.66%
InvThroughput: 253895 -> 252929 (-0.38%); split: -0.40%, +0.02%
VClause: 997 -> 984 (-1.30%); split: -2.11%, +0.80%
Copies: 4501 -> 3788 (-15.84%); split: -17.35%, +1.51%
PreSGPRs: 504 -> 519 (+2.98%)
PreVGPRs: 2460 -> 2448 (-0.49%)
VALU: 57202 -> 56726 (-0.83%); split: -0.88%, +0.05%
SALU: 1231 -> 1384 (+12.43%)
VMEM: 3807 -> 3801 (-0.16%)
VOPD: 2693 -> 2303 (-14.48%); split: +1.19%, -15.67%
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36090 >
2025-07-25 11:33:00 +00:00
Alyssa Rosenzweig
8a1a410389
treewide: use SWAP macro
...
Via Coccinelle patch + manual clean up:
@@
identifier temporary, a, b;
type T;
@@
-T temporary = a;
-a = b;
-b = temporary;
+SWAP(a, b);
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36297 >
2025-07-23 19:49:47 +00:00
Georg Lehmann
c80daf934c
aco: supported 64bit or vectorized bitfield_select
...
No Foz-DB changes.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36141 >
2025-07-21 20:42:32 +00:00
Georg Lehmann
14b36fb790
aco/isel: don't create literal operands for SALU bitfield_select
...
Let the optimizer handle this.
No Foz-DB changes.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36141 >
2025-07-21 20:42:32 +00:00
Rhys Perry
256a7cc4f0
aco/isel: optimize uniform vote
...
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
fossil-db (navi21):
Totals from 21 (0.03% of 79825) affected shaders:
Instrs: 44939 -> 44913 (-0.06%)
CodeSize: 236612 -> 236504 (-0.05%)
Latency: 509496 -> 509349 (-0.03%)
Copies: 3624 -> 3620 (-0.11%)
SALU: 5458 -> 5432 (-0.48%)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36177 >
2025-07-21 14:19:58 +00:00
Rhys Perry
0bceb07c03
aco: optimize uniform s_not
...
fossil-db (navi21):
Totals from 1442 (1.81% of 79825) affected shaders:
Instrs: 2224425 -> 2220624 (-0.17%); split: -0.18%, +0.01%
CodeSize: 11778260 -> 11763264 (-0.13%); split: -0.14%, +0.01%
Latency: 13396254 -> 13392346 (-0.03%); split: -0.03%, +0.00%
InvThroughput: 3145007 -> 3144982 (-0.00%); split: -0.00%, +0.00%
SClause: 53037 -> 53035 (-0.00%); split: -0.01%, +0.01%
Copies: 185852 -> 184777 (-0.58%); split: -0.71%, +0.13%
Branches: 60799 -> 60805 (+0.01%)
PreSGPRs: 62940 -> 62954 (+0.02%); split: -0.01%, +0.03%
SALU: 298564 -> 294761 (-1.27%); split: -1.34%, +0.06%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36177 >
2025-07-21 14:19:58 +00:00
Rhys Perry
85b31c9c4d
aco/opt: add some comments
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36177 >
2025-07-21 14:19:58 +00:00
Rhys Perry
2fff1db5c8
aco: don't both flip s_cselect and label uniform_bool
...
Otherwise, the uniform_bool could point to a temporary which might be
DCE'd, since it's not used by the s_cselect.
fossil-db (navi21):
Totals from 1 (0.00% of 79825) affected shaders:
Instrs: 1267 -> 1269 (+0.16%)
Latency: 91071 -> 91103 (+0.04%)
SALU: 283 -> 285 (+0.71%)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36177 >
2025-07-21 14:19:58 +00:00
Rhys Perry
2239a5e9ae
aco: stop labeling first def of and(uniform_bool/uniform_bitwise, exec)
...
The optimizer shouldn't consider a lanemask to be a uniform boolean unless
it's either 0 or -1. Optimizations involving s_not/s_xor might not work
properly otherwise.
No fossil-db changes (navi21).
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36177 >
2025-07-21 14:19:58 +00:00
Rhys Perry
d81add3dda
aco: optimize s_and(s_cselect, exec)
...
fossil-db (navi21):
Totals from 62 (0.08% of 79825) affected shaders:
Instrs: 178887 -> 178745 (-0.08%)
CodeSize: 942980 -> 942328 (-0.07%)
Latency: 1274513 -> 1273653 (-0.07%)
InvThroughput: 213862 -> 213774 (-0.04%); split: -0.04%, +0.00%
SALU: 26446 -> 26301 (-0.55%)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36005 >
2025-07-21 08:27:01 +00:00
Rhys Perry
b2e5fc9451
aco/lower_phis: add bld_before_logical_end helper
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36005 >
2025-07-21 08:27:01 +00:00
Georg Lehmann
d672737372
nir,aco: add byte_perm_amd
...
Acked-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36115 >
2025-07-16 11:46:52 +00:00
Natalie Vock
ac96594b86
aco/isel: Use vector-aligned operands for ds_stack_push8_pop1_rtn_b32
...
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35269 >
2025-07-15 21:34:40 +00:00
Natalie Vock
b2a95d2133
aco/ra: Add affinities for DS vector-aligned operands
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35269 >
2025-07-15 21:34:40 +00:00
Natalie Vock
df5495b934
aco/assembler: Support vector-aligned operands on DS instructions
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35269 >
2025-07-15 21:34:40 +00:00
Natalie Vock
ea66a8d1c5
aco,nir: Add support for GFX12 ds_bvh_stack_push8_pop1_rtn_b32 instruction
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35269 >
2025-07-15 21:34:40 +00:00
Natalie Vock
9707b30965
nir,aco: Add ds_bvh_stack_rtn
...
This is a ds instruction that also overwrites its first input, so
introduce a new ds format with two outputs.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35269 >
2025-07-15 21:34:39 +00:00
Natalie Vock
c515f1fd58
aco: Use vector-aligned operands for image_bvh8_intersect_ray
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35269 >
2025-07-15 21:34:38 +00:00
Natalie Vock
c279dd6e61
aco: Support vector-aligned ops fixed to defs
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35269 >
2025-07-15 21:34:38 +00:00
Natalie Vock
f17fe05e32
aco/isel: Improve vector splits for image_bvh8_intersect_ray
...
Using split_vector to split everything into scalars allows copy-prop to
eliminate the final p_create_vector. Considerably reduces copies and
register thrashing.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35269 >
2025-07-15 21:34:38 +00:00
Marek Olšák
d12bc87dda
aco: implement upcasting 16-bit types for 32-bit color buffers in PS epilog
...
This was missed when implementing the change for LLVM.
Fixes: fbbf029529 - radeonsi: enable 16-bit mediump IO for PS outputs only, and VS->PS with env var
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36112 >
2025-07-15 18:28:30 +00:00
Daniel Schürmann
47ef60cbf1
aco/ra: always use bytes for register stride requirements
...
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
instead of a mixture between dwords and bytes.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36053 >
2025-07-14 08:45:29 +00:00
Marek Olšák
5ded4f3c7d
aco: remove unused aco_symbol_lds_ngg_gs_out_vertex_base
...
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35529 >
2025-07-12 10:28:21 +00:00
Georg Lehmann
92d433c54a
aco: vectorize conversions from 8bit to 16bit
...
Massively helps emulated fp8 performance.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35854 >
2025-07-12 08:39:15 +00:00
Georg Lehmann
7fece5592c
aco: vectorize 16bit extracts
...
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35854 >
2025-07-12 08:39:14 +00:00
Rhys Perry
3b9a1ce4ca
aco: remove RegClass::as_subdword
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35465 >
2025-07-11 12:15:09 +00:00
Rhys Perry
9c55b0ca20
aco: use MUBUF for global access with SGPR address on GFX7/8
...
This should be better than using FLAT, which only supports a VGPR address.
fossil-db (polaris10):
Totals from 159 (0.26% of 62070) affected shaders:
MaxWaves: 789 -> 803 (+1.77%)
Instrs: 234284 -> 230557 (-1.59%); split: -1.71%, +0.12%
CodeSize: 1212324 -> 1186716 (-2.11%); split: -2.23%, +0.11%
SGPRs: 10504 -> 10712 (+1.98%)
VGPRs: 10556 -> 10236 (-3.03%); split: -3.37%, +0.34%
SpillSGPRs: 579 -> 577 (-0.35%)
Latency: 3903056 -> 3875625 (-0.70%); split: -0.87%, +0.16%
InvThroughput: 3139443 -> 3114426 (-0.80%); split: -0.86%, +0.07%
VClause: 4205 -> 4433 (+5.42%); split: -0.43%, +5.85%
SClause: 4461 -> 4445 (-0.36%); split: -0.43%, +0.07%
Copies: 30889 -> 31507 (+2.00%); split: -0.29%, +2.29%
PreSGPRs: 7370 -> 7609 (+3.24%)
PreVGPRs: 8339 -> 8193 (-1.75%)
VALU: 175025 -> 170232 (-2.74%); split: -2.77%, +0.03%
SALU: 27269 -> 28532 (+4.63%); split: -0.01%, +4.64%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35465 >
2025-07-11 12:15:08 +00:00
Rhys Perry
0094e6c32a
aco: optimize lds-only or vmem-only flat access
...
fossil-db (polaris10):
Totals from 138 (0.22% of 62070) affected shaders:
Instrs: 233452 -> 234436 (+0.42%)
CodeSize: 1209392 -> 1213220 (+0.32%)
Latency: 3934496 -> 3928089 (-0.16%); split: -0.17%, +0.00%
InvThroughput: 3040782 -> 3038562 (-0.07%); split: -0.07%, +0.00%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35465 >
2025-07-11 12:15:08 +00:00
Rhys Perry
d705b6198c
aco: simplify waitcnt insertion for flat access
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35465 >
2025-07-11 12:15:08 +00:00
Rhys Perry
6396a82695
aco: return a format in lower_global_address
...
No fossil-db changes (navi10, pitcairn).
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35465 >
2025-07-11 12:15:07 +00:00
Rhys Perry
89c2c94147
aco: increase global constant offset limit slightly
...
Before, this wasn't actually the maximum value plus one.
fossil-db (navi10):
Totals from 4 (0.01% of 63207) affected shaders:
(no stat changes)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35465 >
2025-07-11 12:15:07 +00:00
Rhys Perry
d7dcd81c77
aco/gfx6: allow both constant and gpr offset for global with sgpr address
...
fossil-db (pitcairn):
Totals from 81 (0.13% of 62069) affected shaders:
MaxWaves: 332 -> 335 (+0.90%)
Instrs: 150087 -> 149737 (-0.23%); split: -0.30%, +0.06%
CodeSize: 754636 -> 752712 (-0.25%); split: -0.31%, +0.06%
SGPRs: 6128 -> 6184 (+0.91%)
VGPRs: 7220 -> 7208 (-0.17%); split: -0.28%, +0.11%
SpillSGPRs: 288 -> 287 (-0.35%)
Latency: 2199197 -> 2198338 (-0.04%); split: -0.20%, +0.17%
InvThroughput: 1613474 -> 1614303 (+0.05%); split: -0.07%, +0.12%
VClause: 2905 -> 2862 (-1.48%); split: -2.34%, +0.86%
SClause: 2366 -> 2378 (+0.51%); split: -0.17%, +0.68%
Copies: 17312 -> 17264 (-0.28%); split: -1.03%, +0.76%
PreSGPRs: 5080 -> 5004 (-1.50%)
PreVGPRs: 5656 -> 5640 (-0.28%)
VALU: 114097 -> 113831 (-0.23%); split: -0.31%, +0.07%
SALU: 16004 -> 15944 (-0.37%); split: -0.41%, +0.04%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35465 >
2025-07-11 12:15:06 +00:00
Rhys Perry
684943bd1f
aco/gfx6: allow vgpr offset for global access with sgpr address
...
No reason why we can't use offen like normal buffer loads.
fossil-db (pitcairn):
Totals from 122 (0.20% of 62069) affected shaders:
MaxWaves: 521 -> 525 (+0.77%)
Instrs: 238341 -> 237228 (-0.47%); split: -0.57%, +0.10%
CodeSize: 1196260 -> 1188076 (-0.68%); split: -0.78%, +0.09%
SGPRs: 8752 -> 8760 (+0.09%); split: -0.64%, +0.73%
VGPRs: 10456 -> 10440 (-0.15%); split: -0.88%, +0.73%
Latency: 3958385 -> 3946186 (-0.31%); split: -0.38%, +0.07%
InvThroughput: 3097193 -> 3084417 (-0.41%); split: -0.42%, +0.01%
VClause: 4058 -> 4500 (+10.89%); split: -0.02%, +10.92%
SClause: 4511 -> 4500 (-0.24%); split: -0.42%, +0.18%
Copies: 31228 -> 31718 (+1.57%); split: -0.38%, +1.95%
PreSGPRs: 7211 -> 7461 (+3.47%)
PreVGPRs: 8174 -> 8147 (-0.33%); split: -0.34%, +0.01%
VALU: 174779 -> 173294 (-0.85%); split: -0.87%, +0.02%
SALU: 29138 -> 29641 (+1.73%); split: -0.09%, +1.82%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35465 >
2025-07-11 12:15:05 +00:00
Rhys Perry
09a5af121f
aco: simplify the load callback
...
We can put these parameters in the LoadEmitInfo instead.
No fossil-db changes (navi10, pitcairn).
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35465 >
2025-07-11 12:15:04 +00:00
Rhys Perry
101d0b791f
aco: add too-large constant offset to the address instead of the offset
...
In case the addition with the offset overflows.
No fossil-db changes (navi10, pitcairn).
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35465 >
2025-07-11 12:15:03 +00:00
Rhys Perry
bd9a9a77fe
aco: use addition helper in emit_load
...
No fossil-db changes (navi10, pitcairn).
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35465 >
2025-07-11 12:15:03 +00:00
Rhys Perry
8defd1bc16
aco/gfx6: disallow global access with sgpr address and two offsets
...
No fossil-db changes (navi10, pitcairn).
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35465 >
2025-07-11 12:15:03 +00:00
Rhys Perry
1c0d065dae
aco: count flat as vmem in statistics
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35465 >
2025-07-11 12:15:03 +00:00
Georg Lehmann
d45f375a9d
aco: only insert fp mode when needed
...
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35746 >
2025-07-10 13:48:50 +00:00
Georg Lehmann
46c1bd1147
aco: add a dedicated pass for better float MODE insertion
...
Foz-DB Navi48:
Totals from 14 (0.02% of 80251) affected shaders:
Instrs: 13998 -> 11684 (-16.53%)
CodeSize: 104464 -> 86260 (-17.43%)
Latency: 108722 -> 106667 (-1.89%)
InvThroughput: 100332 -> 100324 (-0.01%)
VClause: 621 -> 595 (-4.19%); split: -4.99%, +0.81%
VALU: 6875 -> 6871 (-0.06%)
SALU: 3256 -> 1015 (-68.83%)
VOPD: 1328 -> 1332 (+0.30%)
Removes the s_setreg spam in FSR4.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35746 >
2025-07-10 13:48:50 +00:00
Daniel Schürmann
610a19cf31
aco/isel: allow to select SGPR defs for vectorized bcsel and logical operations
...
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
No fossil changes.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35784 >
2025-07-09 14:10:37 +00:00
Daniel Schürmann
d7477111d2
aco: split vectorized bcsel and bitwise logic VGPR definitions
...
This has a slightly negative effect on parallel-rdp, but positively affects FSR4.
Totals from 14 (0.02% of 79839) affected shaders: (Navi48)
Instrs: 63543 -> 63646 (+0.16%); split: -0.01%, +0.17%
CodeSize: 352888 -> 353608 (+0.20%); split: -0.02%, +0.23%
Latency: 1822354 -> 1825036 (+0.15%)
InvThroughput: 364683 -> 365738 (+0.29%); split: -0.04%, +0.32%
Copies: 9299 -> 9363 (+0.69%); split: -0.11%, +0.80%
PreVGPRs: 1381 -> 1394 (+0.94%)
VALU: 34511 -> 34575 (+0.19%); split: -0.03%, +0.21%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35784 >
2025-07-09 14:10:36 +00:00
Daniel Schürmann
764ee3a834
radv: don't lower subdword phis to scalar
...
Totals from 193 (0.24% of 79839) affected shaders: (Navi48)
MaxWaves: 6004 -> 6024 (+0.33%)
Instrs: 169276 -> 166784 (-1.47%); split: -3.01%, +1.53%
CodeSize: 940608 -> 915768 (-2.64%); split: -4.29%, +1.64%
VGPRs: 8012 -> 7716 (-3.69%); split: -3.99%, +0.30%
SpillVGPRs: 185 -> 0 (-inf%)
Scratch: 13568 -> 0 (-inf%)
Latency: 2159787 -> 2147084 (-0.59%); split: -2.86%, +2.28%
InvThroughput: 664022 -> 395859 (-40.38%); split: -42.59%, +2.21%
VClause: 2998 -> 2880 (-3.94%); split: -4.27%, +0.33%
SClause: 3117 -> 3120 (+0.10%)
Copies: 21290 -> 16278 (-23.54%); split: -24.74%, +1.20%
Branches: 4757 -> 4760 (+0.06%); split: -0.34%, +0.40%
PreSGPRs: 7369 -> 7378 (+0.12%); split: -0.11%, +0.23%
PreVGPRs: 4257 -> 3859 (-9.35%); split: -9.94%, +0.59%
VALU: 83173 -> 79804 (-4.05%); split: -5.68%, +1.63%
SALU: 36672 -> 37318 (+1.76%); split: -0.02%, +1.78%
VMEM: 4012 -> 3762 (-6.23%); split: -6.83%, +0.60%
SMEM: 4300 -> 4303 (+0.07%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35784 >
2025-07-09 14:10:36 +00:00
Daniel Schürmann
fc2fcac04e
aco: allow vectorized nir_op_mov
...
nir_lower_phis_to_scalar() can create these with the next commit.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35784 >
2025-07-09 14:10:36 +00:00
Daniel Schürmann
3f35b1329e
aco: allow subdword vector-definitions on some VALU instructions
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35784 >
2025-07-09 14:10:36 +00:00