fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-16 20:38:06 +02:00

Author	SHA1	Message	Date
Daniel Schürmann	d3743dd7ba	aco/scheduler: improve scheduling heuristic The heuristic we are currently using still stems from the GCN era, with the only adjustment made for RDNA being to double (or triple) the wave count. This rewrite aims to detangle some concepts and provide more consistent results. - wave_factor: The purpose of this value is to reflect that RDNA SIMDs can accommodate twice as many waves as GCN SIMDs. - reg_file_multiple: This value accounts for the larger register file of wave32 and some RDNA3 families. - wave_minimum: Below this value, we don't sacrifice any waves. It corresponds to a register demand of 64 VGPRs in wave64. - occupancy_factor: Depending on target_waves and wave_factor, this controls the scheduling window sizes and number of moves. The main differences from the previous heuristic are a lower wave minimum and a slightly less aggressive reduction of waves. It also increases SMEM_MAX_MOVES in order to mitigate some of the changes from targeting fewer waves. 
Totals from 62777 (78.63% of 79839) affected shaders: (Navi48) MaxWaves: 1880983 -> 1848028 (-1.75%); split: +0.01%, -1.76% Instrs: 40904711 -> 40800797 (-0.25%); split: -0.39%, +0.14% CodeSize: 217132208 -> 216748832 (-0.18%); split: -0.29%, +0.12% VGPRs: 3019304 -> 3099596 (+2.66%); split: -0.11%, +2.77% Latency: 268857129 -> 265951122 (-1.08%); split: -1.33%, +0.25% InvThroughput: 40960938 -> 41044533 (+0.20%); split: -0.18%, +0.39% VClause: 794000 -> 782913 (-1.40%); split: -2.24%, +0.84% SClause: 1192476 -> 1150831 (-3.49%); split: -3.94%, +0.45% Copies: 2720470 -> 2700148 (-0.75%); split: -1.84%, +1.09% Branches: 785926 -> 785951 (+0.00%); split: -0.01%, +0.01% VALU: 22918411 -> 22890189 (-0.12%); split: -0.19%, +0.06% SALU: 5281201 -> 5289486 (+0.16%); split: -0.21%, +0.36% VOPD: 8790 -> 8685 (-1.19%); split: +1.08%, -2.28% Totals from 62081 (77.77% of 79825) affected shaders: (Navi31) MaxWaves: 1848555 -> 1812347 (-1.96%); split: +0.01%, -1.97% Instrs: 39794460 -> 39704180 (-0.23%); split: -0.39%, +0.16% CodeSize: 208987052 -> 208621524 (-0.17%); split: -0.31%, +0.13% VGPRs: 3046284 -> 3135156 (+2.92%); split: -0.11%, +3.03% Latency: 268863465 -> 265218186 (-1.36%); split: -1.59%, +0.23% InvThroughput: 41101515 -> 41167075 (+0.16%); split: -0.22%, +0.38% VClause: 795316 -> 774899 (-2.57%); split: -3.17%, +0.61% SClause: 1177294 -> 1135451 (-3.55%); split: -4.06%, +0.51% Copies: 2743254 -> 2725127 (-0.66%); split: -1.90%, +1.24% Branches: 801395 -> 801428 (+0.00%); split: -0.01%, +0.02% VALU: 23898938 -> 23871294 (-0.12%); split: -0.20%, +0.08% SALU: 3908807 -> 3919130 (+0.26%); split: -0.23%, +0.50% VOPD: 8529 -> 8500 (-0.34%); split: +1.29%, -1.63% Totals from 44996 (71.01% of 63370) affected shaders: (Vega10) MaxWaves: 307074 -> 304808 (-0.74%); split: +0.63%, -1.37% Instrs: 22743534 -> 22716240 (-0.12%); split: -0.22%, +0.10% CodeSize: 117284856 -> 117173212 (-0.10%); split: -0.19%, +0.09% SGPRs: 3249008 -> 3330480 (+2.51%); split: -0.36%, +2.87% VGPRs: 
1901400 -> 1943880 (+2.23%); split: -0.60%, +2.83% Latency: 224839126 -> 222878477 (-0.87%); split: -1.19%, +0.31% InvThroughput: 114389570 -> 114316559 (-0.06%); split: -0.17%, +0.11% VClause: 482012 -> 473304 (-1.81%); split: -2.86%, +1.05% SClause: 757799 -> 717092 (-5.37%); split: -5.64%, +0.27% Copies: 2182735 -> 2183598 (+0.04%); split: -1.17%, +1.21% Branches: 396026 -> 395996 (-0.01%); split: -0.03%, +0.02% VALU: 16740283 -> 16728098 (-0.07%); split: -0.14%, +0.07% SALU: 2133575 -> 2145863 (+0.58%); split: -0.29%, +0.86% Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30720>	2025-08-06 09:16:33 +00:00
Qiang Yu	196569b1a4	all: rename gl_shader_stage to mesa_shader_stage It's not only for GL, change to a generic name. Use command: find . -type f -not -path '/.git/' -exec sed -i 's/\bgl_shader_stage\b/mesa_shader_stage/g' {} + Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com> Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Acked-by: Yonggang Luo <luoyonggang@gmail.com> Acked-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36569>	2025-08-06 10:28:40 +08:00
Rhys Perry	76c96bf558	aco: fix possible scratch offset overflow We split vector load/store, so consider that we might add to the constant offset. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36406>	2025-08-04 15:06:44 +00:00
Rhys Perry	44ab4ad732	aco: align scratch size after isel Make it safe for VGPR spilling if it's not a multiple of 4. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36406>	2025-08-04 15:06:43 +00:00
Rhys Perry	ab10604924	aco/gfx12: fix printing of temporal hints Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36406>	2025-08-04 15:06:41 +00:00
Rhys Perry	cec845079e	ac/nir/lower_ps: remove barrier for end_invocation_interlock SPIR-V->NIR now inserts this barrier itself. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Marek Olšák <maraeo@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36513>	2025-08-04 09:30:06 +00:00
Daniel Schürmann	4ca3cc5a1a	aco/ra: propagate precolor affinities through parallelcopies and tied definitions Totals from 214 (0.27% of 79839) affected shaders: (Navi48) Instrs: 65339 -> 65311 (-0.04%); split: -0.05%, +0.00% CodeSize: 352616 -> 350952 (-0.47%); split: -0.55%, +0.07% VGPRs: 9984 -> 9960 (-0.24%) Latency: 207556 -> 207508 (-0.02%); split: -0.03%, +0.01% InvThroughput: 40422 -> 40397 (-0.06%) Copies: 3180 -> 3155 (-0.79%) VALU: 38347 -> 38322 (-0.07%) Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36345>	2025-08-01 17:15:54 +00:00
Daniel Schürmann	a667d9a68d	aco/ra: propagate precolor affinities through phis Totals from 917 (1.15% of 79839) affected shaders: (Navi48) Instrs: 3217861 -> 3216947 (-0.03%); split: -0.04%, +0.01% CodeSize: 17427204 -> 17432264 (+0.03%); split: -0.06%, +0.09% VGPRs: 65328 -> 65316 (-0.02%) Latency: 35336268 -> 35335528 (-0.00%); split: -0.01%, +0.01% InvThroughput: 7305032 -> 7302187 (-0.04%); split: -0.04%, +0.00% SClause: 120537 -> 120553 (+0.01%); split: -0.01%, +0.02% Copies: 307257 -> 306852 (-0.13%); split: -0.21%, +0.08% Branches: 115744 -> 115743 (-0.00%) VALU: 1572522 -> 1572183 (-0.02%); split: -0.02%, +0.00% SALU: 574229 -> 574155 (-0.01%); split: -0.05%, +0.04% Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36345>	2025-08-01 17:15:54 +00:00
Daniel Schürmann	2ddd8ef0a3	aco/ra: don't optimize encodings on precolor affinity mismatch Totals from 238 (0.30% of 79839) affected shaders: (Navi48) Instrs: 137836 -> 137176 (-0.48%); split: -0.50%, +0.02% CodeSize: 728616 -> 728668 (+0.01%); split: -0.06%, +0.07% Latency: 1503248 -> 1500202 (-0.20%); split: -0.56%, +0.36% InvThroughput: 297725 -> 296715 (-0.34%); split: -0.70%, +0.36% Copies: 9390 -> 8825 (-6.02%); split: -6.33%, +0.31% VALU: 89861 -> 89296 (-0.63%); split: -0.66%, +0.03% SALU: 13166 -> 13167 (+0.01%); split: -0.05%, +0.06% Suggested-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36345>	2025-08-01 17:15:54 +00:00
Daniel Schürmann	93606a19c6	aco/ra: collect register affinities for all precolored operands. Totals from 1280 (1.60% of 79839) affected shaders: (Navi48) Instrs: 817363 -> 812639 (-0.58%); split: -0.58%, +0.00% CodeSize: 4262644 -> 4243540 (-0.45%); split: -0.45%, +0.00% VGPRs: 61692 -> 61668 (-0.04%) Latency: 4354318 -> 4347818 (-0.15%); split: -0.15%, +0.00% InvThroughput: 711914 -> 707698 (-0.59%); split: -0.59%, +0.00% VClause: 14685 -> 14677 (-0.05%); split: -0.09%, +0.03% SClause: 25623 -> 25621 (-0.01%) Copies: 50663 -> 46242 (-8.73%); split: -8.73%, +0.00% VALU: 427744 -> 423323 (-1.03%); split: -1.03%, +0.00% Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36345>	2025-08-01 17:15:54 +00:00
Daniel Schürmann	e32eec52f0	aco/ra: generalize register affinities Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36345>	2025-08-01 17:15:54 +00:00
Daniel Schürmann	caa2c22d8b	aco/tests: Fix p_startpgm definitions to registers Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36345>	2025-08-01 17:15:54 +00:00
Alyssa Rosenzweig	cc6e3b84cb	treewide: use nir_def_as_* Via Coccinelle patch: @@ expression definition; @@ -nir_instr_as_alu(definition->parent_instr) +nir_def_as_alu(definition) @@ expression definition; @@ -nir_instr_as_intrinsic(definition->parent_instr) +nir_def_as_intrinsic(definition) @@ expression definition; @@ -nir_instr_as_phi(definition->parent_instr) +nir_def_as_phi(definition) @@ expression definition; @@ -nir_instr_as_load_const(definition->parent_instr) +nir_def_as_load_const(definition) @@ expression definition; @@ -nir_instr_as_deref(definition->parent_instr) +nir_def_as_deref(definition) @@ expression definition; @@ -nir_instr_as_tex(definition->parent_instr) +nir_def_as_tex(definition) Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Emma Anholt <emma@anholt.net> Reviewed-by: Marek Olšák <maraeo@gmail.com> Acked-by: Karol Herbst <kherbst@redhat.com> Acked-by: Konstantin Seurer <konstantin.seurer@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36489>	2025-08-01 15:34:24 +00:00
Antonio Ospite	ddf2aa3a4d	build: avoid redefining unreachable() which is standard in C23 In the C23 standard unreachable() is now a predefined function-like macro in <stddef.h> See https://android.googlesource.com/platform/bionic/+/HEAD/docs/c23.md#is-now-a-predefined-function_like-macro-in And this causes build errors when building for C23: ----------------------------------------------------------------------- In file included from ../src/util/log.h:30, from ../src/util/log.c:30: ../src/util/macros.h:123:9: warning: "unreachable" redefined 123 | #define unreachable(str) \ | ^~~~~~~~~~~ In file included from ../src/util/macros.h:31: /usr/lib/gcc/x86_64-linux-gnu/14/include/stddef.h:456:9: note: this is the location of the previous definition 456 | #define unreachable() (__builtin_unreachable ()) | ^~~~~~~~~~~ ----------------------------------------------------------------------- So don't redefine it with the same name, but use the name UNREACHABLE() to also signify it's a macro. Using a different name also makes sense because the behavior of the macro was extending the one of __builtin_unreachable() anyway, and it also had a different signature, accepting one argument, compared to the standard unreachable() with no arguments. This change improves the chances of building mesa with the C23 standard, which for instance is the default in recent AOSP versions. All the instances of the macro, including the definition, were updated with the following command line: git grep -l '[^_]unreachable(' -- "src/**" | sort | uniq | \ while read file; \ do \ sed -e 's/\([^_]\)unreachable(/\1UNREACHABLE(/g' -i "$file"; \ done && \ sed -e 's/#undef unreachable/#undef UNREACHABLE/g' -i src/intel/isl/isl_aux_info.c Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36437>	2025-07-31 17:49:42 +00:00
Georg Lehmann	a6a6c2f691	aco/ra: convert bitwise instruction to gfx11+ 16bit on demand The 32bit versions are smaller, allow more optimizations and VOPD, so only use the 16bit opcodes if necessary. Foz-DB Navi31: Totals from 84 (0.10% of 80237) affected shaders: Instrs: 176673 -> 176347 (-0.18%); split: -0.20%, +0.01% CodeSize: 970148 -> 969716 (-0.04%); split: -0.08%, +0.03% VGPRs: 5876 -> 5864 (-0.20%) Latency: 2805974 -> 2805674 (-0.01%); split: -0.02%, +0.01% InvThroughput: 769007 -> 768738 (-0.03%); split: -0.04%, +0.01% VClause: 2593 -> 2597 (+0.15%) Copies: 23749 -> 23487 (-1.10%); split: -1.11%, +0.00% VALU: 107124 -> 106862 (-0.24%); split: -0.25%, +0.00% Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35919>	2025-07-31 12:07:07 +00:00
Georg Lehmann	404e1f13e8	aco/print_asm: use real true16 instr on gfx11+ Fake16 doesn't print opsel on v_cndmask_b16, so it looks really broken. Restrict to LLVM20+ because older versions have incomplete true16 support. Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35919>	2025-07-31 12:07:07 +00:00
Georg Lehmann	b12db991eb	aco/gfx10: optimize subgroupRotate(x, 32) and subgroupShuffleXor(x, 32) We don't have v_permlane64_b32 yet, but we can still optimize it using shared vgprs. Using the DPP16 row mask, we can even avoid writing exec. With v0 input/output and v24/v25 as shared vgprs, this results in: v_mov_b32_dpp v24, v0 quad_perm:[0,1,2,3] row_mask:0x3 bank_mask:0xf v_mov_b32_dpp v25, v0 quad_perm:[0,1,2,3] row_mask:0xc bank_mask:0xf v_mov_b32_dpp v0, v24 quad_perm:[0,1,2,3] row_mask:0xc bank_mask:0xf v_mov_b32_dpp v0, v25 quad_perm:[0,1,2,3] row_mask:0x3 bank_mask:0xf Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36390>	2025-07-29 06:33:20 +00:00
Georg Lehmann	eb4df58a3d	aco/isel: refactor shared vgpr usage Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36390>	2025-07-29 06:33:20 +00:00
Georg Lehmann	8a2aca8d6f	aco/select_alu: avoid vector get_alu_src for instructions with scalar operands Foz-DB Navi21: Totals from 1 (0.00% of 80237) affected shaders: Instrs: 22 -> 21 (-4.55%) CodeSize: 112 -> 108 (-3.57%) Latency: 392 -> 386 (-1.53%) InvThroughput: 25 -> 24 (-4.00%) Copies: 4 -> 3 (-25.00%) PreVGPRs: 8 -> 4 (-50.00%) VALU: 10 -> 9 (-10.00%) Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35728>	2025-07-29 06:07:15 +00:00
Georg Lehmann	ad9c340d86	aco: insert VALU s_delay_alu for WMMA This should avoid some SIMD stalls. I think this special case was added to try to handle this case: First Instruction: WMMA Second Instruction: WMMA instruction with same VGPR of previous WMMA instruction’s Matrix D as Matrix C Stall if the first and second instruction are not the same type of WMMA or use ABS/NEG on SRC2 of the second instruction If I read it correctly, we shouldn't need a delay if the type is the same and no modifier is used. That's kind of complex to handle, so leave it for now. Not inserting any delays likely hurts more than this. Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36328>	2025-07-29 05:48:29 +00:00
Georg Lehmann	413d0d2ec8	aco/statistics: update GFX12 WMMA cost Based on marketing numbers, but they seem to match RGP. Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36328>	2025-07-29 05:48:29 +00:00
Georg Lehmann	8f61c85880	aco/statistics: add latency to WMMA Assume the normal VALU latency of 4 cycles. Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36328>	2025-07-29 05:48:29 +00:00
Georg Lehmann	004f8aa2f4	aco: optimize get_alu_src with constant source and size > 1 Emulated FSR4, Navi31: Totals from 14 (100.00% of 14) affected shaders: MaxWaves: 130 -> 131 (+0.77%) Instrs: 67887 -> 67470 (-0.61%); split: -0.70%, +0.09% CodeSize: 464428 -> 461668 (-0.59%); split: -0.67%, +0.07% VGPRs: 2544 -> 2520 (-0.94%) SpillVGPRs: 92 -> 89 (-3.26%) Latency: 256823 -> 257574 (+0.29%); split: -0.37%, +0.66% InvThroughput: 253895 -> 252929 (-0.38%); split: -0.40%, +0.02% VClause: 997 -> 984 (-1.30%); split: -2.11%, +0.80% Copies: 4501 -> 3788 (-15.84%); split: -17.35%, +1.51% PreSGPRs: 504 -> 519 (+2.98%) PreVGPRs: 2460 -> 2448 (-0.49%) VALU: 57202 -> 56726 (-0.83%); split: -0.88%, +0.05% SALU: 1231 -> 1384 (+12.43%) VMEM: 3807 -> 3801 (-0.16%) VOPD: 2693 -> 2303 (-14.48%); split: +1.19%, -15.67% Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36090>	2025-07-25 11:33:00 +00:00
Alyssa Rosenzweig	8a1a410389	treewide: use SWAP macro Via Coccinelle patch + manual clean up: @@ identifier temporary, a, b; type T; @@ -T temporary = a; -a = b; -b = temporary; +SWAP(a, b); Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36297>	2025-07-23 19:49:47 +00:00
Georg Lehmann	c80daf934c	aco: support 64bit or vectorized bitfield_select No Foz-DB changes. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36141>	2025-07-21 20:42:32 +00:00
Georg Lehmann	14b36fb790	aco/isel: don't create literal operands for SALU bitfield_select Let the optimizer handle this. No Foz-DB changes. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36141>	2025-07-21 20:42:32 +00:00
Rhys Perry	256a7cc4f0	aco/isel: optimize uniform vote fossil-db (navi21): Totals from 21 (0.03% of 79825) affected shaders: Instrs: 44939 -> 44913 (-0.06%) CodeSize: 236612 -> 236504 (-0.05%) Latency: 509496 -> 509349 (-0.03%) Copies: 3624 -> 3620 (-0.11%) SALU: 5458 -> 5432 (-0.48%) Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36177>	2025-07-21 14:19:58 +00:00
Rhys Perry	0bceb07c03	aco: optimize uniform s_not fossil-db (navi21): Totals from 1442 (1.81% of 79825) affected shaders: Instrs: 2224425 -> 2220624 (-0.17%); split: -0.18%, +0.01% CodeSize: 11778260 -> 11763264 (-0.13%); split: -0.14%, +0.01% Latency: 13396254 -> 13392346 (-0.03%); split: -0.03%, +0.00% InvThroughput: 3145007 -> 3144982 (-0.00%); split: -0.00%, +0.00% SClause: 53037 -> 53035 (-0.00%); split: -0.01%, +0.01% Copies: 185852 -> 184777 (-0.58%); split: -0.71%, +0.13% Branches: 60799 -> 60805 (+0.01%) PreSGPRs: 62940 -> 62954 (+0.02%); split: -0.01%, +0.03% SALU: 298564 -> 294761 (-1.27%); split: -1.34%, +0.06% Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36177>	2025-07-21 14:19:58 +00:00
Rhys Perry	85b31c9c4d	aco/opt: add some comments Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36177>	2025-07-21 14:19:58 +00:00
Rhys Perry	2fff1db5c8	aco: don't both flip s_cselect and label uniform_bool Otherwise, the uniform_bool could point to a temporary which might be DCE'd, since it's not used by the s_cselect. fossil-db (navi21): Totals from 1 (0.00% of 79825) affected shaders: Instrs: 1267 -> 1269 (+0.16%) Latency: 91071 -> 91103 (+0.04%) SALU: 283 -> 285 (+0.71%) Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36177>	2025-07-21 14:19:58 +00:00
Rhys Perry	2239a5e9ae	aco: stop labeling first def of and(uniform_bool/uniform_bitwise, exec) The optimizer shouldn't consider a lanemask to be a uniform boolean unless it's either 0 or -1. Optimizations involving s_not/s_xor might not work properly otherwise. No fossil-db changes (navi21). Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36177>	2025-07-21 14:19:58 +00:00
Rhys Perry	d81add3dda	aco: optimize s_and(s_cselect, exec) fossil-db (navi21): Totals from 62 (0.08% of 79825) affected shaders: Instrs: 178887 -> 178745 (-0.08%) CodeSize: 942980 -> 942328 (-0.07%) Latency: 1274513 -> 1273653 (-0.07%) InvThroughput: 213862 -> 213774 (-0.04%); split: -0.04%, +0.00% SALU: 26446 -> 26301 (-0.55%) Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36005>	2025-07-21 08:27:01 +00:00
Rhys Perry	b2e5fc9451	aco/lower_phis: add bld_before_logical_end helper Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36005>	2025-07-21 08:27:01 +00:00
Georg Lehmann	d672737372	nir,aco: add byte_perm_amd Acked-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36115>	2025-07-16 11:46:52 +00:00
Natalie Vock	ac96594b86	aco/isel: Use vector-aligned operands for ds_stack_push8_pop1_rtn_b32 Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35269>	2025-07-15 21:34:40 +00:00
Natalie Vock	b2a95d2133	aco/ra: Add affinities for DS vector-aligned operands Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35269>	2025-07-15 21:34:40 +00:00
Natalie Vock	df5495b934	aco/assembler: Support vector-aligned operands on DS instructions Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35269>	2025-07-15 21:34:40 +00:00
Natalie Vock	ea66a8d1c5	aco,nir: Add support for GFX12 ds_bvh_stack_push8_pop1_rtn_b32 instruction Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35269>	2025-07-15 21:34:40 +00:00
Natalie Vock	9707b30965	nir,aco: Add ds_bvh_stack_rtn This is a ds instruction that also overwrites its first input, so introduce a new ds format with two outputs. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35269>	2025-07-15 21:34:39 +00:00
Natalie Vock	c515f1fd58	aco: Use vector-aligned operands for image_bvh8_intersect_ray Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35269>	2025-07-15 21:34:38 +00:00
Natalie Vock	c279dd6e61	aco: Support vector-aligned ops fixed to defs Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35269>	2025-07-15 21:34:38 +00:00
Natalie Vock	f17fe05e32	aco/isel: Improve vector splits for image_bvh8_intersect_ray Using split_vector to split everything into scalars allows copy-prop to eliminate the final p_create_vector. Considerably reduces copies and register thrashing. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35269>	2025-07-15 21:34:38 +00:00
Marek Olšák	d12bc87dda	aco: implement upcasting 16-bit types for 32-bit color buffers in PS epilog This was missed when implementing the change for LLVM. Fixes: `fbbf029529` - radeonsi: enable 16-bit mediump IO for PS outputs only, and VS->PS with env var Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36112>	2025-07-15 18:28:30 +00:00
Daniel Schürmann	47ef60cbf1	aco/ra: always use bytes for register stride requirements instead of a mixture between dwords and bytes. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36053>	2025-07-14 08:45:29 +00:00
Marek Olšák	5ded4f3c7d	aco: remove unused aco_symbol_lds_ngg_gs_out_vertex_base Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35529>	2025-07-12 10:28:21 +00:00
Georg Lehmann	92d433c54a	aco: vectorize conversions from 8bit to 16bit Massively helps emulated fp8 performance. Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35854>	2025-07-12 08:39:15 +00:00
Georg Lehmann	7fece5592c	aco: vectorize 16bit extracts Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35854>	2025-07-12 08:39:14 +00:00
Rhys Perry	3b9a1ce4ca	aco: remove RegClass::as_subdword Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35465>	2025-07-11 12:15:09 +00:00
Rhys Perry	9c55b0ca20	aco: use MUBUF for global access with SGPR address on GFX7/8 This should be better than using FLAT, which only supports a VGPR address. fossil-db (polaris10): Totals from 159 (0.26% of 62070) affected shaders: MaxWaves: 789 -> 803 (+1.77%) Instrs: 234284 -> 230557 (-1.59%); split: -1.71%, +0.12% CodeSize: 1212324 -> 1186716 (-2.11%); split: -2.23%, +0.11% SGPRs: 10504 -> 10712 (+1.98%) VGPRs: 10556 -> 10236 (-3.03%); split: -3.37%, +0.34% SpillSGPRs: 579 -> 577 (-0.35%) Latency: 3903056 -> 3875625 (-0.70%); split: -0.87%, +0.16% InvThroughput: 3139443 -> 3114426 (-0.80%); split: -0.86%, +0.07% VClause: 4205 -> 4433 (+5.42%); split: -0.43%, +5.85% SClause: 4461 -> 4445 (-0.36%); split: -0.43%, +0.07% Copies: 30889 -> 31507 (+2.00%); split: -0.29%, +2.29% PreSGPRs: 7370 -> 7609 (+3.24%) PreVGPRs: 8339 -> 8193 (-1.75%) VALU: 175025 -> 170232 (-2.74%); split: -2.77%, +0.03% SALU: 27269 -> 28532 (+4.63%); split: -0.01%, +4.64% Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35465>	2025-07-11 12:15:08 +00:00
Rhys Perry	0094e6c32a	aco: optimize lds-only or vmem-only flat access fossil-db (polaris10): Totals from 138 (0.22% of 62070) affected shaders: Instrs: 233452 -> 234436 (+0.42%) CodeSize: 1209392 -> 1213220 (+0.32%) Latency: 3934496 -> 3928089 (-0.16%); split: -0.17%, +0.00% InvThroughput: 3040782 -> 3038562 (-0.07%); split: -0.07%, +0.00% Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35465>	2025-07-11 12:15:08 +00:00


3869 commits