fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-20 22:08:10 +02:00

Author	SHA1	Message	Date
Georg Lehmann	883b1ca364	aco: disable wqm for tex loads when not needed By only executing VMEM loads for lanes where the result is used, we can save bandwidth. The NIR pass only handles tex for now, but those are most common anyway. We can extend it handle image/ssbo/ubo/global loads in the future. Foz-DB GFX1201: Totals from 32633 (40.66% of 80251) affected shaders: Instrs: 22635910 -> 23193509 (+2.46%); split: -0.00%, +2.46% CodeSize: 122880044 -> 125093428 (+1.80%); split: -0.00%, +1.81% VGPRs: 1481868 -> 1481712 (-0.01%) SpillSGPRs: 3877 -> 4301 (+10.94%); split: -0.52%, +11.45% Latency: 171480552 -> 171685219 (+0.12%); split: -0.18%, +0.30% InvThroughput: 24364743 -> 24373441 (+0.04%); split: -0.08%, +0.12% VClause: 388318 -> 388557 (+0.06%); split: -0.06%, +0.13% SClause: 774781 -> 776492 (+0.22%); split: -0.29%, +0.51% Copies: 1416586 -> 1541199 (+8.80%); split: -0.16%, +8.96% Branches: 419591 -> 419673 (+0.02%); split: -0.02%, +0.04% PreSGPRs: 1330303 -> 1416540 (+6.48%) PreVGPRs: 964864 -> 964863 (-0.00%) VALU: 12919601 -> 12920254 (+0.01%); split: -0.01%, +0.01% SALU: 2685402 -> 3224147 (+20.06%); split: -0.00%, +20.07% Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35970>	2025-08-15 07:03:46 +00:00
Georg Lehmann	7159fd21f8	aco: don't restrict vmem load scheduling by inserting p_end_wqm early Foz-DB GFX1201: Totals from 7 (0.01% of 80251) affected shaders: Instrs: 703 -> 729 (+3.70%) CodeSize: 4032 -> 4136 (+2.58%) Latency: 5840 -> 4715 (-19.26%) InvThroughput: 441 -> 405 (-8.16%) Copies: 61 -> 67 (+9.84%) PreSGPRs: 216 -> 218 (+0.93%) SALU: 93 -> 113 (+21.51%) When reordered after the next commit: Foz-DB GFX1201: Totals from 1609 (2.00% of 80251) affected shaders: MaxWaves: 47984 -> 47986 (+0.00%) Instrs: 1326847 -> 1332797 (+0.45%); split: -0.05%, +0.50% CodeSize: 7248720 -> 7275364 (+0.37%); split: -0.04%, +0.41% VGPRs: 74968 -> 75148 (+0.24%); split: -0.06%, +0.30% SpillSGPRs: 182 -> 184 (+1.10%) Latency: 10370602 -> 10172524 (-1.91%); split: -2.06%, +0.15% InvThroughput: 1446508 -> 1445920 (-0.04%); split: -0.11%, +0.06% VClause: 23567 -> 23559 (-0.03%); split: -0.35%, +0.32% SClause: 43143 -> 43203 (+0.14%); split: -0.52%, +0.66% Copies: 80948 -> 81622 (+0.83%); split: -0.32%, +1.16% Branches: 21599 -> 21727 (+0.59%) PreSGPRs: 69963 -> 70732 (+1.10%) VALU: 778968 -> 779024 (+0.01%); split: -0.02%, +0.03% SALU: 159797 -> 165329 (+3.46%); split: -0.01%, +3.47% Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35970>	2025-08-15 07:03:46 +00:00
Georg Lehmann	c1b29174b4	aco: use a smaller wqm section for strict_wqm sampling It's only important that the coordinate is created in WQM, the sample itself doesn't care. Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35970>	2025-08-15 07:03:46 +00:00
Georg Lehmann	11cee3d634	aco: use new disable_wqm for p_dual_src_export_gfx11 No Foz-DB changes on GFX1201. Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35970>	2025-08-15 07:03:46 +00:00
Georg Lehmann	8e53ba9a0a	aco: use new disable_wqm for exp No Foz-DB changes on GFX1201. Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35970>	2025-08-15 07:03:46 +00:00
Georg Lehmann	0e66f2b2cc	aco: use new disable_wqm for mimg Foz-DB GFX1201: Totals from 88 (0.11% of 80251) affected shaders: Instrs: 81954 -> 82218 (+0.32%); split: -0.02%, +0.34% CodeSize: 451824 -> 452880 (+0.23%); split: -0.02%, +0.25% Latency: 308818 -> 308746 (-0.02%); split: -0.05%, +0.02% VClause: 1324 -> 1318 (-0.45%) Copies: 2795 -> 2784 (-0.39%) PreSGPRs: 4029 -> 4035 (+0.15%) SALU: 6563 -> 6809 (+3.75%); split: -0.15%, +3.90% Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35970>	2025-08-15 07:03:46 +00:00
Georg Lehmann	922f559c3c	aco: use new disable_wqm for flatlike No Foz-DB changes on GFX1201. Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35970>	2025-08-15 07:03:46 +00:00
Georg Lehmann	a4c537c5b3	aco: use new disable_wqm for mubuf/mtbuf Foz-DB GFX1201: Totals from 66 (0.08% of 80251) affected shaders: Instrs: 45373 -> 45663 (+0.64%); split: -0.01%, +0.65% CodeSize: 251708 -> 252900 (+0.47%); split: -0.00%, +0.48% Latency: 278977 -> 278652 (-0.12%); split: -0.14%, +0.02% InvThroughput: 38259 -> 38245 (-0.04%); split: -0.05%, +0.02% VClause: 982 -> 962 (-2.04%) Copies: 2882 -> 2808 (-2.57%) PreSGPRs: 2564 -> 2599 (+1.37%) SALU: 4748 -> 5010 (+5.52%) Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35970>	2025-08-15 07:03:46 +00:00
Rhys Perry	44ab4ad732	aco: align scratch size after isel Make it safe for VGPR spilling if it's not a multiple of 4. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36406>	2025-08-04 15:06:43 +00:00
Rhys Perry	cec845079e	ac/nir/lower_ps: remove barrier for end_invocation_interlock SPIR-V->NIR now inserts this barrier itself. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Marek Olšák <maraeo@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36513>	2025-08-04 09:30:06 +00:00
Alyssa Rosenzweig	cc6e3b84cb	treewide: use nir_def_as_* Via Coccinelle patch: @@ expression definition; @@ -nir_instr_as_alu(definition->parent_instr) +nir_def_as_alu(definition) @@ expression definition; @@ -nir_instr_as_intrinsic(definition->parent_instr) +nir_def_as_intrinsic(definition) @@ expression definition; @@ -nir_instr_as_phi(definition->parent_instr) +nir_def_as_phi(definition) @@ expression definition; @@ -nir_instr_as_load_const(definition->parent_instr) +nir_def_as_load_const(definition) @@ expression definition; @@ -nir_instr_as_deref(definition->parent_instr) +nir_def_as_deref(definition) @@ expression definition; @@ -nir_instr_as_tex(definition->parent_instr) +nir_def_as_tex(definition) Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Emma Anholt <emma@anholt.net> Reviewed-by: Marek Olšák <maraeo@gmail.com> Acked-by: Karol Herbst <kherbst@redhat.com> Acked-by: Konstantin Seurer <konstantin.seurer@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36489>	2025-08-01 15:34:24 +00:00
Antonio Ospite	ddf2aa3a4d	build: avoid redefining unreachable() which is standard in C23 In the C23 standard unreachable() is now a predefined function-like macro in <stddef.h> See https://android.googlesource.com/platform/bionic/+/HEAD/docs/c23.md#is-now-a-predefined-function_like-macro-in And this causes build errors when building for C23: ----------------------------------------------------------------------- In file included from ../src/util/log.h:30, from ../src/util/log.c:30: ../src/util/macros.h:123:9: warning: "unreachable" redefined 123 | #define unreachable(str) \ | ^~~~~~~~~~~ In file included from ../src/util/macros.h:31: /usr/lib/gcc/x86_64-linux-gnu/14/include/stddef.h:456:9: note: this is the location of the previous definition 456 | #define unreachable() (__builtin_unreachable ()) | ^~~~~~~~~~~ ----------------------------------------------------------------------- So don't redefine it with the same name, but use the name UNREACHABLE() to also signify it's a macro. Using a different name also makes sense because the behavior of the macro was extending the one of __builtin_unreachable() anyway, and it also had a different signature, accepting one argument, compared to the standard unreachable() with no arguments. This change improves the chances of building mesa with the C23 standard, which for instance is the default in recent AOSP versions. All the instances of the macro, including the definition, were updated with the following command line: git grep -l '[^_]unreachable(' -- "src/**" | sort | uniq | \ while read file; \ do \ sed -e 's/\([^_]\)unreachable(/\1UNREACHABLE(/g' -i "$file"; \ done && \ sed -e 's/#undef unreachable/#undef UNREACHABLE/g' -i src/intel/isl/isl_aux_info.c Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36437>	2025-07-31 17:49:42 +00:00
Georg Lehmann	b12db991eb	aco/gfx10: optimize subgroupRotate(x, 32) and subgroupShuffleXor(x, 32) We don't have v_permlane64_b32 yet, but we can still optimize it using shared vgprs. Using the DPP16 row mask, we can even avoid writing exec. With v0 input/output and v24/v25 as shared vgprs, this results in: v_mov_b32_dpp v24, v0 quad_perm:[0,1,2,3] row_mask:0x3 bank_mask:0xf v_mov_b32_dpp v25, v0 quad_perm:[0,1,2,3] row_mask:0xc bank_mask:0xf v_mov_b32_dpp v0, v24 quad_perm:[0,1,2,3] row_mask:0xc bank_mask:0xf v_mov_b32_dpp v0, v25 quad_perm:[0,1,2,3] row_mask:0x3 bank_mask:0xf Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36390>	2025-07-29 06:33:20 +00:00
Georg Lehmann	eb4df58a3d	aco/isel: refactor shared vgpr usage Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36390>	2025-07-29 06:33:20 +00:00
Georg Lehmann	8a2aca8d6f	aco/select_alu: avoid vector get_alu_src for instructions with scalar operands Foz-DB Navi21: Totals from 1 (0.00% of 80237) affected shaders: Instrs: 22 -> 21 (-4.55%) CodeSize: 112 -> 108 (-3.57%) Latency: 392 -> 386 (-1.53%) InvThroughput: 25 -> 24 (-4.00%) Copies: 4 -> 3 (-25.00%) PreVGPRs: 8 -> 4 (-50.00%) VALU: 10 -> 9 (-10.00%) Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35728>	2025-07-29 06:07:15 +00:00
Georg Lehmann	004f8aa2f4	aco: optimize get_alu_src with constant source and size > 1 Emulated FSR4, Navi31: Totals from 14 (100.00% of 14) affected shaders: MaxWaves: 130 -> 131 (+0.77%) Instrs: 67887 -> 67470 (-0.61%); split: -0.70%, +0.09% CodeSize: 464428 -> 461668 (-0.59%); split: -0.67%, +0.07% VGPRs: 2544 -> 2520 (-0.94%) SpillVGPRs: 92 -> 89 (-3.26%) Latency: 256823 -> 257574 (+0.29%); split: -0.37%, +0.66% InvThroughput: 253895 -> 252929 (-0.38%); split: -0.40%, +0.02% VClause: 997 -> 984 (-1.30%); split: -2.11%, +0.80% Copies: 4501 -> 3788 (-15.84%); split: -17.35%, +1.51% PreSGPRs: 504 -> 519 (+2.98%) PreVGPRs: 2460 -> 2448 (-0.49%) VALU: 57202 -> 56726 (-0.83%); split: -0.88%, +0.05% SALU: 1231 -> 1384 (+12.43%) VMEM: 3807 -> 3801 (-0.16%) VOPD: 2693 -> 2303 (-14.48%); split: +1.19%, -15.67% Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36090>	2025-07-25 11:33:00 +00:00
Alyssa Rosenzweig	8a1a410389	treewide: use SWAP macro Via Coccinelle patch + manual clean up: @@ identifier temporary, a, b; type T; @@ -T temporary = a; -a = b; -b = temporary; +SWAP(a, b); Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36297>	2025-07-23 19:49:47 +00:00
Georg Lehmann	c80daf934c	aco: supported 64bit or vectorized bitfield_select No Foz-DB changes. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36141>	2025-07-21 20:42:32 +00:00
Georg Lehmann	14b36fb790	aco/isel: don't create literal operands for SALU bitfield_select Let the optimizer handle this. No Foz-DB changes. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36141>	2025-07-21 20:42:32 +00:00
Rhys Perry	256a7cc4f0	aco/isel: optimize uniform vote fossil-db (navi21): Totals from 21 (0.03% of 79825) affected shaders: Instrs: 44939 -> 44913 (-0.06%) CodeSize: 236612 -> 236504 (-0.05%) Latency: 509496 -> 509349 (-0.03%) Copies: 3624 -> 3620 (-0.11%) SALU: 5458 -> 5432 (-0.48%) Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36177>	2025-07-21 14:19:58 +00:00
Georg Lehmann	d672737372	nir,aco: add byte_perm_amd Acked-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36115>	2025-07-16 11:46:52 +00:00
Natalie Vock	ac96594b86	aco/isel: Use vector-aligned operands for ds_stack_push8_pop1_rtn_b32 Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35269>	2025-07-15 21:34:40 +00:00
Natalie Vock	ea66a8d1c5	aco,nir: Add support for GFX12 ds_bvh_stack_push8_pop1_rtn_b32 instruction Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35269>	2025-07-15 21:34:40 +00:00
Natalie Vock	9707b30965	nir,aco: Add ds_bvh_stack_rtn This is a ds instruction that also overwrites its first input, so introduce a new ds format with two outputs. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35269>	2025-07-15 21:34:39 +00:00
Natalie Vock	c515f1fd58	aco: Use vector-aligned operands for image_bvh8_intersect_ray Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35269>	2025-07-15 21:34:38 +00:00
Natalie Vock	f17fe05e32	aco/isel: Improve vector splits for image_bvh8_intersect_ray Using split_vector to split everything into scalars allows copy-prop to eliminate the final p_create_vector. Considerably reduces copies and register thrashing. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35269>	2025-07-15 21:34:38 +00:00
Marek Olšák	d12bc87dda	aco: implement upcasting 16-bit types for 32-bit color buffers in PS epilog This was missed when implementing the change for LLVM. Fixes: `fbbf029529` - radeonsi: enable 16-bit mediump IO for PS outputs only, and VS->PS with env var Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36112>	2025-07-15 18:28:30 +00:00
Marek Olšák	5ded4f3c7d	aco: remove unused aco_symbol_lds_ngg_gs_out_vertex_base Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35529>	2025-07-12 10:28:21 +00:00
Georg Lehmann	92d433c54a	aco: vectorize conversions from 8bit to 16bit Massively helps emulated fp8 performance. Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35854>	2025-07-12 08:39:15 +00:00
Georg Lehmann	7fece5592c	aco: vectorize 16bit extracts Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35854>	2025-07-12 08:39:14 +00:00
Rhys Perry	3b9a1ce4ca	aco: remove RegClass::as_subdword Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35465>	2025-07-11 12:15:09 +00:00
Rhys Perry	9c55b0ca20	aco: use MUBUF for global access with SGPR address on GFX7/8 This should be better than using FLAT, which only supports a VGPR address. fossil-db (polaris10): Totals from 159 (0.26% of 62070) affected shaders: MaxWaves: 789 -> 803 (+1.77%) Instrs: 234284 -> 230557 (-1.59%); split: -1.71%, +0.12% CodeSize: 1212324 -> 1186716 (-2.11%); split: -2.23%, +0.11% SGPRs: 10504 -> 10712 (+1.98%) VGPRs: 10556 -> 10236 (-3.03%); split: -3.37%, +0.34% SpillSGPRs: 579 -> 577 (-0.35%) Latency: 3903056 -> 3875625 (-0.70%); split: -0.87%, +0.16% InvThroughput: 3139443 -> 3114426 (-0.80%); split: -0.86%, +0.07% VClause: 4205 -> 4433 (+5.42%); split: -0.43%, +5.85% SClause: 4461 -> 4445 (-0.36%); split: -0.43%, +0.07% Copies: 30889 -> 31507 (+2.00%); split: -0.29%, +2.29% PreSGPRs: 7370 -> 7609 (+3.24%) PreVGPRs: 8339 -> 8193 (-1.75%) VALU: 175025 -> 170232 (-2.74%); split: -2.77%, +0.03% SALU: 27269 -> 28532 (+4.63%); split: -0.01%, +4.64% Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35465>	2025-07-11 12:15:08 +00:00
Rhys Perry	6396a82695	aco: return a format in lower_global_address No fossil-db changes (navi10, pitcairn). Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35465>	2025-07-11 12:15:07 +00:00
Rhys Perry	89c2c94147	aco: increase global constant offset limit slightly Before, this wasn't actually the maximum value plus one. fossil-db (navi10): Totals from 4 (0.01% of 63207) affected shaders: (no stat changes) Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35465>	2025-07-11 12:15:07 +00:00
Rhys Perry	d7dcd81c77	aco/gfx6: allow both constant and gpr offset for global with sgpr address fossil-db (pitcairn): Totals from 81 (0.13% of 62069) affected shaders: MaxWaves: 332 -> 335 (+0.90%) Instrs: 150087 -> 149737 (-0.23%); split: -0.30%, +0.06% CodeSize: 754636 -> 752712 (-0.25%); split: -0.31%, +0.06% SGPRs: 6128 -> 6184 (+0.91%) VGPRs: 7220 -> 7208 (-0.17%); split: -0.28%, +0.11% SpillSGPRs: 288 -> 287 (-0.35%) Latency: 2199197 -> 2198338 (-0.04%); split: -0.20%, +0.17% InvThroughput: 1613474 -> 1614303 (+0.05%); split: -0.07%, +0.12% VClause: 2905 -> 2862 (-1.48%); split: -2.34%, +0.86% SClause: 2366 -> 2378 (+0.51%); split: -0.17%, +0.68% Copies: 17312 -> 17264 (-0.28%); split: -1.03%, +0.76% PreSGPRs: 5080 -> 5004 (-1.50%) PreVGPRs: 5656 -> 5640 (-0.28%) VALU: 114097 -> 113831 (-0.23%); split: -0.31%, +0.07% SALU: 16004 -> 15944 (-0.37%); split: -0.41%, +0.04% Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35465>	2025-07-11 12:15:06 +00:00
Rhys Perry	684943bd1f	aco/gfx6: allow vgpr offset for global access with sgpr address No reason why we can't use offen like normal buffer loads. fossil-db (pitcairn): Totals from 122 (0.20% of 62069) affected shaders: MaxWaves: 521 -> 525 (+0.77%) Instrs: 238341 -> 237228 (-0.47%); split: -0.57%, +0.10% CodeSize: 1196260 -> 1188076 (-0.68%); split: -0.78%, +0.09% SGPRs: 8752 -> 8760 (+0.09%); split: -0.64%, +0.73% VGPRs: 10456 -> 10440 (-0.15%); split: -0.88%, +0.73% Latency: 3958385 -> 3946186 (-0.31%); split: -0.38%, +0.07% InvThroughput: 3097193 -> 3084417 (-0.41%); split: -0.42%, +0.01% VClause: 4058 -> 4500 (+10.89%); split: -0.02%, +10.92% SClause: 4511 -> 4500 (-0.24%); split: -0.42%, +0.18% Copies: 31228 -> 31718 (+1.57%); split: -0.38%, +1.95% PreSGPRs: 7211 -> 7461 (+3.47%) PreVGPRs: 8174 -> 8147 (-0.33%); split: -0.34%, +0.01% VALU: 174779 -> 173294 (-0.85%); split: -0.87%, +0.02% SALU: 29138 -> 29641 (+1.73%); split: -0.09%, +1.82% Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35465>	2025-07-11 12:15:05 +00:00
Rhys Perry	09a5af121f	aco: simplify the load callback We can put these parameters in the LoadEmitInfo instead. No fossil-db changes (navi10, pitcairn). Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35465>	2025-07-11 12:15:04 +00:00
Rhys Perry	101d0b791f	aco: add too-large constant offset to the address instead of the offset In case the addition with the offset overflows. No fossil-db changes (navi10, pitcairn). Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35465>	2025-07-11 12:15:03 +00:00
Rhys Perry	bd9a9a77fe	aco: use addition helper in emit_load No fossil-db changes (navi10, pitcairn). Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35465>	2025-07-11 12:15:03 +00:00
Rhys Perry	8defd1bc16	aco/gfx6: disallow global access with sgpr address and two offsets No fossil-db changes (navi10, pitcairn). Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35465>	2025-07-11 12:15:03 +00:00
Georg Lehmann	d45f375a9d	aco: only insert fp mode when needed Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35746>	2025-07-10 13:48:50 +00:00
Daniel Schürmann	610a19cf31	aco/isel: allow to select SGPR defs for vectorized bcsel and logical operations No fossil changes. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35784>	2025-07-09 14:10:37 +00:00
Daniel Schürmann	d7477111d2	aco: split vectorized bcsel and bitwise logic VGPR definitions This has a slightly negative effect on parallel-rdp, but positively affects FSR4. Totals from 14 (0.02% of 79839) affected shaders: (Navi48) Instrs: 63543 -> 63646 (+0.16%); split: -0.01%, +0.17% CodeSize: 352888 -> 353608 (+0.20%); split: -0.02%, +0.23% Latency: 1822354 -> 1825036 (+0.15%) InvThroughput: 364683 -> 365738 (+0.29%); split: -0.04%, +0.32% Copies: 9299 -> 9363 (+0.69%); split: -0.11%, +0.80% PreVGPRs: 1381 -> 1394 (+0.94%) VALU: 34511 -> 34575 (+0.19%); split: -0.03%, +0.21% Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35784>	2025-07-09 14:10:36 +00:00
Daniel Schürmann	764ee3a834	radv: don't lower subdword phis to scalar Totals from 193 (0.24% of 79839) affected shaders: (Navi48) MaxWaves: 6004 -> 6024 (+0.33%) Instrs: 169276 -> 166784 (-1.47%); split: -3.01%, +1.53% CodeSize: 940608 -> 915768 (-2.64%); split: -4.29%, +1.64% VGPRs: 8012 -> 7716 (-3.69%); split: -3.99%, +0.30% SpillVGPRs: 185 -> 0 (-inf%) Scratch: 13568 -> 0 (-inf%) Latency: 2159787 -> 2147084 (-0.59%); split: -2.86%, +2.28% InvThroughput: 664022 -> 395859 (-40.38%); split: -42.59%, +2.21% VClause: 2998 -> 2880 (-3.94%); split: -4.27%, +0.33% SClause: 3117 -> 3120 (+0.10%) Copies: 21290 -> 16278 (-23.54%); split: -24.74%, +1.20% Branches: 4757 -> 4760 (+0.06%); split: -0.34%, +0.40% PreSGPRs: 7369 -> 7378 (+0.12%); split: -0.11%, +0.23% PreVGPRs: 4257 -> 3859 (-9.35%); split: -9.94%, +0.59% VALU: 83173 -> 79804 (-4.05%); split: -5.68%, +1.63% SALU: 36672 -> 37318 (+1.76%); split: -0.02%, +1.78% VMEM: 4012 -> 3762 (-6.23%); split: -6.83%, +0.60% SMEM: 4300 -> 4303 (+0.07%) Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35784>	2025-07-09 14:10:36 +00:00
Daniel Schürmann	fc2fcac04e	aco: allow vectorized nir_op_mov nir_lower_phis_to_scalar() can create these with the next commit. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35784>	2025-07-09 14:10:36 +00:00
Daniel Schürmann	3f35b1329e	aco: allow subdword vector-definitions on some VALU instructions Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35784>	2025-07-09 14:10:36 +00:00
Daniel Schürmann	025306a95d	aco/isel: refactor emission of bitwise logical operations Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35784>	2025-07-09 14:10:36 +00:00
Georg Lehmann	82af226690	aco: remove unused swap_srcs from emit_vop3p_instruction Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35825>	2025-07-09 07:23:09 +00:00
Georg Lehmann	96793fb0c1	aco/isel: implement 16bit vec2 shifts The source bit size mismatch is a bit annoying, but it's still worth it to vectorize these. Foz-DB Navi48: Totals from 85 (0.11% of 80251) affected shaders: Instrs: 119073 -> 118827 (-0.21%); split: -0.21%, +0.00% CodeSize: 669604 -> 667552 (-0.31%); split: -0.31%, +0.00% VGPRs: 4796 -> 4736 (-1.25%) Latency: 1907685 -> 1901983 (-0.30%); split: -0.32%, +0.02% InvThroughput: 642603 -> 640680 (-0.30%); split: -0.33%, +0.03% VClause: 2088 -> 2091 (+0.14%) Copies: 18300 -> 18394 (+0.51%); split: -0.01%, +0.52% Branches: 3452 -> 3440 (-0.35%) VALU: 63378 -> 63144 (-0.37%); split: -0.37%, +0.00% SALU: 23065 -> 23076 (+0.05%); split: -0.00%, +0.05% Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35825>	2025-07-09 07:23:08 +00:00
Daniel Schürmann	2c51a8870d	nir: add nir_vectorize_cb callback parameter to nir_lower_phis_to_scalar() Similar to nir_lower_alu_width(), the callback can return the desired number of components for a phi, or 0 for no lowering. The previous behavior of nir_lower_phis_to_scalar() with lower_all=true can be elicited via nir_lower_all_phis_to_scalar() while the previous behavior with lower_all=false now corresponds to nir_lower_phis_to_scalar() with NULL callback. Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Mel Henning <mhenning@darkrefraction.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36783>	2025-07-08 15:33:59 +00:00
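
Note on ddf2aa3a4d (unreachable() renamed to UNREACHABLE()): the commit message describes a line-by-line substitution that rewrites "unreachable(" only when the preceding character is not an underscore, so that __builtin_unreachable() is left alone. The following is a minimal Python sketch of that substitution rule, not the command the author ran; the function name rename_unreachable and the sample lines are made up for illustration.

```python
import re

# One non-underscore character followed by "unreachable(".
# The captured character is put back in front of the new name.
PATTERN = re.compile(r"([^_])unreachable\(")


def rename_unreachable(line: str) -> str:
    """Rename unreachable( to UNREACHABLE( unless an underscore precedes it."""
    return PATTERN.sub(r"\1UNREACHABLE(", line)


# Hypothetical sample lines, not taken from the Mesa tree.
samples = [
    '   unreachable("bad enum");',   # renamed: preceded by a space
    "x = __builtin_unreachable();",  # untouched: preceded by "_"
    "unreachable();",                # untouched: no preceding character at all
]

for sample in samples:
    print(rename_unreachable(sample))
```

The last sample shows a property of this kind of pattern: because it needs one character before the name, a call at the very start of a line is not rewritten.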

81 commits