fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-21 13:18:09 +02:00

Author	SHA1	Message	Date
Daniel Schürmann	eecd1c020d	amd: keep ac_shader_config::lds_size unaligned Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37577>	2025-10-15 11:20:09 +00:00
Daniel Schürmann	fe6ff6d1ef	aco: remove DeviceInfo::lds_encoding_granule and DeviceInfo::lds_alloc_granule Use utility functions instead. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37577>	2025-10-15 11:20:08 +00:00
Daniel Schürmann	11db02d5d9	radv: calculate LDS allocation requirements independently from the compiler Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37577>	2025-10-15 11:20:07 +00:00
Daniel Schürmann	b651234414	amd: change ac_shader_config::lds_size to bytes We still keep it aligned to allocation granularity. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37577>	2025-10-15 11:20:07 +00:00
Daniel Schürmann	d0b87a0d5f	ac/nir_flag_smem_for_loads: call divergence analysis internally Also don't flag more SMEM instructions (in ACO) after the last call to ac_nir_lower_mem_access_bit_sizes(). Totals from 75 (0.09% of 79839) affected shaders: (Navi48) Instrs: 191246 -> 189960 (-0.67%) CodeSize: 996840 -> 985976 (-1.09%) Latency: 3066184 -> 2945500 (-3.94%) InvThroughput: 355373 -> 353106 (-0.64%); split: -0.66%, +0.02% SClause: 4848 -> 4699 (-3.07%) Copies: 13827 -> 13925 (+0.71%); split: -0.07%, +0.78% Branches: 5176 -> 5003 (-3.34%) PreSGPRs: 6222 -> 6272 (+0.80%) VALU: 108934 -> 108993 (+0.05%); split: -0.00%, +0.06% SALU: 31679 -> 31210 (-1.48%); split: -1.51%, +0.03% SMEM: 7158 -> 6739 (-5.85%) Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37843>	2025-10-14 16:33:12 +00:00
Daniel Schürmann	8ff44f17ef	amd/lower_mem_access_bit_sizes: also use SMEM for subdword loads We can simply extract from the loaded dwords as per nir_lower_mem_access_bit_sizes() lowering. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37843>	2025-10-14 16:33:11 +00:00
Samuel Pitoiset	bc32286e5b	radv: declare a new user SGPR for dynamic descriptors To move them out of push constants. fossils-db (GFX1201): Totals from 20700 (25.99% of 79646) affected shaders: Instrs: 14375624 -> 14370051 (-0.04%); split: -0.07%, +0.03% CodeSize: 76746128 -> 76723772 (-0.03%); split: -0.05%, +0.02% Latency: 74103586 -> 74113651 (+0.01%); split: -0.01%, +0.02% InvThroughput: 11908817 -> 11908798 (-0.00%); split: -0.00%, +0.00% VClause: 249605 -> 249607 (+0.00%); split: -0.00%, +0.00% SClause: 337914 -> 337772 (-0.04%); split: -0.08%, +0.04% Copies: 843585 -> 839233 (-0.52%); split: -0.62%, +0.10% PreSGPRs: 836283 -> 837260 (+0.12%) SALU: 1790713 -> 1786374 (-0.24%); split: -0.29%, +0.05% Co-authored-by: Konstantin Seurer <konstantin.seurer@gmail.com> Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37768>	2025-10-14 15:34:43 +00:00
Georg Lehmann	58163f65f0	aco/optimizer: rework packed fneg opt Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35272>	2025-10-14 08:33:40 +00:00
Georg Lehmann	6eac72088c	aco/gfx10+: only work around split execution of uniform LDS in WGP mode LDS instructions from one CU won't split the execution of other LDS instructions on the same CU. Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31630>	2025-10-13 10:22:22 +00:00
Georg Lehmann	c13caa5e5f	aco: fix global_atomic_swap offset overflow check Fixes: `d7dcd81c77` ("aco/gfx6: allow both constant and gpr offset for global with sgpr address") Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37821>	2025-10-13 09:41:41 +00:00
Marek Olšák	3fe651f607	nir: remove load_smem_amd replaced by load_global_amd + ACCESS_SMEM_AMD Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36936>	2025-10-08 08:54:11 +00:00
Rhys Perry	20af16b4d8	aco: use MTBUF for 64-bit atomic load/store A 64-bit atomic load/store should be considered entirely out-of-bounds if any part of it is out-of-bounds. Since we implemented these as 32-bit vec2 load/store, it would have been possible for the first half to be in-bounds while the second half is out-of-bounds. From section 9.6.1, "Robust Buffer Access", of the Vulkan 1.4.324 specification: "Any non-atomic access to a uniform, storage, uniform texel, or storage texel buffer wider than 32-bits may be treated as multiple 32-bit accesses that are separately bounds checked." Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36602>	2025-10-07 17:41:31 +00:00
Rhys Perry	f905acfada	aco: remove barrier acquire/release workaround This existed since `ccfe9813fb` because NIR had no atomic loads/stores. This is no longer the case. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36602>	2025-10-07 17:41:31 +00:00
Rhys Perry	271b135b03	aco: set atomic semantic for atomic load/store Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36602>	2025-10-07 17:41:30 +00:00
Rhys Perry	74b807cf58	aco: only workaround load tearing for atomic loads For non-atomic loads, this situation would require a data race. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36602>	2025-10-07 17:41:30 +00:00
Georg Lehmann	d514696a0c	aco/isel: support nir_op_atomic_isub Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37702>	2025-10-07 14:07:56 +00:00
Georg Lehmann	cf30742a66	radv,aco: don't end monolithic ray tracing with unconditional terminate The terminate requires more code and blocks us from deallocating VGPRs early. Foz-DB Navi31: Totals from 63 (0.08% of 80273) affected shaders: Instrs: 3372702 -> 3372467 (-0.01%) CodeSize: 17441676 -> 17440736 (-0.01%) Latency: 19763447 -> 19763288 (-0.00%) InvThroughput: 3860502 -> 3860478 (-0.00%) Branches: 96204 -> 96141 (-0.07%) SALU: 406648 -> 406549 (-0.02%) Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37542>	2025-09-25 15:35:55 +00:00
Daniel Schürmann	d041640b88	aco: remove excess offset handling for load/store_shared Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37453>	2025-09-24 14:28:25 +00:00
Rhys Perry	d6ed68212c	aco: fix SGPR 8-bit nir_op_vec with mixed constant and non-constant For example, vec2(non_const, const). Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Fixes: `04e3d7ad93` ("aco: improve nir_op_vec with constant operands") Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13911 Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37405>	2025-09-18 12:37:19 +00:00
Rhys Perry	8931672eef	aco: workaround load tearing for load_shared2_amd This probably has the same issue as load_shared. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Fixes: `04956d54ce` ("aco: force uniform result for LDS load with uniform address if it can be non uniform") Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37417>	2025-09-17 11:29:21 +00:00
Rhys Perry	81df517553	aco: avoid unaligned offsets when selecting load_global_amd SMEM instructions mask off the low bits for the base and offset sources both before and after they're added. However, NIR expects ACO to only care about the alignment of the final address. fossil-db (gfx1201): Totals from 21 (0.03% of 79839) affected shaders: Instrs: 229780 -> 229876 (+0.04%) CodeSize: 1267724 -> 1268080 (+0.03%) Latency: 2800924 -> 2800978 (+0.00%) InvThroughput: 520250 -> 520256 (+0.00%) Copies: 27878 -> 27876 (-0.01%); split: -0.01%, +0.00% SALU: 29591 -> 29643 (+0.18%) fossil-db (polaris10): Totals from 3 (0.00% of 62201) affected shaders: Latency: 2651 -> 2652 (+0.04%) InvThroughput: 662 -> 663 (+0.15%) PreSGPRs: 51 -> 54 (+5.88%) Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37301>	2025-09-17 09:15:46 +00:00
Rhys Perry	6d71521ecd	aco: avoid wraparound for smem global loads with both offsets fossil-db (gfx1201): Totals from 296 (0.37% of 79839) affected shaders: Instrs: 382593 -> 380149 (-0.64%) CodeSize: 1981452 -> 1970988 (-0.53%); split: -0.53%, +0.00% Latency: 1575286 -> 1574252 (-0.07%) InvThroughput: 215839 -> 215818 (-0.01%) SClause: 8679 -> 8677 (-0.02%); split: -0.03%, +0.01% Copies: 19642 -> 19641 (-0.01%); split: -0.03%, +0.02% PreSGPRs: 14521 -> 14515 (-0.04%) SALU: 57097 -> 55718 (-2.42%) fossil-db (polaris10): Totals from 30 (0.05% of 62201) affected shaders: Instrs: 23341 -> 23379 (+0.16%); split: -0.01%, +0.18% CodeSize: 121316 -> 121516 (+0.16%); split: -0.01%, +0.17% SGPRs: 2368 -> 2384 (+0.68%) Latency: 235153 -> 235374 (+0.09%); split: -0.01%, +0.11% InvThroughput: 92582 -> 92566 (-0.02%) SClause: 616 -> 619 (+0.49%) Copies: 2717 -> 2720 (+0.11%) PreSGPRs: 1204 -> 1213 (+0.75%) SALU: 3654 -> 3692 (+1.04%); split: -0.08%, +1.12% Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Backport-to: 25.2 Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37301>	2025-09-17 09:15:46 +00:00
Georg Lehmann	714a149396	nir: remove unsigned upper bound config All config information is now either in nir->info or nir->options. Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37361>	2025-09-16 09:24:04 +00:00
Georg Lehmann	bb67dae12d	nir/uub: remove max_workgroup_size from config For most hardware, this is the same as max invocations in the workgroup. Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37361>	2025-09-16 09:24:04 +00:00
Georg Lehmann	f3c08c9d27	nir/uub: use shader_info subgroup size Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37361>	2025-09-16 09:24:04 +00:00
Georg Lehmann	d029686e20	aco/isel: fix output args init stack buffer overflow BITSET range functions include the end of the range. Fixes: `eb249bb18e` ("aco: Only fix used variables to registers") Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37361>	2025-09-16 09:24:03 +00:00
Natalie Vock	3667a7b687	aco: Add call info Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34531>	2025-09-15 17:16:20 +00:00
Samuel Pitoiset	decf9af472	radv/rt: only use one user SGPR for the traversal shader addr All shaders are allocated in the 32-bit addr space. To avoid an issue with alignment, and also for future work, there is an unused user SGPR. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37133>	2025-09-03 05:53:41 +00:00
Marek Olšák	4c87d002e3	aco,radeonsi: expand 32-bit shader arg pointers to 64 bits for ACO Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37101>	2025-08-30 15:04:32 -04:00
Marek Olšák	7d5288b5b7	aco: check that global addresses are 64bit, apply_nuw_to_ssa to global_amd/smem Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37101>	2025-08-30 15:04:32 -04:00
Georg Lehmann	38e32e39a9	aco: never end wqm early for vmem The remaining cases where disable_wqm isn't set are either uniform loads or loads that influence control flow. In the first case, not ending WQM early is free, and in the second case it's likely still better to not block scheduling. Foz-DB GFX1201: Totals from 483 (0.60% of 80287) affected shaders: MaxWaves: 12654 -> 12642 (-0.09%) Instrs: 485234 -> 484830 (-0.08%); split: -0.19%, +0.11% CodeSize: 2630876 -> 2629184 (-0.06%); split: -0.15%, +0.08% VGPRs: 29980 -> 30004 (+0.08%) Latency: 4908015 -> 4813167 (-1.93%); split: -1.95%, +0.02% InvThroughput: 751059 -> 748582 (-0.33%); split: -0.35%, +0.02% VClause: 8723 -> 8705 (-0.21%); split: -0.30%, +0.09% SClause: 11085 -> 10986 (-0.89%); split: -1.45%, +0.56% Copies: 25155 -> 25183 (+0.11%); split: -0.26%, +0.37% Branches: 6203 -> 6204 (+0.02%) PreSGPRs: 23763 -> 23780 (+0.07%) VALU: 296576 -> 296593 (+0.01%); split: -0.01%, +0.02% SALU: 49095 -> 49416 (+0.65%); split: -0.04%, +0.69% Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36785>	2025-08-28 06:29:04 +00:00
Georg Lehmann	3d190f2e9c	aco: implement skip_helpers for load_global_amd Foz-DB GFX1201: Totals from 119 (0.15% of 80287) affected shaders: Instrs: 212449 -> 213452 (+0.47%) CodeSize: 1120656 -> 1124708 (+0.36%) Latency: 2854370 -> 2855772 (+0.05%); split: -0.02%, +0.07% InvThroughput: 586142 -> 586210 (+0.01%); split: -0.00%, +0.01% VClause: 3556 -> 3656 (+2.81%) SClause: 2708 -> 2710 (+0.07%) Copies: 14410 -> 14509 (+0.69%) PreSGPRs: 6810 -> 6850 (+0.59%); split: -0.12%, +0.70% VALU: 135945 -> 135942 (-0.00%); split: -0.01%, +0.01% SALU: 22147 -> 23121 (+4.40%) Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36785>	2025-08-28 06:29:04 +00:00
Georg Lehmann	ee7069f875	aco: implement skip_helpers for load_scratch Foz-DB GFX1201: Totals from 2 (0.00% of 80287) affected shaders: Instrs: 4016 -> 4054 (+0.95%) CodeSize: 22104 -> 22256 (+0.69%) Latency: 17123 -> 17129 (+0.04%) Copies: 406 -> 415 (+2.22%) SALU: 323 -> 353 (+9.29%) Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36785>	2025-08-28 06:29:04 +00:00
Georg Lehmann	2bfd8918a5	aco: implement skip_helpers for load_ssbo/ubo/constant Foz-DB GFX1201: Totals from 6676 (8.32% of 80287) affected shaders: Instrs: 8786161 -> 8829091 (+0.49%); split: -0.01%, +0.50% CodeSize: 47141800 -> 47320480 (+0.38%); split: -0.01%, +0.39% VGPRs: 376624 -> 376600 (-0.01%) SpillSGPRs: 1251 -> 1250 (-0.08%) Latency: 99716626 -> 99642361 (-0.07%); split: -0.11%, +0.04% InvThroughput: 14893179 -> 14898323 (+0.03%); split: -0.01%, +0.04% VClause: 149425 -> 153539 (+2.75%); split: -0.04%, +2.79% SClause: 251247 -> 251842 (+0.24%); split: -0.06%, +0.30% Copies: 580304 -> 586424 (+1.05%); split: -0.21%, +1.26% Branches: 163014 -> 163013 (-0.00%); split: -0.00%, +0.00% PreSGPRs: 356548 -> 357109 (+0.16%); split: -0.18%, +0.33% VALU: 5149733 -> 5149797 (+0.00%); split: -0.00%, +0.00% SALU: 1082176 -> 1122718 (+3.75%); split: -0.06%, +3.80% Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36785>	2025-08-28 06:29:03 +00:00
Georg Lehmann	bdae511b18	aco: implement skip_helpers for image loads Foz-DB GFX1201: Totals from 5 (0.01% of 80287) affected shaders: Instrs: 1406 -> 1417 (+0.78%) CodeSize: 8012 -> 8056 (+0.55%) Latency: 7279 -> 7282 (+0.04%) Copies: 84 -> 85 (+1.19%) SALU: 170 -> 180 (+5.88%) Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36785>	2025-08-28 06:29:02 +00:00
Georg Lehmann	bf453a7c6a	aco/isel: add init_disable_wqm helper Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36785>	2025-08-28 06:29:01 +00:00
Konstantin Seurer	9df7b48d2f	nir: Use nir_def_as_* in more places Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36746>	2025-08-24 14:03:09 +00:00
Marek Olšák	3aadae22ad	nir: make nir_block::predecessors & dom_frontier sets non-malloc'd We can just place the set structures inside nir_block. This reduces the number of ralloc calls by 6.7% when compiling Heaven shaders with radeonsi+ACO using a release build (i.e. not including nir_validate set allocations, which are also removed). Reviewed-by: Gert Wollny <gert.wollny@collabora.com> Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36728>	2025-08-21 06:13:48 +00:00
Georg Lehmann	639b91bb48	aco/isel: fix vectorized i2i16 with 8bit vec8 source The extract index is in dwords, not bytes. Fixes: `92d433c54a` ("aco: vectorize conversions from 8bit to 16bit") Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36869>	2025-08-20 10:13:22 +00:00
Daniel Schürmann	7e63251d1f	aco/isel: refactor store_shared() by directly matching NIR intrinsics to ACO opcodes Totals from 1435 (1.80% of 79839) affected shaders: (Navi48) Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36133>	2025-08-19 14:28:15 +00:00
Daniel Schürmann	1fde289539	aco/isel: refactor load_shared() by directly matching NIR intrinsics to ACO opcodes Totals from 3 (0.00% of 79839) affected shaders: (Navi48) Instrs: 700 -> 698 (-0.29%) CodeSize: 3860 -> 3852 (-0.21%) Latency: 2351 -> 2349 (-0.09%) Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36133>	2025-08-19 14:28:15 +00:00
Daniel Schürmann	4632ee4c37	aco/isel: rename emit_readfirstlane() -> emit_vector_as_uniform() Also allow using p_as_uniform and improve vector splitting. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36133>	2025-08-19 14:28:14 +00:00
Daniel Schürmann	26595577b3	aco/isel: allow for large 8-bit vectors in extract_8_16_bit_sgpr_element() Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36133>	2025-08-19 14:28:14 +00:00
Georg Lehmann	9ed94371f7	amd: stop using custom gl_access_qualifier for access type Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36764>	2025-08-15 08:26:10 +00:00
Georg Lehmann	f17cb6b714	amd: replace ACCESS_TYPE_SMEM with ACCESS_SMEM_AMD Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36764>	2025-08-15 08:26:10 +00:00
Georg Lehmann	fc53cf146c	aco: disable wqm for sampled buffer loads when not needed Foz-DB GFX1201: Totals from 318 (0.40% of 80287) affected shaders: Instrs: 313039 -> 314064 (+0.33%); split: -0.00%, +0.33% CodeSize: 1684104 -> 1688212 (+0.24%); split: -0.00%, +0.24% VGPRs: 15120 -> 15144 (+0.16%) Latency: 2515023 -> 2518610 (+0.14%); split: -0.06%, +0.20% InvThroughput: 447468 -> 447615 (+0.03%); split: -0.02%, +0.05% VClause: 4866 -> 4914 (+0.99%) SClause: 6564 -> 6559 (-0.08%); split: -0.09%, +0.02% Copies: 23577 -> 23673 (+0.41%); split: -0.04%, +0.45% PreSGPRs: 16019 -> 16029 (+0.06%) VALU: 172157 -> 172143 (-0.01%) SALU: 52816 -> 53867 (+1.99%) Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35970>	2025-08-15 07:03:47 +00:00
Georg Lehmann	883b1ca364	aco: disable wqm for tex loads when not needed By only executing VMEM loads for lanes where the result is used, we can save bandwidth. The NIR pass only handles tex for now, but those are most common anyway. We can extend it to handle image/ssbo/ubo/global loads in the future. Foz-DB GFX1201: Totals from 32633 (40.66% of 80251) affected shaders: Instrs: 22635910 -> 23193509 (+2.46%); split: -0.00%, +2.46% CodeSize: 122880044 -> 125093428 (+1.80%); split: -0.00%, +1.81% VGPRs: 1481868 -> 1481712 (-0.01%) SpillSGPRs: 3877 -> 4301 (+10.94%); split: -0.52%, +11.45% Latency: 171480552 -> 171685219 (+0.12%); split: -0.18%, +0.30% InvThroughput: 24364743 -> 24373441 (+0.04%); split: -0.08%, +0.12% VClause: 388318 -> 388557 (+0.06%); split: -0.06%, +0.13% SClause: 774781 -> 776492 (+0.22%); split: -0.29%, +0.51% Copies: 1416586 -> 1541199 (+8.80%); split: -0.16%, +8.96% Branches: 419591 -> 419673 (+0.02%); split: -0.02%, +0.04% PreSGPRs: 1330303 -> 1416540 (+6.48%) PreVGPRs: 964864 -> 964863 (-0.00%) VALU: 12919601 -> 12920254 (+0.01%); split: -0.01%, +0.01% SALU: 2685402 -> 3224147 (+20.06%); split: -0.00%, +20.07% Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35970>	2025-08-15 07:03:46 +00:00
Georg Lehmann	7159fd21f8	aco: don't restrict vmem load scheduling by inserting p_end_wqm early Foz-DB GFX1201: Totals from 7 (0.01% of 80251) affected shaders: Instrs: 703 -> 729 (+3.70%) CodeSize: 4032 -> 4136 (+2.58%) Latency: 5840 -> 4715 (-19.26%) InvThroughput: 441 -> 405 (-8.16%) Copies: 61 -> 67 (+9.84%) PreSGPRs: 216 -> 218 (+0.93%) SALU: 93 -> 113 (+21.51%) When reordered after the next commit: Foz-DB GFX1201: Totals from 1609 (2.00% of 80251) affected shaders: MaxWaves: 47984 -> 47986 (+0.00%) Instrs: 1326847 -> 1332797 (+0.45%); split: -0.05%, +0.50% CodeSize: 7248720 -> 7275364 (+0.37%); split: -0.04%, +0.41% VGPRs: 74968 -> 75148 (+0.24%); split: -0.06%, +0.30% SpillSGPRs: 182 -> 184 (+1.10%) Latency: 10370602 -> 10172524 (-1.91%); split: -2.06%, +0.15% InvThroughput: 1446508 -> 1445920 (-0.04%); split: -0.11%, +0.06% VClause: 23567 -> 23559 (-0.03%); split: -0.35%, +0.32% SClause: 43143 -> 43203 (+0.14%); split: -0.52%, +0.66% Copies: 80948 -> 81622 (+0.83%); split: -0.32%, +1.16% Branches: 21599 -> 21727 (+0.59%) PreSGPRs: 69963 -> 70732 (+1.10%) VALU: 778968 -> 779024 (+0.01%); split: -0.02%, +0.03% SALU: 159797 -> 165329 (+3.46%); split: -0.01%, +3.47% Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35970>	2025-08-15 07:03:46 +00:00
Georg Lehmann	c1b29174b4	aco: use a smaller wqm section for strict_wqm sampling It's only important that the coordinate is created in WQM, the sample itself doesn't care. Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35970>	2025-08-15 07:03:46 +00:00
Georg Lehmann	11cee3d634	aco: use new disable_wqm for p_dual_src_export_gfx11 No Foz-DB changes on GFX1201. Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35970>	2025-08-15 07:03:46 +00:00

127 commits