fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-17 11:48:05 +02:00

Author	SHA1	Message	Date
Rhys Perry	7a09e4a740	aco: use correct addition opcodes in gfx6-8 RT prolog Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Fixes: `60dd9d797e` ("aco: Swizzle ray launch IDs in the RT prolog") Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39232>	2026-01-14 11:23:42 +00:00
Rhys Perry	da728d5a1a	aco: micro-optimize ray launch ID swizzling Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39232>	2026-01-14 11:23:42 +00:00
Natalie Vock	473cf6046a	aco/spill_preserved: Preserve linear VGPRs even if they aren't p_spill operands Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39157>	2026-01-12 21:46:50 +00:00
Natalie Vock	1ef2691221	aco/spill: Fix preserved reload operand update p_logical_end is actually after p_reload_preserved, so this didn't do anything. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39157>	2026-01-12 21:46:50 +00:00
Natalie Vock	548062f10e	aco/insert_waitcnt: Don't determine linearity by reg number VGPRs can be linear too, and RT function calls will add VMEM instructions acting on linear VGPRs. Using the linear VGPR in a block with only linear preds will cause the pass to incorrectly skip inserting s_waitcnt. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39157>	2026-01-12 21:46:50 +00:00
Natalie Vock	7c12603933	aco/lower_to_hw_instr: Preserve linearity of lowered linear VGPRs So subsequent passes like waitcnt insertion can know these are linear. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39157>	2026-01-12 21:46:50 +00:00
Natalie Vock	0d93e8ce54	aco: Don't insert p_reload_preserved in loops This can't work. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39157>	2026-01-12 21:46:50 +00:00
Natalie Vock	c816f699b2	aco/spill_preserved: Only reload linear VGPRs at end Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39157>	2026-01-12 21:46:50 +00:00
Natalie Vock	897c95c37e	aco: Include arbitrarily fixed registers in max_reg_demand Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39157>	2026-01-12 21:46:50 +00:00
Georg Lehmann	daf235c607	aco/tests: don't destroy vk_device if it was never created Happens if you only run one test that doesn't need a vk_device. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39268>	2026-01-12 16:16:54 +00:00
Georg Lehmann	fad95030a7	aco/tests: test VALUMaskWriteHazard with v_cmpx Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39252>	2026-01-12 15:48:39 +00:00
Georg Lehmann	1d85552745	aco/tests: test VALUReadSGPRHazard with v_cmpx To avoid regressing this in a future rework. Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39252>	2026-01-12 15:48:39 +00:00
Georg Lehmann	3e10ab34e1	aco/insert_NOPs: explicitly wait for sa_sdst to resolve SALU -> VALU hazards The assumption that these waits are not required has been proven incorrect in at least some cases. Totals from 190 (0.24% of 79825) affected shaders: (Navi31) Instrs: 499718 -> 500491 (+0.15%) CodeSize: 2658228 -> 2661916 (+0.14%) Latency: 5964632 -> 5965453 (+0.01%); split: -0.00%, +0.01% InvThroughput: 794221 -> 794289 (+0.01%) Totals from 17093 (21.41% of 79839) affected shaders: (Navi48) Instrs: 22805214 -> 22854313 (+0.22%) CodeSize: 121240428 -> 121432904 (+0.16%); split: -0.00%, +0.16% Latency: 166500300 -> 166530529 (+0.02%); split: -0.00%, +0.02% InvThroughput: 28770053 -> 28772870 (+0.01%); split: -0.00%, +0.01% Fixes: `018f45f981` ("aco/insert_NOPs: remove redundant VALUReadSGPRHazard waits") Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14516 Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39252>	2026-01-12 15:48:38 +00:00
Konstantin Seurer	39d58a55a7	aco: Add support to f2f16 with rtpi/rtni Those rounding modes are needed when computing 16-bit bounding boxes since the bounding box must not get smaller. Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37883>	2026-01-10 11:34:12 +01:00
Natalie Vock	60dd9d797e	aco: Swizzle ray launch IDs in the RT prolog This converts from 1D workgroups to 2D ray launch IDs entirely via shader ALU, including handling partial/cut-off workgroups optimally. Doing this entirely in-shader means it Just Works(TM) with indirect dispatches as well. Previous approaches manipulating various things on CPU depending on the dispatch size couldn't handle indirect dispatches. The swizzle implemented here also swizzles with a recursive Z-order pattern, which should be a little more optimal than arranging invocations linearly within the wave. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39142>	2026-01-08 19:49:55 +01:00
Natalie Vock	1f6ac3fa93	radv/rt,aco: Always dispatch 1D workgroups for RT We will swizzle the workgroups ourselves in the next commit. Removes the need for 1D dispatch workarounds. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39142>	2026-01-08 19:49:54 +01:00
Georg Lehmann	eb4737a1dd	nir: add nir_alu_instr_is_exact helper Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39103>	2026-01-07 09:40:57 +00:00
Daniel Schürmann	2d0d5fc104	aco/validate: validate constant bus limit after register allocation based on PhysReg Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39107>	2026-01-05 14:54:00 +00:00
Daniel Schürmann	eb16f701a6	aco/tests: Add new test to pack 2x16 SGPRs into VGPR Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39107>	2026-01-05 14:54:00 +00:00
Daniel Schürmann	61c1ec541d	aco/tests: Add test for subdword extraction from SGPR Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39107>	2026-01-05 14:54:00 +00:00
Daniel Schürmann	0674c9d30e	aco/validate: Validate correct RegisterClasses after lowering to HW instructions Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39107>	2026-01-05 14:53:59 +00:00
Daniel Schürmann	b087bf2fbf	aco/lower_to_hw: Fix SGPR Operand RegClasses for pack_2x16 Cc: mesa-stable Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39107>	2026-01-05 14:53:59 +00:00
Daniel Schürmann	9f5996ae8a	aco/lower_to_hw: Don't use 2 SGPR operands before GFX10 in a single VOP3 instruction in do_pack_2x16() Cc: mesa-stable Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39107>	2026-01-05 14:53:58 +00:00
Daniel Schürmann	d8481fd7cc	aco/lower_to_hw: Fix SGPR Operand RegClasses of subdword copies Extracting from an SGPR could cause a wrong RegClass on the operand which could later lead to selecting VOPD instructions which falsely operate on the corresponding VGPR. Cc: mesa-stable Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39107>	2026-01-05 14:53:58 +00:00
Georg Lehmann	0c42141299	aco: allow opsel for last v_alignbyte/bit operand For completeness' sake. Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13285 Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39061>	2025-12-31 08:58:24 +00:00
Daniel Schürmann	7b1f6fa6fc	aco: remove radeon_family from aco::Program Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38701>	2025-12-22 07:34:48 +00:00
Daniel Schürmann	1e8d367537	amd: add and use ac_cu_info::has_vtx_format_alpha_adjust_bug Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38701>	2025-12-22 07:34:48 +00:00
Daniel Schürmann	febc29907c	amd: add and use ac_cu_info::has_gfx6_mrt_export_bug Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38701>	2025-12-22 07:34:47 +00:00
Daniel Schürmann	7b7bdb76ab	amd: add ac_cu_info::has_point_sample_accel flag and use in ACO Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38701>	2025-12-22 07:34:47 +00:00
Daniel Schürmann	cfb745592d	amd: add ac_cu_info::has_mad32 flag and use in ACO Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38701>	2025-12-22 07:34:47 +00:00
Daniel Schürmann	1e3db50170	aco: use additional flags from ac_cu_info Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38701>	2025-12-22 07:34:46 +00:00
Daniel Schürmann	f791e46c47	aco: add ac_cu_info to aco_compiler_options Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38701>	2025-12-22 07:34:46 +00:00
Daniel Schürmann	addd4ea59f	aco: pass aco_compiler_options to init_program() Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38701>	2025-12-22 07:34:46 +00:00
Daniel Schürmann	bf9bec07c2	aco/tests: don't pass CHIP_UNKNOWN to ACO Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38701>	2025-12-22 07:34:46 +00:00
Daniel Schürmann	0db1ae1f01	aco: disable XNACK on all GPUs Affects code generation on GFX8 and GFX9 APUs where we misunderstood the feature. XNACK replay is not being used with graphics APIs. Totals from 41759 (65.90% of 63370) affected shaders: (Raven) MaxWaves: 298672 -> 299000 (+0.11%) Instrs: 19200726 -> 19138227 (-0.33%); split: -0.33%, +0.00% CodeSize: 98501904 -> 98253196 (-0.25%); split: -0.26%, +0.00% SGPRs: 3058544 -> 2831492 (-7.42%) VGPRs: 1644896 -> 1643660 (-0.08%) Latency: 193383803 -> 193224047 (-0.08%); split: -0.08%, +0.00% InvThroughput: 92741082 -> 92698975 (-0.05%); split: -0.05%, +0.00% SClause: 678580 -> 630107 (-7.14%); split: -7.15%, +0.00% Copies: 1863375 -> 1863406 (+0.00%); split: -0.04%, +0.04% VALU: 13791245 -> 13791267 (+0.00%); split: -0.00%, +0.00% SALU: 2066726 -> 2066741 (+0.00%); split: -0.04%, +0.04% Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38701>	2025-12-22 07:34:43 +00:00
Georg Lehmann	0478021fdc	aco/optimizer: reassociate rcp(mul(a, const)) into rcp_omod(a) Foz-DB Navi48: Totals from 2484 (2.54% of 97637) affected shaders: Instrs: 10368279 -> 10361892 (-0.06%); split: -0.06%, +0.00% CodeSize: 55161104 -> 55150752 (-0.02%); split: -0.02%, +0.00% SpillSGPRs: 14665 -> 14666 (+0.01%) Latency: 87694014 -> 87689324 (-0.01%); split: -0.01%, +0.00% InvThroughput: 16595764 -> 16594448 (-0.01%); split: -0.01%, +0.00% VClause: 209922 -> 209918 (-0.00%); split: -0.01%, +0.00% SClause: 205195 -> 205251 (+0.03%); split: -0.01%, +0.04% Copies: 843771 -> 843765 (-0.00%); split: -0.01%, +0.01% Branches: 275985 -> 275962 (-0.01%); split: -0.01%, +0.00% PreVGPRs: 170608 -> 170494 (-0.07%) VALU: 5840893 -> 5838038 (-0.05%); split: -0.05%, +0.00% SALU: 1481388 -> 1479037 (-0.16%); split: -0.16%, +0.00% VOPD: 7496 -> 7485 (-0.15%) Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38730>	2025-12-17 08:41:32 +00:00
Georg Lehmann	a8f5ced670	aco/optimizer: reassociate mul(mul(a, const), b) into mul_omod(a, b) Foz-DB Navi48: Totals from 14608 (14.96% of 97637) affected shaders: MaxWaves: 364201 -> 364421 (+0.06%) Instrs: 28051720 -> 28022503 (-0.10%); split: -0.13%, +0.03% CodeSize: 148938740 -> 148943480 (+0.00%); split: -0.04%, +0.04% VGPRs: 994520 -> 994004 (-0.05%); split: -0.05%, +0.00% SpillSGPRs: 45182 -> 45179 (-0.01%) Latency: 187734461 -> 187725301 (-0.00%); split: -0.07%, +0.06% InvThroughput: 33967002 -> 33949881 (-0.05%); split: -0.11%, +0.06% VClause: 495237 -> 495207 (-0.01%); split: -0.03%, +0.02% Copies: 2048324 -> 2047937 (-0.02%); split: -0.12%, +0.10% Branches: 598445 -> 598431 (-0.00%); split: -0.01%, +0.01% PreSGPRs: 877715 -> 877684 (-0.00%) PreVGPRs: 778146 -> 776383 (-0.23%); split: -0.23%, +0.00% VALU: 16413380 -> 16391508 (-0.13%); split: -0.15%, +0.01% SALU: 3685279 -> 3677655 (-0.21%); split: -0.23%, +0.02% VOPD: 26219 -> 25926 (-1.12%); split: +0.43%, -1.55% Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38730>	2025-12-17 08:41:31 +00:00
Alyssa Rosenzweig	079e9ae606	treewide: use BITSET_*_COUNT Mix of Coccinelle patch, manual fix ups, sed, etc. Probably best to review the diff as-if hand written: Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Reviewed-by: Mel Henning <mhenning@darkrefraction.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38955>	2025-12-16 17:42:10 +00:00
Timur Kristóf	f001515c87	aco: Use only VGPR offset on buffer atomics on GFX6-7 SGPR offset is not included in the bounds check according to the ISA documentation of GFX6-7 and indeed it can trigger VM faults on OOB access. Note that ACO already doesn't use the SGPR offset on GFX6-7 for buffer loads and stores. This commit just does the same for buffer atomics. This commit mitigates a ton of VM faults that are exposed by: `24e75fea4b` Fossil DB stats on Hawaii (GFX7): Totals from 148 (0.24% of 61818) affected shaders: Instrs: 324004 -> 327352 (+1.03%) CodeSize: 1556468 -> 1514100 (-2.72%); split: -2.74%, +0.02% Latency: 1271480 -> 1276894 (+0.43%) InvThroughput: 396850 -> 397740 (+0.22%) VClause: 6861 -> 6858 (-0.04%) Copies: 34083 -> 37430 (+9.82%) PreVGPRs: 5705 -> 5706 (+0.02%) VALU: 147529 -> 150898 (+2.28%) SALU: 98194 -> 98172 (-0.02%) Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38958>	2025-12-15 21:03:19 +00:00
Georg Lehmann	a2b70ce4ec	aco/isel: remove uniform reduce/scan optimization This is now done in NIR, with the exception of exclusive min/max/and/or scans. But those are not really useful, and if we ever come across them we can optimize them in NIR using write_invocation_amd. No Foz-DB changes on Navi21. Acked-by: Marek Olšák <marek.olsak@amd.com> Acked-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38902>	2025-12-15 12:22:32 +00:00
Georg Lehmann	17e597093d	radv: eliminate unused FS output channels For formats that don't have all color channels, there is no reason to output all of them. Games often write to R only or RGB formats with non trivial remaining channels. Foz-DB Navi21: Totals from 10270 (10.55% of 97347) affected shaders: MaxWaves: 249166 -> 250950 (+0.72%); split: +0.73%, -0.01% Instrs: 8442016 -> 8354715 (-1.03%); split: -1.05%, +0.01% CodeSize: 45939644 -> 45487156 (-0.98%); split: -1.01%, +0.02% VGPRs: 472584 -> 463784 (-1.86%); split: -1.98%, +0.12% SpillSGPRs: 1502 -> 1448 (-3.60%) LDS: 6024192 -> 6011904 (-0.20%) Inputs: 42463 -> 41773 (-1.62%) Outputs: 24601 -> 23955 (-2.63%) Latency: 78011745 -> 77653907 (-0.46%); split: -0.56%, +0.10% InvThroughput: 19767826 -> 19274046 (-2.50%); split: -2.53%, +0.03% VClause: 177891 -> 176681 (-0.68%); split: -0.80%, +0.12% SClause: 236784 -> 235324 (-0.62%); split: -0.72%, +0.10% Copies: 621048 -> 616096 (-0.80%); split: -1.03%, +0.23% Branches: 202608 -> 201811 (-0.39%); split: -0.44%, +0.05% PreSGPRs: 441032 -> 437698 (-0.76%); split: -0.77%, +0.01% PreVGPRs: 378067 -> 369564 (-2.25%); split: -2.26%, +0.01% VALU: 5906415 -> 5833179 (-1.24%); split: -1.25%, +0.01% SALU: 973428 -> 968088 (-0.55%); split: -0.61%, +0.06% VMEM: 298277 -> 296504 (-0.59%); split: -0.61%, +0.01% SMEM: 402244 -> 399612 (-0.65%); split: -0.71%, +0.06% Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38853>	2025-12-12 17:00:51 +00:00
Georg Lehmann	072815e5cb	aco/gfx6: move mrtz writemask workaround to assembler and handle all mrt Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38853>	2025-12-12 17:00:51 +00:00
Rhys Perry	156ae6195e	aco: print large p_parallelcopy using several lines Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Reviewed-by: Emre Cecanpunar <emreleno@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38695>	2025-12-11 16:51:21 +00:00
Rhys Perry	21414e0898	aco/ra: add first loop header phi operand to temp_to_phi_resources If the first operand is a CSSA copy, we might want to add this to temp_to_phi_resources, so that we later mark it as the last-seen phi operand. fossil-db (navi31): Totals from 284 (0.36% of 79825) affected shaders: Instrs: 4160233 -> 4157517 (-0.07%); split: -0.09%, +0.03% CodeSize: 21546420 -> 21532884 (-0.06%); split: -0.09%, +0.02% VGPRs: 31404 -> 31416 (+0.04%) Latency: 40266308 -> 40253731 (-0.03%); split: -0.06%, +0.02% InvThroughput: 8140751 -> 8139724 (-0.01%); split: -0.05%, +0.04% VClause: 99849 -> 99835 (-0.01%); split: -0.02%, +0.01% Copies: 344512 -> 341732 (-0.81%); split: -1.08%, +0.28% Branches: 113620 -> 113629 (+0.01%); split: -0.02%, +0.03% VALU: 2502619 -> 2499836 (-0.11%); split: -0.15%, +0.04% SALU: 499245 -> 499341 (+0.02%); split: -0.02%, +0.04% Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Reviewed-by: Emre Cecanpunar <emreleno@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38695>	2025-12-11 16:51:21 +00:00
Rhys Perry	43b3901362	aco/ra: copy vector_info to affinities This eliminates some copies in BVH traversal loops. fossil-db (navi31): Totals from 200 (0.25% of 79825) affected shaders: Instrs: 734931 -> 732521 (-0.33%); split: -0.34%, +0.01% CodeSize: 3801080 -> 3791692 (-0.25%); split: -0.26%, +0.01% VGPRs: 13704 -> 13728 (+0.18%); split: -0.44%, +0.61% Latency: 6094605 -> 6082060 (-0.21%); split: -0.24%, +0.03% InvThroughput: 1081982 -> 1080121 (-0.17%); split: -0.19%, +0.02% VClause: 18835 -> 18837 (+0.01%); split: -0.01%, +0.02% Copies: 64602 -> 62239 (-3.66%); split: -3.75%, +0.09% Branches: 20111 -> 20112 (+0.00%); split: -0.01%, +0.02% VALU: 438618 -> 436257 (-0.54%); split: -0.55%, +0.01% SALU: 85092 -> 85085 (-0.01%); split: -0.01%, +0.00% VOPD: 76 -> 74 (-2.63%) Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Reviewed-by: Emre Cecanpunar <emreleno@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38695>	2025-12-11 16:51:21 +00:00
Georg Lehmann	ef246aaf72	aco/isel: emit register copies for workgroup ids This way, we don't overestimate SGPR pressure. Foz-DB Navi48: Totals from 1413 (1.45% of 97637) affected shaders: Instrs: 3468375 -> 3468585 (+0.01%); split: -0.01%, +0.02% CodeSize: 18643064 -> 18643520 (+0.00%); split: -0.01%, +0.01% VGPRs: 71776 -> 71788 (+0.02%) SpillSGPRs: 18575 -> 18561 (-0.08%) Latency: 23207643 -> 23207998 (+0.00%); split: -0.00%, +0.01% InvThroughput: 8116806 -> 8116541 (-0.00%); split: -0.01%, +0.00% VClause: 75250 -> 75252 (+0.00%); split: -0.00%, +0.00% SClause: 65274 -> 65283 (+0.01%); split: -0.02%, +0.04% Copies: 275750 -> 275942 (+0.07%); split: -0.03%, +0.10% PreSGPRs: 70246 -> 69072 (-1.67%) VALU: 1892111 -> 1892092 (-0.00%); split: -0.00%, +0.00% SALU: 523460 -> 523648 (+0.04%); split: -0.02%, +0.05% VOPD: 41097 -> 41102 (+0.01%) Sadly the RA noise is slightly negative for instruction count. Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38830>	2025-12-11 08:06:59 +00:00
Georg Lehmann	839a035564	aco/optimizer: propagate fixed regs to copy/extract/insert Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38830>	2025-12-11 08:06:58 +00:00
Georg Lehmann	d32dd5e1df	aco/optimizer: propagate fixed registers Foz-DB Navi48: Totals from 351 (0.36% of 97637) affected shaders: Instrs: 3568192 -> 3567166 (-0.03%) CodeSize: 18890368 -> 18886304 (-0.02%) Latency: 17047945 -> 17048185 (+0.00%); split: -0.00%, +0.00% InvThroughput: 3185739 -> 3185813 (+0.00%); split: -0.00%, +0.00% SClause: 61544 -> 61536 (-0.01%) Copies: 271592 -> 270845 (-0.28%) PreSGPRs: 17186 -> 17094 (-0.54%) PreVGPRs: 21897 -> 21901 (+0.02%) VALU: 2003976 -> 2003980 (+0.00%) SALU: 468403 -> 467664 (-0.16%) Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38830>	2025-12-11 08:06:58 +00:00
Georg Lehmann	b798ace443	aco/optimizer: fix skip_smem_offset_align with non temp register operands Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38830>	2025-12-11 08:06:58 +00:00
Georg Lehmann	911e1ce168	aco/isel: emit exec copy for ballot(true) Once copy propagated in the optimizer, this will allow using nir_opt_uniform_subgroup without too many regressions. Foz-DB Navi48: Totals from 405 (0.41% of 97637) affected shaders: Instrs: 3796716 -> 3796894 (+0.00%); split: -0.00%, +0.00% CodeSize: 20116136 -> 20116652 (+0.00%); split: -0.00%, +0.00% Latency: 18326661 -> 18327114 (+0.00%); split: -0.00%, +0.00% InvThroughput: 3353206 -> 3353268 (+0.00%); split: -0.00%, +0.00% Copies: 292307 -> 293830 (+0.52%) SALU: 507523 -> 507738 (+0.04%) Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38830>	2025-12-11 08:06:58 +00:00
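
Commit `60dd9d797e` above describes mapping a 1D invocation index to a 2D ray launch ID with a recursive Z-order pattern. The sketch below only illustrates what such a Z-order (Morton) decode looks like in general. It is not the ACO RT prolog code, it ignores the partial/cut-off workgroup handling the commit mentions, and the names `compact_bits` and `morton_decode_2d` are hypothetical.

```python
def compact_bits(v: int) -> int:
    """Gather the even-position bits of a 32-bit value into the low 16 bits."""
    v &= 0x55555555                      # keep bits 0, 2, 4, ...
    v = (v | (v >> 1)) & 0x33333333      # pack into groups of 2 bits
    v = (v | (v >> 2)) & 0x0F0F0F0F      # pack into groups of 4 bits
    v = (v | (v >> 4)) & 0x00FF00FF      # pack into groups of 8 bits
    v = (v | (v >> 8)) & 0x0000FFFF      # pack into the low 16 bits
    return v


def morton_decode_2d(index: int) -> tuple[int, int]:
    """Split a linear index into (x, y): even bits form x, odd bits form y."""
    return compact_bits(index), compact_bits(index >> 1)


if __name__ == "__main__":
    # Walk the first 16 indices; they cover a 4x4 tile in Z-order.
    for i in range(16):
        print(i, morton_decode_2d(i))
```

With this decode, consecutive indices stay within small square tiles (2x2, then 4x4, and so on) rather than running along a single row, which is the locality property the commit message refers to when it compares Z-order against a linear arrangement.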

4161 commits