fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-18 05:08:06 +02:00

Author	SHA1	Message	Date
Daniel Schürmann	37299a8d1a	aco/scheduler: Stop downwards scheduling after encountering the first clause Totals from 9899 (12.40% of 79839) affected shaders: (Navi48) MaxWaves: 276355 -> 276317 (-0.01%); split: +0.01%, -0.02% Instrs: 8781768 -> 8766504 (-0.17%); split: -0.25%, +0.07% CodeSize: 46297556 -> 46236104 (-0.13%); split: -0.19%, +0.06% VGPRs: 574680 -> 574800 (+0.02%); split: -0.00%, +0.03% Latency: 54261324 -> 54357916 (+0.18%); split: -0.14%, +0.32% InvThroughput: 9122700 -> 9121115 (-0.02%); split: -0.07%, +0.05% VClause: 222062 -> 218499 (-1.60%); split: -2.33%, +0.73% SClause: 167138 -> 163233 (-2.34%); split: -2.43%, +0.09% Copies: 602395 -> 598560 (-0.64%); split: -1.21%, +0.57% Branches: 161939 -> 161932 (-0.00%); split: -0.01%, +0.00% VALU: 5063999 -> 5060199 (-0.08%); split: -0.14%, +0.07% SALU: 988254 -> 988285 (+0.00%); split: -0.02%, +0.02% VOPD: 2478 -> 2443 (-1.41%); split: +0.40%, -1.82% Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36599>	2025-08-19 16:59:09 +00:00
Daniel Schürmann	fb6b95517e	aco/scheduler: check dependencies of entire clause upfront and bail if any instruction of the clause can't be moved. Totals from 4310 (5.40% of 79839) affected shaders: MaxWaves: 115826 -> 115834 (+0.01%) Instrs: 6256436 -> 6257599 (+0.02%); split: -0.05%, +0.07% CodeSize: 32816488 -> 32820768 (+0.01%); split: -0.04%, +0.05% VGPRs: 260184 -> 260172 (-0.00%) Latency: 41207213 -> 41052150 (-0.38%); split: -0.45%, +0.07% InvThroughput: 6822608 -> 6815208 (-0.11%); split: -0.14%, +0.03% VClause: 148412 -> 147133 (-0.86%); split: -1.03%, +0.17% SClause: 120854 -> 120856 (+0.00%); split: -0.01%, +0.01% Copies: 425910 -> 427276 (+0.32%); split: -0.25%, +0.57% VALU: 3572293 -> 3573647 (+0.04%); split: -0.03%, +0.07% VOPD: 2803 -> 2816 (+0.46%) Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36599>	2025-08-19 16:59:08 +00:00
Daniel Schürmann	7e63251d1f	aco/isel: refactor store_shared() by directly matching NIR intrinsics to ACO opcodes Totals from 1435 (1.80% of 79839) affected shaders: (Navi48) Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36133>	2025-08-19 14:28:15 +00:00
Daniel Schürmann	1fde289539	aco/isel: refactor load_shared() by directly matching NIR intrinsics to ACO opcodes Totals from 3 (0.00% of 79839) affected shaders: (Navi48) Instrs: 700 -> 698 (-0.29%) CodeSize: 3860 -> 3852 (-0.21%) Latency: 2351 -> 2349 (-0.09%) Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36133>	2025-08-19 14:28:15 +00:00
Daniel Schürmann	4632ee4c37	aco/isel: rename emit_readfirstlane() -> emit_vector_as_uniform() Also allow to use p_as_uniform and improve vector splitting. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36133>	2025-08-19 14:28:14 +00:00
Daniel Schürmann	26595577b3	aco/isel: allow for large 8-bit vectors in extract_8_16_bit_sgpr_element() Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36133>	2025-08-19 14:28:14 +00:00
Georg Lehmann	9ed94371f7	amd: stop using custom gl_access_qualifier for access type Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36764>	2025-08-15 08:26:10 +00:00
Georg Lehmann	f17cb6b714	amd: replace ACCESS_TYPE_SMEM with ACCESS_SMEM_AMD Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36764>	2025-08-15 08:26:10 +00:00
Georg Lehmann	6ba462bf26	aco/disable_wqm: optimize local mask creation Foz-DB Navi48: Totals from 7861 (9.79% of 80287) affected shaders: Instrs: 13276809 -> 13183483 (-0.70%) CodeSize: 71221260 -> 70852500 (-0.52%); split: -0.52%, +0.00% Latency: 124001421 -> 123976480 (-0.02%); split: -0.02%, +0.00% InvThroughput: 17820119 -> 17817551 (-0.01%); split: -0.01%, +0.00% SALU: 1736356 -> 1666673 (-4.01%) Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35970>	2025-08-15 07:03:47 +00:00
Georg Lehmann	fc53cf146c	aco: disable wqm for sampled buffer loads when not needed Foz-DB GFX1201: Totals from 318 (0.40% of 80287) affected shaders: Instrs: 313039 -> 314064 (+0.33%); split: -0.00%, +0.33% CodeSize: 1684104 -> 1688212 (+0.24%); split: -0.00%, +0.24% VGPRs: 15120 -> 15144 (+0.16%) Latency: 2515023 -> 2518610 (+0.14%); split: -0.06%, +0.20% InvThroughput: 447468 -> 447615 (+0.03%); split: -0.02%, +0.05% VClause: 4866 -> 4914 (+0.99%) SClause: 6564 -> 6559 (-0.08%); split: -0.09%, +0.02% Copies: 23577 -> 23673 (+0.41%); split: -0.04%, +0.45% PreSGPRs: 16019 -> 16029 (+0.06%) VALU: 172157 -> 172143 (-0.01%) SALU: 52816 -> 53867 (+1.99%) Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35970>	2025-08-15 07:03:47 +00:00
Georg Lehmann	883b1ca364	aco: disable wqm for tex loads when not needed By only executing VMEM loads for lanes where the result is used, we can save bandwidth. The NIR pass only handles tex for now, but those are most common anyway. We can extend it to handle image/ssbo/ubo/global loads in the future. Foz-DB GFX1201: Totals from 32633 (40.66% of 80251) affected shaders: Instrs: 22635910 -> 23193509 (+2.46%); split: -0.00%, +2.46% CodeSize: 122880044 -> 125093428 (+1.80%); split: -0.00%, +1.81% VGPRs: 1481868 -> 1481712 (-0.01%) SpillSGPRs: 3877 -> 4301 (+10.94%); split: -0.52%, +11.45% Latency: 171480552 -> 171685219 (+0.12%); split: -0.18%, +0.30% InvThroughput: 24364743 -> 24373441 (+0.04%); split: -0.08%, +0.12% VClause: 388318 -> 388557 (+0.06%); split: -0.06%, +0.13% SClause: 774781 -> 776492 (+0.22%); split: -0.29%, +0.51% Copies: 1416586 -> 1541199 (+8.80%); split: -0.16%, +8.96% Branches: 419591 -> 419673 (+0.02%); split: -0.02%, +0.04% PreSGPRs: 1330303 -> 1416540 (+6.48%) PreVGPRs: 964864 -> 964863 (-0.00%) VALU: 12919601 -> 12920254 (+0.01%); split: -0.01%, +0.01% SALU: 2685402 -> 3224147 (+20.06%); split: -0.00%, +20.07% Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35970>	2025-08-15 07:03:46 +00:00
Georg Lehmann	7159fd21f8	aco: don't restrict vmem load scheduling by inserting p_end_wqm early Foz-DB GFX1201: Totals from 7 (0.01% of 80251) affected shaders: Instrs: 703 -> 729 (+3.70%) CodeSize: 4032 -> 4136 (+2.58%) Latency: 5840 -> 4715 (-19.26%) InvThroughput: 441 -> 405 (-8.16%) Copies: 61 -> 67 (+9.84%) PreSGPRs: 216 -> 218 (+0.93%) SALU: 93 -> 113 (+21.51%) When reordered after the next commit: Foz-DB GFX1201: Totals from 1609 (2.00% of 80251) affected shaders: MaxWaves: 47984 -> 47986 (+0.00%) Instrs: 1326847 -> 1332797 (+0.45%); split: -0.05%, +0.50% CodeSize: 7248720 -> 7275364 (+0.37%); split: -0.04%, +0.41% VGPRs: 74968 -> 75148 (+0.24%); split: -0.06%, +0.30% SpillSGPRs: 182 -> 184 (+1.10%) Latency: 10370602 -> 10172524 (-1.91%); split: -2.06%, +0.15% InvThroughput: 1446508 -> 1445920 (-0.04%); split: -0.11%, +0.06% VClause: 23567 -> 23559 (-0.03%); split: -0.35%, +0.32% SClause: 43143 -> 43203 (+0.14%); split: -0.52%, +0.66% Copies: 80948 -> 81622 (+0.83%); split: -0.32%, +1.16% Branches: 21599 -> 21727 (+0.59%) PreSGPRs: 69963 -> 70732 (+1.10%) VALU: 778968 -> 779024 (+0.01%); split: -0.02%, +0.03% SALU: 159797 -> 165329 (+3.46%); split: -0.01%, +3.47% Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35970>	2025-08-15 07:03:46 +00:00
Georg Lehmann	c1b29174b4	aco: use a smaller wqm section for strict_wqm sampling It's only important that the coordinate is created in WQM, the sample itself doesn't care. Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35970>	2025-08-15 07:03:46 +00:00
Georg Lehmann	de4b345949	aco/insert_exec: remove per instruction wqm/exact exec handling No Foz-DB changes on GFX1201. Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35970>	2025-08-15 07:03:46 +00:00
Georg Lehmann	11cee3d634	aco: use new disable_wqm for p_dual_src_export_gfx11 No Foz-DB changes on GFX1201. Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35970>	2025-08-15 07:03:46 +00:00
Georg Lehmann	8e53ba9a0a	aco: use new disable_wqm for exp No Foz-DB changes on GFX1201. Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35970>	2025-08-15 07:03:46 +00:00
Georg Lehmann	bd6647e21e	aco/builder: support new disable_wqm Create the additional undef operands that are filled by insert_exec. Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35970>	2025-08-15 07:03:46 +00:00
Georg Lehmann	0e66f2b2cc	aco: use new disable_wqm for mimg Foz-DB GFX1201: Totals from 88 (0.11% of 80251) affected shaders: Instrs: 81954 -> 82218 (+0.32%); split: -0.02%, +0.34% CodeSize: 451824 -> 452880 (+0.23%); split: -0.02%, +0.25% Latency: 308818 -> 308746 (-0.02%); split: -0.05%, +0.02% VClause: 1324 -> 1318 (-0.45%) Copies: 2795 -> 2784 (-0.39%) PreSGPRs: 4029 -> 4035 (+0.15%) SALU: 6563 -> 6809 (+3.75%); split: -0.15%, +3.90% Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35970>	2025-08-15 07:03:46 +00:00
Georg Lehmann	922f559c3c	aco: use new disable_wqm for flatlike No Foz-DB changes on GFX1201. Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35970>	2025-08-15 07:03:46 +00:00
Georg Lehmann	a4c537c5b3	aco: use new disable_wqm for mubuf/mtbuf Foz-DB GFX1201: Totals from 66 (0.08% of 80251) affected shaders: Instrs: 45373 -> 45663 (+0.64%); split: -0.01%, +0.65% CodeSize: 251708 -> 252900 (+0.47%); split: -0.00%, +0.48% Latency: 278977 -> 278652 (-0.12%); split: -0.14%, +0.02% InvThroughput: 38259 -> 38245 (-0.04%); split: -0.05%, +0.02% VClause: 982 -> 962 (-2.04%) Copies: 2882 -> 2808 (-2.57%) PreSGPRs: 2564 -> 2599 (+1.37%) SALU: 4748 -> 5010 (+5.52%) Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35970>	2025-08-15 07:03:46 +00:00
Georg Lehmann	63af48ae2e	aco/insert_exec: new way to handle instructions that need wqm disabled Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35970>	2025-08-15 07:03:46 +00:00
Georg Lehmann	ca25553b92	aco: add a post-RA pass to disable wqm By disabling WQM post-RA, we don't have RA/spilling mov issues with image_sample operands that need to be computed in WQM. We also don't restrict scheduling by inserting exec writes. The only downside is more scalar ALU usage, but the SALU is almost always underutilized. Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35970>	2025-08-15 07:03:46 +00:00
Georg Lehmann	34b154866f	aco/insert_exec: remove p_jump_to_epilog from needs exact p_end_wqm will always be emitted before it by isel. No Foz-DB changes on GFX1201. Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35970>	2025-08-15 07:03:46 +00:00
Yonggang Luo	fc1b26f4dc	aco: Fixes warning note: ambiguity is between a regular call to this operator and a call with the argument order reversed ../../src/amd/compiler/aco_util.h:300:9: note: ambiguity is between a regular call to this operator and a call with the argument order reversed 300 | bool operator==(const monotonic_buffer_resource& other) { return buffer == other.buffer; } Signed-off-by: Yonggang Luo <luoyonggang@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36722>	2025-08-13 19:49:37 +00:00
Rhys Perry	08f088479a	aco/ra: set late-kill for operands of temporary p_create_vector Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13543 Fixes: `c279dd6e61` ("aco: Support vector-aligned ops fixed to defs") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36469>	2025-08-06 09:44:01 +00:00
Daniel Schürmann	d3743dd7ba	aco/scheduler: improve scheduling heuristic The heuristic we are currently using still stems from the GCN era, with the only adjustment made for RDNA being to double (or triple) the wave count. This rewrite aims to detangle some concepts and provide more consistent results. - wave_factor: The purpose of this value is to reflect that RDNA SIMDs can accommodate twice as many waves as GCN SIMDs. - reg_file_multiple: This value accounts for the larger register file of wave32 and some RDNA3 families. - wave_minimum: Below this value, we don't sacrifice any waves. It corresponds to a register demand of 64 VGPRs in wave64. - occupancy_factor: Depending on target_waves and wave_factor, this controls the scheduling window sizes and number of moves. The main differences from the previous heuristic are a lower wave minimum and a slightly less aggressive reduction of waves. It also increases SMEM_MAX_MOVES in order to mitigate some of the changes from targeting fewer waves. Totals from 62777 (78.63% of 79839) affected shaders: (Navi48) MaxWaves: 1880983 -> 1848028 (-1.75%); split: +0.01%, -1.76% Instrs: 40904711 -> 40800797 (-0.25%); split: -0.39%, +0.14% CodeSize: 217132208 -> 216748832 (-0.18%); split: -0.29%, +0.12% VGPRs: 3019304 -> 3099596 (+2.66%); split: -0.11%, +2.77% Latency: 268857129 -> 265951122 (-1.08%); split: -1.33%, +0.25% InvThroughput: 40960938 -> 41044533 (+0.20%); split: -0.18%, +0.39% VClause: 794000 -> 782913 (-1.40%); split: -2.24%, +0.84% SClause: 1192476 -> 1150831 (-3.49%); split: -3.94%, +0.45% Copies: 2720470 -> 2700148 (-0.75%); split: -1.84%, +1.09% Branches: 785926 -> 785951 (+0.00%); split: -0.01%, +0.01% VALU: 22918411 -> 22890189 (-0.12%); split: -0.19%, +0.06% SALU: 5281201 -> 5289486 (+0.16%); split: -0.21%, +0.36% VOPD: 8790 -> 8685 (-1.19%); split: +1.08%, -2.28% Totals from 62081 (77.77% of 79825) affected shaders: (Navi31) MaxWaves: 1848555 -> 1812347 (-1.96%); split: +0.01%, -1.97% Instrs: 39794460 -> 39704180 (-0.23%); split: -0.39%, +0.16% CodeSize: 208987052 -> 208621524 (-0.17%); split: -0.31%, +0.13% VGPRs: 3046284 -> 3135156 (+2.92%); split: -0.11%, +3.03% Latency: 268863465 -> 265218186 (-1.36%); split: -1.59%, +0.23% InvThroughput: 41101515 -> 41167075 (+0.16%); split: -0.22%, +0.38% VClause: 795316 -> 774899 (-2.57%); split: -3.17%, +0.61% SClause: 1177294 -> 1135451 (-3.55%); split: -4.06%, +0.51% Copies: 2743254 -> 2725127 (-0.66%); split: -1.90%, +1.24% Branches: 801395 -> 801428 (+0.00%); split: -0.01%, +0.02% VALU: 23898938 -> 23871294 (-0.12%); split: -0.20%, +0.08% SALU: 3908807 -> 3919130 (+0.26%); split: -0.23%, +0.50% VOPD: 8529 -> 8500 (-0.34%); split: +1.29%, -1.63% Totals from 44996 (71.01% of 63370) affected shaders: (Vega10) MaxWaves: 307074 -> 304808 (-0.74%); split: +0.63%, -1.37% Instrs: 22743534 -> 22716240 (-0.12%); split: -0.22%, +0.10% CodeSize: 117284856 -> 117173212 (-0.10%); split: -0.19%, +0.09% SGPRs: 3249008 -> 3330480 (+2.51%); split: -0.36%, +2.87% VGPRs: 1901400 -> 1943880 (+2.23%); split: -0.60%, +2.83% Latency: 224839126 -> 222878477 (-0.87%); split: -1.19%, +0.31% InvThroughput: 114389570 -> 114316559 (-0.06%); split: -0.17%, +0.11% VClause: 482012 -> 473304 (-1.81%); split: -2.86%, +1.05% SClause: 757799 -> 717092 (-5.37%); split: -5.64%, +0.27% Copies: 2182735 -> 2183598 (+0.04%); split: -1.17%, +1.21% Branches: 396026 -> 395996 (-0.01%); split: -0.03%, +0.02% VALU: 16740283 -> 16728098 (-0.07%); split: -0.14%, +0.07% SALU: 2133575 -> 2145863 (+0.58%); split: -0.29%, +0.86% Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30720>	2025-08-06 09:16:33 +00:00
Qiang Yu	196569b1a4	all: rename gl_shader_stage to mesa_shader_stage It's not only for GL, change to a generic name. Use command: find . -type f -not -path '/.git/' -exec sed -i 's/\bgl_shader_stage\b/mesa_shader_stage/g' {} + Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com> Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Acked-by: Yonggang Luo <luoyonggang@gmail.com> Acked-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36569>	2025-08-06 10:28:40 +08:00
Rhys Perry	76c96bf558	aco: fix possible scratch offset overflow We split vector load/store, so consider that we might add to the constant offset. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36406>	2025-08-04 15:06:44 +00:00
Rhys Perry	44ab4ad732	aco: align scratch size after isel Make it safe for VGPR spilling if it's not a multiple of 4. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36406>	2025-08-04 15:06:43 +00:00
Rhys Perry	ab10604924	aco/gfx12: fix printing of temporal hints Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36406>	2025-08-04 15:06:41 +00:00
Rhys Perry	cec845079e	ac/nir/lower_ps: remove barrier for end_invocation_interlock SPIR-V->NIR now inserts this barrier itself. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Marek Olšák <maraeo@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36513>	2025-08-04 09:30:06 +00:00
Daniel Schürmann	4ca3cc5a1a	aco/ra: propagate precolor affinities through parallelcopies and tied definitions Totals from 214 (0.27% of 79839) affected shaders: (Navi48) Instrs: 65339 -> 65311 (-0.04%); split: -0.05%, +0.00% CodeSize: 352616 -> 350952 (-0.47%); split: -0.55%, +0.07% VGPRs: 9984 -> 9960 (-0.24%) Latency: 207556 -> 207508 (-0.02%); split: -0.03%, +0.01% InvThroughput: 40422 -> 40397 (-0.06%) Copies: 3180 -> 3155 (-0.79%) VALU: 38347 -> 38322 (-0.07%) Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36345>	2025-08-01 17:15:54 +00:00
Daniel Schürmann	a667d9a68d	aco/ra: propagate precolor affinities through phis Totals from 917 (1.15% of 79839) affected shaders: (Navi48) Instrs: 3217861 -> 3216947 (-0.03%); split: -0.04%, +0.01% CodeSize: 17427204 -> 17432264 (+0.03%); split: -0.06%, +0.09% VGPRs: 65328 -> 65316 (-0.02%) Latency: 35336268 -> 35335528 (-0.00%); split: -0.01%, +0.01% InvThroughput: 7305032 -> 7302187 (-0.04%); split: -0.04%, +0.00% SClause: 120537 -> 120553 (+0.01%); split: -0.01%, +0.02% Copies: 307257 -> 306852 (-0.13%); split: -0.21%, +0.08% Branches: 115744 -> 115743 (-0.00%) VALU: 1572522 -> 1572183 (-0.02%); split: -0.02%, +0.00% SALU: 574229 -> 574155 (-0.01%); split: -0.05%, +0.04% Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36345>	2025-08-01 17:15:54 +00:00
Daniel Schürmann	2ddd8ef0a3	aco/ra: don't optimize encodings on precolor affinity mismatch Totals from 238 (0.30% of 79839) affected shaders: (Navi48) Instrs: 137836 -> 137176 (-0.48%); split: -0.50%, +0.02% CodeSize: 728616 -> 728668 (+0.01%); split: -0.06%, +0.07% Latency: 1503248 -> 1500202 (-0.20%); split: -0.56%, +0.36% InvThroughput: 297725 -> 296715 (-0.34%); split: -0.70%, +0.36% Copies: 9390 -> 8825 (-6.02%); split: -6.33%, +0.31% VALU: 89861 -> 89296 (-0.63%); split: -0.66%, +0.03% SALU: 13166 -> 13167 (+0.01%); split: -0.05%, +0.06% Suggested-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36345>	2025-08-01 17:15:54 +00:00
Daniel Schürmann	93606a19c6	aco/ra: collect register affinities for all precolored operands. Totals from 1280 (1.60% of 79839) affected shaders: (Navi48) Instrs: 817363 -> 812639 (-0.58%); split: -0.58%, +0.00% CodeSize: 4262644 -> 4243540 (-0.45%); split: -0.45%, +0.00% VGPRs: 61692 -> 61668 (-0.04%) Latency: 4354318 -> 4347818 (-0.15%); split: -0.15%, +0.00% InvThroughput: 711914 -> 707698 (-0.59%); split: -0.59%, +0.00% VClause: 14685 -> 14677 (-0.05%); split: -0.09%, +0.03% SClause: 25623 -> 25621 (-0.01%) Copies: 50663 -> 46242 (-8.73%); split: -8.73%, +0.00% VALU: 427744 -> 423323 (-1.03%); split: -1.03%, +0.00% Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36345>	2025-08-01 17:15:54 +00:00
Daniel Schürmann	e32eec52f0	aco/ra: generalize register affinities Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36345>	2025-08-01 17:15:54 +00:00
Daniel Schürmann	caa2c22d8b	aco/tests: Fix p_startpgm definitions to registers Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36345>	2025-08-01 17:15:54 +00:00
Alyssa Rosenzweig	cc6e3b84cb	treewide: use nir_def_as_* Via Coccinelle patch: @@ expression definition; @@ -nir_instr_as_alu(definition->parent_instr) +nir_def_as_alu(definition) @@ expression definition; @@ -nir_instr_as_intrinsic(definition->parent_instr) +nir_def_as_intrinsic(definition) @@ expression definition; @@ -nir_instr_as_phi(definition->parent_instr) +nir_def_as_phi(definition) @@ expression definition; @@ -nir_instr_as_load_const(definition->parent_instr) +nir_def_as_load_const(definition) @@ expression definition; @@ -nir_instr_as_deref(definition->parent_instr) +nir_def_as_deref(definition) @@ expression definition; @@ -nir_instr_as_tex(definition->parent_instr) +nir_def_as_tex(definition) Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Emma Anholt <emma@anholt.net> Reviewed-by: Marek Olšák <maraeo@gmail.com> Acked-by: Karol Herbst <kherbst@redhat.com> Acked-by: Konstantin Seurer <konstantin.seurer@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36489>	2025-08-01 15:34:24 +00:00
Antonio Ospite	ddf2aa3a4d	build: avoid redefining unreachable() which is standard in C23 In the C23 standard unreachable() is now a predefined function-like macro in <stddef.h> See https://android.googlesource.com/platform/bionic/+/HEAD/docs/c23.md#is-now-a-predefined-function_like-macro-in And this causes build errors when building for C23: ----------------------------------------------------------------------- In file included from ../src/util/log.h:30, from ../src/util/log.c:30: ../src/util/macros.h:123:9: warning: "unreachable" redefined 123 | #define unreachable(str) \ | ^~~~~~~~~~~ In file included from ../src/util/macros.h:31: /usr/lib/gcc/x86_64-linux-gnu/14/include/stddef.h:456:9: note: this is the location of the previous definition 456 | #define unreachable() (__builtin_unreachable ()) | ^~~~~~~~~~~ ----------------------------------------------------------------------- So don't redefine it with the same name, but use the name UNREACHABLE() to also signify it's a macro. Using a different name also makes sense because the behavior of the macro was extending the one of __builtin_unreachable() anyway, and it also had a different signature, accepting one argument, compared to the standard unreachable() with no arguments. This change improves the chances of building mesa with the C23 standard, which for instance is the default in recent AOSP versions. All the instances of the macro, including the definition, were updated with the following command line: git grep -l '[^_]unreachable(' -- "src/**" | sort | uniq | \ while read file; \ do \ sed -e 's/\([^_]\)unreachable(/\1UNREACHABLE(/g' -i "$file"; \ done && \ sed -e 's/#undef unreachable/#undef UNREACHABLE/g' -i src/intel/isl/isl_aux_info.c Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36437>	2025-07-31 17:49:42 +00:00
Georg Lehmann	a6a6c2f691	aco/ra: convert bitwise instruction to gfx11+ 16bit on demand The 32bit versions are smaller, allow more optimizations and VOPD, so only use the 16bit opcodes if necessary. Foz-DB Navi31: Totals from 84 (0.10% of 80237) affected shaders: Instrs: 176673 -> 176347 (-0.18%); split: -0.20%, +0.01% CodeSize: 970148 -> 969716 (-0.04%); split: -0.08%, +0.03% VGPRs: 5876 -> 5864 (-0.20%) Latency: 2805974 -> 2805674 (-0.01%); split: -0.02%, +0.01% InvThroughput: 769007 -> 768738 (-0.03%); split: -0.04%, +0.01% VClause: 2593 -> 2597 (+0.15%) Copies: 23749 -> 23487 (-1.10%); split: -1.11%, +0.00% VALU: 107124 -> 106862 (-0.24%); split: -0.25%, +0.00% Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35919>	2025-07-31 12:07:07 +00:00
Georg Lehmann	404e1f13e8	aco/print_asm: use real true16 instr on gfx11+ Fake16 doesn't print opsel on v_cndmask_b16, so it looks really broken. Restrict to LLVM20+ because older versions have incomplete true16 support. Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35919>	2025-07-31 12:07:07 +00:00
Georg Lehmann	b12db991eb	aco/gfx10: optimize subgroupRotate(x, 32) and subgroupShuffleXor(x, 32) We don't have v_permlane64_b32 yet, but we can still optimize it using shared vgprs. Using the DPP16 row mask, we can even avoid writing exec. With v0 input/output and v24/v25 as shared vgprs, this results in: v_mov_b32_dpp v24, v0 quad_perm:[0,1,2,3] row_mask:0x3 bank_mask:0xf v_mov_b32_dpp v25, v0 quad_perm:[0,1,2,3] row_mask:0xc bank_mask:0xf v_mov_b32_dpp v0, v24 quad_perm:[0,1,2,3] row_mask:0xc bank_mask:0xf v_mov_b32_dpp v0, v25 quad_perm:[0,1,2,3] row_mask:0x3 bank_mask:0xf Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36390>	2025-07-29 06:33:20 +00:00
Georg Lehmann	eb4df58a3d	aco/isel: refactor shared vgpr usage Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36390>	2025-07-29 06:33:20 +00:00
Georg Lehmann	8a2aca8d6f	aco/select_alu: avoid vector get_alu_src for instructions with scalar operands Foz-DB Navi21: Totals from 1 (0.00% of 80237) affected shaders: Instrs: 22 -> 21 (-4.55%) CodeSize: 112 -> 108 (-3.57%) Latency: 392 -> 386 (-1.53%) InvThroughput: 25 -> 24 (-4.00%) Copies: 4 -> 3 (-25.00%) PreVGPRs: 8 -> 4 (-50.00%) VALU: 10 -> 9 (-10.00%) Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35728>	2025-07-29 06:07:15 +00:00
Georg Lehmann	ad9c340d86	aco: insert VALU s_delay_alu for WMMA This should avoid some SIMD stalls. I think this special case was added to try to handle this case: First Instruction: WMMA Second Instruction: WMMA instruction with same VGPR of previous WMMA instruction’s Matrix D as Matrix C Stall if the first and second instruction are not the same type of WMMA or use ABS/NEG on SRC2 of the second instruction If I read it correctly, we shouldn't need a delay if the type is the same and no modifier is used. That's kind of complex to handle, so leave it for now. Not inserting any delays likely hurts more than this. Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36328>	2025-07-29 05:48:29 +00:00
Georg Lehmann	413d0d2ec8	aco/statistics: update GFX12 WMMA cost Based on marketing numbers, but they seem to match RGP. Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36328>	2025-07-29 05:48:29 +00:00
Georg Lehmann	8f61c85880	aco/statistics: add latency to WMMA Assume the normal VALU latency of 4 cycles. Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36328>	2025-07-29 05:48:29 +00:00
Georg Lehmann	004f8aa2f4	aco: optimize get_alu_src with constant source and size > 1 Emulated FSR4, Navi31: Totals from 14 (100.00% of 14) affected shaders: MaxWaves: 130 -> 131 (+0.77%) Instrs: 67887 -> 67470 (-0.61%); split: -0.70%, +0.09% CodeSize: 464428 -> 461668 (-0.59%); split: -0.67%, +0.07% VGPRs: 2544 -> 2520 (-0.94%) SpillVGPRs: 92 -> 89 (-3.26%) Latency: 256823 -> 257574 (+0.29%); split: -0.37%, +0.66% InvThroughput: 253895 -> 252929 (-0.38%); split: -0.40%, +0.02% VClause: 997 -> 984 (-1.30%); split: -2.11%, +0.80% Copies: 4501 -> 3788 (-15.84%); split: -17.35%, +1.51% PreSGPRs: 504 -> 519 (+2.98%) PreVGPRs: 2460 -> 2448 (-0.49%) VALU: 57202 -> 56726 (-0.83%); split: -0.88%, +0.05% SALU: 1231 -> 1384 (+12.43%) VMEM: 3807 -> 3801 (-0.16%) VOPD: 2693 -> 2303 (-14.48%); split: +1.19%, -15.67% Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36090>	2025-07-25 11:33:00 +00:00
Alyssa Rosenzweig	8a1a410389	treewide: use SWAP macro Via Coccinelle patch + manual clean up: @@ identifier temporary, a, b; type T; @@ -T temporary = a; -a = b; -b = temporary; +SWAP(a, b); Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36297>	2025-07-23 19:49:47 +00:00
Georg Lehmann	c80daf934c	aco: supported 64bit or vectorized bitfield_select No Foz-DB changes. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36141>	2025-07-21 20:42:32 +00:00

3894 commits