fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-20 11:18:11 +02:00

Author	SHA1	Message	Date
Georg Lehmann	a706769a0b	nir: move exact bit to nir_fp_math_control Unifies nir per instruction float control. In the future this can be split into contract/reassoc/transform like SPIR-V. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> (except SPIR-V) Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39103>	2026-01-07 09:40:57 +00:00
Georg Lehmann	369a3b22b4	nir/opt_uniform_subgroup: optimize uniform ddx/ddy Some checks are pending macOS-CI / macOS-CI (dri) (push) Waiting to run Details macOS-CI / macOS-CI (xlib) (push) Waiting to run Details We can't just use 0.0 as the replacement because of NaN/Inf. But turning the intrinsic into a simple fsub should still be better or at least equal. Foz-DB Navi48: Totals from 128 (0.10% of 125402) affected shaders: MaxWaves: 3684 -> 3708 (+0.65%) Instrs: 111150 -> 111055 (-0.09%); split: -0.20%, +0.11% CodeSize: 587176 -> 590800 (+0.62%); split: -0.01%, +0.63% VGPRs: 6540 -> 6480 (-0.92%) Latency: 382775 -> 383332 (+0.15%); split: -0.15%, +0.29% InvThroughput: 80909 -> 80530 (-0.47%); split: -0.51%, +0.04% VClause: 1433 -> 1430 (-0.21%) SClause: 1834 -> 1841 (+0.38%); split: -0.11%, +0.49% Copies: 6130 -> 6096 (-0.55%); split: -1.29%, +0.73% PreSGPRs: 7352 -> 7356 (+0.05%) PreVGPRs: 4797 -> 4721 (-1.58%) VALU: 71892 -> 71435 (-0.64%); split: -0.64%, +0.01% SALU: 12665 -> 13056 (+3.09%); split: -0.06%, +3.14% Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39112>	2026-01-01 08:43:55 +00:00
Georg Lehmann	71f0c0d6a6	nir/opt_uniform_subgroup: optimize add/xor reduce of bcsel(div, con, con) Some checks are pending macOS-CI / macOS-CI (dri) (push) Waiting to run Details macOS-CI / macOS-CI (xlib) (push) Waiting to run Details Foz-DB Navi48: Totals from 12 (0.01% of 97623) affected shaders: Instrs: 9207 -> 8973 (-2.54%) CodeSize: 54192 -> 52832 (-2.51%) VGPRs: 768 -> 480 (-37.50%) Latency: 39516 -> 38507 (-2.55%) InvThroughput: 10155 -> 9859 (-2.91%) PreSGPRs: 329 -> 332 (+0.91%) PreVGPRs: 268 -> 263 (-1.87%) VALU: 4393 -> 4257 (-3.10%) SALU: 1037 -> 1019 (-1.74%) VOPD: 602 -> 599 (-0.50%) Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38974>	2025-12-19 20:23:23 +00:00
Georg Lehmann	0e5e1cb9b0	nir/opt_uniform_subgroup: optimize min/max/and/or reduce of bcsel(div, con, con) Foz-DB Navi48: Totals from 1 (0.00% of 97397) affected shaders: Instrs: 1848 -> 1834 (-0.76%) CodeSize: 9996 -> 9908 (-0.88%) VGPRs: 96 -> 72 (-25.00%) Latency: 17371 -> 17358 (-0.07%) Copies: 190 -> 191 (+0.53%) PreVGPRs: 43 -> 41 (-4.65%) VALU: 657 -> 648 (-1.37%) Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38974>	2025-12-19 20:23:23 +00:00
Georg Lehmann	621465e417	nir/opt_uniform_subgroup: handle more trivial shuffles/votes Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38867>	2025-12-10 13:32:08 +00:00
Georg Lehmann	e648e551c1	nir/opt_uniform_subgroup: wire up mbcnt_amd path Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38867>	2025-12-10 13:32:08 +00:00
Georg Lehmann	5778436e99	nir/opt_uniform_subgroup: use nir_shader_intrinsics_pass Nothing here needs the recursion of the full lower_instructions pass. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38867>	2025-12-10 13:32:08 +00:00
Georg Lehmann	1fc38d8539	nir/opt_uniform_subgroup: fix swizzle_amd without fetch_inactive Fixes: `ad5be40303` ("nir: add fetch inactive index to quad_swizzle_amd/masked_swizzle_amd") Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38867>	2025-12-10 13:32:08 +00:00
Georg Lehmann	e11d7f06d0	nir/opt_uniform_subgroup: don't try to optimize non trivial clustered reduce Fixes: `535caaf3e0` ("nir: Optimize uniform iadd, fadd, and ixor reduction operations") Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38867>	2025-12-10 13:32:08 +00:00
Georg Lehmann	f8633511be	nir: make ballot find_lsb/msb/bit_count 32bit only The lowering is 32bit only too. Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37178>	2025-09-04 14:03:58 +00:00
Job Noorman	ae66bd1c00	nir/opt_uniform_subgroup: use ballot_bit_count Using bit_count on the result of ballot doesn't work for targets where ballot's num_components > 1. Signed-off-by: Job Noorman <jnoorman@igalia.com> Reviewed-by: Emma Anholt <emma@anholt.net> Fixes: `d2e1e4442a` ("ir3: enable nir_opt_uniform_subgroup") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35669>	2025-08-05 17:09:27 +00:00
Mel Henning	8795006994	nir/opt_uniform_subgroup: Handle vote_feq Some checks are pending macOS-CI / macOS-CI (dri) (push) Waiting to run Details macOS-CI / macOS-CI (xlib) (push) Waiting to run Details Brings the vertex shader in dEQP-VK.subgroups.vote.framebuffer.subgroupallequal_dvec4_vertex from 234 to 169 instructions on NAK. Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35778>	2025-06-28 16:10:50 +00:00
Mel Henning	70fccc59fc	nir/opt_uniform_subgroup: Handle vote_ieq No shader-db changes here, but it does improve some cts shaders, eg. the vertex shader in dEQP-VK.subgroups.vote.framebuffer.subgroupallequal_i64vec4_vertex goes from 80 to 56 instructions with NAK Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35778>	2025-06-28 16:10:50 +00:00
Alyssa Rosenzweig	91872c9c51	nir: clang-format Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33722>	2025-02-26 15:19:53 +00:00
Daniel Schürmann	86fd673ade	nir: require nir_metadata_divergence if needed Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30814>	2025-02-13 10:08:43 +00:00
Daniel Schürmann	c8348139fd	nir: change signature of nir_src_is_divergent() Now, it takes nir_src * instead of nir_src. Also move the implementation to nir_divergence_analysis.c. Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30787>	2024-10-24 10:06:17 +00:00
Ian Romanick	a2292f53b5	nir: Optimize uniform vote_all and vote_any No shader-db changes on any Intel platform. fossil-db: All Ice Lake and newer platforms had similar results. (Ice Lake) Totals: Instrs: 165513303 -> 165511820 (-0.00%) Cycles: 15125314947 -> 15125211500 (-0.00%); split: -0.00%, +0.00% Totals from 82 (0.01% of 656120) affected shaders: Instrs: 544627 -> 543144 (-0.27%) Cycles: 22616493 -> 22513046 (-0.46%); split: -0.46%, +0.00% No fossil-db changes on Gfx9. Suggested-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27044>	2024-02-27 09:44:32 -08:00
Ian Romanick	535caaf3e0	nir: Optimize uniform iadd, fadd, and ixor reduction operations This adds optimizations for iadd, fadd, and ixor with reduce, inclusive scan, and exclusive scan. NOTE: The fadd and ixor optimizations had no shader-db or fossil-db changes on any Intel platform. NOTE 2: This change "fixes" arb_compute_variable_group_size-local-size and base-local-size.shader_test on DG2 and MTL. This is just changing the code path taken to not use whatever path was not working properly before. This is a subset of the things optimized by ACO. See also https://gitlab.freedesktop.org/mesa/mesa/-/issues/3731#note_682802. The min, max, iand, and ior exclusive_scan optimizations are not implemented. Broadwell on shader-db is not happy. I have not investigated. v2: Silence some warnings about discarding const. v3: Rename mbcnt to count_active_invocations. Add a big comment explaining the differences between the two paths. Suggested by Rhys. shader-db: All Gfx9 and newer platforms had similar results. (Ice Lake shown) total instructions in shared programs: 20300384 -> 20299545 (<.01%) instructions in affected programs: 19167 -> 18328 (-4.38%) helped: 35 / HURT: 0 total cycles in shared programs: 842809750 -> 842766381 (<.01%) cycles in affected programs: 2160249 -> 2116880 (-2.01%) helped: 33 / HURT: 2 total spills in shared programs: 4632 -> 4626 (-0.13%) spills in affected programs: 206 -> 200 (-2.91%) helped: 3 / HURT: 0 total fills in shared programs: 5594 -> 5581 (-0.23%) fills in affected programs: 664 -> 651 (-1.96%) helped: 3 / HURT: 1 fossil-db results: All Intel platforms had similar results. (Ice Lake shown) Totals: Instrs: 165551893 -> 165513303 (-0.02%) Cycles: 15132539132 -> 15125314947 (-0.05%); split: -0.05%, +0.00% Spill count: 45258 -> 45204 (-0.12%) Fill count: 74286 -> 74157 (-0.17%) Scratch Memory Size: 2467840 -> 2451456 (-0.66%) Totals from 712 (0.11% of 656120) affected shaders: Instrs: 598931 -> 560341 (-6.44%) Cycles: 184650167 -> 177425982 (-3.91%); split: -3.95%, +0.04% Spill count: 983 -> 929 (-5.49%) Fill count: 2274 -> 2145 (-5.67%) Scratch Memory Size: 52224 -> 35840 (-31.37%) Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27044>	2024-02-27 09:44:11 -08:00
Ian Romanick	f10d1ef372	nir: Initial framework for optimizing uniform subgroup operations The first commit just optimizes operation where the result of the subgroup operation is the same as each of the individual channel results. This is a subset of the things optimized by ACO. See also https://gitlab.freedesktop.org/mesa/mesa/-/issues/3731#note_682802. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27044>	2024-02-27 08:38:31 -08:00

19 commits