Commit graph

19 commits

Author SHA1 Message Date
Georg Lehmann
a706769a0b nir: move exact bit to nir_fp_math_control
Unifies nir per instruction float control.

In the future this can be split into contract/reassoc/transform
like SPIR-V.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (except SPIR-V)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39103>
2026-01-07 09:40:57 +00:00
Georg Lehmann
369a3b22b4 nir/opt_uniform_subgroup: optimize uniform ddx/ddy
We can't just use 0.0 as the replacement because of NaN/Inf.
But turning the intrinsic into a simple fsub should still be better
or at least equal.
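
The NaN/Inf point above can be checked in plain C. This is only a sketch of the arithmetic, not Mesa code: `uniform_ddx` is a hypothetical stand-in that models a derivative over a uniform value as the difference of two equal inputs, which is 0.0 for finite values but NaN for Inf and NaN.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical model (not Mesa code): when every lane holds the same x,
 * a derivative computed as a difference of neighbours becomes x - x. */
static float uniform_ddx(float x)
{
    return x - x;
}

/* Checks the three cases: finite, infinite and NaN inputs. */
static void check_uniform_ddx(void)
{
    assert(uniform_ddx(2.0f) == 0.0f);     /* finite: exactly zero */
    assert(isnan(uniform_ddx(INFINITY)));  /* Inf - Inf is NaN */
    assert(isnan(uniform_ddx(NAN)));       /* NaN propagates */
}
```

A constant 0.0 would therefore change results for the last two cases, while a subtraction preserves them. The checks assume default IEEE 754 behaviour (no fast-math style compiler flags).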

Foz-DB Navi48:
Totals from 128 (0.10% of 125402) affected shaders:
MaxWaves: 3684 -> 3708 (+0.65%)
Instrs: 111150 -> 111055 (-0.09%); split: -0.20%, +0.11%
CodeSize: 587176 -> 590800 (+0.62%); split: -0.01%, +0.63%
VGPRs: 6540 -> 6480 (-0.92%)
Latency: 382775 -> 383332 (+0.15%); split: -0.15%, +0.29%
InvThroughput: 80909 -> 80530 (-0.47%); split: -0.51%, +0.04%
VClause: 1433 -> 1430 (-0.21%)
SClause: 1834 -> 1841 (+0.38%); split: -0.11%, +0.49%
Copies: 6130 -> 6096 (-0.55%); split: -1.29%, +0.73%
PreSGPRs: 7352 -> 7356 (+0.05%)
PreVGPRs: 4797 -> 4721 (-1.58%)
VALU: 71892 -> 71435 (-0.64%); split: -0.64%, +0.01%
SALU: 12665 -> 13056 (+3.09%); split: -0.06%, +3.14%

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39112>
2026-01-01 08:43:55 +00:00
Georg Lehmann
71f0c0d6a6 nir/opt_uniform_subgroup: optimize add/xor reduce of bcsel(div, con, con)
Foz-DB Navi48:
Totals from 12 (0.01% of 97623) affected shaders:
Instrs: 9207 -> 8973 (-2.54%)
CodeSize: 54192 -> 52832 (-2.51%)
VGPRs: 768 -> 480 (-37.50%)
Latency: 39516 -> 38507 (-2.55%)
InvThroughput: 10155 -> 9859 (-2.91%)
PreSGPRs: 329 -> 332 (+0.91%)
PreVGPRs: 268 -> 263 (-1.87%)
VALU: 4393 -> 4257 (-3.10%)
SALU: 1037 -> 1019 (-1.74%)
VOPD: 602 -> 599 (-0.50%)
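
The idea in the subject line can be sketched in C, reading `bcsel(div, con, con)` as a select with a divergent condition and two uniform operands (that reading is an assumption). The sketch models a 32-lane subgroup with bit masks; all function names are hypothetical, and it does not claim to match the code the pass actually emits.

```c
#include <assert.h>
#include <stdint.h>

/* Count the set bits of a 32-lane mask. */
static unsigned popcount32(uint32_t m)
{
    unsigned n = 0;
    for (; m; m &= m - 1)
        n++;
    return n;
}

/* Reference: visit every active lane; lane i contributes a if its
 * condition bit is set, otherwise b. */
static uint32_t reduce_ref(uint32_t active, uint32_t cond,
                           uint32_t a, uint32_t b, int is_xor)
{
    uint32_t acc = 0;
    for (unsigned i = 0; i < 32; i++) {
        if (!(active & (1u << i)))
            continue;
        uint32_t v = (cond & (1u << i)) ? a : b;
        acc = is_xor ? (acc ^ v) : (acc + v);
    }
    return acc;
}

/* Closed form: only the number of true and false lanes matters. */
static uint32_t reduce_fast(uint32_t active, uint32_t cond,
                            uint32_t a, uint32_t b, int is_xor)
{
    unsigned t = popcount32(active & cond);   /* lanes selecting a */
    unsigned f = popcount32(active & ~cond);  /* lanes selecting b */
    assert(t + f == popcount32(active));
    if (is_xor)
        return ((t & 1) ? a : 0) ^ ((f & 1) ? b : 0);
    return a * t + b * f;                     /* wraps like iadd */
}
```

With four active lanes, condition mask `0x5`, `a = 10` and `b = 1`, both versions give 22 for the add and 0 for the xor.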

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38974>
2025-12-19 20:23:23 +00:00
Georg Lehmann
0e5e1cb9b0 nir/opt_uniform_subgroup: optimize min/max/and/or reduce of bcsel(div, con, con)
Foz-DB Navi48:
Totals from 1 (0.00% of 97397) affected shaders:
Instrs: 1848 -> 1834 (-0.76%)
CodeSize: 9996 -> 9908 (-0.88%)
VGPRs: 96 -> 72 (-25.00%)
Latency: 17371 -> 17358 (-0.07%)
Copies: 190 -> 191 (+0.53%)
PreVGPRs: 43 -> 41 (-4.65%)
VALU: 657 -> 648 (-1.37%)
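
For idempotent operations such as min, max, and, or, the count of lanes does not matter, only whether each of the two uniform operands is selected at all. The following C sketch shows this for an unsigned min over a 32-lane mask model. The function names are hypothetical, the reading of `div`/`con` as divergent/uniform is an assumption, and the commit does not state that the pass emits exactly this form.

```c
#include <assert.h>
#include <stdint.h>

/* Reference: unsigned min over all active lanes, where lane i holds a
 * if its condition bit is set and b otherwise. */
static uint32_t umin_reduce_ref(uint32_t active, uint32_t cond,
                                uint32_t a, uint32_t b)
{
    uint32_t acc = UINT32_MAX; /* identity element of umin */
    for (unsigned i = 0; i < 32; i++) {
        if (!(active & (1u << i)))
            continue;
        uint32_t v = (cond & (1u << i)) ? a : b;
        if (v < acc)
            acc = v;
    }
    return acc;
}

/* Closed form: three cases depending on which operands are picked. */
static uint32_t umin_reduce_fast(uint32_t active, uint32_t cond,
                                 uint32_t a, uint32_t b)
{
    assert(active != 0);      /* at least one lane must be running */
    if ((active & ~cond) == 0)
        return a;             /* every active lane selected a */
    if ((active & cond) == 0)
        return b;             /* every active lane selected b */
    return a < b ? a : b;     /* both occur: one scalar min suffices */
}
```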

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38974>
2025-12-19 20:23:23 +00:00
Georg Lehmann
621465e417 nir/opt_uniform_subgroup: handle more trivial shuffles/votes
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38867>
2025-12-10 13:32:08 +00:00
Georg Lehmann
e648e551c1 nir/opt_uniform_subgroup: wire up mbcnt_amd path
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38867>
2025-12-10 13:32:08 +00:00
Georg Lehmann
5778436e99 nir/opt_uniform_subgroup: use nir_shader_intrinsics_pass
Nothing here needs the recursion of the full lower_instructions pass.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38867>
2025-12-10 13:32:08 +00:00
Georg Lehmann
1fc38d8539 nir/opt_uniform_subgroup: fix swizzle_amd without fetch_inactive
Fixes: ad5be40303 ("nir: add fetch inactive index to quad_swizzle_amd/masked_swizzle_amd")
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38867>
2025-12-10 13:32:08 +00:00
Georg Lehmann
e11d7f06d0 nir/opt_uniform_subgroup: don't try to optimize non trivial clustered reduce
Fixes: 535caaf3e0 ("nir: Optimize uniform iadd, fadd, and ixor reduction operations")
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38867>
2025-12-10 13:32:08 +00:00
Georg Lehmann
f8633511be nir: make ballot find_lsb/msb/bit_count 32bit only
The lowering is 32bit only too.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37178>
2025-09-04 14:03:58 +00:00
Job Noorman
ae66bd1c00 nir/opt_uniform_subgroup: use ballot_bit_count
Using bit_count on the result of ballot doesn't work for targets where
ballot's num_components > 1.
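
A small C model may help show why counting bits on a multi-component ballot is different from counting bits on a single word. The component count of 4 and the name `ballot_bit_count_model` are assumptions for illustration only; this is not the NIR lowering.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: a 128-lane ballot stored as 4 x 32-bit words. */
#define BALLOT_COMPONENTS 4

/* Count the set bits of one 32-bit component. */
static unsigned popcount32(uint32_t m)
{
    unsigned n = 0;
    for (; m; m &= m - 1)
        n++;
    return n;
}

/* Model of a ballot-aware bit count: sum over every component rather
 * than looking at a single word. */
static unsigned ballot_bit_count_model(const uint32_t *ballot)
{
    assert(ballot);
    unsigned total = 0;
    for (unsigned c = 0; c < BALLOT_COMPONENTS; c++)
        total += popcount32(ballot[c]);
    return total;
}
```

For the ballot `{0x1, 0x3, 0x0, 0x80000000}` the model returns 4, whereas a bit count of the first component alone would report 1.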

Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Fixes: d2e1e4442a ("ir3: enable nir_opt_uniform_subgroup")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35669>
2025-08-05 17:09:27 +00:00
Mel Henning
8795006994 nir/opt_uniform_subgroup: Handle vote_feq
Brings the vertex shader in
dEQP-VK.subgroups.vote.framebuffer.subgroupallequal_dvec4_vertex
from 234 to 169 instructions on NAK.

Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35778>
2025-06-28 16:10:50 +00:00
Mel Henning
70fccc59fc nir/opt_uniform_subgroup: Handle vote_ieq
No shader-db changes here, but it does improve some CTS shaders, e.g. the
vertex shader in
dEQP-VK.subgroups.vote.framebuffer.subgroupallequal_i64vec4_vertex
goes from 80 to 56 instructions with NAK.

Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35778>
2025-06-28 16:10:50 +00:00
Alyssa Rosenzweig
91872c9c51 nir: clang-format
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33722>
2025-02-26 15:19:53 +00:00
Daniel Schürmann
86fd673ade nir: require nir_metadata_divergence if needed
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30814>
2025-02-13 10:08:43 +00:00
Daniel Schürmann
c8348139fd nir: change signature of nir_src_is_divergent()
Now, it takes nir_src * instead of nir_src.
Also move the implementation to nir_divergence_analysis.c.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30787>
2024-10-24 10:06:17 +00:00
Ian Romanick
a2292f53b5 nir: Optimize uniform vote_all and vote_any
No shader-db changes on any Intel platform.

fossil-db:

All Ice Lake and newer platforms had similar results. (Ice Lake)
Totals:
Instrs: 165513303 -> 165511820 (-0.00%)
Cycles: 15125314947 -> 15125211500 (-0.00%); split: -0.00%, +0.00%

Totals from 82 (0.01% of 656120) affected shaders:
Instrs: 544627 -> 543144 (-0.27%)
Cycles: 22616493 -> 22513046 (-0.46%); split: -0.46%, +0.00%

No fossil-db changes on Gfx9.

Suggested-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27044>
2024-02-27 09:44:32 -08:00
Ian Romanick
535caaf3e0 nir: Optimize uniform iadd, fadd, and ixor reduction operations
This adds optimizations for iadd, fadd, and ixor with reduce,
inclusive scan, and exclusive scan.
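
The arithmetic behind these cases can be sketched in C for a uniform value `v` and a 32-lane active mask. This is an illustrative model with hypothetical names, not the pass itself: an add reduction becomes a multiply by the number of active invocations, a scan multiplies by the number of active invocations up to the current lane, and an xor reduction depends only on the parity of the count.

```c
#include <assert.h>
#include <stdint.h>

/* Count the set bits of a 32-lane mask. */
static unsigned popcount32(uint32_t m)
{
    unsigned n = 0;
    for (; m; m &= m - 1)
        n++;
    return n;
}

/* Reference scan: add v once per active lane below `lane`, and also
 * for `lane` itself when the scan is inclusive. */
static uint32_t iadd_scan_ref(uint32_t active, uint32_t v,
                              unsigned lane, int inclusive)
{
    assert(lane < 32);
    uint32_t acc = 0;
    for (unsigned i = 0; i < lane + (inclusive ? 1u : 0u); i++)
        if (active & (1u << i))
            acc += v;
    return acc;
}

/* Closed form of the scan: v times the number of contributing lanes. */
static uint32_t iadd_scan_fast(uint32_t active, uint32_t v,
                               unsigned lane, int inclusive)
{
    assert(lane < 32);
    uint32_t mask = (1u << lane) - 1;  /* lanes strictly below */
    if (inclusive)
        mask |= 1u << lane;            /* plus the lane itself */
    return v * popcount32(active & mask);
}

/* Reduction: every active lane contributes v once. */
static uint32_t iadd_reduce_fast(uint32_t active, uint32_t v)
{
    return v * popcount32(active);
}

/* v ^ v cancels, so only an odd number of lanes leaves v behind. */
static uint32_t ixor_reduce_fast(uint32_t active, uint32_t v)
{
    return (popcount32(active) & 1) ? v : 0;
}
```

With active mask `0xB` (lanes 0, 1 and 3) and `v = 5`, the reduction is 15, the exclusive scan at lane 3 is 10, and the inclusive scan at lane 3 is 15.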

NOTE: The fadd and ixor optimizations had no shader-db or fossil-db
changes on any Intel platform.

NOTE 2: This change "fixes" arb_compute_variable_group_size-local-size
and base-local-size.shader_test on DG2 and MTL. This is just changing
the code path taken to not use whatever path was not working properly
before.

This is a subset of the things optimized by ACO. See also
https://gitlab.freedesktop.org/mesa/mesa/-/issues/3731#note_682802. The
min, max, iand, and ior exclusive_scan optimizations are not
implemented.

Broadwell on shader-db is not happy. I have not investigated.

v2: Silence some warnings about discarding const.

v3: Rename mbcnt to count_active_invocations. Add a big comment
explaining the differences between the two paths. Suggested by Rhys.

shader-db:

All Gfx9 and newer platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 20300384 -> 20299545 (<.01%)
instructions in affected programs: 19167 -> 18328 (-4.38%)
helped: 35 / HURT: 0

total cycles in shared programs: 842809750 -> 842766381 (<.01%)
cycles in affected programs: 2160249 -> 2116880 (-2.01%)
helped: 33 / HURT: 2

total spills in shared programs: 4632 -> 4626 (-0.13%)
spills in affected programs: 206 -> 200 (-2.91%)
helped: 3 / HURT: 0

total fills in shared programs: 5594 -> 5581 (-0.23%)
fills in affected programs: 664 -> 651 (-1.96%)
helped: 3 / HURT: 1

fossil-db results:

All Intel platforms had similar results. (Ice Lake shown)
Totals:
Instrs: 165551893 -> 165513303 (-0.02%)
Cycles: 15132539132 -> 15125314947 (-0.05%); split: -0.05%, +0.00%
Spill count: 45258 -> 45204 (-0.12%)
Fill count: 74286 -> 74157 (-0.17%)
Scratch Memory Size: 2467840 -> 2451456 (-0.66%)

Totals from 712 (0.11% of 656120) affected shaders:
Instrs: 598931 -> 560341 (-6.44%)
Cycles: 184650167 -> 177425982 (-3.91%); split: -3.95%, +0.04%
Spill count: 983 -> 929 (-5.49%)
Fill count: 2274 -> 2145 (-5.67%)
Scratch Memory Size: 52224 -> 35840 (-31.37%)

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27044>
2024-02-27 09:44:11 -08:00
Ian Romanick
f10d1ef372 nir: Initial framework for optimizing uniform subgroup operations
The first commit just optimizes operations where the result of the
subgroup operation is the same as each of the individual channel
results.
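
As an illustration of that property, the C sketch below models two idempotent reductions over a 32-lane active mask when every lane holds the same value `v`. The choice of iand and umax is an assumption made for the example (the message does not list the covered intrinsics), and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Reference and-reduction when every active lane holds the same v. */
static uint32_t iand_reduce_uniform_ref(uint32_t active, uint32_t v)
{
    assert(active != 0);       /* at least one lane must be running */
    uint32_t acc = UINT32_MAX; /* identity element of iand */
    for (unsigned i = 0; i < 32; i++)
        if (active & (1u << i))
            acc &= v;
    return acc;
}

/* Reference unsigned-max reduction under the same assumption. */
static uint32_t umax_reduce_uniform_ref(uint32_t active, uint32_t v)
{
    assert(active != 0);
    uint32_t acc = 0;          /* identity element of umax */
    for (unsigned i = 0; i < 32; i++)
        if ((active & (1u << i)) && v > acc)
            acc = v;
    return acc;
}
```

Both loops return `v` for any non-empty active mask, so in this model the subgroup operation can be replaced by its own source.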

This is a subset of the things optimized by ACO. See also
https://gitlab.freedesktop.org/mesa/mesa/-/issues/3731#note_682802.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27044>
2024-02-27 08:38:31 -08:00