Commit graph

11845 commits

Author SHA1 Message Date
Georg Lehmann
d77c2a1ece nir/opt_algebraic: take advantage of range helpers including nnan
No Foz-DB changes.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40389>
2026-03-16 13:03:49 +00:00
Caio Oliveira
74b8fb330e spirv: Use SPDX annotations
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40285>
2026-03-13 18:20:23 +00:00
Alyssa Rosenzweig
373358da45 nir/opt_sink: sink pack_64_2x32_split
This comes up in lowered load_ubo sequences (observed in OpenCL test
test_api min_max_parameter_size). Hopefully the pack gets coalesced,
it's like nir_op_vec2 on most backends, so it should usually be ok to
sink even though the register pressure heuristic will reject it.
Allowing it to sink allows the UBO load to sink.

Intel's backend scheduler can optimize the relevant sequences locally
but there should still be a win here for global load sinking.

Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40267>
2026-03-13 17:03:00 +00:00
Alyssa Rosenzweig
507e7a04bf nir/opt_sink: sink Intel UBO loads
Acts like load_ubo, handle it in the same path.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40267>
2026-03-13 17:03:00 +00:00
Rhys Perry
3c67225afa nir/range_analysis: cache results of non-alu fp class queries
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
The dense array should be much faster than the previous hash table.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40346>
2026-03-13 15:38:55 +00:00
Rhys Perry
84eeecf822 nir/range_analysis: use a dense array
ministat (nir_analyze_fp_class):
Difference at 95.0% confidence
    -201983 +/- 1064.87
    -9.31575% +/- 0.0468505%
    (Student's t, pooled s = 1257.67)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40346>
2026-03-13 15:38:54 +00:00
Rhys Perry
cebf60e059 nir/range_analysis: use uint16_t for sparse array elements
ministat (nir_analyze_fp_class):
Difference at 95.0% confidence
    -4484.55 +/- 1288.68
    -0.205419% +/- 0.0589514%
    (Student's t, pooled s = 1521.99)

This should also use less memory.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40346>
2026-03-13 15:38:54 +00:00
Icenowy Zheng
514e0d7de7 glsl: support adding point size to io_lowered shaders
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Currently the fixed function vertex shader is built as io_lowered
shaders; however the gl_nir_add_point_size() function currently expects
the original shader to be not io_lowered, and this function is called to
lower the fixed function vertex shader.

Add support for adding point size store_output intrinsics for io_lowered
shaders.

This fixes fixed function rendering on Zink with a Vulkan driver w/o
VK_KHR_maintence5.

Signed-off-by: Icenowy Zheng <zhengxingda@iscas.ac.cn>
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40373>
2026-03-13 11:44:31 +00:00
Georg Lehmann
cadc74b5e2 nir/search_helpers: assume float sources without preserve flag can't be inf/nan
For example, this should let us avoid needing one pattern with is_a_number
and one with nnan.

Foz-DB Navi48:
Totals from 3564 (3.11% of 114655) affected shaders:
Instrs: 8256755 -> 8255042 (-0.02%); split: -0.02%, +0.00%
CodeSize: 43143184 -> 43123192 (-0.05%); split: -0.05%, +0.00%
VGPRs: 268252 -> 268240 (-0.00%)
Latency: 218890225 -> 218881157 (-0.00%); split: -0.00%, +0.00%
InvThroughput: 31044516 -> 31042297 (-0.01%); split: -0.01%, +0.00%
VClause: 96074 -> 96067 (-0.01%); split: -0.01%, +0.00%
SClause: 218042 -> 218037 (-0.00%); split: -0.00%, +0.00%
Copies: 508677 -> 508661 (-0.00%); split: -0.01%, +0.01%
Branches: 148570 -> 148569 (-0.00%)
PreSGPRs: 228110 -> 228082 (-0.01%); split: -0.01%, +0.00%
PreVGPRs: 231996 -> 231982 (-0.01%)
VALU: 4516327 -> 4515321 (-0.02%); split: -0.02%, +0.00%
SALU: 1353696 -> 1353590 (-0.01%); split: -0.01%, +0.00%
VMEM: 182189 -> 182179 (-0.01%)
SMEM: 344771 -> 344756 (-0.00%)
VOPD: 29463 -> 29438 (-0.08%)

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40291>
2026-03-13 07:13:10 +00:00
Georg Lehmann
19fa9bd152 nir/tests: test algebraic patterns with maximum fp_math_ctrl
This means we don't run into undefined behavior when testing nan/inf inputs.
Also make sure that patterns using is_only_used_as_float are signed zero correct.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40291>
2026-03-13 07:13:10 +00:00
Georg Lehmann
aad2b9bfc7 nir/opt_algebraic: be more strict when optimizing fcmp(a + #b, #c)
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40291>
2026-03-13 07:13:10 +00:00
Georg Lehmann
45345de2bb glsl: make flt/fge/fabs/fneg inf preserving
More bandaid for infinity tests.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40291>
2026-03-13 07:13:10 +00:00
Georg Lehmann
0b51ed736d glsl: reset fp_math_ctrl when changing it per alu
I missed that the fp_math_ctrl is otherwise only reset at the next assignment.
What a strange IR.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40291>
2026-03-13 07:13:09 +00:00
Georg Lehmann
624313d35d nir/opt_algebraic: lower ninf fisfinite correctly
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40291>
2026-03-13 07:13:09 +00:00
Georg Lehmann
15eadc1253 nir/lower_frexp: preserve fp_math_ctrl
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40291>
2026-03-13 07:13:09 +00:00
Faith Ekstrand
f2f792996d Revert "nir: Add a type parameter to nir_lower_point_size()"
This reverts commit 6ee4ea5ea3.

Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Acked-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38681>
2026-03-12 22:59:13 +00:00
Faith Ekstrand
ceacec4cc9 nir: Allow 8-bit vertex output stores
These can never come from the API but there's a few cases where panvk
wants them.

Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Acked-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38681>
2026-03-12 22:59:13 +00:00
Mike Blumenkrantz
3dbb7e896d mesa/st: fix unlower_io_to_vars to work with mesh shaders
cc: mesa-stable

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/15034
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/15040

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37408>
2026-03-12 22:02:57 +00:00
Mike Blumenkrantz
e604a8f617 nir: fix nir_is_io_compact for mesh shaders
cc: mesa-stable

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37408>
2026-03-12 22:02:57 +00:00
Daivik Bhatia
33092de196 nir: Handle format swizzles for OOB image loads
When masking out of bounds image loads, we previously returned a vector
of all zeros. However, for robustImageAccess2, depending on the format,
some components such as the alpha channel in an RGB format
should evaluate to 1.

This corrects the replacement value based on the format swizzle.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39430>
2026-03-12 19:14:24 +00:00
Georg Lehmann
9219c6bc31 nir/gather_info: use nir_intrinsic_has_io_semantics
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40338>
2026-03-12 17:00:25 +00:00
Georg Lehmann
eb111bca2c nir/opt_load_store_vectorize: use nir_intrinsic_has_align_mul
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40338>
2026-03-12 17:00:25 +00:00
Lorenzo Rossi
75425f36dc nir/opt_varyings: Skip code-motion for upconversions
Code-motion should not move back upconversions without any other
instruction, that would only increase memory pressure without any
significant performance benefit (conversions are usually cheap).
This should also help lowering mediump varyings early by not reversing
their work.

Signed-off-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40273>
2026-03-11 23:52:10 +00:00
Mary Guillemard
73dba1e151 nir, nvk, nak: Add base to isbewr_nv and isberd_nv
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
On SM86+, we can use a 16-bit unsigned offset along side the register
for it.

This adds a new base indice that will be used for it, integration with
nir_opt_offsets and a lowering pass to get ride of the base on
unsupported generations.

Signed-off-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39716>
2026-03-11 19:41:34 +00:00
Mary Guillemard
6a8d09972e nir: Add isbewr_nv intrinsic and extends isberd_nv
Adds a new intrinsic allowing to do raw write in the various ISBE spaces
where attributes are stored.

This also adapt isberd_nv to map to what we have since SM70+.

This will be used to support mesh shaders.

Signed-off-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39716>
2026-03-11 19:41:33 +00:00
Georg Lehmann
769606e2e6 nir/opt_fp_math_ctrl: handle input/output no_signed_zero flag
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40323>
2026-03-11 16:47:15 +00:00
Georg Lehmann
0d747eee88 nir: add no_signed_zero flag to io semantics
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40323>
2026-03-11 16:47:15 +00:00
Georg Lehmann
26f5a6d6cc nir: fix nir_intrinsic_copy_const_indices for large indices
Fixes: 4ba581887e ("nir: support intrinsic indicies larger than 32 bits")

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40323>
2026-03-11 16:47:15 +00:00
Faith Ekstrand
5de5987678 nir,panfrost: Move lower_bool_to_bitsize to panfrost
It's the only driver that uses the pass so it may as well go there.

Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40307>
2026-03-10 20:54:44 +00:00
Faith Ekstrand
3fd471dca5 nir/lower_bool_to_bitsize: Make all bN_csel sources match
Previously, we assumed that the selector for bcsel could be whatever,
regardless of the bit sizes of the data and we'd just fix it in the
back-end.  This works okay for scalars but falls over the moment we
vectorize because all our vector handling assumes bit sizes match.
Since matching bit sizes is what the hardware wants anyway, it's better
to do the right thing in NIR and hope copy-propagation can fold in
conversions if needed.

Unfortunately, copy prop isn't that smart yet so this does hurt a bit:

    Instrs: 1193679 -> 1198086 (+0.37%); split: -0.06%, +0.43%
    CodeSize: 11915136 -> 11950592 (+0.30%); split: -0.05%, +0.34%
    Full: 160985 -> 160941 (-0.03%); split: -0.04%, +0.01%
    Estimated normalized CVT cycles: 4456.938557000181 -> 4480.876069000186 (+0.54%); split: -0.13%, +0.67%
    Estimated normalized SFU cycles: 6350.9375 -> 6392.21875 (+0.65%)
    Estimated normalized Load/Store cycles: 205773.0 -> 205795.0 (+0.01%)
    Maximum number of threads: 12864 -> 12863 (-0.01%)
    Number of spill instructions: 22487 -> 22489 (+0.01%)
    Number of fill instructions: 52179 -> 52219 (+0.08%)

Hurt shaders:

    google-meet-clvk/BgBlur
    google-meet-clvk/Relight
    parallel-rdp/small_subgroup
    parallel-rdp/small_uber_subgroup

The proper solution here is to teach copy-prop about this stuff so that
it can propagate swizzles into ALU ops when they're supported:
https://gitlab.freedesktop.org/panfrost/mesa/-/issues/265

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14945
Cc: mesa-stable
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40307>
2026-03-10 20:54:43 +00:00
Lionel Landwerlin
f508c6acbb brw/nir: improve shader_indirect_data_intel handling
Use is_scalar to know if we can do transpose loading.

Also enable vectorization if 2 intrinsics share the same source (it
means the only difference is the base).

Fixes: e14d6b535c ("brw/nir: add new intrinsics to load data from the indirect address")
Tested-by: Felix DeGrood <felix.j.degrood@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40308>
2026-03-10 18:24:04 +00:00
Georg Lehmann
452025f75e nir: add free bits in nir_io_semantics for future use
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40299>
2026-03-10 07:46:22 +00:00
Georg Lehmann
a25f00eaed nir: merge xfb and xfb2 into one 64bit intrinsic index
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40299>
2026-03-10 07:46:22 +00:00
Georg Lehmann
4ba581887e nir: support intrinsic indicies larger than 32 bits
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40299>
2026-03-10 07:46:21 +00:00
Georg Lehmann
abfd6a4df9 nir: don't assume indicies are always 32bit when accessing them as raw data
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40299>
2026-03-10 07:46:20 +00:00
Georg Lehmann
aa831b6690 nir/opt_algebraic: skip more redundant alignment iand
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Useful for smaller/larger loads. Also there is no reason to be bitsize
specific here if we use an signed constant.

Foz-DB Navi48:
Totals from 8 (0.01% of 114655) affected shaders:
Instrs: 7629 -> 7612 (-0.22%)
CodeSize: 40772 -> 40692 (-0.20%)
Latency: 54880 -> 54944 (+0.12%)
InvThroughput: 8879 -> 8880 (+0.01%); split: -0.08%, +0.09%
VALU: 4029 -> 4027 (-0.05%); split: -0.15%, +0.10%
SALU: 1260 -> 1249 (-0.87%)

Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40292>
2026-03-10 06:57:50 +00:00
Karol Herbst
9d90cbc314 nak: add input predicate to load_global_nv and OpLd
This is new in SM75 (Turing). Let's use it because it allows us to get rid
of the if/else around bound checked global loads.

There are some changes in fossils, but it seems that's mostly due to CFG
optimizations doing things a bit differently?

Totals:
CodeSize: 9442152688 -> 9442133184 (-0.00%); split: -0.00%, +0.00%
Static cycle count: 6120910991 -> 6120907718 (-0.00%); split: -0.00%, +0.00%
Spills to reg: 184789 -> 184810 (+0.01%)
Fills from reg: 223831 -> 223860 (+0.01%); split: -0.00%, +0.01%

Totals from 334 (0.03% of 1163204) affected shaders:
CodeSize: 22020752 -> 22001248 (-0.09%); split: -0.10%, +0.01%
Static cycle count: 26582978 -> 26579705 (-0.01%); split: -0.01%, +0.00%
Spills to reg: 3110 -> 3131 (+0.68%)
Fills from reg: 3401 -> 3430 (+0.85%); split: -0.03%, +0.88%

Reviewed-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Acked-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40272>
2026-03-10 00:10:05 +00:00
Georg Lehmann
6936282bd3 nir/opt_algebraic: remove min(a, >= 1.0) before fsat
Foz-DB Navi48:
Totals from 86 (0.08% of 114655) affected shaders:
Instrs: 217553 -> 217408 (-0.07%); split: -0.07%, +0.01%
CodeSize: 1159992 -> 1159380 (-0.05%); split: -0.06%, +0.01%
Latency: 1657600 -> 1657533 (-0.00%); split: -0.01%, +0.00%
InvThroughput: 203205 -> 203178 (-0.01%); split: -0.02%, +0.00%
SClause: 5245 -> 5244 (-0.02%)
Copies: 13726 -> 13716 (-0.07%); split: -0.14%, +0.07%
VALU: 130151 -> 130039 (-0.09%); split: -0.09%, +0.00%
SALU: 26476 -> 26474 (-0.01%); split: -0.02%, +0.01%

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40281>
2026-03-09 21:11:25 +00:00
Georg Lehmann
108a4d4341 nir: create more fsat using range analysis
Foz-DB Navi48:
Totals from 5922 (5.17% of 114655) affected shaders:
Instrs: 5188307 -> 5184193 (-0.08%); split: -0.09%, +0.01%
CodeSize: 27852544 -> 27843252 (-0.03%); split: -0.05%, +0.01%
Latency: 28723967 -> 28714268 (-0.03%); split: -0.04%, +0.01%
InvThroughput: 4745002 -> 4742298 (-0.06%); split: -0.07%, +0.01%
VClause: 68649 -> 68650 (+0.00%)
SClause: 103932 -> 103917 (-0.01%); split: -0.02%, +0.00%
Copies: 244683 -> 244706 (+0.01%); split: -0.01%, +0.02%
PreSGPRs: 272361 -> 272362 (+0.00%); split: -0.00%, +0.00%
VALU: 3248960 -> 3245520 (-0.11%); split: -0.11%, +0.00%
SALU: 516784 -> 516796 (+0.00%); split: -0.01%, +0.01%
VOPD: 8910 -> 8895 (-0.17%)

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40281>
2026-03-09 21:11:25 +00:00
Alyssa Rosenzweig
edccd06a0b nir/lower_subgroups: fix boolean clustered reductions
It is legal to have a cluster size larger than the subgroup/ballot size,
but our lowering would blow up in this case due to the nir_ishl_imm
overflowing in the lowering. Fortunately, this is easy to handle.

Fixes sub_group_clustered_reduce_logical_and()

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40224>
2026-03-09 14:50:37 +00:00
Kenneth Graunke
952bf55483 nir: Fix divergence of Intel URB input/output handle intrinsics
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Tessellation evaluation shaders have a single convergent URB handle
(for the common patch data) used by all lanes.  Every other stage's
IO handles have separate handles in each lane.

Thanks to Alyssa Rosenzweig for catching this bug.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40280>
2026-03-09 02:38:59 +00:00
Georg Lehmann
7c217e540c nir: add a pass to optimize fp_math_ctrl
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40098>
2026-03-07 08:16:27 +01:00
Georg Lehmann
f474e9853e nir: add fp class analysis tests
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39987>
2026-03-07 05:01:45 +00:00
Georg Lehmann
4885e5cf3a nir: remove more fsat using range analysis
Foz-DB Navi48:
Totals from 3018 (3.65% of 82636) affected shaders:
MaxWaves: 69274 -> 69280 (+0.01%)
Instrs: 7165414 -> 7157581 (-0.11%); split: -0.12%, +0.01%
CodeSize: 38890212 -> 38823132 (-0.17%); split: -0.18%, +0.00%
VGPRs: 228672 -> 228624 (-0.02%)
Latency: 64789026 -> 64784877 (-0.01%); split: -0.01%, +0.00%
InvThroughput: 11805156 -> 11802642 (-0.02%); split: -0.02%, +0.00%
VClause: 136900 -> 136886 (-0.01%); split: -0.03%, +0.02%
SClause: 150135 -> 150130 (-0.00%); split: -0.01%, +0.01%
Copies: 574690 -> 574894 (+0.04%); split: -0.03%, +0.06%
Branches: 187169 -> 187086 (-0.04%); split: -0.04%, +0.00%
PreSGPRs: 190074 -> 190067 (-0.00%); split: -0.00%, +0.00%
PreVGPRs: 189564 -> 189538 (-0.01%); split: -0.02%, +0.00%
VALU: 3955188 -> 3949411 (-0.15%); split: -0.15%, +0.00%
SALU: 1114659 -> 1114729 (+0.01%); split: -0.02%, +0.03%
SMEM: 231080 -> 231077 (-0.00%); split: -0.00%, +0.00%
VOPD: 116150 -> 116180 (+0.03%); split: +0.04%, -0.02%

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39987>
2026-03-07 05:01:45 +00:00
Georg Lehmann
506bb5a609 nir/search_helpers: use fp class analysis more
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39987>
2026-03-07 05:01:45 +00:00
Georg Lehmann
a9e75d8ee4 nir: remove nir_analyze_fp_range
Use fp class analysis instead.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39987>
2026-03-07 05:01:44 +00:00
Georg Lehmann
eb431efc19 nir/search_helpers: switch to fp class analysis
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39987>
2026-03-07 05:01:44 +00:00
Georg Lehmann
58799c4e7c nir/gather_tcs_info: use nir_analyze_fp_class directly
The information around positive one helps in theory.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39987>
2026-03-07 05:01:44 +00:00
Georg Lehmann
0ecf2c322e nir: add fp class analysis for fround_even
Foz-DB Navi48:
Totals from 383 (0.33% of 114655) affected shaders:
MaxWaves: 9806 -> 9808 (+0.02%)
Instrs: 502508 -> 501762 (-0.15%); split: -0.16%, +0.01%
CodeSize: 2711404 -> 2707604 (-0.14%); split: -0.15%, +0.01%
VGPRs: 24360 -> 24348 (-0.05%)
Latency: 2068105 -> 2066817 (-0.06%); split: -0.07%, +0.01%
InvThroughput: 370962 -> 370081 (-0.24%)
VClause: 7045 -> 7041 (-0.06%)
SClause: 10551 -> 10559 (+0.08%); split: -0.08%, +0.15%
Copies: 29135 -> 29117 (-0.06%); split: -0.12%, +0.05%
Branches: 17333 -> 17328 (-0.03%)
PreSGPRs: 21511 -> 21510 (-0.00%)
PreVGPRs: 18555 -> 18545 (-0.05%)
VALU: 274445 -> 273874 (-0.21%); split: -0.21%, +0.00%
SALU: 78819 -> 78779 (-0.05%); split: -0.07%, +0.02%
VMEM: 10918 -> 10913 (-0.05%)
SMEM: 17662 -> 17656 (-0.03%)

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39987>
2026-03-07 05:01:44 +00:00
Georg Lehmann
7509b4a199 nir: add fp class analysis for fsub
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39987>
2026-03-07 05:01:44 +00:00