Commit graph

7272 commits

Author SHA1 Message Date
Daniel Schürmann
0089d81fb3 nir/tests: change opt_loop_peel_initial_break test to not use nir_jump_continue
We are going to disallow continue statements without
loop continue constructs.

Replaced with a test that checks that the optimization is not
applied in absense of actual work after the conditional break.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39942>
2026-03-21 07:42:55 +00:00
Daniel Schürmann
ff8c8858dc nir/lower_goto_ifs: Add and lower loop continue constructs
We are going to disallow continue statements without
loop continue constructs.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39942>
2026-03-21 07:42:55 +00:00
Daniel Schürmann
f159669cf3 nir/lower_continue_constructs: Remove unnecessary handling of multiple continue statements
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39942>
2026-03-21 07:42:55 +00:00
Daniel Schürmann
31af989270 nir/lower_continue_constructs: Simplify loops before lowering continue constructs
The idea is inspired by LLVM's LoopSimplify pass. Before
lowering continue constructs, the pass now also lowers
all continue statements, leaving only the trivial continue.
This ensures that loops will always only have one back-edge.

Totals from 396 (0.47% of 84383) affected shaders: (Navi48)
Instrs: 900330 -> 899850 (-0.05%); split: -0.17%, +0.12%
CodeSize: 4727216 -> 4727508 (+0.01%); split: -0.13%, +0.13%
Latency: 7276816 -> 7097199 (-2.47%); split: -2.53%, +0.06%
InvThroughput: 1580718 -> 1558646 (-1.40%); split: -1.42%, +0.03%
VClause: 12872 -> 12879 (+0.05%); split: -0.01%, +0.06%
SClause: 22237 -> 22240 (+0.01%); split: -0.00%, +0.02%
Copies: 67359 -> 65723 (-2.43%); split: -2.56%, +0.14%
Branches: 24252 -> 24163 (-0.37%); split: -0.52%, +0.15%
PreSGPRs: 34371 -> 34399 (+0.08%)
PreVGPRs: 25268 -> 25280 (+0.05%); split: -0.00%, +0.05%
VALU: 512493 -> 511580 (-0.18%); split: -0.33%, +0.15%
SALU: 122767 -> 122993 (+0.18%); split: -0.13%, +0.32%
VMEM: 22181 -> 22213 (+0.14%)
SMEM: 41370 -> 41376 (+0.01%)

Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39942>
2026-03-21 07:42:55 +00:00
Mary Guillemard
c6d8f7ce0c nir/dead_cf: Add missing load_global_nv handling
This was missing when this intrinsic was added.
Fix some issue with FSI lowering and probably more.

Signed-off-by: Mary Guillemard <mary@mary.zone>
Fixes: e779538ad2 ("nir: add nvidia IO intrinsics")
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40543>
2026-03-20 20:19:35 +00:00
Mary Guillemard
bb6fc8cc20 nir/dead_cf: Add missing load_global_bounded handling
Signed-off-by: Mary Guillemard <mary@mary.zone>
Fixes: caa0854da8 ("nir: plumb load_global_bounded")
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40543>
2026-03-20 20:19:34 +00:00
Mary Guillemard
6013667d61 nir/dead_cf: Add missing load_ssbo_ir3 handling
Signed-off-by: Mary Guillemard <mary@mary.zone>
Fixes: 0092edfec0 ("nir/dead_cf: Do not remove loops with loads that can't be reordered")
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40543>
2026-03-20 20:19:34 +00:00
Connor Abbott
ec37fed52b tu, ir3, nir: Plumb through driver param for alpha-to-coverage
We will need this when alpha-to-coverage is dynamic and we need to
emulate it.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39335>
2026-03-20 18:09:49 +00:00
Connor Abbott
22a061fb91 nir: Use better calculation for alpha-to-coverage mask
The old calculation depended on the sample count, and gave subpar
results for 8x MSAA with standard sample locations. The new calculation
is based on the Intel pass, with some changing of the constants so that
the sample count is always proportional to alpha for 2xMSAA and 4xMSAA
and the addition of rotating the sample mask based on the pixel.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39335>
2026-03-20 18:09:48 +00:00
Georg Lehmann
643dd510d4 nir/opt_algebraic: optimize b2f(a) * b
When the multiplication is only used by fadd, it's not a clear win
because of potential fma fusion.

Totals from 8015 (6.99% of 114655) affected shaders:
MaxWaves: 199394 -> 199466 (+0.04%); split: +0.04%, -0.01%
Instrs: 17461518 -> 17451076 (-0.06%); split: -0.10%, +0.04%
CodeSize: 94779552 -> 94769828 (-0.01%); split: -0.07%, +0.06%
VGPRs: 526012 -> 525532 (-0.09%); split: -0.10%, +0.01%
SpillSGPRs: 12466 -> 12517 (+0.41%); split: -0.09%, +0.50%
Latency: 191274766 -> 191297394 (+0.01%); split: -0.03%, +0.04%
InvThroughput: 31465968 -> 31456785 (-0.03%); split: -0.07%, +0.04%
VClause: 312081 -> 312073 (-0.00%); split: -0.10%, +0.09%
SClause: 366914 -> 366906 (-0.00%); split: -0.02%, +0.01%
Copies: 1222482 -> 1221933 (-0.04%); split: -0.20%, +0.15%
Branches: 376651 -> 376577 (-0.02%); split: -0.03%, +0.01%
PreSGPRs: 442974 -> 443240 (+0.06%); split: -0.01%, +0.07%
PreVGPRs: 415964 -> 415668 (-0.07%); split: -0.09%, +0.02%
VALU: 9403517 -> 9393916 (-0.10%); split: -0.12%, +0.02%
SALU: 2799420 -> 2800430 (+0.04%); split: -0.13%, +0.16%
VOPD: 472826 -> 472347 (-0.10%); split: +0.09%, -0.19%

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40399>
2026-03-20 08:50:41 +00:00
Georg Lehmann
d2b37b667e nir/opt_algebraic: optimize more fmulz(1.0, a) remains
If dxvk's opencoded fmulz gets partially constant folded,
it leaves this mess behind.

It's important to do this before the more general fmul+b2f patterns added
in the next commit, because they change the signed zero behavior in a way
that can't be optimized back.

Foz-DB Navi48:
Totals from 36 (0.03% of 114655) affected shaders:

Instrs: 16513 -> 15706 (-4.89%)
CodeSize: 99756 -> 95760 (-4.01%)
Latency: 45165 -> 44151 (-2.25%)
InvThroughput: 8344 -> 7886 (-5.49%)
VClause: 395 -> 401 (+1.52%)
Copies: 639 -> 634 (-0.78%)
PreSGPRs: 1158 -> 1154 (-0.35%)
PreVGPRs: 1227 -> 1225 (-0.16%)
VALU: 11310 -> 10769 (-4.78%)
SALU: 813 -> 809 (-0.49%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40399>
2026-03-20 08:50:41 +00:00
Georg Lehmann
3ad142d4d7 nir/search: never insert movs for alu uses
This means we respect the pattern order better because
simple replacements like bcsel(False, a, b) -> b no longer
insert movs that can block more specialized patterns.

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40399>
2026-03-20 08:50:41 +00:00
Georg Lehmann
1626df7a90 nir: rework nir_alu_src_is_trivial_ssa to take an alu src
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40399>
2026-03-20 08:50:41 +00:00
Georg Lehmann
b96c42c916 nir/opt_algebraic: optimize more near useless bcsel
Foz-DB Navi48:
Totals from 327 (0.29% of 114655) affected shaders:
Instrs: 732971 -> 731642 (-0.18%); split: -0.19%, +0.01%
CodeSize: 3696020 -> 3689824 (-0.17%); split: -0.17%, +0.00%
Latency: 4405319 -> 4403413 (-0.04%); split: -0.06%, +0.01%
InvThroughput: 650209 -> 649659 (-0.08%); split: -0.10%, +0.01%
Copies: 53872 -> 53736 (-0.25%); split: -0.27%, +0.02%
Branches: 15598 -> 15571 (-0.17%)
VALU: 262391 -> 261969 (-0.16%)
SALU: 268112 -> 267699 (-0.15%)

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40399>
2026-03-20 08:50:41 +00:00
Georg Lehmann
6cfe6eaa79 nir/opt_algebraic: create ldexp from exp2
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
ldexp uses the full width VALU path, exp2 the transcendental SIMD8.

Foz-DB Navi21:
Totals from 729 (0.64% of 114627) affected shaders:
MaxWaves: 20071 -> 20103 (+0.16%); split: +0.18%, -0.02%
Instrs: 869129 -> 867654 (-0.17%); split: -0.17%, +0.00%
CodeSize: 4709000 -> 4708460 (-0.01%); split: -0.02%, +0.00%
VGPRs: 31184 -> 31128 (-0.18%); split: -0.23%, +0.05%
Latency: 7610726 -> 7597238 (-0.18%); split: -0.18%, +0.00%
InvThroughput: 1822323 -> 1819815 (-0.14%); split: -0.14%, +0.00%
VClause: 22494 -> 22493 (-0.00%); split: -0.03%, +0.02%
SClause: 20520 -> 20509 (-0.05%)
Copies: 72025 -> 72024 (-0.00%); split: -0.01%, +0.01%
Branches: 22028 -> 22029 (+0.00%)
PreVGPRs: 21601 -> 21602 (+0.00%)
VALU: 604821 -> 603339 (-0.25%); split: -0.25%, +0.00%
SALU: 114258 -> 114262 (+0.00%); split: -0.00%, +0.01%

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33900>
2026-03-20 08:15:08 +00:00
Georg Lehmann
ec331cc48a nir: replace lower_ldexp with has_ldexp
I can be bothered to fix all the backends that don't set lower_ldexp,
and only two backends have ldexp anyway.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33900>
2026-03-20 08:15:08 +00:00
Faith Ekstrand
3418525a82 pan/bi: Lower VS outputs in NIR
Co-authored-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40391>
2026-03-19 11:25:32 +00:00
Lorenzo Rossi
43ffcf06f4 pan/bi,nir: Divide memory_access from segments
Valhall removed Bifrost's memory segments and added in its place memory
access.  Those were bolted on reserved bits as "pseudo-segments" and the
emitter would catch these and emit the right memory access.  This commit
cleans it up a bit by making memory_access available directly and
exposing it to NIR (this will be useful later).

Signed-off-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40391>
2026-03-19 11:25:30 +00:00
Lorenzo Rossi
c730e41ed5 pan/bi: Add is_psiz_store flag in bi_instr
This removes the previous hack that searched the psiz write by looking
for 16-bit stores with the correct pseudo segment.  We also add a new
intrinsic that mimicks global stores but tags psiz writes, this will be
used later in the series.

Signed-off-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40391>
2026-03-19 11:25:30 +00:00
Faith Ekstrand
de338dc908 pan,nir: Rework converted_mem_pan intrinsics
First, rename them to make them a bit more clear.  They act on global
memory so they should be _global and they map to ld/st_cvt so so _cvt is
nice and obvious.  Second, they don't need IO semantics as they're not
IO.  But they do need ACCESS so that we can better control things like
CAN_REORDER.  Third, add a src_type to store_global_cvt even though it
won't be used just yet because we'll want it for lowering VS stores.

Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40391>
2026-03-19 11:25:29 +00:00
Faith Ekstrand
d2f430bea9 pan/bi: Add new FS input load intrinsics
Unlike load[_interpolated]_input, which has to deal with all sorts of
ABI nonsense between driver and compiler, these new intrinsics are
dumber than bricks.  They're literally just the HW ops as NIR
intrinsics.  These will allow us do the lowering in NIR and put the
driver in total control over what goes down what path.  Among other
things, a driver could choose to lower some things to ld_var and others
to ld_var_buf.

Co-authored-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40391>
2026-03-19 11:25:28 +00:00
Georg Lehmann
57c05f72f9 nir/opt_large_constants: only use 16bit float alu when supported
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33002>
2026-03-19 06:59:18 +00:00
Georg Lehmann
5f37788ae9 nir/opt_large_constants: handle floating point power of two fractions
Foz-DB Navi48:
Totals from 365 (0.32% of 114655) affected shaders:
MaxWaves: 10020 -> 10016 (-0.04%)
Instrs: 486252 -> 486097 (-0.03%); split: -0.21%, +0.18%
CodeSize: 2629536 -> 2628452 (-0.04%); split: -0.19%, +0.14%
VGPRs: 19884 -> 19896 (+0.06%); split: -0.06%, +0.12%
SpillSGPRs: 210 -> 212 (+0.95%)
Latency: 3818610 -> 3765549 (-1.39%); split: -1.50%, +0.11%
InvThroughput: 598445 -> 596281 (-0.36%); split: -0.58%, +0.22%
VClause: 10053 -> 9698 (-3.53%); split: -3.54%, +0.01%
SClause: 17548 -> 17334 (-1.22%); split: -1.24%, +0.02%
Copies: 43196 -> 42249 (-2.19%); split: -2.34%, +0.14%
Branches: 16695 -> 16628 (-0.40%); split: -0.47%, +0.07%
PreSGPRs: 17988 -> 17971 (-0.09%)
PreVGPRs: 13552 -> 13520 (-0.24%)
VALU: 244842 -> 246611 (+0.72%); split: -0.02%, +0.74%
SALU: 79163 -> 77778 (-1.75%); split: -2.05%, +0.30%
VMEM: 13468 -> 13084 (-2.85%)
SMEM: 23571 -> 23393 (-0.76%)
VOPD: 8384 -> 8372 (-0.14%)

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33002>
2026-03-19 06:59:18 +00:00
Georg Lehmann
372c1a23dc nir/opt_large_constants: support negative small constants
Foz-DB Navi48:
Totals from 511 (0.45% of 114655) affected shaders:
MaxWaves: 14554 -> 14552 (-0.01%)
Instrs: 767577 -> 768334 (+0.10%); split: -0.17%, +0.27%
CodeSize: 4171036 -> 4181400 (+0.25%); split: -0.10%, +0.35%
VGPRs: 27676 -> 27724 (+0.17%)
SpillSGPRs: 144 -> 183 (+27.08%)
Latency: 4053919 -> 4027092 (-0.66%); split: -0.88%, +0.22%
InvThroughput: 817990 -> 819490 (+0.18%); split: -0.21%, +0.39%
VClause: 11573 -> 11172 (-3.46%); split: -3.47%, +0.01%
SClause: 14418 -> 14579 (+1.12%); split: -0.46%, +1.57%
Copies: 71638 -> 71365 (-0.38%); split: -1.54%, +1.16%
Branches: 20212 -> 20425 (+1.05%); split: -0.39%, +1.44%
PreSGPRs: 21765 -> 21743 (-0.10%); split: -0.23%, +0.12%
PreVGPRs: 19475 -> 19307 (-0.86%); split: -0.91%, +0.05%
VALU: 411365 -> 413642 (+0.55%); split: -0.02%, +0.57%
SALU: 126940 -> 125411 (-1.20%); split: -1.53%, +0.32%
VMEM: 20574 -> 20062 (-2.49%)
SMEM: 23724 -> 23677 (-0.20%); split: -0.25%, +0.05%
VOPD: 19838 -> 19847 (+0.05%)

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33002>
2026-03-19 06:59:18 +00:00
Georg Lehmann
a9f3efcae0 nir/opt_large_constants: optimize small vector constant arrays
Foz-DB Navi48:
Totals from 2956 (2.58% of 114655) affected shaders:
MaxWaves: 85080 -> 85110 (+0.04%)
Instrs: 5167735 -> 5170572 (+0.05%); split: -0.12%, +0.17%
CodeSize: 28882716 -> 28867340 (-0.05%); split: -0.14%, +0.08%
VGPRs: 164484 -> 164616 (+0.08%); split: -0.09%, +0.18%
SpillSGPRs: 612 -> 611 (-0.16%)
Latency: 35017837 -> 34391146 (-1.79%); split: -1.80%, +0.01%
InvThroughput: 6336245 -> 6323807 (-0.20%); split: -0.49%, +0.29%
VClause: 112504 -> 111117 (-1.23%); split: -1.32%, +0.09%
SClause: 121125 -> 117618 (-2.90%); split: -3.04%, +0.15%
Copies: 392203 -> 384977 (-1.84%); split: -1.88%, +0.04%
Branches: 155578 -> 155376 (-0.13%); split: -0.13%, +0.01%
PreSGPRs: 127654 -> 127205 (-0.35%); split: -0.39%, +0.04%
PreVGPRs: 112486 -> 112449 (-0.03%); split: -0.04%, +0.00%
VALU: 2577362 -> 2586379 (+0.35%); split: -0.00%, +0.35%
SALU: 889569 -> 888472 (-0.12%); split: -1.01%, +0.89%
VMEM: 167203 -> 165750 (-0.87%)
SMEM: 190438 -> 187313 (-1.64%)
VOPD: 194411 -> 194344 (-0.03%); split: +0.01%, -0.04%

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33002>
2026-03-19 06:59:18 +00:00
Georg Lehmann
f782524c36 nir/opt_large_constants: enable small constant optimization for non trivial strides
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33002>
2026-03-19 06:59:17 +00:00
Georg Lehmann
568b96f8b2 nir/opt_large_constants: set fp_math_ctrl for bit exact results
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33002>
2026-03-19 06:59:17 +00:00
Georg Lehmann
e810382a1e nir/opt_large_constants: don't add constants implemented with ALU to the constant data
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33002>
2026-03-19 06:59:16 +00:00
Konstantin Seurer
581df90a89 nir/tests: Test nir_opt_large_constants
Tests a whole bunch of cases that can be turned into literals.

Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33002>
2026-03-19 06:59:16 +00:00
Caio Oliveira
a2cbdfbde3 nir: Add intrinsics for ShuffleUpINTEL and ShuffleDownINTEL
Move lowering to nir_lower_subgroups.  At some point Intel
backend might want to skip that and lower at the backend IR
boundary, but for now lowering always applies.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40376>
2026-03-17 17:21:52 +00:00
Erik Faye-Lund
5127568b98 compiler/nir: use common ycbcr math
Let's use the common code, so we have a single place to update in case
we want to add features etc.

Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40175>
2026-03-17 15:00:54 +00:00
Mike Blumenkrantz
fbf3305c1b nir/print: print per_vertex for variables
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40412>
2026-03-16 14:42:11 +00:00
Georg Lehmann
85021cb5f0 nir/algebraic/tests: invert all excluded fp_math_ctrl flags
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Randomly thought about that, but of course only after marge was already done.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40389>
2026-03-16 13:03:50 +00:00
Georg Lehmann
98ff0a394a nir/opt_algebraic: move some fsat patterns next to the other fsat patterns
I almost missed that they already exist multiple times.

No Foz-DB chagnes.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40389>
2026-03-16 13:03:50 +00:00
Georg Lehmann
607f26814f nir/opt_algebraic: remove manual patterns that optimizes flt([0.0, 1.0], 0.0)
Range analysis can figure this out.

No Foz-DB changes.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40389>
2026-03-16 13:03:50 +00:00
Georg Lehmann
530bb4278c nir/opt_algebraic: remove manual pattern that removes fmax(..., 0.0)
Range analysis will figure this out.

No Foz-DB changes.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40389>
2026-03-16 13:03:50 +00:00
Georg Lehmann
4d176c8ea5 nir/opt_algebraic: turn fabs(a) into fneg(a) if a is not positive
fneg is usually more optimizable.

Foz-DB Navi48:
Totals from 214 (0.19% of 114655) affected shaders:
Instrs: 694279 -> 694155 (-0.02%); split: -0.02%, +0.00%
CodeSize: 3749268 -> 3748024 (-0.03%); split: -0.03%, +0.00%
VGPRs: 18252 -> 18264 (+0.07%)
Latency: 5453691 -> 5453503 (-0.00%); split: -0.00%, +0.00%
InvThroughput: 1024436 -> 1024314 (-0.01%); split: -0.01%, +0.00%
VALU: 453136 -> 453041 (-0.02%); split: -0.02%, +0.00%

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40389>
2026-03-16 13:03:50 +00:00
Georg Lehmann
d77c2a1ece nir/opt_algebraic: take advantage of range helpers including nnan
No Foz-DB changes.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40389>
2026-03-16 13:03:49 +00:00
Alyssa Rosenzweig
373358da45 nir/opt_sink: sink pack_64_2x32_split
This comes up in lowered load_ubo sequences (observed in OpenCL test
test_api min_max_parameter_size). Hopefully the pack gets coalesced,
it's like nir_op_vec2 on most backends, so it should usually be ok to
sink even though the register pressure heuristic will reject it.
Allowing it to sink allows the UBO load to sink.

Intel's backend scheduler can optimize the relevant sequences locally
but there should still be a win here for global load sinking.

Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40267>
2026-03-13 17:03:00 +00:00
Alyssa Rosenzweig
507e7a04bf nir/opt_sink: sink Intel UBO loads
Acts like load_ubo, handle it in the same path.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40267>
2026-03-13 17:03:00 +00:00
Rhys Perry
3c67225afa nir/range_analysis: cache results of non-alu fp class queries
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
The dense array should be much faster than the previous hash table.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40346>
2026-03-13 15:38:55 +00:00
Rhys Perry
84eeecf822 nir/range_analysis: use a dense array
ministat (nir_analyze_fp_class):
Difference at 95.0% confidence
    -201983 +/- 1064.87
    -9.31575% +/- 0.0468505%
    (Student's t, pooled s = 1257.67)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40346>
2026-03-13 15:38:54 +00:00
Rhys Perry
cebf60e059 nir/range_analysis: use uint16_t for sparse array elements
ministat (nir_analyze_fp_class):
Difference at 95.0% confidence
    -4484.55 +/- 1288.68
    -0.205419% +/- 0.0589514%
    (Student's t, pooled s = 1521.99)

This should also use less memory.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40346>
2026-03-13 15:38:54 +00:00
Georg Lehmann
cadc74b5e2 nir/search_helpers: assume float sources without preserve flag can't be inf/nan
For example, this should let us avoid needing one pattern with is_a_number
and one with nnan.

Foz-DB Navi48:
Totals from 3564 (3.11% of 114655) affected shaders:
Instrs: 8256755 -> 8255042 (-0.02%); split: -0.02%, +0.00%
CodeSize: 43143184 -> 43123192 (-0.05%); split: -0.05%, +0.00%
VGPRs: 268252 -> 268240 (-0.00%)
Latency: 218890225 -> 218881157 (-0.00%); split: -0.00%, +0.00%
InvThroughput: 31044516 -> 31042297 (-0.01%); split: -0.01%, +0.00%
VClause: 96074 -> 96067 (-0.01%); split: -0.01%, +0.00%
SClause: 218042 -> 218037 (-0.00%); split: -0.00%, +0.00%
Copies: 508677 -> 508661 (-0.00%); split: -0.01%, +0.01%
Branches: 148570 -> 148569 (-0.00%)
PreSGPRs: 228110 -> 228082 (-0.01%); split: -0.01%, +0.00%
PreVGPRs: 231996 -> 231982 (-0.01%)
VALU: 4516327 -> 4515321 (-0.02%); split: -0.02%, +0.00%
SALU: 1353696 -> 1353590 (-0.01%); split: -0.01%, +0.00%
VMEM: 182189 -> 182179 (-0.01%)
SMEM: 344771 -> 344756 (-0.00%)
VOPD: 29463 -> 29438 (-0.08%)

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40291>
2026-03-13 07:13:10 +00:00
Georg Lehmann
19fa9bd152 nir/tests: test algebraic patterns with maximum fp_math_ctrl
This means we don't run into undefined behavior when testing nan/inf inputs.
Also make sure that patterns using is_only_used_as_float are signed zero correct.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40291>
2026-03-13 07:13:10 +00:00
Georg Lehmann
aad2b9bfc7 nir/opt_algebraic: be more strict when optimizing fcmp(a + #b, #c)
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40291>
2026-03-13 07:13:10 +00:00
Georg Lehmann
624313d35d nir/opt_algebraic: lower ninf fisfinite correctly
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40291>
2026-03-13 07:13:09 +00:00
Georg Lehmann
15eadc1253 nir/lower_frexp: preserve fp_math_ctrl
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40291>
2026-03-13 07:13:09 +00:00
Faith Ekstrand
f2f792996d Revert "nir: Add a type parameter to nir_lower_point_size()"
This reverts commit 6ee4ea5ea3.

Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Acked-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38681>
2026-03-12 22:59:13 +00:00
Faith Ekstrand
ceacec4cc9 nir: Allow 8-bit vertex output stores
These can never come from the API but there's a few cases where panvk
wants them.

Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Acked-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38681>
2026-03-12 22:59:13 +00:00