Commit graph

7455 commits

Author SHA1 Message Date
squidbus
5b34d1ff34 nir: Only attempt subgroups lower_boolean_reduce for single component.
lower_boolean_reduce only works if the number of components is 1, and even
asserts on this in its prologue. Otherwise, given a boolean vector type, it
may produce output using ballot/vote with a boolean vector input.

Acked-by: Aitor Camacho <aitor@lunarg.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41186>
2026-05-11 09:50:27 +00:00
Daniel Schürmann
0832f3251c nir/opt_algebraic: extend some extract_u8 pattern to extract_i8
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
and remove some duplicate extract pattern.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41385>
2026-05-09 21:23:40 +00:00
Daniel Schürmann
9895b5e5da nir/opt_algebraic: optimize downcast followed by upcast to extract
Totals from 217 (0.10% of 208640) affected shaders: (Navi48)

Instrs: 283561 -> 282870 (-0.24%)
CodeSize: 1604864 -> 1601136 (-0.23%); split: -0.24%, +0.01%
Latency: 2992301 -> 2990107 (-0.07%); split: -0.09%, +0.02%
InvThroughput: 602722 -> 601316 (-0.23%); split: -0.23%, +0.00%
Copies: 26490 -> 26471 (-0.07%); split: -0.10%, +0.03%
VALU: 147735 -> 147176 (-0.38%)
SALU: 51545 -> 51541 (-0.01%)
VOPD: 11140 -> 11204 (+0.57%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41385>
2026-05-09 21:23:40 +00:00
Georg Lehmann
1716cbff37 nir,amd: reassociate fadd to create more fma/mad
ACO's backend fusing is quite competent, but it cannot reorder adds.
This adds a simple algebraic pass to do that for us.

Foz-DB Navi10:
Totals from 13568 (18.76% of 72319) affected shaders:
MaxWaves: 304722 -> 304004 (-0.24%); split: +0.10%, -0.33%
Instrs: 15084252 -> 14993010 (-0.60%); split: -0.61%, +0.00%
CodeSize: 81480188 -> 81372600 (-0.13%); split: -0.17%, +0.04%
VGPRs: 741580 -> 743680 (+0.28%); split: -0.10%, +0.38%
SpillSGPRs: 9418 -> 9434 (+0.17%)
Latency: 154602014 -> 154312940 (-0.19%); split: -0.29%, +0.10%
InvThroughput: 44628554 -> 44442595 (-0.42%); split: -0.47%, +0.05%
VClause: 300035 -> 300054 (+0.01%); split: -0.31%, +0.31%
SClause: 370992 -> 370640 (-0.09%); split: -0.15%, +0.06%
Copies: 1162401 -> 1162800 (+0.03%); split: -0.30%, +0.33%
Branches: 300646 -> 300654 (+0.00%); split: -0.00%, +0.01%
PreSGPRs: 673675 -> 675057 (+0.21%); split: -0.00%, +0.21%
PreVGPRs: 633017 -> 634768 (+0.28%); split: -0.29%, +0.57%
VALU: 10800351 -> 10712041 (-0.82%); split: -0.82%, +0.00%
SALU: 1752917 -> 1753203 (+0.02%); split: -0.04%, +0.06%

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41348>
2026-05-08 11:49:43 +00:00
Georg Lehmann
9e87090db4 nir/loop_analyze: do not count fmul towards the limit when only used by fadd
As always with loop unrolling, don't look too closely at stats, but they
confirm more loops are now unrolled.

Foz-DB Navi10:
Totals from 66 (0.09% of 72319) affected shaders:
MaxWaves: 1464 -> 1424 (-2.73%); split: +0.82%, -3.55%
Instrs: 101778 -> 173128 (+70.10%)
CodeSize: 544148 -> 905392 (+66.39%)
VGPRs: 3652 -> 3788 (+3.72%); split: -0.77%, +4.49%
SpillSGPRs: 105 -> 75 (-28.57%)
Latency: 1197088 -> 1033471 (-13.67%); split: -17.08%, +3.41%
InvThroughput: 315257 -> 293245 (-6.98%); split: -13.29%, +6.31%
VClause: 1663 -> 3057 (+83.82%); split: -0.12%, +83.94%
SClause: 2797 -> 4496 (+60.74%); split: -0.21%, +60.96%
Copies: 6472 -> 11219 (+73.35%); split: -0.08%, +73.42%
Branches: 2695 -> 4697 (+74.29%); split: -0.56%, +74.84%
PreSGPRs: 3418 -> 3619 (+5.88%); split: -0.79%, +6.67%
PreVGPRs: 3305 -> 3423 (+3.57%); split: -1.06%, +4.63%
VALU: 73061 -> 124934 (+71.00%)
SALU: 11775 -> 20803 (+76.67%); split: -0.99%, +77.66%
VMEM: 2729 -> 4627 (+69.55%)
SMEM: 3796 -> 5869 (+54.61%); split: -0.18%, +54.79%

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41348>
2026-05-08 11:49:43 +00:00
Georg Lehmann
25add9cbd1 nir/opt_peephole_select: do not count fmul towards the limit when only used by fadd
Foz-DB Navi10:
Totals from 4077 (5.64% of 72319) affected shaders:
MaxWaves: 84057 -> 83325 (-0.87%); split: +0.07%, -0.94%
Instrs: 6019711 -> 6007338 (-0.21%); split: -0.27%, +0.07%
CodeSize: 32373984 -> 32356152 (-0.06%); split: -0.18%, +0.13%
VGPRs: 236588 -> 238172 (+0.67%); split: -0.05%, +0.72%
SpillSGPRs: 7341 -> 7367 (+0.35%); split: -0.65%, +1.01%
Latency: 61833147 -> 61386674 (-0.72%); split: -0.91%, +0.19%
InvThroughput: 22328993 -> 22364077 (+0.16%); split: -0.16%, +0.32%
VClause: 97803 -> 97832 (+0.03%); split: -0.29%, +0.32%
SClause: 147544 -> 146274 (-0.86%); split: -1.19%, +0.33%
Copies: 606083 -> 593887 (-2.01%); split: -2.27%, +0.26%
Branches: 171344 -> 164203 (-4.17%); split: -4.17%, +0.00%
PreSGPRs: 234116 -> 234922 (+0.34%); split: -0.17%, +0.52%
PreVGPRs: 211250 -> 211374 (+0.06%); split: -0.00%, +0.06%
VALU: 4130666 -> 4132669 (+0.05%); split: -0.11%, +0.16%
SALU: 854007 -> 852585 (-0.17%); split: -0.77%, +0.61%
VMEM: 162718 -> 162755 (+0.02%); split: -0.00%, +0.03%
SMEM: 237856 -> 236323 (-0.64%); split: -0.65%, +0.00%

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41348>
2026-05-08 11:49:43 +00:00
Georg Lehmann
0dd50a426e nir: fix fp_math_ctrl in fisnan
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Otherwise, nir_opt_algebraic will replace it with false.

Fixes: 63d199a01e ("nir: remove special fp_math_ctrl rules")
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41420>
2026-05-08 08:20:16 +00:00
Kenneth Graunke
4018aea9fa nir: Set FRAG_RESULT_DUAL_SRC_BLEND in outputs_written when lowering
Detecting dual source blending is currently annoying: you can either
look at info->fs.color_is_dual_source, or FRAG_RESULT_DUAL_SRC_BLEND
being in the info->outputs_written bitfield.

The former is only set if nir_shader_gather_info runs prior to
nir_lower_io lowering it to FRAG_RESULT_DUAL_SRC_BLEND.

The latter is only set if nir_shader_gather_info runs after the
nir_lower_io lowering.

Just make the IO lowering also set the outputs_written flag so if
you're trying to use FRAG_RESULT_DUAL_SRC_BLEND, you can always
check outputs_written without worrying about pass ordering.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41122>
2026-05-07 08:29:40 +00:00
Karol Herbst
3df48dec23 nir/lower_cl_images: call nir_progress on every function
llvmpipe supports real function calls, so we need to call nir_progress on
every function, not just the entry point.

Cc: mesa-stable
Reviewed-by: Dave Airlie <airlied@redhat.com>
Acked-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41404>
2026-05-07 05:35:12 +00:00
Pavel Ondračka
0f75fa5bfd nir/tests: add partial unroll OOB tests
Assisted-by: OpenAI Codex (GPT-5.5)
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41203>
2026-05-06 20:08:13 +00:00
Pavel Ondračka
e517e3da0b nir/tests: add helpers for counting used/unused instructions
Assisted-by: OpenAI Codex (GPT-5.5)
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41203>
2026-05-06 20:08:12 +00:00
Pavel Ondračka
959f59b3f0 nir: fix partial loop unroll OOB check for loops not starting at 0
is_access_out_of_bounds() decides whether the residual loop (created
by partial_unroll) will access arrays out of bounds by checking whether
array_length is less than or equal to trip_count. That assumes the
induction variable starts at 0. For example glamor gradient shader
shader-db/shaders/glamor/4.shader_test:

   uniform float stops[18];
   for (i = 1; i < n_stop; i++)
      if (stop_len < stops[i]) break;

trip_count is guessed as 17 from the array indexing, so the residual
loop's index begins at 18, out of bounds for the 18-element array, yet
18 <= 17 is false, so the OOB removal is skipped and the residual loop
is not eliminated.

Correctly consider the start value for the OOB check. This lets glamor
gradient shaders with loops starting at i=1 unroll the same way as i=0
loops.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41203>
2026-05-06 20:08:12 +00:00
Rhys Perry
ec59b59b97 nir: rename nir_src_parent_instr to nir_src_use_instr
sed -i "s/nir_src_parent_instr/nir_src_use_instr/" `find ./ -type f`
sed -i "s/nir_src_parent_if/nir_src_use_if/" `find ./ -type f`
sed -i "s/nir_src_set_parent/nir_src_set_use/" `find ./ -type f`

There are two kinds of "parent" in relation to a src/def:
- the instruction where the def or src's def is defined
- the instruction which the src is a part of and where the def is used

Clarify that the parent here is where the src's def is used, not where
it's defined.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41344>
2026-05-06 17:09:22 +00:00
Lionel Landwerlin
c30a4d4fdb anv/brw/nir: fix wa_18019110168
Several things were wrong :
  - incorrect offset in the FS push constant data
  - incorrect encoding of the 32bit values with 2 fields (remap table offset & provoking vertex)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31384>
2026-05-06 09:49:41 +00:00
Job Noorman
0703f27d6a nir/opt_offsets: add support for @load/store_global_ir3
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41342>
2026-05-05 06:25:49 +02:00
Job Noorman
c784af5ca0 ir3: always use byte offset for @load/store_global_ir3
Before a7xx, ldg/stg.a use an offset in units of their type size while
on a7xx and later, the offset is always in bytes. Currently,
@load/store_global_ir3 take their offset in dwords (32-bits). This has a
few downsides: offsets need an extra shl during codegen on a7xx and
addressing sub-dword-aligned addresses is only possible by doing 64-bit
math on the base address.

Improve the situation by always using a byte offset for
@load/store_global_ir3 and adding the offset_shift index to support type
units pre-a7xx. While we're at it, add the base index as well to support
all ldg/stg.g features in @load/store_global_ir3.

Supporting these renewed intrinsics consists of two parts:
- ir3_nir_lower_io_offsets legalizes the offset_shift on a6xx: for
  ldg.a/stg.a, the offset has to be in units of the type size so extra
  shifts are inserted to accomplish this if necessary. On a7xx, offsets
  are always in bytes so nothing needs to be done.
- The intrinsics are emitted as ldg/stg if the offset is a small enough
  constant and as ldg.a/stg.a otherwise. a6xx supports an extra shift
  for ldg.a/stg.a that only applies to the GPR offset (not the immediate
  base); NIR is pattern matched at this point to extract this if
  possible.

All users of @load/store_global_ir3 are updated to generate the offset
in units of bytes. ir3_nir_analyze_ubo_ranges is updated to take the new
offset_shift into account.

Totals from 2029 (1.15% of 176266) affected shaders:
MaxWaves: 26728 -> 26660 (-0.25%); split: +0.01%, -0.26%
Instrs: 1314089 -> 1278603 (-2.70%); split: -2.72%, +0.02%
CodeSize: 2739108 -> 2633236 (-3.87%); split: -3.87%, +0.01%
NOPs: 197537 -> 200843 (+1.67%); split: -1.62%, +3.30%
MOVs: 43771 -> 44025 (+0.58%); split: -1.11%, +1.69%
Full: 31849 -> 31948 (+0.31%); split: -0.03%, +0.34%
(ss): 37965 -> 42027 (+10.70%); split: -3.47%, +14.17%
(sy): 13752 -> 13566 (-1.35%); split: -4.04%, +2.68%
(ss)-stall: 154238 -> 170353 (+10.45%); split: -1.72%, +12.16%
(sy)-stall: 804442 -> 806518 (+0.26%); split: -4.65%, +4.91%
Preamble Instrs: 326728 -> 293488 (-10.17%)
Cat0: 217926 -> 220947 (+1.39%); split: -1.58%, +2.96%
Cat1: 50182 -> 50446 (+0.53%); split: -0.97%, +1.49%
Cat2: 460987 -> 452101 (-1.93%); split: -2.26%, +0.33%
Cat3: 390696 -> 361271 (-7.53%)
Cat7: 39148 -> 38688 (-1.18%); split: -1.24%, +0.06%

Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41342>
2026-05-05 06:25:49 +02:00
Job Noorman
53d96aed05 nir/get_io_offset_src_number: support @load/store_global_ir3
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41342>
2026-05-05 06:25:49 +02:00
Faith Ekstrand
84bbfaa7e5 pan/bi: Delete the old texel buffer intrinsics
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41036>
2026-05-05 01:27:16 +00:00
Faith Ekstrand
7d5cb2884c pan/bi: Allow setting the table on lea_attr_pan
Also allow us to set AUTO32 while we're at it.

Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41036>
2026-05-05 01:27:16 +00:00
Faith Ekstrand
2369808cd1 pan,nir: Add Bifrost texturing intrinsics
These are funky enough that they make more sense as intrinsics than
texture opcodes.

Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41036>
2026-05-05 01:27:16 +00:00
Faith Ekstrand
0d549f5bde nir: Add a new nir_op_f2u32_rtne
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41036>
2026-05-05 01:27:16 +00:00
Faith Ekstrand
58cba7887a nir: Add a new nir_texop_gradient_pan
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41036>
2026-05-05 01:27:16 +00:00
Faith Ekstrand
e0fffabda7 nir/builder: Allow backend1/2 in nir_build_tex()
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41036>
2026-05-05 01:27:16 +00:00
Faith Ekstrand
337aaa0ab9 pan,nir: Add cube face intrinsics
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41036>
2026-05-05 01:27:15 +00:00
Rhys Perry
081feabf9c nir/search: fix nir_algebraic_automaton after constant folding op(bcsel)
Likely fixes https://gitlab.freedesktop.org/mesa/mesa/-/jobs/98917704

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: f4812dc11d ("nir/opt_constant_folding: constant-fold op(bcsel(), #c) -> bcsel(.., #c1, #c2)")
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41343>
2026-05-04 17:27:38 +00:00
Daniel Schürmann
f4812dc11d nir/opt_constant_folding: constant-fold op(bcsel(), #c) -> bcsel(.., #c1, #c2)
for all ALU instructions except fneg instead of using nir_opt_algebraic
for a small subset.

Totals from 17711 (8.49% of 208640) affected shaders: (Navi48)
MaxWaves: 364391 -> 364397 (+0.00%); split: +0.01%, -0.01%
Instrs: 33873994 -> 33780398 (-0.28%); split: -0.31%, +0.03%
CodeSize: 198627596 -> 198259724 (-0.19%); split: -0.23%, +0.05%
VGPRs: 1435516 -> 1435144 (-0.03%); split: -0.04%, +0.02%
SpillSGPRs: 652827 -> 654577 (+0.27%); split: -0.00%, +0.27%
SpillVGPRs: 594840 -> 593598 (-0.21%); split: -0.28%, +0.07%
Scratch: 31791360 -> 31543552 (-0.78%)
Latency: 417824569 -> 415881858 (-0.46%); split: -0.48%, +0.02%
InvThroughput: 80376232 -> 80307996 (-0.08%); split: -0.10%, +0.01%
VClause: 557238 -> 554770 (-0.44%); split: -0.50%, +0.06%
SClause: 688297 -> 688125 (-0.02%); split: -0.04%, +0.02%
Copies: 3571756 -> 3566704 (-0.14%); split: -0.44%, +0.29%
Branches: 628710 -> 628576 (-0.02%); split: -0.07%, +0.05%
PreSGPRs: 1100316 -> 1103478 (+0.29%); split: -0.02%, +0.30%
PreVGPRs: 1132139 -> 1128765 (-0.30%); split: -0.30%, +0.00%
VALU: 18944830 -> 18912030 (-0.17%); split: -0.20%, +0.03%
SALU: 4363054 -> 4342748 (-0.47%); split: -0.57%, +0.10%
VMEM: 1894420 -> 1891754 (-0.14%); split: -0.19%, +0.05%
SMEM: 1073860 -> 1073741 (-0.01%); split: -0.01%, +0.00%
VOPD: 1734659 -> 1735718 (+0.06%); split: +0.20%, -0.14%

Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40848>
2026-05-04 09:42:59 +00:00
Daniel Schürmann
8b1c60add4 nir/opt_constant_folding: create const_value_for_alu() helper
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40848>
2026-05-04 09:42:59 +00:00
Georg Lehmann
52b195b4e8 nir/opt_algebraic: add more fmulz pattern
Totals from 3 (0.00% of 202440) affected shaders: (Navi48)
Instrs: 5684 -> 5641 (-0.76%); split: -0.77%, +0.02%
CodeSize: 30952 -> 30708 (-0.79%); split: -0.80%, +0.01%
Latency: 9236 -> 9199 (-0.40%); split: -0.42%, +0.02%
InvThroughput: 2287 -> 2273 (-0.61%)
VALU: 3900 -> 3884 (-0.41%)
SALU: 305 -> 289 (-5.25%)

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40848>
2026-05-04 09:42:59 +00:00
Georg Lehmann
38e691fc0a nir/opt_varyings: do no_signed_zero linking even for non removable stores
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
E.g. position in VS.

Foz-DB Navi48:
Totals from 948 (0.79% of 120695) affected shaders:
MaxWaves: 26816 -> 26828 (+0.04%)
Instrs: 799692 -> 796993 (-0.34%); split: -0.34%, +0.01%
CodeSize: 3855744 -> 3846816 (-0.23%); split: -0.24%, +0.01%
VGPRs: 50256 -> 50220 (-0.07%)
Latency: 2209359 -> 2207667 (-0.08%); split: -0.09%, +0.01%
InvThroughput: 305260 -> 303519 (-0.57%); split: -0.57%, +0.00%
VClause: 11640 -> 11643 (+0.03%); split: -0.01%, +0.03%
SClause: 21152 -> 21149 (-0.01%)
Copies: 51658 -> 51675 (+0.03%); split: -0.11%, +0.14%
Branches: 18656 -> 18655 (-0.01%)
PreVGPRs: 37999 -> 37984 (-0.04%)
VALU: 469752 -> 467406 (-0.50%); split: -0.50%, +0.00%
SALU: 105433 -> 105323 (-0.10%); split: -0.11%, +0.00%

Reviewed-by: Marek Olšák <maraeo@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41292>
2026-05-03 19:55:10 +00:00
Georg Lehmann
fac4edbcba nir/opt_varyings: back propagate signed zero information to outputs
Foz-DB Navi48:
Totals from 809 (0.67% of 120695) affected shaders:
MaxWaves: 21804 -> 21808 (+0.02%)
Instrs: 863131 -> 861310 (-0.21%); split: -0.22%, +0.01%
CodeSize: 4535500 -> 4523232 (-0.27%); split: -0.30%, +0.03%
VGPRs: 47304 -> 47280 (-0.05%)
SpillSGPRs: 170 -> 82 (-51.76%)
Latency: 6791484 -> 6786880 (-0.07%); split: -0.07%, +0.00%
InvThroughput: 906281 -> 905301 (-0.11%); split: -0.11%, +0.00%
VClause: 16910 -> 16917 (+0.04%); split: -0.01%, +0.05%
SClause: 21856 -> 21827 (-0.13%); split: -0.14%, +0.01%
Copies: 61890 -> 61436 (-0.73%); split: -0.80%, +0.06%
Branches: 19725 -> 19640 (-0.43%)
PreSGPRs: 38011 -> 37851 (-0.42%)
PreVGPRs: 36482 -> 36454 (-0.08%)
VALU: 465316 -> 464323 (-0.21%); split: -0.22%, +0.00%
SALU: 143757 -> 143395 (-0.25%); split: -0.33%, +0.08%
VMEM: 36827 -> 36806 (-0.06%)
SMEM: 37769 -> 37768 (-0.00%)

Reviewed-by: Marek Olšák <maraeo@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41292>
2026-05-03 19:55:10 +00:00
Georg Lehmann
b2bc57551a nir/instr_set: allow cse with fp_math_ctrl mismatches for intrinsics
Just like for ALU.

No Foz-DB changes.

Reviewed-by: Marek Olšák <maraeo@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41292>
2026-05-03 19:55:10 +00:00
Marek Olšák
f583f6e717 nir: use nir_build_frag_coord everywhere
nir_build_frag_coord generates the correct sysval loads based on NIR
options. nir_load_frag_coord shouldn't be used directly because drivers
don't have to support it.

v2: RADV can't use it because nir->options isn't set, so use load_pixel_coord.

Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41227>
2026-05-03 13:03:01 +00:00
Marek Olšák
b63a9a8b39 nir: add direct lowered frag_coord building to replace lowering passes
Instead of lowering frag_coord 4 times during compilation,
just use this.

Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41227>
2026-05-03 13:03:00 +00:00
Marek Olšák
9c5ad16819 nir/opt_frag_coord_to_pixel_coord: handle frag_coord_xy
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41227>
2026-05-03 13:03:00 +00:00
Marek Olšák
076b0aaf1d nir/lower_wpos_ytransform: handle frag_coord_xy
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41227>
2026-05-03 13:03:00 +00:00
Marek Olšák
e49f29f25e nir: add frag_coord_xy
to strengthen and simplify pixel_coord lowering

Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41227>
2026-05-03 13:03:00 +00:00
Daniel Schürmann
012d72f2b0 nir/opt_algebraic: add some imul24_relaxed pattern
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41178>
2026-05-01 10:07:26 +00:00
Daniel Schürmann
708093d830 nir/opt_algebraic: use imul24_relaxed for lowered dot4x8_add
Totals from 28 (0.04% of 72819) affected shaders: (Navi10)

MaxWaves: 181 -> 186 (+2.76%)
Instrs: 406735 -> 338360 (-16.81%)
CodeSize: 2913588 -> 2469712 (-15.23%)
VGPRs: 5520 -> 5468 (-0.94%)
SpillVGPRs: 32 -> 0 (-inf%)
LDS: 64512 -> 62464 (-3.17%)
Scratch: 10240 -> 0 (-inf%)
Latency: 11028252 -> 4357120 (-60.49%)
InvThroughput: 11004126 -> 4079018 (-62.93%)
VClause: 1686 -> 2055 (+21.89%); split: -0.89%, +22.78%
SClause: 890 -> 852 (-4.27%)
Copies: 4516 -> 2644 (-41.45%); split: -41.59%, +0.13%
PreSGPRs: 982 -> 974 (-0.81%)
PreVGPRs: 5356 -> 4284 (-20.01%)
VALU: 370529 -> 330201 (-10.88%)
SALU: 28850 -> 1170 (-95.94%)
VMEM: 2616 -> 2560 (-2.14%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41178>
2026-05-01 10:07:25 +00:00
Lorenzo Rossi
63aceb07ff nir/opt_sink: Add pan-specific load_input
Signed-off-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40924>
2026-04-30 18:26:10 +00:00
Lorenzo Rossi
30d8f9c554 nir/lower_point_size: Handle 16-bit point sizes
panfrost has float16 point size, handling that precision too allows the
compiler to call lower_point_size later in the compilation pipeline

Signed-off-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40924>
2026-04-30 18:26:10 +00:00
Lorenzo Rossi
2a7d817591 nir/opt_algebraic: optimize fadd/fmul with 16-bit source and constant
Signed-off-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41096>
2026-04-30 17:33:09 +00:00
Lorenzo Rossi
89436db611 nir: Extract float_is_half tests in common code
Signed-off-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41096>
2026-04-30 17:33:09 +00:00
Karol Herbst
4e67582ddf nir: add fmul_rtz optimizations
NVK is only going to use it for `fmul_rtz(frcp(ipa), ipa)` patterns, so
try not too hard to optimize this.

Totals from 10 (0.00% of 1212873) affected shaders:
CodeSize: 34480 -> 34288 (-0.56%); split: -0.60%, +0.05%
Static cycle count: 6225 -> 6132 (-1.49%); split: -1.57%, +0.08%

Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41179>
2026-04-30 15:42:40 +00:00
Karol Herbst
2e09b4ac68 nir: handle fmul_rtz in a couple of places
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41179>
2026-04-30 15:42:40 +00:00
Karol Herbst
4e520f671c nir: add fmul_rtz
It's needed in NVK for correctness with interpolation.

Backport-to: 26.1
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41179>
2026-04-30 15:42:40 +00:00
Marek Olšák
a3e3bf0ac2 nir/opt_dce: add shader_info::assert_inputs_not_dead
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41166>
2026-04-30 07:07:32 +00:00
Marek Olšák
7bd5856cc6 nir/opt_dce: factor out dead instruction removal into a helper
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41166>
2026-04-30 07:07:32 +00:00
Alyssa Rosenzweig
0c49738211 nir/opt_reassociate: fix exactness bug
For an inexact-associative operation (fadd or fmul), can_reassociate ensures the
root of the chain is inexact to allow reassociating. However, build_chain just
checks for opcodes to match up after, although we do sum up exactness across the
chain. Although an Effort Was Made, it still seems incorrect to reassociate

   %3 = fadd! %0, %1
   %4 = fadd %3, %2

to instead be (ex.)

   %3 = fadd! %0, %2
   %4 = fadd! %3, %1

Closes: #14418
Fixes: e0b0f7e73c ("nir: add ALU reassocation pass")
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41162>
2026-04-28 21:14:56 +00:00
Georg Lehmann
599a52174b nir: disable fp class analysis for 64bit transcendentals
Some backends have terrible precision for these fp64 opcodes, so don't try to
do anything clever.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/work_items/15334
Fixes: 5a298f3560 ("nir: rewrite fp range analysis as a fp class analysis")
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41206>
2026-04-28 13:26:42 +00:00
Simon Perretta
57791c4a99 pco: track how many tg4/raw sample comps are needed
Rather than always emitting and swizzling 16 components for raw samples,
scale it by the number actually needed as defined by the selected tg4
channel/components.

Signed-off-by: Simon Perretta <simon.perretta@imgtec.com>
Acked-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40687>
2026-04-28 12:04:03 +01:00