Commit graph

7406 commits

Author SHA1 Message Date
Simon Perretta
57791c4a99 pco: track how many tg4/raw sample comps are needed
Rather than always emitting and swizzling 16 components for raw samples,
scale it by the number actually needed as defined by the selected tg4
channel/components.

Signed-off-by: Simon Perretta <simon.perretta@imgtec.com>
Acked-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40687>
2026-04-28 12:04:03 +01:00
Marek Olšák
3dcba87ca3 nir/opt_licm: hoist instructions across multiple levels of nested loops
radv gfx12:

Totals:
Instrs: 42861311 -> 42861476 (+0.00%); split: -0.00%, +0.00%
CodeSize: 227917476 -> 227918160 (+0.00%); split: -0.00%, +0.00%
Latency: 265381068 -> 265373506 (-0.00%); split: -0.00%, +0.00%
InvThroughput: 42954018 -> 42952350 (-0.00%)
VClause: 819026 -> 819024 (-0.00%)
SClause: 1210348 -> 1210293 (-0.00%)
Copies: 2919525 -> 2919597 (+0.00%); split: -0.00%, +0.00%
PreSGPRs: 2889432 -> 2889406 (-0.00%)
VALU: 23757371 -> 23757377 (+0.00%); split: -0.00%, +0.00%
SALU: 5981417 -> 5981485 (+0.00%); split: -0.00%, +0.00%
VOPD: 8966 -> 8964 (-0.02%)

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41220>
2026-04-27 23:58:21 +00:00
Marek Olšák
8e036fcaec nir/opt_licm: use nir_metadata_control_flow
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41220>
2026-04-27 23:58:21 +00:00
Marek Olšák
e0112be522 nir/opt_licm: add a private state structure for the pass
The structure will grow in later commits.

The major change is that the preheader and exit blocks are replaced
by tracking just the innermost optimized nir_loop * and getting the
predecessor and successor blocks out of it.

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41220>
2026-04-27 23:58:20 +00:00
Timothy Arceri
a42c55da46 amd/radeonsi: dont clamp packed user varyings
ac_nir_optimize_outputs() might pack user varyings into the color
built-ins. If this happens we skip adding clamping to the
components that contain the user varying.

This change also fixes a second bug where a color built-in can be
packed into a non-color slot and was no longer being clamped.

Fixes: 3777a5d7 ("radeonsi: assign param export indices before compilation")
Closes: #14443

Reviewed-by: Marek Olšák <maraeo@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40594>
2026-04-27 22:59:58 +00:00
Simon Perretta
af1669d9e2 pco: reserve additional outputs for trilinear sampled coeffs
Sampling coeffs with trilinear filtering will output 2x sets of data.
Whether bilinear or trilinear filtering is in use can't be determined
without checking state words, so unconditionally reserve 2x to avoid
clobbering output regs.

Fixes: 7df32ba09d ("pco: initial texture/sampler compiler support")
Signed-off-by: Simon Perretta <simon.perretta@imgtec.com>
Acked-by: Frank Binns <frank.binns@imgtec.com>
Tested-by: Icenowy Zheng <zhengxingda@iscas.ac.cn>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41051>
2026-04-27 11:32:29 +00:00
squidbus
a41f0e62bb asahi,nir: Move asahi dynamic clipz pass to common.
Acked-by: Alyssa Rosenzweig <alyssa@rosenz.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41088>
2026-04-27 11:00:59 +00:00
Rhys Perry
91d555c2cb radv: lower indirect derefs after linking
Scratch access isn't very optimizable, so more stores are optimized away
if we lower indirect derefs after both linking and radv_optimize_nir.

fossil-db (navi21):
Totals from 1264 (0.62% of 202427) affected shaders:
Instrs: 1504703 -> 1504708 (+0.00%); split: -0.02%, +0.02%
CodeSize: 8031388 -> 8031020 (-0.00%); split: -0.02%, +0.02%
SpillSGPRs: 1865 -> 1869 (+0.21%)
Latency: 12106362 -> 12106464 (+0.00%); split: -0.01%, +0.01%
InvThroughput: 4056269 -> 4056044 (-0.01%); split: -0.01%, +0.00%
VClause: 13927 -> 13940 (+0.09%)
SClause: 32382 -> 32396 (+0.04%); split: -0.03%, +0.08%
Copies: 188004 -> 187897 (-0.06%); split: -0.17%, +0.11%
Branches: 39045 -> 39052 (+0.02%); split: -0.01%, +0.03%
PreSGPRs: 79885 -> 79814 (-0.09%); split: -0.11%, +0.02%
VALU: 1072639 -> 1072532 (-0.01%); split: -0.01%, +0.00%
SALU: 187317 -> 187375 (+0.03%); split: -0.11%, +0.14%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31265>
2026-04-24 11:01:03 +00:00
Alyssa Rosenzweig
6a43e6c9e0 nir/opt_algebraic: add redundant u2u32/unpack_64_2x32_split_x patterns
reduces hello world kernel 57 -> 44 inst on jay. why do we have two opcodes that
do literally the same thing? :/

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41085>
2026-04-23 19:54:21 +00:00
Daniel Schürmann
806fcc6193 nir/opt_loop: always try to peel initial break from loops with unrolling hint
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
This allows to unroll these loops, even if loop analyze is unable
to calculate the iteration count.
As always with loops, the throughput stats are meaningless.

Totals from 6 (0.00% of 202440) affected shaders: (Navi48)
Instrs: 7825 -> 6201 (-20.75%)
CodeSize: 37056 -> 30412 (-17.93%)
Latency: 21563 -> 16934 (-21.47%)
InvThroughput: 144649 -> 77962 (-46.10%)
SClause: 139 -> 133 (-4.32%)
Copies: 536 -> 388 (-27.61%)
Branches: 156 -> 84 (-46.15%)
PreVGPRs: 298 -> 296 (-0.67%); split: -1.01%, +0.34%
VALU: 2493 -> 2378 (-4.61%); split: -4.65%, +0.04%
SALU: 3263 -> 2199 (-32.61%)
SMEM: 188 -> 183 (-2.66%)

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40349>
2026-04-22 10:34:58 +00:00
Daniel Schürmann
738cc6a7db nir/opt_loop: stop recursion at loop header phi in can_constant_fold()
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40349>
2026-04-22 10:34:58 +00:00
Daniel Schürmann
1f9a0490c6 nir/opt_loop: Don't peel initial break from do-while loops
As the main purpose of this optimization is to transform
while- into do-while loops, don't apply on loops which are
already in do-while form. Also set nir_loop::do_while after
this transformation, so that it is only applied once.

Totals from 576 (0.28% of 202440) affected shaders: (Navi48)
Instrs: 1337529 -> 1253438 (-6.29%); split: -6.36%, +0.07%
CodeSize: 8390852 -> 7837328 (-6.60%); split: -6.61%, +0.01%
VGPRs: 50856 -> 50844 (-0.02%)
SpillSGPRs: 42198 -> 35395 (-16.12%); split: -16.13%, +0.01%
SpillVGPRs: 47608 -> 44620 (-6.28%)
Latency: 31043828 -> 44143753 (+42.20%); split: -0.06%, +42.26%
InvThroughput: 6973433 -> 10079000 (+44.53%); split: -0.08%, +44.61%
VClause: 26839 -> 24718 (-7.90%); split: -7.91%, +0.00%
SClause: 21831 -> 21583 (-1.14%); split: -1.52%, +0.38%
Copies: 183503 -> 150040 (-18.24%); split: -18.84%, +0.61%
Branches: 27738 -> 26848 (-3.21%); split: -5.12%, +1.91%
PreSGPRs: 40233 -> 39083 (-2.86%); split: -2.88%, +0.02%
PreVGPRs: 38745 -> 38903 (+0.41%); split: -0.02%, +0.43%
VALU: 688396 -> 645948 (-6.17%); split: -6.17%, +0.01%
SALU: 189792 -> 177642 (-6.40%); split: -6.97%, +0.57%
VMEM: 121500 -> 112748 (-7.20%)
SMEM: 38765 -> 37767 (-2.57%); split: -2.58%, +0.00%
VOPD: 102488 -> 89071 (-13.09%); split: +0.24%, -13.33%

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40349>
2026-04-22 10:34:58 +00:00
Daniel Schürmann
32436731a3 nir: add nir_loop::do_while to indicate do-while loops
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40349>
2026-04-22 10:34:58 +00:00
Eric R. Smith
4ae192a3d9 glsl, spirv: Improve accuracy of asin() and acos()
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
The polynomial used for asin_expr() was suboptimal (and its source was
not documented).

A better approximation is found in the _Handbook_of_Mathematical_Functions_
by Abramowitz and Stegun, which is used in Nvidia's Cg toolkit. However,
while this approximation gives a good absolute error bound, its relative
error exceeds the 4096 ulp allowed by the Vulkan spec. Taking a page
from the spirv implementation of asin(), we implement a piecewise
approximation where a Taylor series is used for small values of |x|.
This patch also harmonizes the GLSL and Vulkan implementations by moving
the implementation to common code (nir_builder).

Running tests on asin() with a grid of 64000 samples between 0.0 and +1.0,
the original asin() at 32 bits has:
```
                       glsl                       spirv
  RMSE:            1.756451e-04                 1.609091e-04
  worst abs error: 3.904104e-04 at 0.937001     3.904104e-04 at 0.937001
  worst ulp error: 11800 at 6.2499e-05          3826 at 0.841331
```
whereas the new implementation has for both:
```
  RMSE:            2.528056e-05
  worst abs error: 4.962087e-05 at 0.451149
  worst ulp error: 2379 at 0.215106
```

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Acked-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40862>
2026-04-21 21:10:22 +00:00
Brandon Jones
d1dd65d425 nir/opt_algebraic: fix fabs optimization
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
This fixes a regression found in blender's unit testing, which called
fabs(-0.0) and invoked an NIR optimization that is was not valid for
the parameter -0.0. IEEE 754 requires that abs clear the sign bit
for the value -0.0.

Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41060>
2026-04-21 04:10:29 +00:00
Lionel Landwerlin
bbeb6be6eb nir: expose nir_opt_dce_impl
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41047>
2026-04-20 21:53:35 +03:00
Patrick Lerda
9815901f86 r600: implement tes and tcs instanced gl_PrimitiveID support
This change extends r600_lds_constant_buffer to
implement a fully conformant gl_PrimitiveID at
the tes and tcs stages.

This change was tested on cayman and barts. Here are the tests fixed:
spec/arb_tessellation_shader/execution/tcs-primitiveid-instanced: fail pass
spec/arb_tessellation_shader/execution/tes-no-tcs-primitiveid-instanced: fail pass
spec/arb_tessellation_shader/execution/tes-primitiveid-instanced: fail pass
khr-gl4[4-6]/tessellation_shader/tessellation_shader_tessellation/gl_invocationid_patchverticesin_primitiveid: fail pass
khr-gles31/core/tessellation_shader/tessellation_shader_tessellation/gl_invocationid_patchverticesin_primitiveid: fail pass
khr-glesext/tessellation_shader/tessellation_shader_tessellation/gl_invocationid_patchverticesin_primitiveid: fail pass

Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40297>
2026-04-20 13:21:55 +00:00
Janne Grunau
98a97cb413 nir/gather_info: clear interpolation qualifiers only in fragment stage
Asahi wants the the interpolation qualifiers from the shader info in the
vertex shader. Clear them only in the fragment stage so they can
propagate back.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/work_items/15288
Backport-to: 26.0
Fixes: a72704d0fb ("nir/gather_info: clear interpolation qualifiers before gathering")
Signed-off-by: Janne Grunau <j@jannau.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41040>
2026-04-19 10:10:15 +00:00
Alyssa Rosenzweig
4b81cb6206 nir/opt_generate_bfi: avoid trivial instructions
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
With the pass order shuffling, code like `(x & 0xf) + (x & 0xfffffff0)` gets
optimized to bitfield_select(0xF, x, x). But it would be much better to optimize
simply to x. nir_opt_algebraic would do that for us but we run this pass too
late for algebraic to save us from ourselves, so be smarter.

Observed on dEQP-GLES31.functional.compute.basic.image_atomic_op_local_size_8
with Jay, this saves an instruction there.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40956>
2026-04-16 13:54:41 +00:00
Georg Lehmann
f949e3b819 nir: remove nir_link_xfb_varyings
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
RADV was the last user.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40977>
2026-04-16 08:49:23 +00:00
Marek Olšák
835f5faf14 nir: add back color0/1 system values and VARYING_SLOT_PARAM_GEN_AMD
It turns out we need the color sysvals recorded in system_values_read,
and PARAM_GEN is for point smoothing.

Acked-by: Pierre-Eric
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40556>
2026-04-15 18:12:07 +00:00
Natalie Vock
57f796752d nir/deref: Elide loads/stores from deref cast of undef
These can never be meaningful. DOOM: The Dark Ages also relies on this.

Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40799>
2026-04-15 08:42:12 +00:00
Samuel Pitoiset
0b016f4bff nir: add new system values for descriptor heap RT traversal inputs
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39483>
2026-04-14 10:10:22 +00:00
Georg Lehmann
d2d86b83fb nir/opt_reassociate: use nir_fp_no_reassoc instead of exact
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40872>
2026-04-12 17:10:27 +00:00
Georg Lehmann
6549b993e6 nir/algebraic: actually seperate contract and inexact
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40872>
2026-04-12 17:10:27 +00:00
Georg Lehmann
a42ac8dec6 nir: split exact bit into no_contract/reassoc/transform
Just like float control2.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40872>
2026-04-12 17:10:27 +00:00
Job Noorman
b8437e30bf nir/lower_atomics: add support for bindless_image_atomic
Initial data is loaded via bindless_image_load, atomic swap via
bindless_image_atomic_swap.

Signed-off-by: Job Noorman <jnoorman@igalia.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39932>
2026-04-11 19:46:13 +00:00
Alyssa Rosenzweig
4356ad1bf5 nir: add pixel_coord_intel
This is a 2x16 bitpacked version of load_pixel_coord which maps directly to the
hardware value and is much easier for Jay to consume due to the sadness that is
true 16-bit on Intel. Jay will lower to this internally.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40835>
2026-04-10 18:21:21 +00:00
Alyssa Rosenzweig
bd6d210386 nir: add shuffle_intel
Jay will use this to lower & optimize subgroup shuffles. This is closer to
how Intel hardware works but still much higher level than the hardware
primitive. This gets us NIR optimizations on the multiply however.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40835>
2026-04-10 18:21:21 +00:00
Alyssa Rosenzweig
b840b178af nir: add Intel RT write intrinsic
This exposes the underlying render target write message directly, which Jay will
use to lower RT writes in NIR. I'm still on the fence about what exactly this
should look like but this is good enough for GLES3.0 (so, multiple render
targets but not necessarily dual source blending).

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40835>
2026-04-10 18:21:21 +00:00
Alyssa Rosenzweig
566047222e nir: add frag_coord_w_rcp intrinsic
This maps directly to what Intel's thread payload gives us, allowing us to
optimize out frcp's in some cases. Jay will use this.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40835>
2026-04-10 18:21:21 +00:00
Kenneth Graunke
09089fdd13 nir: Add nir_texop_sparse_residency[_txf]_intel operations
These lowered versions map to what Jay can deal with. The hardware is more
flexible but we're not due to data model restrictions. We choose to lower to get
us off the ground, we can revisit later.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40835>
2026-04-10 18:21:21 +00:00
Faith Ekstrand
91e6507665 nir: Add a nir_alu_src_comp_as_uint() helper
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40769>
2026-04-09 18:08:40 -04:00
Faith Ekstrand
fb347b8458 nir: Add a couple is_zero() helpers
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40769>
2026-04-09 18:05:20 -04:00
Kenneth Graunke
0b99c88337 nir, brw: lower scratch in NIR
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
This will let us share a common scratch swizzling between brw and jay.

Changes by Ken:
- Use an immediate SIMD width when known so we don't need to re-lower
- Switch to load_simd_width_intel because it may not match
  info->api_subgroup_size on Vulkan without VK_EXT_subgroup_size_control
- Stop using DWord Scattered Write messages for scratch.  These take an
  offset in DWords, and our offsets are now always in bytes.  This also
  means that we no longer create MEMORY_OPCODE_* IR with inconsistent
  units of either bytes or dwords.  Yikes.  We use byte scattered
  messages now.

fossil-db stats on Battlemage:

   Instrs: 500477504 -> 500450056 (-0.01%); split: -0.01%, +0.00%
   CodeSize: 7807432368 -> 7806786192 (-0.01%); split: -0.01%, +0.00%
   Cycle count: 62404008370 -> 62398437734 (-0.01%); split: -0.01%, +0.00%
   Fill count: 546690 -> 546695 (+0.00%); split: -0.00%, +0.00%
   Max live registers: 141257956 -> 141258100 (+0.00%); split: -0.00%, +0.00%
   Non SSA regs after NIR: 72350283 -> 72336544 (-0.02%)

   Totals from 99 (0.01% of 1581969) affected shaders:
   Instrs: 366593 -> 339145 (-7.49%); split: -7.58%, +0.09%
   CodeSize: 6425936 -> 5779760 (-10.06%); split: -10.06%, +0.00%
   Cycle count: 2412009876 -> 2406439240 (-0.23%); split: -0.26%, +0.03%
   Fill count: 19675 -> 19680 (+0.03%); split: -0.02%, +0.04%
   Max live registers: 17600 -> 17744 (+0.82%); split: -0.09%, +0.91%
   Non SSA regs after NIR: 37894 -> 24155 (-36.26%)

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40843>
2026-04-09 21:02:16 +00:00
Kenneth Graunke
0c0fea1cdb nir: Increase tex opcode bits from 5 to 6 in nir_instr_set
We are already at our limit of 31 texture opcodes, and cannot add any
more without expanding the opcode hashing in nir_instr_set.  Thankfully,
it's at 29 bits, so adding one here is possible still.

Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40833>
2026-04-08 16:07:35 +00:00
Rhys Perry
01516746eb nir: use a u_dynarray for block predecessors
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
A set is large and expensive to iterate.

This is faster (overall fossilize-replay difference):
Difference at 95.0% confidence
        -250 +/- 28.9257
        -2.04849% +/- 0.235211%
        (Student's t, pooled s = 34.1626)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40242>
2026-04-08 15:06:34 +00:00
Rhys Perry
99c5f48da9 nir/cf: don't remove block predecessors while iterating
This would have been fine for a set, but we're going to use a dynarray for
the predecessors in the future.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40242>
2026-04-08 15:06:33 +00:00
Rhys Perry
ea148c9136 nir: add nir_loop_has_back_edge helper
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40242>
2026-04-08 15:06:32 +00:00
Rhys Perry
463e3643f2 nir: add and use block predecessor helpers
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40242>
2026-04-08 15:06:32 +00:00
Samuel Pitoiset
b402b98347 nir: allow heap image intrinsics in nir_rewrite_image_intrinsic()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40842>
2026-04-08 13:56:04 +00:00
Samuel Pitoiset
a41724e923 nir: remove nir_intrinsic_global_addr_to_descriptor
It's no longer emitted by vk_nir_lower_descriptor_heaps().

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40826>
2026-04-08 09:46:01 +00:00
Georg Lehmann
0066328cf1 nir/opt_algebraic: create more 64bit bit test
Foz-DB GFX1201:
Totals from 2 (0.00% of 205032) affected shaders:
Instrs: 3429 -> 3425 (-0.12%)
CodeSize: 19580 -> 19568 (-0.06%)
Latency: 13629 -> 13628 (-0.01%); split: -0.02%, +0.01%
InvThroughput: 1853 -> 1847 (-0.32%)
Copies: 235 -> 237 (+0.85%)
VALU: 1901 -> 1898 (-0.16%)
SALU: 381 -> 380 (-0.26%)
VOPD: 307 -> 309 (+0.65%)

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40705>
2026-04-08 08:44:20 +00:00
Rhys Perry
54515bf8d1 nir/propagate_invariant: handle images
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
No fossil-db changes.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40740>
2026-04-08 07:10:26 +00:00
Rhys Perry
4d65a7d16e nir/propagate_invariant: be more conservative with aliasing variables
No fossil-db changes.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40740>
2026-04-08 07:10:26 +00:00
Rhys Perry
310c8a0a17 nir/propagate_invariant: be more conservative with NULL variables
If we can't determine what variable is accessed, we should assume it could
be any.

This might make things worse with Vulkan since it does
vulkan_resource_index+load_vulkan_descriptor, but I don't think it matters
much. SSBO stores are rarely used in vertex shaders.

fossil-db (navi21):
Totals from 1 (0.00% of 202427) affected shaders:
Instrs: 442 -> 445 (+0.68%)
Latency: 2038 -> 2043 (+0.25%)
InvThroughput: 432 -> 437 (+1.16%)
VALU: 295 -> 298 (+1.02%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40740>
2026-04-08 07:10:26 +00:00
Rhys Perry
b6f9a9092a nir/propagate_invariant: include derefs
In case an array index is calculated using floating point.

fossil-db (navi21):
Totals from 1339 (0.66% of 202427) affected shaders:
MaxWaves: 30368 -> 30396 (+0.09%)
Instrs: 805468 -> 817035 (+1.44%); split: -0.00%, +1.44%
CodeSize: 4084432 -> 4130548 (+1.13%); split: -0.01%, +1.14%
VGPRs: 62000 -> 61952 (-0.08%)
Latency: 3885607 -> 3903915 (+0.47%); split: -0.01%, +0.48%
InvThroughput: 726350 -> 742176 (+2.18%); split: -0.03%, +2.21%
VClause: 22187 -> 22676 (+2.20%); split: -0.14%, +2.34%
SClause: 14663 -> 14616 (-0.32%); split: -0.42%, +0.10%
Copies: 65726 -> 65840 (+0.17%); split: -0.09%, +0.26%
Branches: 21075 -> 21076 (+0.00%); split: -0.02%, +0.03%
PreVGPRs: 48970 -> 48941 (-0.06%)
VALU: 454510 -> 466082 (+2.55%); split: -0.01%, +2.55%
SALU: 130543 -> 130521 (-0.02%); split: -0.03%, +0.01%
VMEM: 43876 -> 43896 (+0.05%)

Most of these changes are elden_ring shaders which can no longer optimize
f2u32(u2f32()).

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40740>
2026-04-08 07:10:26 +00:00
Rhys Perry
75ccfd8fb2 nir/propagate_invariant: set fp_math_ctrl for intrinsics
No fossil-db changes.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40740>
2026-04-08 07:10:26 +00:00
Rhys Perry
20e85604ad nir/propagate_invariant: include intrinsics
No fossil-db changes.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40740>
2026-04-08 07:10:26 +00:00
Rhys Perry
b30c0d8264 nir/algebraic: optimize exact f2u32(fmul(unpack_norm))
fossil-db (navi21):
Totals from 16 (0.01% of 202427) affected shaders:
Instrs: 17730 -> 17226 (-2.84%)
CodeSize: 97500 -> 95708 (-1.84%)
InvThroughput: 44437 -> 44419 (-0.04%)
Copies: 1502 -> 1446 (-3.73%)
VALU: 9973 -> 9525 (-4.49%)
SALU: 3509 -> 3453 (-1.60%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40740>
2026-04-08 07:10:26 +00:00