It's a sysval in mesh shader, but it share the same
slot number with VARYING_SLOT_TESS_LEVEL_INNER.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35931>
this is more explicit than vec2's and hence has fewer footguns. in particular
it's easier to handle with preambles in a sane way.
modelled on what ir3 does.
there's probably room for more clean up but for now this unblocks what I want to
do.
stats don't seem concerning.
Totals from 692 (1.29% of 53701) affected shaders:
MaxWaves: 441920 -> 442112 (+0.04%)
Instrs: 1588748 -> 1589304 (+0.03%); split: -0.05%, +0.08%
CodeSize: 11487976 -> 11491620 (+0.03%); split: -0.04%, +0.07%
ALU: 1234867 -> 1235407 (+0.04%); split: -0.06%, +0.10%
FSCIB: 1234707 -> 1235249 (+0.04%); split: -0.06%, +0.10%
IC: 380514 -> 380518 (+0.00%)
GPRs: 117292 -> 117332 (+0.03%); split: -0.08%, +0.11%
Preamble instrs: 314064 -> 313948 (-0.04%); split: -0.05%, +0.01%
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35949>
it is sometimes useful to turn lowered bindless intrinsics into bound or vice
versa, and it is annoying to do so without this helper, so generalize the
helper.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Job Noorman <job@noorman.info>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35949>
Class represents an indexed "ideal" register class, where non-general classes
only allow defs that choose that class in the def_size callback.
nir_opt_preamble will try to assign specialized classes where possible, falling
back to the general class once the special-purpose classes are exhausted.
AGX will use this mechanism to promote bindless texture handles to bound texture
registers where possible, falling back to pushing the handle as a uniform where
not possible. Supporting multiple classes in nir_opt_preamble allows this
multi-level hoisting to work in a single nir_opt_preamble call with proper
global behaviour.
Add this concept to nir_opt_preamble so we can use it in AGX later in this MR.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Job Noorman <job@noorman.info>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35949>
This reduces GLSL compile times with the gallium noop driver by 0.6%.
This might decrease register usage and do less code reordering because
nir_lower_io_vars_to_temporaries is no longer called for inputs, which
moved most input loads to the top.
radeonsi+ACO shader-db results are noise.
More uniforms are identified as inlinable.
TOTALS FROM ALL SHADERS (58138):
VGPRs: 2152680 -> 2158032 (0.25 %)
Code Size: 71008908 -> 71064812 (0.08 %) bytes
Max Waves: 916943 -> 916924 (-0.00 %)
Inline Uniforms: 6395 -> 6414 (0.30 %)
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36018>
This can be used to move input loads to top after we stop using
nir_lower_io_vars_to_temporaries that does it unconditionally.
It's more flexible than what nir_lower_io_vars_to_temporaries was doing,
and can be extended to handle any instructions.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36018>
This is a partial replacement for nir_lower_io_vars_to_temporaries.
It supports all input and output loads. It doesn't handle stores.
The motivation is to improve compile times.
The main differences compared to nir_lower_io_vars_to_temporaries are:
- it only lowers indirect loads to temps and doesn't touch direct loads
which improves compile times and removes the need for nir_lower_vars_to_ssa
afterward because indirect temp access can't be lowered to SSA
- it doesn't move all input loads to the top; it only moves those input
loads to the top whose indirect loads are lowered (which improves
register usage because direct loads are not moved)
- it doesn't have to deal with complexities of variables
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36018>
The fmul+fadd -> fma rules in nir_opt_algebraic are marked imprecise,
because they are a contraction. However, they respect signed zero/Inf/NaN rules.
As such, it is legal to do this fusion with shader float controls as long as the
exact bit is not set (mapping to SPIR-V NoContract).
Unfortunately, NIR's imprecise rules do not distinguish between contraction
issues versus float special case issues, forcing nir_search to skip all
imprecise rules when any shader float control modes are used. This notably
affects DXVK, which sets shader float controls to get D3D11 float behaviour and
hence loses FMA fusing.
Therefore, we plumb in the exact bit to express NoContract independent of the
float controls, and weaken the requirement for fma fusion to allowable
contraction. For fma splitting, it's a similar issue, as inexact GLSL fma in
SPIR-V is just a multiply add that we're allowed to contract rather than the
real deal.
Drivers that use their own FMA fusing passes (notably, Intel and AMD) are
unaffected, but DXVK-capable drivers using fuse_ffma should like this. Results
on hk shown:
Totals from 2194 (4.06% of 54019) affected shaders:
MaxWaves: 2174272 -> 2175936 (+0.08%); split: +0.08%, -0.01%
Instrs: 1173283 -> 1131494 (-3.56%); split: -3.57%, +0.01%
CodeSize: 8568168 -> 8381724 (-2.18%); split: -2.18%, +0.01%
Spills: 1094 -> 747 (-31.72%)
Fills: 988 -> 681 (-31.07%)
Scratch: 4444 -> 3820 (-14.04%)
ALU: 953032 -> 913149 (-4.18%); split: -4.19%, +0.01%
FSCIB: 953032 -> 913149 (-4.18%); split: -4.19%, +0.01%
IC: 215398 -> 215274 (-0.06%)
GPRs: 139865 -> 139032 (-0.60%); split: -1.56%, +0.96%
Uniforms: 414886 -> 414466 (-0.10%); split: -0.14%, +0.04%
Preamble instrs: 646398 -> 644017 (-0.37%); split: -0.43%, +0.07%
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35989>
Similar to nir_lower_alu_width(), the callback can return the
desired number of components for a phi, or 0 for no lowering.
The previous behavior of nir_lower_phis_to_scalar() with lower_all=true
can be elicited via nir_lower_all_phis_to_scalar() while the previous
behavior with lower_all=false now corresponds to nir_lower_phis_to_scalar()
with NULL callback.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35783>
This check causes unnecessary overhead and can be replaced by simply
checking whether a phi_src is from a loop continue block.
Except for rare edge cases, the result will be the same.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35783>
It's done later in nir_lower_io_passes only for shader stages not
supporting indirect access.
Unfortunately we have add a hack into nir_lower_io_passes to get rid of
output loads. A later commit will remove it.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35945>
make sure we can fold the f2f away. alternatively f2fmp would work
here but details.
elden ring:
Totals from 137 (4.27% of 3206) affected shaders:
Instrs: 485455 -> 484904 (-0.11%)
CodeSize: 3218638 -> 3215338 (-0.10%)
ALU: 308071 -> 307520 (-0.18%)
FSCIB: 308071 -> 307520 (-0.18%)
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35909>
for drivers where we need to lower a base_workgroup_id but not global IDs.
rather than lowering the whole global ID to stick the base workgroup ID in
there, just add the workgroup offset to the final thread position.
Elden ring fossils:
Totals from 52 (1.62% of 3206) affected shaders:
Instrs: 48355 -> 48233 (-0.25%); split: -0.31%, +0.06%
CodeSize: 331912 -> 331148 (-0.23%); split: -0.28%, +0.05%
ALU: 30853 -> 30674 (-0.58%); split: -0.70%, +0.12%
FSCIB: 30853 -> 30674 (-0.58%); split: -0.70%, +0.12%
IC: 9054 -> 8958 (-1.06%)
GPRs: 4184 -> 4216 (+0.76%)
Uniforms: 6703 -> 6677 (-0.39%); split: -1.61%, +1.22%
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35909>
As the lowering mentioned there got renamed twice:
commit b085016f94
Author: Rob Clark <robclark@freedesktop.org>
Date: Fri Mar 25 13:52:26 2016 -0400
nir: rename lower_outputs_to_temporaries -> lower_io_to_temporaries
Since it will gain support to lower inputs, give it a more generic name.
commit 1754507d49
Author: Marek Ol¨ák <maraeo@gmail.com>
Date: Wed Jun 25 19:05:19 2025 -0400
nir: rename nir_lower_io_to_temporaries -> nir_lower_io_vars_to_temporaries
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35760>
Reviewed-by: Marek Ol¨ák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35855>
This is a cleanup.
Old gs LDS layout: [es outputs][gs outputs][scratch]
Old nogs LDS layout: [xfb/cull][scratch]
New gs LDS layout: [es outputs][scratch|gs outputs]
New nogs LDS layout: [scratch|xfb/cull]
The LDS scratch is moved to the beginning of the preceding buffer in LDS,
while the addresses in that LDS buffer are offset by the scratch size.
It effectively merges the LDS scratch with the preceding buffer in LDS.
Thanks to that, we no longer need the ngg_scratch ABI and the offset
in a user SGPR.
The lowering passes now return the LDS scratch size, which is used
by the drivers to determine the final LDS size.
The ngg_lds_layout SGPR is now unused without GS in RADV.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35352>
A cooperative matrix can only be constructed from a single
scalar value. Print that value, wrapped by a function call that
looks like a type-constructor.
This adds a test case that will otherwise assert out in spirv2nir.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35757>
AGX now vendors a significantly different version of this pass, so the common
one doesn't need the stuff added for AGX.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35802>
for plumbing transformFeedbackRasterizationStreamSelect (in turn for exercising
more CTS and proving out my design).
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35802>
In https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35802 we will
significantly rework geometry shaders & transform feedback. In the new approach,
transform feedback is executed as part of the hardware vertex shader, meaning
the vertex shader needs to write out all the "copies" of the same value into
different parts of the XFB buffer. In the general case of a GS writing triangle
strips, we get 0-3 copies. This is good and lets us parallelize XFB better with
GS.
In the case of a VS alone with XFB, we insert a passthrough GS. In that case
special case, we can only get at most 1 copy, so if we can prove the length of
the output strip is 3 we can delete 2/3 of the shader.
Anyway, the only thing preventing NIR from doing that optimization is failing to
see through some conditionals, fixed by optimizing with the law of trichotomy.
We could add other variants of this pattern (signed vs unsigned, iand vs
ior/ixor) if we expect anything else to hit this other than my boutique use
case.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35802>
Signed-off-by: Robert Mader <robert.mader@collabora.com>
Co-Authored-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34303>
Brings the vertex shader in
dEQP-VK.subgroups.vote.framebuffer.subgroupallequal_dvec4_vertex
from 234 to 169 instructions on NAK.
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35778>