We were missing a couple bits from hash and a bunch of stuff from the
comparison. This puts most of nir_tex_instr into a single pack_tex
helper that's used by both and grabs everything we were missing.
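A minimal sketch of the idea, assuming a toy instruction type rather than the real nir_tex_instr: the hash and the comparison both go through one packing function, so any field added to the packed key is picked up by both. The names toy_tex, toy_tex_key, toy_pack_tex, toy_hash_tex and toy_tex_equal are hypothetical, and the FNV-1a hash is only a stand-in for whatever hash the real code uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a texture instruction (not nir_tex_instr). */
struct toy_tex {
   uint8_t op;
   uint8_t sampler_dim;
   bool is_array;
   bool is_shadow;
   uint32_t texture_index;
   uint32_t sampler_index;
};

/* Packed key. Fields are ordered so the struct has no padding bytes,
 * which keeps byte-wise hashing and memcmp deterministic.
 */
struct toy_tex_key {
   uint32_t texture_index;
   uint32_t sampler_index;
   uint8_t op;
   uint8_t sampler_dim;
   uint8_t is_array;
   uint8_t is_shadow;
};

/* Single place that decides which fields matter for hash and equality. */
void
toy_pack_tex(const struct toy_tex *tex, struct toy_tex_key *key)
{
   assert(tex != NULL && key != NULL);
   memset(key, 0, sizeof(*key));
   key->texture_index = tex->texture_index;
   key->sampler_index = tex->sampler_index;
   key->op = tex->op;
   key->sampler_dim = tex->sampler_dim;
   key->is_array = tex->is_array;
   key->is_shadow = tex->is_shadow;
}

/* FNV-1a over the packed key bytes. */
uint32_t
toy_hash_tex(const struct toy_tex *tex)
{
   struct toy_tex_key key;
   toy_pack_tex(tex, &key);

   const uint8_t *bytes = (const uint8_t *)&key;
   uint32_t hash = 2166136261u;
   for (size_t i = 0; i < sizeof(key); i++) {
      hash ^= bytes[i];
      hash *= 16777619u;
   }
   return hash;
}

/* Equality compares exactly the same packed bytes that the hash consumes. */
bool
toy_tex_equal(const struct toy_tex *a, const struct toy_tex *b)
{
   struct toy_tex_key ka, kb;
   toy_pack_tex(a, &ka);
   toy_pack_tex(b, &kb);
   return memcmp(&ka, &kb, sizeof(ka)) == 0;
}
```

With this shape, a field that is hashed but not compared (or the reverse) cannot happen by construction, which is the class of bug described above.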
Cc: mesa-stable
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36234>
I keep reaching for this helper but it doesn't exist. So I fixed that.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Eric Engestrom <eric@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36142>
Mary wrote the initial code; my changes are documented below.
v2: support cmat_convert (Karol)
    fix cross matmul (Karol)
    rework matrix layout classification (Karol)
    add support for saturated cmat_muladd (Karol)
Co-authored-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32777>
e.g. load X; load W; ==> load XYZW. Verified with a shader test.
This will be used by AMD drivers. See the code comments.
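A rough sketch of the merge rule in the example above, assuming loads are described only by their first component and component count. toy_load, toy_merge_loads and the max_comps limit are hypothetical names for illustration; the real pass has more conditions (see its code comments).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical description of a load: first component and component count. */
struct toy_load {
   unsigned first_comp;
   unsigned num_comps;
};

/* Try to merge two loads into one load that covers both, including any
 * unused components in between (the "hole"). Fails if the merged load
 * would be wider than max_comps.
 */
bool
toy_merge_loads(struct toy_load a, struct toy_load b, unsigned max_comps,
                struct toy_load *merged)
{
   assert(a.num_comps > 0 && b.num_comps > 0);

   unsigned start = a.first_comp < b.first_comp ? a.first_comp : b.first_comp;
   unsigned end_a = a.first_comp + a.num_comps;
   unsigned end_b = b.first_comp + b.num_comps;
   unsigned end = end_a > end_b ? end_a : end_b;

   if (end - start > max_comps)
      return false;

   merged->first_comp = start;
   merged->num_comps = end - start;
   return true;
}
```

For load X (component 0) and load W (component 3) with a limit of 4 components, the merged load starts at component 0 and covers 4 components, i.e. XYZW with Y and Z unused.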
Reviewed-by: Simon Perretta <simon.perretta@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36098>
When we detect that the source is a conversion generated by the pass,
try to get the real source instead of doing a round-trip conversion.
Make sure that the nir_alu_type and the bit_size are the same between what
we need and what's before the detected conversion.
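The check can be sketched roughly as follows, assuming a toy value type. toy_value, toy_type, is_pass_conversion and toy_resolve_src are hypothetical and only illustrate the "same type and same bit size" condition; they are not the pass's actual data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical base types, loosely mirroring the idea of nir_alu_type. */
enum toy_type { TOY_FLOAT, TOY_INT, TOY_UINT };

/* Hypothetical SSA value. conv_src is only meaningful when the value is
 * a conversion that the pass itself generated.
 */
struct toy_value {
   enum toy_type type;
   unsigned bit_size;
   bool is_pass_conversion;
   const struct toy_value *conv_src;
};

/* Return the value to use as a source of the wanted type and bit size.
 * If src is a pass-generated conversion and the value before it already
 * matches what we need, use that value and skip the round trip.
 */
const struct toy_value *
toy_resolve_src(const struct toy_value *src, enum toy_type want_type,
                unsigned want_bits)
{
   assert(src != NULL);

   if (src->is_pass_conversion && src->conv_src != NULL &&
       src->conv_src->type == want_type &&
       src->conv_src->bit_size == want_bits)
      return src->conv_src;

   return src;
}
```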
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35744>
The only remaining users of nir_link_opt_varyings are Vulkan drivers.
One linker error thrown by the optimizations is reimplemented
at the call site.
No interesting shader-db changes (other than random noise).
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36091>
A TF2 shader propagates 0 to the consumer, which eliminates 1 input
if we run algebraic opts and DCE before compaction.
This is a prerequisite for removing all IO var optimizations from the GLSL
linker that are redundant with nir_opt_varyings.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36091>
Will be used by etnaviv too.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35753>
It's a sysval in mesh shaders, but it shares the same
slot number as VARYING_SLOT_TESS_LEVEL_INNER.
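To illustrate why the shared slot number matters, here is a small sketch with made-up enums: when two names alias the same slot value, the slot alone cannot tell them apart and the shader stage has to be checked as well. toy_stage, toy_slot, TOY_SLOT_MESH_ALIAS and toy_slot_is_mesh_sysval are hypothetical and do not reflect the actual change.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stage and slot enums; the numeric values are made up. */
enum toy_stage { TOY_STAGE_TESS_CTRL, TOY_STAGE_MESH };

enum toy_slot {
   TOY_SLOT_POS = 0,
   TOY_SLOT_TESS_LEVEL_INNER = 1,
   /* Mesh-only name that reuses the tess level slot number. */
   TOY_SLOT_MESH_ALIAS = TOY_SLOT_TESS_LEVEL_INNER,
};

/* The slot number alone is ambiguous, so the stage is checked too. */
bool
toy_slot_is_mesh_sysval(enum toy_stage stage, enum toy_slot slot)
{
   assert(stage == TOY_STAGE_TESS_CTRL || stage == TOY_STAGE_MESH);
   return stage == TOY_STAGE_MESH && slot == TOY_SLOT_MESH_ALIAS;
}
```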
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35931>
this is more explicit than vec2's and hence has fewer footguns. in particular
it's easier to handle with preambles in a sane way.
modelled on what ir3 does.
there's probably room for more clean up but for now this unblocks what I want to
do.
stats don't seem concerning.
Totals from 692 (1.29% of 53701) affected shaders:
MaxWaves: 441920 -> 442112 (+0.04%)
Instrs: 1588748 -> 1589304 (+0.03%); split: -0.05%, +0.08%
CodeSize: 11487976 -> 11491620 (+0.03%); split: -0.04%, +0.07%
ALU: 1234867 -> 1235407 (+0.04%); split: -0.06%, +0.10%
FSCIB: 1234707 -> 1235249 (+0.04%); split: -0.06%, +0.10%
IC: 380514 -> 380518 (+0.00%)
GPRs: 117292 -> 117332 (+0.03%); split: -0.08%, +0.11%
Preamble instrs: 314064 -> 313948 (-0.04%); split: -0.05%, +0.01%
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35949>
it is sometimes useful to turn lowered bindless intrinsics into bound or vice
versa, and it is annoying to do so without this helper, so generalize the
helper.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Job Noorman <job@noorman.info>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35949>
Class represents an indexed "ideal" register class, where non-general classes
only allow defs that choose that class in the def_size callback.
nir_opt_preamble will try to assign specialized classes where possible, falling
back to the general class once the special-purpose classes are exhausted.
AGX will use this mechanism to promote bindless texture handles to bound texture
registers where possible, falling back to pushing the handle as a uniform where
not possible. Supporting multiple classes in nir_opt_preamble allows this
multi-level hoisting to work in a single nir_opt_preamble call with proper
global behaviour.
Add this concept to nir_opt_preamble so we can use it in AGX later in this MR.
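The fallback behaviour can be sketched like this, assuming class 0 is the general class and every def has the same size in every class (the real def_size callback may report different sizes per class). toy_class_state, toy_assign_class and TOY_MAX_CLASSES are hypothetical names, not the nir_opt_preamble interface.

```c
#include <assert.h>

#define TOY_MAX_CLASSES 4

/* Hypothetical per-class bookkeeping. Class 0 is assumed to be general. */
struct toy_class_state {
   unsigned capacity[TOY_MAX_CLASSES];
   unsigned used[TOY_MAX_CLASSES];
};

/* Place a def of the given size. Try its preferred special-purpose class
 * first and fall back to the general class once that class is exhausted.
 * Returns the class index that was used, or -1 if nothing fits.
 */
int
toy_assign_class(struct toy_class_state *s, unsigned preferred_class,
                 unsigned size)
{
   assert(preferred_class < TOY_MAX_CLASSES);

   if (preferred_class != 0 &&
       s->used[preferred_class] + size <= s->capacity[preferred_class]) {
      s->used[preferred_class] += size;
      return (int)preferred_class;
   }

   if (s->used[0] + size <= s->capacity[0]) {
      s->used[0] += size;
      return 0;
   }

   return -1;
}
```

Because both levels are tracked in one state object, a single walk over the candidates can decide between the special class and the general class for each def, instead of running the hoisting twice.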
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Job Noorman <job@noorman.info>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35949>
This reduces GLSL compile times with the gallium noop driver by 0.6%.
This might decrease register usage and do less code reordering because
nir_lower_io_vars_to_temporaries is no longer called for inputs, which
moved most input loads to the top.
radeonsi+ACO shader-db results are noise.
More uniforms are identified as inlinable.
TOTALS FROM ALL SHADERS (58138):
VGPRs: 2152680 -> 2158032 (0.25 %)
Code Size: 71008908 -> 71064812 (0.08 %) bytes
Max Waves: 916943 -> 916924 (-0.00 %)
Inline Uniforms: 6395 -> 6414 (0.30 %)
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36018>
This can be used to move input loads to the top once we stop using
nir_lower_io_vars_to_temporaries, which does it unconditionally.
It's more flexible than what nir_lower_io_vars_to_temporaries was doing,
and can be extended to handle any instructions.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36018>
This is a partial replacement for nir_lower_io_vars_to_temporaries.
It supports all input and output loads. It doesn't handle stores.
The motivation is to improve compile times.
The main differences compared to nir_lower_io_vars_to_temporaries are:
- it only lowers indirect loads to temps and doesn't touch direct loads
  which improves compile times and removes the need for nir_lower_vars_to_ssa
  afterward because indirect temp access can't be lowered to SSA
- it doesn't move all input loads to the top; it only moves those input
  loads to the top whose indirect loads are lowered (which improves
  register usage because direct loads are not moved)
- it doesn't have to deal with complexities of variables
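The selection rule from the list above can be sketched as follows, assuming a toy list of loads with flags instead of real intrinsics. toy_io_load and toy_lower_indirect_loads are hypothetical; treating only input loads as "moved to the top" follows the second bullet, and how output loads are placed is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical IO load record; the real pass works on NIR intrinsics. */
struct toy_io_load {
   bool is_input;
   bool is_indirect;
   bool lowered_to_temp;
   bool moved_to_top;
};

/* Lower only indirect loads to temporaries. Direct loads are skipped
 * entirely, so they are neither rewritten nor moved. Returns the number
 * of loads that were lowered.
 */
unsigned
toy_lower_indirect_loads(struct toy_io_load *loads, unsigned count)
{
   assert(loads != NULL || count == 0);

   unsigned progress = 0;
   for (unsigned i = 0; i < count; i++) {
      if (!loads[i].is_indirect)
         continue; /* direct load: left untouched */

      loads[i].lowered_to_temp = true;

      /* Assumption: only lowered input loads are hoisted to the top. */
      if (loads[i].is_input)
         loads[i].moved_to_top = true;

      progress++;
   }
   return progress;
}
```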
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36018>
It doesn't do anything since IO variables are lowered to intrinsics,
which simplifies and eliminates a lot of variable-specific stuff
like declared but dead builtin varyings and unused components
of builtin varying arrays.
This reduces GLSL compile times by 2.4% with the gallium noop driver.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36023>
nir_opt_varyings reduces the number of varyings. Check against limits after
that, so that old and limited GPUs don't fail linking when nir_opt_varyings
is able to reduce varyings to or below the limit.
The previous code only checked FS inputs, which is glaringly obvious
from the removed var_counts_against_varying_limit function.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36023>