A TF2 shader propagates 0 to the consumer, which eliminates 1 input
if we run algebraic opts and DCE before compaction.
This is a prerequisite for removing all IO var optimizations from the GLSL
linker that are redundant with nir_opt_varyings.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36091>
Class represents an indexed "ideal" register class, where non-general classes
only allow defs that choose that class in the def_size callback.
nir_opt_preamble will try to assign specialized classes where possible, falling
back to the general class once the special-purpose classes are exhausted.
AGX will use this mechanism to promote bindless texture handles to bound texture
registers where possible, falling back to pushing the handle as a uniform where
not possible. Supporting multiple classes in nir_opt_preamble allows this
multi-level hoisting to work in a single nir_opt_preamble call with proper
global behaviour.
Add this concept to nir_opt_preamble so we can use it in AGX later in this MR.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Job Noorman <job@noorman.info>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35949>
This can be used to move input loads to top after we stop using
nir_lower_io_vars_to_temporaries that does it unconditionally.
It's more flexible than what nir_lower_io_vars_to_temporaries was doing,
and can be extended to handle any instructions.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36018>
This is a partial replacement for nir_lower_io_vars_to_temporaries.
It supports all input and output loads. It doesn't handle stores.
The motivation is to improve compile times.
The main differences compared to nir_lower_io_vars_to_temporaries are:
- it only lowers indirect loads to temps and doesn't touch direct loads
which improves compile times and removes the need for nir_lower_vars_to_ssa
afterward because indirect temp access can't be lowered to SSA
- it doesn't move all input loads to the top; it only moves those input
loads to the top whose indirect loads are lowered (which improves
register usage because direct loads are not moved)
- it doesn't have to deal with complexities of variables
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36018>
Similar to nir_lower_alu_width(), the callback can return the
desired number of components for a phi, or 0 for no lowering.
The previous behavior of nir_lower_phis_to_scalar() with lower_all=true
can be elicited via nir_lower_all_phis_to_scalar() while the previous
behavior with lower_all=false now corresponds to nir_lower_phis_to_scalar()
with NULL callback.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35783>
AGX now vendors a significantly different version of this pass, so the common
one doesn't need the stuff added for AGX.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35802>
Signed-off-by: Robert Mader <robert.mader@collabora.com>
Co-Authored-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34303>
Recent nvidia hardware has a native instruction for
nir_intrinsic_vote_ieq but not for nir_intrinsic_vote_feq. So, split
this boolean into two so we can contol the lowering separately for each
instruction.
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35778>
This adds a variant of nir_steal_tex_src() which is for derefs as well
as versions that just return the source without removing it.
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35623>
This hoists all the annoyance of figuring out the current pixel's input
attachment coordinates to the driver. The pass still deals with all the
annoyance of turning an image instruciton into a texture instruction but
it gives the driver more control over the position. For most drivers,
this will be something like ivec3(int(gl_FragCoord.xy), gl_Layer) or
similar, some drivers need something more nuanced. Turnip, for
instance, needs unscaled coordinates for some attachments and NVK
doesn't really want gl_Layer or gl_ViewIndex for the layer. It's better
to just have a new system value that drivers can make what they want.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35551>
Some shaders contain back-to-back atomic accesses in SPIR-V with
AcquireRelease semantics. In NIR, we translate these to a release
memory barrier, the atomic, then an acquire memory barrier.
This results in a lot of unnecessary memory barriers in the middle
of the sequence of atomics:
0. Release barrier
1. Atomic
2. Acquire barrier
3. Release barrier
4. Atomic
5. Acquire barrier
6. Release barrier
7. Atomic
8. Acquire barrier
In the absence of loads/stores, and when the atomic destinations are
unused, these barriers in-between atomics shouldn't be required.
This optimization pass would drop them (lines 2-3 and 5-6 above) while
leaving the first and last barriers (0 and 8), so the sequence remains
synchronized against other access elsewhere in the program.
One common example where this occurs is a sequence of min and max
atomics to clamp a certain memory location's value within a range.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33504>
This handles basic operations where clang promotes integers to 32 bits
according to the C99 spec in OpenCL C source code.
This is its own opt_algerbraic pass, because we don't wanna fight with
nir_lower_bit_size.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34641>
this is useful across drivers for maint5 semantics on mobile hw.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34762>
The first pass computes which shader instructions contribute to each
output. It can be used to query how data flows within shaders towards
outputs.
The second pass computes which shader input components and which types of
memory loads are used to compute shader outputs.
The third pass uses the second pass to gather which input components are
used to compute pos and clip dist outputs, which input components are used
to compute all other outputs, and which input components are used to
compute both. This will be used by compaction in nir_opt_varyings for
drivers that split TES into a separate position cull shader and varying
shader to make it less likely that the same vec4 inputs are needed in both.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32262>
Allocate space for the aliased region first, then allocate the
non-Aliased blocks in sequence after that.
SPV_KHR_workgroup_memory_explicit_layout extension added support for
having Blocks of workgroup (shared) memory, which include layout
decoration. For that extension all such blocks must be decorated with
Aliased.
SPV_KHR_untyped_pointers extension lifts that requirement, allowing
blocks that don't alias in workgroup memory. They are still explicitly
laid out.
The motivation is that untyped pointers provide a different
mechanism to obtain the same effect as the Aliased blocks. Instead of
having two Aliased variables with different types, have a single
variable and use an untyped pointer with a different type to access it.
This patch is a preparation for supporting untyped pointers.
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34139>
this is a cleaned up version of the lowering originally written for asahi, moved
to common code so it can be shared with an upcoming Vulkan implementation (not
honeykrisp).
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34507>
This pass was originally based on a similar pass from Intel but it's
grown support for some fancy stuff like fp64 -> fp16 conversion
splitting with proper rounding.
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Reviewed-by: Benjamin Lee <benjamin.lee@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34126>
Intel HW only has support for non-uniform offsets for TG4 operations.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33138>
Since removing nir_intrinsic_discard{_if} it has no purpose anymore.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33492>
Arm and NVIDIA hardware both have this as a bit you can set on the
texture instruction so we may as well have a shared pass for it.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33402>
in prep for removing this method.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33722>
These will replace nir_metadata_preserve as more ergonomic replacements that
convey a notion of impl progress instead of simply updating metadata.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33722>