Most of the time with nir_def_rewrite_uses_after, you want to rewrite after the
replacement. Make that the default thing to be more ergonomic and to drop
parent_instr uses.
We leave nir_def_rewrite_uses_after_instr defined if you really want the old
signature with an arbitrary after point.
Via Coccinelle patch:
@@
expression a, b;
@@
-nir_def_rewrite_uses_after(a, b, b->parent_instr)
+nir_def_rewrite_uses_after_def(a, b)
Followed by a bunch of sed.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Acked-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36489>
Another common composition.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Acked-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36489>
We want to get rid of nir_def::parent_instr eventually, requiring an accessor
function instead nir_def_parent_instr(def), so to mitigate the hit to NIR
ergonomics, let's add helpers for common patterns using parent_instr. This gets
us an immediate win for NIR ergonomics and then reduces the surface area for the
later flag day hiding parent_instr.
This commit starts us off by adding compositions for nir_instr_as_* with
parent_instr's, which are common.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Acked-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36489>
In the C23 standard unreachable() is now a predefined function-like
macro in <stddef.h>
See https://android.googlesource.com/platform/bionic/+/HEAD/docs/c23.md#is-now-a-predefined-function_like-macro-in
And this causes build errors when building for C23:
-----------------------------------------------------------------------
In file included from ../src/util/log.h:30,
from ../src/util/log.c:30:
../src/util/macros.h:123:9: warning: "unreachable" redefined
123 | #define unreachable(str) \
| ^~~~~~~~~~~
In file included from ../src/util/macros.h:31:
/usr/lib/gcc/x86_64-linux-gnu/14/include/stddef.h:456:9: note: this is the location of the previous definition
456 | #define unreachable() (__builtin_unreachable ())
| ^~~~~~~~~~~
-----------------------------------------------------------------------
So don't redefine it with the same name, but use the name UNREACHABLE()
to also signify it's a macro.
Using a different name also makes sense because the behavior of the
macro was extending the one of __builtin_unreachable() anyway, and it
also had a different signature, accepting one argument, compared to the
standard unreachable() with no arguments.
This change improves the chances of building mesa with the C23 standard,
which for instance is the default in recent AOSP versions.
All the instances of the macro, including the definition, were updated
with the following command line:
git grep -l '[^_]unreachable(' -- "src/**" | sort | uniq | \
while read file; \
do \
sed -e 's/\([^_]\)unreachable(/\1UNREACHABLE(/g' -i "$file"; \
done && \
sed -e 's/#undef unreachable/#undef UNREACHABLE/g' -i src/intel/isl/isl_aux_info.c
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36437>
Set to true everywhere except:
- spirv_to_nir used by Vulkan
- bindless handles in GLSL
- some internal shaders and driver-specific code
Acked-by: Job Noorman <job@noorman.info>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36099>
Correctly/optimally using nir_opt_varyings directly is pretty tricky. For GL, we
have all the right logic in the GLSL linker. for VK, we don't want to duplicate
this dance in every driver. Wrap it all up in a nir_opt_varyings_bulk helper
that operates on an entire pipeline of nir_shader's, following the GLSL linker's
logic. This is suitable for Vulkan drivers.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36265>
See the comment at the top file :-)
The ideas in this pass are based on LLVM. The implementation itself is from
scratch because have you /tried/ to read that thing?
Because LLVM and therefore prop drivers do these optimizations, this should help
narrow Mesa's performance gap with the blobs.
This probably needs some tuning for best results on other ISAs, but the stats
for AGX speak for themselves, see next commits.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36147>
this is a little annoying.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Acked-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36147>
nothing currently uses the associative flag, but they will change soon. we need
to stop incorrectly marking fmul/fadd/etc as associative, because they're not,
but they almost are. distinguish these properties so we can correctly
handle floating point rules without any opcode-based special casing.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36257>
e.g. load X; load W; ==> load XYZW. Verified with a shader test.
This will be used by AMD drivers. See the code comments.
Reviewed-by: Simon Perretta <simon.perretta@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36098>
A TF2 shader propagates 0 to the consumer, which eliminates 1 input
if we run algebraic opts and DCE before compaction.
This is a prerequisite for removing all IO var optimizations from the GLSL
linker that are redundant with nir_opt_varyings.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36091>
Class represents an indexed "ideal" register class, where non-general classes
only allow defs that choose that class in the def_size callback.
nir_opt_preamble will try to assign specialized classes where possible, falling
back to the general class once the special-purpose classes are exhausted.
AGX will use this mechanism to promote bindless texture handles to bound texture
registers where possible, falling back to pushing the handle as a uniform where
not possible. Supporting multiple classes in nir_opt_preamble allows this
multi-level hoisting to work in a single nir_opt_preamble call with proper
global behaviour.
Add this concept to nir_opt_preamble so we can use it in AGX later in this MR.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Job Noorman <job@noorman.info>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35949>
This can be used to move input loads to top after we stop using
nir_lower_io_vars_to_temporaries that does it unconditionally.
It's more flexible than what nir_lower_io_vars_to_temporaries was doing,
and can be extended to handle any instructions.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36018>
This is a partial replacement for nir_lower_io_vars_to_temporaries.
It supports all input and output loads. It doesn't handle stores.
The motivation is to improve compile times.
The main differences compared to nir_lower_io_vars_to_temporaries are:
- it only lowers indirect loads to temps and doesn't touch direct loads
which improves compile times and removes the need for nir_lower_vars_to_ssa
afterward because indirect temp access can't be lowered to SSA
- it doesn't move all input loads to the top; it only moves those input
loads to the top whose indirect loads are lowered (which improves
register usage because direct loads are not moved)
- it doesn't have to deal with complexities of variables
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36018>
Similar to nir_lower_alu_width(), the callback can return the
desired number of components for a phi, or 0 for no lowering.
The previous behavior of nir_lower_phis_to_scalar() with lower_all=true
can be elicited via nir_lower_all_phis_to_scalar() while the previous
behavior with lower_all=false now corresponds to nir_lower_phis_to_scalar()
with NULL callback.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35783>
AGX now vendors a significantly different version of this pass, so the common
one doesn't need the stuff added for AGX.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35802>
Signed-off-by: Robert Mader <robert.mader@collabora.com>
Co-Authored-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34303>
Recent nvidia hardware has a native instruction for
nir_intrinsic_vote_ieq but not for nir_intrinsic_vote_feq. So, split
this boolean into two so we can contol the lowering separately for each
instruction.
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35778>
This adds a variant of nir_steal_tex_src() which is for derefs as well
as versions that just return the source without removing it.
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35623>
This hoists all the annoyance of figuring out the current pixel's input
attachment coordinates to the driver. The pass still deals with all the
annoyance of turning an image instruciton into a texture instruction but
it gives the driver more control over the position. For most drivers,
this will be something like ivec3(int(gl_FragCoord.xy), gl_Layer) or
similar, some drivers need something more nuanced. Turnip, for
instance, needs unscaled coordinates for some attachments and NVK
doesn't really want gl_Layer or gl_ViewIndex for the layer. It's better
to just have a new system value that drivers can make what they want.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35551>
Some shaders contain back-to-back atomic accesses in SPIR-V with
AcquireRelease semantics. In NIR, we translate these to a release
memory barrier, the atomic, then an acquire memory barrier.
This results in a lot of unnecessary memory barriers in the middle
of the sequence of atomics:
0. Release barrier
1. Atomic
2. Acquire barrier
3. Release barrier
4. Atomic
5. Acquire barrier
6. Release barrier
7. Atomic
8. Acquire barrier
In the absence of loads/stores, and when the atomic destinations are
unused, these barriers in-between atomics shouldn't be required.
This optimization pass would drop them (lines 2-3 and 5-6 above) while
leaving the first and last barriers (0 and 8), so the sequence remains
synchronized against other access elsewhere in the program.
One common example where this occurs is a sequence of min and max
atomics to clamp a certain memory location's value within a range.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33504>
This handles basic operations where clang promotes integers to 32 bits
according to the C99 spec in OpenCL C source code.
This is its own opt_algerbraic pass, because we don't wanna fight with
nir_lower_bit_size.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34641>