Backward inter-shader code motion left dead output stores in the producer
in rare cases. Those dead stores would then make their way into drivers
and hw.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32424>
If input_value, index, index1 or index2 is an input, here are examples of
code that this commit moves from consumers to producers:
* input_value * uniform_array[index]
* uniform_array[index]
* ubo[0].array[index]
* ubo[index].var
* ubo[index1].array[index2]
If the array index is computed from an input, it must be flat or convergent
within a primitive to be moved. If the array index is not an input, it must
be a uniform expression.
dEQP-GLES31.functional.shaders.opaque_type_indexing.ubo.dynamically_uniform_fragment
has UBO indexing that is moved to the producer by this.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32424>
Uniform and UBO loads with non-constant indices are now propagated.
The majority of this code implements cloning deref chains.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32424>
Holes due to indirectly-indexed inputs were ignored, making the compaction
worse when such inputs were present alongside convergent inputs.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32424>
Without this, compaction can put inputs into vec4 slots already occupied
by indirectly-accessed inputs while ignoring their interpolation qualifier,
which is incorrect.
All input components sharing the same vec4 slot must use interpolation
qualifiers that are compatible with each other.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32424>
Most of these conditions are repeated below with a continue statement.
This just puts break at the end where all of them are false.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32424>
Even the most flexible interpolation that we have in NIR options
(nir_io_has_flexible_input_interpolation_except_flat) doesn't allow
mixing flat and non-flat in the same vec4. This (legacy) optimization
can't promote interpolated inputs to flat if it doesn't consider
the interpolation mode of the whole vec4 slot.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32424>
The load_*_uniform_block_intel intrinsics always load either 8x or 16x
32-bit components worth of data (so 32 byte increments). This leads to
cases where we load a few components from one vec8, followed by a few
components of an adjacent vec8. We want to combine those into a vec16
load, as that loads a whole cacheline at a time, and requires less hoops
to calculate addresses and request memory loads.
So, we allow 7 * 4 = 28 bytes of holes, which handles vec8+vec8 where
only the .x component is read.
Most drivers and intrinsics will not want such large holes. I thought
about adding a per-intrinsic max_hole to the core code, but decided that
since we already have driver callbacks, we can just rely on them to
reject what makes sense to them.
No driver callbacks currently allow holes, so this should not currently
affect any drivers. But any work in progress branches may need to be
updated to reject larger holes.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32315>
When this pass is used with Zink, gl_PrimitiveID needs to be passed
through, however this is unnecessary for other divers.
Analogous to previous commit
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Fixes: d0342e28b3 ("nir: Add helper to create passthrough GS shader")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32397>
If the condition of the loop terminator is based on an unsigned value we
can in some cases find the max number of possible loop trips. With the
max loop trips know a complex unroll can unroll the loop.
For example:
uniform uint x;
uint i = x;
while (true) {
if (i >= 4)
break;
i += 6;
}
The above loop can be unrolled even though we don't know the initial
value of the induction variable because it can have at most 1 iteration.
There were no changes with my shader-db collection. Change was inspired
by MR #31312 where builtin shader code failed to unroll.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31701>
lower_boolean_reduce only supports ballot_components == 1. Fall back to
lower_scan_reduce when this is not the case.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31731>
It might be convenient for filter implementations to have access to
extra information. This will be used, for example, by ir3 to access
compiler features.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31731>
Like read_first_invocation but using getlast. Note that I intentionally
used the name of the ir3 instruction in the name as its semantics are
tricky to exactly describe otherwise.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31731>
Some targets (e.g., ir3) don't always know the exact subgroup size.
Calculate the maximum subgroup size in that case by multiplying
ballot_components and ballot_bit_size.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31731>
The function is in glsl_parser_extras.cpp so move the declaration to
glsl_parser_extras.h
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32402>
switch libagx to the precompilation pipeline. see the big comment in the
previous commit for why we're doing this.
while doing so, we move some dispatch stuff. there was so much churn from
precompile that this avoids doing the churn twice. that new header will be used
for DGC down the road.
there's also a small vtn/bindgen patch in here to skip bindgen'ing entrypoints,
as that conflicts with the new dispatch macros. this is the sane behaviour, we
just need to do the full precomp switch across the tree at once.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32339>
This is a simple way for drivers to enable uniform expression propagation
without having to set any callbacks for it. It replaces the old option.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32390>
This is currently needed to fix d3d12 for st_unlower_io_to_vars.
The idea is to track the current value of ClipVertex in a temporary
variable, and for every emit_vertex, we load the ClipVertex value from
the temporary (which matches the stored value) and insert new CLIP_DIST
stores before emit_vertex.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32363>
The code for IO variables was interleaved with code for IO intrinsics,
which was difficult to follow.
lower_clip_outputs is split and replaced by more accurate names:
lower_clip_vertex_var and lower_clip_vertex_intrin
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32363>
The algorithm is exactly the same for other opcodes so we don't have to
have to copy paste it.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Rob Clark <robclark@freedesktop.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32181>
ir3 has a number of bitwise triops (e.g., shrm == (src0 >> src1) & src2)
that don't have NIR-equivalents. Doing instruction selection for them is
a lot more convenient using algebraic patterns than to have to manually
match for them. This patch add NIR opcodes for these instructions.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Rob Clark <robclark@freedesktop.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32181>
This was used to parse glsl ir in string format and create real ir with
it. It was previously used for some really old test infrastructure which
has now been removed so lets burn this with fire also.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32364>