To align with replace_varying_input_by_uniform_load and better
describe what it does.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12613>
Varying assigned from uniform won't change after interpolation,
so move uniform load to fragment shader to eliminate the varying.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12613>
Deeper in chain of calls, function "src_has_indirect" is used (which
reads "is_ssa" and "reg.indirect").
Fixes: d1eae6f36b ("nir: Properly clean up nir_src/dest indirects")
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13317>
In order for the load to never straddle the load can't extend past 8
bytes, not 16. For example a vec2 load with align_mul = 8 and
align_offset = 4 can straddle.
Fixes assertion failures when we stop pushing UBOs in the preamble on
a6xx.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13142>
Passes generally shouldn't use nir_metadata_all unless they don't change
the program in any significant way. Some of these passes insert new
instructions so they should definitely not be preserving most of it.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13261>
Make u_vector_init a wrapper to u_vector_init_pot. Let both take
(element_count, element_size) as parameters.
Motivated by eed0fc4caf ("vulkan/wsi/wayland: fix an invalid
u_vector_init call")
v2: rename u_vector_init_pot to u_vector_init_pow2
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Simon Ser <contact@emersion.fr>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13201>
Allow backends to turn some sysvals into input varyings so the frontend
(in our case spirv_to_nir()) doesn't have to bother selecting which
one is expected.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13017>
... just like other-size constants are.
Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13223>
No shader-db changes on any Intel platform.
Fossil-db results:
All Intel platforms had similar results. (Ice Lake shown)
Instructions in all programs: 144380118 -> 143692823 (-0.5%)
SENDs in all programs: 6920822 -> 6920822 (+0.0%)
Loops in all programs: 38299 -> 38299 (+0.0%)
Cycles in all programs: 8434782176 -> 8423078994 (-0.1%)
Spills in all programs: 206830 -> 204469 (-1.1%)
Fills in all programs: 318737 -> 313660 (-1.6%)
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12320>
Add derivative opcodes fddx_must_abs_mali/fddy_must_abs_mali satisfying:
fabs(fdd*_must_abs_mali(v)) = fabs(fdd*(v))
The sign of their result is undefined.
On Bifrost and Valhall, these unsigned derivatives can be implemented
more efficiently than the correctly-signed counterparts, since the sign
fixup requires extra ALU instructions. On backends where this is the
case, it is useful to optimize fabs(fdd*(v)) to
fabs(fdd*_must_abs_mali(v)). This pattern comes up with the GLSL builtin
`fwidth`.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12332>
Make sure the new and old sources have the same number of components,
otherwise the NIR validation pass complains.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13060>
SCALED formats are interpreted as floats, but not in the usual [0, 1]
or [-1, 1] range, meaning that the blend lowering logic can't directly
apply to those. Assert that the format being passed is not a scaled
format.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13060>
The caller doesn't necessarily want to lower blend operations on all
render targets since some of them might be supported natively (panvk
will be in that case). Let's just skip RTs that have a format set to
PIPE_FORMAT_NONE to allow that.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13060>
nir_ssa_for_src() is not supposed to pad the src vector if
dst->num_components > src->num_components. Let's pad things explicitly
with nir_pad_vector().
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13060>
The NIR validation complains if the swizzle accesses a component that's
not present in the source.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13060>
That way we can get the address to the entry, which is needed for
some nir builtins because extra data in the entry can be used as
shader input.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12592>
These are I/O variables which are not going to be removed anyway.
However, get_variable_io_mask handles their location incorrectly.
Found using the GCC undefined behavior sanitizer.
Fixes the following error:
runtime error:
shift exponent 4294967258 is too large
for 64-bit type 'long unsigned int'
Closes: #5319
Fixes: cf5f8f55c3
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12719>
Most modern hardware needs the edge flag added as a hidden vertex input
and needs code added to the vertex shader to copy the input to an
output. Intel hardware is a little different. Gfx4 and Gfx5 hardware
works in the previously described mannter. Gfx6+ hardware needs the
edge flag as a specific vertex shader input, and that input is magically
processed by fixed-function hardware without need for extra shader code.
This flag signals only that the vertex shader input is needed. It would
be nice if we could decouple adding the vertex shader input from
generating the copy-to-output code, but that has proven to be
challenging. Not having that code causes other passes to want to
eliminate that shader input.
v2: Convert conditional to assertion. This pass is only called for
vertex shaders. Suggested by Ken.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12858>
Now that they're no longer ralloc'd, we have to be much more careful
about indirects. We have to make sure every time a source or
destination is overwritten, its indirect (if any) is freed. We also
have to choose a memory ownership convention for the rewrite functions.
Assuming that they will be called with the source from some other
instruction, we choose to always make a copy of the indirect (if any).
It's the responsibility of the caller to ensure its copy of the indirect
is freed.
Unfortunately, all this extra logic is going to make
nir_instr_rewrite/move_src/dest more expensive because they now have
all the logic of nir_src/dest_copy instead of a simple struct
assignment. Fortunately, the vast majority of rewrite calls are done by
nir_ssa_def_rewrite_uses which is an SSA-only fast-path.
Fixes: 879a569884 "nir: Switch from ralloc to malloc for NIR instructions."
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12884>
By replacing the 48-byte ralloc header with our exec_node gc_node (16
bytes), runtime of shader-db on my system across this series drops
-4.21738% +/- 1.47757% (n=5).
Inspired by discussion on #5034.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11776>
Right now we're using ralloc to GC our NIR instructions, but ralloc has
significant overhead for its recursive nature so it would be nice to use a
simpler mechanism for GCing instructions.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11776>
We were using the ralloc parent in some places, which should work out to
be the shader I think, but to de-ralloc the instrs we should just pass the
existing shader pointer in.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11776>
This code was being tricky with passing a mem_ctx instead of the shader,
then freeing the mem_ctx when the pass was done and all the parallel
copies had been removed from the shader. Use the right type for instr
creation and do a bit of manual list management to prepare the way for
non-ralloc NIR instrs.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11776>
Calling this lower pass twice in a row would cause spurious
set_vertex_and_primitive_count(0, undef) intrinsics after the proper
set_vertex_and_primitive_count intrinsic. This pretty much turns any
geometry shader into garbage.
Fix this by treating nir_intrinsic_emit_vertex_with_counter and
nir_intrinsic_end_primitive_with_counter just like the non-_with_counter
versions. If no blocks would need set_vertex_and_primitive_count
intrinsics added, exit the pass before doing any work. This prevents
the need for DCE to do extra clean up later.
Since this pass is potentially called multiple times via multiple
invocations of a finalize_nir callback, it is (hypothetically?) possible
that control flow could be changed to add new blocks that need this
intrinsic. The check implemented in this commit should be robust
against that possibility.
v2: Add a_block_needs_set_vertex_and_primitive_count. Suggested by
Timur.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12802>