It was swept at the end, but it meant that in shaders with lots of copies
available at the start of lots of if statements, you'd blow up memory
usage.
turnip memory consumption on dEQP-VK.ssbo.layout.random.scalar.75 drops
from 1.4GB to 110MB, and runtime from 19s to 17s.
Fixes: #7361
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18891>
The BITFIELD_MASK() macro is intended for use with actual bitfields,
not with nir_component_mask_t. This means we do some extra work to
handle values that are invalid for nir_component_mask_t in the first
place.
This eliminates some warnings on Clang, where the compiler complains
about casting UINT32_MAX to UINT16_MAX.
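A minimal sketch of a mask helper sized for component masks, assuming a
16-bit mask type. The names component_mask_sketch_t and
component_mask_sketch() are hypothetical stand-ins, not the helper this
change actually uses:

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for nir_component_mask_t: one bit per vector component. */
typedef uint16_t component_mask_sketch_t;

/* Hypothetical helper: build the mask for the first num_components
 * components. The shift is done in 32-bit arithmetic and the result
 * always fits in 16 bits, so no out-of-range constant is narrowed. */
static inline component_mask_sketch_t
component_mask_sketch(unsigned num_components)
{
   assert(num_components >= 1 && num_components <= 16);
   return (component_mask_sketch_t)((1u << num_components) - 1u);
}
```

Because the valid range is asserted up front, the helper never needs a
special case for a 32-bit-wide mask.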
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15547>
This should reduce follow-on optimization work to copy-propagate and
dead-code away the movs generated in construction of vectors.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14865>
This replaces the new_src parameter of nir_ssa_def_rewrite_uses_after()
with an SSA def, and rewrites all the users as needed.
Acked-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9383>
This commit replaces the new_src parameter of nir_ssa_def_rewrite_uses()
with an SSA def, removes nir_ssa_def_rewrite_uses_ssa(), and rewrites
all the users as needed.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Acked-by: Alyssa Rosenzweig <alyssa@collabora.com>
Acked-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9383>
Instead of recreating paths, create them once when needed using
nir_deref_and_path.
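The "create once when needed" idea can be sketched as a lazily filled
cache. Everything below (struct layout, get_path_sketch(), the
path_builds_sketch counter) is a hypothetical model and not the actual
nir_deref_and_path definition:

```c
#include <stdbool.h>

/* Placeholder for an expensive-to-build deref path. */
struct path_sketch {
   int length;
};

/* Pairs a deref with a path that is only built on first use. */
struct deref_and_path_sketch {
   int deref_id;
   struct path_sketch path;
   bool has_path;
};

/* Counts how often a path is actually built (for illustration only). */
static int path_builds_sketch = 0;

static const struct path_sketch *
get_path_sketch(struct deref_and_path_sketch *dp)
{
   if (!dp->has_path) {
      /* Stand-in for walking the deref chain. */
      dp->path.length = dp->deref_id + 1;
      dp->has_path = true;
      path_builds_sketch++;
   }
   return &dp->path;
}
```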
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7511>
Without this patch, the copy propagation pass can optimize buffer loads
out of a compare & swap loop, which then leads to an infinite loop.
Triggered by a change to atomicCompSwap float test in piglit.
Fixes: 8424cd8fbd ("nir: Account for atomics in copy propagation.")
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7538>
We need to consider shader calls as potential writes to their payloads.
For other ray-tracing intrinsics, we may not have a shader payload
pointer and have to treat them more like a barrier. We also need to
ensure that global and SSBO reads/writes aren't propagated across shader
call intrinsics.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6479>
All the checks being replaced are for potential aliasing so we want to
flush stores whenever the mode might be something that aliases.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6332>
The warning is kind of silly:
Test case 'dEQP-GLES2.functional.shaders.indexing.tmp_array.vec3_const_write_static_read_vertex'..
==1874780== Source and destination overlap in memcpy(0xa261690, 0xa261690, 160)
==1874780== at 0x484D498: __GI_memcpy (vg_replace_strmem.c:1037)
==1874780== by 0x596FC07: copy_entry_remove (nir_opt_copy_prop_vars.c:296)
The "memcpy is undefined if they overlap" thing is surely meant to be
"memcpy with *partial* overlap is undefined", but let's keep anyone else
from having to debug this.
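A minimal sketch of the kind of guard that avoids the self-copy, assuming
entries are removed by moving the last array element into the freed slot.
The struct and function names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct entry_sketch {
   int id;
   int values[4];
};

/* Remove entries[idx] by moving the last entry into its slot and return
 * the new count. When idx is already the last slot, source and
 * destination are the same object, so the copy is skipped rather than
 * calling memcpy() with fully overlapping pointers. */
static size_t
entry_remove_sketch(struct entry_sketch *entries, size_t count, size_t idx)
{
   assert(count > 0 && idx < count);
   struct entry_sketch *last = &entries[count - 1];
   if (&entries[idx] != last)
      memcpy(&entries[idx], last, sizeof(*last));
   return count - 1;
}
```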
Reviewed-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6178>
SPIR-V OpControlBarrier can have both a memory and a control barrier
which some hardware can handle with a single instruction. Let's
turn the scoped_memory_barrier into a scoped barrier which can embed
both barrier types. Note that control-only or memory-only barriers can
be supported through this new intrinsic by passing NIR_SCOPE_NONE to the
unused barrier type.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Suggested-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4900>
For deref_store, we can still delete invalid stores that write to
statically OOB data. For everything, we need to make sure that we kill
aliases of destinations even if it's volatile.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4767>
This is a more explicit name now that we don't want it to be doing any
memory barrier stuff for us.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3307>
Right now, it's implemented as a no-op for everyone. For most drivers,
it's a switch case in the NIR -> whatever which just breaks. For ir3,
they already have code to delete tessellation barriers so we just add a
case to also delete memory_barrier_tcs_patch.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3307>
Add a NIR intrinsic that represents a memory barrier in the SPIR-V /
Vulkan Memory Model, with extra attributes that describe the barrier:
- Ordering: whether it is an Acquire or a Release;
- "Cache control": availability ("ensure this gets written in the memory")
and visibility ("ensure my cache is up to date when I'm reading");
- Variable modes: which memory types this barrier applies to;
- Scope: how far this barrier applies.
Note that unlike in SPIR-V, the "Storage Semantics" and the "Memory
Semantics" are split into two different attributes so we can use
variable modes for the former.
NIR passes that took barriers into consideration were also changed:
- nir_opt_copy_prop_vars: clean up the values for the mode of an
ACQUIRE barrier. Copy propagation effect is to "pull up a load" (by
not performing it), which is what ACQUIRE restricts.
- nir_opt_dead_write_vars and nir_opt_combine_writes: clean up the
pending writes for the modes of a RELEASE barrier. Dead writes
effect is to "push down a store", which is what RELEASE restricts.
- nir_opt_access: treat the ACQUIRE and RELEASE as a full barrier for
the modes. This is conservative, but since this is a GL-specific
pass, doesn't make a difference for now.
v2: Fix the scoped barrier handling in copy propagation. (Jason)
Add scoped barrier handling to nir_opt_access and
nir_opt_combine_writes. (Rhys)
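The copy-propagation case above can be modelled roughly as follows: an
ACQUIRE barrier drops every known value whose variable mode is covered by
the barrier. The mode flags, struct, and function name are hypothetical
and only illustrate the filtering step:

```c
#include <stddef.h>

/* Hypothetical variable modes, one bit each. */
enum mode_sketch {
   MODE_SSBO_SKETCH   = 1 << 0,
   MODE_SHARED_SKETCH = 1 << 1,
   MODE_TEMP_SKETCH   = 1 << 2,
};

struct copy_sketch {
   unsigned mode;     /* memory mode of the tracked variable */
   int known_value;   /* stand-in for the value that could be propagated */
};

/* Forget all entries whose mode is covered by an ACQUIRE barrier.
 * Surviving entries are compacted in place; the new count is returned. */
static size_t
apply_acquire_barrier_sketch(struct copy_sketch *copies, size_t count,
                             unsigned barrier_modes)
{
   size_t kept = 0;
   for (size_t i = 0; i < count; i++) {
      if (copies[i].mode & barrier_modes)
         continue;
      if (kept != i)
         copies[kept] = copies[i];
      kept++;
   }
   return kept;
}
```

A RELEASE barrier in a dead-write pass would apply the same kind of
filter to its list of pending writes instead.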
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Right now nir_copy_prop_vars is effectively undoing
nir_lower_io_to_temporaries for inputs by propagating the original
variable through the copy created in lower_io_to_temporaries. A
theoretical variable coalescing pass would have the same issue with
output variables, although that doesn't exist yet. To fix this, add a
new bit to nir_variable, and disable copy propagation when it's set.
This doesn't seem to affect any drivers now, probably since no one
uses lower_io_to_temporaries for inputs as well as copy_prop_vars, but
it will fix radv once we flip on lower_io_to_temporaries for fs inputs.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The spec explicitly says that volatile writes can't be removed and
volatile reads do not guarantee that the same value will still be around
after the read, as if there were a barrier after each read/write. Just
ignore them.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Silence two unused var warnings. And init elem_size, elem_align to
zero to silence "maybe uninitialized" warnings.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Fix this build error with GCC 4.4.7.
CC nir/nir_opt_copy_prop_vars.lo
nir/nir_opt_copy_prop_vars.c: In function ‘load_element_from_ssa_entry_value’:
nir/nir_opt_copy_prop_vars.c:454: error: unknown field ‘ssa’ specified in initializer
nir/nir_opt_copy_prop_vars.c:455: error: unknown field ‘def’ specified in initializer
nir/nir_opt_copy_prop_vars.c:456: error: unknown field ‘component’ specified in initializer
nir/nir_opt_copy_prop_vars.c:456: error: extra brace group at end of initializer
nir/nir_opt_copy_prop_vars.c:456: error: (near initialization for ‘(anonymous).<anonymous>’)
nir/nir_opt_copy_prop_vars.c:456: warning: excess elements in union initializer
nir/nir_opt_copy_prop_vars.c:456: warning: (near initialization for ‘(anonymous).<anonymous>’)
Fixes: 96c32d7776 ("nir/copy_prop_vars: handle load/store of vector elements")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109810
Reviewed-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Unlike the direct case, the indirect array derefs of vectors
are handled like regular derefs, with the exception that we ignore any
vector entry that has SSA values when performing a load. Such SSA
values don't help loading of the indirect unless we emit an if-ladder.
Copy_derefs are supported for indirects.
Also enable two tests that now pass.
v2: Remove unnecessary temporaries. Be clearer when identifying the
case where copy_entry doesn't help when we are dealing with an
indirect array_deref (of a vector). (Jason)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
When looking up an entry to use, always prefer an equal match, as it is
more likely to contain reusable SSA or derefs to propagate.
This will be necessary when adding entries with array derefs of
vectors, because we don't want the vector if the equal entry (an array
deref of that vector) is present.
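A rough sketch of the lookup preference, assuming each entry already
carries the result of comparing it with the wanted deref. The enum,
struct, and function are hypothetical simplifications:

```c
#include <stddef.h>

enum compare_sketch {
   CMP_NONE_SKETCH     = 0,
   CMP_CONTAINS_SKETCH = 1, /* entry covers the wanted deref, e.g. the whole vector */
   CMP_EQUAL_SKETCH    = 2, /* entry is exactly the wanted deref */
};

struct lookup_entry_sketch {
   int id;
   enum compare_sketch match; /* precomputed comparison with the wanted deref */
};

/* Return an equal match if any exists, otherwise the first entry that
 * merely contains the wanted deref, otherwise NULL. */
static const struct lookup_entry_sketch *
lookup_entry_sketch(const struct lookup_entry_sketch *entries, size_t count)
{
   const struct lookup_entry_sketch *fallback = NULL;
   for (size_t i = 0; i < count; i++) {
      if (entries[i].match == CMP_EQUAL_SKETCH)
         return &entries[i];
      if (entries[i].match == CMP_CONTAINS_SKETCH && fallback == NULL)
         fallback = &entries[i];
   }
   return fallback;
}
```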
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
When a direct array deref is used on a vector type (for loads and
stores), copy_prop_vars is now smart enough to propagate values it
knows about.
Given a 'vec4 v', storing to v[3] will update the copy entry for v and
it is equivalent to a write to v.w. Loading from v[1] will try first
to see if there's a known value for v.y -- and drop the load in that
case.
The copy entries still always refer to the entire vectors, so the
operations happen on the parent deref (the 'vector') and the values
are fixed accordingly.
It might be the case now that certain entries have not only different
SSA defs in each element but also those come from different components
than they are set to, because stores to individual elements always
come from an SSA definition with a single component.
Tests related to these cases are now enabled.
v2: Instead of asserting on invalid indices, "load" an undef and
remove the store. (Jason)
v3: Merge code path for the cases of is_array_deref_of_vector into the
regular code path. Add a base_index parameter to
value_set_from_value. (code changes by Jason)
v4: Removed the get_entry_for_deref helper, now being used only once.
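The per-element bookkeeping described above can be sketched as follows,
assuming a value records, for each vector element, which definition
supplies it and which component of that definition to read. All names are
hypothetical, and out-of-range indices are simply rejected here instead
of producing an undef:

```c
#include <stdbool.h>
#include <stddef.h>

#define VEC_MAX_SKETCH 4

/* Stand-in for an SSA definition. */
struct def_sketch {
   int num_components;
};

/* Known value of a whole vector, tracked per element. */
struct vec_value_sketch {
   const struct def_sketch *def[VEC_MAX_SKETCH];
   unsigned component[VEC_MAX_SKETCH];
};

/* Store to v[index] from a single-component definition: the element now
 * reads component 0 of that definition, so component[index] != index in
 * general. Returns false for a statically out-of-bounds index. */
static bool
store_element_sketch(struct vec_value_sketch *v, unsigned index,
                     const struct def_sketch *scalar)
{
   if (index >= VEC_MAX_SKETCH)
      return false;
   v->def[index] = scalar;
   v->component[index] = 0;
   return true;
}

/* Load from v[index]: succeeds only if a value is known for that element. */
static bool
load_element_sketch(const struct vec_value_sketch *v, unsigned index,
                    const struct def_sketch **def, unsigned *component)
{
   if (index >= VEC_MAX_SKETCH || v->def[index] == NULL)
      return false;
   *def = v->def[index];
   *component = v->component[index];
   return true;
}
```

For a 'vec4 v' fully known from one 4-component definition, a store to
v[3] leaves elements 0..2 pointing at the original definition while
element 3 points at component 0 of the stored scalar.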
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Also replace uses of 0xf with the appropriate full mask created from
the number of components.
Note that an increase of MAX might make us change how the data is
stored later on, but for now at least we make sure the pass is not
hardcoded.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The name reflected this function's role back when the pass also did dead
write elimination. So rename it to what it does now, which is setting
a value using another value; and narrow the argument list.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Fixes the following valgrind warning:
==27561== Conditional jump or move depends on uninitialised value(s)
==27561== at 0x667856B: value_set_ssa_components (nir_opt_copy_prop_vars.c:78)
==27561== by 0x667A1C4: copy_prop_vars_block (nir_opt_copy_prop_vars.c:797)
Fixes: 62332d139c "nir: Add a local variable-based copy propagation pass"
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
When a copy_entry is SSA, store not only the nir_ssa_def* for each
component, but also the source component they come from. At the
moment this is always a match (i.e. 'component[i] == i'), because all
the operations for a copy_entry happen using definitions with the same
size. This prepares the code for array_derefs of vectors, in which
'component[i] != i'.
Also, extract setting all SSA components into a function of its own.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Disabled by default, to be used during development. Adding those
so I don't rewrite some ad-hoc version of them every time I'm working
with this pass.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
For now these derefs are not handled, so don't let these get into the
copies list -- which would cause wrong propagations. For load_derefs,
do nothing. For store_derefs, invalidate whatever the store is
writing to. For copy_derefs, invalidate whatever the copy is writing
to.
These cases will happen once derefs to SSBOs/UBOs are kept around long
enough to get optimized by copy_prop_vars.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Otherwise writes get propagated across atomics if no barrier is
used. Even without a barrier, writes should still be visible in the
same invocation, so an atomic has to be considered a write.
CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Fixes: b3c6146925 "nir: Copy propagation between blocks"
Fixes: 62332d139c "nir: Add a local variable-based copy propagation pass"
Replace calls to create hash tables and sets that use
_mesa_hash_pointer/_mesa_key_pointer_equal with the helpers
_mesa_pointer_hash_table_create() and _mesa_pointer_set_create().
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Eric Engestrom <eric@engestrom.ch>
NIR metadata validation verifies that the debug bit was unset (by a call
to nir_metadata_preserve) if a NIR optimization pass made progress on
the shader. With the expectation that the NIR shader consists of only a
single main function, it has been safe to call nir_metadata_preserve()
iff progress was made.
However, most optimization passes calculate progress per-function and
then return the union of those calculations. In the case that an
optimization pass makes progress only on a subset of the functions in
the shader, metadata validation will detect the debug bit is still set on
any unchanged functions resulting in a failed assertion.
This patch offers a quick solution (short of a larger scale refactoring
which I do not wish to undertake as part of this series) that simply
unsets the debug bit on unchanged functions.
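A simplified model of that per-function handling, assuming metadata is a
bitmask in which one bit is the "debug bit". The flag names, struct, and
function are hypothetical and do not mirror the real NIR definitions:

```c
#include <stdbool.h>
#include <stddef.h>

#define METADATA_NOT_PROPERLY_RESET_SKETCH (1u << 0) /* the "debug bit" */
#define METADATA_DOMINANCE_SKETCH          (1u << 1)

struct func_sketch {
   unsigned valid_metadata;
   bool needs_change; /* stand-in for "the pass finds something to optimize" */
};

/* Run a pass over every function and return the union of progress.
 * Changed functions keep only the metadata the pass preserves (none in
 * this sketch), which also clears the debug bit. Unchanged functions
 * keep their metadata but still get the debug bit cleared, so
 * validation does not trip over them. */
static bool
run_pass_sketch(struct func_sketch *funcs, size_t count)
{
   bool progress = false;
   for (size_t i = 0; i < count; i++) {
      if (funcs[i].needs_change) {
         funcs[i].needs_change = false;
         funcs[i].valid_metadata = 0;
         progress = true;
      } else {
         funcs[i].valid_metadata &= ~METADATA_NOT_PROPERLY_RESET_SKETCH;
      }
   }
   return progress;
}
```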
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>