This commit adds a new num_components value for intrinsic sources of -1
which means that it consumes everything and the number of components
effectively isn't validated. This is useful for deref sources which
just take the result of the deref and we leave it up to the driver to
decide what that size should be.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
We added this assert when first moving derefs over to instructions to
ensure that deref chains could go all the way back to the variables.
Now that we're going to start using derefs for things that we can do
variable pointers on such as UBOs and SSBOs, we need to be able to run
derefs through phi nodes, selects, and basically anything else.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
We already detect any incomplete deref chains (where the deref is used
for something other than another deref or a load/store) and flag the
variable as used thanks to deref_used_for_not_store. All that's left to
do is to properly skip casts when cleaning up.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
This pass is used when, for instance, we lazily change the mode of
variables rather than replacing the variable with a new one. Since we
only do this in cases where we know we have full deref chains, it's ok
to just skip them in fixup_deref_modes.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
The code which constructs deref paths already gives you the path
starting at the nearest deref_cast or deref_var. All we need to do for
casts is handle the case where the start of the path isn't a deref_var.
For ptr_as_array derefs, we just bail if we have any after the
divergence point between the two derefs. We may be able to do better in
the future but this works for now.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
When handling casts, we can't blindly propagate the parent of a cast
into a ptr_as_array deref because doing so might loose the stride
information from the cast. Instead, before we can propagate into
ptr_as_array derefs, we need to check that the cast is a cast of an
array deref and that the stride matches. For other types of derefs, we
can continue to propagate casts as normal because they don't need the
stride. We also add an optimization which can combine a ptr_as_array
deref with it parent if it is also an array deref of some form.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
These correspond directly to SPIR-V's OpPtrAccessChain. As such, they
treat whatever their parent gives them as if it's the first element in
some array and dereferences that array. If the parent is, itself, an
array deref, then the two indices can just be added together to get the
final array deref. However, it can also be used in cases where what you
have is a dereference to some random vec2 value somewhere. In this
case, we require a cast before the ptr_as_array and use the ptr_stride
field in the cast to provide a stride for the ptr_as_array derefs.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
We're going to want to do more deref optimizations going forward and
this gives us a central place to do them. Also, cast propagation will
get a bit more complicated with the addition of ptr_as_array derefs.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
SPIR-V allows for matrix and array types to be decorated with explicit
byte stride decorations and matrix types to be decorated row- or
column-major. This commit adds support to glsl_type to encode this
information. Because this doesn't work nicely with std430 and std140
alignments, we add asserts to ensure that we don't use any of the std430
or std140 layout functions with explicitly laid out types.
In SPIR-V, the layout information for matrices is applied to the parent
struct member instead of to the matrix type itself. However, this is
gets rather clumsy when you're walking derefs trying to compute offsets
because, the moment you hit a matrix, you have to crawl back the deref
chain and find the struct. Instead, we take the same path here as we've
taken in spirv_to_nir and put the decorations on the matrix type itself.
This also subtly adds support for strided vector types. These don't
come up in SPIR-V directly but you can get one as the result of taking a
column from a row-major matrix or a row from a column-major matrix.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
It was added in bce6f99875 even though it's completely redundant with
glsl_array_type().
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Previously, NIR had a single nir_var_uniform mode used for atomic
counters, UBOs, samplers, images, and normal uniforms. This commit
splits this into nir_var_uniform and nir_var_ubo where nir_var_uniform
is still a bit of a catch-all but the nir_var_ubo is specific to UBOs.
While we're at it, we also rename shader_storage to ssbo to follow the
convention.
We need this so that we can distinguish between normal uniforms and UBO
access at the deref level without going all the way back variable and
seeing if it has an interface type.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
I have no idea how shader_storage made it into the list of banned
variable modes for stores but it clearly should be allowed. This only
doesn't cause us a problem today because we never actually use derefs on
shader_storage variables.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
This doesn't currently change anything because array indices are
required to be 32 bits and all derefs are also 32 bits. However, we
will one day have 64-bit derefs for OpenCL.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
I've been doing this in the nir-to-vir and nir-to-qir backends of v3d and
vc4, but nir could potentially do some useful stuff for us (like avoiding
unpack/repacks) if we give it the information.
v2: Skip lowering for txs/query_levels
v3: Fix a crash on old-style shadow
v4: Rename to tex_packing, use nir_format_unpack_sint/uint helpers, pack
the enum.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
When copy_prop_vars also took care of dead write handling, intrin was
used as part of store_to_entry. Now it isn't, so this assignment
isn't used really used. Add a comment clarifying what happens to
intrin.
Fixes: 4dfa7adc10 "nir: Remove handling of dead writes from copy_prop_vars"
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
After trying multiple times to merge if-statements with phis
between them I've come to the conclusion that it cannot be done
without regressions. The problem is for some shaders we end up
with a whole bunch of phis for the merged ifs resulting in
increased register pressure.
So this patch just merges ifs that have no phis between them.
This seems to be consistent with what LLVM does so for radeonsi
we only see a change (although its a large change) in a single
shader.
Shader-db results i965 (SKL):
total instructions in shared programs: 13098176 -> 13098152 (<.01%)
instructions in affected programs: 1326 -> 1302 (-1.81%)
helped: 4
HURT: 0
total cycles in shared programs: 332032989 -> 332037583 (<.01%)
cycles in affected programs: 60665 -> 65259 (7.57%)
helped: 0
HURT: 4
The cycles estimates reported by shader-db for i965 seem inaccurate
as the only difference in the final code is the removal of the
redundent condition evaluations and jumps.
Also the biggest code reduction (~7%) for radeonsi was in a tomb
raider tressfx shader but for some reason this does not get merged
for i965.
Shader-db results radeonsi (VEGA):
Totals from affected shaders:
SGPRS: 232 -> 232 (0.00 %)
VGPRS: 164 -> 164 (0.00 %)
Spilled SGPRs: 59 -> 59 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 14584 -> 13520 (-7.30 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 13 -> 13 (0.00 %)
Wait states: 0 -> 0 (0.00 %)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Makes debugging easier when we care about the deref chain and not the
deref instruction itself. To make it take a const pointer, constify
some of the static functions in nir_print.c.
Reviewed-by: Eric Anholt <eric@anholt.net>
This just cleans things up a little and make things more safe for
derefs.
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Eric Anholt <eric@anholt.net>
This will be reused by the following patch.
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
The following patches will add support for an additional
optimisation so this function will no longer just optimise varying
constants.
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
The former expects to see SSA-only things, but the latter injects registers.
The assertions in the lowering where not seeing this because they asserted
on the bit_size values only, not on the is_ssa field, so add that assertion
too.
Fixes: 11dc130779 "nir: Add a bool to int32 lowering pass"
CC: mesa-stable@lists.freedesktop.org
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
When copy propagation handles a store/copy, it iterates the current
copy entries to remove aliases, but keeps the "equal" entry (if
exists) to be updated.
The removal step may swap the entries around (to ensure there are no
holes), invalidating previous iteration pointers. The bug was saving
such pointer to use later. Change the code to first perform the
removals and then find the remaining right entry.
This was causing updates to be lost since they were being made to an
entry that was not part of the current copies.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108624
Fixes: b3c6146925 "nir: Copy propagation between blocks"
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
When updating a copy entry source value from a "non-SSA" (the data
come from a copy instruction) to a "SSA" (the data or parts of it come
from SSA values), it was possible to hold invalid data in ssa[0]
depending on the writemask. Because the union, ssa[0] could contain a
pointer to a nir_deref_instr left-over from previous non-SSA usage.
Change code to clean up the array before use to avoid invalid data
around.
Fixes: 62332d139c "nir: Add a local variable-based copy propagation pass"
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The quotation marks around 1.0 cause it to be treated as a string
instead of a floating point value. The generator then treats it as an
arbitrary variable replacement, so any iand involving a ('ineg', ('b2i',
a)) matches.
v2: Remove misleading comment about sized literals (suggested by
Timothy). Add assertion that the name of a varible is entierly
alphabetic (suggested by Jason).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Tested-by: Timothy Arceri <tarceri@itsqueeze.com> [v1]
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> [v1]
Fixes: 6bcd2af086 ("nir/algebraic: Add some optimizations for D3D-style Booleans")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109075
Tested on gen9.
v2: Rename lower_txd_3d_surafaces flag to lower_txd_3d (Jason Ekstrand)
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Instead of going all the way back to the variable, just look at the
deref. The modes are guaranteed to be the same by nir_validate whenever
the variable can be found. This fixes clear_unused_for_modes for
derefs that don't have an accessible variable.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Instead of going all the way back to the variable, just look at the
deref. The modes are guaranteed to be the same by nir_validate whenever
the variable can be found. This fixes apply_barrier_for_modes for
derefs that don't have an accessible variable.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
This is instead of looking all the way back to the variable which may
not exist for all derefs. This makes this code properly ignore casts
with modes other than the mode[s] we care about (where casts aren't
allowed).
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
This is instead of looking all the way back to the variable which may
not exist for all derefs. This makes this code properly ignore casts
with modes other than the mode[s] we care about (where casts aren't
allowed).
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
This is instead of looking all the way back to the variable which may
not exist for all derefs. This makes this code properly ignore casts
with modes other than the mode[s] we care about (where casts aren't
allowed).
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
This is instead of looking all the way back to the variable which may
not exist for all derefs. This makes this code properly ignore casts
with modes other than the mode[s] we care about (where casts aren't
allowed).
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
If we can't find the variable from the deref, just assume it isn't
invariant and continue on. This can happen if, for instance, we're
writing to a deref that points into an SSBO.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
"There's no point in walking the program if we're never going to
actually lower anything."
Except we might lower compacted local arrays. In that case, modes will
be 0, but there is still lowering to be done.
This reverts commit 7f75cf2a94.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109081
Suggested-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Clayton Craft <clayton.a.craft@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
On some GPUs, especially older Intel GPUs, some math instructions are
very expensive. On those architectures, don't reduce flow control to a
csel if one of the branches contains one of these expensive math
instructions.
This prevents a bunch of cycle count regressions on pre-Gen6 platforms
with a later patch (intel/compiler: More peephole select for pre-Gen6).
v2: Remove stray #if block. Noticed by Thomas.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
That flow control may be trying to avoid invalid loads. On at least
some platforms, those loads can also be expensive.
No shader-db changes on any Intel platform (even with the later patch
"intel/compiler: More peephole select").
v2: Add a 'indirect_load_ok' flag to nir_opt_peephole_select. Suggested
by Rob. See also the big comment in src/intel/compiler/brw_nir.c.
v3: Use nir_deref_instr_has_indirect instead of deref_has_indirect (from
nir_lower_io_arrays_to_elements.c).
v4: Fix inverted condition in brw_nir.c. Noticed by Lionel.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
I botched some copy-and-paste and clamped to signed int max instead of
uint max. Fixes KHR-GL46.shader_image_load_store.multiple-uniforms on
skl.
Fixes: d3e046e76c ("nir: Pull some of intel's image load/store format
conversion to nir_format.h")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
nir_sweep already marks all metadata invalid, so it is safe to release
the memory here too.
mean soft fp64 using uint64: 1,342,759,331 => 1,010,670,475
gfxbench5 aztec ruins high 11: 63,555,571 => 61,889,811
deus ex mankind divided 148: 62,845,304 => 62,829,640
deus ex mankind divided 2890: 71,922,686 => 71,922,686
dirt showdown 676: 69,238,607 => 69,238,607
dolphin ubershaders 210: 77,822,072 => 77,822,072
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>