We're going to have multiple functions, so nir_shader_get_entrypoint()
needs to do something a little smarter.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Previously it assumed that only a single function (the entrypoint)
existed and attempted to lower constant initializers of shader outputs
for each function, for instance.
I think this was copy-and-paste mistake -- nir_opt_large_constants was
passing in glsl_get_natural_size_align_bytes() given brw_nir.c's arguments
to the opt pass.
I wanted to reuse this function for handling constant offsets of arrays of
images in V3D.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
V3D returns the texels in a different order in the resulting vec4 from
what GLSL wants, so we need to put in a swizzle. Fixes
dEQP-GLES31.functional.texture.gather.basic.2d.rgba8.base_level.level_1
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Initialize the variable with NULL. Fixes the following
In file included from ../src/compiler/nir/nir_lower_io.c:34:
../src/compiler/nir/nir_lower_io.c: In function ‘nir_lower_explicit_io’:
../src/compiler/nir/nir.h:668:11: warning: ‘addr’ may be used uninitialized in this function [-Wmaybe-uninitialized]
return src;
^~~
../src/compiler/nir/nir_lower_io.c:735:17: note: ‘addr’ was declared here
nir_ssa_def *addr;
^~~~
v2: Avoid using a 'default' case so we get help from the compiler when
new deref types are added. (Lionel)
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
the naming is a bit confusing no matter how you look at it. Within SPIR-V
"global" memory is memory accessible from all threads. glsl "global" memory
normally refers to shader thread private memory declared at global scope. As
we already use "shared" for memory shared across all thrads of a work group
the solution where everybody could be happy with is to rename "global" to
"private" and use "global" later for memory usually stored within system
accessible memory (be it VRAM or system RAM if keeping SVM in mind).
glsl "local" memory is memory only accessible within a function, while SPIR-V
"local" memory is memory accessible within the same workgroup.
v2: rename local to function as well
v3: rename vtn_variable_mode_local as well
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
For now, it's hidden behind a cap. Hopefully, we can eventually drop
that along with all the manual offset code in spirv_to_nir.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This new pass is for lowering explicitly laid out memory coming in from
SPIR-V or a similar source. It's quite a bit more complicated than the
normal lower_io because we have to be able to handle matrices. The
way the stride information is stored for matrices is awkward and dealing
with row-major matrices is especially painful.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
This commit adds a new num_components value for intrinsic sources of -1
which means that it consumes everything and the number of components
effectively isn't validated. This is useful for deref sources which
just take the result of the deref and we leave it up to the driver to
decide what that size should be.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
We added this assert when first moving derefs over to instructions to
ensure that deref chains could go all the way back to the variables.
Now that we're going to start using derefs for things that we can do
variable pointers on such as UBOs and SSBOs, we need to be able to run
derefs through phi nodes, selects, and basically anything else.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
We already detect any incomplete deref chains (where the deref is used
for something other than another deref or a load/store) and flag the
variable as used thanks to deref_used_for_not_store. All that's left to
do is to properly skip casts when cleaning up.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
This pass is used when, for instance, we lazily change the mode of
variables rather than replacing the variable with a new one. Since we
only do this in cases where we know we have full deref chains, it's ok
to just skip them in fixup_deref_modes.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
The code which constructs deref paths already gives you the path
starting at the nearest deref_cast or deref_var. All we need to do for
casts is handle the case where the start of the path isn't a deref_var.
For ptr_as_array derefs, we just bail if we have any after the
divergence point between the two derefs. We may be able to do better in
the future but this works for now.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
When handling casts, we can't blindly propagate the parent of a cast
into a ptr_as_array deref because doing so might loose the stride
information from the cast. Instead, before we can propagate into
ptr_as_array derefs, we need to check that the cast is a cast of an
array deref and that the stride matches. For other types of derefs, we
can continue to propagate casts as normal because they don't need the
stride. We also add an optimization which can combine a ptr_as_array
deref with it parent if it is also an array deref of some form.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
These correspond directly to SPIR-V's OpPtrAccessChain. As such, they
treat whatever their parent gives them as if it's the first element in
some array and dereferences that array. If the parent is, itself, an
array deref, then the two indices can just be added together to get the
final array deref. However, it can also be used in cases where what you
have is a dereference to some random vec2 value somewhere. In this
case, we require a cast before the ptr_as_array and use the ptr_stride
field in the cast to provide a stride for the ptr_as_array derefs.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
We're going to want to do more deref optimizations going forward and
this gives us a central place to do them. Also, cast propagation will
get a bit more complicated with the addition of ptr_as_array derefs.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
SPIR-V allows for matrix and array types to be decorated with explicit
byte stride decorations and matrix types to be decorated row- or
column-major. This commit adds support to glsl_type to encode this
information. Because this doesn't work nicely with std430 and std140
alignments, we add asserts to ensure that we don't use any of the std430
or std140 layout functions with explicitly laid out types.
In SPIR-V, the layout information for matrices is applied to the parent
struct member instead of to the matrix type itself. However, this is
gets rather clumsy when you're walking derefs trying to compute offsets
because, the moment you hit a matrix, you have to crawl back the deref
chain and find the struct. Instead, we take the same path here as we've
taken in spirv_to_nir and put the decorations on the matrix type itself.
This also subtly adds support for strided vector types. These don't
come up in SPIR-V directly but you can get one as the result of taking a
column from a row-major matrix or a row from a column-major matrix.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
It was added in bce6f99875 even though it's completely redundant with
glsl_array_type().
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Previously, NIR had a single nir_var_uniform mode used for atomic
counters, UBOs, samplers, images, and normal uniforms. This commit
splits this into nir_var_uniform and nir_var_ubo where nir_var_uniform
is still a bit of a catch-all but the nir_var_ubo is specific to UBOs.
While we're at it, we also rename shader_storage to ssbo to follow the
convention.
We need this so that we can distinguish between normal uniforms and UBO
access at the deref level without going all the way back variable and
seeing if it has an interface type.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
I have no idea how shader_storage made it into the list of banned
variable modes for stores but it clearly should be allowed. This only
doesn't cause us a problem today because we never actually use derefs on
shader_storage variables.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
This doesn't currently change anything because array indices are
required to be 32 bits and all derefs are also 32 bits. However, we
will one day have 64-bit derefs for OpenCL.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
I've been doing this in the nir-to-vir and nir-to-qir backends of v3d and
vc4, but nir could potentially do some useful stuff for us (like avoiding
unpack/repacks) if we give it the information.
v2: Skip lowering for txs/query_levels
v3: Fix a crash on old-style shadow
v4: Rename to tex_packing, use nir_format_unpack_sint/uint helpers, pack
the enum.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
When copy_prop_vars also took care of dead write handling, intrin was
used as part of store_to_entry. Now it isn't, so this assignment
isn't used really used. Add a comment clarifying what happens to
intrin.
Fixes: 4dfa7adc10 "nir: Remove handling of dead writes from copy_prop_vars"
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
After trying multiple times to merge if-statements with phis
between them I've come to the conclusion that it cannot be done
without regressions. The problem is for some shaders we end up
with a whole bunch of phis for the merged ifs resulting in
increased register pressure.
So this patch just merges ifs that have no phis between them.
This seems to be consistent with what LLVM does so for radeonsi
we only see a change (although its a large change) in a single
shader.
Shader-db results i965 (SKL):
total instructions in shared programs: 13098176 -> 13098152 (<.01%)
instructions in affected programs: 1326 -> 1302 (-1.81%)
helped: 4
HURT: 0
total cycles in shared programs: 332032989 -> 332037583 (<.01%)
cycles in affected programs: 60665 -> 65259 (7.57%)
helped: 0
HURT: 4
The cycles estimates reported by shader-db for i965 seem inaccurate
as the only difference in the final code is the removal of the
redundent condition evaluations and jumps.
Also the biggest code reduction (~7%) for radeonsi was in a tomb
raider tressfx shader but for some reason this does not get merged
for i965.
Shader-db results radeonsi (VEGA):
Totals from affected shaders:
SGPRS: 232 -> 232 (0.00 %)
VGPRS: 164 -> 164 (0.00 %)
Spilled SGPRs: 59 -> 59 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 14584 -> 13520 (-7.30 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 13 -> 13 (0.00 %)
Wait states: 0 -> 0 (0.00 %)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Makes debugging easier when we care about the deref chain and not the
deref instruction itself. To make it take a const pointer, constify
some of the static functions in nir_print.c.
Reviewed-by: Eric Anholt <eric@anholt.net>
This just cleans things up a little and make things more safe for
derefs.
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Eric Anholt <eric@anholt.net>
This will be reused by the following patch.
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
The following patches will add support for an additional
optimisation so this function will no longer just optimise varying
constants.
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
The former expects to see SSA-only things, but the latter injects registers.
The assertions in the lowering where not seeing this because they asserted
on the bit_size values only, not on the is_ssa field, so add that assertion
too.
Fixes: 11dc130779 "nir: Add a bool to int32 lowering pass"
CC: mesa-stable@lists.freedesktop.org
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
When copy propagation handles a store/copy, it iterates the current
copy entries to remove aliases, but keeps the "equal" entry (if
exists) to be updated.
The removal step may swap the entries around (to ensure there are no
holes), invalidating previous iteration pointers. The bug was saving
such pointer to use later. Change the code to first perform the
removals and then find the remaining right entry.
This was causing updates to be lost since they were being made to an
entry that was not part of the current copies.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108624
Fixes: b3c6146925 "nir: Copy propagation between blocks"
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
When updating a copy entry source value from a "non-SSA" (the data
come from a copy instruction) to a "SSA" (the data or parts of it come
from SSA values), it was possible to hold invalid data in ssa[0]
depending on the writemask. Because the union, ssa[0] could contain a
pointer to a nir_deref_instr left-over from previous non-SSA usage.
Change code to clean up the array before use to avoid invalid data
around.
Fixes: 62332d139c "nir: Add a local variable-based copy propagation pass"
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The quotation marks around 1.0 cause it to be treated as a string
instead of a floating point value. The generator then treats it as an
arbitrary variable replacement, so any iand involving a ('ineg', ('b2i',
a)) matches.
v2: Remove misleading comment about sized literals (suggested by
Timothy). Add assertion that the name of a varible is entierly
alphabetic (suggested by Jason).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Tested-by: Timothy Arceri <tarceri@itsqueeze.com> [v1]
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> [v1]
Fixes: 6bcd2af086 ("nir/algebraic: Add some optimizations for D3D-style Booleans")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109075