It is possible and valid for a pointer to be selected based on a
conditional before used, and depending on the mode, those cases will
result in a phi with derefs as sources.
To achieve this, we don't rematerialize derefs that are used by phis.
As a consequence, when converting from SSA to regs, we may have phis
that come from different blocks and are used by phis. We now convert
those to regs too.
Validation was added to ensure only derefs of certain modes can be
used as phi sources. No extra validation is needed for the presence
of cast, any instruction that uses derefs will validate the
deref-chain is complete (ending in a cast or a var).
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The difference between imov and fmov has been a constant source of
confusion in NIR for years. No one really knows why we have two or when
to use one vs. the other. The real reason is that they do different
things in the presence of source and destination modifiers. However,
without modifiers (which many back-ends don't have), they are identical.
Now that we've reworked nir_lower_to_source_mods to leave one abs/neg
instruction in place rather than replacing them with imov or fmov
instructions, we don't need two different instructions at all anymore.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Acked-by: Rob Clark <robdclark@chromium.org>
we validate assert entry just before this, but since that doesn't
stop execution, we need to check entry before the next validation
assert.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The current SSA def validation we do in nir_validate validates three
things:
1. That each SSA def is only ever used in the function in which it is
defined.
2. That an nir_src exists in an SSA def's use list if and only if it
points to that SSA def.
3. That each nir_src is in the correct use list (uses or if_uses) based
on whether it's an if condition or not.
The way we were doing this before was that we had a hash table which
provided a map from SSA def to a small ssa_def_validate_state data
structure which contained a pointer to the nir_function_impl and two
hash sets, one for each use list. This meant piles of allocation and
creating of little hash sets. It also meant one hash lookup for each
SSA def plus one per use as well as two per src (because we have to look
up the ssa_def_validate_state and then look up the use.) It also
involved a second walk over the instructions as a post-validate step.
This commit changes us to use a single low-collision hash set of SSA
sources for all of this by being a bit more clever. We accomplish the
objectives above as follows:
1. The list is clear when we start validating a function. If the
nir_src references an SSA def which is defined in a different
function, it simply won't be in the set.
2. When validating the SSA defs, we walk the uses and verify that they
have is_ssa set and that the SSA def points to the SSA def we're
validating. This catches the case of a nir_src being in the wrong
list. We then put the nir_src in the set and, when we validate the
nir_src, we assert that it's in the set. This takes care of any
cases where a nir_src isn't in the use list. After checking that
the nir_src is in the set, we remove it from the set and, at the end
of nir_function_impl validation, we assert that the set is empty.
This takes care of any cases where a nir_src is in a use list but
the instruction is no longer in the shader.
3. When we put a nir_src in the set, we set the bottom bit of the
pointer to 1 if it's the condition of an if. This lets us detect
whether or not a nir_src is in the right list.
When running shader-db with an optimized debug build of mesa on my
laptop, I get the following shader-db CPU times:
With NIR_VALIDATE=0 3033.34 seconds
Before this commit 20224.83 seconds
After this commit 6255.50 seconds
Assuming shader-db is a representative sampling of GLSL shaders, this
means that making this change yields an 81% reduction in the time spent
in nir_validate. It still isn't cheap but enabling validation now only
increases compile times by 2x instead of 6.6x.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
All of our hash tables and sets are already using ralloc. There's
really no good reason why we don't just make a ralloc context rather
than try to remember to clean everything up manually.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
We have a pass to lower global registers to locals and many drivers
dutifully call it. However, no one ever creates a global register ever
so it's all dead code. It's time we bury it.
Acked-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
All we ever do is initialize it to zero, clone it, print it, and
validate it. No one ever sets or uses it.
Acked-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This prevents getting mixed-up results if a multi-threaded app has two
validation errors in different threads.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Values inside the offsets parameter of textureGatherOffsets are required to be
constants in the range of [GL_MIN_PROGRAM_TEXTURE_GATHER_OFFSET,
GL_MAX_PROGRAM_TEXTURE_GATHER_OFFSET].
As this range is never outside [-32, 31] for all existing drivers inside mesa,
we can simply store the offsets as a int8_t[4][2] array inside nir_tex_instr.
Right now only Nvidia hardware supports this in hardware, so we can turn this
on inside Nouveau for the NIR path as it is already enabled with the TGSI one.
v2: use memcpy instead of for loops
add missing bits to nir_instr_set
don't show offsets if they are all 0
v3: default offsets aren't all 0
v4: rename offsets -> tg4_offsets
rename nir_tex_instr_has_explicit_offsets -> nir_tex_instr_has_explicit_tg4_offsets
Signed-off-by: Karol Herbst <kherbst@redhat.com>
With UBOs and SSBOs we have boolean types but they're actually 32-bit
values. Make the validator a little less strict so that we can do a
32-bit load/store on boolean types. We're about to add a lowering pass
called gl_nir_lower_buffers which will lower boolean load/store
operations to 32-bit and insert i2b and b2i instructions to convert
to/from 1-bit booleans. We want that to be legal.
Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
If we want to be able to use copy_deref instructions on explicitly laid
out types, we have to be a little more flexible about what types we
allow. Instead, of requiring the types to exactly match, only require
the bare types to match.
Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
With OpenCL some system values match the address bits, but in GLSL we also
have some system values being 64 bit like subgroup masks.
With this it is possible to adjust the builder functions so that depending
on the bit_sizes the correct bit_size is used or an additional argument is
added in case of multiple possible values.
v2: validate dest bit_size
v3: generate hex values in python code
remove useless imports
rename and move bit_sizes
v4: add 1 to legal bit_sizes for front_face
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Replace calls to create hash tables and sets that use
_mesa_hash_pointer/_mesa_key_pointer_equal with the helpers
_mesa_pointer_hash_table_create() and _mesa_pointer_set_create().
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Eric Engestrom <eric@engestrom.ch>
the naming is a bit confusing no matter how you look at it. Within SPIR-V
"global" memory is memory accessible from all threads. glsl "global" memory
normally refers to shader thread private memory declared at global scope. As
we already use "shared" for memory shared across all thrads of a work group
the solution where everybody could be happy with is to rename "global" to
"private" and use "global" later for memory usually stored within system
accessible memory (be it VRAM or system RAM if keeping SVM in mind).
glsl "local" memory is memory only accessible within a function, while SPIR-V
"local" memory is memory accessible within the same workgroup.
v2: rename local to function as well
v3: rename vtn_variable_mode_local as well
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We added this assert when first moving derefs over to instructions to
ensure that deref chains could go all the way back to the variables.
Now that we're going to start using derefs for things that we can do
variable pointers on such as UBOs and SSBOs, we need to be able to run
derefs through phi nodes, selects, and basically anything else.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
These correspond directly to SPIR-V's OpPtrAccessChain. As such, they
treat whatever their parent gives them as if it's the first element in
some array and dereferences that array. If the parent is, itself, an
array deref, then the two indices can just be added together to get the
final array deref. However, it can also be used in cases where what you
have is a dereference to some random vec2 value somewhere. In this
case, we require a cast before the ptr_as_array and use the ptr_stride
field in the cast to provide a stride for the ptr_as_array derefs.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
I have no idea how shader_storage made it into the list of banned
variable modes for stores but it clearly should be allowed. This only
doesn't cause us a problem today because we never actually use derefs on
shader_storage variables.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
This doesn't currently change anything because array indices are
required to be 32 bits and all derefs are also 32 bits. However, we
will one day have 64-bit derefs for OpenCL.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
This commit adds support for 1-bit Booleans and integers. Booleans
obviously take a value of true or false. Because we have to define the
semantics of 1-bit signed and unsigned integers, we define uint1_t to
take values of 0 and 1 and int1_t to take values of 0 and -1. 1-bit
arithmetic is then well-defined in the usual way, just with fewer bits.
The definition of int1_t and uint1_t doesn't usually matter but we do
need something for purposes of constant folding.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The lcssa and phis_to_regs passes are used by various NIR optimizations
that modify the CFG. Putting a couple of asserts will help ensure that
we don't accidentally put derefs in phis as part of an optimization
pass.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
OpenCL knows vector of size 8 and 16.
v2: rebased on master (nir_swizzle rework)
rework more declarations with nir_component_mask_t
adjust print_var_decl
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Rob Clark <robdclark@gmail.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This commit completely reworks function calls in NIR. Instead of having
a set of variables for the parameters and return value, nir_call_instr
now has simply has a number of sources which get mapped to load_param
intrinsics inside the functions. It's up to the client API to build an
ABI on top of that. In SPIR-V, out parameters are handled by passing
the result of a deref through as an SSA value and storing to it.
This virtue of this approach can be seen by how much it allows us to
delete from core NIR. In particular, nir_inline_functions gets halved
and goes from a fairly difficult pass to understand in detail to almost
trivial. It also simplifies spirv_to_nir somewhat because NIR functions
never were a good fit for SPIR-V.
Unfortunately, there is no good way to do this without a mega-commit.
Core NIR and SPIR-V have to be changed at the same time. This also
requires changes to anv and radv because nir_inline_functions couldn't
handle deref instructions before this change and can't work without them
after this change.
Acked-by: Rob Clark <robdclark@gmail.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This adds a concept of "members" to a variable with an interface type.
It allows you to specify the full variable data for each member of the
interface instead of once for the variable. We also add a lowering pass
to lower those variables to a sequence of variables and rewrite all the
derefs accordingly.
Acked-by: Rob Clark <robdclark@gmail.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Acked-by: Rob Clark <robdclark@gmail.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This commit adds a new instruction type to NIR for handling derefs.
Nothing uses it yet but this adds the data structure as well as all of
the code to validate, print, clone, and [de]serialize them.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Acked-by: Rob Clark <robdclark@gmail.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This moves the switch statement for specific intrinsics above source and
destination validation. We also rework the source and destination
validation to use different bit_size values for each source and/or
destination.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Acked-by: Rob Clark <robdclark@gmail.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Add helpers to get the number of src/dest components for an intrinsic,
and update spots that were open-coding this logic to use the helpers
instead.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Not used in GL but 8 and 16 component vectors exist in OpenCL.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
nir_validate.c's #endif already had the correct NDEBUG comment
Fixes: dcb1acdea0 "nir/validate: Only build in debug mode"
Fixes: 9ff71b649b "i965/nir: Validate that NIR passes call nir_metadata_preserve()"
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
We were already validating that the parent type goes along with the
child type but we weren't actually validating that the parent type is
reasonable. This fixes that.
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
The original bit-size validation wasn't capable of properly dealing with
instructions with variable bit sizes. An attempt was made to handle it
by looking at source and destinations but, because the validation was
done in validate_alu_(src|dest), it didn't really have the needed
information. The new validation code is much more straightforward and
should be more correct.
Reviewed-by: Eric Anholt <eric@anholt.net>