In order to allow nir_gather_xfb_info to be used on OpenGL,
specifically ARB_gl_spirv.
So, from OpenGL 4.6 spec, section 11.1.2.1, "Output Variables":
"outputs specifying both an *XfbBuffer* and an *Offset* are
captured, while outputs not specifying both of these are not
captured. Values are captured each time the shader writes to such
a decorated object."
This implies that are captured if both are present, and not if one of
those are lacking. Technically, it doesn't explicitly point that
having just one or the other is a mistake. In some cases, glslang is
adding some extra XfbBuffer without XfbOffset around, and mentioning
that technically that is not a bug (see issue#1526)
And for the case of Vulkan, as the same glslang issue mentions, it is
not clear if that should be a mistake or not. But even if it is a
mistake, it is not really needed to be checked on the driver, and we
can let the validation layers to check that.
v2: simplify explicit_xfb_buffer and explicit_offset checks (Jason).
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Before, we were double-counting the component slots when we had a dvec3
or dvec4. Instead, just add them in once and manually offset the
recorded output offset.
Fixes: 19064b8c "nir: Add a pass for gathering transform feedback info"
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
If we have a transform feedback output like:
float[2] x2_out (VARYING_SLOT_VAR1.x, 0, 0)
which is lowered by nir_lower_io_arrays_to_elements to,
float x2_out (VARYING_SLOT_VAR1.x, 0, 0)
float x2_out@5 (VARYING_SLOT_VAR2.x, 0, 0)
We have to update the destination offset to avoid overwriting
the same value.
v2 (Jason Ekstrand):
- Compute the correct offsets for arrays of vectors and/or doubles
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
When a xfb buffer is explicitely declared on a varying
variable, we shouldn't remove it at link time.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
used for CL kernels
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
v2: add assert to verify we have at least one valid bit_size
v3: fix use of load_front_face in nir_lower_two_sided_color and tgsi_to_nir
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
With OpenCL some system values match the address bits, but in GLSL we also
have some system values being 64 bit like subgroup masks.
With this it is possible to adjust the builder functions so that depending
on the bit_sizes the correct bit_size is used or an additional argument is
added in case of multiple possible values.
v2: validate dest bit_size
v3: generate hex values in python code
remove useless imports
rename and move bit_sizes
v4: add 1 to legal bit_sizes for front_face
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
v2: rename nir_var_global to nir_var_mem_global
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Passes' function names, separated by comma, listed in NIR_SKIP
environment variable will be skipped in debug mode. The mechanism is
hooked into the _PASS macro, like NIR_PRINT.
The extra macro NIR_SKIP is available as a developer convenience, to
skip at pointer other than the passes entry points.
v2: Fix typo in NIR_SKIP macro. (Bas)
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Otherwise writes get propagated across atomics if no barrier is
used. Without barrier writes should still be visible in the same
invocation, so an atomic has to be considered a write.
CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Fixes: b3c6146925 "nir: Copy propagation between blocks"
Fixes: 62332d139c "nir: Add a local variable-based copy propagation pass"
From @jekstrand's nir-1-bit-bool branch, with improved ior/inot lowering.
ior: fmax instead of fadd allows removing the fsat.
inot: seq(x, 0) can be better than fsub(1, x). On a2xx, it works better
with the scalar instruction set.
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
Replace calls to create hash tables and sets that use
_mesa_hash_pointer/_mesa_key_pointer_equal with the helpers
_mesa_pointer_hash_table_create() and _mesa_pointer_set_create().
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Eric Engestrom <eric@engestrom.ch>
Even though no one's been brave enough to ever use this pass, I like to
keep it functionally working.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 393b59e077
('nir: Rework nir_lower_constant_initializers() to handle functions')
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
NIR metadata validation verifies that the debug bit was unset (by a call
to nir_metadata_preserve) if a NIR optimization pass made progress on
the shader. With the expectation that the NIR shader consists of only a
single main function, it has been safe to call nir_metadata_preserve()
iff progress was made.
However, most optimization passes calculate progress per-function and
then return the union of those calculations. In the case that an
optimization pass makes progress only on a subset of the functions in
the shader metadata validation will detect the debug bit is still set on
any unchanged functions resulting in a failed assertion.
This patch offers a quick solution (short of a larger scale refactoring
which I do not wish to undertake as part of this series) that simply
unsets the debug bit on unchanged functions.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Will be used to communicate that a shader uses 64-bit operations to the
concerned lowering passes.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We're going to have multiple functions, so nir_shader_get_entrypoint()
needs to do something a little smarter.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Previously it assumed that only a single function (the entrypoint)
existed and attempted to lower constant initializers of shader outputs
for each function, for instance.
I think this was copy-and-paste mistake -- nir_opt_large_constants was
passing in glsl_get_natural_size_align_bytes() given brw_nir.c's arguments
to the opt pass.
I wanted to reuse this function for handling constant offsets of arrays of
images in V3D.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
V3D returns the texels in a different order in the resulting vec4 from
what GLSL wants, so we need to put in a swizzle. Fixes
dEQP-GLES31.functional.texture.gather.basic.2d.rgba8.base_level.level_1
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Initialize the variable with NULL. Fixes the following
In file included from ../src/compiler/nir/nir_lower_io.c:34:
../src/compiler/nir/nir_lower_io.c: In function ‘nir_lower_explicit_io’:
../src/compiler/nir/nir.h:668:11: warning: ‘addr’ may be used uninitialized in this function [-Wmaybe-uninitialized]
return src;
^~~
../src/compiler/nir/nir_lower_io.c:735:17: note: ‘addr’ was declared here
nir_ssa_def *addr;
^~~~
v2: Avoid using a 'default' case so we get help from the compiler when
new deref types are added. (Lionel)
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
the naming is a bit confusing no matter how you look at it. Within SPIR-V
"global" memory is memory accessible from all threads. glsl "global" memory
normally refers to shader thread private memory declared at global scope. As
we already use "shared" for memory shared across all thrads of a work group
the solution where everybody could be happy with is to rename "global" to
"private" and use "global" later for memory usually stored within system
accessible memory (be it VRAM or system RAM if keeping SVM in mind).
glsl "local" memory is memory only accessible within a function, while SPIR-V
"local" memory is memory accessible within the same workgroup.
v2: rename local to function as well
v3: rename vtn_variable_mode_local as well
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
For now, it's hidden behind a cap. Hopefully, we can eventually drop
that along with all the manual offset code in spirv_to_nir.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This new pass is for lowering explicitly laid out memory coming in from
SPIR-V or a similar source. It's quite a bit more complicated than the
normal lower_io because we have to be able to handle matrices. The
way the stride information is stored for matrices is awkward and dealing
with row-major matrices is especially painful.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
This commit adds a new num_components value for intrinsic sources of -1
which means that it consumes everything and the number of components
effectively isn't validated. This is useful for deref sources which
just take the result of the deref and we leave it up to the driver to
decide what that size should be.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
We added this assert when first moving derefs over to instructions to
ensure that deref chains could go all the way back to the variables.
Now that we're going to start using derefs for things that we can do
variable pointers on such as UBOs and SSBOs, we need to be able to run
derefs through phi nodes, selects, and basically anything else.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
We already detect any incomplete deref chains (where the deref is used
for something other than another deref or a load/store) and flag the
variable as used thanks to deref_used_for_not_store. All that's left to
do is to properly skip casts when cleaning up.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
This pass is used when, for instance, we lazily change the mode of
variables rather than replacing the variable with a new one. Since we
only do this in cases where we know we have full deref chains, it's ok
to just skip them in fixup_deref_modes.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
The code which constructs deref paths already gives you the path
starting at the nearest deref_cast or deref_var. All we need to do for
casts is handle the case where the start of the path isn't a deref_var.
For ptr_as_array derefs, we just bail if we have any after the
divergence point between the two derefs. We may be able to do better in
the future but this works for now.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
When handling casts, we can't blindly propagate the parent of a cast
into a ptr_as_array deref because doing so might loose the stride
information from the cast. Instead, before we can propagate into
ptr_as_array derefs, we need to check that the cast is a cast of an
array deref and that the stride matches. For other types of derefs, we
can continue to propagate casts as normal because they don't need the
stride. We also add an optimization which can combine a ptr_as_array
deref with it parent if it is also an array deref of some form.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>