Providing nir variables for UBO/SSBO it is not required for Vulkan,
but it is needed for OpenGL (ARB_gl_spirv), like for example, to
gather info from the UBO/SSBO while linking.
In opposite with most cases where the nir variables is created, here
the type assigned is the full type (not just the bare type). This is
needed because while linking using the nir shader we need the explicit
layout info (explicit stride, explicit offset, row_major, etc).
Also, we need to assign an interface type, used also on the OpenGL
linker if it is a UBO/SSBO. See ir_variable::is_in_buffer_block as
example.
v2: assign interface_type to be the variable type, not need to be
arrayness (Timothy)
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
No shader-db change on any Intel platform. No shader-db run-time
difference on a certain 36-core / 72-thread system at 95% confidence
(n=20).
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
There is no reason to mark the fmul in the expression
('fmul', ('fadd', a, b), ('fadd', a, b))
as commutative. If a source of an instruction doesn't match one of the
('fadd', a, b) patterns, it won't match the other either.
This change is enough to make this pattern work:
('~fadd@32', ('fmul', ('fadd', 1.0, ('fneg', a)),
('fadd', 1.0, ('fneg', a))),
('fmul', ('flrp', a, 1.0, a), b))
This pattern has 5 commutative expressions (versus a limit of 4), but
the first fmul does not need to be commutative.
No shader-db change on any Intel platform. No shader-db run-time
difference on a certain 36-core / 72-thread system at 95% confidence
(n=20).
There are more subpatterns that could be marked as non-commutative, but
detecting these is more challenging. For example, this fadd:
('fadd', ('fmul', a, b), ('fmul', a, c))
The first fadd:
('fmul', ('fadd', a, b), ('fadd', a, b))
And this fadd:
('flt', ('fadd', a, b), 0.0)
This last case may be easier to detect. If all sources are variables
and they are the only instances of those variables, then the pattern can
be marked as non-commutative. It's probably not worth the effort now,
but if we end up with some patterns that bump up on the limit again, it
may be worth revisiting.
v2: Update the comment about the explicit "len(self.sources)" check to
be more clear about why it is necessary. Requested by Connor. Many
Python fixes style / idom fixes suggested by Dylan. Add missing (!!!)
opcode check in Expression::__eq__ method. This bug is the reason the
expected number of commutative expressions in the bitfield_reverse
pattern changed from 61 to 45 in the first version of this patch.
v3: Use all() in Expression::__eq__ method. Suggested by Connor.
Revert away from using __eq__ overloads. The "equality" implementation
of Constant and Variable needed for commutativity pruning is weaker than
the one needed for propagating and validating bit sizes. Using actual
equality caused the pruning to fail for my ('fmul', ('fadd', 1, a),
('fadd', 1, a)) case. I changed the name to "equivalent" rather than
the previous "same_as" to further differentiate it from __eq__.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Search patterns that are expected to have too many (e.g., the giant
bitfield_reverse pattern) can be added to a white list.
This would have saved me a few hours debugging. :(
v2: Implement the expected-failure annotation as a property of the
search-replace pattern instead of as a property of the whole list of
patterns. Suggested by Connor.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
The bfi/bfm behavior change replaced the bfi/bfm usage in
lower_bitfield_insert_to_shifts with actual shifts like the name says,
but it failed to handle the offset=0, bits==32 case in the new
lowering.
v2: Use 31 < bits instead of bits == 32, to get the 31 < (iand bits,
31) -> false optimization.
Fixes regressions in dEQP-GLES31.*bitfield_insert* on freedreno.
Fixes: 165b7f3a44 ("nir: define behavior of nir_op_bfm and nir_op_u/ibfe according to SM5 spec.")
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
The helpers are needed so we can use the syntax `instr(cond)` in the
algebraic rules. Add simple rule for dropping a pair of mul-div of
the same value when wrapping is guaranteed to not happen.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
When handling the specified ALU operations, check for the decorations
and set nir_alu_instr no_signed_wrap and no_unsigned_wrap flags accordingly.
v2: Add a glsl_base_type_is_unsigned_integer() helper. (Karol)
v3: Rename helper to glsl_base_type_is_uint().
v4: Use two flags, so we don't need the helper anymore. (Connor)
v5: Pass alu directly to handle function. (Jason)
Reviewed-by: Karol Herbst <kherbst@redhat.com> [v3]
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
They indicate the operation does not cause overflow or underflow.
This is motivated by SPIR-V decorations NoSignedWrap and
NoUnsignedWrap.
Change the storage of `exact` to be a single bit, so they pack
together.
v2: Handle no_wrap in nir_instr_set. (Karol)
v3: Use two separate flags, since the NIR SSA values and certain
instructions are typeless, so just no_wrap would be insufficient
to know which one was referred to. (Connor)
v4: Don't use nir_instr_set to propagate the flags, unlike `exact`,
consider the instructions different if the flags have different
values. Fix hashing/comparing. (Jason)
Reviewed-by: Karol Herbst <kherbst@redhat.com> [v1]
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
There doesn't seem to be any reason to keep these opcodes around:
* fnot/fxor are not used at all.
* fand/for are only used in lower_alu_to_scalar, but easily replaced
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We can vectorize instructions with different constant sources by creating
a new load_const and using that.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Incrementing the iteration count was intended to fix an off-by-one error
when the first terminator was superseded by a later terminator. If
there is no first terminator or later terminator, there is no off-by-one
error. Incrementing the loop count creates one. This can be seen in
loops like:
do {
if (something) {
// No breaks or continues here.
}
} while (false);
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Tested-by: Abel Briggs <abelbriggs1@hotmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110953
Fixes: 646621c66d ("glsl: make loop unrolling more like the nir unrolling path")
The iterator `i` already walks the right amount now that is
incremented by `dmul`, so no need to `* 2`. Fixes invalid memory
access in upcoming ARB_gl_spirv tests.
Failure bisected by Arcady Goldmints-Orlov.
Fixes: b019fe8a5b "glsl/nir: Fix handling of 64-bit values in uniform storage"
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
For n_columns == 1, we have a vector which is handled by the else
case. Fixes invalid memory access in upcoming ARB_gl_spirv tests.
Failure bisected by Arcady Goldmints-Orlov.
Fixes: 81e51b412e "nir: Make nir_constant a vector rather than a matrix"
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This lets us use the optimization pattern
(('ult', 31, ('iand', b, 31)), False) to remove the
bcsel instruction for code originating in D3D shaders.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
The [iu]bfe and bfm instructions are defined to only use the five
least significant bits.
This optimizes a common pattern from D3D -> SPIR-V translation.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
That is: the five least significant bits provide the values of
'bits' and 'offset' which is the case for all hardware currently
supported by NIR and using the bfm/bfe instructions.
This patch also changes the lowering of bitfield_insert/extract
using shifts to not use bfm and removes the flag 'lower_bfm'.
Tested-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
These optimizations are based on the fact that
'and(a,b) <= umin(a,b)'.
For AMD, this series moves the optimization from LLVM to NIR,
so currently no vkpipeline-db changes here.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We don't expect the output of a TXS instruction to be wider than a
vec3. Add an assert() to make sure this never happens.
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
In ARB_gl_spirv we'll be able to use variables for uniform buffers, so
don't use the descriptor intrinsics to lower the block access.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Most places in NIR, we treat matrices like arrays. The one annoying
exception to this has been nir_constant where a matrix is a first-class
thing. This commit changes that so a matrix nir_constant is the same as
an array nir_constant. This makes matrix nir_constants a tiny bit more
expensive but shrinks all others by 96B.
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Now that nir_const_value is a scalar, there's no reason why we need
multiple paths here and it's just extra paths to keep working. While
we're here, we also add a vtn_fail_if check that component indices are
in-bounds.
Reviewed-by: Karol Herbst <kherbst@redhat.com>
It only accepts 32-bit integers so it should have a more descriptive
name. This patch should not be a functional change.
Reviewed-by: Karol Herbst <kherbst@redhat.com>
All of the callers for this function are looking at interpolation
qualifiers and want to make sure they're declared flat. Any 64-bit
integer inputs need to be flat. It's also makes the function make more
sense since "integer" is fairly generic.
Reviewed-by: Karol Herbst <kherbst@redhat.com>
All of the callers of this function really just want to know if the type
is an integer and don't care about bit size.
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Even if only variables access flags are changed, the existing NIR
infrastructure expects metadata to be explicitly preserved, so do
that. Don't care about avoiding preserve to be called twice since the
cost is negligible.
This scenario can be triggered by dead variables, and also by other
intrinsics that read the variables -- but not cause progress to be
made when processing the intrinsics.
Fixes: f2d0e48ddc "glsl/nir: Add optimization pass for access flags"
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Unwrap any array in the variable type so we can get the sampler dim.
This fixes piglit test
spec/arb_arrays_of_arrays/execution/image_store/basic-imageStore-const-uniform-index.shader_test.
Fixes: f2d0e48ddc "glsl/nir: Add optimization pass for access flags"
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Right now, this just deduces when we can arbitrarily reorder SSBO and
image loads, matching the existing logic in radeonsi's TGSI->LLVM pass.
This approach can't handle some things that nir_opt_copy_prop_vars can,
but it can handle images, and with GCM it lets us hoist reads outside of
loops. We can also pass this information to LLVM which lets it do its
own optimizations on it.
This is GLSL only as I haven't tested it on Vulkan yet, and it would
probably need a few changes to work there.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
The spec explicitly says that volatile writes can't be removed and
volatile reads do not guarantee that the same value will still be around
after the read, as if there were a barrier after each read/write. Just
ignore them.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
We were completely ignoring these before, except for putting them on
variables. While we're here, don't set access qualifiers when converting
to bindless since glsl_to_nir will already have set a more accurate
qualifier that includes any qualifiers on struct members that are
dereferenced.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
In the next commit, we'll properly handle access qualifiers on struct
members by propagating them to load/store instructions, but these
instructions had no way to specify the qualifier.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
This effectively does the opposite of nir_lower_alus_to_scalar, trying
to combine per-component ALU operations with the same sources but
different swizzles into one larger ALU operation. It uses a similar
model as CSE, where we do a depth-first approach and keep around a hash
set of instructions to be combined, but there are a few major
differences:
1. For now, we only support entirely per-component ALU operations.
2. Since it's not always guaranteed that we'll be able to combine
equivalent instructions, we keep a stack of equivalent instructions
around, trying to combine new instructions with instructions on the
stack.
The pass isn't comprehensive by far; it can't handle operations where
some of the sources are per-component and others aren't, and it can't
handle phi nodes. But it should handle the more common cases, and it
should be reasonably efficient.
[Alyssa: Rebase on latest master, updating with respect to typeless
moves]
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
The V3D driver has an open-coded solution for this, and we need the
same thing for Panfrost, so let's add a generic way to lower TXS(LOD)
into max(TXS(0) >> LOD, 1).
Changes in v2:
* Use == 0 instead of !
* Rework the minification logic as suggested by Jason
* Assign cursor pos at the beginning of the function
* Patch the LOD just after retrieving the old value
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
get_texture_size() will create a txs instruction with ->sampler_dim set
to the original tex->sampler_dim. The condition to call lower_rect()
only checks the value of ->sampler_dim and whether lower_rect is
requested or not. This leads to an infinite loop when calling
nir_lower_tex() with the same options until it returns false.
In order to avoid that, let's move the tex->sampler_dim patching before
get_texture_size() is called. This way the txs instruction will have
->sampler_dim set to GLSL_SAMPLER_DIM_2D and nir_lower_tex() won't try
to lower it on the subsequent passes.
Changes in v2:
* Add Jason R-b
* Add a comment explaining why we patch ->sampler_dim at the beginning
of the lower_rect() func
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
The code considers that projector lowering was done even if it's not
really the case. Change the project_src() prototype to return a bool
encoding whether projector lowering happened or not and update the
progress var accordingly in nir_lower_tex_block().
---
Changes in v2:
* Add Jason R-b
* Drop the part suggesting that nir_lower_rect() could be called in
a do-while(progress) loop.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
The VaryingNames array has NumVaryings entries. But BufferStride is
a small array of MAX_FEEDBACK_BUFFERS (4) entries. Programs with
more than 4 varyings would read out of bounds.
Also, BufferStride is set based on the shader itself, which means that
it's inherently already included in the hash, and doesn't need to be
included again. At the point when shader_cache_read_program_metadata
is called, the linker hasn't even set those fields yet. So, just drop
it entirely.
Fixes valgrind errors in KHR-GL45.transform_feedback.linking_errors_test.
Fixes: 6d830940f7 glsl/shader_cache: Allow shader cache usage with transform feedback
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Record types have their own slot to store the precision for each
member in glsl_struct_field. Previously if the member didn’t have an
explicit precision qualifier this was being left as
GLSL_PRECISION_NONE. This patch makes it take into account the type’s
default precision qualifier like it does for regular variables in
apply_type_qualifier_to_variable.
This has the additional benefit of correctly reporting an error when a
float type is used in a struct without declaring the default type.
Reviewed-by: Eric Anholt <eric@anholt.net>
This function is confusingly also used to match interstage interfaces
as well as intrastage. In the interstage case it needs to avoid
comparing the precisions. This patch adds a parameter to specify
whether to take the precision into account or not so that it can be
used for both cases.
Reviewed-by: Eric Anholt <eric@anholt.net>