Now that we have one-bit booleans, we don't need to rely on looking at
parent instructions in order to figure out if a value is a Boolean most
of the time. We can drop these specifiers and now the optimizations
will apply more generally.
Shader-DB results on Kaby Lake:
total instructions in shared programs: 15321168 -> 15321227 (<.01%)
instructions in affected programs: 8836 -> 8895 (0.67%)
helped: 1
HURT: 31
total cycles in shared programs: 357481781 -> 357481321 (<.01%)
cycles in affected programs: 146524 -> 146064 (-0.31%)
helped: 22
HURT: 10
total spills in shared programs: 23675 -> 23673 (<.01%)
spills in affected programs: 11 -> 9 (-18.18%)
helped: 1
HURT: 0
total fills in shared programs: 32040 -> 32036 (-0.01%)
fills in affected programs: 27 -> 23 (-14.81%)
helped: 1
HURT: 0
No change in VkPipeline-DB
Looking at the instructions hurt, a bunch of them seem to be a case
where doing exactly the right thing in NIR ends up doing the wrong-ish
thing in the back-end because flags are dumb. In particular, there's a
case where we have a MUL followed by a CMP followed by a SEL and when we
turn that SEL into an OR, it uses the GRF result of the CMP rather than
the flag result so the CMP can't be merged with the MUL. Those shaders
appear to schedule better according to the cycle estimates so I guess
it's a win? Also it helps spilling in one Car Chase compute shader.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
If a def is used as an condition before its definition, we should also
consider this a case to repair. When repairing, make sure we rewrite
any if conditions too.
Found in while inspecting a SPIR-V conversion from a 'continue block'
that contains a conditional branch. We pull the continue block up to
the beggining of the loop, and the condition in the branch ends up
defined afterwards.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Fixes: 364212f1ed "nir: Add a pass to repair SSA form"
The current algorithm only supports packing 32-bit types.
If a shader uses both 16-bit and 32-bit varyings, we shouldn't
compact them together.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
While a partial set of viewport system values exist, these are scalar
values, which is a poor fit for viewport transformations on vector ISAs
like Midgard (where the vec3 values for scale and offset each need to be
coherent in a vec4 uniform slot to take advantage of vectorized
transform math). This patch adds vec3 scale/offset fields corresponding
to the 3D Gallium viewport / glViewport+depth
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Eric Anholt <eric@anholt.net>
If we increase the vector size in the future it would be good
to not have to fix these up, this should change nothing at present.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
This prevents getting mixed-up results if a multi-threaded app has two
validation errors in different threads.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
This pass attempts to dectect code sequences like
if (x < y) {
z = y - x;
...
}
and replace them with sequences like
t = x - y;
if (t < 0) {
z = -t;
...
}
On architectures where the subtract can generate the flags used by the
if-statement, this saves an instruction. It's also possible that moving
an instruction out of the if-statement will allow
nir_opt_peephole_select to convert the whole thing to a bcsel.
Currently only floating point compares and adds are supported. Adding
support for integer will be a challenge due to integer overflow. There
are a couple possible solutions, but they may not apply to all
architectures.
v2: Fix a typo in the commit message and a couple typos in comments.
Fix possible NULL pointer deref from result of push_block(). Add
missing (-A + B) case. Suggested by Caio.
v3: Fix is_not_const_zero to work correctly with types other than
nir_type_float32. Suggested by Ken.
v4: Add some comments explaining how this works. Suggested by Ken.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: Move bug fix in get_neg_instr from the next patch to this patch
(where it was intended to be in the first place). Noticed by Caio.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
No shader-db changes on any Intel platform.
v2: Use a loop to generate patterns. Suggested by Jason.
v3: Fix a copy-and-paste bug in the extract_[ui] of ishl loop that would
replace an extract_i8 with and extract_u8. This broke ~180 tests. This
bug was introduced in v2.
Reviewed-by: Matt Turner <mattst88@gmail.com> [v1]
Reviewed-by: Dylan Baker <dylan@pnwbakers.com> [v2]
Acked-by: Jason Ekstrand <jason@jlekstrand.net> [v2]
No shader-db changes on any Intel platform.
v2: Use a loop to generate patterns. Suggested by Jason.
v3: Fix a copy-and-paste bug in the extract_[ui] of ishl loop that would
replace an extract_i8 with and extract_u8. This broke ~180 tests. This
bug was introduced in v2.
Reviewed-by: Matt Turner <mattst88@gmail.com> [v1]
Reviewed-by: Dylan Baker <dylan@pnwbakers.com> [v2]
Acked-by: Jason Ekstrand <jason@jlekstrand.net> [v2]
No shader-db changes on any Intel platform.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
llvm/spir-v spits out some struct a { struct b {} }, but it
doesn't deref, it casts (struct a) to (struct b), reconstruct
struct derefs instead of casts for these.
v2: use ssa_def_rewrite uses, rework the type restrictions (Jason)
v3: squish more stuff into one function, drop unused temp (Jason)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This will allow us to make use of the selection control support in
spirv and the GL support provided by EXT_control_flow_attributes.
Note this only supports if-statements as we dont support switches
in NIR.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108841
This will allow us to make use of the loop control support in
spirv and the GL support provided by EXT_control_flow_attributes.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108841
We will need them for a new ACCESS_NON_UNIFORM flag that's about to be
added in the next commit.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
On Intel, we have both bindless and bindful and we'd like to use them at
the same time if we can so we need to be able to distinguish at the NIR
level between the two. This also fixes nir_lower_tex to properly handle
bindless in its tex_texture_size and get_texture_lod helpers.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
v2 (Topi):
- Make bit-size handling order be 16-bit, 32-bit, 64-bit
- Clamp lower exponent range at -28 instead of -30.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
With vkpipelinedb Samuel discovered a regression since we stopped
stripping types at the spir-v level.
This adds a check to the var splitting for the case where it
asserts the type hasn't changed, when it has just created a bare
type, and it's different than the original type which has an explicit
stride.
This also removes a pointless assert that also triggers.
Fixes: 3b3653c4cf (nir/spirv: don't use bare types, remove assert in split vars for testing)
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
This lowering isn't needed for RADV because AMDGCN has two
instructions. It will be disabled for RADV in an upcoming series.
While we are at it, factorize a little bit.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Only the exponent needs to be 32-bit signed integer.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Fix this build error with GCC 4.4.7.
CC nir/nir_opt_copy_prop_vars.lo
nir/nir_opt_copy_prop_vars.c: In function ‘load_element_from_ssa_entry_value’:
nir/nir_opt_copy_prop_vars.c:454: error: unknown field ‘ssa’ specified in initializer
nir/nir_opt_copy_prop_vars.c:455: error: unknown field ‘def’ specified in initializer
nir/nir_opt_copy_prop_vars.c:456: error: unknown field ‘component’ specified in initializer
nir/nir_opt_copy_prop_vars.c:456: error: extra brace group at end of initializer
nir/nir_opt_copy_prop_vars.c:456: error: (near initialization for ‘(anonymous).<anonymous>’)
nir/nir_opt_copy_prop_vars.c:456: warning: excess elements in union initializer
nir/nir_opt_copy_prop_vars.c:456: warning: (near initialization for ‘(anonymous).<anonymous>’)
Fixes: 96c32d7776 ("nir/copy_prop_vars: handle load/store of vector elements")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109810
Reviewed-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Rather than skipping code that looked like this:
loop {
...
if (cond) {
do_work_1();
continue;
} else {
break;
}
do_work_2();
}
Previously we would turn this into:
loop {
...
if (cond) {
do_work_1();
continue;
} else {
do_work_2();
break;
}
}
This was clearly wrong. This change checks for this case and makes
sure we now leave it for nir_opt_dead_cf() to clean up.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
In some cases, we can end up with varying structs that aren't split to
their member variables. nir_compact_varyings attempted to record these
as unmovable, so it would leave them be. Unfortunately, it didn't do
it right for non-vector/scalar types. It set the mask to:
((1 << (elements * dmul)) - 1) << var->data.location_frac
where elements is the number of vector elements. For structures and
other non-vector/scalars, elements is 0...so the whole mask became 0.
This caused nir_compact_varyings to assign other varyings on top of
the structure varying's location (as it appeared to take up no space).
To combat this, we just set elements to 4 for non-vector/scalar types,
so that the entire slot gets marked as unmovable.
Fixes KHR-GL45.tessellation_shader.tessellation_control_to_tessellation_evaluation.gl_in on iris.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Users of this function expect alu to be a supported comparision
if the induction variable is not NULL. Since we attempt to
override the return values if the first limit is not a const, we
must make sure we are dealing with a valid comparision before
overriding the alu instruction.
Fixes an unreachable in inverse_comparison() with the game
Assasins Creed Odyssey.
Fixes: 3235a942c1 ("nir: find induction/limit vars in iand instructions")
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110216
Values inside the offsets parameter of textureGatherOffsets are required to be
constants in the range of [GL_MIN_PROGRAM_TEXTURE_GATHER_OFFSET,
GL_MAX_PROGRAM_TEXTURE_GATHER_OFFSET].
As this range is never outside [-32, 31] for all existing drivers inside mesa,
we can simply store the offsets as a int8_t[4][2] array inside nir_tex_instr.
Right now only Nvidia hardware supports this in hardware, so we can turn this
on inside Nouveau for the NIR path as it is already enabled with the TGSI one.
v2: use memcpy instead of for loops
add missing bits to nir_instr_set
don't show offsets if they are all 0
v3: default offsets aren't all 0
v4: rename offsets -> tg4_offsets
rename nir_tex_instr_has_explicit_offsets -> nir_tex_instr_has_explicit_tg4_offsets
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Not sure how ptr_stride should be taken into account if at all here
v2: reorder check to avoid src walking (Jason)
v3: remove is_cast_cast checks, keep going afterwards (Jason)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
For OpenCL we never want to strip the info from the types, and it makes
type comparisons easier in later stages. We might later need a nir pass to
strip this for GLSL, but so far the only regression is the assert and Jason
said removing that is fine.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This reverts commit db57db5317. When
building IR, nothing is really immutable and, since C has no concept of
constness propagating beyond the first pointer, we have to be vary
careful with how we use it. To just throw const into a function like
this is a lie.
Instead, we should just drop the unneeded const in spirv_to_nir which
this commit does along with the revert.
local memory is too small to require 64 bit pointers, so cast the array index
to a 32 bit value to save up on 64 bit operations.
Signed-off-by: Karol Herbst <kherbst@redhat.com>
This pass was originally written for lowering TCS output reads and
writes but it is also applicable just about anything including UBOs,
SSBOs, and shared variables.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>