This gives us some testing of it. Also, the old nir_cf_node_remove()
wasn't handling phi nodes correctly and was calling cleanup_cf_node()
too late.
Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
These will help us do a number of things, including:
- Early return elimination.
- Dead control flow elimination.
- Various optimizations, such as replacing:
if (foo) {
...
}
if (!foo) {
...
}
with:
if (foo) {
...
} else {
...
}
Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is a helper that will be shared between the new control flow
insertion and modification code.
Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
For now, it allows us to refactor the control flow insertion API's so
that there's a single entrypoint (with some wrappers). More importantly,
it will allow us to reduce the combinatorial explosion in the extract
function. There, we need to specify two points to extract, which may be
at the beginning of a block, the end of a block, or in the middle of a
block. And then there are various wrappers based off of that (before a
control flow node, before a control flow list, etc.). Rather than having
9 different functions, we can have one function and push the actual
logic of determining which variant to use down to the split function,
which will be shared with nir_cf_node_insert().
In the future, we may want to make the instruction insertion API's as
well as the builder use this, but that's a future cleanup.
Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When we insert a single basic block A into another basic block B, we
will split B into C and D, insert A in the middle, and then splice
together C, A, and D. When we splice together C and A, we need to move
the successors of A into C -- except A has no successors, since it
hasn't been inserted yet. So in move_successors(), we need to handle the
case where the block whose successors are to be moved doesn't have any
successors. Fixing link_blocks() here prevents a segfault and makes it
work correctly.
Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We may delete a control flow node which contains structured jumps to
other parts of the program. We need to remove the jump as a predecessor,
as well as remove any phi node sources which reference it. Right now,
the same problem exists for blocks that don't end in a jump instruction,
but with the new API it shouldn't be an issue, since blocks that don't
end in a jump must either point to another block in the same extracted
CF list or not point to anything at all.
Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Unlike calling nir_instr_remove(), calling nir_cf_node_remove() (and
later in the series, the nir_cf_list_delete()) implies that you're
removing instructions that may still have uses, except those
instructions are never executed so any uses will be undefined. When
cleaning up a CF node for deletion, we must clean up any uses of the
deleted instructions by making them point to undef instructions instead.
Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In particular, handle the case where the earlier block ends in a jump
and the later block is empty. In that case, we want to preserve the jump
and remove any traces of the later block. Before, we would only hit this
case when removing a control flow node after a jump, which wasn't a
common occurance, but we'll need it to handle inserting a control flow
list which ends in a jump, which should be more common/useful.
Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Before, we would only split a block with a jump at the end if we were
inserting something after a block with a jump, which never happened in
practice. But now, we want to use this to extract control flow lists
which may end in a jump, in which case we really need to do the correct
patching up. As a side effect, when removing jumps we now correctly
insert undef phi sources in some corner cases, which can't hurt.
Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Before, the process of removing a jump and wiring up the remaining block
correctly was atomic, but with the new control flow modification it's
split into two parts: first, we extract the jump, which creates a new
block with re-wired successors as well as a free-floating jump, and then
we delete the control flow containing the jump, which removes the entry
in the predecessors and any phi node sources. Split up
nir_handle_remove_jumps() to accomodate this, and add the missing
support for removing phi node sources.
Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We want to start reworking and expanding this code, but it'll be a lot
easier to do once we disentangle it from the rest of the stuff in nir.c.
Unfortunately, there are a few unavoidable dependencies in nir.c on
methods we'd rather not expose publicly, since if not used in very
specific situations they can cause Bad Things (tm) to happen. Namely, we
need to do some magical control flow munging when adding/removing jumps.
In the future, we may disallow adding/removing jumps in
nir_instr_insert_*() and nir_instr_remove(), and use separate functions
that are part of the control flow modification code, but for now we
expose them and put them in a separate, private header.
Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
cleanup_cf_node() is part of the control flow modification code, which
we're going to split into its own file, but remove_defs_uses() is an
internal function used by nir_instr_remove(). Break the dependency by
making cleanup_cf_node() use nir_instr_remove() instead, which simply
calls remove_defs_uses() and then removes the instruction from the list.
nir_instr_remove() does do extra things for jumps, though, so we avoid
calling it on jumps which matches the previous behavior (this will be
fixed later in the series).
Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It was being used to initialize function impls and loops, even though
it's really a control flow modification helper. It's pretty trivial, so
just inline it to avoid the dependency.
Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We should be checking almost everything now.
Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It's simply the first nir_cf_node in the nir_function_impl::body list,
which is easy enough to access - we don't to store a pointer to it
explicitly. Removing it means we don't need to maintain the pointer
when, say, splitting the start block when modifying control flow.
Thanks to Connor Abbott for suggesting this.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
v2, review from Francisco Jerez:
- make the destination variable as large as what the nir instrinsic
defines (4) instead of the size of the return variable of glsl. This
is still safe for the already existing code because all the intrinsics
affected returned the same amount of components as expected by glsl IR.
In the case of image_size, it is not possible to do so because the
returned number of component depends on the image type and this case
is not well handled by nir.
v3:
- Style fix
Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
The positive and negative value of a float can only
be equal to each other if it is -0.0f and 0.0f.
This is safe for Nan and Inf, as -Nan != Nan, and -Inf != Inf
This gives no changes in my shader-db
Signed-off-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
-NaN != NaN, and -Inf != Inf, so this should be safe.
Found while working on my VRP pass.
Shader-db results on my IVB:
total instructions in shared programs: 1698267 -> 1698067 (-0.01%)
instructions in affected programs: 15785 -> 15585 (-1.27%)
helped: 36
HURT: 0
GAINED: 0
LOST: 0
Some shaders was found to have the following pattern in NIR:
vec1 ssa_26 = fneg ssa_21
vec1 ssa_27 = fne ssa_21, ssa_26
Make that:
vec1 ssa_27 = fne ssa_21, 0.0f
This is found in Dota2 and Brutal Legend.
One shader is cut by 8%, from 323 -> 296 instructons in SIMD8
Signed-off-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
NIR instruction count results on i965:
total instructions in shared programs: 1261954 -> 1261937 (-0.00%)
instructions in affected programs: 455 -> 438 (-3.74%)
One in yofrankie, two in tropics. Apparently i965 had also optimized all
of these out anyway.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
There are so many flags in textures, that the CSE pass would have a hard
time referencing the correct set when figuring out if two texture ops are
the same. By zeroing, we can avoid that fragility.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Avoids regressions in vc4 when trying to do our blending in NIR.
v2: Add the other unpack ops I meant to when writing the original commit
message.
Reviewed-by: Matt Turner <mattst88@gmail.com>
We may find a cause to do more undef optimization in the future, but for
now this fixes up things after if flattening. vc4 was handling this
internally most of the time, but a GLB2.7 shader that did a conditional
discard and assign gl_FragColor in the else was still emitting some extra
code.
total instructions in shared programs: 100809 -> 100795 (-0.01%)
instructions in affected programs: 37 -> 23 (-37.84%)
v2: Use nir_instr_rewrite_src() to update def/use on src[0] (by Thomas
Helland).
v3: Make sure to flag metadata dirties, and copy the swizzle and abs/neg
over to src[0], too (by anholt).
Reviewed-by: Thomas Helland <thomashelland90@gmail.com> (v2)
Tested-by: Thomas Helland <thomashelland90@gmail.com> (v2)
This is useful to increase the CSE opportunities for a scalar backend. It
avoids regressions when dropping vc4's custom CSE implementation.
v2: Cleanups by Matt (decl in the for loop, and unreachable()).
Reviewed-by: Matt Turner <mattst88@gmail.com>
Avoid copying an overwritten swizzle, use the original values.
Example:
Former swizzle[] = xyzw
src->swizzle[] = zyxx
The expected output swizzle = zyxx but if we reuse swizzle in the loop,
then output swizzle would be zyzz.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
The current implementation operates in scalar mode only, so add a vec4
mode where types are padded to vec4 sizes.
This will be useful in the i965 driver for its vec4 nir backend
(and possbly other drivers that have vec4-based shaders).
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
The only values allowed are 0 and 1, and the value is checked before
assigning.
This is a copy of 8eeca7a56c that seems to have been made to the glsl
ir type after it was copied for use in nir but before nir landed.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
This just adds some missing pieces to nir/i965,
it is lightly tested on my Haswell.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This type will be used to store the name of subroutine types
as in subroutine void myfunc(void);
will store myfunc into a subroutine type.
This is required to the parser can identify a subroutine
type in a uniform decleration as a valid type, and also for
looking up the type later.
Also add contains_subroutine method.
v2: handle subroutine to int comparisons, needed
for lowering pass.
v3: do subroutine to int with it's own IR
operation to avoid hacking on asserts (Kayden)
v3.1: fix warnings in this patch, fix nir,
fix tgsi
v3.2: fixup tests
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Signed-off-by: Dave Airlie <airlied@redhat.com>
tests: fix warnings