Enabling GVN uncovered a bug where we would crash when the pass
considered pushing something into a loop.
Fixes: 6538b3e566 ("nir: add heuristic for instructions in loops with GCM")
Acked-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12242>
This changes the pass to extract pinned instructions, and not just unpinned
instructions, when rescheduling. This stops pinned instructions
from being bunched together when instructions are reinserted into their
blocks, which could cause regressions in cycle and instruction
counts on i965 and in register use/Max Waves on AMD hardware.
In order to do this we also throw away the post-order depth-first
search linearization algorithm that was used to reinsert the instructions,
which could itself cause regressions by reinserting instructions in a
less than ideal order (the bunched-together pinned instructions being one
example). Instead we simply insert instructions in the reverse of the
order they were extracted. This places instructions that were scheduled
earlier at the end of their new block and instructions that were
scheduled later at the start of their new block. With this, everything
remains in order without the need to iterate over uses.
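A minimal sketch of the reinsertion step, assuming an exec_list of
extracted instructions in state.instrs and each instr->block already set
to its destination (not the patch's exact code):

/* Pop from the tail of the extraction list and insert at the top of the
 * destination block. Later-extracted instructions are inserted first, so
 * each earlier one lands above them and relative order is preserved
 * without walking uses. */
while (!exec_list_is_empty(&state.instrs)) {
   nir_instr *instr = exec_node_data(nir_instr,
                                     exec_list_get_tail(&state.instrs), node);
   exec_node_remove(&instr->node);
   nir_instr_insert(nir_before_block(instr->block), instr);
}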
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/597>
With this pass enabled in Intel drivers, running shader-db on
shaders/unity/38.shader_test resulted in
Program received signal SIGSEGV, Segmentation fault.
gcm_schedule_early_src (src=0x555555d45348, void_state=0x7fffffffba40) at ../../SOURCE/master/src/compiler/nir/nir_opt_gcm.c:297
297 if (info->early_block->index < src_info->early_block->index)
(gdb) print src_info->early_block
$1 = (nir_block *) 0x0
I tracked this down to an early exit from gcm_schedule_early_instr on
the parent instruction because instr->pass_flags was 0x1c. That
should be an impossible value for this pass, so I inferred that
pass_flags must have had stale bits left over from some previous pass.
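The implied fix is to give every instruction a known pass_flags value
before GCM reads them; a minimal sketch:

nir_foreach_block(block, impl) {
   nir_foreach_instr(instr, block)
      instr->pass_flags = 0; /* don't trust leftovers from earlier passes */
}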
Fixes: 8dfe6f672f ("nir/GCM: Use pass_flags instead of bitsets for tracking visited/pinned")
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/597>
Use a single set and ensure dominance by checking after an equivalent
instruction is found.
Besides removing the need to copy a set, this also lets us resize the set
at the start of the pass in the next commit.
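A rough sketch of the lookup, assuming Mesa's util/set hash-set API and a
hash/equality function keyed on instruction structure:

struct set_entry *entry = _mesa_set_search(instr_set, instr);
if (entry) {
   nir_instr *match = (nir_instr *)entry->key;
   /* Only rewrite if the match actually dominates us; otherwise fall
    * through and add ourselves to the set. */
   if (nir_block_dominates(match->block, instr->block)) {
      /* ...rewrite uses of instr to match and remove instr... */
   }
}
_mesa_set_add(instr_set, instr);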
ministat (CSE only):
Difference at 95.0% confidence
-984.956 +/- 28.8559
-6.90075% +/- 0.190231%
(Student's t, pooled s = 26.9052)
ministat (entire run):
Difference at 95.0% confidence
-1246.1 +/- 257.253
-0.998972% +/- 0.205094%
(Student's t, pooled s = 239.863)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Co-authored-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6390>
This commit replaces the new_src parameter of nir_ssa_def_rewrite_uses()
with an SSA def, removes nir_ssa_def_rewrite_uses_ssa(), and rewrites
all the users as needed.
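For callers the change looks roughly like this (instr stands for whichever
instruction's def is being replaced):

/* before: the replacement had to be wrapped in a nir_src */
nir_ssa_def_rewrite_uses(&instr->dest.ssa, nir_src_for_ssa(new_def));

/* after: the replacement is passed as an SSA def directly */
nir_ssa_def_rewrite_uses(&instr->dest.ssa, new_def);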
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Acked-by: Alyssa Rosenzweig <alyssa@collabora.com>
Acked-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9383>
In both cases, we're trying to determine whether a load is scalarizable.
We don't want to scalarize a function_temp or shader_temp load because
later lowering might turn it into something we can't scalarize.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6332>
I can't think of any reason why shared and output aren't in this list.
What we're really trying to do is avoid premature scalarization of
shader or function temporary variables, since we might later lower them
to something we don't want scalarized. Also fix the version we
copy+pasted into GCM.
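A sketch of the check this implies (the helper name and shape are
assumptions, not the actual patch):

static bool
load_is_scalarizable(nir_intrinsic_instr *load)
{
   nir_deref_instr *deref = nir_src_as_deref(load->src[0]);
   /* Only shader/function temporaries are unsafe: later lowering may
    * turn them into something that must stay vectorized. */
   return !nir_deref_mode_is_one_of(deref, nir_var_function_temp |
                                           nir_var_shader_temp);
}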
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6332>
This has the same semantics as load_global except the memory it reads is
known to be constant, so load_global_constant intrinsics can be CSEd
rather than relying on more complex copy-propagation.
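The enabling property, loosely sketched (the helper is hypothetical; the
flags and tables are NIR's real intrinsic metadata): CSE can only merge
instructions it may freely eliminate and reorder, which a constant load
satisfies.

static bool
intrinsic_is_cse_candidate(const nir_intrinsic_instr *intrin)
{
   const nir_intrinsic_info *info = &nir_intrinsic_infos[intrin->intrinsic];
   /* load_global_constant is marked with both flags; plain load_global
    * is not, since the memory may be written between two loads. */
   return (info->flags & NIR_INTRINSIC_CAN_ELIMINATE) &&
          (info->flags & NIR_INTRINSIC_CAN_REORDER);
}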
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6379>
This helps us avoid moving the movs outside of if branches when their
src can't be scalarized.
For example it avoids:
vec4 32 ssa_7 = tex ssa_6 (coord), 0 (texture), 0 (sampler),
if ... {
   r0 = imov ssa_7.z
   r1 = imov ssa_7.y
   r2 = imov ssa_7.x
   r3 = imov ssa_7.w
   ...
} else {
   ...
   if ... {
      r0 = imov ssa_7.x
      r1 = imov ssa_7.w
      ...
   } else {
      r0 = imov ssa_7.z
      r1 = imov ssa_7.y
      ...
   }
   r2 = imov ssa_7.x
   r3 = imov ssa_7.w
}
...
vec4 32 ssa_36 = vec4 r0, r1, r2, r3
Becoming something like:
vec4 32 ssa_7 = tex ssa_6 (coord), 0 (texture), 0 (sampler),
r0 = imov ssa_7.z
r1 = imov ssa_7.y
r2 = imov ssa_7.x
r3 = imov ssa_7.w
if ... {
   ...
} else {
   if ... {
      r0 = imov r2
      r1 = imov r3
      ...
   } else {
      ...
   }
   ...
}
While this has a smaller instruction count, it requires more work
for the same result. With more complex examples we can also end up
shuffling the registers around in a way that requires more registers
to use as temps so that we don't overwrite our original values along
the way.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4636>
Here we only pull instructions further up control flow if they are
constant or texture instructions. See the code comment for more
information.
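A sketch of the heuristic (the helper name is an assumption):

static bool
gcm_can_pull_up(const nir_instr *instr)
{
   /* Constants and texture ops are cheap or profitable to hoist;
    * everything else stays put unless moving it is clearly a win. */
   return instr->type == nir_instr_type_load_const ||
          instr->type == nir_instr_type_tex;
}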
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4636>
We can't move them later as we could move them into non-uniform
control flow, but moving them earlier should be fine.
This helps avoid a bunch of spilling in Unigine shaders caused by
moving the sources of tex instructions earlier (outside if branches)
but not the tex instructions themselves.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4636>
Classically, global code motion is also a dead code pass. However, in
the initial implementation, the decision was made to place every
instruction and let conventional DCE clean up the dead ones. Because
any uses of a dead instruction are unreachable, we have no late block
and the dead instructions are always scheduled early. The problem is
that, because we place the dead instruction early, it pushes the
placement of any dependencies of the dead instruction earlier than they
may need to be placed. In order to prevent dead instructions from
affecting the placement of live ones, we need to delete them.
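A sketch of the idea (not the pass's exact code): inside the scheduling
walk, a def with no remaining uses has no late block, so it can be dropped
up front.

/* A real pass must also verify the instruction has no side effects
 * before removing it. */
if (list_is_empty(&def->uses) && list_is_empty(&def->if_uses)) {
   nir_instr_remove(def->parent_instr);
   continue;
}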
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4636>
Now that the GCM pass is more conservative and only moves instructions
to different blocks when it's advantageous to do so, we can have a
proper notion of what it means to make progress.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4636>
We are about to adjust our instruction block assignment algorithm and we
will want to know the current block that the instruction lives in. In
order to allow for this, we can't overwrite nir_instr::block in the
early scheduling pass.
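The shape of the change (struct and field names assumed, though
early_block also appears in the crash trace earlier in this log):

/* Record the early placement in a side table instead of clobbering
 * nir_instr::block, so an instruction's current block stays queryable. */
struct gcm_instr_info {
   nir_block *early_block;
};

struct gcm_state {
   /* indexed by nir_instr::index */
   struct gcm_instr_info *instr_infos;
   /* ...other per-pass state... */
};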
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4636>
Even though no one's been brave enough to ever use this pass, I like to
keep it functionally working.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
We were using impl->num_blocks, but that isn't guaranteed to be
up-to-date until after the block_index metadata is required. If we were
unlucky, this could lead to overwriting memory.
Noticed by inspection.
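The safe pattern is to require the metadata before trusting it:

nir_metadata_require(impl, nir_metadata_block_index);
/* impl->num_blocks and every block->index are now up-to-date. */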
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This fixes a bug in code motion that occurred when the best block is the
same as the schedule-early block. In this case, because we're checking
(lca != def->parent_instr->block) at the top of the loop, we never reach
the loop-depth check, so we would never move the instruction out of the
loop. This commit reworks the loop into a simple for loop up the
dominator chain and places the (lca != def->parent_instr->block) check
at the end of the loop.
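A sketch of the reworked loop (loop_depth() stands in for however the
pass tracks per-block loop depth):

nir_block *best = lca;
for (nir_block *block = lca; block != NULL; block = block->imm_dom) {
   /* Prefer the shallowest loop depth... */
   if (loop_depth(block) < loop_depth(best))
      best = block;
   /* ...and only stop after considering the early block itself. */
   if (block == def->parent_instr->block)
      break;
}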
Reviewed-by: Matt Turner <mattst88@gmail.com>
Unlike the current CSE pass, global value numbering is capable of detecting
common values even if one does not dominate the other. For instance, if
you have
if (...) {
   ssa_1 = ssa_0 + 7;
   /* use ssa_1 */
} else {
   ssa_2 = ssa_0 + 7;
   /* use ssa_2 */
}
Global value numbering doesn't care about dominance relationships so it
figures out that ssa_1 and ssa_2 are the same and converts this to
if (...) {
   ssa_1 = ssa_0 + 7;
   /* use ssa_1 */
} else {
   /* use ssa_1 */
}
Obviously, we just broke SSA form, which is bad. Global code motion,
however, will repair this for us by turning this into
ssa_1 = ssa_0 + 7;
if (...) {
   /* use ssa_1 */
} else {
   /* use ssa_1 */
}
This is intended to eventually replace CSE for the most part. However,
conventional CSE may still be useful because it's less of a
scorched-earth approach and doesn't require GCM, which makes it a bit
more appropriate for use as a clean-up in a late optimization run.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I do appreciate the cleverness, but unfortunately it prevents a lot more
cleverness in the form of additional compiler optimizations brought on
by -fstrict-aliasing.
No difference in OglBatch7 (n=20).
Co-authored-by: Davin McCall <davmac@davmac.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This matches the "foreach x in container" pattern found in many other
programming languages. Generated by the following regular expression:
s/nir_foreach_use(\([^,]*\),\s*\([^,]*\))/nir_foreach_use(\2, \1)/
and similar expressions for nir_foreach_use_safe, etc.
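Concretely, the swap looks like this (def and src are placeholder names):

/* before: container first */
nir_foreach_use(def, src) {
   /* ... */
}

/* after: element first, reading as "for each src in def's uses" */
nir_foreach_use(src, def) {
   /* ... */
}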
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This matches the "foreach x in container" pattern found in many other
programming languages. Generated by the following regular expression:
s/nir_foreach_function(\([^,]*\),\s*\([^,]*\))/nir_foreach_function(\2, \1)/
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This matches the "foreach x in container" pattern found in many other
programming languages. Generated by the following regular expression:
s/nir_foreach_phi_src(\([^,]*\),\s*\([^,]*\))/nir_foreach_phi_src(\2, \1)/
and a similar expression for nir_foreach_phi_src_safe.
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
This matches the "foreach x in container" pattern found in many other
programming languages. Generated by the following regular expression:
s/nir_foreach_instr(\([^,]*\),\s*\([^,]*\))/nir_foreach_instr(\2, \1)/
and similar expressions for nir_foreach_instr_safe etc.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>