This helps us avoid moving the movs outside if branches when there
src can't be scalarized.
For example it avoids:
vec4 32 ssa_7 = tex ssa_6 (coord), 0 (texture), 0 (sampler),
if ... {
r0 = imov ssa_7.z
r1 = imov ssa_7.y
r2 = imov ssa_7.x
r3 = imov ssa_7.w
...
} else {
...
if ... {
r0 = imov ssa_7.x
r1 = imov ssa_7.w
...
else {
r0 = imov ssa_7.z
r1 = imov ssa_7.y
...
}
r2 = imov ssa_7.x
r3 = imov ssa_7.w
}
...
vec4 32 ssa_36 = vec4 r0, r1, r2, r3
Becoming something like:
vec4 32 ssa_7 = tex ssa_6 (coord), 0 (texture), 0 (sampler),
r0 = imov ssa_7.z
r1 = imov ssa_7.y
r2 = imov ssa_7.x
r3 = imov ssa_7.w
if ... {
...
} else {
if ... {
r0 = imov r2
r1 = imov r3
...
else {
...
}
...
}
While this is has a smaller instruction count it requires more work
for the same result. With more complex examples we can also end up
shuffling the registers around in a way that requires more registers
to use as temps so that we don't overwrite our original values along
the way.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4636>
Here we only pull instructions further up control flow if they are
constant or texture instructions. See the code comment for more
information.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4636>
We can't move them later as we could move them into non-uniform
control flow, but moving them earlier should be fine.
This helps avoid a bunch of spilling in unigine shaders due to
moving the tex instructions sources earlier (outside if branches)
but not the instruction itself.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4636>
Classically, global code motion is also a dead code pass. However, in
the initial implementation, the decision was made to place every
instruction and let conventional DCE clean up the dead ones. Because
any uses of a dead instruction are unreachable, we have no late block
and the dead instructions are always scheduled early. The problem is
that, because we place the dead instruction early, it pushes the
placement of any dependencies of the dead instruction earlier than they
may need to be placed. In order prevent dead instructions from
affecting the placement of live ones, we need to delete them.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4636>
Now that the GCM pass is more conservative and only moves instructions
to different blocks when it's advantageous to do so, we can have a
proper notion of what it means to make progress.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4636>
We are about to adjust our instruction block assignment algorithm and we
will want to know the current block that the instruction lives in. In
order to allow for this, we can't overwrite nir_instr::block in the
early scheduling pass.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4636>
Even though no one's been brave enough to ever use this pass, I like to
keep it functionally working.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
We were using impl->num_blocks, but that isn't guaranteed to be
up-to-date until after the block_index metadata is required. If we were
unlucky, this could lead to overwriting memory.
Noticed by inspection.
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This fixes a bug in code motion that occurred when the best block is the
same as the schedule early block. In this case, because we're checking
(lca != def->parent_instr->block) at the top of the loop, we never get to
the check for loop depth so we wouldn't move it out of the loop. This
commit reworks the loop to be a simple for loop up the dominator chain and
we place the (lca != def->parent_instr->block) check at the end of the
loop.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Unlike the current CSE pass, global value numbering is capable of detecting
common values even if one does not dominate the other. For instance, in
you have
if (...) {
ssa_1 = ssa_0 + 7;
/* use ssa_1 */
} else {
ssa_2 = ssa_0 + 7;
/* use ssa_2 */
}
Global value numbering doesn't care about dominance relationships so it
figures out that ssa_1 and ssa_2 are the same and converts this to
if (...) {
ssa_1 = ssa_0 + 7;
/* use ssa_1 */
} else {
/* use ssa_1 */
}
Obviously, we just broke SSA form which is bad. Global code motion,
however, will repair this for us by turning this into
ssa_1 = ssa_0 + 7;
if (...) {
/* use ssa_1 */
} else {
/* use ssa_1 */
}
This intended to eventually mostly replace CSE. However, conventional CSE
may still be useful because it's less of a scorched-earth approach and
doesn't require GCM. This makes it a bit more appropriate for use as a
clean-up in a late optimization run.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I do appreciate the cleverness, but unfortunately it prevents a lot more
cleverness in the form of additional compiler optimizations brought on
by -fstrict-aliasing.
No difference in OglBatch7 (n=20).
Co-authored-by: Davin McCall <davmac@davmac.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This matches the "foreach x in container" pattern found in many other
programming languages. Generated by the following regular expression:
s/nir_foreach_use(\([^,]*\),\s*\([^,]*\))/nir_foreach_use(\2, \1)/
and similar expressions for nir_foreach_use_safe, etc.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This matches the "foreach x in container" pattern found in many other
programming languages. Generated by the following regular expression:
s/nir_foreach_function(\([^,]*\),\s*\([^,]*\))/nir_foreach_function(\2, \1)/
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This matches the "foreach x in container" pattern found in many other
programming languages. Generated by the following regular expression:
s/nir_foreach_phi_src(\([^,]*\),\s*\([^,]*\))/nir_foreach_phi_src(\2, \1)/
and a similar expression for nir_foreach_phi_src_safe.
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
This matches the "foreach x in container" pattern found in many other
programming languages. Generated by the following regular expression:
s/nir_foreach_instr(\([^,]*\),\s*\([^,]*\))/nir_foreach_instr(\2, \1)/
and similar expressions for nir_foreach_instr_safe etc.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>