Consider the loop:
    float i = 0.0;
    while (true) {
        if (i != 0.0)
            break;
        i = i + 1.0;
    }
This loop clearly executes exactly one time.
Some trickery is necessary to handle cases where the initial loop value
is very large and the increment is, by comparison, very small. From the
fneu_once test case,
    float i = -604462909807314587353088.0;
    while (true) {
        if (i != -604462909807314587353088.0)
            break;
        i = i + 36028797018963968.0;
    }
This loop should also execute exactly once, but this is much more
challenging to calculate due to precision issues.
Going towards smaller magnitude (i.e., adding a small positive value to
a large negative value) requires a smaller delta to make a difference
than going towards a larger magnitude. For this reason,
-604462909807314587353088.0 + 36028797018963968.0 !=
-604462909807314587353088.0, but -604462909807314587353088.0 +
-36028797018963968.0 == -604462909807314587353088.0. Math class is
tough.
No changes in shader-db or fossil-db.
v2: Fix major bug in checking result of the eval_const_binop(nir_op_feq,
...) discovered while developing fneu_once_easy unit test. Fix a typo in
the comment just above that. Add fneu_once_easy test.
v3: Skip the iteration count adjustment tests for nir_op_fneu and
nir_op_ine. Since the iteration count is either 1 or unknown, all this
function can do is add numerical error. Add fneu_once tests.
v4: Change the initial value in the fneu_once test from large positive
to large negative. Change check in get_iteration from nir_op_fsub to
nir_op_fadd. Both changes from discussion with M Henning. Also add some
more explanation in fneu_once.
v5: Rename test cases.
Fixes: 6772a17acc ("nir: Add a loop analysis pass")
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19732>
I discovered this problem because adding an algebraic transformation to
convert some uge and ult to ieq or ine caused a couple loops to stop
unrolling. Consider the loop:
    uint i = 0;
    while (true) {
        if (i >= 1)
            break;
        i++;
    }
This loop clearly executes exactly one time. Note that uge(x, 1) is
equivalent to ine(x, 0). Changing the condition to 'if (i != 0)' will
also execute exactly one time.
In the added test cases, uge_once correctly gets an exact loop trip
count of 1. Without the changes to nir_loop_analyze.c, the ine_once case
detects a maximum loop trip count of zero and does not get an exact loop
trip count.
No changes in shader-db or fossil-db.
v2: Move nir_op_fneu changes to a separate commit.
v3: Rename test cases.
Fixes: 6772a17acc ("nir: Add a loop analysis pass")
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19732>
If we know that a select will be eliminated once the loop is
unrolled, then we don't need to count the instruction towards the
cost of the loop.
This change helps 2 loops unroll in an XCOM: Enemy Unknown shader
that is full of these redundant selects.
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18587>
Here we move the calculation of the instruction cost of the loop
to after we have processed other information, such as finding the
induction variables. This is useful because we can use that extra
information to find instructions that would be eliminated if the
loop were unrolled and therefore give them a cost of 0.
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18587>
Ever since 4246c2869c and 7d85dc4f35, loop unrolling can no
longer depend on inot being eliminated from the loop
terminator condition, so we need to be able to handle it.
This change avoids 292 loop unrolling regressions with shader-db
once the following patch is applied.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18006>
Ever since 4246c2869c and 7d85dc4f35, loop unrolling can no
longer depend on inot being eliminated from the loop
terminator condition, so we need to be able to handle it.
Here we simply check whether the inot wraps a simple
terminator condition that we previously handled. We also update
the previous users of this function to use a newly named
copy of the previous behaviour,
nir_is_terminator_condition_with_two_inputs().
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18006>
Some drivers don't support these indirects and therefore require
loop unrolling if a shader uses a loop induction variable to
access a sampler array.
Here we add a new NIR shader compiler option that drivers can set.
This will be the equivalent of the EmitNoIndirectSampler setting
used in the GLSL IR unrolling pass.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16543>
Previously we only cared if this was set for the limiting
terminator. However in the following patch we will make use of this
information on other terminators to decide if we can eliminate them.
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16399>
Instead of failing in the trip count calculation, don't mark this
kind of variable as an induction variable in the first place.
Don't bother making uniform inlining deal with this kind of variable
either.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11950>
Zeroing the allocation and calling initialize_ssa_def() for every
ssa def can be expensive. Since we only use a subset of the allocated
variables, initialize each one only when needed.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7511>
If we don't know the actual mode then we can't get to the variable so
it's going to be a scratch or other indirect load anyway and we aren't
saving ourselves anything by unrolling the loop.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6332>
It was always fneu, but naming it fne causes confusion from time to time.
So let's rename it. Later we also want to add other unordered comparisons
and fne; this is a small preparation for that.
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6377>
v2:
- Refactor conditions and shared function (Connor).
- Move code to nir_eval_const_opcode() (Connor).
- Don't flush to zero on fquantize2f16
From Vulkan spec, VK_KHR_shader_float_controls section:
"3) Do denorm and rounding mode controls apply to OpSpecConstantOp?
RESOLVED: Yes, except when the opcode is OpQuantizeToF16."
v3:
- Fix bit size (Connor).
- Fix execution mode on nir_loop_analyze (Connor).
v4:
- Adapt after API changes to nir_eval_const_opcode (Andres).
v5:
- Simplify constant_denorm_flush_to_zero (Caio).
v6:
- Adapt after API changes and to use the new constant
constructors (Andres).
- Replace MAYBE_UNUSED with UNUSED as the first is going
away (Andres).
v7:
- Adapt to newly added calls (Andres).
- Simplified the auxiliary to flush denorms to zero (Caio).
- Updated to renamed supported capabilities member (Andres).
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com> [v4]
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Loops like:
    block block_0:
    vec1 32 ssa_2 = load_const (0x00000020)
    vec1 32 ssa_3 = load_const (0x00000001)
    loop {
        vec1 32 ssa_7 = phi block_0: ssa_3, block_4: ssa_9
        vec1 1 ssa_8 = ige ssa_2, ssa_7
        if ssa_8 {
            break
        } else {
        }
        vec1 32 ssa_9 = iadd ssa_7, ssa_1
    }
were treated as having more than one iteration and produced wrong
results after unrolling. Without unrolling, such a loop exits during
its first iteration.
So we now check whether the loop will actually loop.
Fixes tests/shaders/glsl-fs-loop-while-false-02.shader_test
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
This commit re-plumbs all of nir_loop_analyze to use nir_ssa_scalar for
all intermediate values so that we can properly handle swizzles. Even
though if conditions are required to be scalars, they may still consume
swizzles so you could have ((a.yzw < b.zzx).xz && c.xx).y == 0 as your
loop termination condition. The old code would just bail the moment it
saw its first non-zero swizzle, but we can now properly chase the scalar
from the if condition all the way to a, b, and c.
Shader-db results on Kaby Lake:
total loops in shared programs: 4388 -> 4364 (-0.55%)
loops in affected programs: 29 -> 5 (-82.76%)
helped: 29
HURT: 5
Shader-db results on Haswell:
total loops in shared programs: 4370 -> 4373 (0.07%)
loops in affected programs: 2 -> 5 (150.00%)
helped: 2
HURT: 5
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
This commit reworks both get_induction_and_limit_vars() and
try_find_trip_count_vars_in_iand to return true on success and not
modify their output parameters on failure. This makes their callers
significantly simpler.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
None of the current code knows what to do with swizzles. Take the safe
option for now and bail if we see one. This does have a small shader-db
impact but it is at least safe.
Shader-db results on Kaby Lake:
total loops in shared programs: 4364 -> 4388 (0.55%)
loops in affected programs: 5 -> 29 (480.00%)
helped: 5
HURT: 29
Shader-db results on Haswell:
total loops in shared programs: 4373 -> 4370 (-0.07%)
loops in affected programs: 5 -> 2 (-60.00%)
helped: 5
HURT: 2
Fixes: 6772a17acc "nir: Add a loop analysis pass"
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
The current code assumes everything is 32-bit which is very likely true
but not guaranteed by any means. Instead, use nir_eval_const_opcode to
do the calculations in a bit-size-agnostic way. We also use the new
constant constructors to build the correct size constants.
Fixes: 6772a17acc "nir: Add a loop analysis pass"
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
One issue was that the original version didn't check that swizzles
matched when comparing ALU instructions so it could end up matching
very different instructions. Using the nir_instrs_equal function from
nir_instr_set.c which we use for CSE should be much more reliable.
Another was that the loop assumes it will only run two iterations which
may not be true. If there's something which guarantees that this case
only happens for phis after ifs, it wasn't documented.
Fixes: 9e6b39e1d5 "nir: detect more induction variables"
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
v2: remove & operator in a couple of memsets
add some memsets
v3: fixup lima
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v2)
Users of this function expect alu to be a supported comparison
if the induction variable is not NULL. Since we attempt to
override the return values if the first limit is not a const, we
must make sure we are dealing with a valid comparison before
overriding the alu instruction.
Fixes an unreachable in inverse_comparison() with the game
Assassin's Creed Odyssey.
Fixes: 3235a942c1 ("nir: find induction/limit vars in iand instructions")
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110216
Rather than getting this from the alu instruction, this allows us
some flexibility. In the following patch we instead pass the
inverse op.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This helps make find_trip_count() a little easier to follow but
will also be used by a following patch.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This will be used to help find the trip count of loops that look
like the following:
    while (a < x && i < 8) {
        ...
        i++;
    }
Where the NIR will end up looking something like this:
    vec1 32 ssa_1 = load_const (0x00000004 /* 0.000000 */)
    loop {
        ...
        vec1 1 ssa_12 = ilt ssa_225, ssa_11
        vec1 1 ssa_17 = ilt ssa_226, ssa_1
        vec1 1 ssa_18 = iand ssa_12, ssa_17
        vec1 1 ssa_19 = inot ssa_18
        if ssa_19 {
            ...
            break
        } else {
            ...
        }
    }
So in order to find the trip count we need to find the inverse of
ilt.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Here we create a helper is_supported_terminator_condition()
and use that rather than embedding all the trip count code
inside a switch.
The new helper will also be used in a following patch.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This adds support to loop analysis for loops where the induction
variable is compared to the result of min(variable, constant).
For example:
    for (int i = 0; i < imin(x, 4); i++)
        ...
We add a new bool to the loop terminator struct in order to
differentiate terminators with this exit condition.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This detects an induction variable used as an array index to guess
the trip count of the loop. This enables us to do a partial
unroll of the loop, which can eventually result in the loop being
eliminated.
v2: check if the induction var is used to index more than a single
array and if so get the size of the smallest array.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The lowering we do for 64-bit instructions can cause a single NIR ALU
instruction to blow up into hundreds or thousands of instructions
potentially with control flow. If loop unrolling isn't aware of this,
it can unroll a loop 20 times which contains a nir_op_fsqrt which we
then lower to a full software implementation based on integer math.
Those 20 invocations suddenly get a lot more expensive than NIR loop
unrolling currently expects. By giving it an approximate estimate
function, we can prevent loop unrolling from going to town when it
shouldn't.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is a squash of a few distinct changes:
glsl,spirv: Generate 1-bit Booleans
Revert "Use 32-bit opcodes in the NIR producers and optimizations"
Revert "nir/builder: Generate 32-bit bool opcodes transparently"
nir/builder: Generate 1-bit Booleans in nir_build_imm_bool
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This is a squash of a bunch of individual changes:
nir/builder: Generate 32-bit bool opcodes transparently
nir/algebraic: Remap Boolean opcodes to the 32-bit variant
Use 32-bit opcodes in the NIR producers and optimizations
Generated with a little hand-editing and the following sed commands:
sed -i 's/nir_op_ball_fequal/nir_op_b32all_fequal/g' **/*.c
sed -i 's/nir_op_bany_fnequal/nir_op_b32any_fnequal/g' **/*.c
sed -i 's/nir_op_ball_iequal/nir_op_b32all_iequal/g' **/*.c
sed -i 's/nir_op_bany_inequal/nir_op_b32any_inequal/g' **/*.c
sed -i 's/nir_op_\([fiu]lt\)/nir_op_\132/g' **/*.c
sed -i 's/nir_op_\([fiu]ge\)/nir_op_\132/g' **/*.c
sed -i 's/nir_op_\([fiu]ne\)/nir_op_\132/g' **/*.c
sed -i 's/nir_op_\([fiu]eq\)/nir_op_\132/g' **/*.c
sed -i 's/nir_op_\([fi]\)ne32g/nir_op_\1neg/g' **/*.c
sed -i 's/nir_op_bcsel/nir_op_b32csel/g' **/*.c
Use 32-bit opcodes in the NIR back-ends
Generated with a little hand-editing and the following sed commands:
sed -i 's/nir_op_ball_fequal/nir_op_b32all_fequal/g' **/*.c
sed -i 's/nir_op_bany_fnequal/nir_op_b32any_fnequal/g' **/*.c
sed -i 's/nir_op_ball_iequal/nir_op_b32all_iequal/g' **/*.c
sed -i 's/nir_op_bany_inequal/nir_op_b32any_inequal/g' **/*.c
sed -i 's/nir_op_\([fiu]lt\)/nir_op_\132/g' **/*.c
sed -i 's/nir_op_\([fiu]ge\)/nir_op_\132/g' **/*.c
sed -i 's/nir_op_\([fiu]ne\)/nir_op_\132/g' **/*.c
sed -i 's/nir_op_\([fiu]eq\)/nir_op_\132/g' **/*.c
sed -i 's/nir_op_\([fi]\)ne32g/nir_op_\1neg/g' **/*.c
sed -i 's/nir_op_bcsel/nir_op_b32csel/g' **/*.c
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Here we rework force_unroll_array_access() so that we can reuse
the induction variable detection in a following patch.
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Following commits will introduce additional fields such as
guessed_trip_count. Renaming these will help avoid confusion
as our unrolling feature set grows.
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>