Commit graph

69 commits

Author SHA1 Message Date
Ian Romanick
30879a760c nir/loop_analyze: Add a function to evaluate an ALU as constant
...with a substitution. This function is largely a copy-and-paste of
try_fold_alu (nir_opt_constant_folding.c), and an argument could be made
that this function belongs in that file.

v2: Some changes were mistakenly squashed in to "nir/loop_analyze: Use
try_eval_const_alu and induction variable basis info" that should have
been here.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3445>
2023-04-06 23:50:27 +00:00
Daniel Schürmann
2bb369dd8d nir: add assertions that loops don't have a Continue Construct
Hoping that I didn't miss any, this *should* add assertions
to all functions and passes which explicitly handle 'nir_loop'.

Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13962>
2023-02-21 10:41:11 +00:00
Ian Romanick
862b5b7d01 nir/loop_analyze: Simplify some logic in compute_induction_information
This part now looks more like it did before 0b9639c35d.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21289>
2023-02-17 22:12:05 +00:00
Ian Romanick
9461cc4424 nir/loop_analyze: Track induction variables with uniform initializer
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21289>
2023-02-17 22:12:05 +00:00
Ian Romanick
4edf1cdd3d nir/loop_analyze: Eliminate nir_basic_induction_var
No longer used. All of the information that was previously track here is
tracked directly in nir_loop_variable... and, technically speaking, has
been tracked there ever since 0b9639c35d.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21289>
2023-02-17 22:12:05 +00:00
Ian Romanick
e444ed9210 nir/loop_analyze: Use nir_loop_variable::init_src instead of nir_basic_induction_var::def_outside_loop
These track the same information in a slightly different way. Since
nir_loop_variable::init_src is visible outside this module, it cannot
be eliminated.

As an intentional side effect, induction variables with constant
initializers will now have their nir_loop_induction_variable::init_src
field point to the load_const source. Previously this pointer would be
NULL.

v2: Update unit tests and commit message. Remove the now unused ind_var
variable in find_trip_count.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21289>
2023-02-17 22:12:05 +00:00
Ian Romanick
72e763650c nir/loop_analyze: Use nir_loop_variable::update_src instead of nir_basic_induction_var::alu
These track the same information in a slightly different way. Since
nir_loop_variable::update_src is visible outside this module, it cannot
be eliminated.

This leads to some nice simplification in find_trip_count. Previously
this code only had access to the ALU instruction that performs the
increment. It had to "search" the parameters to determine which (if any)
was the constant. With this change, this code has access to the
nir_alu_src of the ALU instruction that performs the increment. It no
longer needs to search the parameters for the constant. It's either the
supplied nir_alu_src or nothing.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21289>
2023-02-17 22:12:05 +00:00
Ian Romanick
1bc43c0778 nir/loop_analyze: Track induction variables with uniform increments
As an intentional side effect, induction variables with constant
increments will now have their nir_loop_induction_variable::update_src
field point to the load_const source. Previously this pointer would be
NULL.

v2: Update unit tests and commit message.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21289>
2023-02-17 22:12:05 +00:00
Ian Romanick
f75c83c4aa nir/loop_analyze: Fix get_iteration for nir_op_fneu
Consider the loop:

    float i = 0.0;
    while (true) {
       if (i != 0.0)
          break;

       i = i + 1.0;
    }

This loop clearly executes exactly one time.

Some trickery is necessary to handle cases where the initial loop value
is very large and the increment is, by comparison, very small.  From the
fenu_once test case,

    float i = -604462909807314587353088.0;
    while (true) {
       if (i != -604462909807314587353088.0)
          break;

       i = i + 36028797018963968.0;
    }

This loop should also execute exactly once, but this is much more
challenging to calculate due to precision issues.

Going towards smaller magnitude (i.e., adding a small positive value to
a large negative value) requires a smaller delta to make a difference
than going towards a larger magnitude. For this reason,
-604462909807314587353088.0 + 36028797018963968.0 !=
-604462909807314587353088.0, but -604462909807314587353088.0 +
-36028797018963968.0 == -604462909807314587353088.0. Math class is
tough.

No changes in shader-db or fossil-db.

v2: Fix major bug in checking result of the eval_const_binop(nir_op_feq,
...) discovered while developing fneu_once_easy unit test. Fix a typo in
the comment just above that. Add fneu_once_easy test.

v3: Skip the iteration count adjustment tests for nir_op_fenu and
nir_op_ine. Since the iteration count is either 1 or unknown, all this
function can do is add numerical error. Add fenu_once tests.

v4: Change the initial value in the fneu_once test from large positive
to large negative. Change check in get_iteration from nir_op_fsub to
nir_op_fadd. Both changes from discussion with M Henning. Also add some
more explanation in fneu_once.

v5: Rename test cases.

Fixes: 6772a17acc ("nir: Add a loop analysis pass")
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19732>
2022-11-22 03:18:54 +00:00
Ian Romanick
d9f014401b nir/loop_analyze: Fix get_iteration for nir_op_ine
I discovered this problem because adding an algebraic transformation to
convert some uge and ult to ieq or ine caused a couple loops to stop
unrolling. Consider the loop:

    uint i = 0;
    while (true) {
       if (i >= 1)
          break;

       i++;
    }

This loop clearly executes exactly one time. Note that uge(x, 1) is
equivalent to ine(x, 0). Changing the condition to 'if (i != 0)' will
also execute exactly one time.

In the added test cases, uge_once correctly get an exact loop trip count
of 1. Without the changes to nir_loop_analyze.c, the ine_once case
detects a maximum loop trip count of zero and does not get an exact loop
trip count.

No changes in shader-db or fossil-db.

v2: Move nir_op_fneu changes to a separate commit.

v3: Rename test cases.

Fixes: 6772a17acc ("nir: Add a loop analysis pass")
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19732>
2022-11-22 03:18:54 +00:00
SoroushIMG
121f30005f nir: track whether a loop contains soft fp64 ops
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18863>
2022-09-30 17:07:37 +00:00
Timothy Arceri
40c32dfbb1 nir/loop_analyze: remove cost of redundant selects
If we know that a select will be eliminated once the loop is
unrolled than we don't need to count the instruction towards the
cost of the loop.

This change helps 2 loops unroll in an xcom enemy unknown shader
that is loaded full of these redundant selects.

Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18587>
2022-09-27 00:31:47 +00:00
Timothy Arceri
13d0ae593b nir/loop_analyze: delay instruction cost calculation
Here we move the calculation of the instruction cost of the loop
after we have processed other information such as finding the
induction variables. This is useful because we can use this further
information to find instructions that will be eliminated if the
loop was to unroll and therefore give them a cost of 0.

Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18587>
2022-09-27 00:31:47 +00:00
Timothy Arceri
61c3438b27 nir: support loop unrolling with inot conditions
Ever since 4246c2869c and 7d85dc4f35 loop unrolling can no
longer depend on inot being eliminated from the loop
terminator condition so we need to be able to handle it.

This change avoids 292 loop unrolling regressions with shader-db
once the following patch is applied.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18006>
2022-09-08 01:01:14 +00:00
Timothy Arceri
96c19d23c9 nir: update nir_is_supported_terminator_condition()
Ever since 4246c2869c and 7d85dc4f35 loop unrolling can no
longer depend on inot being eliminated from the loop
terminator condition so we need to be able to handle it.

Here we simply check to see if the inot contains a simple
terminator condition we previously handled. We also update
the previous users of this function to use a newly name
copy of the previous behaviour
nir_is_terminator_condition_with_two_inputs().

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18006>
2022-09-08 01:01:14 +00:00
Timothy Arceri
ff8ddcb23e nir: add support for forced sampler indirect loop unrolling
Some drivers don't support these indirects and therefore require
loop unrolling if a shader uses a loop induction variable to
access a sampler array.

Here we add a new nir shader compiler option that drivers can set,
this will be the equivalent of the EmitNoIndirectSampler setting
used in the GLSL IR unrolling pass.

Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16543>
2022-05-17 02:12:21 +00:00
Timothy Arceri
4c3d138e5d nir: always set the exact_trip_count_unknown loop terminator property
Previously we only cared if this was set for the limiting
terminator. However in the following patch we will make use of this
information on other terminators to decide if we can eliminate them.

Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16399>
2022-05-12 02:06:31 +00:00
Daniel Schürmann
89a842b2b6 nir/loop_analyze: consider instruction cost of nir_op_flrp
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12061>
2021-08-24 16:10:30 +00:00
Qiang Yu
3c93ebbae5 nir/loop_analyze: skip unsupported induction variable early
Instead of fail in trip count calculation, just don't mark such
kind of variable as induction from the beginning.

Don't bother inline uniform to deal with such kind of variable
either.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11950>
2021-08-19 02:17:35 +00:00
Qiang Yu
0b9639c35d nir/loop_analyze: record induction variables for each loop
For being used by uniform inline lowering pass.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11950>
2021-08-19 02:17:35 +00:00
Qiang Yu
c86ec09d11 nir/loop_analyze: move nir_is_supported_terminator_condition() to header
To be shared with uniform inline.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11950>
2021-08-19 02:17:35 +00:00
Rhys Perry
4832262560 nir/loop_analyze: initialize loop variables on demand
Zero'ing the allocation and calling initialize_ssa_def() for every
ssa def can be expensive. Since we only use a subset of the allocated
variables, initialize it only when needed.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7511>
2020-11-20 13:57:34 +00:00
Jason Ekstrand
0f94ff8a6a nir: Only force loop unrolling if we know it's a in/out/temp
If we don't know the actual mode then we can't get to the variable so
it's going to be a scratch or other indirect load anyway and we aren't
saving ourselves anything by unrolling the loop.

Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6332>
2020-11-03 22:18:28 +00:00
Rhys Perry
4735c8a522 nir/loop_analyze: adjust force unrolling to only include interesting modes
Instead of force-unrolling any loop which reads an entire array, only do
it for arrays which might be faster to access with constant indices.

Significantly improves compile-time for these CTS tests, which could
previously timeout:
dEQP-VK.spirv_assembly.instruction.graphics.16bit_storage.struct_mixed_types.uniform_buffer_block_geom
dEQP-VK.spirv_assembly.instruction.graphics.16bit_storage.struct_mixed_types.uniform_geom
dEQP-VK.spirv_assembly.instruction.graphics.8bit_storage.struct_mixed_types.storage_buffer_geom
dEQP-VK.spirv_assembly.instruction.graphics.spirv_ids_abuse.lots_ids_geom

fossil-db (Navi):
Totals from 19 (0.01% of 137413) affected shaders:
SGPRs: 1728 -> 1688 (-2.31%)
VGPRs: 1176 -> 1168 (-0.68%)
CodeSize: 198496 -> 136580 (-31.19%)
MaxWaves: 154 -> 156 (+1.30%)
Instrs: 38889 -> 26029 (-33.07%)
Cycles: 446108 -> 1059924 (+137.59%); split: -0.91%, +138.51%
VMEM: 3245 -> 2926 (-9.83%)
SMEM: 850 -> 828 (-2.59%); split: +4.71%, -7.29%
VClause: 549 -> 533 (-2.91%)
SClause: 1810 -> 1522 (-15.91%)
Copies: 2209 -> 1705 (-22.82%); split: -22.95%, +0.14%
Branches: 854 -> 603 (-29.39%); split: -29.86%, +0.47%
PreSGPRs: 1512 -> 1506 (-0.40%); split: -0.53%, +0.13%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7161>
2020-10-22 12:07:45 +01:00
Karol Herbst
e5899c1e88 nir: rename nir_op_fne to nir_op_fneu
It was always fneu but naming it fne causes confusion from time to time. So
lets rename it. Later we also want to add other unordered and fne, this is
a smaller preparation for that.

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6377>
2020-08-21 17:26:21 +00:00
Kristian H. Kristensen
de856c6170 nir: Delete unused is_var_constant() helper
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3686>
2020-02-04 06:03:52 +00:00
Samuel Iglesias Gonsálvez
f7d73db353 nir: add support for flushing to zero denorm constants
v2:
- Refactor conditions and shared function (Connor).
- Move code to nir_eval_const_opcode() (Connor).
- Don't flush to zero on fquantize2f16
  From Vulkan spec, VK_KHR_shader_float_controls section:

  "3) Do denorm and rounding mode controls apply to OpSpecConstantOp?

  RESOLVED: Yes, except when the opcode is OpQuantizeToF16."

v3:
- Fix bit size (Connor).
- Fix execution mode on nir_loop_analize (Connor).

v4:
- Adapt after API changes to nir_eval_const_opcode (Andres).

v5:
- Simplify constant_denorm_flush_to_zero (Caio).

v6:
- Adapt after API changes and to use the new constant
  constructors (Andres).
- Replace MAYBE_UNUSED with UNUSED as the first is going
  away (Andres).

v7:
- Adapt to newly added calls (Andres).
- Simplified the auxiliary to flush denorms to zero (Caio).
- Updated to renamed supported capabilities member (Andres).

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com> [v4]
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-09-17 23:39:18 +03:00
Danylo Piliaiev
e71fc7f238 nir/loop_analyze: Treat do{}while(false) loops as 0 iterations
Loops like:

block block_0:
vec1 32 ssa_2 = load_const (0x00000020)
vec1 32 ssa_3 = load_const (0x00000001)
loop {
    vec1 32 ssa_7 = phi block_0: ssa_3, block_4: ssa_9
    vec1 1 ssa_8 = ige ssa_2, ssa_7
    if ssa_8 {
        break
    } else {
    }
    vec1 32 ssa_9 = iadd ssa_7, ssa_1
}

Were treated as having more than 1 iteration and after unrolling
produced wrong results, however such loop will exit during
the first iteration if not unrolled.

So we check if loop will actually loop.

Fixes tests/shaders/glsl-fs-loop-while-false-02.shader_test

Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-08-21 11:01:15 +00:00
Jason Ekstrand
7e0fcea727 nir/loop_analyze: Pass nir_const_values directly to helpers
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-07-10 00:20:59 +00:00
Jason Ekstrand
ff972c7a3a nir/loop_analyze: Properly handle swizzles in loop conditions
This commit re-plumbs all of nir_loop_analyze to use nir_ssa_scalar for
all intermediate values so that we can properly handle swizzles.  Even
though if conditions are required to be scalars, they may still consume
swizzles so you could have ((a.yzw < b.zzx).xz && c.xx).y == 0 as your
loop termination condition.  The old code would just bail the moment it
saw its first non-zero swizzle but we can now properly chase the scalar
from the if condition to all the way to a, b, and c.

Shader-db results on Kaby Lake:

    total loops in shared programs: 4388 -> 4364 (-0.55%)
    loops in affected programs: 29 -> 5 (-82.76%)
    helped: 29
    HURT: 5

Shader-db results on Haswell:

    total loops in shared programs: 4370 -> 4373 (0.07%)
    loops in affected programs: 2 -> 5 (150.00%)
    helped: 2
    HURT: 5

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-07-10 00:20:59 +00:00
Jason Ekstrand
0333649e63 nir/loop_analyze: Refactor detection of limit vars
This commit reworks both get_induction_and_limit_vars() and
try_find_trip_count_vars_in_iand to return true on success and not
modify their output parameters on failure.  This makes their callers
significantly simpler.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-07-10 00:20:59 +00:00
Jason Ekstrand
9a3cb6f5fe nir/loop_analyze: Bail if we encounter swizzles
None of the current code knows what to do with swizzles.  Take the safe
option for now and bail if we see one.  This does have a small shader-db
impact but it is at least safe.

Shader-db results on Kaby Lake:

    total loops in shared programs: 4364 -> 4388 (0.55%)
    loops in affected programs: 5 -> 29 (480.00%)
    helped: 5
    HURT: 29

Shader-db results on Haswell:

    total loops in shared programs: 4373 -> 4370 (-0.07%)
    loops in affected programs: 5 -> 2 (-60.00%)
    helped: 5
    HURT: 2

Fixes: 6772a17acc "nir: Add a loop analysis pass"
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-07-10 00:20:59 +00:00
Jason Ekstrand
6455fa9710 nir/loop_analyze: Use new eval_const_* helpers in test_iterations
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-07-10 00:20:59 +00:00
Jason Ekstrand
268ad47c11 nir/loop_analyze: Handle bit sizes correctly in calculate_iterations
The current code assumes everything is 32-bit which is very likely true
but not guaranteed by any means.  Instead, use nir_eval_const_opcode to
do the calculations in a bit-size-agnostic way.  We also use the new
constant constructors to build the correct size constants.

Fixes: 6772a17acc "nir: Add a loop analysis pass"
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-07-10 00:20:59 +00:00
Jason Ekstrand
9f7ffe41dd nir/loop_analyze: Fix phi-of-identical-alu detection
One issue was that the original version didn't check that swizzles
matched when comparing ALU instructions so it could end up matching
very different instructions.  Using the nir_instrs_equal function from
nir_instr_set.c which we use for CSE should be much more reliable.
Another was that the loop assumes it will only run two iterations which
may not be true.  If there's something which guarantees that this case
only happens for phis after ifs, it wasn't documented.

Fixes: 9e6b39e1d5 "nir: detect more induction variables"
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-07-10 00:20:59 +00:00
Karol Herbst
14531d676b nir: make nir_const_value scalar
v2: remove & operator in a couple of memsets
    add some memsets
v3: fixup lima

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v2)
2019-04-14 22:25:56 +02:00
Karol Herbst
e72beacb95 nir/loop_analyze: use nir_const_value.b for boolean results, not u32
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-04-14 22:25:56 +02:00
Caio Marcelo de Oliveira Filho
e5830e1132 nir: Handle array-deref-of-vector case in loop analysis
SPIR-V can produce those for SSBO and UBO access.  Found when testing
the ARB_gl_spirv series.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-03-22 13:50:39 -07:00
Timothy Arceri
427a6fee43 nir: only override previous alu during loop analysis if supported
Users of this function expect alu to be a supported comparision
if the induction variable is not NULL. Since we attempt to
override the return values if the first limit is not a const, we
must make sure we are dealing with a valid comparision before
overriding the alu instruction.

Fixes an unreachable in inverse_comparison() with the game
Assasins Creed Odyssey.

Fixes: 3235a942c1 ("nir: find induction/limit vars in iand instructions")

Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110216
2019-03-21 21:51:21 +11:00
Brian Paul
02c2863df5 nir: silence a couple new compiler warnings
[33/630] Compiling C object 'src/compiler/nir/nir@sta/nir_loop_analyze.c.o'.
../src/compiler/nir/nir_loop_analyze.c: In function ‘try_find_trip_count_vars_in_iand’:
../src/compiler/nir/nir_loop_analyze.c:846:29: warning: suggest parentheses around ‘&&’ within ‘||’ [-Wparentheses]
    if (*ind == NULL || *ind && (*ind)->type != basic_induction ||
                             ^
[85/630] Compiling C object 'src/compiler/nir/nir@sta/nir_opt_loop_unroll.c.o'.
../src/compiler/nir/nir_opt_loop_unroll.c: In function ‘complex_unroll_single_terminator’:
../src/compiler/nir/nir_opt_loop_unroll.c:494:17: warning: unused variable ‘unroll_loc’ [-Wunused-variable]
    nir_cf_node *unroll_loc =
                 ^
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-03-12 14:34:51 +11:00
Timothy Arceri
3235a942c1 nir: find induction/limit vars in iand instructions
This will be used to help find the trip count of loops that look
like the following:

   while (a < x && i < 8) {
      ...
      i++;
   }

Where the NIR will end up looking something like this:

   vec1 32 ssa_1 = load_const (0x00000004 /* 0.000000 */)
   loop {
      ...
      vec1 1 ssa_12 = ilt ssa_225, ssa_11
      vec1 1 ssa_17 = ilt ssa_226, ssa_1
      vec1 1 ssa_18 = iand ssa_12, ssa_17
      vec1 1 ssa_19 = inot ssa_18

      if ssa_19 {
         ...
         break
      } else {
         ...
      }
   }

On RADV this unrolls a bunch of loops in F1-2017 shaders.

Totals from affected shaders:
SGPRS: 4112 -> 4136 (0.58 %)
VGPRS: 4132 -> 4052 (-1.94 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 515444 -> 587720 (14.02 %) bytes
LDS: 2 -> 2 (0.00 %) blocks
Max Waves: 194 -> 196 (1.03 %)
Wait states: 0 -> 0 (0.00 %)

It also unrolls a couple of loops in shader-db on radeonsi.

Totals from affected shaders:
SGPRS: 128 -> 128 (0.00 %)
VGPRS: 64 -> 64 (0.00 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 6880 -> 9504 (38.14 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 16 -> 16 (0.00 %)
Wait states: 0 -> 0 (0.00 %)

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-03-12 00:52:30 +00:00
Timothy Arceri
67c3478482 nir: pass nir_op to calculate_iterations()
Rather than getting this from the alu instruction this allows us
some flexibility. In the following pass we instead pass the
inverse op.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-03-12 00:52:30 +00:00
Timothy Arceri
11e8f8a166 nir: add get_induction_and_limit_vars() helper to loop analysis
This helps make find_trip_count() a little easier to follow but
will also be used by a following patch.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-03-12 00:52:30 +00:00
Timothy Arceri
f219f6114d nir: add helper to return inversion op of a comparison
This will be used to help find the trip count of loops that look
like the following:

   while (a < x && i < 8) {
      ...
      i++;
   }

Where the NIR will end up looking something like this:

   vec1 32 ssa_1 = load_const (0x00000004 /* 0.000000 */)
   loop {
      ...
      vec1 1 ssa_12 = ilt ssa_225, ssa_11
      vec1 1 ssa_17 = ilt ssa_226, ssa_1
      vec1 1 ssa_18 = iand ssa_12, ssa_17
      vec1 1 ssa_19 = inot ssa_18

      if ssa_19 {
         ...
         break
      } else {
         ...
      }
   }

So in order to find the trip count we need to find the inverse of
ilt.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-03-12 00:52:30 +00:00
Timothy Arceri
090feaacdc nir: simplify the loop analysis trip count code a little
Here we create a helper is_supported_terminator_condition()
and use that rather than embedding all the trip count code
inside a switch.

The new helper will also be used in a following patch.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-03-12 00:52:30 +00:00
Timothy Arceri
68ce0ec222 nir: calculate trip count for more loops
This adds support to loop analysis for loops where the induction
variable is compared to the result of min(variable, constant).

For example:

   for (int i = 0; i < imin(x, 4); i++)
      ...

We add a new bool to the loop terminator struct in order to
differentiate terminators with this exit condition.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-03-12 00:52:30 +00:00
Timothy Arceri
03a452b7d0 nir: add guess trip count support to loop analysis
This detects an induction variable used as an array index to guess
the trip count of the loop. This enables us to do a partial
unroll of the loop, which can eventually result in the loop being
eliminated.

v2: check if the induction var is used to index more than a single
    array and if so get the size of the smallest array.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-03-12 00:52:30 +00:00
Jason Ekstrand
9314084237 nir: Teach loop unrolling about 64-bit instruction lowering
The lowering we do for 64-bit instructions can cause a single NIR ALU
instruction to blow up into hundreds or thousands of instructions
potentially with control flow.  If loop unrolling isn't aware of this,
it can unroll a loop 20 times which contains a nir_op_fsqrt which we
then lower to a full software implementation based on integer math.
Those 20 invocations suddenly get a lot more expensive than NIR loop
unrolling currently expects.  By giving it an approximate estimate
function, we can prevent loop unrolling from going to town when it
shouldn't.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-06 17:24:57 +00:00
Timothy Arceri
6dade5d534 nir: avoid uninitialized variable warning
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109231
2019-01-07 10:57:00 +11:00
Jason Ekstrand
44227453ec nir: Switch to using 1-bit Booleans for almost everything
This is a squash of a few distinct changes:

    glsl,spirv: Generate 1-bit Booleans

    Revert "Use 32-bit opcodes in the NIR producers and optimizations"

    Revert "nir/builder: Generate 32-bit bool opcodes transparently"

    nir/builder: Generate 1-bit Booleans in nir_build_imm_bool

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-12-16 21:03:02 +00:00