Commit graph

22 commits

Author SHA1 Message Date
Jason Ekstrand
9314084237 nir: Teach loop unrolling about 64-bit instruction lowering
The lowering we do for 64-bit instructions can cause a single NIR ALU
instruction to blow up into hundreds or thousands of instructions
potentially with control flow.  If loop unrolling isn't aware of this,
it can unroll a loop 20 times which contains a nir_op_fsqrt which we
then lower to a full software implementation based on integer math.
Those 20 invocations suddenly get a lot more expensive than NIR loop
unrolling currently expects.  By giving it an approximate estimate
function, we can prevent loop unrolling from going to town when it
shouldn't.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-06 17:24:57 +00:00
Timothy Arceri
6dade5d534 nir: avoid uninitialized variable warning
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109231
2019-01-07 10:57:00 +11:00
Jason Ekstrand
44227453ec nir: Switch to using 1-bit Booleans for almost everything
This is a squash of a few distinct changes:

    glsl,spirv: Generate 1-bit Booleans

    Revert "Use 32-bit opcodes in the NIR producers and optimizations"

    Revert "nir/builder: Generate 32-bit bool opcodes transparently"

    nir/builder: Generate 1-bit Booleans in nir_build_imm_bool

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-12-16 21:03:02 +00:00
Jason Ekstrand
80e8dfe9de nir: Rename Boolean-related opcodes to include 32 in the name
This is a squash of a bunch of individual changes:

    nir/builder: Generate 32-bit bool opcodes transparently

    nir/algebraic: Remap Boolean opcodes to the 32-bit variant

    Use 32-bit opcodes in the NIR producers and optimizations

        Generated with a little hand-editing and the following sed commands:

        sed -i 's/nir_op_ball_fequal/nir_op_b32all_fequal/g' **/*.c
        sed -i 's/nir_op_bany_fnequal/nir_op_b32any_fnequal/g' **/*.c
        sed -i 's/nir_op_ball_iequal/nir_op_b32all_iequal/g' **/*.c
        sed -i 's/nir_op_bany_inequal/nir_op_b32any_inequal/g' **/*.c
        sed -i 's/nir_op_\([fiu]lt\)/nir_op_\132/g' **/*.c
        sed -i 's/nir_op_\([fiu]ge\)/nir_op_\132/g' **/*.c
        sed -i 's/nir_op_\([fiu]ne\)/nir_op_\132/g' **/*.c
        sed -i 's/nir_op_\([fiu]eq\)/nir_op_\132/g' **/*.c
        sed -i 's/nir_op_\([fi]\)ne32g/nir_op_\1neg/g' **/*.c
        sed -i 's/nir_op_bcsel/nir_op_b32csel/g' **/*.c

     Use 32-bit opcodes in the NIR back-ends

        Generated with a little hand-editing and the following sed commands:

        sed -i 's/nir_op_ball_fequal/nir_op_b32all_fequal/g' **/*.c
        sed -i 's/nir_op_bany_fnequal/nir_op_b32any_fnequal/g' **/*.c
        sed -i 's/nir_op_ball_iequal/nir_op_b32all_iequal/g' **/*.c
        sed -i 's/nir_op_bany_inequal/nir_op_b32any_inequal/g' **/*.c
        sed -i 's/nir_op_\([fiu]lt\)/nir_op_\132/g' **/*.c
        sed -i 's/nir_op_\([fiu]ge\)/nir_op_\132/g' **/*.c
        sed -i 's/nir_op_\([fiu]ne\)/nir_op_\132/g' **/*.c
        sed -i 's/nir_op_\([fiu]eq\)/nir_op_\132/g' **/*.c
        sed -i 's/nir_op_\([fi]\)ne32g/nir_op_\1neg/g' **/*.c
        sed -i 's/nir_op_bcsel/nir_op_b32csel/g' **/*.c

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-12-16 21:03:02 +00:00
Alejandro Piñeiro
c7bdcd67aa nir: remove unused variable
To avoid the following warning:
./src/compiler/nir/nir_loop_analyze.c:807:16: warning: unused variable ‘ns’ [-Wunused-variable]
    nir_shader *ns = impl->function->shader;
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-12-13 16:35:21 +01:00
Timothy Arceri
9e6b39e1d5 nir: detect more induction variables
This allows loop analysis to detect inductions variables that
are incremented in both branches of an if rather than in a main
loop block. For example:

   loop {
      block block_1:
      /* preds: block_0 block_7 */
      vec1 32 ssa_8 = phi block_0: ssa_4, block_7: ssa_20
      vec1 32 ssa_9 = phi block_0: ssa_0, block_7: ssa_4
      vec1 32 ssa_10 = phi block_0: ssa_1, block_7: ssa_4
      vec1 32 ssa_11 = phi block_0: ssa_2, block_7: ssa_21
      vec1 32 ssa_12 = phi block_0: ssa_3, block_7: ssa_22
      vec4 32 ssa_13 = vec4 ssa_12, ssa_11, ssa_10, ssa_9
      vec1 32 ssa_14 = ige ssa_8, ssa_5
      /* succs: block_2 block_3 */
      if ssa_14 {
         block block_2:
         /* preds: block_1 */
         break
         /* succs: block_8 */
      } else {
         block block_3:
         /* preds: block_1 */
         /* succs: block_4 */
      }
      block block_4:
      /* preds: block_3 */
      vec1 32 ssa_15 = ilt ssa_6, ssa_8
      /* succs: block_5 block_6 */
      if ssa_15 {
         block block_5:
         /* preds: block_4 */
         vec1 32 ssa_16 = iadd ssa_8, ssa_7
         vec1 32 ssa_17 = load_const (0x3f800000 /* 1.000000*/)
         /* succs: block_7 */
      } else {
         block block_6:
         /* preds: block_4 */
         vec1 32 ssa_18 = iadd ssa_8, ssa_7
         vec1 32 ssa_19 = load_const (0x3f800000 /* 1.000000*/)
         /* succs: block_7 */
      }
      block block_7:
      /* preds: block_5 block_6 */
      vec1 32 ssa_20 = phi block_5: ssa_16, block_6: ssa_18
      vec1 32 ssa_21 = phi block_5: ssa_17, block_6: ssa_4
      vec1 32 ssa_22 = phi block_5: ssa_4, block_6: ssa_19
      /* succs: block_1 */
   }

Unfortunatly GCM could move the addition out of the if for us
(making this patch unrequired) but we still cannot enable the GCM
pass without regressions.

This unrolls a loop in Rise of The Tomb Raider.

vkpipeline-db results (VEGA):

Totals from affected shaders:
SGPRS: 88 -> 96 (9.09 %)
VGPRS: 56 -> 52 (-7.14 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 2168 -> 4560 (110.33 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 4 -> 4 (0.00 %)
Wait states: 0 -> 0 (0.00 %)

Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=32211
2018-12-13 10:58:35 +11:00
Timothy Arceri
c03d6e61cc nir: reword code comment
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
2018-12-13 10:58:35 +11:00
Timothy Arceri
48b40380e3 nir: in loop analysis track actual control flow type
This will allow us to improve analysis to find more induction
variables.

Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
2018-12-13 10:58:35 +11:00
Timothy Arceri
721566bddb nir: rework force_unroll_array_access()
Here we rework force_unroll_array_access() so that we can reuse
the induction variable detection in a following patch.

Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
2018-12-13 10:39:51 +11:00
Timothy Arceri
03d7c65ad8 nir: clarify some nit_loop_info member names
Following commits will introduce additional fields such as
guessed_trip_count. Renaming these will help avoid confusion
as our unrolling feature set grows.

Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-12-10 13:59:50 +11:00
Timothy Arceri
de0aee7638 nir: small tidy ups for nir_loop_analyze()
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-12-10 13:59:50 +11:00
Timothy Arceri
5a6b04d94b nir: add complex_loop bool to loop info
In order to be sure loop_terminator_list is an accurate
representation of all the jumps in the loop we need to be sure we
didn't encounter any other complex behaviour such as continues,
nested breaks, etc during analysis.

This will be used in the following patch.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-08-29 16:02:05 +10:00
Timothy Arceri
fef6325e58 nir: always attempt to find loop terminators
This will help later patches with unrolling loops that end with a
break i.e. loops the always exit on their first interation.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-08-29 16:02:05 +10:00
Timothy Arceri
463f849097 nir: fix selection of loop terminator when two or more have the same limit
We need to add loop terminators to the list in the order we come
across them otherwise if two or more have the same exit condition
we will select that last one rather than the first one even though
its unreachable.

This fix is for simple unrolls where we only have a single exit
point. When unrolling these type of loops the unreachable
terminators and their unreachable branch are removed prior to
unrolling. Because of the logic change we also switch some
list access in the complex unrolling logic to avoid breakage.

Fixes: 6772a17acc ("nir: Add a loop analysis pass")

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-06-30 10:13:03 +10:00
Jason Ekstrand
9800b81ffb nir: Remove deref chain support from analyze_loops
Note that this patch needs to come late in the series since this pass
can be run after any pass that damages nir_metadata_loop_analysis.

Acked-by: Rob Clark <robdclark@gmail.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-06-22 21:23:06 -07:00
Jason Ekstrand
414148cdc1 nir: Support deref instructions in loop_analyze
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Acked-by: Rob Clark <robdclark@gmail.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-06-22 20:15:56 -07:00
Rob Clark
d80c342d89 nir: add deref lowering sanity checking
This will be removed at the end of the transition, but add some tracking
plus asserts to help ensure that lowering passes are called at the
correct point (pre or post deref instruction lowering) as passes are
converted and the point where lower_deref_instrs() is called is moved.

Signed-off-by: Rob Clark <robdclark@gmail.com>
Acked-by: Rob Clark <robdclark@gmail.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-06-22 20:15:54 -07:00
Timothy Arceri
1098bc5e85 nir: move ends_in_break() helper to nir_loop_analyze.h
We will use the helper while simplifying potential loop terminators
in the following patch.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2018-06-07 11:33:04 +10:00
Matt Turner
d6e2bdfed3 nir: Stop using apostrophes to pluralize.
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-23 14:34:43 -07:00
Elie TOURNIER
082d5b1aee nir: Delete unused arg in get_iteration
nir_const_value is not needed in get_iteration

Signed-off-by: Elie Tournier <tournier.elie@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-02-27 14:35:16 +00:00
Timothy Arceri
4b7dfd8812 nir: fix loop iteration count calculation for floats
Fixes performance regression in SynMark PSPom caused by loops with float
counters not always unrolling.

For example:

   for (float i = 0.02; i < 0.9; i += 0.11)
      ...

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-04 14:48:36 +11:00
Thomas Helland
6772a17acc nir: Add a loop analysis pass
This pass detects induction variables and calculates the
trip count of loops to be used for loop unrolling.

V2: Rebase, adapt to removal of function overloads

V3: (Timothy Arceri)
 - don't try to find trip count if loop terminator conditional is a phi
 - fix trip count for do-while loops
 - replace conditional type != alu assert with return
 - disable unrolling of loops with continues
 - multiple fixes to memory allocation, stop leaking and don't destroy
   structs we want to use for unrolling.
 - fix iteration count bugs when induction var not on RHS of condition
 - add FIXME for && conditions
 - calculate trip count for unsigned induction/limit vars

V4: (Timothy Arceri)
- count instructions in a loop
- set the limiting_terminator even if we can't find the trip count for
 all terminators. This is needed for complex unrolling where we handle
 2 terminators and the trip count is unknown for one of them.
- restruct structs so we don't keep information not required after
 analysis and remove dead fields.
- force unrolling in some cases as per the rules in the GLSL IR pass

V5: (Timothy Arceri)
- fix metadata mask value 0x10 vs 0x16

V6: (Timothy Arceri)
- merge loop_variable and nir_loop_variable structs and lists suggested by Jason
- remove induction var hash table and store pointer to induction information in
  the loop_variable suggested by Jason.
- use lowercase list_addtail() suggested by Jason.
- tidy up init_loop_block() as per Jasons suggestions.
- replace switch with nir_op_infos[alu->op].num_inputs == 2 in
  is_var_basic_induction_var() as suggested by Jason.
- use nir_block_last_instr() in and rename foreach_cf_node_ex_loop() as suggested
  by Jason.
- fix else check for is_trivial_loop_terminator() as per Connors suggetions.
- simplify offset for induction valiables incremented before the exit conditions is
  checked.
- replace nir_op_isub check with assert() as it should have been lowered away.

V7: (Timothy Arceri)
- use rzalloc() on nir_loop struct creation. Worked previously because ralloc()
  was broken and always zeroed the struct.
- fix cf_node_find_loop_jumps() to find jumps when loops contain
  nested if statements. Code is tidier as a result.

V8: (Timothy Arceri)
- move is_trivial_loop_terminator() to nir.h so we can use it to assert is
  the loop unroll pass
- fix analysis to not bail when looking for terminator when the break is in the else
  rather then the if
- added new loop terminator fields: break_block, continue_from_block and
  continue_from_then so we don't have to gather these when doing unrolling.
- get correct array length when forcing unrolling of variables
  indexed arrays that are the same size as the iteration count
- add support for induction variables of type float
- update trival loop terminator check to allow an if containing
  instructions as long as both branches contain only a single
  block.

V9: (Timothy)
 - bunch of tidy ups and simplifications suggested by Jason.
 - rewrote trivial terminator detection, now the only restriction is there
   must be no nested jumps, anything else goes.
 - rewrote the iteration test to use nir_eval_const_opcode().
 - count instruction properly even when forcing an unroll.
 - bunch of other tidy ups and simplifications.

V10: (Timothy)
 - some trivial tidy ups suggested by Jason.
 - conditional fix for break inside continue branch by Jason.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-12-23 10:15:36 +11:00