Commit graph

14 commits

Author SHA1 Message Date
Eric Engestrom
bb84fa146f util: use C99 declaration in the for-loop hash_table_foreach() macro
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-10-25 12:43:18 +01:00
Caio Marcelo de Oliveira Filho
b3c6146925 nir: Copy propagation between blocks
Extend the pass to propagate the copies information along the control
flow graph.  It performs two walks, first it collects the vars
that were written inside each node. Then it walks applying the copy
propagation using a list of copies previously available.  At each node
the list is invalidated according to results from the first walk.

This approach is simpler than a full data-flow analysis, but covers
various cases.  If derefs are used for operating on more memory
resources (e.g. SSBOs), the difference from a regular pass is expected
to be more visible -- as the SSA copy propagation pass won't apply to
those.

A full data-flow analysis would handle more scenarios: conditional
breaks in the control flow and merge equivalent effects from multiple
branches (e.g. using a phi node to merge the source for writes to the
same deref).  However, as previous commentary in the code stated, its
complexity 'rapidly get out of hand'.  The current patch is a good
intermediate step towards more complex analysis.

The 'copies' linked list was modified to use util_dynarray to make it
more convenient to clone it (to handle ifs/loops).

Annotated shader-db results for Skylake:

    total instructions in shared programs: 15105796 -> 15105451 (<.01%)
    instructions in affected programs: 152293 -> 151948 (-0.23%)
    helped: 96
    HURT: 17

        All the HURTs and many HELPs are one instruction.  Looking
        at pass by pass outputs, the copy prop kicks in removing a
        bunch of loads correctly, which ends up altering what other
        other optimizations kick.  In those cases the copies would be
        propagated after lowering to SSA.

        In few HELPs we are actually helping doing more than was
        possible previously, e.g. consolidating load_uniforms from
        different blocks.  Most of those are from
        shaders/dolphin/ubershaders/.

    total cycles in shared programs: 566048861 -> 565954876 (-0.02%)
    cycles in affected programs: 151461830 -> 151367845 (-0.06%)
    helped: 2933
    HURT: 2950

        A lot of noise on both sides.

    total loops in shared programs: 4603 -> 4603 (0.00%)
    loops in affected programs: 0 -> 0
    helped: 0
    HURT: 0

    total spills in shared programs: 11085 -> 11073 (-0.11%)
    spills in affected programs: 23 -> 11 (-52.17%)
    helped: 1
    HURT: 0

        The shaders/dolphin/ubershaders/12.shader_test was able to
        pull a couple of loads from inside if statements and reuse
        them.

    total fills in shared programs: 23143 -> 23089 (-0.23%)
    fills in affected programs: 2718 -> 2664 (-1.99%)
    helped: 27
    HURT: 0

        All from shaders/dolphin/ubershaders/.

    LOST:   0
    GAINED: 0

The other generations follow the same overall shape.  The spills and
fills HURTs are all from the same game.

shader-db results for Broadwell.

    total instructions in shared programs: 15402037 -> 15401841 (<.01%)
    instructions in affected programs: 144386 -> 144190 (-0.14%)
    helped: 86
    HURT: 9

    total cycles in shared programs: 600912755 -> 600902486 (<.01%)
    cycles in affected programs: 185662820 -> 185652551 (<.01%)
    helped: 2598
    HURT: 3053

    total loops in shared programs: 4579 -> 4579 (0.00%)
    loops in affected programs: 0 -> 0
    helped: 0
    HURT: 0

    total spills in shared programs: 80929 -> 80924 (<.01%)
    spills in affected programs: 720 -> 715 (-0.69%)
    helped: 1
    HURT: 5

    total fills in shared programs: 93057 -> 93013 (-0.05%)
    fills in affected programs: 3398 -> 3354 (-1.29%)
    helped: 27
    HURT: 5

    LOST:   0
    GAINED: 2

shader-db results for Haswell:

    total instructions in shared programs: 9231975 -> 9230357 (-0.02%)
    instructions in affected programs: 44992 -> 43374 (-3.60%)
    helped: 27
    HURT: 69

    total cycles in shared programs: 87760587 -> 87727502 (-0.04%)
    cycles in affected programs: 7720673 -> 7687588 (-0.43%)
    helped: 1609
    HURT: 1416

    total loops in shared programs: 1830 -> 1830 (0.00%)
    loops in affected programs: 0 -> 0
    helped: 0
    HURT: 0

    total spills in shared programs: 1988 -> 1692 (-14.89%)
    spills in affected programs: 296 -> 0
    helped: 1
    HURT: 0

    total fills in shared programs: 2103 -> 1668 (-20.68%)
    fills in affected programs: 438 -> 3 (-99.32%)
    helped: 4
    HURT: 0

    LOST:   0
    GAINED: 1

v2: Remove the DISABLE prefix from tests we now pass.

v3: Add comments about missing write_mask handling. (Caio)
    Add unreachable when switching on cf_node type. (Jason)
    Properly merge the component information in written map
    instead of replacing. (Jason)
    Explain how removal from written arrays works. (Jason)
    Use mode directly from deref instead of getting the var. (Jason)

v4: Register the local written mode for calls. (Jason)
    Prefer cf_node instead of node. (Jason)
    Clarify that remove inside iteration only works in backward
    iterations. (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-10-15 17:29:46 -07:00
Caio Marcelo de Oliveira Filho
dc349f07b5 nir: Take call instruction into account in copy_prop_vars
Calls are not used yet (functions are inlined), but since new code is
already taking them into account, do it here too.  The convention here
and in other places is that no writable memory is assumed to remain
unchanged, as well as global variables.

Also, explicitly state the modes affected (instead of using the
reverse logic) in one of the apply_for_barrier_modes calls.

Suggested by Jason.

v2: Consider local vars used by a call to be conservative, SPIR-V has
    such cases. (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-10-15 17:29:46 -07:00
Caio Marcelo de Oliveira Filho
4dfa7adc10 nir: Remove handling of dead writes from copy_prop_vars
These are covered by another pass now.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-10-15 17:29:46 -07:00
Caio Marcelo de Oliveira Filho
5196041e93 nir: Export deref comparison functions
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-08-22 14:41:26 -07:00
Karol Herbst
1beef89ad8 nir: prepare for bumping up max components to 16
OpenCL knows vector of size 8 and 16.

v2: rebased on master (nir_swizzle rework)
    rework more declarations with nir_component_mask_t
    adjust print_var_decl

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
2018-07-17 13:24:09 +02:00
Jason Ekstrand
a331d7d1cd nir: Remove old-school deref chain support
Acked-by: Rob Clark <robdclark@gmail.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-06-22 21:23:06 -07:00
Jason Ekstrand
ba2bd20f87 nir: Rework opt_copy_prop_vars to use deref instructions
Acked-by: Rob Clark <robdclark@gmail.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-06-22 20:54:00 -07:00
Jason Ekstrand
fa6ffcc083 nir/copy_prop_vars: Re-order some logic in compare_derefs
Acked-by: Rob Clark <robdclark@gmail.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-06-22 20:54:00 -07:00
Rob Clark
d80c342d89 nir: add deref lowering sanity checking
This will be removed at the end of the transition, but add some tracking
plus asserts to help ensure that lowering passes are called at the
correct point (pre or post deref instruction lowering) as passes are
converted and the point where lower_deref_instrs() is called is moved.

Signed-off-by: Rob Clark <robdclark@gmail.com>
Acked-by: Rob Clark <robdclark@gmail.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-06-22 20:15:54 -07:00
Jason Ekstrand
a1452a94fc nir: Return a cursor from nir_instr_remove
Because nir_instr_remove is an inline wrapper around nir_instr_remove_v,
the compiler should be able to tell that the return value is unused and
not emit the extra code in most cases.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-03-30 17:20:27 -07:00
Timothy Arceri
3f0fb23b03 nir: fix nir_opt_copy_prop_vars() for arrays of arrays
Previously we only incremented the guide for a single
dimension/wildcard.

V2: rework logic to avoid code duplication

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: mesa-stable@lists.freedesktop.org
2017-07-19 11:06:23 +10:00
Vinson Lee
01d80bed1f nir: Fix anonymous union initialization with older GCC.
Fix this build error with GCC 4.4.7.

  CC     nir/nir_opt_copy_prop_vars.lo
nir/nir_opt_copy_prop_vars.c: In function ‘copy_prop_vars_block’:
nir/nir_opt_copy_prop_vars.c:765: error: unknown field ‘deref’ specified in initializer
nir/nir_opt_copy_prop_vars.c:765: warning: missing braces around initializer
nir/nir_opt_copy_prop_vars.c:765: warning: (near initialization for ‘(anonymous).<anonymous>’)
nir/nir_opt_copy_prop_vars.c:765: warning: initialization from incompatible pointer type

Fixes: 62332d139c ("nir: Add a local variable-based copy propagation pass")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-09 23:25:32 -08:00
Jason Ekstrand
62332d139c nir: Add a local variable-based copy propagation pass
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2017-01-06 16:44:28 -08:00