Commit graph

3353 commits

Author SHA1 Message Date
Caio Marcelo de Oliveira Filho
61965afd00 nir/copy_prop_vars: add tests for indirect array deref
Both on an actual array and on a vector, and an extra test on a vector
mixing direct and indirect access.  The vector tests are disabled and
will be enabled by a later commit.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-28 23:55:31 -08:00
Caio Marcelo de Oliveira Filho
96c32d7776 nir/copy_prop_vars: handle load/store of vector elements
When direct array deref is used on a vector type (for loads and
stores), copy_prop_vars is now smart to propagate values it knows
about.

Given a 'vec4 v', storing to v[3] will update the copy entry for v and
it is equivalent to a write to v.w.  Loading from v[1] will try first
to see if there's a known value for v.y -- and drop the load in that
case.

The copy entries still always refer to the entire vectors, so the
operations happen on the parent deref (the 'vector') and the values
are fixed accordingly.

It might be the case now that certain entries have not only different
SSA defs in each element but also those come from different components
than they are set to, because stores to individual elements always
come from a SSA definition with a single component.

Tests related to these cases are now enabled.

v2: Instead of asserting on invalid indices, "load" an undef and
    remove the store.  (Jason)

v3: Merge code path for the cases of is_array_deref_of_vector into the
    regular code path.  Add a base_index parameter to
    value_set_from_value.  (code changes by Jason)

v4: Removed the get_entry_for_deref helper, now being used only once.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-28 23:50:05 -08:00
Caio Marcelo de Oliveira Filho
33dafdc024 nir/copy_prop_vars: use NIR_MAX_VEC_COMPONENTS
Also replace uses of 0xf with the appropriate full mask created from
the number of components.

Note that an increase of MAX might make us change how the data is
stored later on, but for now at least we make sure the pass is not
hardcoded.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-28 23:50:05 -08:00
Caio Marcelo de Oliveira Filho
e84c841fb0 nir/copy_prop_vars: rename/refactor store_to_entry helper
The name reflected this function role back when the pass also did dead
write elimination.  So rename it to what it does now, which is setting
a value using another value; and narrow the argument list.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-28 23:50:05 -08:00
Juan A. Suarez Romero
b43b55d461 nir/spirv: return after emitting a branch in block
When emitting a branch in a block, it does not make sense to continue
processing further instructions, as they will not be reachable.

This fixes a nasty case with a loop with a branch that both then-part
and else-part exits the loop:

%1 = OpLabel
     OpLoopMerge %2 %3 None
     OpBranchConditional %false %2 %2
%3 = OpLabel
     OpBranch %1
%2 = OpLabel
    [...]

We know that block %1 will branch always to block %2, which is the merge
block for the loop. And thus a break is emitted. If we keep continuing
processing further instructions, we will be processing the branch
conditional and thus emitting the proper NIR conditional, which leads to
instructions after the break.

This fixes dEQP-VK.graphicsfuzz.continue-and-merge.

CC: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-28 09:47:06 +01:00
Timothy Arceri
7536af670b glsl: fix shader cache for packed param list
Some types of params such as some builtins are always padded. We
need to keep track of this so we can restore the list correctly.

Here we also remove a couple of cache entries that are not actually
required as they get rebuilt by the _mesa_add_parameter() calls.

This patch fixes a bunch of arb_texture_multisample and
arb_sample_shading piglit tests for the radeonsi NIR backend.

Fixes: edded12376 ("mesa: rework ParameterList to allow packing")

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-02-28 11:47:37 +11:00
Gert Wollny
b7201a468d nir: Add posibility to not lower to source mod 'abs' for ops with three sources
This is useful for r600 since there the abs source modifier is not supported
for ops with three sources

v2: Use correct logic to enable lowering to abs source mod (Eric Anhold)

Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-02-27 11:04:06 +00:00
Kasireddy, Vivek
78fb3fd17e nir/lower_tex: Add support for XYUV lowering
The memory layout associated with this format would be:
Byte:      0 1 2 3
Component: V U Y X

Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-02-26 13:08:51 +00:00
Tapani Pälli
22267feff1 nir: initialize value in copy_prop_vars_block
Fixes following valgrind warning:

   ==27561== Conditional jump or move depends on uninitialised value(s)
   ==27561==    at 0x667856B: value_set_ssa_components (nir_opt_copy_prop_vars.c:78)
   ==27561==    by 0x667A1C4: copy_prop_vars_block (nir_opt_copy_prop_vars.c:797)

Fixes: 62332d139c "nir: Add a local variable-based copy propagation pass"
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-02-26 08:56:25 +02:00
Eric Anholt
7c1bf075f3 nir: Just return when asked to rewrite uses of an SSA def to itself.
The nir_builder swizzling improvement to not emit extra MOVs resulted in
nir_lower_tex() trying to rewrite an SSA def to itself, triggering the
assert on all texturing in v3d.  There's no work to be done in this case,
so just stop asserting.

Fixes: 743700be1f ("nir/builder: Don't emit no-op swizzles")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-25 21:25:24 +00:00
Daniel Schürmann
0bd45f96b9 nir: Use SM5 properties to optimize shift(a@32, iand(31, b))
This is a common pattern from HLSL->SPIRV translation
and supported in HW by all current NIR backends.

vkpipeline-db results anv (SKL):

    total instructions in shared programs: 6403130 -> 6402380 (-0.01%)
    instructions in affected programs: 204084 -> 203334 (-0.37%)
    helped: 208
    HURT: 0

    total cycles in shared programs: 1915629582 -> 1918198408 (0.13%)
    cycles in affected programs: 1158892682 -> 1161461508 (0.22%)
    helped: 107
    HURT: 86

shader-db results on i965 (KBL):

    total instructions in shared programs: 15284592 -> 15284568 (<.01%)
    instructions in affected programs: 81683 -> 81659 (-0.03%)
    helped: 24
    HURT: 0

    total cycles in shared programs: 375013622 -> 375013932 (<.01%)
    cycles in affected programs: 40169618 -> 40169928 (<.01%)
    helped: 13
    HURT: 9

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-25 12:59:44 -06:00
Daniel Schürmann
0525bdc225 nir: Define shifts according to SM5 specification.
SPIR-V shifts are undefined for values >= bitsize, but SM5 shifts
are defined to only use the least significant bits.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-25 12:59:43 -06:00
Oscar Blumberg
da9c030763 glsl: Fix function return typechecking
apply_implicit_conversion only converts and check base types but we
need actual type equality for function returns, otherwise you can
return a vec2 from a function declared as returning a float.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2019-02-25 08:49:06 +02:00
Jason Ekstrand
743700be1f nir/builder: Don't emit no-op swizzles
The nir_swizzle helper is used some on it's own but it's also called by
nir_channel and nir_channels which are used everywhere.  It's pretty
quick to check while we're walking the swizzle anyway whether or not
it's an identity swizzle.  If it is, we now don't bother emitting the
instruction.  Sure, copy-prop will clean it up for us but there's no
sense making more work for the optimizer than we have to.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-02-24 20:01:27 -06:00
Jason Ekstrand
724371c6b9 nir/split_vars: Don't compact vectors unnecessarily
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2019-02-24 20:01:18 -06:00
Caio Marcelo de Oliveira Filho
4c160b6bd8 nir: fix MSVC build
Zero initialize struct with {0} instead of {}.
2019-02-22 22:38:05 -08:00
Caio Marcelo de Oliveira Filho
eb13211997 nir/copy_prop_vars: add tests for load/store elements of vectors
Test using array deref on vectors in loads and stores.  These are
marked DISABLED_ as this optimization is currently not done.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-22 21:00:50 -08:00
Caio Marcelo de Oliveira Filho
4f3809d389 nir: nir_build_deref_follower accept array derefs of vectors
Code itself already supports it, just make sure we can use it for
those cases.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-22 21:00:50 -08:00
Caio Marcelo de Oliveira Filho
c4beadd28e nir/copy_prop_vars: change test helper to get intrinsics
Replace find_next_intrinsic(intrinsic, after) with
get_intrinsic(intrinsic, index).  This makes slightly more convenient
to check the resulting loads/stores/copies, since in most tests we
know which one we care about.  The cost is to perform more traversals,
but for such tests this is not a problem.

Added the ASSERT_EQ() on count to some tests missing it, so the
indices queried are always expected to find something.

Also, drop two nir_print_shader leftover calls in a test.

v2: Remove redundant assertions.  nir_src_comp_as_uint already
    assert what we need.  (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-22 21:00:50 -08:00
Caio Marcelo de Oliveira Filho
fdcb9779d9 nir/copy_prop_vars: keep track of components in copy_entry
When a copy_entry is SSA, store not only the nir_ssa_def* for each
component, but also the source component they come from.  At the
moment this is always a match (i.e. 'component[i] == i'), because all
the operations for a copy_entry happen using definitions with the same
size.  This prepares the code for array_derefs of vectors, in which
'component[i] != i'.

Also, extract setting all SSA components into a function of its own.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-22 21:00:50 -08:00
Caio Marcelo de Oliveira Filho
6624decbb5 nir/copy_prop_vars: add debug helpers
Disabled by default, to be used during development.  Adding those
so I don't rewrite some ad-hoc version of them everytime I'm working
with this pass.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-22 21:00:50 -08:00
Caio Marcelo de Oliveira Filho
60d9bb9ff5 nir/copy_prop_vars: don't get confused by array_deref of vectors
For now these derefs are not handled, so don't let these get into the
copies list -- which would cause wrong propagations.  For load_derefs,
do nothing.  For store_derefs, invalidate whatever the store is
writing to.  For copy_derefs, invalidate whatever the copy is writing
to.

These cases will happen once derefs to SSBOs/UBOs are kept around long
enough to get optimized by copy_prop_vars.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-22 21:00:50 -08:00
Timothy Arceri
f48527e51a nir: allow nir_lower_phis_to_scalar() on more src types
Rather than only lowering if all srcs are scalarizable we instead
check that at least one src is scalarizable.

We change undef type to return false otherwise it will cause
regressions when it is the only scalarizable src.

total instructions in shared programs: 13219105 -> 13024547 (-1.47%)
instructions in affected programs: 1153797 -> 959239 (-16.86%)
helped: 581
HURT: 74

total cycles in shared programs: 333968972 -> 324807922 (-2.74%)
cycles in affected programs: 129809402 -> 120648352 (-7.06%)
helped: 571
HURT: 131

total spills in shared programs: 57947 -> 29130 (-49.73%)
spills in affected programs: 53364 -> 24547 (-54.00%)
helped: 351
HURT: 0

total fills in shared programs: 51310 -> 25468 (-50.36%)
fills in affected programs: 44882 -> 19040 (-57.58%)
helped: 351
HURT: 0

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-23 11:11:51 +11:00
Timothy Arceri
d9e08e753b nir: clone instruction set rather than removing individual entries
This reduces the time spent in nir_opt_cse() by almost a half.

The massif tool from callgrind reported no change in peak
memory use with the large doliphin uber shaders I used for
testing.

Reviewed-by: Thomas Helland<thomashelland90@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-22 08:36:36 +11:00
Jason Ekstrand
f98fd9d15a nir/lower_clip_cull: Fix an incorrect assert
Copy+paste error.  It was supposed to test cull and not clip.

Fixes: 4e69fba534 "nir: Rewrite lower_clip_cull_distance_arrays..."
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109717
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-02-21 12:05:12 -06:00
Jason Ekstrand
f9b2f10a41 nir: Fix a compile warning 2019-02-21 09:44:42 -06:00
Alejandro Piñeiro
0629b2a462 nir, glsl: move pixel_center_integer/origin_upper_left to shader_info.fs
On GLSL that info is set as a layout qualifier when redeclaring
gl_FragCoord, so somehow tied to a specific variable. But in practice,
they behave as a global of the shader. On ARB programs they are set
using a global OPTION (defined at ARB_fragment_coord_conventions), and
on SPIR-V using ExecutionModes, that are also not tied specifically to
the builtin.

This patch moves that info from nir variable and ir variable to nir
shader and gl_program shader_info respectively, so the map is more
similar to SPIR-V, and ARB programs, instead of more similar to GLSL.

FWIW, shader_info.fs already had pixel_center_integer, so this change
also removes some redundancy. Also, as struct gl_program also includes
a shader_info, we removed gl_program::OriginUpperLeft and
PixelCenterInteger, as it would be superfluous.

This change was needed because recently spirv_to_nir changed the order
in which execution modes and variables are handled, so the variables
didn't get the correct values. Now the info is set on the shader
itself, and we don't need to go back to the builtin variable to set
it.

Fixes: e68871f6a ("spirv: Handle constants and types before execution
                   modes")

v2: (Jason)
   * glsl_to_nir: get the info before glsl_to_nir, while all the rest
     of the info gathering is happening
   * prog_to_nir: gather the info on a general info-gathering pass,
     not on variable setup.

v3: (Jason)
   * Squash with the patch that removes that info from ir variable
   * anv: assert that OriginUpperLeft is true. It should be already
     set by spirv_to_nir.
   * blorp: set origin_upper_left on its core "compile fragment
     shader", not just on some specific places (for this we added an
     helper on a previous patch).
   * prog_to_nir: no need to gather specifically this fragcoord modes
     as the full gl_program shader_info is copied.
   * spirv_to_nir: assert that we are a fragment shader when handling
     this execution modes.

v4: (reported by failing gitlab pipeline #18750)
   * state_tracker: update too due changes on ir.h/gl_program

v5:
   * blorp: minor change after change on previous patch
   * radeonsi: update due this change.

v6: (Timothy Arceri)
   * prog_to_nir: remove extra whitespace
   * shader_info: don't use :1 on origin_upper_left
   * glsl: program.fs.origin_upper_left/pixel_center_integer can be
     move out of the shader list loop
2019-02-21 11:47:59 +01:00
Jason Ekstrand
1a93fc382b nir/xfb: Handle compact arrays in gather_xfb_info
This makes us properly handle gl_ClipDistance and gl_CullDistance.

Fixes: 19064b8c "nir: Add a pass for gathering transform feedback info"
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2019-02-21 00:08:42 +00:00
Jason Ekstrand
558c314504 nir/xfb: Work in terms of components rather than slots
We needed to better handle cases where a chunk of a variable starts at
some non-zero location_frac and rolls over into the next slot but may
not be more than 4 dwords.  For example, if gl_CullDistance is an array
of 3 things and has location_frac = 2, it will span across two vec4s but
is not, itself, bigger than a vec4.  If you ignore the clip/cull special
case, it's not allowed to happen for anything else because the only
things that can span more than one slot is dvec3 and dvec4 and they're
both bigger than a vec4.  The current code uses this attrib_slot thing
where we count attribute slots and iterate over them.  However, that
doesn't work in the case above because gl_CullDistance will have an
attrib_slot count of 1 even though it does span two slots.  We could fix
this by adjusting attrib_slot but we already have comp_mask and it's
easier to just handle it that way.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2019-02-21 00:08:42 +00:00
Jason Ekstrand
4e69fba534 nir: Rewrite lower_clip_cull_distance_arrays to do a lot less lowering
Instead of going to all the work of to combine them into one array, just
make two arrays and use location_frac to colocate them within CLIP0.
Then the back-end can sort things out and stack them on top of each
other.  Thanks to ef99f4c8, we also don't need to set compact anymore.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-02-21 00:08:42 +00:00
Jason Ekstrand
8f0fe71cc5 nir/xfb: Properly align 64-bit values
Fixes: 19064b8c "nir: Add a pass for gathering transform feedback info"
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2019-02-21 00:08:42 +00:00
Jason Ekstrand
30b548fc62 compiler/types: Add a contains_64bit helper
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2019-02-21 00:08:42 +00:00
Timothy Arceri
03783253b1 nir: remove non-ssa support from nir_copy_prop()
Even in a very basic shader this reduces the time spent in
nir_copy_prop() by ~17%.

No shader-db changes for radeonsi NIR or i965.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-21 10:18:24 +11:00
Kenneth Graunke
d6337b59f6 nir: Don't forget if-uses in new nir_opt_dead_cf liveness check
Commit 08bfd710a2. (nir/dead_cf: Stop
relying on liveness analysis) introduced a new check that iterated
through a SSA def's uses, to see if it's used.  But it only checked
normal uses, and not uses which are part of an 'if' condition.  This
led to it thinking more nodes were dead than possible.

Fixes Piglit's variable-indexing/tcs-output-array-float-index-wr test
(and related tests) with the out-of-tree Iris driver.

Fixes: 08bfd710a2 nir/dead_cf: Stop relying on liveness analysis
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-20 09:44:06 -08:00
Kenneth Graunke
3c2c6bd1c7 compiler: Make is_64bit(GL_*) helper more broadly available
I'd like to use this in the prog_parameter.c code, so I need to move it
into C, make it non-static, and so on.  This probably isn't the ideal
place for it, but I couldn't think of a better one.

Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-02-19 13:26:58 -08:00
Kenneth Graunke
535251487b nir: Don't reassociate add/mul chains containing only constants
The idea here is to reassociate a * (b * c) into (a * c) * b, when
b is a non-constant value, but a and c are constants, allowing them
to be combined.

But nothing was enforcing that 'b' must be non-constant, which meant
that running opt_algebraic in a loop would never terminate if the IR
contained non-folded constant expressions like 256 * 0.5 * 2.  Normally,
we call constant folding in such a loop too, but IMO it's better for
nir_opt_algebraic to be robust and not rely on that.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109581
Fixes: 32e266a9a5 i965: Compile fp64 funcs only if we do not have 64-bit hardware support

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-02-16 23:36:14 -08:00
Timothy Arceri
a801196ec9 nir: remove simple dead if detection from nir_opt_dead_cf()
This was probably useful when it was first written, however it
looks to be no longer necessary.

As far as I can tell these days dce is smart enough to remove useless
instructions from if branches. Once this is done
nir_opt_peephole_select() will end up removing the empty if.

Removing this support reduces the dolphin uber shader compilation
time spent in nir_opt_dead_cf() by a little over 7x.

No shader-db changes on i965 or radeonsi.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-02-16 10:45:31 +11:00
Ian Romanick
979b43b347 nir/algebraic: Simplify comparison with sequential integers starting with 0
All of the affected shaders are Unreal4 demos.

All Gen6+ platforms had similar results. (Skylake shown)
total instructions in shared programs: 15437170 -> 15437001 (<.01%)
instructions in affected programs: 21536 -> 21367 (-0.78%)
helped: 43
HURT: 0
helped stats (abs) min: 1 max: 4 x̄: 3.93 x̃: 4
helped stats (rel) min: 0.68% max: 1.01% x̄: 0.80% x̃: 0.80%
95% mean confidence interval for instructions value: -4.07 -3.79
95% mean confidence interval for instructions %-change: -0.83% -0.77%
Instructions are helped.

total cycles in shared programs: 383007896 -> 383007378 (<.01%)
cycles in affected programs: 158640 -> 158122 (-0.33%)
helped: 38
HURT: 4
helped stats (abs) min: 1 max: 48 x̄: 13.89 x̃: 6
helped stats (rel) min: 0.03% max: 1.01% x̄: 0.33% x̃: 0.19%
HURT stats (abs)   min: 2 max: 3 x̄: 2.50 x̃: 2
HURT stats (rel)   min: 0.06% max: 0.09% x̄: 0.08% x̃: 0.08%
95% mean confidence interval for cycles value: -16.90 -7.77
95% mean confidence interval for cycles %-change: -0.39% -0.19%
Cycles are helped.

Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8213746 -> 8213745 (<.01%)
instructions in affected programs: 127 -> 126 (-0.79%)
helped: 1
HURT: 0

total cycles in shared programs: 187734146 -> 187734144 (<.01%)
cycles in affected programs: 2132 -> 2130 (-0.09%)
helped: 1
HURT: 0

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-15 11:11:02 -08:00
Ian Romanick
ad05920258 nir/algebraic: Convert some f2u to f2i
Section 5.4.1 (Conversion and Scalar Constructors) of the GLSL 4.60 spec
says:

     It is undefined to convert a negative floating-point value to an
     uint.

Assuming that (uint)some_float behaves like (uint)(int)some_float allows
some optimizations in the i965 backend to proceed.

This basically undoes the small amount of damage done by
"intel/compiler: Avoid propagating inequality cmods if types are
different".

v2: Replicate part of the commit message as a comment in the code.
Suggested by Jason.

shader-db results compairing *before* "intel/compiler: Avoid propagating
inequality cmods if types are different" and after this commit:

Skylake
total cycles in shared programs: 383007996 -> 383007896 (<.01%)
cycles in affected programs: 85208 -> 85108 (-0.12%)
helped: 13
HURT: 8
helped stats (abs) min: 2 max: 26 x̄: 10.77 x̃: 6
helped stats (rel) min: 0.09% max: 0.65% x̄: 0.28% x̃: 0.14%
HURT stats (abs)   min: 2 max: 12 x̄: 5.00 x̃: 3
HURT stats (rel)   min: 0.04% max: 0.32% x̄: 0.12% x̃: 0.07%
95% mean confidence interval for cycles value: -9.31 -0.21
95% mean confidence interval for cycles %-change: -0.24% <.01%
Cycles are helped.

Broadwell
total cycles in shared programs: 415251194 -> 415251370 (<.01%)
cycles in affected programs: 83750 -> 83926 (0.21%)
helped: 7
HURT: 13
helped stats (abs) min: 10 max: 12 x̄: 11.43 x̃: 12
helped stats (rel) min: 0.30% max: 0.30% x̄: 0.30% x̃: 0.30%
HURT stats (abs)   min: 2 max: 36 x̄: 19.69 x̃: 22
HURT stats (rel)   min: 0.05% max: 0.89% x̄: 0.44% x̃: 0.47%
95% mean confidence interval for cycles value: 0.76 16.84
95% mean confidence interval for cycles %-change: <.01% 0.37%
Inconclusive result (%-change mean confidence interval includes 0).

Haswell
total instructions in shared programs: 13823885 -> 13823886 (<.01%)
instructions in affected programs: 2249 -> 2250 (0.04%)
helped: 0
HURT: 1

total cycles in shared programs: 390094243 -> 390094001 (<.01%)
cycles in affected programs: 85640 -> 85398 (-0.28%)
helped: 15
HURT: 6
helped stats (abs) min: 4 max: 26 x̄: 18.53 x̃: 18
helped stats (rel) min: 0.09% max: 0.66% x̄: 0.47% x̃: 0.42%
HURT stats (abs)   min: 2 max: 14 x̄: 6.00 x̃: 2
HURT stats (rel)   min: 0.04% max: 0.37% x̄: 0.15% x̃: 0.04%
95% mean confidence interval for cycles value: -17.36 -5.69
95% mean confidence interval for cycles %-change: -0.44% -0.14%
Cycles are helped.

Ivy Bridge
total cycles in shared programs: 180986448 -> 180986552 (<.01%)
cycles in affected programs: 34835 -> 34939 (0.30%)
helped: 0
HURT: 10
HURT stats (abs)   min: 2 max: 18 x̄: 10.40 x̃: 10
HURT stats (rel)   min: 0.06% max: 0.36% x̄: 0.28% x̃: 0.30%
95% mean confidence interval for cycles value: 4.67 16.13
95% mean confidence interval for cycles %-change: 0.20% 0.35%
Cycles are HURT.

Sandy Bridge
total cycles in shared programs: 154603969 -> 154603970 (<.01%)
cycles in affected programs: 171514 -> 171515 (<.01%)
helped: 25
HURT: 14
helped stats (abs) min: 1 max: 4 x̄: 1.80 x̃: 1
helped stats (rel) min: 0.02% max: 0.10% x̄: 0.04% x̃: 0.04%
HURT stats (abs)   min: 1 max: 8 x̄: 3.29 x̃: 3
HURT stats (rel)   min: 0.03% max: 0.28% x̄: 0.10% x̃: 0.11%
95% mean confidence interval for cycles value: -0.91 0.96
95% mean confidence interval for cycles %-change: -0.02% 0.04%
Inconclusive result (value mean confidence interval includes 0).

No changes on Iron Lake or GM45.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-15 11:11:02 -08:00
Juan A. Suarez Romero
1fb24080b7 nir: remove jump from two merging jump-ending blocks
In opt_peel_initial_if optimization, when moving the continue list to
end of the continue block, before the jump, could happen that the
continue list itself also ends with a jump.

This would mean that we would have two jump instructions in a row: the
first one from the continue list and the second one from the contine
block.

As inserting an instruction after a jump is not allowed (and it does not
make sense, as it will not be executed), remove the jump from the
continue block and keep the one from continue list, as it will be
executed first.

CC: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-02-15 15:16:24 +01:00
Juan A. Suarez Romero
69be9934a7 nir: move ALU instruction before the jump instruction
opt_split_alu_of_phi moves ALU instruction to the end of continue block.

But if the continue block ends with a jump instruction (an explicit
"continue" instruction) then the ALU must be inserted before the jump,
as it is illegal to add instructions after the jump.

CC: Ian Romanick <ian.d.romanick@intel.com>
Fixes: 0881e90c09 ("nir: Split ALU instructions in loops that read phis")
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-02-15 15:14:36 +01:00
Jason Ekstrand
08bfd710a2 nir/dead_cf: Stop relying on liveness analysis
The liveness analysis pass is fairly expensive because it has to build
large bit-sets and run a fix-point algorithm on them.  Instead of
requiring liveness for detecting if values escape a CF node, just take
advantage of the structured nature of NIR and use block indices instead.
This only requires the block index metadata which is the fastest we have
metadata to generate.

No shader-db changes on Kaby Lake

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-02-14 23:06:29 -06:00
Jason Ekstrand
b50465d197 nir/dead_cf: Inline cf_node_has_side_effects
We want to handle live SSA values differently and it's going to involve
walking the instructions.  We can make it a single instruction walk if
we combine it with cf_node_has_side_effects.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-02-14 23:05:28 -06:00
Jason Ekstrand
b14d7a6b60 nir: Silence a couple of warnings in release builds
[28/716] Compiling C object 'src/compiler/nir/068b2c8@@nir@sta/nir_gather_xfb_info.c.o'.
../src/compiler/nir/nir_gather_xfb_info.c: In function ‘nir_gather_xfb_info’:
../src/compiler/nir/nir_gather_xfb_info.c:171:13: warning: variable ‘max_offset’ set but not used [-Wunused-but-set-variable]
    unsigned max_offset[NIR_MAX_XFB_BUFFERS] = {0};
             ^~~~~~~~~~
[36/716] Compiling C object 'src/compiler/nir/068b2c8@@nir@sta/nir_instr_set.c.o'.
../src/compiler/nir/nir_instr_set.c:502:1: warning: ‘instr_each_src_and_dest_is_ssa’ defined but not used [-Wunused-function]
 instr_each_src_and_dest_is_ssa(nir_instr *instr)
 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-02-14 16:04:35 -06:00
Kenneth Graunke
6775665e5e spirv: Eliminate dead input/output variables after translation.
spirv_to_nir can generate input/output variables which are illegal
for the current shader stage, which would cause nir_validate_shader
to balk.  After my recent commit to start decorating arrays as compact,
dEQP-VK.spirv_assembly.instruction.graphics.module.same_module started
hitting validation errors due to outputs in a TCS (not intended for the
TCS at all) not being per-vertex arrays.

Thanks to Jason Ekstrand for suggesting this approach.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109573
Fixes: ef99f4c8d1 compiler: Mark clip/cull distance arrays as compact before lowering.
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
2019-02-14 11:03:56 -08:00
Ian Romanick
9a918050e0 spirv: Add missing break
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Fixes: c6465fec0c ("spirv: add SpvCapabilityInt64Atomics")
CID: 1442555
2019-02-14 08:35:59 -08:00
Eric Anholt
42d2cae907 nir: Move panfrost's isign lowering to nir_opt_algebraic.
I wanted to reuse this from v3d.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-02-14 00:32:30 +00:00
Timothy Arceri
68baf96824 nir: turn an ssa check in nir_search into an assert
Everything should be in ssa form when we call this. This is a
hotpath so replace the check with an assert.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-02-14 09:35:32 +11:00
Timothy Arceri
46a4d2c867 nir: turn ssa check into an assert
Everthing should be in ssa form when this is called. Checking
for it here is expensive so turn this into an assert instead.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-02-14 09:35:32 +11:00
Timothy Arceri
0a89c9779a nir: prehash instruction in nir_instr_set_add_or_rewrite()
There is no need to hash the instruction twice, especially as we
end up adding it in the majority of cases.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-14 09:35:32 +11:00