Commit graph

1016 commits

Author SHA1 Message Date
Timothy Arceri
fef6325e58 nir: always attempt to find loop terminators
This will help later patches with unrolling loops that end with a
break i.e. loops the always exit on their first interation.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-08-29 16:02:05 +10:00
Caio Marcelo de Oliveira Filho
f172a77dd8 nir: Remove outdated comment
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-08-28 08:11:03 -07:00
Kevin Rogovin
119435c877 mesa: Add GL/GLSL plumbing for INTEL_fragment_shader_ordering
This extension provides new GLSL built-in function
beginFragmentShaderOrderingIntel() that guarantees
(taking wording of GL_INTEL_fragment_shader_ordering
extension) that any memory transactions issued by
shader invocations from previous primitives mapped to
same xy window coordinates (and same sample when
per-sample shading is active), complete and are visible
to the shader invocation that called
beginFragmentShaderOrderingINTEL().

One advantage of INTEL_fragment_shader_ordering over
ARB_fragment_shader_interlock is that it provides a
function that operates as a memory barrie (instead
of a defining a critcial section) that can be called
under arbitary control flow from any function (in
contrast the begin/end of ARB_fragment_shader_interlock
may only be called once, from main(), under no control
flow.

Signed-off-by: Kevin Rogovin <kevin.rogovin@intel.com>
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
2018-08-28 17:15:10 +03:00
Jason Ekstrand
07a227f543 nir: Pull block_ends_in_jump into nir.h
We had two different implementations in different files.  May as well
have one and put it in nir.h.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-08-27 02:15:38 -05:00
Jason Ekstrand
53072582dc nir: Add an array copy optimization
This peephole optimization looks for a series of load/store_deref or
copy_deref instructions that copy an array from one variable to another
and turns it into a copy_deref that copies the entire array.  The
pattern it looks for is extremely specific but it's good enough to pick
up on the input array copies in DXVK and should also be able to pick up
the sequence generated by spirv_to_nir for a OpLoad of a large composite
followed by OpStore.  It can always be improved later if needed.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2018-08-23 21:47:47 -05:00
Jason Ekstrand
be8d009908 nir: Add a array-of-vector variable shrinking pass
This pass looks for variables with vector or array-of-vector types and
narrows the type to only the components used.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2018-08-23 21:46:56 -05:00
Jason Ekstrand
fa6417495c nir: Add an array splitting pass
This pass looks for array variables where at least one level of the
array is never indirected and splits it into multiple smaller variables.

This pass doesn't really do much now because nir_lower_vars_to_ssa can
already see through arrays of arrays and can detect indirects on just
one level or even see that arr[i][0][5] does not alias arr[i][1][j].
This pass exists to help other passes more easily see through arrays of
arrays.  If a back-end does implement arrays using scratch or indirects
on registers, having more smaller arrays is likely to have better memory
efficiency.

v2 (Jason Ekstrand):
 - Better comments and naming (some from Caio)
 - Rework to use one hash map instead of two

v2.1 (Jason Ekstrand):
 - Fix a couple of bugs that were added in the rework including one
   which basically prevented it from running

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2018-08-23 21:44:14 -05:00
Jason Ekstrand
26eb077ec4 nir: Add a structure splitting pass
This pass doesn't really do much now because nir_lower_vars_to_ssa can
already see through structures and considers them to be "split".  This
pass exists to help other passes more easily see through structure
variables.  If a back-end does implement arrays using scratch or
indirects on registers, having more smaller arrays is likely to have
better memory efficiency.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2018-08-23 21:44:14 -05:00
Ian Romanick
0842655ac6 nir: Add floating point atomic min, max, and compare-swap instrinsics
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2018-08-22 20:31:32 -07:00
Ian Romanick
69ce7baa9e nir: Add floating point atomic add instrinsics
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2018-08-22 20:31:32 -07:00
Caio Marcelo de Oliveira Filho
410de0e3f1 nir: Give end_block its own index
Since there's no particular reason for the index to be 0, choose an
index that is not used by other block.  This is convenient when we
store "per-block" data in an array AND look for the successors
data (e.g. any kind of backwards data-flow analysis).

v2: Add a note about end_block's index. (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-08-22 14:41:26 -07:00
Caio Marcelo de Oliveira Filho
8364ec3fce nir: Skip common instructions when comparing deref paths
Deref paths may share the same deref instructions in their chains,
e.g.

    ssa_100 = deref_var A
    ssa_101 = deref_struct "array_field" of ssa_100
    ssa_102 = deref_array "[1]" of ssa_101
    ssa_103 = deref_struct "field_a" of ssa_102
    ssa_104 = deref_struct "field_a" of ssa_103

when comparing the two last deref instructions, their paths will share
a common sequence ssa_100, ssa_101, ssa_102.  This patch skips to next
iteration if the deref instructions are the same.  Path[0] (the var)
is still handled specially, so in the case above, only ssa_101 and
ssa_102 will be skipped.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-08-22 14:41:26 -07:00
Caio Marcelo de Oliveira Filho
5196041e93 nir: Export deref comparison functions
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-08-22 14:41:26 -07:00
Jason Ekstrand
68ae66542a nir/vars_to_ssa: Don't build deref nodes for non-local variables
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2018-08-22 14:17:38 -05:00
Kai Wasserbäch
6f0647c0b2 nir: mark *prev_block as MAYBE_UNUSED in opt_peel_loop_initial_if
Only used, when asserts are enabled.

Fixes an unused-variable warning with gcc-8:
 ../../../src/compiler/nir/nir_opt_if.c: In function 'opt_peel_loop_initial_if':
 ../../../src/compiler/nir/nir_opt_if.c:109:15: warning: unused variable 'prev_block' [-Wunused-variable]
     nir_block *prev_block =
                ^~~~~~~~~~

Signed-off-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-08-18 10:34:15 +10:00
Timothy Arceri
d0803dea11 nir: allow more nested loops to be unrolled
The innermost check was added to stop us from unrolling multiple
loops in a single pass, and to stop outer loops from unrolling.

When we successfully unroll a loop we need to run the analysis
pass again before deciding if we want to go ahead an unroll a
second loop.

However the logic was flawed because it never tried to unroll any
nested loops other than the first innermost loop it found.
If this innermost loop is not unrolled we end up skipping all
other nested loops.

This unrolls a loop in a Deus Ex: MD shader on ultra settings and
also unrolls a loop in a shader from the game Prey when running
on DXVK.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-08-18 09:03:13 +10:00
Alejandro Piñeiro
d6c8066663 nir/glsl: make nir_remap_attributes public
As we plan to reuse it for ARB_gl_spirv implementation.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-08-13 16:28:27 +02:00
Alejandro Piñeiro
5332d7582d nir: add how_declared to nir_variable.data
Equivalent to the already existing how_declared at GLSL IR. The only
difference is that we are not adding all the declaration_type
available on GLSL, only the one that we will use on the short term. We
would add more mode if needed on the future.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-08-13 16:28:26 +02:00
Mathieu Bridon
2ee1c86d71 meson: Build with Python 3
Now that all the build scripts are compatible with both Python 2 and 3,
we can flip the switch and tell Meson to use the latter.

Since Meson already depends on Python 3 anyway, this means we don't need
two different Python stacks to build Mesa.

Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2018-08-10 15:15:09 -07:00
Mathieu Bridon
1e668ca111 python: Better check for integer types
Python 3 lost the long type: now everything is an int, with the right
size.

This commit makes the script compatible with Python 2 (where we check
for both int and long) and Python 3 (where we only check for int).

Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2018-08-09 16:49:19 -07:00
Mathieu Bridon
14f1ab998f python: Do not mix bytes and unicode strings
Mixing the two is a long-standing recipe for errors in Python 2, so much
so that Python 3 now completely separates them.

This commit stops treating both as if they were the same, and in the
process makes the script compatible with both Python 2 and 3.

Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2018-08-09 16:49:19 -07:00
Mathieu Bridon
d9ca4a172e python: Use the right function for the job
The code was just reimplementing itertools.combinations_with_replacement
in a less efficient way.

This does change the order of the results slightly, but it should be ok.

Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2018-08-09 16:49:18 -07:00
Mathieu Bridon
ba1ebf2ee1 python: Specify the template output encoding
We're trying to write a unicode string (i.e decoded) to a file opened
in binary (i.e encoded) mode.

In Python 2 this works, because of the automatic conversion between
byte and unicode strings.

In Python 3 this fails though, as no automatic conversion is attempted.

This change makes the scripts compatible with both versions of Python.

Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2018-08-07 13:28:35 -07:00
Ian Romanick
3b07d28f81 nir: Transform expressions of b2f(a) and b2f(b) to a == b
All Gen7+ platforms had similar results. (Skylake shown)
total instructions in shared programs: 14276886 -> 14276838 (<.01%)
instructions in affected programs: 312 -> 264 (-15.38%)
helped: 2
HURT: 0

total cycles in shared programs: 532578395 -> 532570985 (<.01%)
cycles in affected programs: 682562 -> 675152 (-1.09%)
helped: 374
HURT: 4
helped stats (abs) min: 2 max: 200 x̄: 20.39 x̃: 18
helped stats (rel) min: 0.07% max: 11.64% x̄: 1.25% x̃: 1.28%
HURT stats (abs)   min: 2 max: 114 x̄: 53.50 x̃: 49
HURT stats (rel)   min: 0.06% max: 11.70% x̄: 5.02% x̃: 4.15%
95% mean confidence interval for cycles value: -21.30 -17.91
95% mean confidence interval for cycles %-change: -1.30% -1.06%
Cycles are helped.

Sandy Bridge
total instructions in shared programs: 10488123 -> 10488075 (<.01%)
instructions in affected programs: 336 -> 288 (-14.29%)
helped: 2
HURT: 0

total cycles in shared programs: 150260379 -> 150260439 (<.01%)
cycles in affected programs: 4726 -> 4786 (1.27%)
helped: 0
HURT: 2

No changes on Iron Lake or GM45.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
2018-08-04 01:12:03 -07:00
Ian Romanick
c658b6c4c8 nir: Transform expressions of b2f(a) and b2f(b) to a ^^ b
All Gen platforms had pretty similar results. (Skylake shown)
total instructions in shared programs: 14276892 -> 14276886 (<.01%)
instructions in affected programs: 484 -> 478 (-1.24%)
helped: 2
HURT: 0

total cycles in shared programs: 532578397 -> 532578395 (<.01%)
cycles in affected programs: 3522 -> 3520 (-0.06%)
helped: 1
HURT: 0

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
2018-08-04 01:12:03 -07:00
Ian Romanick
3aca80aabc nir: Transform expressions of b2f(a) and b2f(b) to !(a && b)
All Gen platforms had pretty similar results. (Skylake shown)
total cycles in shared programs: 532578400 -> 532578397 (<.01%)
cycles in affected programs: 2784 -> 2781 (-0.11%)
helped: 1
HURT: 1
helped stats (abs) min: 4 max: 4 x̄: 4.00 x̃: 4
helped stats (rel) min: 0.26% max: 0.26% x̄: 0.26% x̃: 0.26%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.08% max: 0.08% x̄: 0.08% x̃: 0.08%

v2: s/fmax/fmin/.  Noticed by Thomas Helland.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
2018-08-04 01:12:03 -07:00
Ian Romanick
1713c97181 nir: Transform expressions of b2f(a) and b2f(b) to a && b
No changes on any Gen platform.

v2: s/fmax/fmin/.  Noticed by Thomas Helland.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
2018-08-04 01:12:03 -07:00
Ian Romanick
4425f4786a nir: Transform expressions of b2f(a) and b2f(b) to !(a || b)
All Gen6+ platforms had similar results. (Skylake shown)
total instructions in shared programs: 14276961 -> 14276892 (<.01%)
instructions in affected programs: 3215 -> 3146 (-2.15%)
helped: 28
HURT: 0
helped stats (abs) min: 1 max: 6 x̄: 2.46 x̃: 2
helped stats (rel) min: 0.47% max: 9.52% x̄: 4.34% x̃: 1.92%
95% mean confidence interval for instructions value: -2.87 -2.06
95% mean confidence interval for instructions %-change: -5.73% -2.95%
Instructions are helped.

total cycles in shared programs: 532577068 -> 532578400 (<.01%)
cycles in affected programs: 121864 -> 123196 (1.09%)
helped: 35
HURT: 30
helped stats (abs) min: 2 max: 268 x̄: 42.34 x̃: 22
helped stats (rel) min: 0.12% max: 12.14% x̄: 3.22% x̃: 1.86%
HURT stats (abs)   min: 2 max: 246 x̄: 93.80 x̃: 36
HURT stats (rel)   min: 0.09% max: 13.63% x̄: 4.47% x̃: 2.58%
95% mean confidence interval for cycles value: -5.02 46.01
95% mean confidence interval for cycles %-change: -0.99% 1.65%
Inconclusive result (value mean confidence interval includes 0).

Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 7781299 -> 7781342 (<.01%)
instructions in affected programs: 22300 -> 22343 (0.19%)
helped: 13
HURT: 40
helped stats (abs) min: 2 max: 3 x̄: 2.85 x̃: 3
helped stats (rel) min: 1.15% max: 7.69% x̄: 3.72% x̃: 3.33%
HURT stats (abs)   min: 2 max: 2 x̄: 2.00 x̃: 2
HURT stats (rel)   min: 0.26% max: 1.30% x̄: 0.47% x̃: 0.43%
95% mean confidence interval for instructions value: 0.23 1.39
95% mean confidence interval for instructions %-change: -1.18% 0.07%
Inconclusive result (%-change mean confidence interval includes 0).

total cycles in shared programs: 177878928 -> 177879332 (<.01%)
cycles in affected programs: 383298 -> 383702 (0.11%)
helped: 7
HURT: 43
helped stats (abs) min: 2 max: 18 x̄: 10.00 x̃: 10
helped stats (rel) min: 0.17% max: 4.81% x̄: 2.62% x̃: 3.40%
HURT stats (abs)   min: 2 max: 38 x̄: 11.02 x̃: 12
HURT stats (rel)   min: 0.08% max: 1.54% x̄: 0.25% x̃: 0.09%
95% mean confidence interval for cycles value: 5.21 10.95
95% mean confidence interval for cycles %-change: -0.51% 0.21%
Inconclusive result (%-change mean confidence interval includes 0).

v2: s/fmin/fmax/.  Noticed by Thomas Helland.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
2018-08-04 01:12:03 -07:00
Ian Romanick
6b3670ae80 nir: Transform -fabs(a) >= 0 to a == 0
All Gen platforms had pretty similar results. (Skylake shown)
total instructions in shared programs: 14276964 -> 14276961 (<.01%)
instructions in affected programs: 411 -> 408 (-0.73%)
helped: 3
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.47% max: 1.96% x̄: 1.04% x̃: 0.68%

total cycles in shared programs: 532577062 -> 532577068 (<.01%)
cycles in affected programs: 1093 -> 1099 (0.55%)
helped: 1
HURT: 1
helped stats (abs) min: 16 max: 16 x̄: 16.00 x̃: 16
helped stats (rel) min: 7.77% max: 7.77% x̄: 7.77% x̃: 7.77%
HURT stats (abs)   min: 22 max: 22 x̄: 22.00 x̃: 22
HURT stats (rel)   min: 2.48% max: 2.48% x̄: 2.48% x̃: 2.48%

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
2018-08-04 01:12:03 -07:00
Ian Romanick
46e7c340d4 nir: Transform expressions of b2f(a) and b2f(b) to a || b
All Gen6+ platforms had pretty similar results. (Skylake shown)
total instructions in shared programs: 14277184 -> 14276964 (<.01%)
instructions in affected programs: 10082 -> 9862 (-2.18%)
helped: 37
HURT: 1
helped stats (abs) min: 1 max: 30 x̄: 5.97 x̃: 4
helped stats (rel) min: 0.14% max: 16.00% x̄: 5.23% x̃: 2.04%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.70% max: 0.70% x̄: 0.70% x̃: 0.70%
95% mean confidence interval for instructions value: -7.87 -3.71
95% mean confidence interval for instructions %-change: -6.98% -3.16%
Instructions are helped.

total cycles in shared programs: 532577990 -> 532577062 (<.01%)
cycles in affected programs: 170959 -> 170031 (-0.54%)
helped: 33
HURT: 9
helped stats (abs) min: 2 max: 120 x̄: 30.91 x̃: 30
helped stats (rel) min: 0.02% max: 7.65% x̄: 2.66% x̃: 1.13%
HURT stats (abs)   min: 2 max: 24 x̄: 10.22 x̃: 8
HURT stats (rel)   min: 0.09% max: 1.79% x̄: 0.61% x̃: 0.22%
95% mean confidence interval for cycles value: -31.23 -12.96
95% mean confidence interval for cycles %-change: -2.90% -1.02%
Cycles are helped.

Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 7781539 -> 7781301 (<.01%)
instructions in affected programs: 10169 -> 9931 (-2.34%)
helped: 32
HURT: 0
helped stats (abs) min: 2 max: 20 x̄: 7.44 x̃: 6
helped stats (rel) min: 0.47% max: 17.02% x̄: 4.03% x̃: 1.88%
95% mean confidence interval for instructions value: -9.53 -5.34
95% mean confidence interval for instructions %-change: -5.94% -2.12%
Instructions are helped.

total cycles in shared programs: 177878590 -> 177878932 (<.01%)
cycles in affected programs: 78706 -> 79048 (0.43%)
helped: 7
HURT: 21
helped stats (abs) min: 6 max: 34 x̄: 24.57 x̃: 28
helped stats (rel) min: 0.15% max: 8.33% x̄: 4.66% x̃: 6.37%
HURT stats (abs)   min: 2 max: 86 x̄: 24.48 x̃: 22
HURT stats (rel)   min: 0.01% max: 4.28% x̄: 1.21% x̃: 0.70%
95% mean confidence interval for cycles value: 0.30 24.13
95% mean confidence interval for cycles %-change: -1.52% 1.01%
Inconclusive result (%-change mean confidence interval includes 0).

v2: s/fmin/fmax/.  Noticed by Thomas Helland.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
2018-08-04 01:12:03 -07:00
Ian Romanick
be7d3ba34a nir: Transform -fabs(a) < 0 to a != 0
Unlike the much older -abs(a) >= 0.0 transformation, this is not
precise.  The behavior changes if a is NaN.

All Gen platforms had pretty similar results. (Skylake shown)
total instructions in shared programs: 14277216 -> 14277184 (<.01%)
instructions in affected programs: 2300 -> 2268 (-1.39%)
helped: 8
HURT: 0
helped stats (abs) min: 1 max: 8 x̄: 4.00 x̃: 3
helped stats (rel) min: 0.48% max: 15.15% x̄: 4.41% x̃: 1.01%
95% mean confidence interval for instructions value: -6.45 -1.55
95% mean confidence interval for instructions %-change: -9.96% 1.13%
Inconclusive result (%-change mean confidence interval includes 0).

total cycles in shared programs: 532577848 -> 532577990 (<.01%)
cycles in affected programs: 17486 -> 17628 (0.81%)
helped: 2
HURT: 5
helped stats (abs) min: 2 max: 6 x̄: 4.00 x̃: 4
helped stats (rel) min: 0.06% max: 1.81% x̄: 0.93% x̃: 0.93%
HURT stats (abs)   min: 6 max: 50 x̄: 30.00 x̃: 26
HURT stats (rel)   min: 0.55% max: 2.17% x̄: 1.19% x̃: 1.02%
95% mean confidence interval for cycles value: -1.06 41.63
95% mean confidence interval for cycles %-change: -0.58% 1.74%
Inconclusive result (value mean confidence interval includes 0).

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
2018-08-04 01:12:03 -07:00
Ian Romanick
d49eab2757 nir: Rearrange bcsel with two bcsel sources
All Gen platforms had pretty similar results. (Skylake shown)
total instructions in shared programs: 14277220 -> 14277216 (<.01%)
instructions in affected programs: 422 -> 418 (-0.95%)
helped: 2
HURT: 0

total cycles in shared programs: 532577908 -> 532577848 (<.01%)
cycles in affected programs: 2800 -> 2740 (-2.14%)
helped: 2
HURT: 0

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
2018-08-04 01:12:03 -07:00
Ian Romanick
b92fded6eb nir: Collapse more repeated bcsels on the same argument
All Gen platforms had pretty similar results. (Skylake shown)
total instructions in shared programs: 14277230 -> 14277220 (<.01%)
instructions in affected programs: 751 -> 741 (-1.33%)
helped: 4
HURT: 0
helped stats (abs) min: 2 max: 3 x̄: 2.50 x̃: 2
helped stats (rel) min: 1.23% max: 1.40% x̄: 1.32% x̃: 1.32%
95% mean confidence interval for instructions value: -3.42 -1.58
95% mean confidence interval for instructions %-change: -1.47% -1.17%
Instructions are helped.

total cycles in shared programs: 532577947 -> 532577908 (<.01%)
cycles in affected programs: 10641 -> 10602 (-0.37%)
helped: 4
HURT: 3
helped stats (abs) min: 1 max: 40 x̄: 13.75 x̃: 7
helped stats (rel) min: 0.11% max: 3.08% x̄: 1.10% x̃: 0.60%
HURT stats (abs)   min: 2 max: 8 x̄: 5.33 x̃: 6
HURT stats (rel)   min: 0.13% max: 0.55% x̄: 0.30% x̃: 0.23%
95% mean confidence interval for cycles value: -20.69 9.55
95% mean confidence interval for cycles %-change: -1.63% 0.63%
Inconclusive result (value mean confidence interval includes 0).

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-08-04 01:12:03 -07:00
Ian Romanick
408330ed48 nir: Don't compare i2f or u2i with zero
Broadwell and Skylake had similar results. (Skylake shown)
total instructions in shared programs: 14277620 -> 14277230 (<.01%)
instructions in affected programs: 36905 -> 36515 (-1.06%)
helped: 101
HURT: 6
helped stats (abs) min: 1 max: 6 x̄: 4.46 x̃: 6
helped stats (rel) min: 0.32% max: 7.69% x̄: 1.80% x̃: 1.51%
HURT stats (abs)   min: 1 max: 28 x̄: 10.00 x̃: 1
HURT stats (rel)   min: 0.33% max: 1.74% x̄: 0.68% x̃: 0.47%
95% mean confidence interval for instructions value: -4.59 -2.70
95% mean confidence interval for instructions %-change: -1.90% -1.41%
Instructions are helped.

total cycles in shared programs: 532580716 -> 532577947 (<.01%)
cycles in affected programs: 940575 -> 937806 (-0.29%)
helped: 92
HURT: 12
helped stats (abs) min: 2 max: 158 x̄: 51.04 x̃: 62
helped stats (rel) min: 0.24% max: 3.99% x̄: 2.14% x̃: 2.41%
HURT stats (abs)   min: 10 max: 1112 x̄: 160.58 x̃: 63
HURT stats (rel)   min: 0.06% max: 21.90% x̄: 4.22% x̃: 0.20%
95% mean confidence interval for cycles value: -50.66 -2.59
95% mean confidence interval for cycles %-change: -2.09% -0.73%
Cycles are helped.

total spills in shared programs: 8116 -> 8124 (0.10%)
spills in affected programs: 200 -> 208 (4.00%)
helped: 0
HURT: 2

total fills in shared programs: 11086 -> 11094 (0.07%)
fills in affected programs: 436 -> 444 (1.83%)
helped: 0
HURT: 2

Ivy Bridge and Haswell had similar results. (Haswell shown)
total instructions in shared programs: 12979054 -> 12978067 (<.01%)
instructions in affected programs: 33633 -> 32646 (-2.93%)
helped: 120
HURT: 2
helped stats (abs) min: 1 max: 13 x̄: 8.53 x̃: 13
helped stats (rel) min: 0.30% max: 16.67% x̄: 4.55% x̃: 3.17%
HURT stats (abs)   min: 18 max: 18 x̄: 18.00 x̃: 18
HURT stats (rel)   min: 1.15% max: 2.84% x̄: 2.00% x̃: 2.00%
95% mean confidence interval for instructions value: -9.19 -6.99
95% mean confidence interval for instructions %-change: -5.27% -3.62%
Instructions are helped.

total cycles in shared programs: 411212880 -> 411199636 (<.01%)
cycles in affected programs: 696441 -> 683197 (-1.90%)
helped: 107
HURT: 5
helped stats (abs) min: 2 max: 864 x̄: 124.90 x̃: 146
helped stats (rel) min: 0.03% max: 29.20% x̄: 8.58% x̃: 5.88%
HURT stats (abs)   min: 2 max: 50 x̄: 24.00 x̃: 22
HURT stats (rel)   min: 0.01% max: 5.35% x̄: 1.29% x̃: 0.25%
95% mean confidence interval for cycles value: -136.96 -99.54
95% mean confidence interval for cycles %-change: -9.75% -6.53%
Cycles are helped.

total spills in shared programs: 78623 -> 78631 (0.01%)
spills in affected programs: 66 -> 74 (12.12%)
helped: 0
HURT: 2

total fills in shared programs: 80104 -> 80108 (<.01%)
fills in affected programs: 133 -> 137 (3.01%)
helped: 0
HURT: 2

No changes on Sandy Bridge, Iron Lake, or GM45.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
2018-08-04 01:12:03 -07:00
Ian Romanick
a3845616a2 nir: Remove f2i(i2f(x)) conversions
Broadwell and Skylake had similar results. (Skylake shown)
total instructions in shared programs: 14277978 -> 14277620 (<.01%)
instructions in affected programs: 36957 -> 36599 (-0.97%)
helped: 76
HURT: 1
helped stats (abs) min: 2 max: 90 x̄: 4.89 x̃: 4
helped stats (rel) min: 0.44% max: 5.88% x̄: 1.04% x̃: 0.87%
HURT stats (abs)   min: 14 max: 14 x̄: 14.00 x̃: 14
HURT stats (rel)   min: 0.36% max: 0.36% x̄: 0.36% x̃: 0.36%
95% mean confidence interval for instructions value: -7.06 -2.24
95% mean confidence interval for instructions %-change: -1.28% -0.77%
Instructions are helped.

total cycles in shared programs: 532584581 -> 532580716 (<.01%)
cycles in affected programs: 973591 -> 969726 (-0.40%)
helped: 76
HURT: 1
helped stats (abs) min: 2 max: 9940 x̄: 159.80 x̃: 32
helped stats (rel) min: <.01% max: 8.70% x̄: 1.15% x̃: 1.19%
HURT stats (abs)   min: 8280 max: 8280 x̄: 8280.00 x̃: 8280
HURT stats (rel)   min: 2.10% max: 2.10% x̄: 2.10% x̃: 2.10%
95% mean confidence interval for cycles value: -386.98 286.59
95% mean confidence interval for cycles %-change: -1.41% -0.81%
Inconclusive result (value mean confidence interval includes 0).

total spills in shared programs: 8127 -> 8116 (-0.14%)
spills in affected programs: 108 -> 97 (-10.19%)
helped: 1
HURT: 0

total fills in shared programs: 11090 -> 11086 (-0.04%)
fills in affected programs: 440 -> 436 (-0.91%)
helped: 1
HURT: 1

Haswell
total instructions in shared programs: 12979174 -> 12979054 (<.01%)
instructions in affected programs: 9040 -> 8920 (-1.33%)
helped: 14
HURT: 1
helped stats (abs) min: 2 max: 34 x̄: 8.79 x̃: 6
helped stats (rel) min: 0.41% max: 7.04% x̄: 2.66% x̃: 1.14%
HURT stats (abs)   min: 3 max: 3 x̄: 3.00 x̃: 3
HURT stats (rel)   min: 0.19% max: 0.19% x̄: 0.19% x̃: 0.19%
95% mean confidence interval for instructions value: -13.58 -2.42
95% mean confidence interval for instructions %-change: -3.94% -1.01%
Instructions are helped.

total cycles in shared programs: 411227148 -> 411212880 (<.01%)
cycles in affected programs: 630506 -> 616238 (-2.26%)
helped: 15
HURT: 0
helped stats (abs) min: 2 max: 11192 x̄: 951.20 x̃: 38
helped stats (rel) min: <.01% max: 16.01% x̄: 3.92% x̃: 0.17%
95% mean confidence interval for cycles value: -2544.28 641.88
95% mean confidence interval for cycles %-change: -6.89% -0.94%
Inconclusive result (value mean confidence interval includes 0).

total spills in shared programs: 78626 -> 78623 (<.01%)
spills in affected programs: 42 -> 39 (-7.14%)
helped: 1
HURT: 0

total fills in shared programs: 80111 -> 80104 (<.01%)
fills in affected programs: 140 -> 133 (-5.00%)
helped: 1
HURT: 1

Ivy Bridge
total instructions in shared programs: 11684101 -> 11684030 (<.01%)
instructions in affected programs: 3080 -> 3009 (-2.31%)
helped: 4
HURT: 1
helped stats (abs) min: 5 max: 59 x̄: 18.50 x̃: 5
helped stats (rel) min: 6.47% max: 7.04% x̄: 6.87% x̃: 6.99%
HURT stats (abs)   min: 3 max: 3 x̄: 3.00 x̃: 3
HURT stats (rel)   min: 0.15% max: 0.15% x̄: 0.15% x̃: 0.15%
95% mean confidence interval for instructions value: -45.59 17.19
95% mean confidence interval for instructions %-change: -9.38% -1.56%
Inconclusive result (value mean confidence interval includes 0).

total cycles in shared programs: 258407697 -> 258389653 (<.01%)
cycles in affected programs: 328323 -> 310279 (-5.50%)
helped: 5
HURT: 0
helped stats (abs) min: 32 max: 14908 x̄: 3608.80 x̃: 32
helped stats (rel) min: 1.26% max: 17.22% x̄: 9.30% x̃: 10.60%
95% mean confidence interval for cycles value: -11616.71 4399.11
95% mean confidence interval for cycles %-change: -16.56% -2.03%
Inconclusive result (value mean confidence interval includes 0).

total spills in shared programs: 4537 -> 4528 (-0.20%)
spills in affected programs: 64 -> 55 (-14.06%)
helped: 1
HURT: 0

total fills in shared programs: 4823 -> 4815 (-0.17%)
fills in affected programs: 189 -> 181 (-4.23%)
helped: 1
HURT: 1

Sandy Bridge
total instructions in shared programs: 10488464 -> 10488449 (<.01%)
instructions in affected programs: 272 -> 257 (-5.51%)
helped: 3
HURT: 0
helped stats (abs) min: 5 max: 5 x̄: 5.00 x̃: 5
helped stats (rel) min: 5.49% max: 5.56% x̄: 5.51% x̃: 5.49%

total cycles in shared programs: 150263359 -> 150263263 (<.01%)
cycles in affected programs: 7978 -> 7882 (-1.20%)
helped: 3
HURT: 0
helped stats (abs) min: 32 max: 32 x̄: 32.00 x̃: 32
helped stats (rel) min: 1.15% max: 1.23% x̄: 1.20% x̃: 1.23%

No changes on Iron Lake or GM45.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
2018-08-04 01:12:03 -07:00
Ian Romanick
ea6c276436 nir: Mark the 0.0 < abs(a) transformation as imprecise
Unlike the much older -abs(a) >= 0.0 transformation, this is not
precise.  The behavior changes if the source is NaN.

No shader-db changes on any platform.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
2018-08-04 01:12:03 -07:00
Timothy Arceri
d5175d21c7 nir: add fall through comment to nir_gather_info
This stops Coverity reporting a defect and helps make the code less
error-prone.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-08-03 09:30:57 +10:00
Jason Ekstrand
7f75cf2a94 nir/lower_indirect: Bail early if modes == 0
There's no point in walking the program if we're never going to actually
lower anything.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-08-01 18:02:28 -07:00
Dylan Baker
34998aae18 nir/meson: fix c vs cpp args for nir test
Fixes: d1992255bb
       ("meson: Add build Intel "anv" vulkan driver")
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2018-08-01 12:51:22 -07:00
Mathieu Bridon
ad363913e6 python: Explicitly add the 'L' suffix on Python 3
Python 2 had two integer types: int and long. Python 3 dropped the
latter, as it made the int type automatically support bigger numbers.

As a result, Python 3 lost the 'L' suffix on integer litterals.

This probably doesn't make much difference when compiling the generated
C code, but adding it explicitly means that both Python 2 and 3 generate
the exact same C code anyway, which makes it easier to compare and check
for discrepencies when moving to Python 3.

Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2018-08-01 14:26:19 +01:00
Mathieu Bridon
e40200e0aa python: Don't abuse hex()
The hex() builtin returns a string containing the hexa-decimal
representation of an integer.

When the argument is not an integer, then the function calls that
object's __hex__() method, if one is defined. That method is supposed to
return a string.

While that's not explicitly documented, that string is supposed to be a
valid hexa-decimal representation for a number. Python 2 doesn't enforce
this though, which is why we got away with returning things like
'NIR_TRUE' which are not numbers.

In Python 3, the hex() builtin instead calls an object's __index__()
method, which itself must return an integer. That integer is then
automatically converted to a string with its hexa-decimal representation
by the rest of the hex() function.

As a result, we really can't make this compatible with Python 3 as it
is.

The solution is to stop using the hex() builtin, and instead use a hex()
object method, which can return whatever we want, in Python 2 and 3.

Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2018-08-01 14:26:19 +01:00
Neil Roberts
1e3f61d1d5 nir/gather_info: Set info.gs.uses_streams
Whenever a non-zero stream is written to it now sets uses_streams to
true. This reflects the code in validate_geometry_shader_emissions for
GLSL.

v2: set uses_streams at gather_info instead that at spirv to nir
    (Jason Ekstrand)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-07-31 13:18:28 +02:00
Neil Roberts
3fd5b4c7aa nir: Add members for the explicit XFB properties to nir_variable
These are copied from the from the corresponding values in
ir_variable. The intention is to eventually use them in a pure-NIR
linker.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-07-31 13:18:28 +02:00
Jason Ekstrand
05fb2f88ec nir/instr_set: Fix nir_instrs_equal for derefs
We weren't returning at the end of the nir_isntr_type_deref case in
nir_instrs_equal and it was falling through to the default of false.
While we're at it, make the default unreachable because all statements
in the switch now have their own returns.  Had we done that before, we
would have caught this bug a long time ago.

Fixes: 19a4662a54 "nir: Add a deref instruction type"
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Thomas Helland<thomashelland90@gmail.com>
2018-07-29 13:39:35 -07:00
Jason Ekstrand
9a4ab4c120 nir: Take if uses into account in ssa_def_components_read
Fixes: d800b7daa5 "nir: Add a helper for figuring out what..."
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2018-07-29 13:39:35 -07:00
Karol Herbst
bc0e0c2818 nir/lower_int64: mark all metadata as dirty
v2: use nir_metadata_preserve
    preserve metadata in case of !progress

Fixes: 074f5ba0b5
       "nir: Add a simple int64 lowering pass"
Signed-off-by: Karol Herbst <kherbst@redhat.com>
2018-07-28 19:59:28 +02:00
Kenneth Graunke
488972222c i965: Combine both gl_PatchVerticesIn lowering passes.
Until now, we had separate passes for lowering gl_PatchVerticesIn to
a statically known constant (for TES inputs when linked against a TCS),
and a uniform in the other cases.  Annoyingly, one had to be run before
nir_lower_system_values, and the other afterward.  This simplified the
passes, but made life painful for the callers.

This patch combines both into a single pass.  If you give it a non-zero
static count, it uses that.  If you give it Mesa state slots, it turns
it back into a built-in uniform.  Otherwise, it does nothing.

This also moves the i965 uniform lowering out to shared code.

v2: Make token arrays const.

Reviewed-by: Eric Anholt <eric@anholt.net>
2018-07-26 21:51:36 -07:00
Eric Anholt
d934d3206e nir: Add flipping of gl_PointCoord.y in nir_lower_wpos_ytransform.
This is controlled by a new nir_shader_compiler_options flag, and fixes
dEQP-GLES3.functional.shaders.builtin_variable.pointcoord on V3D.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-07-26 11:00:34 -07:00
Samuel Pitoiset
6465bf0015 nir: remove wrong assertion in print_var_decl()
This breaks printing input/output variables with more than
4 components like mat4.

Fixes: 1beef89ad8 ("nir: prepare for bumping up max components to 16")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-07-26 08:57:38 +02:00
Jason Ekstrand
b3b170ade9 nir: Add a couple of iand/ior optimizations
Spotted in a shader in Batman: Arkham City.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-07-24 20:39:43 -07:00