Commit graph

4029 commits

Author SHA1 Message Date
Jason Ekstrand
548da20b22 nir/lower_doubles: Handle fdiv and fsub directly
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-16 16:05:16 +00:00
Jason Ekstrand
d7d35a9522 nir/lower_doubles: Use the new NIR lowering framework
One advantage of this is that we no longer need to run in a loop because
the new framework handles lowering instructions added by lowering.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-16 16:05:16 +00:00
Jason Ekstrand
197a08dc69 nir/lower_doubles: Use "alu" for the nir_alu_instr
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-16 16:05:16 +00:00
Jason Ekstrand
d65902c179 nir/lower_int64: Use the core NIR lowering framework
One advantage of this is that we no longer need to run in a loop because
the new framework handles lowering instructions added by lowering.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-16 16:05:16 +00:00
Jason Ekstrand
c1cffa4249 nir/alu_to_scalar: Use the new NIR lowering framework
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-16 16:05:16 +00:00
Jason Ekstrand
eb768b0a09 nir/alu_to_scalar: Use "alu" as the name for the nir_alu_instr
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-16 16:05:16 +00:00
Jason Ekstrand
998d84fca5 nir/lower_system_values: Support lowering more intrinsics
Instead of only lowering system from variables, lower most to intrinsics
and let the lowering framework immediately lower the intrinsic.  This
will result in a bit more instruction churn but it means that NIR code
builders can just use intrinsics instead of everything having to go
through variables.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-16 16:05:16 +00:00
Jason Ekstrand
ae8caaadee nir/lower_system_values: Drop the context-aware builder functions
Instead of having context-aware builder functions, just provide lowering
for the system value intrinsics and let nir_shader_lower_instructions
handle the recursion for us.  This makes everything a bit simpler and
means that the lowering can also be used if something comes in as a
system value intrinsic rather than a load_deref.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-16 16:05:16 +00:00
Jason Ekstrand
58ffd7fbf6 nir/lower_system_values: Use the new generic NIR lowering helpers
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-16 16:05:16 +00:00
Jason Ekstrand
ce3af830cb nir/lower_subgroups: Use the new generic NIR lowering helpers
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-16 16:05:16 +00:00
Jason Ekstrand
758fdce9fe nir: Add some generic helpers for writing lowering passes
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-16 16:05:16 +00:00
Jason Ekstrand
c74b98486a nir: Add a helper for fetching the SSA def from an instruction
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-16 16:05:16 +00:00
Caio Marcelo de Oliveira Filho
1210e8caaf spirv: Ignore ArrayStride for storage classes that should not use it
The stride was already overriden when using
lower_workgroup_access_to_offsets, so elaborate a bit the commentary
there.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-07-15 16:18:57 -07:00
Caio Marcelo de Oliveira Filho
026cfa1099 spirv: Fix stride calculation when lowering Workgroup to offsets
Use alignment to calculate the stride associated with the pointer
types.  That stride is used when the pointers are casted to arrays.

Note that size alone is not sufficient, e.g. struct { vec2 a; vec1 b;
} will have element an element size of 12 bytes, but the stride needs
to be 16 bytes to respect the 8 byte alignment.

Fixes: 050eb6389a "spirv: Ignore ArrayStride in OpPtrAccessChain for Workgroup"
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-07-15 16:18:46 -07:00
Jason Ekstrand
0ba508d7a3 nir,intel: Add support for lowering 64-bit nir_opt_extract_*
We need this when doing full software 64-bit emulation.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110309
Fixes: cbad201c2b "nir/algebraic: Add missing 64-bit extract_[iu]8..."
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-07-15 16:08:37 -05:00
Jason Ekstrand
7a19e05e8c nir/opt_if: Clean up single-src phis in opt_if_loop_terminator
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111071
Fixes: 2a74296f24 "nir: add opt_if_loop_terminator()"
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-07-15 19:58:51 +00:00
Alejandro Piñeiro
bb3bbdfbbd glsl/shader_cache: handle SPIR-V shaders
Right now we don't have cache support for SPIR-V shaders (from
ARB_gl_spirv). Right now they are properly skipped because they fall
on the ff shader code path (no key, no name), but it would be better
to update current comments, and add some guards.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-07-12 23:42:41 +02:00
Arcady Goldmints-Orlov
637b168470 nir/linker: Initialize UniformDataDefaults when using SPIR-V
Allocate UniformDataDefaults and fill in the data defaults when
linking a SPIR-V program. Among other things, this allows program
serialization to work.

It allows the following piglit test (when run on SPIR-V mode) to pass:
  spec/arb_get_program_binary/execution/uniform-after-restore.shader_test

v2: use memcpy to initialize UniformDataDefaults

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-07-12 23:42:41 +02:00
Arcady Goldmints-Orlov
761b0fe95f glsl/serialize: Update write_program_resource_data() to handle NULL input and output variable names
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-07-12 23:42:41 +02:00
Arcady Goldmints-Orlov
c3122d2431 glsl/serialize: Handle NULL uniform name in write_uniforms()
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-07-12 23:42:41 +02:00
Antia Puentes
cafc1a40d4 nir/types: Add glsl_type_is_unsized_array helper
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-07-12 23:42:41 +02:00
Antia Puentes
bfc5e46746 nir/linker: Fill TOP_LEVEL_ARRAY_SIZE and STRIDE
From the ARB_program_interface_query specification:

    "For the property TOP_LEVEL_ARRAY_SIZE, a single integer
    identifying the number of active array elements of the top-level
    shader storage block member containing to the active variable is
    written to <params>.  If the top-level block member is not
    declared as an array, the value one is written to <params>.  If
    the top-level block member is an array with no declared size, the
    value zero is written to <params>."

    "For the property TOP_LEVEL_ARRAY_STRIDE, a single integer
    identifying the stride between array elements of the top-level
    shader storage block member containing the active variable is
    written to <params>.  For top-level block members declared as
    arrays, the value written is the difference, in basic machine
    units, between the offsets of the active variable for consecutive
    elements in the top-level array.  For top-level block members not
    declared as an array, zero is written to <params>."

v2: move top_level_array_size and stride into nir_link_uniforms_state
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-07-12 23:42:41 +02:00
Antia Puentes
ae2ea5ec1f nir/linker: Compute the offset for non-trivial uniform types.
ARB_gl_spirv points that the offset must be explicit, however this is
true for 'root' types. For complex types, like struct members or
arrays of arraya, it needs to be computed.

We are not using the offset stored in the gl_buffer_variables during
the uniform blocks linking because currently we do not have a way to
relate a gl_buffer_variable with its corresponding gl_uniform_storage.
The GLSL path uses the name for that, but we can not rely on that
because names are optional in SPIR-V.

Notice that uniforms non-backed by a buffer object will have an offset
equal to -1, like in the GLSL path.

v2: add offset and var_is_in_block as per-variable state in
    nir_link_uniforms_state (Arcady)

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-07-12 23:42:41 +02:00
Antia Puentes
e15c663d8e nir/linker: Add atomic counters to the program resource list
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-07-12 23:42:41 +02:00
Antia Puentes
e1464a1cf8 nir/linker: Add XFB resources to the program resource list
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-07-12 23:42:41 +02:00
Antia Puentes
53087a89ac nir/linker: Add BUFFER_VARIABLEs to the prog resource list
v2: use link_util_should_add_buffer_variable() (Arcady)
Signed-off-by: Arcady Goldmints-Orlov <agoldmints@igalia.com>

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-07-12 23:42:41 +02:00
Antia Puentes
ffdb44d3a0 nir/linker: Add inputs/outputs to the program resource list
v2: added TODO comment hinting possible future refactoring of
    nir_build_program_resource_list and build_program_resource_list,
    to avoid code duplication (Alejandro, to explicitly reflect a
    valid concern from Timothy during the review).

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-07-12 23:42:41 +02:00
Alejandro Piñeiro
691cee751a nir/linker: add ubo/ssbo to the program resource list
v2: "nir/linker: Use the stageref when adding UBO/SSBO resources"
     squashed on this one (Timothy)

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-07-12 23:42:41 +02:00
Antia Puentes
a638971929 nir/linker: Fill the uniform's BLOCK_INDEX
Binding comparison is used to determine the block the uniform is part
of. Note that to do the binding comparison we need the information in
UniformBlocks[] and ShaderStorageBlocks[] to be available, so we have
to call gl_nir_link_uniform_blocks() before linking the uniforms.

v2: add missing break (Timothy)

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-07-12 23:42:41 +02:00
Andres Gomez
9aadd5d688 nir/compiler: keep same bit size when lowering with flrp
This was probably not caught before because no supported test was
exercising the flrp lowering with other bit size different than 32.

With the arrival of VK_KHR_shader_float_controls we will have some of
those and, unless we keep the bit size, we will end with something
like:

../src/compiler/nir/nir_builder.h:420: nir_builder_alu_instr_finish_and_insert: Assertion `src_bit_size == bit_size' failed.

Fixes: 158370ed2a ("nir/flrp: Add new lowering pass for flrp instructions")
Fixes: ae02622d8f ("nir/flrp: Lower flrp(a, b, c) differently if another flrp(_, b, c) exists")
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrnd.net>
2019-07-12 16:15:20 +00:00
Yevhenii Kolesnikov
8c5692b696 glsl/link_varyings: Fix hash table leak
Hash tables were not destroyed at return.

v2: Use ralloc_context (Eric Anholt)

Signed-off-by: Yevhenii Kolesnikov <yevhenii.kolesnikov@globallogic.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-07-12 11:07:08 +03:00
Iago Toral Quiroga
b0eec9e27d nir: add a new v3d-specific intrinsic for tile buffer color reads
This is intended to be used, for example, with OpenGL logic operations. It
takes a render target as source and a sample index in the base index for
MSAA color reads.

v2: drop the CAN_ELIMINATE and CAN_REORDER flags (Eric).

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-12 09:16:38 +02:00
Ian Romanick
ef7b4fdf3f nir/algebraic: Recognize open-coded flrp(a, b, a)
No shader-db changes Ice Lake, Iron Lake, or GM45 as these platforms
lack a LRP instruction.

v2: Remove flrp@64 cases.  Since Gen11 removes flrp@32, it seems
unlikely that we'll ever have a flrp@64.  Should that occur, the cases
can be added back.

All Gen6-Gen9 platforms had similar results. (Skylake shown)
total instructions in shared programs: 15041996 -> 15041184 (<.01%)
instructions in affected programs: 71776 -> 70964 (-1.13%)
helped: 312
HURT: 0
helped stats (abs) min: 2 max: 3 x̄: 2.60 x̃: 3
helped stats (rel) min: 0.36% max: 4.55% x̄: 1.75% x̃: 1.28%
95% mean confidence interval for instructions value: -2.66 -2.55
95% mean confidence interval for instructions %-change: -1.89% -1.61%
Instructions are helped.

total cycles in shared programs: 354303333 -> 354301807 (<.01%)
cycles in affected programs: 433742 -> 432216 (-0.35%)
helped: 206
HURT: 78
helped stats (abs) min: 2 max: 244 x̄: 21.02 x̃: 8
helped stats (rel) min: 0.06% max: 19.59% x̄: 1.72% x̃: 0.82%
HURT stats (abs)   min: 1 max: 220 x̄: 35.95 x̃: 10
HURT stats (rel)   min: 0.07% max: 30.48% x̄: 2.53% x̃: 0.56%
95% mean confidence interval for cycles value: -10.68 -0.06
95% mean confidence interval for cycles %-change: -0.99% -0.12%
Cycles are helped.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-11 10:20:03 -07:00
Ian Romanick
0c2b3a7fc0 nir/algebraic: Rearrange 1-((1-a) * (1-b)) into flrp-friendly form
No shader-db changes Ice Lake, Iron Lake, or GM45 as these platforms
lack a LRP instruction.

v2: Convert the pattern directly to flrp.  There were negligible
improvements on Gen4 and Gen5, and Gen11 was actually hurt.  I believe
the problem is this optimization conflicts with the (1-x)*y =>
ffma(-x, y, y) optimization on Gen11.

Skylake
total instructions in shared programs: 15046487 -> 15041996 (-0.03%)
instructions in affected programs: 194681 -> 190190 (-2.31%)
helped: 880
HURT: 20
helped stats (abs) min: 1 max: 19 x̄: 5.13 x̃: 4
helped stats (rel) min: 0.19% max: 36.36% x̄: 4.85% x̃: 3.33%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.11% max: 1.06% x̄: 0.28% x̃: 0.17%
95% mean confidence interval for instructions value: -5.25 -4.73
95% mean confidence interval for instructions %-change: -5.11% -4.36%
Instructions are helped.

total cycles in shared programs: 354340839 -> 354303333 (-0.01%)
cycles in affected programs: 1753622 -> 1716116 (-2.14%)
helped: 786
HURT: 182
helped stats (abs) min: 1 max: 1842 x̄: 56.52 x̃: 22
helped stats (rel) min: 0.03% max: 43.17% x̄: 3.90% x̃: 2.84%
HURT stats (abs)   min: 1 max: 440 x̄: 37.99 x̃: 9
HURT stats (rel)   min: 0.03% max: 29.37% x̄: 1.96% x̃: 0.32%
95% mean confidence interval for cycles value: -45.90 -31.59
95% mean confidence interval for cycles %-change: -3.09% -2.50%
Cycles are helped.

All Gen6-Gen8 platforms had similar results. (Broadwell shown)
total instructions in shared programs: 15055907 -> 15051466 (-0.03%)
instructions in affected programs: 196370 -> 191929 (-2.26%)
helped: 871
HURT: 26
helped stats (abs) min: 1 max: 19 x̄: 5.13 x̃: 4
helped stats (rel) min: 0.19% max: 36.36% x̄: 4.76% x̃: 3.27%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.11% max: 1.06% x̄: 0.24% x̃: 0.12%
95% mean confidence interval for instructions value: -5.21 -4.69
95% mean confidence interval for instructions %-change: -4.99% -4.24%
Instructions are helped.

total cycles in shared programs: 387729170 -> 387699745 (<.01%)
cycles in affected programs: 1816409 -> 1786984 (-1.62%)
helped: 788
HURT: 172
helped stats (abs) min: 1 max: 662 x̄: 47.29 x̃: 22
helped stats (rel) min: 0.03% max: 31.26% x̄: 3.55% x̃: 2.76%
HURT stats (abs)   min: 1 max: 404 x̄: 45.59 x̃: 14
HURT stats (rel)   min: 0.03% max: 22.92% x̄: 1.53% x̃: 0.43%
95% mean confidence interval for cycles value: -35.69 -25.61
95% mean confidence interval for cycles %-change: -2.88% -2.40%
Cycles are helped.

total fills in shared programs: 34712 -> 34710 (<.01%)
fills in affected programs: 7 -> 5 (-28.57%)
helped: 1
HURT: 0

LOST:   0
GAINED: 2

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-11 10:20:03 -07:00
Ian Romanick
09705747d7 nir/algebraic: Reassociate fadd into fmul in DPH-like pattern
Moving the add to the other end of the sequence allows it to be fused
into an FMA.

Ice Lake
total instructions in shared programs: 17173074 -> 16933147 (-1.40%)
instructions in affected programs: 7938745 -> 7698818 (-3.02%)
helped: 35583
HURT: 90
helped stats (abs) min: 1 max: 716 x̄: 6.75 x̃: 6
helped stats (rel) min: 0.10% max: 53.04% x̄: 5.29% x̃: 3.45%
HURT stats (abs)   min: 1 max: 41 x̄: 2.46 x̃: 1
HURT stats (rel)   min: 0.32% max: 8.33% x̄: 1.41% x̃: 0.77%
95% mean confidence interval for instructions value: -6.80 -6.65
95% mean confidence interval for instructions %-change: -5.32% -5.22%
Instructions are helped.

total cycles in shared programs: 360881386 -> 359533568 (-0.37%)
cycles in affected programs: 189489144 -> 188141326 (-0.71%)
helped: 27250
HURT: 6707
helped stats (abs) min: 1 max: 21997 x̄: 62.15 x̃: 16
helped stats (rel) min: <.01% max: 70.69% x̄: 4.04% x̃: 2.35%
HURT stats (abs)   min: 1 max: 3507 x̄: 51.56 x̃: 14
HURT stats (rel)   min: <.01% max: 77.26% x̄: 2.72% x̃: 1.27%
95% mean confidence interval for cycles value: -44.70 -34.68
95% mean confidence interval for cycles %-change: -2.75% -2.65%
Cycles are helped.

total spills in shared programs: 8943 -> 8829 (-1.27%)
spills in affected programs: 625 -> 511 (-18.24%)
helped: 6
HURT: 3

total fills in shared programs: 21815 -> 21719 (-0.44%)
fills in affected programs: 1653 -> 1557 (-5.81%)
helped: 7
HURT: 10

LOST:   11
GAINED: 3

Skylake and Broadwell had similar results. (Skylake shown)
total instructions in shared programs: 15271996 -> 15040882 (-1.51%)
instructions in affected programs: 7193699 -> 6962585 (-3.21%)
helped: 33985
HURT: 30
helped stats (abs) min: 1 max: 260 x̄: 6.80 x̃: 6
helped stats (rel) min: 0.10% max: 30.00% x̄: 5.54% x̃: 3.85%
HURT stats (abs)   min: 1 max: 41 x̄: 4.00 x̃: 3
HURT stats (rel)   min: 0.20% max: 2.16% x̄: 1.46% x̃: 1.72%
95% mean confidence interval for instructions value: -6.87 -6.72
95% mean confidence interval for instructions %-change: -5.59% -5.48%
Instructions are helped.

total cycles in shared programs: 355520785 -> 354253799 (-0.36%)
cycles in affected programs: 185869148 -> 184602162 (-0.68%)
helped: 25824
HURT: 6287
helped stats (abs) min: 1 max: 21997 x̄: 61.66 x̃: 16
helped stats (rel) min: <.01% max: 42.05% x̄: 4.18% x̃: 2.41%
HURT stats (abs)   min: 1 max: 3327 x̄: 51.76 x̃: 14
HURT stats (rel)   min: <.01% max: 101.62% x̄: 2.80% x̃: 1.28%
95% mean confidence interval for cycles value: -44.70 -34.21
95% mean confidence interval for cycles %-change: -2.87% -2.76%
Cycles are helped.

total spills in shared programs: 8835 -> 8818 (-0.19%)
spills in affected programs: 613 -> 596 (-2.77%)
helped: 5
HURT: 2

total fills in shared programs: 21738 -> 21744 (0.03%)
fills in affected programs: 1348 -> 1354 (0.45%)
helped: 5
HURT: 11

LOST:   0
GAINED: 12

Haswell
total instructions in shared programs: 13447102 -> 13381508 (-0.49%)
instructions in affected programs: 3770735 -> 3705141 (-1.74%)
helped: 11999
HURT: 29
helped stats (abs) min: 1 max: 409 x̄: 5.60 x̃: 3
helped stats (rel) min: 0.10% max: 20.00% x̄: 2.38% x̃: 1.87%
HURT stats (abs)   min: 3 max: 750 x̄: 54.90 x̃: 3
HURT stats (rel)   min: 0.12% max: 125.30% x̄: 9.96% x̃: 1.82%
95% mean confidence interval for instructions value: -5.71 -5.19
95% mean confidence interval for instructions %-change: -2.39% -2.30%
Instructions are helped.

total cycles in shared programs: 376342236 -> 375690458 (-0.17%)
cycles in affected programs: 155699021 -> 155047243 (-0.42%)
helped: 8397
HURT: 2876
helped stats (abs) min: 1 max: 20248 x̄: 109.87 x̃: 18
helped stats (rel) min: <.01% max: 40.71% x̄: 2.23% x̃: 1.49%
HURT stats (abs)   min: 1 max: 15414 x̄: 94.15 x̃: 22
HURT stats (rel)   min: <.01% max: 432.49% x̄: 3.15% x̃: 1.41%
95% mean confidence interval for cycles value: -67.64 -48.00
95% mean confidence interval for cycles %-change: -0.99% -0.74%
Cycles are helped.

total spills in shared programs: 23134 -> 23184 (0.22%)
spills in affected programs: 1675 -> 1725 (2.99%)
helped: 13
HURT: 11

total fills in shared programs: 34550 -> 34686 (0.39%)
fills in affected programs: 1421 -> 1557 (9.57%)
helped: 13
HURT: 11

LOST:   0
GAINED: 11

Ivy Bridge
total instructions in shared programs: 12019642 -> 11987285 (-0.27%)
instructions in affected programs: 1532236 -> 1499879 (-2.11%)
helped: 5522
HURT: 110
helped stats (abs) min: 1 max: 312 x̄: 6.22 x̃: 3
helped stats (rel) min: 0.16% max: 20.00% x̄: 2.46% x̃: 1.88%
HURT stats (abs)   min: 1 max: 750 x̄: 18.07 x̃: 3
HURT stats (rel)   min: 0.09% max: 125.30% x̄: 3.42% x̃: 1.15%
95% mean confidence interval for instructions value: -6.25 -5.24
95% mean confidence interval for instructions %-change: -2.43% -2.26%
Instructions are helped.

total cycles in shared programs: 180214667 -> 179761900 (-0.25%)
cycles in affected programs: 31448723 -> 30995956 (-1.44%)
helped: 7191
HURT: 2838
helped stats (abs) min: 1 max: 17680 x̄: 88.47 x̃: 17
helped stats (rel) min: <.01% max: 50.45% x̄: 2.16% x̃: 1.40%
HURT stats (abs)   min: 1 max: 15540 x̄: 64.63 x̃: 24
HURT stats (rel)   min: 0.02% max: 435.17% x̄: 3.10% x̃: 1.51%
95% mean confidence interval for cycles value: -53.34 -36.95
95% mean confidence interval for cycles %-change: -0.81% -0.53%
Cycles are helped.

total spills in shared programs: 3599 -> 3642 (1.19%)
spills in affected programs: 1180 -> 1223 (3.64%)
helped: 12
HURT: 2

total fills in shared programs: 4031 -> 4162 (3.25%)
fills in affected programs: 876 -> 1007 (14.95%)
helped: 12
HURT: 2

LOST:   6
GAINED: 5

Sandy Bridge
total instructions in shared programs: 10850686 -> 10822890 (-0.26%)
instructions in affected programs: 1247986 -> 1220190 (-2.23%)
helped: 4699
HURT: 102
helped stats (abs) min: 1 max: 104 x̄: 6.02 x̃: 3
helped stats (rel) min: 0.15% max: 17.65% x̄: 2.44% x̃: 1.88%
HURT stats (abs)   min: 1 max: 16 x̄: 4.70 x̃: 3
HURT stats (rel)   min: 0.09% max: 3.85% x̄: 1.11% x̃: 1.10%
95% mean confidence interval for instructions value: -6.10 -5.47
95% mean confidence interval for instructions %-change: -2.42% -2.30%
Instructions are helped.

total cycles in shared programs: 154044149 -> 153920095 (-0.08%)
cycles in affected programs: 26037392 -> 25913338 (-0.48%)
helped: 5974
HURT: 2521
helped stats (abs) min: 1 max: 1802 x̄: 35.42 x̃: 16
helped stats (rel) min: <.01% max: 35.80% x̄: 1.43% x̃: 0.84%
HURT stats (abs)   min: 1 max: 862 x̄: 34.73 x̃: 20
HURT stats (rel)   min: 0.01% max: 36.33% x̄: 1.67% x̃: 0.85%
95% mean confidence interval for cycles value: -16.31 -12.90
95% mean confidence interval for cycles %-change: -0.56% -0.45%
Cycles are helped.

total spills in shared programs: 2876 -> 2957 (2.82%)
spills in affected programs: 592 -> 673 (13.68%)
helped: 6
HURT: 35

total fills in shared programs: 3157 -> 3134 (-0.73%)
fills in affected programs: 402 -> 379 (-5.72%)
helped: 6
HURT: 0

LOST:   5
GAINED: 11

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-11 10:20:03 -07:00
Ian Romanick
ff9f526de3 nir/algebraic: Recognize open-coded flrp(-1, 1, a) and flrp(1, -1, a)
v2: Remove flrp@64 cases.  Since Gen11 removes flrp@32, it seems
unlikely that we'll ever have a flrp@64.  Should that occur, the cases
can be added back.

v3: Add a couple more patterns that just move the negation around.

No shader-db changes Ice Lake, Iron Lake, or GM45 as these platforms
lack a LRP instruction.

Skylake
total instructions in shared programs: 15279687 -> 15256058 (-0.15%)
instructions in affected programs: 4344440 -> 4320811 (-0.54%)
helped: 23455
HURT: 18
helped stats (abs) min: 1 max: 21 x̄: 1.01 x̃: 1
helped stats (rel) min: 0.02% max: 13.33% x̄: 0.86% x̃: 0.65%
HURT stats (abs)   min: 1 max: 2 x̄: 1.06 x̃: 1
HURT stats (rel)   min: 0.13% max: 1.16% x̄: 0.43% x̃: 0.34%
95% mean confidence interval for instructions value: -1.01 -1.00
95% mean confidence interval for instructions %-change: -0.87% -0.85%
Instructions are helped.

total cycles in shared programs: 355593755 -> 355339981 (-0.07%)
cycles in affected programs: 162089552 -> 161835778 (-0.16%)
helped: 20467
HURT: 7158
helped stats (abs) min: 1 max: 2074 x̄: 29.00 x̃: 6
helped stats (rel) min: <.01% max: 35.71% x̄: 1.71% x̃: 0.58%
HURT stats (abs)   min: 1 max: 4814 x̄: 47.46 x̃: 11
HURT stats (rel)   min: <.01% max: 125.43% x̄: 2.88% x̃: 0.98%
95% mean confidence interval for cycles value: -10.39 -7.98
95% mean confidence interval for cycles %-change: -0.57% -0.47%
Cycles are helped.

total spills in shared programs: 8843 -> 8835 (-0.09%)
spills in affected programs: 190 -> 182 (-4.21%)
helped: 2
HURT: 0

total fills in shared programs: 21738 -> 21738 (0.00%)
fills in affected programs: 372 -> 372 (0.00%)
helped: 1
HURT: 1

LOST:   12
GAINED: 22

Broadwell
total instructions in shared programs: 15290523 -> 15266818 (-0.16%)
instructions in affected programs: 4314738 -> 4291033 (-0.55%)
helped: 23391
HURT: 11
helped stats (abs) min: 1 max: 119 x̄: 1.02 x̃: 1
helped stats (rel) min: 0.02% max: 13.33% x̄: 0.86% x̃: 0.65%
HURT stats (abs)   min: 1 max: 189 x̄: 18.09 x̃: 1
HURT stats (rel)   min: 0.11% max: 5.39% x̄: 0.98% x̃: 0.50%
95% mean confidence interval for instructions value: -1.04 -0.99
95% mean confidence interval for instructions %-change: -0.87% -0.85%
Instructions are helped.

total cycles in shared programs: 388911660 -> 388830827 (-0.02%)
cycles in affected programs: 172903324 -> 172822491 (-0.05%)
helped: 15601
HURT: 13269
helped stats (abs) min: 1 max: 1986 x̄: 29.18 x̃: 6
helped stats (rel) min: <.01% max: 36.60% x̄: 1.74% x̃: 0.55%
HURT stats (abs)   min: 1 max: 14904 x̄: 28.21 x̃: 6
HURT stats (rel)   min: <.01% max: 102.58% x̄: 1.77% x̃: 0.60%
95% mean confidence interval for cycles value: -4.20 -1.40
95% mean confidence interval for cycles %-change: -0.17% -0.08%
Cycles are helped.

total spills in shared programs: 23110 -> 23069 (-0.18%)
spills in affected programs: 656 -> 615 (-6.25%)
helped: 3
HURT: 1

total fills in shared programs: 34399 -> 34398 (<.01%)
fills in affected programs: 905 -> 904 (-0.11%)
helped: 3
HURT: 1

LOST:   6
GAINED: 23

Haswell
total instructions in shared programs: 13465303 -> 13441142 (-0.18%)
instructions in affected programs: 3726999 -> 3702838 (-0.65%)
helped: 22139
HURT: 347
helped stats (abs) min: 1 max: 43 x̄: 1.11 x̃: 1
helped stats (rel) min: 0.03% max: 10.00% x̄: 1.01% x̃: 0.75%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.35% max: 11.11% x̄: 1.48% x̃: 1.12%
95% mean confidence interval for instructions value: -1.08 -1.07
95% mean confidence interval for instructions %-change: -0.99% -0.96%
Instructions are helped.

total cycles in shared programs: 376271308 -> 376273090 (<.01%)
cycles in affected programs: 167496811 -> 167498593 (<.01%)
helped: 13206
HURT: 13281
helped stats (abs) min: 1 max: 3864 x̄: 35.39 x̃: 8
helped stats (rel) min: <.01% max: 53.10% x̄: 2.31% x̃: 0.80%
HURT stats (abs)   min: 1 max: 3828 x̄: 35.32 x̃: 8
HURT stats (rel)   min: <.01% max: 117.85% x̄: 2.88% x̃: 0.61%
95% mean confidence interval for cycles value: -1.33 1.47
95% mean confidence interval for cycles %-change: 0.22% 0.36%
Inconclusive result (value mean confidence interval includes 0).

total spills in shared programs: 23158 -> 23134 (-0.10%)
spills in affected programs: 24 -> 0
helped: 3
HURT: 0

total fills in shared programs: 34580 -> 34550 (-0.09%)
fills in affected programs: 30 -> 0
helped: 3
HURT: 0

LOST:   23
GAINED: 13

Ivy Bridge
total instructions in shared programs: 12034154 -> 12014301 (-0.16%)
instructions in affected programs: 3636209 -> 3616356 (-0.55%)
helped: 18771
HURT: 459
helped stats (abs) min: 1 max: 43 x̄: 1.08 x̃: 1
helped stats (rel) min: 0.03% max: 10.00% x̄: 0.91% x̃: 0.68%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.34% max: 8.33% x̄: 1.43% x̃: 1.11%
95% mean confidence interval for instructions value: -1.04 -1.02
95% mean confidence interval for instructions %-change: -0.86% -0.84%
Instructions are helped.

total cycles in shared programs: 180186960 -> 180175147 (<.01%)
cycles in affected programs: 44652745 -> 44640932 (-0.03%)
helped: 12979
HURT: 11033
helped stats (abs) min: 1 max: 5836 x̄: 32.88 x̃: 6
helped stats (rel) min: <.01% max: 53.10% x̄: 2.19% x̃: 0.74%
HURT stats (abs)   min: 1 max: 4811 x̄: 37.61 x̃: 9
HURT stats (rel)   min: <.01% max: 115.18% x̄: 2.99% x̃: 0.69%
95% mean confidence interval for cycles value: -2.29 1.31
95% mean confidence interval for cycles %-change: 0.11% 0.26%
Inconclusive result (value mean confidence interval includes 0).

total spills in shared programs: 3623 -> 3599 (-0.66%)
spills in affected programs: 24 -> 0
helped: 3
HURT: 0

total fills in shared programs: 4061 -> 4031 (-0.74%)
fills in affected programs: 30 -> 0
helped: 3
HURT: 0

LOST:   17
GAINED: 18

Sandy Bridge
total instructions in shared programs: 10853968 -> 10834932 (-0.18%)
instructions in affected programs: 3769957 -> 3750921 (-0.50%)
helped: 17944
HURT: 204
helped stats (abs) min: 1 max: 3 x̄: 1.07 x̃: 1
helped stats (rel) min: 0.02% max: 10.00% x̄: 0.83% x̃: 0.60%
HURT stats (abs)   min: 1 max: 2 x̄: 1.01 x̃: 1
HURT stats (rel)   min: 0.31% max: 9.09% x̄: 1.83% x̃: 0.93%
95% mean confidence interval for instructions value: -1.05 -1.04
95% mean confidence interval for instructions %-change: -0.81% -0.78%
Instructions are helped.

total cycles in shared programs: 153894864 -> 153885988 (<.01%)
cycles in affected programs: 50643925 -> 50635049 (-0.02%)
helped: 9361
HURT: 10534
helped stats (abs) min: 1 max: 1966 x̄: 19.42 x̃: 4
helped stats (rel) min: <.01% max: 34.97% x̄: 0.90% x̃: 0.22%
HURT stats (abs)   min: 1 max: 1371 x̄: 16.42 x̃: 5
HURT stats (rel)   min: <.01% max: 55.10% x̄: 0.81% x̃: 0.27%
95% mean confidence interval for cycles value: -1.27 0.38
95% mean confidence interval for cycles %-change: -0.03% 0.04%
Inconclusive result (value mean confidence interval includes 0).

LOST:   6
GAINED: 24

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-11 10:20:03 -07:00
Ian Romanick
1259f6d802 nir: intel/vec4: Add flag to disable some algebraic optimizations
A couple patches later in this series use the flag to avoid a few
thousand shader-db regresions on all vec4 platforms.

I'm not particularly enamored with the name of this flag.  However, I
suspect the Intel vec4 backend is the only backend that will benefit
from it.  Specifically, the cases where this helps are all cases where
we want to prevent nir_opt_algebraic from rearranging instructions to
create 3-source instructions, such as ffma and flrp, with additional
immediate value or uniform sources.

The earlier commit "intel/vec4: Try to emit a single load for multiple
3-src instruction operands" solves most of the problems caused by
additional immediate values, but the restrictions on register strides
that cause problems for uniforms and shader inputs persist.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-11 10:20:03 -07:00
Neil Roberts
eae06b34ea glsl/builtin types: Set the precision on the depth range params
The members of gl_DepthRangeParameters are declared to be highp in
GLSL ES specs.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-11 08:04:54 +02:00
Neil Roberts
74d71dac20 glsl: Add a constructor for glsl_struct_field to specify the precision
Adds a third constructor to glsl_struct_field which has an extra
parameter to specify the precision.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-11 08:04:54 +02:00
Neil Roberts
014be60398 glsl: Add a macro for the default values for glsl_struct_field
There are two constructors for glsl_struct_field with different
parameters. Instead of repeating them for both constructors, this
patch adds a convenience macro. This will make it easier to add a
third constructor in a later patch.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-11 08:04:54 +02:00
Neil Roberts
ca6ee488e9 glsl/builtin_variables: Add a precision to the builtins
All of the builtin variables mentioned in the GLSL ES spec and the
extensions include a precision declaration which is different
depending on what the variable is used for. This patch makes it set
the corresponding precision when creating the variable. This will make
a difference once we start using the precision information for
optimisation. Previously all of the builtin variables ended up with a
precision of NONE.

v2: Made gl_PointSize and gl_FragCoord highp since GLSL ES 3.00. Fixed
    gl_MaxViewPorts to always be highp. (Eric Anholt)

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-11 08:04:54 +02:00
Connor Abbott
133273aa22 nir/lower_io: Don't use variable to get deref mode
Drivers only use lower_io for modes where pointers don't have a
meaningful value, and dereferences can always be traced back to a
variable. But there can be other modes, like global mode with
VK_EXT_buffer_device_address, where pointers cannot be traced back to a
variable, and lower_io would segfault on loads/stores of these since
nir_deref_instr_get_variable() would return NULL.

Just use the mode on the deref itself to filter out these modes before
we try to get the variable.

Fixes: 118a66df99 ("radv: Use NIR barycentric coordinates")
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-10 12:31:41 +02:00
Jason Ekstrand
7e0fcea727 nir/loop_analyze: Pass nir_const_values directly to helpers
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-07-10 00:20:59 +00:00
Jason Ekstrand
ff972c7a3a nir/loop_analyze: Properly handle swizzles in loop conditions
This commit re-plumbs all of nir_loop_analyze to use nir_ssa_scalar for
all intermediate values so that we can properly handle swizzles.  Even
though if conditions are required to be scalars, they may still consume
swizzles so you could have ((a.yzw < b.zzx).xz && c.xx).y == 0 as your
loop termination condition.  The old code would just bail the moment it
saw its first non-zero swizzle but we can now properly chase the scalar
from the if condition to all the way to a, b, and c.

Shader-db results on Kaby Lake:

    total loops in shared programs: 4388 -> 4364 (-0.55%)
    loops in affected programs: 29 -> 5 (-82.76%)
    helped: 29
    HURT: 5

Shader-db results on Haswell:

    total loops in shared programs: 4370 -> 4373 (0.07%)
    loops in affected programs: 2 -> 5 (150.00%)
    helped: 2
    HURT: 5

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-07-10 00:20:59 +00:00
Jason Ekstrand
0333649e63 nir/loop_analyze: Refactor detection of limit vars
This commit reworks both get_induction_and_limit_vars() and
try_find_trip_count_vars_in_iand to return true on success and not
modify their output parameters on failure.  This makes their callers
significantly simpler.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-07-10 00:20:59 +00:00
Jason Ekstrand
8f7405ed9d nir: Add some helpers for chasing SSA values properly
There are various cases in which we want to chase SSA values through ALU
ops ranging from hand-written optimizations to back-end translation
code.  In all these cases, it can be very tricky to do properly because
of swizzles.  This set of helpers lets you easily work with a single
component of an SSA def and chase through ALU ops safely.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-07-10 00:20:59 +00:00
Jason Ekstrand
9a3cb6f5fe nir/loop_analyze: Bail if we encounter swizzles
None of the current code knows what to do with swizzles.  Take the safe
option for now and bail if we see one.  This does have a small shader-db
impact but it is at least safe.

Shader-db results on Kaby Lake:

    total loops in shared programs: 4364 -> 4388 (0.55%)
    loops in affected programs: 5 -> 29 (480.00%)
    helped: 5
    HURT: 29

Shader-db results on Haswell:

    total loops in shared programs: 4373 -> 4370 (-0.07%)
    loops in affected programs: 5 -> 2 (-60.00%)
    helped: 5
    HURT: 2

Fixes: 6772a17acc "nir: Add a loop analysis pass"
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-07-10 00:20:59 +00:00
Jason Ekstrand
6455fa9710 nir/loop_analyze: Use new eval_const_* helpers in test_iterations
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-07-10 00:20:59 +00:00
Jason Ekstrand
268ad47c11 nir/loop_analyze: Handle bit sizes correctly in calculate_iterations
The current code assumes everything is 32-bit which is very likely true
but not guaranteed by any means.  Instead, use nir_eval_const_opcode to
do the calculations in a bit-size-agnostic way.  We also use the new
constant constructors to build the correct size constants.

Fixes: 6772a17acc "nir: Add a loop analysis pass"
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-07-10 00:20:59 +00:00
Jason Ekstrand
9f7ffe41dd nir/loop_analyze: Fix phi-of-identical-alu detection
One issue was that the original version didn't check that swizzles
matched when comparing ALU instructions so it could end up matching
very different instructions.  Using the nir_instrs_equal function from
nir_instr_set.c which we use for CSE should be much more reliable.
Another was that the loop assumes it will only run two iterations which
may not be true.  If there's something which guarantees that this case
only happens for phis after ifs, it wasn't documented.

Fixes: 9e6b39e1d5 "nir: detect more induction variables"
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-07-10 00:20:59 +00:00