Commit graph

2290 commits

Author SHA1 Message Date
Ian Romanick
55c1ac4b75 nir/algebraic: Add missing 64-bit extract_[iu]8 patterns
No shader-db changes on any Intel platform.

v2: Use a loop to generate patterns.  Suggested by Jason.

Reviewed-by: Matt Turner <mattst88@gmail.com> [v1]
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-08 22:24:19 -08:00
Ian Romanick
9aaaac6080 nir/algebraic: Remove redundant extract_[iu]8 patterns
No shader-db changes on any Intel platform.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-08 22:24:19 -08:00
Ian Romanick
37ee462e03 nir/algebraic: Fix up extract_[iu]8 after loop unrolling
Skylake, Broadwell, and Haswell had similar results. (Skylake shown)
total instructions in shared programs: 15256840 -> 15256837 (<.01%)
instructions in affected programs: 4713 -> 4710 (-0.06%)
helped: 3
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.06% max: 0.08% x̄: 0.06% x̃: 0.06%

total cycles in shared programs: 372286583 -> 372286583 (0.00%)
cycles in affected programs: 198516 -> 198516 (0.00%)
helped: 1
HURT: 1
helped stats (abs) min: 10 max: 10 x̄: 10.00 x̃: 10
helped stats (rel) min: <.01% max: <.01% x̄: <.01% x̃: <.01%
HURT stats (abs)   min: 10 max: 10 x̄: 10.00 x̃: 10
HURT stats (rel)   min: 0.01% max: 0.01% x̄: 0.01% x̃: 0.01%

No changes on any other Intel platform.

v2: Use a loop to generate patterns.  Suggested by Jason.

Reviewed-by: Matt Turner <mattst88@gmail.com> [v1]
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-08 22:24:19 -08:00
Alejandro Piñeiro
b2a212ac2e nir/xfb: handle arrays and AoA of basic types
On OpenGL, a array of a simple type adds just one varying. So
gl_transform_feedback_varying_info struct defined at mtypes.h includes
the parameters Type (base_type) and Size (number of elements).

This commit checks this when the recursive add_var_xfb_outputs call
handles arrays, to ensure that just one is addded.

We also need to take into account AoA here

v2: use glsl_type_is_leaf from nir_types (Timothy Arceri)

v3: simplified aoa check, without the need ot using glsl_type_is_leaf,
    using glsl_types_is_struct (Timothy Arceri)

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-03-08 15:00:50 +01:00
Alejandro Piñeiro
8d693746e9 nir/xfb: sort varyings too
Right now we are only re-sorting outputs. But it is better to sort too
varyings, as linker expect them to be sorted out (as it was done on
GLSL). For varyings, and to make easier to compute buffer_index, we
sort also by buffer. We could do the same for outputs, but we lack a
reason for that, so we left it as it is (just offset).

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-03-08 15:00:50 +01:00
Alejandro Piñeiro
cf0b2ad486 nir/xfb: adding varyings on nir_xfb_info and gather_info
In order to be used for OpenGL (right now for ARB_gl_spirv).

This commit adds two new structures:

  * nir_xfb_varying_info: that identifies each individual varying. For
    each one, we need to know the type, buffer and xfb_offset

  * nir_xfb_buffer_info: as now for each buffer, in addition to the
    stride, we need to know how many varyings are assigned to it.

For this patch, the only case where num_outputs != num_varyings is
with the case of doubles, that for dvec3/4 could require more than one
output. There are more cases though (like aoa), that will be handled
on following patches.

v2: updated after new nir general XFB support introduced for "anv: Add
    support for VK_EXT_transform_feedback"

v3: compute num_varyings beforehand for allocating, instead of relying
    on num_outputs as approximate value (Timothy Arceri)

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-03-08 15:00:50 +01:00
Alejandro Piñeiro
b62a8149ab nir/xfb: add component_offset at nir_xfb_info
Where component_offset here is the offset when accessing components of
a packed variable. Or in other words, location_frac on
nir.h. Different places of mesa use different names for it.

Technically nir_xfb_info consumer can get the same from the
component_mask, it seems somewhat forced to make it to compute it,
instead of providing it.

v2: rename local location_frac for comp_offset, more similar to the
intended use (Timothy Arceri)

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-03-08 15:00:50 +01:00
Jason Ekstrand
1664de5924 nir/builder: Add a build_deref_array_imm helper
Unlike most of the cases in which we do this by hand, the new helper
properly handles non-32-bit pointers.

Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-03-07 21:20:30 +00:00
Jason Ekstrand
fcf2a0122e nir/builder: Cast array indices in build_deref_follower
There's no guarantee when build_deref_follower is called that the two
derefs have the same bit size destination.  Insert a cast on the array
index in case we have differing bit sizes.  While we're here, insert
some asserts in build_deref_array and build_deref_ptr_as_array.  The
validator will catch violations here but they're easier to debug if we
catch them while building.

Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-03-07 21:20:30 +00:00
Jason Ekstrand
cd4c1458ba nir/builder: Emit better code for iadd/imul_imm
Because we already know the immediate right-hand parameter, we can
potentially save the optimizer a bit of work.

Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-03-07 21:20:30 +00:00
Tapani Pälli
8b010f3557 nir: free dead_ctx in case of no progress
Fixes a leak:

  ==7576== 320 (48 direct, 272 indirect) bytes in 1 blocks are definitely lost in loss record 26 of 26
  ==7576==    at 0x4C2EE3B: malloc (vg_replace_malloc.c:309)
  ==7576==    by 0x53EF0E4: ralloc_size (ralloc.c:119)
  ==7576==    by 0x53EF0C2: ralloc_context (ralloc.c:113)
  ==7576==    by 0x5471F64: nir_split_per_member_structs (nir_split_per_member_structs.c:176)
  ==7576==    by 0x51288CF: anv_shader_compile_to_nir (anv_pipeline.c:216)

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-03-07 07:40:19 +02:00
Jason Ekstrand
e02959f442 nir/lower_doubles: Inline functions directly in lower_doubles
Instead of trusting the caller to already have created a softfp64
function shader and added all its functions to our shader, we simply
take the softfp64 shader as an argument and do the function inlining
ouselves.  This means that there's no more nasty functions lying around
that the caller needs to worry about cleaning up.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-06 17:24:57 +00:00
Jason Ekstrand
f25ca337b4 nir/deref: Expose nir_opt_deref_impl
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-06 17:24:57 +00:00
Jason Ekstrand
de8d80f9cc nir/inline_functions: Break inlining into a builder helper
This pulls the guts of function inlining into a builder helper so that
it can be used elsewhere.  The rest of the infrastructure is still
needed for most inlining cases to ensure that everything gets inlined
and only ever once.  However, there are use-cases where you just want to
inline one little thing.  This new helper also has a neat trick where it
can seamlessly inline a function from one nir_shader into another.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-06 17:24:57 +00:00
Jason Ekstrand
9314084237 nir: Teach loop unrolling about 64-bit instruction lowering
The lowering we do for 64-bit instructions can cause a single NIR ALU
instruction to blow up into hundreds or thousands of instructions
potentially with control flow.  If loop unrolling isn't aware of this,
it can unroll a loop 20 times which contains a nir_op_fsqrt which we
then lower to a full software implementation based on integer math.
Those 20 invocations suddenly get a lot more expensive than NIR loop
unrolling currently expects.  By giving it an approximate estimate
function, we can prevent loop unrolling from going to town when it
shouldn't.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-06 17:24:57 +00:00
Jason Ekstrand
ebb3695376 nir: Expose double and int64 op_to_options_mask helpers
We already have one internally for int64 but we don't have a similar one
for doubles so we'll have to make one.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-06 17:24:57 +00:00
Iago Toral Quiroga
ca2b5e9069 compiler/nir: add an is_conversion field to nir_op_info
This is set to True only for numeric conversion opcodes.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-06 17:24:57 +00:00
Timothy Arceri
54522d0506 nir: rename glsl_type_is_struct() -> glsl_type_is_struct_or_ifc()
Replace done using:
find ./src -type f -exec sed -i -- \
's/glsl_type_is_struct(/glsl_type_is_struct_or_ifc(/g' {} \;

Acked-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-06 13:10:02 +11:00
Karol Herbst
272e927d0e nir/spirv: initial handling of OpenCL.std extension opcodes
Not complete, mostly just adding things as I encounter them in CTS. But
not getting far enough yet to hit most of the OpenCL.std instructions.

Anyway, this is better than nothing and covers the most common builtins.

v2: add hadd proof from Jason
    move some of the lowering into opt_algebraic and create new nir opcodes
    simplify nextafter lowering
    fix normalize lowering for inf
    rework upsample to use nir_pack_bits
    add missing files to build systems
v3: split lines of iadd/sub_sat expressions

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-05 22:28:29 +01:00
Karol Herbst
d0b47ec4df nir/vtn: add support for SpvBuiltInGlobalLinearId
v2: use formula with fewer operations

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-03-05 22:28:29 +01:00
Karol Herbst
f48c672965 nir: add support for address bit sized system values
v2: add assert in else clause
    make local group intrinsics 32 bit wide
v3: always use 32 bit constant for local_size
v4: add comment by Jason

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-05 22:28:29 +01:00
Karol Herbst
5d48359a2c nir: replace magic numbers with M_PI
we define it inside 'include/c99_math.h' so it is safe to use.

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-05 22:28:29 +01:00
Timur Kristóf
6684e039eb nir: Add multiplier argument to nir_lower_uniforms_to_ubo.
Note that locations can be set in different units, and the multiplier
argument caters to supporting these different units. For example,
st_glsl_to_nir uses dwords (4 bytes) so the multiplier should be 4,
while tgsi_to_nir uses bytes, so the multiplier should be 16.

Signed-Off-By: Timur Kristóf <timur.kristof@gmail.com>
Tested-by: Andre Heider <a.heider@gmail.com>
Tested-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-03-05 19:13:27 +00:00
Timur Kristóf
909d1f50f3 nir: Move nir_lower_uniforms_to_ubo to compiler/nir.
The nir_lower_uniforms_to_ubo function is useful outside of
mesa/state_tracker, and in fact is needed to produce NIR for
drivers that have the PIPE_CAP_PACKED_UNIFORMS capability.

Signed-Off-By: Timur Kristóf <timur.kristof@gmail.com>
Tested-by: Andre Heider <a.heider@gmail.com>
Tested-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-03-05 19:13:27 +00:00
Eric Anholt
2780a99ff8 v3d: Move the stores for fixed function VS output reads into NIR.
This lets us emit the VPM_WRITEs directly from
nir_intrinsic_store_output() (useful once NIR scheduling is in place so
that we can reduce register pressure), and lets future NIR scheduling
schedule the math to generate them.  Even in the meantime, it looks like
this lets NIR DCE some more code and make better decisions.

total instructions in shared programs: 6429246 -> 6412976 (-0.25%)
total threads in shared programs: 153924 -> 153934 (<.01%)
total loops in shared programs: 486 -> 483 (-0.62%)
total uniforms in shared programs: 2385436 -> 2388195 (0.12%)

Acked-by: Ian Romanick <ian.d.romanick@intel.com> (nir)
2019-03-05 10:59:40 -08:00
Eric Anholt
a4f612b4cf nir: Improve printing of load_input/store_output variable names.
We were printing only when the channel was exactly the start channel, so
scalarized loads/stores would be missing the name on the rest.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-03-05 10:59:40 -08:00
Jason Ekstrand
61e009d2c4 spirv: Use the same types for resource indices as pointers
We need more space than just a 32-bit scalar and we have to burn all
that space anyway so we may as well expose it to the driver.  This also
fixes a subtle bug when UBOs and SSBOs have different pointer types.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-03-05 10:06:50 -06:00
Jason Ekstrand
5c96120b5c intel,nir: Lower TXD with min_lod when the sampler index is not < 16
When we have a larger sampler index, we get into the "high sampler"
scenario and need an instruction header.  Even in SIMD8, this pushes the
instruction over the sampler message size maximum of 11 registers.
Instead, we have to lower TXD to TXL.

Fixes: cb98e0755f "intel/fs: Support min_lod parameters on texture..."
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-03-04 23:56:39 +00:00
Sagar Ghuge
47ec9bdc60 nir/algebraic: Optimize low 32 bit extraction
Optimize a situation where we only need lower 32 bits from 64 bit
result.

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Suggested-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-04 15:50:25 -08:00
Sagar Ghuge
e551040c60 nir/glsl: Add another way of doing lower_imul64 for gen8+
On Gen 8 and 9, "mul" instruction supports 64 bit destination type. We
can reduce our 64x64 int multiplication from 4 instructions to 3.

Also instead of emitting two mul instructions, we can emit single mul
instuction and extract low/high 32 bits from 64 bit result for
[i/u]mulExtended

v2: 1) Allow lower_mul_high64 to use new opcode (Jason Ekstrand)
    2) Add lower_mul_2x32_64 flag (Matt Turner)
    3) Remove associative property as bit size is different (Connor
       Abbott)

v3: Fix indentation and variable naming convention (Jason Ekstrand)

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-04 15:50:25 -08:00
Jordan Justen
31b35916dd
nir: Add int64/doubles options into nir_shader_compiler_options
This will allow the options to be visible under nir_shader->options,
which will allow the gallium state_tracker to use the driver preferred
settings during glsl_to_nir.

Suggested-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-02 14:33:41 -08:00
Ian Romanick
bae0c36751 nir/algebraic: Optimize away an fsat of a b2f
The b2f can only produce 0.0 or 1.0, so the fsat does nothing.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-02 13:58:56 -08:00
Ian Romanick
ecc9ffa778 nir/algebraic: Replace a-fract(a) with floor(a)
I noticed this while looking at a shader that was affected by Tim's
"more loop unrolling" series.

In review, Tim Arceri asked:
> Why the hurt on Gen6+ is this something that should be in the late
> optimisations pass?

As far as I can tell, it's just because our scheduler is terrible.  In
all the fragment shaders that I looked at (some hurt shaders were from
other stages), only one of the SIMD8 or SIMD16 version would be hurt.
In many of those case, the other SIMD width is improved (e.g.,
shaders/closed/steam/brutal-legend/3990.shader_test).

Often it looks like the scheduler decides to differently schedule a SEND
the occurs somewhere early in the shader.  Once that happens, everything
is different.

I looked at one vertex shader that was hurt (from Goat Simulator).  In
that case, both the floor and fract are used.  The optimization
eliminates the add, and it should allow better scheduling.  In the area
of the FRC and RNDD instructions, the scheduler does the right thing.
However, later in the shader a MAD and and ADD get scheduled
differently, and that makes it slightly worse.

In light of this, I tried adding some "is_used_once" mark-up, and that
did not fix all the cycles regressions.  It also did a lot more harm
than good on SKL (helped 82 vs. hurt 241).

All Gen6+ platforms had similar results. (Skylake shown)
total instructions in shared programs: 15437001 -> 15435259 (-0.01%)
instructions in affected programs: 213651 -> 211909 (-0.82%)
helped: 988
HURT: 0
helped stats (abs) min: 1 max: 27 x̄: 1.76 x̃: 1
helped stats (rel) min: 0.15% max: 11.54% x̄: 1.14% x̃: 0.59%
95% mean confidence interval for instructions value: -1.89 -1.63
95% mean confidence interval for instructions %-change: -1.23% -1.05%
Instructions are helped.

total cycles in shared programs: 383007378 -> 382997063 (<.01%)
cycles in affected programs: 1650825 -> 1640510 (-0.62%)
helped: 679
HURT: 302
helped stats (abs) min: 1 max: 348 x̄: 23.39 x̃: 14
helped stats (rel) min: 0.04% max: 28.77% x̄: 1.61% x̃: 0.98%
HURT stats (abs)   min: 1 max: 250 x̄: 18.43 x̃: 7
HURT stats (rel)   min: 0.04% max: 25.86% x̄: 1.41% x̃: 0.53%
95% mean confidence interval for cycles value: -13.05 -7.98
95% mean confidence interval for cycles %-change: -0.86% -0.50%
Cycles are helped.

Iron Lake and GM45 had similar results. (GM45 shown)
total instructions in shared programs: 5043616 -> 5043010 (-0.01%)
instructions in affected programs: 119691 -> 119085 (-0.51%)
helped: 432
HURT: 0
helped stats (abs) min: 1 max: 27 x̄: 1.40 x̃: 1
helped stats (rel) min: 0.10% max: 8.11% x̄: 0.66% x̃: 0.39%
95% mean confidence interval for instructions value: -1.58 -1.23
95% mean confidence interval for instructions %-change: -0.72% -0.59%
Instructions are helped.

total cycles in shared programs: 128139812 -> 128135762 (<.01%)
cycles in affected programs: 3829724 -> 3825674 (-0.11%)
helped: 602
HURT: 0
helped stats (abs) min: 2 max: 486 x̄: 6.73 x̃: 6
helped stats (rel) min: 0.02% max: 4.85% x̄: 0.19% x̃: 0.10%
95% mean confidence interval for cycles value: -8.40 -5.05
95% mean confidence interval for cycles %-change: -0.22% -0.16%
Cycles are helped.

Reviewed-by: Elie Tournier <tournier.elie@gmail.com>
2019-03-01 12:43:25 -08:00
Ian Romanick
d40640efe8 nir/algebraic: Replace a bcsel of a b2f sources with a b2f(!(a || b))
I have not investigated the result of doing this during code
generation.  That should be possible, but it would be a bit more
effort.

All Gen6+ platforms had nearly identical results. (Skylake shown)
total cycles in shared programs: 370961508 -> 370961367 (<.01%)
cycles in affected programs: 5174 -> 5033 (-2.73%)
helped: 2
HURT: 0

Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8206587 -> 8206589 (<.01%)
instructions in affected programs: 1325 -> 1327 (0.15%)
helped: 0
HURT: 2

total cycles in shared programs: 187657422 -> 187657428 (<.01%)
cycles in affected programs: 11566 -> 11572 (0.05%)
helped: 0
HURT: 2

This change has almost no effect right now.  However, removing this
patch (but leaving the patch "intel/fs: Generate if instructions with
inverted conditions") after adding a patch that removes !(a < b) -> (a
>= b) optimizations (like
https://patchwork.freedesktop.org/patch/264787/) has the following
results on Skylake:

Skylake
total instructions in shared programs: 15071804 -> 15071806 (<.01%)
instructions in affected programs: 640 -> 642 (0.31%)
helped: 0
HURT: 2

total cycles in shared programs: 369914348 -> 369916569 (<.01%)
cycles in affected programs: 27900 -> 30121 (7.96%)
helped: 4
HURT: 15
helped stats (abs) min: 2 max: 112 x̄: 30.00 x̃: 3
helped stats (rel) min: 0.28% max: 12.28% x̄: 3.34% x̃: 0.40%
HURT stats (abs)   min: 2 max: 758 x̄: 156.07 x̃: 81
HURT stats (rel)   min: 0.20% max: 74.30% x̄: 16.29% x̃: 16.91%
95% mean confidence interval for cycles value: 12.68 221.11
95% mean confidence interval for cycles %-change: 3.09% 21.23%
Cycles are HURT.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-01 12:42:14 -08:00
Ian Romanick
eae19f5f19 nir/algebraic: Replace i2b used by bcsel or if-statement with comparison
All of the helped shaders are in Deus Ex.  I looked at a couple shaders,
and they have a pattern like:

    vec1 32 ssa_373 = i2b32 ssa_345.w
    vec1 32 ssa_374 = bcsel ssa_373, ssa_20, ssa_0
    ...
    vec1 32 ssa_377 = ine ssa_345.w, ssa_0
    if ssa_377 {
        ...
        vec1 32 ssa_416 = i2b32 ssa_385.w
        vec1 32 ssa_417 = bcsel ssa_416, ssa_386, ssa_374
        ...
    }

The massive help occurs because the i2b32 is removed, then other passes
determine that ssa_374 must be ssa_20 inside the if-statement allowing
the first bcsel to also be deleted.

v2: Rebase on 1-bit Boolean changes.

v3: Fix i2b32 vs ine problem in if-statement replacement.  Noticed by
Bas.

Skylake
total instructions in shared programs: 15241394 -> 15186287 (-0.36%)
instructions in affected programs: 890583 -> 835476 (-6.19%)
helped: 355
HURT: 0
helped stats (abs) min: 1 max: 497 x̄: 155.23 x̃: 149
helped stats (rel) min: 0.09% max: 16.49% x̄: 6.10% x̃: 6.59%
95% mean confidence interval for instructions value: -165.07 -145.39
95% mean confidence interval for instructions %-change: -6.42% -5.77%
Instructions are helped.

total cycles in shared programs: 373846583 -> 371023357 (-0.76%)
cycles in affected programs: 118972102 -> 116148876 (-2.37%)
helped: 343
HURT: 14
helped stats (abs) min: 45 max: 118284 x̄: 8332.32 x̃: 6089
helped stats (rel) min: 0.03% max: 38.19% x̄: 2.48% x̃: 1.77%
HURT stats (abs)   min: 120 max: 4126 x̄: 2482.79 x̃: 3019
HURT stats (rel)   min: 0.16% max: 17.37% x̄: 2.13% x̃: 1.11%
95% mean confidence interval for cycles value: -8723.28 -7093.12
95% mean confidence interval for cycles %-change: -2.57% -2.02%
Cycles are helped.

total spills in shared programs: 32401 -> 23465 (-27.58%)
spills in affected programs: 24457 -> 15521 (-36.54%)
helped: 343
HURT: 0

total fills in shared programs: 37866 -> 31765 (-16.11%)
fills in affected programs: 18889 -> 12788 (-32.30%)
helped: 343
HURT: 0

Broadwell and Haswell had similar results. (Haswell shown)
Haswell
total instructions in shared programs: 13764783 -> 13750679 (-0.10%)
instructions in affected programs: 1176256 -> 1162152 (-1.20%)
helped: 334
HURT: 21
helped stats (abs) min: 1 max: 358 x̄: 42.59 x̃: 47
helped stats (rel) min: 0.09% max: 11.81% x̄: 1.30% x̃: 1.37%
HURT stats (abs)   min: 1 max: 61 x̄: 5.76 x̃: 1
HURT stats (rel)   min: 0.03% max: 1.84% x̄: 0.17% x̃: 0.03%
95% mean confidence interval for instructions value: -43.99 -35.47
95% mean confidence interval for instructions %-change: -1.35% -1.08%
Instructions are helped.

total cycles in shared programs: 386511910 -> 385402528 (-0.29%)
cycles in affected programs: 143831110 -> 142721728 (-0.77%)
helped: 327
HURT: 39
helped stats (abs) min: 16 max: 25219 x̄: 3519.74 x̃: 3570
helped stats (rel) min: <.01% max: 10.26% x̄: 0.95% x̃: 0.96%
HURT stats (abs)   min: 16 max: 4881 x̄: 1065.95 x̃: 997
HURT stats (rel)   min: <.01% max: 16.67% x̄: 0.70% x̃: 0.24%
95% mean confidence interval for cycles value: -3375.59 -2686.60
95% mean confidence interval for cycles %-change: -0.92% -0.64%
Cycles are helped.

total spills in shared programs: 100480 -> 97846 (-2.62%)
spills in affected programs: 84702 -> 82068 (-3.11%)
helped: 316
HURT: 21

total fills in shared programs: 96877 -> 94369 (-2.59%)
fills in affected programs: 69167 -> 66659 (-3.63%)
helped: 316
HURT: 9

No changes on Ivy Bridge or earlier platforms.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-01 12:42:14 -08:00
Caio Marcelo de Oliveira Filho
1458aa1f78 nir/copy_prop_vars: handle indirect vector elements
Differently than the direct case, the indirect array derefs of vector
are handled like regular derefs, with the exception that we ignore any
vector entry that has SSA values when performing a load.  Such SSA
values don't help loading of the indirect unless we emit an if-ladder.

Copy_derefs are supported for indirects.

Also enable two tests that now pass.

v2: Remove unnecessary temporaries.  Be clearer when identifying the
    case where copy_entry doesn't help when we are dealing with an
    indirect array_deref (of a vector).  (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-28 23:55:31 -08:00
Caio Marcelo de Oliveira Filho
6c0de78cc2 nir/copy_prop_vars: prefer using entries from equal derefs
When looking up an entry to use, always prefer an equal match, as it
more likely to contain reusable SSA or derefs to propagate.

This will be necessary when adding entries with array derefs of
vectors, because we don't want the vector if the equal entry (an array
deref of that vector) is present.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-28 23:55:31 -08:00
Caio Marcelo de Oliveira Filho
61965afd00 nir/copy_prop_vars: add tests for indirect array deref
Both on an actual array and on a vector, and an extra test on a vector
mixing direct and indirect access.  The vector tests are disabled and
will be enabled by a later commit.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-28 23:55:31 -08:00
Caio Marcelo de Oliveira Filho
96c32d7776 nir/copy_prop_vars: handle load/store of vector elements
When direct array deref is used on a vector type (for loads and
stores), copy_prop_vars is now smart to propagate values it knows
about.

Given a 'vec4 v', storing to v[3] will update the copy entry for v and
it is equivalent to a write to v.w.  Loading from v[1] will try first
to see if there's a known value for v.y -- and drop the load in that
case.

The copy entries still always refer to the entire vectors, so the
operations happen on the parent deref (the 'vector') and the values
are fixed accordingly.

It might be the case now that certain entries have not only different
SSA defs in each element but also those come from different components
than they are set to, because stores to individual elements always
come from a SSA definition with a single component.

Tests related to these cases are now enabled.

v2: Instead of asserting on invalid indices, "load" an undef and
    remove the store.  (Jason)

v3: Merge code path for the cases of is_array_deref_of_vector into the
    regular code path.  Add a base_index parameter to
    value_set_from_value.  (code changes by Jason)

v4: Removed the get_entry_for_deref helper, now being used only once.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-28 23:50:05 -08:00
Caio Marcelo de Oliveira Filho
33dafdc024 nir/copy_prop_vars: use NIR_MAX_VEC_COMPONENTS
Also replace uses of 0xf with the appropriate full mask created from
the number of components.

Note that an increase of MAX might make us change how the data is
stored later on, but for now at least we make sure the pass is not
hardcoded.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-28 23:50:05 -08:00
Caio Marcelo de Oliveira Filho
e84c841fb0 nir/copy_prop_vars: rename/refactor store_to_entry helper
The name reflected this function role back when the pass also did dead
write elimination.  So rename it to what it does now, which is setting
a value using another value; and narrow the argument list.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-28 23:50:05 -08:00
Gert Wollny
b7201a468d nir: Add posibility to not lower to source mod 'abs' for ops with three sources
This is useful for r600 since there the abs source modifier is not supported
for ops with three sources

v2: Use correct logic to enable lowering to abs source mod (Eric Anhold)

Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-02-27 11:04:06 +00:00
Kasireddy, Vivek
78fb3fd17e nir/lower_tex: Add support for XYUV lowering
The memory layout associated with this format would be:
Byte:      0 1 2 3
Component: V U Y X

Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-02-26 13:08:51 +00:00
Tapani Pälli
22267feff1 nir: initialize value in copy_prop_vars_block
Fixes following valgrind warning:

   ==27561== Conditional jump or move depends on uninitialised value(s)
   ==27561==    at 0x667856B: value_set_ssa_components (nir_opt_copy_prop_vars.c:78)
   ==27561==    by 0x667A1C4: copy_prop_vars_block (nir_opt_copy_prop_vars.c:797)

Fixes: 62332d139c "nir: Add a local variable-based copy propagation pass"
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-02-26 08:56:25 +02:00
Eric Anholt
7c1bf075f3 nir: Just return when asked to rewrite uses of an SSA def to itself.
The nir_builder swizzling improvement to not emit extra MOVs resulted in
nir_lower_tex() trying to rewrite an SSA def to itself, triggering the
assert on all texturing in v3d.  There's no work to be done in this case,
so just stop asserting.

Fixes: 743700be1f ("nir/builder: Don't emit no-op swizzles")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-25 21:25:24 +00:00
Daniel Schürmann
0bd45f96b9 nir: Use SM5 properties to optimize shift(a@32, iand(31, b))
This is a common pattern from HLSL->SPIRV translation
and supported in HW by all current NIR backends.

vkpipeline-db results anv (SKL):

    total instructions in shared programs: 6403130 -> 6402380 (-0.01%)
    instructions in affected programs: 204084 -> 203334 (-0.37%)
    helped: 208
    HURT: 0

    total cycles in shared programs: 1915629582 -> 1918198408 (0.13%)
    cycles in affected programs: 1158892682 -> 1161461508 (0.22%)
    helped: 107
    HURT: 86

shader-db results on i965 (KBL):

    total instructions in shared programs: 15284592 -> 15284568 (<.01%)
    instructions in affected programs: 81683 -> 81659 (-0.03%)
    helped: 24
    HURT: 0

    total cycles in shared programs: 375013622 -> 375013932 (<.01%)
    cycles in affected programs: 40169618 -> 40169928 (<.01%)
    helped: 13
    HURT: 9

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-25 12:59:44 -06:00
Daniel Schürmann
0525bdc225 nir: Define shifts according to SM5 specification.
SPIR-V shifts are undefined for values >= bitsize, but SM5 shifts
are defined to only use the least significant bits.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-25 12:59:43 -06:00
Jason Ekstrand
743700be1f nir/builder: Don't emit no-op swizzles
The nir_swizzle helper is used some on it's own but it's also called by
nir_channel and nir_channels which are used everywhere.  It's pretty
quick to check while we're walking the swizzle anyway whether or not
it's an identity swizzle.  If it is, we now don't bother emitting the
instruction.  Sure, copy-prop will clean it up for us but there's no
sense making more work for the optimizer than we have to.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-02-24 20:01:27 -06:00
Jason Ekstrand
724371c6b9 nir/split_vars: Don't compact vectors unnecessarily
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2019-02-24 20:01:18 -06:00
Caio Marcelo de Oliveira Filho
4c160b6bd8 nir: fix MSVC build
Zero initialize struct with {0} instead of {}.
2019-02-22 22:38:05 -08:00