Commit graph

3453 commits

Author SHA1 Message Date
Jason Ekstrand
35b8f6f40b nir: Add a new pass to lower array dereferences on vectors
This pass was originally written for lowering TCS output reads and
writes but it is also applicable just about anything including UBOs,
SSBOs, and shared variables.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-03-15 23:10:27 -05:00
Jason Ekstrand
fe9a6c0f14 nir/builder: Add a vector extract helper
This one's a tiny bit better than what we had in spirv_to_nir because it
emits a binary tree rather than a linear walk.  It also doesn't leave
around unneeded bcsel instructions for a constant index and returns an
undef for constant OOB access.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-03-15 23:10:26 -05:00
Alejandro Piñeiro
34b3b92bbe nir/xfb: move varyings info out of nir_xfb_info
When varyings was added we moved to use to dynamycally allocated
pointers, instead of allocating just one block for everything. That
breaks some assumptions of some vulkan drivers (like anv), that make
serialization and copying easier. And at the same time, varyings are
not needed for vulkan.

So this commit moves them out. Although it seems a little an overkill,
fixing the anv side would require a similar, or more, changes, so in
the end it is about to decide where do we want to put our effort.

v2: (from Jason review)
  * Don't use a temp variable on the _create methods, just return
    result of rzalloc_size
  * Wrap some lines too long.

Fixes: cf0b2ad486 ("nir/xfb: adding varyings on nir_xfb_info and gather_info")

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-15 11:59:32 +01:00
Jason Ekstrand
810dde2a6b glsl/nir: Add a pass to lower UBO and SSBO access
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-03-15 01:02:19 +00:00
Jason Ekstrand
77e5ec394e glsl/nir: Handle unlowered SSBO atomic and array_length intrinsics
We didn't have any of these before because all NIR consumers always
called lower_ubo_references.  Soon, we want to pass the derefs straight
through to NIR so we need to handle these intrinsics directly.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-03-15 01:02:19 +00:00
Jason Ekstrand
76ba225184 glsl/nir: Set explicit types on UBO/SSBO variables
We want to be able to use variables and derefs for UBO/SSBO access in
NIR.  In order to do this, the rest of NIR needs to know the type layout
information.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-03-15 01:02:19 +00:00
Jason Ekstrand
8f3ab8aa78 glsl: Don't lower vector derefs for SSBOs, UBOs, and shared
All of these are backed by some sort of memory so if you have multiple
threads writing to different components of the same vector at the same
time, the load-vec-store pattern that GLSL IR emits won't work.  This
shouldn't affect any drivers today as they all call GLSL IR lowering
which lowers access to these variables to index+offset intrinsics before
we get to this point.  However, NIR will start handling the derefs
itself and won't want the lowering.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-03-15 01:02:19 +00:00
Jason Ekstrand
3c11fc7654 nir/lower_io: Add a new buffer_array_length intrinsic and lowering
Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-03-15 01:02:19 +00:00
Jason Ekstrand
c8d42c8cf6 nir: Rename nir_address_format_vk_index_offset to not be vk
It's just a 32-bit index and offset.  We're going to want to use it in
GL as well so stop talking about Vulkan.

Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-03-15 01:02:19 +00:00
Jason Ekstrand
60af3a93e9 nir/deref: Consider COHERENT decorated var derefs as aliasing
If we get to two deref_var paths with different variables, we usually
know they don't alias.  However, if both of the paths are marked
coherent, we don't have to worry about it.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-03-15 01:02:19 +00:00
Jason Ekstrand
8b073832ff compiler/types: Add helpers to get explicit types for standard layouts
We also need to modify the current size/align helpers to not blow up
when they encounter an explicitly laid out type.  Previously we
considered using the size/align helpers mutually exclusive with standard
layouts but now we just assert that they match.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-03-15 01:02:19 +00:00
Jason Ekstrand
5b2b144566 compiler/types: Add a C wrapper to get full struct field data
Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-03-15 01:02:19 +00:00
Jason Ekstrand
ef4ca44780 compiler/types: Add a new is_interface C wrapper
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-03-15 01:02:19 +00:00
Jason Ekstrand
b315f6f82b nir/validate: Allow 32-bit boolean load/store intrinsics
With UBOs and SSBOs we have boolean types but they're actually 32-bit
values.  Make the validator a little less strict so that we can do a
32-bit load/store on boolean types.  We're about to add a lowering pass
called gl_nir_lower_buffers which will lower boolean load/store
operations to 32-bit and insert i2b and b2i instructions to convert
to/from 1-bit booleans.  We want that to be legal.

Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-03-15 01:02:19 +00:00
Jason Ekstrand
5d26f2d3d5 nir/validate: Only require bare types to match for copy_deref
If we want to be able to use copy_deref instructions on explicitly laid
out types, we have to be a little more flexible about what types we
allow.  Instead, of requiring the types to exactly match, only require
the bare types to match.

Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-03-15 01:02:19 +00:00
Jason Ekstrand
2b76de9b5d nir/algebraic: Add a couple optimizations for iabs and ishr
Shader-db results on Kaby Lake:

    total instructions in shared programs: 15225213 -> 15222365 (-0.02%)
    instructions in affected programs: 43524 -> 40676 (-6.54%)
    helped: 203
    HURT: 0

Lots of shaders in Shadow Warrior had this pattern along with Deus Ex,
Civ, Shadow of Mordor, and several others.

Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org>
2019-03-15 01:02:19 +00:00
Eduardo Lima Mitev
6ff50a488a nir: Add ir3-specific version of most SSBO intrinsics
These are ir3 specific versions of SSBO intrinsics that add an
extra source to hold the element offset (dword), which is what the
backend instructions need.

The original byte-offset source provided by NIR is not replaced
because on a4xx and a5xx the backend still needs it.

Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-03-13 21:19:44 +01:00
Caio Marcelo de Oliveira Filho
822a8865e4 nir: Add a pass to combine store_derefs to same vector
v2: (all from Jason)
    Reuse existing function for the end of the block combinations.
    Check the SSA values are coming from the right place in tests.
    Document the case when the store to array_deref is reused.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-13 08:39:16 -07:00
Jason Ekstrand
bd17bdc56b glsl/lower_vector_derefs: Don't use a temporary for TCS outputs
Tessellation control shader outputs act as if they have memory backing
them and you can have multiple writes to different components of the
same vector in-flight at the same time.  When this happens, the load vec
store pattern that gets used by ir_triop_vector_insert doesn't yield the
correct results.  Instead, just emit a sequence of conditional
assignments.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: mesa-stable@lists.freedesktop.org
2019-03-13 02:10:31 +00:00
Jason Ekstrand
20c4578c55 glsl/list: Add a list variant of insert_after
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-03-13 02:10:31 +00:00
Jason Ekstrand
83fdefc062 nir/loop_unroll: Fix out-of-bounds access handling
The previous code was completely broken when it came to constructing the
undef values.  I'm not sure how it ever worked.  For the case of a copy
that reads an undefined value, we can just delete the copy because the
destination is a valid undefined value.  This saves us the effort of
trying to construct a value for an arbitrary copy_deref intrinsic.

Fixes: e8a8937a04 "nir: add partial loop unrolling support"
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-03-12 21:06:39 -05:00
Jason Ekstrand
5ef2b8f1f2 nir: Add a pass for lowering IO back to vector when possible
This pass tries to turn scalar and array-of-scalar IO variables into
vector IO variables whenever possible.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Cc: "19.0" <mesa-stable@lists.freedesktop.org>
2019-03-12 15:34:06 +00:00
Connor Abbott
5b2ec9c81e nir: Add a stripping pass for improved cacheability
Oftentimes various nir shaders after lowering will be the same, or
almost the same. For example, this can happen when the same shader is
linked with different shaders to form different pipelines and
cross-stage optimizations don't kick in to change it. We want to avoid
running the backend twice on these shaders. We were already doing this
with radeonsi, but we were storing a few extra pieces of information
that made this much less effective compared to TGSI. The worse offender
by far was the program name, which caused most of the cache misses. This
pass strips out these pieces of information, controlled by the NIR_STRIP
debug env variable.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-12 10:49:48 +01:00
Brian Paul
02c2863df5 nir: silence a couple new compiler warnings
[33/630] Compiling C object 'src/compiler/nir/nir@sta/nir_loop_analyze.c.o'.
../src/compiler/nir/nir_loop_analyze.c: In function ‘try_find_trip_count_vars_in_iand’:
../src/compiler/nir/nir_loop_analyze.c:846:29: warning: suggest parentheses around ‘&&’ within ‘||’ [-Wparentheses]
    if (*ind == NULL || *ind && (*ind)->type != basic_induction ||
                             ^
[85/630] Compiling C object 'src/compiler/nir/nir@sta/nir_opt_loop_unroll.c.o'.
../src/compiler/nir/nir_opt_loop_unroll.c: In function ‘complex_unroll_single_terminator’:
../src/compiler/nir/nir_opt_loop_unroll.c:494:17: warning: unused variable ‘unroll_loc’ [-Wunused-variable]
    nir_cf_node *unroll_loc =
                 ^
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-03-12 14:34:51 +11:00
Timothy Arceri
3235a942c1 nir: find induction/limit vars in iand instructions
This will be used to help find the trip count of loops that look
like the following:

   while (a < x && i < 8) {
      ...
      i++;
   }

Where the NIR will end up looking something like this:

   vec1 32 ssa_1 = load_const (0x00000004 /* 0.000000 */)
   loop {
      ...
      vec1 1 ssa_12 = ilt ssa_225, ssa_11
      vec1 1 ssa_17 = ilt ssa_226, ssa_1
      vec1 1 ssa_18 = iand ssa_12, ssa_17
      vec1 1 ssa_19 = inot ssa_18

      if ssa_19 {
         ...
         break
      } else {
         ...
      }
   }

On RADV this unrolls a bunch of loops in F1-2017 shaders.

Totals from affected shaders:
SGPRS: 4112 -> 4136 (0.58 %)
VGPRS: 4132 -> 4052 (-1.94 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 515444 -> 587720 (14.02 %) bytes
LDS: 2 -> 2 (0.00 %) blocks
Max Waves: 194 -> 196 (1.03 %)
Wait states: 0 -> 0 (0.00 %)

It also unrolls a couple of loops in shader-db on radeonsi.

Totals from affected shaders:
SGPRS: 128 -> 128 (0.00 %)
VGPRS: 64 -> 64 (0.00 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 6880 -> 9504 (38.14 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 16 -> 16 (0.00 %)
Wait states: 0 -> 0 (0.00 %)

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-03-12 00:52:30 +00:00
Timothy Arceri
67c3478482 nir: pass nir_op to calculate_iterations()
Rather than getting this from the alu instruction this allows us
some flexibility. In the following pass we instead pass the
inverse op.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-03-12 00:52:30 +00:00
Timothy Arceri
11e8f8a166 nir: add get_induction_and_limit_vars() helper to loop analysis
This helps make find_trip_count() a little easier to follow but
will also be used by a following patch.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-03-12 00:52:30 +00:00
Timothy Arceri
f219f6114d nir: add helper to return inversion op of a comparison
This will be used to help find the trip count of loops that look
like the following:

   while (a < x && i < 8) {
      ...
      i++;
   }

Where the NIR will end up looking something like this:

   vec1 32 ssa_1 = load_const (0x00000004 /* 0.000000 */)
   loop {
      ...
      vec1 1 ssa_12 = ilt ssa_225, ssa_11
      vec1 1 ssa_17 = ilt ssa_226, ssa_1
      vec1 1 ssa_18 = iand ssa_12, ssa_17
      vec1 1 ssa_19 = inot ssa_18

      if ssa_19 {
         ...
         break
      } else {
         ...
      }
   }

So in order to find the trip count we need to find the inverse of
ilt.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-03-12 00:52:30 +00:00
Timothy Arceri
090feaacdc nir: simplify the loop analysis trip count code a little
Here we create a helper is_supported_terminator_condition()
and use that rather than embedding all the trip count code
inside a switch.

The new helper will also be used in a following patch.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-03-12 00:52:30 +00:00
Timothy Arceri
7571de8eaa nir: unroll some loops with a variable limit
For some loops can have a single terminator but the exact trip
count is still unknown. For example:

   for (int i = 0; i < imin(x, 4); i++)
      ...

Shader-db results radeonsi (all affected are from Tropico 5):

Totals from affected shaders:
SGPRS: 144 -> 152 (5.56 %)
VGPRS: 124 -> 108 (-12.90 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 5180 -> 6640 (28.19 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 17 -> 21 (23.53 %)
Wait states: 0 -> 0 (0.00 %)

Shader-db results i965 (SKL):

total loops in shared programs: 3808 -> 3802 (-0.16%)
loops in affected programs: 6 -> 0
helped: 6
HURT: 0

vkpipeline-db results RADV (Unrolls some Skyrim VR shaders):

Totals from affected shaders:
SGPRS: 304 -> 304 (0.00 %)
VGPRS: 296 -> 292 (-1.35 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 15756 -> 25884 (64.28 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 29 -> 29 (0.00 %)
Wait states: 0 -> 0 (0.00 %)

v2: fix bug where last iteration would get optimised away by
    mistake.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-03-12 00:52:30 +00:00
Timothy Arceri
68ce0ec222 nir: calculate trip count for more loops
This adds support to loop analysis for loops where the induction
variable is compared to the result of min(variable, constant).

For example:

   for (int i = 0; i < imin(x, 4); i++)
      ...

We add a new bool to the loop terminator struct in order to
differentiate terminators with this exit condition.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-03-12 00:52:30 +00:00
Timothy Arceri
e8a8937a04 nir: add partial loop unrolling support
This adds partial loop unrolling support and makes use of a
guessed trip count based on array access.

The code is written so that we could use partial unrolling
more generally, but for now it's only use when we have guessed
the trip count.

We use partial unrolling for this guessed trip count because its
possible any out of bounds array access doesn't otherwise affect
the shader e.g the stores/loads to/from the array are unused. So
we insert a copy of the loop in the innermost continue branch of
the unrolled loop. Later on its possible for nir_opt_dead_cf()
to then remove the loop in some cases.

A Renderdoc capture from the Rise of the Tomb Raider benchmark,
reports the following change in an affected compute shader:

GPU duration: 350 -> 325 microseconds

shader-db results radeonsi VEGA (NIR backend):

SGPRS: 1008 -> 816 (-19.05 %)
VGPRS: 684 -> 432 (-36.84 %)
Spilled SGPRs: 539 -> 0 (-100.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 39708 -> 45812 (15.37 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 105 -> 144 (37.14 %)
Wait states: 0 -> 0 (0.00 %)

shader-db results i965 SKL:

total instructions in shared programs: 13098265 -> 13103359 (0.04%)
instructions in affected programs: 5126 -> 10220 (99.38%)
helped: 0
HURT: 21

total cycles in shared programs: 332039949 -> 331985622 (-0.02%)
cycles in affected programs: 289252 -> 234925 (-18.78%)
helped: 12
HURT: 9

vkpipeline-db results VEGA:

Totals from affected shaders:
SGPRS: 184 -> 184 (0.00 %)
VGPRS: 448 -> 448 (0.00 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 26076 -> 24428 (-6.32 %) bytes
LDS: 6 -> 6 (0.00 %) blocks
Max Waves: 5 -> 5 (0.00 %)
Wait states: 0 -> 0 (0.00 %)

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-03-12 00:52:30 +00:00
Timothy Arceri
fba5d275db nir: add new partially_unrolled bool to nir_loop
In order to stop continuously partially unrolling the same loop
we add the bool partially_unrolled to nir_loop, we add it here
rather than in nir_loop_info because nir_loop_info is only set
via loop analysis and is intended to be cleared before each
analysis. Also nir_loop_info is never cloned.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-03-12 00:52:30 +00:00
Timothy Arceri
03a452b7d0 nir: add guess trip count support to loop analysis
This detects an induction variable used as an array index to guess
the trip count of the loop. This enables us to do a partial
unroll of the loop, which can eventually result in the loop being
eliminated.

v2: check if the induction var is used to index more than a single
    array and if so get the size of the smallest array.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-03-12 00:52:30 +00:00
Xavier Bouchoux
c5236fc6e2 nir/spirv: Fix assert when unsampled OpTypeImage has unknown 'Depth'
'dxc' hlsl-to-spirv compiler appears to emit 2 (Unknown) in the depth field,
when the image is not sampled and the value is not needed.

Previously, shaders failed with:

SPIR-V parsing FAILED:
    In file ../src/compiler/spirv/spirv_to_nir.c:1412
    !is_shadow
    632 bytes into the SPIR-V binary

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-11 23:28:39 +01:00
Connor Abbott
d086d16b81 nir/serialize: Prevent writing uninitialized state_slot data
The nir_state_slot struct had some padding that was never initialized.
Serializing the individual parts of the struct is more robust and avoids
the overhead of zeroing it at creation, so just do that.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-11 15:17:41 +01:00
Kenneth Graunke
da51e3f1b0 Revert MR 369 (Fix extract_i8 and extract_u8 for 64-bit integers)
This broke piles of image load store tests (179 failures on CI,
mesa_master build #15546, previous build right before this landed
was green).  I'd rather not leave the tree on fire over the weekend,
so let's revert for now, and we can figure out what happened next week.
2019-03-09 01:42:16 -08:00
Ian Romanick
18e4bf65de nir/algebraic: Add missing 16-bit extract_[iu]8 patterns
No shader-db changes on any Intel platform.

v2: Use a loop to generate patterns.  Suggested by Jason.

Reviewed-by: Matt Turner <mattst88@gmail.com> [v1]
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-08 22:24:19 -08:00
Ian Romanick
55c1ac4b75 nir/algebraic: Add missing 64-bit extract_[iu]8 patterns
No shader-db changes on any Intel platform.

v2: Use a loop to generate patterns.  Suggested by Jason.

Reviewed-by: Matt Turner <mattst88@gmail.com> [v1]
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-08 22:24:19 -08:00
Ian Romanick
9aaaac6080 nir/algebraic: Remove redundant extract_[iu]8 patterns
No shader-db changes on any Intel platform.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-08 22:24:19 -08:00
Ian Romanick
37ee462e03 nir/algebraic: Fix up extract_[iu]8 after loop unrolling
Skylake, Broadwell, and Haswell had similar results. (Skylake shown)
total instructions in shared programs: 15256840 -> 15256837 (<.01%)
instructions in affected programs: 4713 -> 4710 (-0.06%)
helped: 3
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.06% max: 0.08% x̄: 0.06% x̃: 0.06%

total cycles in shared programs: 372286583 -> 372286583 (0.00%)
cycles in affected programs: 198516 -> 198516 (0.00%)
helped: 1
HURT: 1
helped stats (abs) min: 10 max: 10 x̄: 10.00 x̃: 10
helped stats (rel) min: <.01% max: <.01% x̄: <.01% x̃: <.01%
HURT stats (abs)   min: 10 max: 10 x̄: 10.00 x̃: 10
HURT stats (rel)   min: 0.01% max: 0.01% x̄: 0.01% x̃: 0.01%

No changes on any other Intel platform.

v2: Use a loop to generate patterns.  Suggested by Jason.

Reviewed-by: Matt Turner <mattst88@gmail.com> [v1]
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-08 22:24:19 -08:00
Alejandro Piñeiro
686b7b1d48 nir/linker: fix ARRAY_SIZE query with xfb varyings
For a non-array varying, it is expecting ARRAY_SIZE as 1, instead of 0.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-03-08 15:00:50 +01:00
Antia Puentes
de31fb2f4f nir/linker: Fix TRANSFORM_FEEDBACK_BUFFER_INDEX
From the ARB_enhanced_layouts specification:

  "For the property TRANSFORM_FEEDBACK_BUFFER_INDEX, a single integer
   identifying the index of the active transform feedback buffer
   associated with an active variable is written to <params>.  For
   variables corresponding to the special names "gl_NextBuffer",
   "gl_SkipComponents1", "gl_SkipComponents2", "gl_SkipComponents3",
   and "gl_SkipComponents4", -1 is written to <params>."

We were storing the xfb_buffer value, instead of the value
corresponding to GL_TRANSFORM_FEEDBACK_BUFFER_INDEX.

Note that the implementation assumes that varyings would be sorted by
offset and buffer.

Signed-off-by: Antia Puentes <apuentes@igalia.com>
Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com>

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-03-08 15:00:50 +01:00
Alejandro Piñeiro
7c0f411c27 nir/linker: use nir_gather_xfb_info
Instead of a custom ARB_gl_spirv xfb gather info pass.

In fact, this is not only about reusing code, but the current custom
code was not handling properly how many varyings are enumerated from
some complex types. So this change is also about fixing some corner
cases.

v2: Use util_bitcount, simplify current stage check (Kenneth)

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-03-08 15:00:50 +01:00
Alejandro Piñeiro
b2a212ac2e nir/xfb: handle arrays and AoA of basic types
On OpenGL, a array of a simple type adds just one varying. So
gl_transform_feedback_varying_info struct defined at mtypes.h includes
the parameters Type (base_type) and Size (number of elements).

This commit checks this when the recursive add_var_xfb_outputs call
handles arrays, to ensure that just one is addded.

We also need to take into account AoA here

v2: use glsl_type_is_leaf from nir_types (Timothy Arceri)

v3: simplified aoa check, without the need ot using glsl_type_is_leaf,
    using glsl_types_is_struct (Timothy Arceri)

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-03-08 15:00:50 +01:00
Alejandro Piñeiro
2b65fecd85 nir_types: add glsl_type_is_struct helper
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-03-08 15:00:50 +01:00
Alejandro Piñeiro
8d693746e9 nir/xfb: sort varyings too
Right now we are only re-sorting outputs. But it is better to sort too
varyings, as linker expect them to be sorted out (as it was done on
GLSL). For varyings, and to make easier to compute buffer_index, we
sort also by buffer. We could do the same for outputs, but we lack a
reason for that, so we left it as it is (just offset).

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-03-08 15:00:50 +01:00
Alejandro Piñeiro
cf0b2ad486 nir/xfb: adding varyings on nir_xfb_info and gather_info
In order to be used for OpenGL (right now for ARB_gl_spirv).

This commit adds two new structures:

  * nir_xfb_varying_info: that identifies each individual varying. For
    each one, we need to know the type, buffer and xfb_offset

  * nir_xfb_buffer_info: as now for each buffer, in addition to the
    stride, we need to know how many varyings are assigned to it.

For this patch, the only case where num_outputs != num_varyings is
with the case of doubles, that for dvec3/4 could require more than one
output. There are more cases though (like aoa), that will be handled
on following patches.

v2: updated after new nir general XFB support introduced for "anv: Add
    support for VK_EXT_transform_feedback"

v3: compute num_varyings beforehand for allocating, instead of relying
    on num_outputs as approximate value (Timothy Arceri)

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-03-08 15:00:50 +01:00
Alejandro Piñeiro
9f68b9ac71 nir_types: add glsl_varying_count helper
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-03-08 15:00:50 +01:00
Alejandro Piñeiro
b62a8149ab nir/xfb: add component_offset at nir_xfb_info
Where component_offset here is the offset when accessing components of
a packed variable. Or in other words, location_frac on
nir.h. Different places of mesa use different names for it.

Technically nir_xfb_info consumer can get the same from the
component_mask, it seems somewhat forced to make it to compute it,
instead of providing it.

v2: rename local location_frac for comp_offset, more similar to the
intended use (Timothy Arceri)

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-03-08 15:00:50 +01:00