Commit graph

3545 commits

Author SHA1 Message Date
Timothy Arceri
035759b61b nir/i965/freedreno/vc4: add a bindless bool to type size functions
This required to calculate sizes correctly when we have bindless
samplers/images.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-04-12 09:02:59 +02:00
Karol Herbst
3b2a9ffd60 nir: move brw_nir_rewrite_image_intrinsic into common code
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-04-12 09:02:59 +02:00
Timothy Arceri
9e3740c47f nir: initialise some variables in opt_if_loop_last_continue()
Fixes a couple of Coverity warnings CID 1444626.

Fixes: e30804c602 ("nir/radv: remove restrictions on opt_if_loop_last_continue()")

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2019-04-11 20:38:03 +10:00
Juan A. Suarez Romero
83f1b0e95b nir/xfb: do not use bare interface type
In commit 3b3653c4cf we decided not to use bare types; hence do not use
bare type when comparing with interface type to find out if the xfb
variable is an array block.

This fixes dEQP-VK.transform_feedback.* tests.

Fixes: 3b3653c4cf ("nir/spirv: don't use bare types, remove assert in
                     split vars for testing")
CC: Dave Airlie <airlied@redhat.com>
CC: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-04-11 11:52:45 +02:00
Marek Olšák
53f715fafb Revert "glsl: fix shader_storage_blocks_write_access for SSBO block arrays"
This reverts commit b7ca074cc0.

It broke a lot of tests.
2019-04-10 10:48:56 -04:00
Karol Herbst
0c4706563a glsl/standalone: add GLES3.1 and GLES3.2 compatibility
also set some constants for SSBOs.

With that it can compile the shader from:
dEQP-GLES31.functional.ssbo.layout.random.all_per_block_buffers.18

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-04-10 16:16:36 +02:00
Bas Nieuwenhuizen
282bacab4a nir: Add access qualifiers on load_ubo intrinsic.
Otherwise nir_lower_non_uniform_access crashes when it tries
to get the access of a load_ubo.

Fixes: 8ed583fe52 "spirv: Handle the NonUniformEXT decoration"
Fixes: e50ab2c0f2 "nir: Add access flags to deref and SSBO atomics"
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-04-10 02:04:04 +02:00
Marek Olšák
b7ca074cc0 glsl: fix shader_storage_blocks_write_access for SSBO block arrays
CTS: GL45-CTS.compute_shader.resources-max

Fixes: 4e1e8f684b "glsl: remember which SSBOs are not read-only and pass it to gallium"

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-04-09 19:25:35 -04:00
Andres Gomez
75a3dd97aa glsl/linker: location aliasing requires types to have the same width
From the OpenGL 4.60.5 spec, section 4.4.1 Input Layout Qualifiers,
Page 67, (Location aliasing):

  " Further, when location aliasing, the aliases sharing the location
    must have the same underlying numerical type and bit
    width (floating-point or integer, 32-bit versus 64-bit, etc.) and
    the same auxiliary storage and interpolation qualification."

Additionally, we have improved the linker error descriptions.
Specifically, when taking structs into account we were producing a
linker error because we assumed that all components in each location
were used and that would cause component aliasing. This is not
accurate of the actual problem. Now, the failure specifies that the
underlying numerical type incompatibility is the cause for the
failure.

Fixes the following piglit test:

tests/spec/arb_enhanced_layouts/linker/component-layout/vs-to-fs-width-mismatch-double-float.shader_test

v2:
  - Do not assert if we see invalid numerical types. These come
    straight from shader code, so we should produce linker errors if
    shaders attempt to do location aliasing on variables that are not
    numerical such as records.
  - While we are at it, improve error reporting for the case of
    numerical type mismatch to include the shader stage.

v3:
  - Allow location aliasing of images and samplers. If we get these
    it means bindless support is active and they should be handled
    as 64-bit integers (Ilia)
  - Make sure we produce link errors for any non-numerical type
    for which we attempt location aliasing, not just structs.

v4:
  - Rebased with minor fixes (Andres).
  - Added fixing tag to the commit log (Andres).

v5:
  - Remove the helper function and check individually for the
    underlying numerical type and bit width (Timothy).
  - Implicitly, assume that any non-treated type which is checked for
    its underlying numerical type is either integer or
    float and has a defined bit width (Timothy).
  - Implicitly, assume that structs are the only non-treated
    non-numerical type (Timothy).
  - Improve the linker error descriptions and commit log (Andres).

Fixes: 13652e7516 ("glsl/linker: Fix type checks for location aliasing")
Cc: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: Timothy Arceri <tarceri@itsqueeze.com>
Cc: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-04-09 12:56:50 +02:00
Jason Ekstrand
6279074de1 nir: Get rid of global registers
We have a pass to lower global registers to locals and many drivers
dutifully call it.  However, no one ever creates a global register ever
so it's all dead code.  It's time we bury it.

Acked-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-04-09 00:29:36 -05:00
Jason Ekstrand
b28bad89b9 nir: Get rid of nir_register::is_packed
All we ever do is initialize it to zero, clone it, print it, and
validate it.  No one ever sets or uses it.

Acked-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-04-09 00:29:36 -05:00
Caio Marcelo de Oliveira Filho
bd73531677 spirv: Add support for DerivativeGroup capabilities
As defined in SPV_NV_compute_shader_derivatives. These control how the
invocations are arranged in a CS when doing derivative and related
operations (which are also enabled by the extension).

Since we expect valid SPIR-V, we don't need to do more work at SPIR-V
level to enable the derivative and related operations to be called.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-04-08 19:29:33 -07:00
Caio Marcelo de Oliveira Filho
fcbc5ccaae nir: Don't set LOD=0 for compute shader that has derivative group
When using NV_compute_shader_derivatives to set a derivative group,
a compute shader supports texture with implicit LOD calculation, so
don't set an explicit LOD.

Note if the extension is used but the derivative group is not
specified, it will default to LOD=0 as before.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-04-08 19:29:33 -07:00
Caio Marcelo de Oliveira Filho
d08a74d2bf nir/algebraic: Lower CS derivatives to zero when no group defined
In compute shaders if no derivative group is defined, the derivatives
will always be zero.  Specified in NV_compute_shader_derivatives.

To make the check more convenient, add a "info" local variable to the
generated code so we can refer to it in the Python rules.  (Jason)

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-04-08 19:29:32 -07:00
Caio Marcelo de Oliveira Filho
3c5ddaeacd glsl: Parse and propagate derivative_group to shader_info
NV_compute_shader_derivatives allow selecting between two possible
arrangements (quads and linear) when calculating derivatives and
certain subgroup operations in case of Vulkan.  So parse and propagate
those up to shader_info.h.

v2: Do not fail when ARB_compute_variable_group_size is being used,
    since we are still clarifying what is the right thing to do here.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-04-08 19:29:32 -07:00
Caio Marcelo de Oliveira Filho
ca60f0b7ba glsl: Enable texture builtins for NV_compute_shader_derivatives
Renamed a few predicates from "fs_only" to be "derivative_only" (or
similar pairs).

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-04-08 19:29:32 -07:00
Caio Marcelo de Oliveira Filho
09a3273fe7 glsl: Enable derivative builtins for NV_compute_shader_derivatives
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-04-08 19:29:32 -07:00
Caio Marcelo de Oliveira Filho
289478ea89 glsl: Remove redundant conditions when asserting in_qualifier
As the code evolved, we ended up with a redundant conditions.  Clean
this up.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-04-08 19:29:32 -07:00
Caio Marcelo de Oliveira Filho
163655b33e mesa: Extension boilerplate for NV_compute_shader_derivatives
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-04-08 19:29:32 -07:00
Timothy Arceri
e30804c602 nir/radv: remove restrictions on opt_if_loop_last_continue()
When I implemented opt_if_loop_last_continue() I had restricted
this pass from moving other if-statements inside the branch opposite
the continue. At the time it was causing a bunch of spilling in
shader-db for i965.

However Samuel Pitoiset noticed that making this pass more aggressive
significantly improved the performance of Doom on RADV. Below are
the statistics he gathered.

28717 shaders in 14931 tests
Totals:
SGPRS: 1267317 -> 1267549 (0.02 %)
VGPRS: 896876 -> 895920 (-0.11 %)
Spilled SGPRs: 24701 -> 26367 (6.74 %)
Code Size: 48379452 -> 48507880 (0.27 %) bytes
Max Waves: 241159 -> 241190 (0.01 %)

Totals from affected shaders:
SGPRS: 23584 -> 23816 (0.98 %)
VGPRS: 25908 -> 24952 (-3.69 %)
Spilled SGPRs: 503 -> 2169 (331.21 %)
Code Size: 2471392 -> 2599820 (5.20 %) bytes
Max Waves: 586 -> 617 (5.29 %)

The codesize increases is related to Wolfenstein II it seems largely
due to an increase in phis rather than the existing jumps.

This gives +10% FPS with Doom on my Vega56.

Rhys Perry also benchmarked Doom on his VEGA64:

Before: 72.53 FPS
After:  80.77 FPS

v2: disable pass on non-AMD drivers

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v1)
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-04-09 11:29:41 +10:00
Jason Ekstrand
50f3535d1f nir/search: Search for all combinations of commutative ops
Consider the following search expression and NIR sequence:

    ('iadd', ('imul', a, b), b)

    ssa_2 = imul ssa_0, ssa_1
    ssa_3 = iadd ssa_2, ssa_0

The current algorithm is greedy and, the moment the imul finds a match,
it commits those variable names and returns success.  In the above
example, it maps a -> ssa_0 and b -> ssa_1.  When we then try to match
the iadd, it sees that ssa_0 is not b and fails to match.  The iadd
match will attempt to flip itself and try again (which won't work) but
it cannot ask the imul to try a flipped match.

This commit instead counts the number of commutative ops in each
expression and assigns an index to each.  It then does a loop and loops
over the full combinatorial matrix of commutative operations.  In order
to keep things sane, we limit it to at most 4 commutative operations (16
combinations).  There is only one optimization in opt_algebraic that
goes over this limit and it's the bitfieldReverse detection for some UE4
demo.

Shader-db results on Kaby Lake:

    total instructions in shared programs: 15310125 -> 15302469 (-0.05%)
    instructions in affected programs: 1797123 -> 1789467 (-0.43%)
    helped: 6751
    HURT: 2264

    total cycles in shared programs: 357346617 -> 357202526 (-0.04%)
    cycles in affected programs: 15931005 -> 15786914 (-0.90%)
    helped: 6024
    HURT: 3436

    total loops in shared programs: 4360 -> 4360 (0.00%)
    loops in affected programs: 0 -> 0
    helped: 0
    HURT: 0

    total spills in shared programs: 23675 -> 23666 (-0.04%)
    spills in affected programs: 235 -> 226 (-3.83%)
    helped: 5
    HURT: 1

    total fills in shared programs: 32040 -> 32032 (-0.02%)
    fills in affected programs: 190 -> 182 (-4.21%)
    helped: 6
    HURT: 2

    LOST:   18
    GAINED: 5

Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
2019-04-08 21:38:48 +00:00
Jason Ekstrand
ad8c145658 nir/algebraic: Add some logical OR and AND patterns
The new OR pattern has been seen in the wild and can end up being
generated by GLSLang.  Not sure about the other two new patterns but we
may as well throw them in for completeness.  While we're here, we can
drop the '@bool' specifier from the one pattern because specifying True
already implies 1-bit which basically implies boolean.

Shader-db results on Kaby Lake:

    total instructions in shared programs: 15321227 -> 15321129 (<.01%)
    instructions in affected programs: 3594 -> 3496 (-2.73%)
    helped: 6
    HURT: 0

    total cycles in shared programs: 357481321 -> 357479725 (<.01%)
    cycles in affected programs: 44109 -> 42513 (-3.62%)
    helped: 6
    HURT: 0

VkPipeline-DB results on Kaby Lake:

    total instructions in shared programs: 3770504 -> 3769734 (-0.02%)
    instructions in affected programs: 19058 -> 18288 (-4.04%)
    helped: 163
    HURT: 0

    total cycles in shared programs: 1417583701 -> 1417569727 (<.01%)
    cycles in affected programs: 750958 -> 736984 (-1.86%)
    helped: 158
    HURT: 1

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-04-05 18:39:06 -05:00
Jason Ekstrand
03a72d96d8 nir/algebraic: Drop some @bool specifiers
Now that we have one-bit booleans, we don't need to rely on looking at
parent instructions in order to figure out if a value is a Boolean most
of the time.  We can drop these specifiers and now the optimizations
will apply more generally.

Shader-DB results on Kaby Lake:

    total instructions in shared programs: 15321168 -> 15321227 (<.01%)
    instructions in affected programs: 8836 -> 8895 (0.67%)
    helped: 1
    HURT: 31

    total cycles in shared programs: 357481781 -> 357481321 (<.01%)
    cycles in affected programs: 146524 -> 146064 (-0.31%)
    helped: 22
    HURT: 10

    total spills in shared programs: 23675 -> 23673 (<.01%)
    spills in affected programs: 11 -> 9 (-18.18%)
    helped: 1
    HURT: 0

    total fills in shared programs: 32040 -> 32036 (-0.01%)
    fills in affected programs: 27 -> 23 (-14.81%)
    helped: 1
    HURT: 0

No change in VkPipeline-DB

Looking at the instructions hurt, a bunch of them seem to be a case
where doing exactly the right thing in NIR ends up doing the wrong-ish
thing in the back-end because flags are dumb.  In particular, there's a
case where we have a MUL followed by a CMP followed by a SEL and when we
turn that SEL into an OR, it uses the GRF result of the CMP rather than
the flag result so the CMP can't be merged with the MUL.  Those shaders
appear to schedule better according to the cycle estimates so I guess
it's a win?  Also it helps spilling in one Car Chase compute shader.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-04-05 18:39:00 -05:00
Caio Marcelo de Oliveira Filho
c037dbb0ef nir: Take if_uses into account when repairing SSA
If a def is used as an condition before its definition, we should also
consider this a case to repair.  When repairing, make sure we rewrite
any if conditions too.

Found in while inspecting a SPIR-V conversion from a 'continue block'
that contains a conditional branch.  We pull the continue block up to
the beggining of the loop, and the condition in the branch ends up
defined afterwards.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Fixes: 364212f1ed "nir: Add a pass to repair SSA form"
2019-04-05 09:43:46 -07:00
Samuel Pitoiset
5eb17506e1 nir: do not pack varying with different types
The current algorithm only supports packing 32-bit types.
If a shader uses both 16-bit and 32-bit varyings, we shouldn't
compact them together.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-04-05 13:57:42 +02:00
Sergii Romantsov
a7d40a13ec glsl: Fix input/output structure matching across shader stages
Section 7.4.1 (Shader Interface Matching) of the OpenGL 4.30 spec says:

    "Variables or block members declared as structures are considered
     to match in type if and only if structure members match in name,
     type, qualification, and declaration order."

Fixes:
     * layout-location-struct.shader_test

v2: rebased against master and small fixes

Signed-off-by: Vadym Shovkoplias <vadym.shovkoplias@globallogic.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108250
2019-04-05 11:02:23 +11:00
Marek Olšák
4e1e8f684b glsl: remember which SSBOs are not read-only and pass it to gallium
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-04-04 19:28:52 -04:00
Alyssa Rosenzweig
a83862754e nir: Add "viewport vector" system values
While a partial set of viewport system values exist, these are scalar
values, which is a poor fit for viewport transformations on vector ISAs
like Midgard (where the vec3 values for scale and offset each need to be
coherent in a vec4 uniform slot to take advantage of vectorized
transform math). This patch adds vec3 scale/offset fields corresponding
to the 3D Gallium viewport / glViewport+depth

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-04-04 03:44:09 +00:00
Dave Airlie
eb8fefe090 nir: use proper array sizing define for vectors
If we increase the vector size in the future it would be good
to not have to fix these up, this should change nothing at present.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-04-03 13:59:06 +10:00
Timothy Arceri
d8ce915a61 Revert "nir: propagate known constant values into the if-then branch"
This reverts commit 4218b6422c.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110311
2019-04-03 13:24:18 +11:00
Timothy Arceri
4218b6422c nir: propagate known constant values into the if-then branch
Helps Max Waves / VGPR use in a bunch of Unigine Heaven
shaders.

shader-db results radeonsi (VEGA):
Totals from affected shaders:
SGPRS: 5505440 -> 5505872 (0.01 %)
VGPRS: 3077520 -> 3077296 (-0.01 %)
Spilled SGPRs: 39032 -> 39030 (-0.01 %)
Spilled VGPRs: 16326 -> 16326 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 744 -> 744 (0.00 %) dwords per thread
Code Size: 123755028 -> 123753316 (-0.00 %) bytes
Compile Time: 2751028 -> 2560786 (-6.92 %) milliseconds
LDS: 1415 -> 1415 (0.00 %) blocks
Max Waves: 972192 -> 972240 (0.00 %)
Wait states: 0 -> 0 (0.00 %)

vkpipeline-db results RADV (VEGA):

Totals from affected shaders:
SGPRS: 160 -> 160 (0.00 %)
VGPRS: 88 -> 88 (0.00 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 18268 -> 18152 (-0.63 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 26 -> 26 (0.00 %)
Wait states: 0 -> 0 (0.00 %)

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-04-03 10:04:48 +11:00
Tapani Pälli
06f40f5765 spirv: fix a compiler warning
Fixes implicit conversion from enumeration type 'SpvOp' warning.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-04-01 07:43:10 +03:00
Rob Clark
1ae0c030cb nir: add lower_all_io_to_elements
I need this part of lower_all_io_to_temps but without the actual
lowering to temps part.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-03-30 12:56:01 -04:00
Rob Clark
e5e67228f5 nir: print var name for load_interpolated_input too
Signed-off-by: Rob Clark <robdclark@gmail.com>
Acked-by: Karol Herbst <kherbst@redhat.com>
2019-03-30 12:55:47 -04:00
Jason Ekstrand
7dbd934e26 nir: Lock around validation fail shader dumping
This prevents getting mixed-up results if a multi-threaded app has two
validation errors in different threads.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-03-29 21:57:51 -05:00
Karol Herbst
fea0caea2b nir/validate: validate that tex deref sources are actually derefs
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-29 16:03:22 +01:00
Karol Herbst
6ffc72472c nir/print: fix printing the image_array intrinsic index
Fixes: 0de003be03 ("nir: Add handle/index-based image intrinsics")

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-29 16:03:22 +01:00
Brian Paul
4ee057eaef nir: use {0} initializer instead of {} to fix MSVC build
Trivial change.

Fixes: c6ee46a75 ("nir: Add nir_alu_srcs_negative_equal")
2019-03-28 20:34:23 -06:00
Ian Romanick
2cf59861a8 nir: Add partial redundancy elimination for compares
This pass attempts to dectect code sequences like

    if (x < y) {
        z = y - x;
	...
    }

and replace them with sequences like

    t = x - y;
    if (t < 0) {
        z = -t;
	...
    }

On architectures where the subtract can generate the flags used by the
if-statement, this saves an instruction.  It's also possible that moving
an instruction out of the if-statement will allow
nir_opt_peephole_select to convert the whole thing to a bcsel.

Currently only floating point compares and adds are supported.  Adding
support for integer will be a challenge due to integer overflow.  There
are a couple possible solutions, but they may not apply to all
architectures.

v2: Fix a typo in the commit message and a couple typos in comments.
Fix possible NULL pointer deref from result of push_block().  Add
missing (-A + B) case.  Suggested by Caio.

v3: Fix is_not_const_zero to work correctly with types other than
nir_type_float32.  Suggested by Ken.

v4: Add some comments explaining how this works.  Suggested by Ken.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-28 15:35:53 -07:00
Ian Romanick
c6ee46a753 nir: Add nir_alu_srcs_negative_equal
v2: Move bug fix in get_neg_instr from the next patch to this patch
(where it was intended to be in the first place).  Noticed by Caio.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-28 15:35:52 -07:00
Ian Romanick
be1cc3552b nir: Add nir_const_value_negative_equal
v2: Rebase on 1-bit Boolean changes.

Reviewed-by: Thomas Helland <thomashelland90@gmail.com> [v1]
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-28 15:35:52 -07:00
Ian Romanick
ae21b52e1d nir/algebraic: Add missing 16-bit extract_[iu]8 patterns
No shader-db changes on any Intel platform.

v2: Use a loop to generate patterns.  Suggested by Jason.

v3: Fix a copy-and-paste bug in the extract_[ui] of ishl loop that would
replace an extract_i8 with and extract_u8.  This broke ~180 tests.  This
bug was introduced in v2.

Reviewed-by: Matt Turner <mattst88@gmail.com> [v1]
Reviewed-by: Dylan Baker <dylan@pnwbakers.com> [v2]
Acked-by: Jason Ekstrand <jason@jlekstrand.net> [v2]
2019-03-28 15:35:52 -07:00
Ian Romanick
cbad201c2b nir/algebraic: Add missing 64-bit extract_[iu]8 patterns
No shader-db changes on any Intel platform.

v2: Use a loop to generate patterns.  Suggested by Jason.

v3: Fix a copy-and-paste bug in the extract_[ui] of ishl loop that would
replace an extract_i8 with and extract_u8.  This broke ~180 tests.  This
bug was introduced in v2.

Reviewed-by: Matt Turner <mattst88@gmail.com> [v1]
Reviewed-by: Dylan Baker <dylan@pnwbakers.com> [v2]
Acked-by: Jason Ekstrand <jason@jlekstrand.net> [v2]
2019-03-28 15:35:52 -07:00
Ian Romanick
bc17f5a2a3 nir/algebraic: Remove redundant extract_[iu]8 patterns
No shader-db changes on any Intel platform.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-28 15:35:52 -07:00
Ian Romanick
c152672e68 nir/algebraic: Fix up extract_[iu]8 after loop unrolling
Skylake, Broadwell, and Haswell had similar results. (Skylake shown)
total instructions in shared programs: 15256840 -> 15256837 (<.01%)
instructions in affected programs: 4713 -> 4710 (-0.06%)
helped: 3
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.06% max: 0.08% x̄: 0.06% x̃: 0.06%

total cycles in shared programs: 372286583 -> 372286583 (0.00%)
cycles in affected programs: 198516 -> 198516 (0.00%)
helped: 1
HURT: 1
helped stats (abs) min: 10 max: 10 x̄: 10.00 x̃: 10
helped stats (rel) min: <.01% max: <.01% x̄: <.01% x̃: <.01%
HURT stats (abs)   min: 10 max: 10 x̄: 10.00 x̃: 10
HURT stats (rel)   min: 0.01% max: 0.01% x̄: 0.01% x̃: 0.01%

No changes on any other Intel platform.

v2: Use a loop to generate patterns.  Suggested by Jason.

v3: Fix a copy-and-paste bug in the extract_[ui] of ishl loop that would
replace an extract_i8 with and extract_u8.  This broke ~180 tests.  This
bug was introduced in v2.

Reviewed-by: Matt Turner <mattst88@gmail.com> [v1]
Reviewed-by: Dylan Baker <dylan@pnwbakers.com> [v2]
Acked-by: Jason Ekstrand <jason@jlekstrand.net> [v2]
2019-03-28 15:35:52 -07:00
Dave Airlie
b779baa9bf nir/deref: fix struct wrapper casts. (v3)
llvm/spir-v spits out some struct a { struct b {} }, but it
doesn't deref, it casts (struct a) to (struct b), reconstruct
struct derefs instead of casts for these.

v2: use ssa_def_rewrite uses, rework the type restrictions (Jason)
v3: squish more stuff into one function, drop unused temp (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-29 08:10:50 +10:00
Samuel Pitoiset
bea540173c spirv: propagate the access flag for store and load derefs
It was only propagated when UBO/SSBO access are lowered to offsets.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: <Jason Ekstrand jason@jlekstrand.net>
2019-03-27 09:57:30 +01:00
Samuel Pitoiset
4d0b03c83d nir: add nir_{load,store}_deref_with_access() helpers
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: <Jason Ekstrand jason@jlekstrand.net>
2019-03-27 09:57:27 +01:00
Timothy Arceri
d163780f81 spirv: make use of the select control support in nir
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108841
2019-03-27 02:39:12 +00:00
Timothy Arceri
e76ae39ae2 nir: add support for user defined select control
This will allow us to make use of the selection control support in
spirv and the GL support provided by EXT_control_flow_attributes.

Note this only supports if-statements as we dont support switches
in NIR.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108841
2019-03-27 02:39:12 +00:00