Commit graph

4029 commits

Author SHA1 Message Date
Timur Kristóf
6684e039eb nir: Add multiplier argument to nir_lower_uniforms_to_ubo.
Note that locations can be set in different units, and the multiplier
argument caters to supporting these different units. For example,
st_glsl_to_nir uses dwords (4 bytes) so the multiplier should be 4,
while tgsi_to_nir uses bytes, so the multiplier should be 16.

Signed-Off-By: Timur Kristóf <timur.kristof@gmail.com>
Tested-by: Andre Heider <a.heider@gmail.com>
Tested-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-03-05 19:13:27 +00:00
Timur Kristóf
909d1f50f3 nir: Move nir_lower_uniforms_to_ubo to compiler/nir.
The nir_lower_uniforms_to_ubo function is useful outside of
mesa/state_tracker, and in fact is needed to produce NIR for
drivers that have the PIPE_CAP_PACKED_UNIFORMS capability.

Signed-Off-By: Timur Kristóf <timur.kristof@gmail.com>
Tested-by: Andre Heider <a.heider@gmail.com>
Tested-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-03-05 19:13:27 +00:00
Timur Kristóf
317f10bf40 nir: Add ability for shaders to use window space coordinates.
This patch adds a shader_info field that tells the driver to use window
space coordinates for a given vertex shader. It also enables this feature
in radeonsi (the only NIR-capable driver that supported it in TGSI),
and makes tgsi_to_nir aware of it.

Signed-Off-By: Timur Kristóf <timur.kristof@gmail.com>
Tested-by: Andre Heider <a.heider@gmail.com>
Tested-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-03-05 19:13:27 +00:00
Eric Anholt
2780a99ff8 v3d: Move the stores for fixed function VS output reads into NIR.
This lets us emit the VPM_WRITEs directly from
nir_intrinsic_store_output() (useful once NIR scheduling is in place so
that we can reduce register pressure), and lets future NIR scheduling
schedule the math to generate them.  Even in the meantime, it looks like
this lets NIR DCE some more code and make better decisions.

total instructions in shared programs: 6429246 -> 6412976 (-0.25%)
total threads in shared programs: 153924 -> 153934 (<.01%)
total loops in shared programs: 486 -> 483 (-0.62%)
total uniforms in shared programs: 2385436 -> 2388195 (0.12%)

Acked-by: Ian Romanick <ian.d.romanick@intel.com> (nir)
2019-03-05 10:59:40 -08:00
Eric Anholt
a4f612b4cf nir: Improve printing of load_input/store_output variable names.
We were printing only when the channel was exactly the start channel, so
scalarized loads/stores would be missing the name on the rest.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-03-05 10:59:40 -08:00
Jason Ekstrand
61e009d2c4 spirv: Use the same types for resource indices as pointers
We need more space than just a 32-bit scalar and we have to burn all
that space anyway so we may as well expose it to the driver.  This also
fixes a subtle bug when UBOs and SSBOs have different pointer types.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-03-05 10:06:50 -06:00
Jason Ekstrand
9f7ee4f8e5 spirv: Use the generic dereference function for OpArrayLength
With the new deref changes, the old pointer_offset version may not be
the right one to call.  Just call the generic one and let it sort it
out.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-03-05 10:06:50 -06:00
Jason Ekstrand
f1dbc7e97d spirv: Pull offset/stride from the pointer for OpArrayLength
We can't pull it from the variable type because it might be an array of
blocks and not just the one block.  While we're here, throw in some
error checking.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable@lists.freedesktop.org
2019-03-05 10:06:50 -06:00
Jason Ekstrand
5c96120b5c intel,nir: Lower TXD with min_lod when the sampler index is not < 16
When we have a larger sampler index, we get into the "high sampler"
scenario and need an instruction header.  Even in SIMD8, this pushes the
instruction over the sampler message size maximum of 11 registers.
Instead, we have to lower TXD to TXL.

Fixes: cb98e0755f "intel/fs: Support min_lod parameters on texture..."
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-03-04 23:56:39 +00:00
Jason Ekstrand
ca295ddbfb spirv: OpImageQueryLod requires a sampler
No idea how this fell through the cracks besides the fact that the
sampler bound at 0 almost always works and the CTS isn't amazing.  In
any case, this appears to have been broken for almost forever.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable@lists.freedesktop.org
2019-03-04 23:56:39 +00:00
Sagar Ghuge
58bcebd987 spirv: Allow [i/u]mulExtended to use new nir opcode
Use new nir opcode nir_[i/u]mul_2x32_64 and extract lower and higher 32
bits as needed instead of emitting mul and mul_high.

v2: Surround the switch case with curly braces (Jason Ekstrand)

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-04 15:50:25 -08:00
Sagar Ghuge
47ec9bdc60 nir/algebraic: Optimize low 32 bit extraction
Optimize a situation where we only need lower 32 bits from 64 bit
result.

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Suggested-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-04 15:50:25 -08:00
Sagar Ghuge
1d8994a63b glsl: [u/i]mulExtended optimization for GLSL
Optimize mulExtended to use 32x32->64 multiplication.

Drivers which are not based on NIR, they can set the
MUL64_TO_MUL_AND_MUL_HIGH lowering flag in order to have same old
behavior.

v2: Add missing condition check (Jason Ekstrand)

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Suggested-by: Matt Turner <Matt Turner <mattst88@gmail.com>
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-04 15:50:25 -08:00
Sagar Ghuge
e551040c60 nir/glsl: Add another way of doing lower_imul64 for gen8+
On Gen 8 and 9, "mul" instruction supports 64 bit destination type. We
can reduce our 64x64 int multiplication from 4 instructions to 3.

Also instead of emitting two mul instructions, we can emit single mul
instuction and extract low/high 32 bits from 64 bit result for
[i/u]mulExtended

v2: 1) Allow lower_mul_high64 to use new opcode (Jason Ekstrand)
    2) Add lower_mul_2x32_64 flag (Matt Turner)
    3) Remove associative property as bit size is different (Connor
       Abbott)

v3: Fix indentation and variable naming convention (Jason Ekstrand)

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-04 15:50:25 -08:00
Ilia Mirkin
4eec3a2a36 glsl: fix recording of variables for XFB in TCS shaders
This is purely for conformance, since it's not actually possible to do
XFB on TCS output varyings. However we do have to make sure we record
the names correctly, and this removes an extra level of array-ness from
the names in question.

Fixes KHR-GL45.tessellation_shader.single.xfb_captures_data_from_correct_stage

v2: Add comment to the new program_resource_visitor::process function.
    (Ilia Mirkin)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108457
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: 19.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-03-04 01:55:00 +01:00
Jose Maria Casanova Crespo
bf1f49482d glsl: TCS outputs can not be transform feedback candidates on GLES
Avoids regression on:

KHR-GLES*.core.tessellation_shader.single.xfb_captures_data_from_correct_stage

that is uncovered by the following patch.

"glsl: fix recording of variables for XFB in TCS shaders"

v2: Rebased over glsl: fix recording of variables for XFB in TCS shaders
v3: Move this patch before "glsl: fix recording of variables for XFB in TCS
    shaders" to avoid temporal regressions. (Illia Mirkin)

Cc: 19.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-03-04 01:55:00 +01:00
Jose Maria Casanova Crespo
cc7173b438 glsl: fix typos in comments "transfor" -> "transform"
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-03-04 01:55:00 +01:00
Gert Wollny
3214f20914 mesa: Expose EXT_texture_query_lod and add support for its use shaders
EXT_texture_query_lod provides the same functionality for GLES like
the ARB extension with the same name for GL.

v2: Set ES 3.0 as minimum GLES version as required by the extension

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-03-03 21:50:42 +01:00
Jordan Justen
7de056e1a9
scons: Generate float64_glsl.h for glsl_to_nir fp64 lowering
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-02 14:33:44 -08:00
Jordan Justen
31b35916dd
nir: Add int64/doubles options into nir_shader_compiler_options
This will allow the options to be visible under nir_shader->options,
which will allow the gallium state_tracker to use the driver preferred
settings during glsl_to_nir.

Suggested-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-02 14:33:41 -08:00
Ian Romanick
bae0c36751 nir/algebraic: Optimize away an fsat of a b2f
The b2f can only produce 0.0 or 1.0, so the fsat does nothing.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-02 13:58:56 -08:00
Ian Romanick
ecc9ffa778 nir/algebraic: Replace a-fract(a) with floor(a)
I noticed this while looking at a shader that was affected by Tim's
"more loop unrolling" series.

In review, Tim Arceri asked:
> Why the hurt on Gen6+ is this something that should be in the late
> optimisations pass?

As far as I can tell, it's just because our scheduler is terrible.  In
all the fragment shaders that I looked at (some hurt shaders were from
other stages), only one of the SIMD8 or SIMD16 version would be hurt.
In many of those case, the other SIMD width is improved (e.g.,
shaders/closed/steam/brutal-legend/3990.shader_test).

Often it looks like the scheduler decides to differently schedule a SEND
the occurs somewhere early in the shader.  Once that happens, everything
is different.

I looked at one vertex shader that was hurt (from Goat Simulator).  In
that case, both the floor and fract are used.  The optimization
eliminates the add, and it should allow better scheduling.  In the area
of the FRC and RNDD instructions, the scheduler does the right thing.
However, later in the shader a MAD and and ADD get scheduled
differently, and that makes it slightly worse.

In light of this, I tried adding some "is_used_once" mark-up, and that
did not fix all the cycles regressions.  It also did a lot more harm
than good on SKL (helped 82 vs. hurt 241).

All Gen6+ platforms had similar results. (Skylake shown)
total instructions in shared programs: 15437001 -> 15435259 (-0.01%)
instructions in affected programs: 213651 -> 211909 (-0.82%)
helped: 988
HURT: 0
helped stats (abs) min: 1 max: 27 x̄: 1.76 x̃: 1
helped stats (rel) min: 0.15% max: 11.54% x̄: 1.14% x̃: 0.59%
95% mean confidence interval for instructions value: -1.89 -1.63
95% mean confidence interval for instructions %-change: -1.23% -1.05%
Instructions are helped.

total cycles in shared programs: 383007378 -> 382997063 (<.01%)
cycles in affected programs: 1650825 -> 1640510 (-0.62%)
helped: 679
HURT: 302
helped stats (abs) min: 1 max: 348 x̄: 23.39 x̃: 14
helped stats (rel) min: 0.04% max: 28.77% x̄: 1.61% x̃: 0.98%
HURT stats (abs)   min: 1 max: 250 x̄: 18.43 x̃: 7
HURT stats (rel)   min: 0.04% max: 25.86% x̄: 1.41% x̃: 0.53%
95% mean confidence interval for cycles value: -13.05 -7.98
95% mean confidence interval for cycles %-change: -0.86% -0.50%
Cycles are helped.

Iron Lake and GM45 had similar results. (GM45 shown)
total instructions in shared programs: 5043616 -> 5043010 (-0.01%)
instructions in affected programs: 119691 -> 119085 (-0.51%)
helped: 432
HURT: 0
helped stats (abs) min: 1 max: 27 x̄: 1.40 x̃: 1
helped stats (rel) min: 0.10% max: 8.11% x̄: 0.66% x̃: 0.39%
95% mean confidence interval for instructions value: -1.58 -1.23
95% mean confidence interval for instructions %-change: -0.72% -0.59%
Instructions are helped.

total cycles in shared programs: 128139812 -> 128135762 (<.01%)
cycles in affected programs: 3829724 -> 3825674 (-0.11%)
helped: 602
HURT: 0
helped stats (abs) min: 2 max: 486 x̄: 6.73 x̃: 6
helped stats (rel) min: 0.02% max: 4.85% x̄: 0.19% x̃: 0.10%
95% mean confidence interval for cycles value: -8.40 -5.05
95% mean confidence interval for cycles %-change: -0.22% -0.16%
Cycles are helped.

Reviewed-by: Elie Tournier <tournier.elie@gmail.com>
2019-03-01 12:43:25 -08:00
Ian Romanick
d40640efe8 nir/algebraic: Replace a bcsel of a b2f sources with a b2f(!(a || b))
I have not investigated the result of doing this during code
generation.  That should be possible, but it would be a bit more
effort.

All Gen6+ platforms had nearly identical results. (Skylake shown)
total cycles in shared programs: 370961508 -> 370961367 (<.01%)
cycles in affected programs: 5174 -> 5033 (-2.73%)
helped: 2
HURT: 0

Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8206587 -> 8206589 (<.01%)
instructions in affected programs: 1325 -> 1327 (0.15%)
helped: 0
HURT: 2

total cycles in shared programs: 187657422 -> 187657428 (<.01%)
cycles in affected programs: 11566 -> 11572 (0.05%)
helped: 0
HURT: 2

This change has almost no effect right now.  However, removing this
patch (but leaving the patch "intel/fs: Generate if instructions with
inverted conditions") after adding a patch that removes !(a < b) -> (a
>= b) optimizations (like
https://patchwork.freedesktop.org/patch/264787/) has the following
results on Skylake:

Skylake
total instructions in shared programs: 15071804 -> 15071806 (<.01%)
instructions in affected programs: 640 -> 642 (0.31%)
helped: 0
HURT: 2

total cycles in shared programs: 369914348 -> 369916569 (<.01%)
cycles in affected programs: 27900 -> 30121 (7.96%)
helped: 4
HURT: 15
helped stats (abs) min: 2 max: 112 x̄: 30.00 x̃: 3
helped stats (rel) min: 0.28% max: 12.28% x̄: 3.34% x̃: 0.40%
HURT stats (abs)   min: 2 max: 758 x̄: 156.07 x̃: 81
HURT stats (rel)   min: 0.20% max: 74.30% x̄: 16.29% x̃: 16.91%
95% mean confidence interval for cycles value: 12.68 221.11
95% mean confidence interval for cycles %-change: 3.09% 21.23%
Cycles are HURT.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-01 12:42:14 -08:00
Ian Romanick
eae19f5f19 nir/algebraic: Replace i2b used by bcsel or if-statement with comparison
All of the helped shaders are in Deus Ex.  I looked at a couple shaders,
and they have a pattern like:

    vec1 32 ssa_373 = i2b32 ssa_345.w
    vec1 32 ssa_374 = bcsel ssa_373, ssa_20, ssa_0
    ...
    vec1 32 ssa_377 = ine ssa_345.w, ssa_0
    if ssa_377 {
        ...
        vec1 32 ssa_416 = i2b32 ssa_385.w
        vec1 32 ssa_417 = bcsel ssa_416, ssa_386, ssa_374
        ...
    }

The massive help occurs because the i2b32 is removed, then other passes
determine that ssa_374 must be ssa_20 inside the if-statement allowing
the first bcsel to also be deleted.

v2: Rebase on 1-bit Boolean changes.

v3: Fix i2b32 vs ine problem in if-statement replacement.  Noticed by
Bas.

Skylake
total instructions in shared programs: 15241394 -> 15186287 (-0.36%)
instructions in affected programs: 890583 -> 835476 (-6.19%)
helped: 355
HURT: 0
helped stats (abs) min: 1 max: 497 x̄: 155.23 x̃: 149
helped stats (rel) min: 0.09% max: 16.49% x̄: 6.10% x̃: 6.59%
95% mean confidence interval for instructions value: -165.07 -145.39
95% mean confidence interval for instructions %-change: -6.42% -5.77%
Instructions are helped.

total cycles in shared programs: 373846583 -> 371023357 (-0.76%)
cycles in affected programs: 118972102 -> 116148876 (-2.37%)
helped: 343
HURT: 14
helped stats (abs) min: 45 max: 118284 x̄: 8332.32 x̃: 6089
helped stats (rel) min: 0.03% max: 38.19% x̄: 2.48% x̃: 1.77%
HURT stats (abs)   min: 120 max: 4126 x̄: 2482.79 x̃: 3019
HURT stats (rel)   min: 0.16% max: 17.37% x̄: 2.13% x̃: 1.11%
95% mean confidence interval for cycles value: -8723.28 -7093.12
95% mean confidence interval for cycles %-change: -2.57% -2.02%
Cycles are helped.

total spills in shared programs: 32401 -> 23465 (-27.58%)
spills in affected programs: 24457 -> 15521 (-36.54%)
helped: 343
HURT: 0

total fills in shared programs: 37866 -> 31765 (-16.11%)
fills in affected programs: 18889 -> 12788 (-32.30%)
helped: 343
HURT: 0

Broadwell and Haswell had similar results. (Haswell shown)
Haswell
total instructions in shared programs: 13764783 -> 13750679 (-0.10%)
instructions in affected programs: 1176256 -> 1162152 (-1.20%)
helped: 334
HURT: 21
helped stats (abs) min: 1 max: 358 x̄: 42.59 x̃: 47
helped stats (rel) min: 0.09% max: 11.81% x̄: 1.30% x̃: 1.37%
HURT stats (abs)   min: 1 max: 61 x̄: 5.76 x̃: 1
HURT stats (rel)   min: 0.03% max: 1.84% x̄: 0.17% x̃: 0.03%
95% mean confidence interval for instructions value: -43.99 -35.47
95% mean confidence interval for instructions %-change: -1.35% -1.08%
Instructions are helped.

total cycles in shared programs: 386511910 -> 385402528 (-0.29%)
cycles in affected programs: 143831110 -> 142721728 (-0.77%)
helped: 327
HURT: 39
helped stats (abs) min: 16 max: 25219 x̄: 3519.74 x̃: 3570
helped stats (rel) min: <.01% max: 10.26% x̄: 0.95% x̃: 0.96%
HURT stats (abs)   min: 16 max: 4881 x̄: 1065.95 x̃: 997
HURT stats (rel)   min: <.01% max: 16.67% x̄: 0.70% x̃: 0.24%
95% mean confidence interval for cycles value: -3375.59 -2686.60
95% mean confidence interval for cycles %-change: -0.92% -0.64%
Cycles are helped.

total spills in shared programs: 100480 -> 97846 (-2.62%)
spills in affected programs: 84702 -> 82068 (-3.11%)
helped: 316
HURT: 21

total fills in shared programs: 96877 -> 94369 (-2.59%)
fills in affected programs: 69167 -> 66659 (-3.63%)
helped: 316
HURT: 9

No changes on Ivy Bridge or earlier platforms.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-01 12:42:14 -08:00
Caio Marcelo de Oliveira Filho
1458aa1f78 nir/copy_prop_vars: handle indirect vector elements
Differently than the direct case, the indirect array derefs of vector
are handled like regular derefs, with the exception that we ignore any
vector entry that has SSA values when performing a load.  Such SSA
values don't help loading of the indirect unless we emit an if-ladder.

Copy_derefs are supported for indirects.

Also enable two tests that now pass.

v2: Remove unnecessary temporaries.  Be clearer when identifying the
    case where copy_entry doesn't help when we are dealing with an
    indirect array_deref (of a vector).  (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-28 23:55:31 -08:00
Caio Marcelo de Oliveira Filho
6c0de78cc2 nir/copy_prop_vars: prefer using entries from equal derefs
When looking up an entry to use, always prefer an equal match, as it
more likely to contain reusable SSA or derefs to propagate.

This will be necessary when adding entries with array derefs of
vectors, because we don't want the vector if the equal entry (an array
deref of that vector) is present.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-28 23:55:31 -08:00
Caio Marcelo de Oliveira Filho
61965afd00 nir/copy_prop_vars: add tests for indirect array deref
Both on an actual array and on a vector, and an extra test on a vector
mixing direct and indirect access.  The vector tests are disabled and
will be enabled by a later commit.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-28 23:55:31 -08:00
Caio Marcelo de Oliveira Filho
96c32d7776 nir/copy_prop_vars: handle load/store of vector elements
When direct array deref is used on a vector type (for loads and
stores), copy_prop_vars is now smart to propagate values it knows
about.

Given a 'vec4 v', storing to v[3] will update the copy entry for v and
it is equivalent to a write to v.w.  Loading from v[1] will try first
to see if there's a known value for v.y -- and drop the load in that
case.

The copy entries still always refer to the entire vectors, so the
operations happen on the parent deref (the 'vector') and the values
are fixed accordingly.

It might be the case now that certain entries have not only different
SSA defs in each element but also those come from different components
than they are set to, because stores to individual elements always
come from a SSA definition with a single component.

Tests related to these cases are now enabled.

v2: Instead of asserting on invalid indices, "load" an undef and
    remove the store.  (Jason)

v3: Merge code path for the cases of is_array_deref_of_vector into the
    regular code path.  Add a base_index parameter to
    value_set_from_value.  (code changes by Jason)

v4: Removed the get_entry_for_deref helper, now being used only once.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-28 23:50:05 -08:00
Caio Marcelo de Oliveira Filho
33dafdc024 nir/copy_prop_vars: use NIR_MAX_VEC_COMPONENTS
Also replace uses of 0xf with the appropriate full mask created from
the number of components.

Note that an increase of MAX might make us change how the data is
stored later on, but for now at least we make sure the pass is not
hardcoded.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-28 23:50:05 -08:00
Caio Marcelo de Oliveira Filho
e84c841fb0 nir/copy_prop_vars: rename/refactor store_to_entry helper
The name reflected this function role back when the pass also did dead
write elimination.  So rename it to what it does now, which is setting
a value using another value; and narrow the argument list.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-28 23:50:05 -08:00
Juan A. Suarez Romero
b43b55d461 nir/spirv: return after emitting a branch in block
When emitting a branch in a block, it does not make sense to continue
processing further instructions, as they will not be reachable.

This fixes a nasty case with a loop with a branch that both then-part
and else-part exits the loop:

%1 = OpLabel
     OpLoopMerge %2 %3 None
     OpBranchConditional %false %2 %2
%3 = OpLabel
     OpBranch %1
%2 = OpLabel
    [...]

We know that block %1 will branch always to block %2, which is the merge
block for the loop. And thus a break is emitted. If we keep continuing
processing further instructions, we will be processing the branch
conditional and thus emitting the proper NIR conditional, which leads to
instructions after the break.

This fixes dEQP-VK.graphicsfuzz.continue-and-merge.

CC: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-28 09:47:06 +01:00
Timothy Arceri
7536af670b glsl: fix shader cache for packed param list
Some types of params such as some builtins are always padded. We
need to keep track of this so we can restore the list correctly.

Here we also remove a couple of cache entries that are not actually
required as they get rebuilt by the _mesa_add_parameter() calls.

This patch fixes a bunch of arb_texture_multisample and
arb_sample_shading piglit tests for the radeonsi NIR backend.

Fixes: edded12376 ("mesa: rework ParameterList to allow packing")

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-02-28 11:47:37 +11:00
Gert Wollny
b7201a468d nir: Add posibility to not lower to source mod 'abs' for ops with three sources
This is useful for r600 since there the abs source modifier is not supported
for ops with three sources

v2: Use correct logic to enable lowering to abs source mod (Eric Anhold)

Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-02-27 11:04:06 +00:00
Kasireddy, Vivek
78fb3fd17e nir/lower_tex: Add support for XYUV lowering
The memory layout associated with this format would be:
Byte:      0 1 2 3
Component: V U Y X

Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-02-26 13:08:51 +00:00
Tapani Pälli
22267feff1 nir: initialize value in copy_prop_vars_block
Fixes following valgrind warning:

   ==27561== Conditional jump or move depends on uninitialised value(s)
   ==27561==    at 0x667856B: value_set_ssa_components (nir_opt_copy_prop_vars.c:78)
   ==27561==    by 0x667A1C4: copy_prop_vars_block (nir_opt_copy_prop_vars.c:797)

Fixes: 62332d139c "nir: Add a local variable-based copy propagation pass"
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-02-26 08:56:25 +02:00
Eric Anholt
7c1bf075f3 nir: Just return when asked to rewrite uses of an SSA def to itself.
The nir_builder swizzling improvement to not emit extra MOVs resulted in
nir_lower_tex() trying to rewrite an SSA def to itself, triggering the
assert on all texturing in v3d.  There's no work to be done in this case,
so just stop asserting.

Fixes: 743700be1f ("nir/builder: Don't emit no-op swizzles")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-25 21:25:24 +00:00
Daniel Schürmann
0bd45f96b9 nir: Use SM5 properties to optimize shift(a@32, iand(31, b))
This is a common pattern from HLSL->SPIRV translation
and supported in HW by all current NIR backends.

vkpipeline-db results anv (SKL):

    total instructions in shared programs: 6403130 -> 6402380 (-0.01%)
    instructions in affected programs: 204084 -> 203334 (-0.37%)
    helped: 208
    HURT: 0

    total cycles in shared programs: 1915629582 -> 1918198408 (0.13%)
    cycles in affected programs: 1158892682 -> 1161461508 (0.22%)
    helped: 107
    HURT: 86

shader-db results on i965 (KBL):

    total instructions in shared programs: 15284592 -> 15284568 (<.01%)
    instructions in affected programs: 81683 -> 81659 (-0.03%)
    helped: 24
    HURT: 0

    total cycles in shared programs: 375013622 -> 375013932 (<.01%)
    cycles in affected programs: 40169618 -> 40169928 (<.01%)
    helped: 13
    HURT: 9

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-25 12:59:44 -06:00
Daniel Schürmann
0525bdc225 nir: Define shifts according to SM5 specification.
SPIR-V shifts are undefined for values >= bitsize, but SM5 shifts
are defined to only use the least significant bits.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-25 12:59:43 -06:00
Oscar Blumberg
da9c030763 glsl: Fix function return typechecking
apply_implicit_conversion only converts and check base types but we
need actual type equality for function returns, otherwise you can
return a vec2 from a function declared as returning a float.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2019-02-25 08:49:06 +02:00
Jason Ekstrand
743700be1f nir/builder: Don't emit no-op swizzles
The nir_swizzle helper is used some on it's own but it's also called by
nir_channel and nir_channels which are used everywhere.  It's pretty
quick to check while we're walking the swizzle anyway whether or not
it's an identity swizzle.  If it is, we now don't bother emitting the
instruction.  Sure, copy-prop will clean it up for us but there's no
sense making more work for the optimizer than we have to.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-02-24 20:01:27 -06:00
Jason Ekstrand
724371c6b9 nir/split_vars: Don't compact vectors unnecessarily
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2019-02-24 20:01:18 -06:00
Caio Marcelo de Oliveira Filho
4c160b6bd8 nir: fix MSVC build
Zero initialize struct with {0} instead of {}.
2019-02-22 22:38:05 -08:00
Caio Marcelo de Oliveira Filho
eb13211997 nir/copy_prop_vars: add tests for load/store elements of vectors
Test using array deref on vectors in loads and stores.  These are
marked DISABLED_ as this optimization is currently not done.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-22 21:00:50 -08:00
Caio Marcelo de Oliveira Filho
4f3809d389 nir: nir_build_deref_follower accept array derefs of vectors
Code itself already supports it, just make sure we can use it for
those cases.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-22 21:00:50 -08:00
Caio Marcelo de Oliveira Filho
c4beadd28e nir/copy_prop_vars: change test helper to get intrinsics
Replace find_next_intrinsic(intrinsic, after) with
get_intrinsic(intrinsic, index).  This makes slightly more convenient
to check the resulting loads/stores/copies, since in most tests we
know which one we care about.  The cost is to perform more traversals,
but for such tests this is not a problem.

Added the ASSERT_EQ() on count to some tests missing it, so the
indices queried are always expected to find something.

Also, drop two nir_print_shader leftover calls in a test.

v2: Remove redundant assertions.  nir_src_comp_as_uint already
    assert what we need.  (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-22 21:00:50 -08:00
Caio Marcelo de Oliveira Filho
fdcb9779d9 nir/copy_prop_vars: keep track of components in copy_entry
When a copy_entry is SSA, store not only the nir_ssa_def* for each
component, but also the source component they come from.  At the
moment this is always a match (i.e. 'component[i] == i'), because all
the operations for a copy_entry happen using definitions with the same
size.  This prepares the code for array_derefs of vectors, in which
'component[i] != i'.

Also, extract setting all SSA components into a function of its own.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-22 21:00:50 -08:00
Caio Marcelo de Oliveira Filho
6624decbb5 nir/copy_prop_vars: add debug helpers
Disabled by default, to be used during development.  Adding those
so I don't rewrite some ad-hoc version of them everytime I'm working
with this pass.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-22 21:00:50 -08:00
Caio Marcelo de Oliveira Filho
60d9bb9ff5 nir/copy_prop_vars: don't get confused by array_deref of vectors
For now these derefs are not handled, so don't let these get into the
copies list -- which would cause wrong propagations.  For load_derefs,
do nothing.  For store_derefs, invalidate whatever the store is
writing to.  For copy_derefs, invalidate whatever the copy is writing
to.

These cases will happen once derefs to SSBOs/UBOs are kept around long
enough to get optimized by copy_prop_vars.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-22 21:00:50 -08:00
Timothy Arceri
f48527e51a nir: allow nir_lower_phis_to_scalar() on more src types
Rather than only lowering if all srcs are scalarizable we instead
check that at least one src is scalarizable.

We change undef type to return false otherwise it will cause
regressions when it is the only scalarizable src.

total instructions in shared programs: 13219105 -> 13024547 (-1.47%)
instructions in affected programs: 1153797 -> 959239 (-16.86%)
helped: 581
HURT: 74

total cycles in shared programs: 333968972 -> 324807922 (-2.74%)
cycles in affected programs: 129809402 -> 120648352 (-7.06%)
helped: 571
HURT: 131

total spills in shared programs: 57947 -> 29130 (-49.73%)
spills in affected programs: 53364 -> 24547 (-54.00%)
helped: 351
HURT: 0

total fills in shared programs: 51310 -> 25468 (-50.36%)
fills in affected programs: 44882 -> 19040 (-57.58%)
helped: 351
HURT: 0

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-23 11:11:51 +11:00
Timothy Arceri
d9e08e753b nir: clone instruction set rather than removing individual entries
This reduces the time spent in nir_opt_cse() by almost a half.

The massif tool from callgrind reported no change in peak
memory use with the large doliphin uber shaders I used for
testing.

Reviewed-by: Thomas Helland<thomashelland90@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-22 08:36:36 +11:00