4859 commits

Author SHA1 Message Date
Ian Romanick
23c5501b77 nir/flrp: Lower flrp(#a, #b, c) differently
If the magnitudes of #a and #b are such that (b-a) won't lose too much
precision, lower as a+c(b-a).
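
A minimal sketch of why the magnitudes matter, comparing the a+c(b-a) form with the a(1-c)+bc form. It uses Python doubles rather than the 32-bit floats of nir_op_flrp, and the function names are hypothetical, not Mesa code:

```python
def flrp_fast(a, b, c):
    """Lower flrp as a + c(b - a): fewer operations, but (b - a) can round."""
    return a + c * (b - a)


def flrp_precise(a, b, c):
    """Lower flrp as a(1 - c) + bc: returns exactly b when c == 1."""
    return a * (1.0 - c) + b * c


# Similar magnitudes: both forms agree.
print(flrp_fast(1.0, 3.0, 0.5), flrp_precise(1.0, 3.0, 0.5))

# Very different magnitudes: (b - a) rounds to -a, so the fast form
# loses b entirely even though c == 1 should select b.
print(flrp_fast(1e300, 1.0, 1.0), flrp_precise(1e300, 1.0, 1.0))
```

When a and b are constants, a check of this kind can be made at compile time, which is presumably why the commit restricts the cheaper form to flrp(#a, #b, c).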

No changes on any other Intel platforms.

v2: Rebase on 424372e5dd5 ("nir: Use the flrp lowering pass instead of
nir_opt_algebraic")

Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8192503 -> 8192383 (<.01%)
instructions in affected programs: 18417 -> 18297 (-0.65%)
helped: 68
HURT: 0
helped stats (abs) min: 1 max: 18 x̄: 1.76 x̃: 1
helped stats (rel) min: 0.19% max: 7.89% x̄: 1.10% x̃: 0.43%
95% mean confidence interval for instructions value: -2.48 -1.05
95% mean confidence interval for instructions %-change: -1.56% -0.63%
Instructions are helped.

total cycles in shared programs: 188662536 -> 188661956 (<.01%)
cycles in affected programs: 744476 -> 743896 (-0.08%)
helped: 62
HURT: 0
helped stats (abs) min: 4 max: 60 x̄: 9.35 x̃: 6
helped stats (rel) min: 0.02% max: 4.84% x̄: 0.27% x̃: 0.06%
95% mean confidence interval for cycles value: -12.37 -6.34
95% mean confidence interval for cycles %-change: -0.48% -0.06%
Cycles are helped.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-06 22:52:29 -07:00
Ian Romanick
d41cdef2a5 nir: Use the flrp lowering pass instead of nir_opt_algebraic
I tried to be very careful while updating all the various drivers, but I
don't have any of that hardware for testing. :(

i965 is the only platform that sets always_precise = true, and it is
only set true for fragment shaders.  Gen4 and Gen5 both set lower_flrp32
only for vertex shaders.  For fragment shaders, nir_op_flrp is lowered
during code generation as a(1-c)+bc.  On all other platforms 64-bit
nir_op_flrp and on Gen11 32-bit nir_op_flrp are lowered using the old
nir_opt_algebraic method.

No changes on any other Intel platforms.

v2: Add panfrost changes.

Iron Lake and GM45 had similar results. (Iron Lake shown)
total cycles in shared programs: 188647754 -> 188647748 (<.01%)
cycles in affected programs: 5096 -> 5090 (-0.12%)
helped: 3
HURT: 0
helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
helped stats (rel) min: 0.12% max: 0.12% x̄: 0.12% x̃: 0.12%

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-06 22:52:29 -07:00
Ian Romanick
158370ed2a nir/flrp: Add new lowering pass for flrp instructions
This pass will soon grow to include some optimizations that are
difficult or impossible to implement correctly within nir_opt_algebraic.
It also includes the ability to generate strictly correct code, which the
current nir_opt_algebraic lowering lacks (though that could be changed).

v2: Document the parameters to nir_lower_flrp.  Rebase on top of
3766334923 ("compiler/nir: add lowering for 16-bit flrp")

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-06 22:52:28 -07:00
Ian Romanick
dc566a033c nir/algebraic: Pull common multiplication out of flrp arguments
All Intel platforms had similar results. (Skylake shown)
total instructions in shared programs: 15342485 -> 15337495 (-0.03%)
instructions in affected programs: 217456 -> 212466 (-2.29%)
helped: 1539
HURT: 1
helped stats (abs) min: 1 max: 17 x̄: 3.24 x̃: 3
helped stats (rel) min: 0.22% max: 18.75% x̄: 3.10% x̃: 1.91%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.56% max: 0.56% x̄: 0.56% x̃: 0.56%
95% mean confidence interval for instructions value: -3.39 -3.09
95% mean confidence interval for instructions %-change: -3.24% -2.96%
Instructions are helped.

total cycles in shared programs: 355734320 -> 355728237 (<.01%)
cycles in affected programs: 1851555 -> 1845472 (-0.33%)
helped: 835
HURT: 575
helped stats (abs) min: 1 max: 658 x̄: 40.62 x̃: 14
helped stats (rel) min: <.01% max: 35.69% x̄: 3.78% x̃: 1.81%
HURT stats (abs)   min: 1 max: 322 x̄: 48.40 x̃: 14
HURT stats (rel)   min: 0.04% max: 71.02% x̄: 8.06% x̃: 2.43%
95% mean confidence interval for cycles value: -8.50 -0.13
95% mean confidence interval for cycles %-change: 0.48% 1.62%
Inconclusive result (value mean confidence interval and %-change mean confidence interval disagree).
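
A sketch of the identity behind the commit subject, assuming flrp(a, b, c) = a(1-c) + bc and assuming the rewrite has the shape flrp(a*x, b*x, c) -> x*flrp(a, b, c); the actual nir_opt_algebraic pattern is not shown in this log. Exact rationals are used so the check is not disturbed by rounding:

```python
from fractions import Fraction


def flrp(a, b, c):
    """Linear interpolation: a(1 - c) + bc."""
    return a * (1 - c) + b * c


def flrp_common_mul(a, b, c, x):
    """Same value as flrp(a*x, b*x, c), with one multiply instead of two."""
    return x * flrp(a, b, c)


a, b, c, x = Fraction(1, 3), Fraction(5, 7), Fraction(2, 9), Fraction(11, 4)
print(flrp(a * x, b * x, c) == flrp_common_mul(a, b, c, x))
```

In floating point the two sides may round differently, so the equality is exact only in real arithmetic.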

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-06 22:52:28 -07:00
Ian Romanick
a83a6e9690 nir/algebraic: Pull common addition out of flrp arguments
v2: Augment the late optimization patterns with a couple pre-ffma pass
patterns.

All Gen7+ platforms had similar results. (Skylake shown)
total instructions in shared programs: 15342982 -> 15342485 (<.01%)
instructions in affected programs: 56304 -> 55807 (-0.88%)
helped: 235
HURT: 0
helped stats (abs) min: 1 max: 8 x̄: 2.11 x̃: 1
helped stats (rel) min: 0.11% max: 8.82% x̄: 1.27% x̃: 0.74%
95% mean confidence interval for instructions value: -2.31 -1.92
95% mean confidence interval for instructions %-change: -1.46% -1.09%
Instructions are helped.

total cycles in shared programs: 355734740 -> 355734320 (<.01%)
cycles in affected programs: 1028807 -> 1028387 (-0.04%)
helped: 134
HURT: 104
helped stats (abs) min: 1 max: 212 x̄: 25.69 x̃: 8
helped stats (rel) min: <.01% max: 9.36% x̄: 1.33% x̃: 0.61%
HURT stats (abs)   min: 1 max: 203 x̄: 29.06 x̃: 8
HURT stats (rel)   min: 0.02% max: 15.76% x̄: 1.76% x̃: 0.46%
95% mean confidence interval for cycles value: -8.51 4.98
95% mean confidence interval for cycles %-change: -0.35% 0.39%
Inconclusive result (value mean confidence interval includes 0).

Sandy Bridge
total instructions in shared programs: 10886815 -> 10886390 (<.01%)
instructions in affected programs: 36883 -> 36458 (-1.15%)
helped: 147
HURT: 0
helped stats (abs) min: 1 max: 7 x̄: 2.89 x̃: 3
helped stats (rel) min: 0.35% max: 8.00% x̄: 1.60% x̃: 1.23%
95% mean confidence interval for instructions value: -3.12 -2.67
95% mean confidence interval for instructions %-change: -1.83% -1.38%
Instructions are helped.

total cycles in shared programs: 154188360 -> 154186902 (<.01%)
cycles in affected programs: 388094 -> 386636 (-0.38%)
helped: 90
HURT: 58
helped stats (abs) min: 1 max: 243 x̄: 36.80 x̃: 15
helped stats (rel) min: 0.04% max: 9.23% x̄: 1.26% x̃: 0.83%
HURT stats (abs)   min: 1 max: 684 x̄: 31.97 x̃: 10
HURT stats (rel)   min: 0.03% max: 13.50% x̄: 1.15% x̃: 0.51%
95% mean confidence interval for cycles value: -22.62 2.92
95% mean confidence interval for cycles %-change: -0.68% 0.05%
Inconclusive result (value mean confidence interval includes 0).

Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8221239 -> 8220357 (-0.01%)
instructions in affected programs: 54560 -> 53678 (-1.62%)
helped: 186
HURT: 0
helped stats (abs) min: 1 max: 14 x̄: 4.74 x̃: 3
helped stats (rel) min: 0.34% max: 10.77% x̄: 1.97% x̃: 1.17%
95% mean confidence interval for instructions value: -5.21 -4.28
95% mean confidence interval for instructions %-change: -2.23% -1.72%
Instructions are helped.

total cycles in shared programs: 188654442 -> 188650364 (<.01%)
cycles in affected programs: 1454384 -> 1450306 (-0.28%)
helped: 204
HURT: 0
helped stats (abs) min: 2 max: 84 x̄: 19.99 x̃: 18
helped stats (rel) min: 0.02% max: 4.69% x̄: 0.56% x̃: 0.22%
95% mean confidence interval for cycles value: -22.38 -17.60
95% mean confidence interval for cycles %-change: -0.67% -0.46%
Cycles are helped.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-06 22:52:28 -07:00
Christian Gmeiner
e00fa99b08 glsl_to_nir: drop supports_ints
At the initial NIR level, all drivers support ints.

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-07 07:35:59 +02:00
Christian Gmeiner
4e110eca42 nir: nir_shader_compiler_options: drop native_integers
Drivers that do not support native integers should use a lowering
pass to go from integers to floats.

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-07 07:35:52 +02:00
Vasily Khoruzhick
443c5a3cd6 nir: add int_to_float lowering pass
This new pass lowers ints and bools to floats. It allows hardware
that doesn't have native integers (e.g. Mali4x0) to use the same
code paths as modern hardware.

It uses the newly introduced pass to gather SSA types and should be
used as late as possible.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
2019-05-07 01:07:27 +00:00
John Stultz
c7f2145b4b mesa: Makefile.sources: Add nir_lower_fb_read.c to Makefile.sources list
In commit a99c360a46 (nir: add pass to lower fb reads), a new
file was added that also needs to be added to the
Makefile.sources list used by the Android and SCons build systems.

Cc: Rob Clark <robdclark@chromium.org>
Cc: Emil Velikov <emil.l.velikov@gmail.com>
Cc: Amit Pundir <amit.pundir@linaro.org>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Alistair Strachan <astrachan@google.com>
Cc: Greg Hartman <ghartman@google.com>
Cc: Tapani Pälli <tapani.palli@intel.com>
Cc: Jason Ekstrand <jason@jlekstrand.net>
Fixes: a99c360a46 ("nir: add pass to lower fb reads")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
2019-05-06 11:29:26 +00:00
Alistair Strachan
0fda3eac31 mesa: android: Remove unnecessary dependency tracking rules
The current AOSP master build system breaks building mesa due to the
following error:

external/mesa3d/src/compiler/Android.glsl.gen.mk:94: error:
  writing to readonly directory: "external/mesa3d/src/compiler/glsl/ir.h"

This error is bogus -- nothing "writes" to ir.h -- but the rule is
unnecessary because the generated header that is a dependency of the
non-generated header should be added to LOCAL_GENERATED_SOURCES and this
will track if the dependency needs to be regenerated.

(This change fixes a similar problem affecting nir.h too.)

Cc: Rob Clark <robdclark@chromium.org>
Cc: Emil Velikov <emil.l.velikov@gmail.com>
Cc: Amit Pundir <amit.pundir@linaro.org>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Alistair Strachan <astrachan@google.com>
Cc: Greg Hartman <ghartman@google.com>
Cc: Tapani Pälli <tapani.palli@intel.com>
Cc: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Alistair Strachan <astrachan@google.com>
[jstultz: Forward ported and tweaked commit subject]
Signed-off-by: John Stultz <john.stultz@linaro.org>
2019-05-06 11:29:25 +00:00
Karol Herbst
7f85283103 spirv/cl: support vload/vstore
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-04 12:27:51 +02:00
Karol Herbst
d11b807da5 nir: Add nir_op_vec helper
With that we can simplify code where NIR vectors are created.

v2: merge both lines in nir_vec

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-04 12:27:51 +02:00
Karol Herbst
681fb7ea05 nir: Add a nir_builder_alu variant which takes an array of components
v2: rename to nir_build_alu_src_arr

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-04 12:27:51 +02:00
Karol Herbst
c91ea6343f vtn: handle bitcast with pointer src/dest
v2: use vtn_push_ssa and vtn_ssa_value

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-04 12:27:51 +02:00
Jason Ekstrand
91899495a1 nir: Add a SSA type gathering pass
This new pass (which isn't even compile-tested) attempts to determine
the ALU type of all the SSA values in a function impl.  It takes a
greedy approach and assigns intness or floatness to everything it thinks
can possibly contain an int or a float.  Some values will be labeled as
both int and float and some will be labeled as neither; it is up to
the caller to decide what to do with this information.  However, for a
"nice" shader where the original source contained no bit-casts and no
implicit bit-casts were introduced by optimizations, there shouldn't be
any overlap in the two sets save for the odd CSEd zero constant.
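
A toy sketch of the greedy idea, not the pass itself: the opcode table, the instruction tuples and the function name are all hypothetical. Typed opcodes mark their operands, copies spread marks in both directions, and the loop runs to a fixed point:

```python
# Hypothetical opcode table: (destination type, source types); None = untyped.
OPS = {
    "fadd": ("float", ("float", "float")),
    "iadd": ("int", ("int", "int")),
    "mov": (None, (None,)),
}


def gather_types(instrs):
    """instrs: list of (dest, op, srcs). Returns (int_values, float_values)."""
    ints, floats = set(), set()
    sets = {"int": ints, "float": floats}
    changed = True
    while changed:
        changed = False
        for dest, op, srcs in instrs:
            dest_type, src_types = OPS[op]
            # Greedily mark everything a typed opcode touches.
            for value, ty in [(dest, dest_type)] + list(zip(srcs, src_types)):
                if ty is not None and value not in sets[ty]:
                    sets[ty].add(value)
                    changed = True
            if op == "mov":
                # A copy carries intness and floatness in both directions.
                for marked in sets.values():
                    if (dest in marked) != (srcs[0] in marked):
                        marked.update((dest, srcs[0]))
                        changed = True
    return ints, floats


shader = [
    ("a", "fadd", ("x", "y")),
    ("b", "mov", ("a",)),
    ("c", "iadd", ("b", "z")),  # uses a float result as an int: a bit-cast
    ("d", "mov", ("w",)),       # never touched by a typed opcode
]
ints, floats = gather_types(shader)
print(sorted(ints), sorted(floats))
```

Here a and b end up in both sets and d and w in neither, which is the situation the message leaves to the caller.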

Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
2019-05-04 03:52:05 +00:00
Connor Abbott
d0ea9877b8 nir/algebraic: Don't emit empty initializers for MSVC
Just don't emit the transform array at all if there are no transforms.

v2:
- Don't use len(array) > 0 (Dylan)
- Keep using ARRAY_SIZE to make the generated C code easier to read
(Jason).
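
A minimal sketch of the generator-side idea, assuming a hypothetical emit_transforms helper and struct name rather than the real nir_algebraic.py template. An empty list produces no C at all, so no empty initializer reaches MSVC, and truthiness is tested instead of len(array) > 0:

```python
def emit_transforms(name, transforms):
    """Return C source for a transform array, or "" when there is nothing to emit."""
    if not transforms:
        return ""
    body = ",\n".join(f"   {{ {t} }}" for t in transforms)
    return f"static const struct transform {name}[] = {{\n{body}\n}};\n"


print(emit_transforms("fadd_xforms", ["&search0, &replace0"]))
print(repr(emit_transforms("fsub_xforms", [])))
```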
2019-05-04 00:13:21 +02:00
Dylan Baker
c613861b23 meson: Don't build glsl cache_test when shader cache is disabled
v2: - Use new with_shader_cache variable instead of
      host_machine.system() == 'windows'

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-05-03 10:58:31 -07:00
Dylan Baker
5eb0f33e4f glsl/tests: define ssize_t on windows
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-05-03 10:58:24 -07:00
Dylan Baker
113bb8d448 glsl: fix general_ir_test with mingw
Somewhere down in the depths of the mingw headers 'interface' is
defined; change it to iface, as a similar patch did.

Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-05-03 10:57:17 -07:00
Dave Airlie
6fd6246d92 nir: fix lower vars to ssa for larger vector sizes.
This has a couple of hardcoded vec4 limits in it, change them
to the proper sizing to avoid future issues.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-03 15:23:00 +10:00
Dave Airlie
2774d39366 spirv: fix SpvOpBitSize return value.
The SPIR-V spec says this returns a bool.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-03 15:22:57 +10:00
Rob Clark
b73dd91f60 nir: fix nir tex print harder
Fixes: 691d5a825a nir: rework tex instruction printing
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-05-02 15:06:01 -07:00
Marek Olšák
b3a26d4628 glsl: fix and clean up NV_compute_shader_derivatives support
- make sure compute shader derivatives are exposed for all extensions
- unify duplicated code

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-05-02 16:09:24 -04:00
Rob Clark
a99c360a46 nir: add pass to lower fb reads
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-05-02 11:19:22 -07:00
Rob Clark
a2c89a85f4 nir: fix lower_wpos_ytransform in load_frag_coord case
Apparently we never hit this path.  Or at least haven't for a rather
long time.  But in either case (load_deref or load_frag_coord), we can
just directly use the intrinsic's ssa dest.  So stop passing the
nir_variable (which would be NULL in the load_frag_coord case) around
and instead just use &intr->dest.ssa.

(This ofc means we need to setup the cursor to insert *after* the
instruction, which seems to be another bug of the original
implementation.)

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-05-02 11:19:22 -07:00
Rob Clark
691d5a825a nir: rework tex instruction printing
The extra comma at the end was annoying me.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-05-02 11:19:22 -07:00
Connor Abbott
6ec4ed48fc nir/search: Add debugging code to dump the pattern matched
This was useful while debugging the previous commit.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-02 16:14:06 +02:00
Connor Abbott
7ce86e6938 nir/search: Add automaton-based pre-searching
nir_opt_algebraic is currently one of the most expensive NIR passes,
because of the many different patterns we've added over the years. Even
though patterns are already sorted by opcode, there are still way too
many patterns for common opcodes like bcsel and fadd, which means that
many patterns are tried but only a few actually match. One way to fix
this is to add a pre-pass over the code that scans it using an automaton
constructed beforehand, similar to the automatons produced by lex and
yacc for parsing source code. This automaton has to walk the SSA graph
and recognize possible pattern matches.
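
A toy sketch of bottom-up pre-searching, under heavy assumptions: it builds the transition table lazily rather than using the "filter" approach chosen in this commit, treats every variable as a wildcard, and all names are hypothetical. Each node's state is the set of pattern subterms that structurally match it, computed only from the opcode and the children's states:

```python
WILDCARD = "*"  # every pattern variable collapses to this marker


def subterms(pattern, out):
    """Collect all subterms of a pattern into out; return the normalized term."""
    if isinstance(pattern, str):
        out.add(WILDCARD)
        return WILDCARD
    op, *args = pattern
    term = (op,) + tuple(subterms(arg, out) for arg in args)
    out.add(term)
    return term


class PreSearch:
    def __init__(self, patterns):
        self.terms = set()
        self.roots = [subterms(p, self.terms) for p in patterns]
        self.table = {}  # (opcode, child states) -> state, filled lazily

    def state(self, expr):
        """Return the set of subterms that structurally match expr."""
        if isinstance(expr, str):  # leaf value: only the wildcard matches
            return frozenset([WILDCARD])
        op, *args = expr
        key = (op, tuple(self.state(arg) for arg in args))
        if key not in self.table:
            matched = {WILDCARD}
            for term in self.terms:
                if (term != WILDCARD and term[0] == op
                        and len(term) - 1 == len(key[1])
                        and all(t in s for t, s in zip(term[1:], key[1]))):
                    matched.add(term)
            self.table[key] = frozenset(matched)
        return self.table[key]

    def candidates(self, expr):
        """Indices of the patterns worth trying at the root of expr."""
        reached = self.state(expr)
        return [i for i, root in enumerate(self.roots) if root in reached]


search = PreSearch([
    ("fadd", "a", ("fneg", "b")),
    ("fmul", "a", ("fadd", "b", "c")),
])
print(search.candidates(("fadd", "x", ("fneg", ("fmul", "y", "z")))))
print(search.candidates(("fadd", "x", "y")))
```

Only the patterns listed by candidates() need the full matcher; a plain fadd of two leaves reaches no pattern root and is skipped outright.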

It turns out that the theory to do this is quite mature already, having
been developed for instruction selection as well as other non-compiler
things. I followed the presentation in the dissertation cited in the
code, "Tree algorithms: Two Taxonomies and a Toolkit," trying to keep
the naming similar. To create the automaton, we have to perform
something like the classical NFA to DFA subset construction used by lex,
but it turns out that actually computing the transition table for all
possible states would be way too expensive, with the dissertation
reporting times of almost half an hour for an example of size similar to
nir_opt_algebraic. Instead, we adopt one of the "filter" approaches
explained in the dissertation, which trade much faster table generation
and table size for a few more table lookups per instruction at runtime.
I chose the filter which resulted in the fastest table generation time,
with medium table size. Right now, the table generation takes around .5
seconds, despite being implemented in pure Python, which I think is good
enough. Based on the numbers in the dissertation, the other choice might
make table compilation time 25x slower to get 4x smaller table size, but
I don't think that's worth it. As of now, we get the following binary
size before and after this patch:

    text     data      bss       dec      hex  filename
11979455   464720   730864  13175039   c908ff  before i965_dri.so
12037835   616244   791792  13445871   cd2aef  after i965_dri.so

There are a number of places where I've simplified the automaton by
getting rid of details in the LHS patterns rather than complicate things
to deal with them. For example, right now the automaton doesn't
distinguish between constants with different values. This means that it
isn't as precise as it could be, but the decrease in compile time is
still worth it -- these are the compilation time numbers for a shader-db
run with my (admittedly old) database on Intel skylake:

Difference at 95.0% confidence
	-42.3485 +/- 1.375
	-7.20383% +/- 0.229926%
	(Student's t, pooled s = 1.69843)

We can always experiment with making it more precise later.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-02 16:14:06 +02:00
Brian Paul
48107b5a2b glsl: fix typo in #warning message
Trivial.  Spotted by Eric Engestrom.
2019-05-02 06:32:57 -06:00
Brian Paul
413e55b5b9 glsl: work around MinGW 7.x compiler bug
I'm not sure what triggered this, but building with
scons platform=windows toolchain=crossmingw machine=x86 build=profile
with MinGW g++ 7.3 or 7.4 causes an internal compiler error.

We can work around it by forcing -O1 optimization.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Neha Bhende <bhenden@vmware.com>
2019-05-01 20:06:54 -06:00
Ian Romanick
85e6865ff6 nir: Saturating integer arithmetic is not associative
In 8-bits,

    iadd_sat(iadd_sat(0x7f, 0x7f), -1) =
    iadd_sat(0x7f, -1) =
    0x7e

but,

    iadd_sat(0x7f, iadd_sat(0x7f, -1)) =
    iadd_sat(0x7f, 0x7e) =
    0x7f
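
The 8-bit example above can be reproduced with a small sketch; iadd_sat8 is a hypothetical stand-in for the signed 8-bit case of the NIR opcode:

```python
def iadd_sat8(a, b):
    """Signed 8-bit saturating add: clamp the exact sum to [-128, 127]."""
    return max(-128, min(127, a + b))


left = iadd_sat8(iadd_sat8(0x7F, 0x7F), -1)   # (0x7f + 0x7f) saturates first
right = iadd_sat8(0x7F, iadd_sat8(0x7F, -1))  # inner add does not saturate
print(hex(left), hex(right))
```

The two groupings give different results, so reassociating operands is not a valid transformation for this opcode.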

Fixes: 272e927d0e ("nir/spirv: initial handling of OpenCL.std extension opcodes")
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-01 09:07:47 -07:00
Jonathan Marek
0c6702cfa5 nir: improve convert_yuv_to_rgb
Use a different arrangement of constants to allow more ffma.

A vec4 backend will now use 3 fma for yuv_to_rgb. On freedreno/ir3, it is
down from 10 to 7 alu (4 fma, 3 mul, 3 add to 7 fma). Other backends
shouldn't be hurt.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
2019-05-01 04:13:36 -07:00
Juan A. Suarez Romero
bbbe00a101 spirv: add missing SPV_EXT_descriptor_indexing capabilities
Add ShaderNonUniformEXT, UniformBufferArrayNonUniformIndexingEXT,
SampledImageArrayNonUniformIndexingEXT,
StorageBufferArrayNonUniformIndexingEXT,
StorageImageArrayNonUniformIndexingEXT,
InputAttachmentArrayNonUniformIndexingEXT,
UniformTexelBufferArrayNonUniformIndexingEXT and
StorageTexelBufferArrayNonUniformIndexingEXT capabilities.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-04-30 09:22:45 +02:00
Caio Marcelo de Oliveira Filho
1fb6630636 spirv: Properly handle SpvOpAtomicCompareExchangeWeak
The code was handling the Weak variant in some cases, but missing
others, e.g. the get_deref_nir_atomic_op.  Add all the missing cases
with the same behavior of the non-Weak SpvOpAtomicCompareExchange.

Note that the Weak variant is basically an alias, as SPIR-V 1.3,
Revision 7 says

    "OpAtomicCompareExchangeWeak

    Deprecated (use OpAtomicCompareExchange).

    Has the same semantics as OpAtomicCompareExchange."

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-04-29 19:02:44 -07:00
Eric Engestrom
7ca8ba199f delete autotools .gitignore files
One special case, `src/util/xmlpool/.gitignore` is not entirely deleted,
as `xmlpool.pot` still gets generated (eg. by `ninja xmlpool-pot`).

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2019-04-29 21:17:19 +00:00
Andres Gomez
c81fbb42d9 glsl/linker: check for xfb_offset aliasing
From page 76 (page 80 of the PDF) of the GLSL 4.60 v.5 spec:

  " No aliasing in output buffers is allowed: It is a compile-time or
    link-time error to specify variables with overlapping transform
    feedback offsets."

Currently, this is expected to fail, but it succeeds:

  "

    ...

    layout (xfb_offset = 0) out vec2 a;
    layout (xfb_offset = 0) out vec4 b;

    ...

  "

Fixes the following piglit test:
tests/spec/arb_enhanced_layouts/compiler/transform-feedback-layout-qualifiers/xfb_offset/invalid-overlap.vert

Fixes the following test:
KHR-GL44.enhanced_layouts.xfb_output_overlapping

v2:
  - Use a data structure to track the used components instead of a
    nested loop (Ilia).

v3:
  - Take the BITSET_WORD array out from the
    gl_transform_feedback_buffer struct and make it local to the
    validation process (Timothy).
  - Do not use a nested scope for the validation (Timothy).

v4:
  - Add reference to the fixed piglit test in the commit log.
  - Add reference to the fixed VK-GL-CTS test in the commit
    log (Tapani).
  - Empty initialize the BITSET_WORD pointers array (Tapani).
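
A rough sketch of the bitset idea from the v2 note, not the linker code: it assumes one bit per 4-byte component in each buffer, and the function name and the tuple layout are hypothetical.

```python
def find_xfb_overlap(outputs):
    """outputs: list of (name, buffer, byte_offset, byte_size).

    Returns the first variable whose components alias an earlier one, else None.
    """
    used = {}  # buffer index -> int used as a bitset of occupied components
    for name, buf, offset, size in outputs:
        first = offset // 4
        count = (size + 3) // 4
        mask = ((1 << count) - 1) << first
        if used.get(buf, 0) & mask:
            return name
        used[buf] = used.get(buf, 0) | mask
    return None


# layout (xfb_offset = 0) out vec2 a;  layout (xfb_offset = 0) out vec4 b;
print(find_xfb_overlap([("a", 0, 0, 8), ("b", 0, 0, 16)]))
# Moving b past a removes the aliasing.
print(find_xfb_overlap([("a", 0, 0, 8), ("b", 0, 8, 16)]))
```

A single pass with one mask test per variable replaces the nested loop over all pairs of variables.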

Cc: Timothy Arceri <tarceri@itsqueeze.com>
Cc: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2019-04-29 12:13:29 +02:00
Kenneth Graunke
2b44b27dbe nir: Add a new nir_cf_list_is_empty_block() helper.
Helper and name suggested by Eric Anholt.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-04-28 22:36:08 -07:00
Kenneth Graunke
08dc93c67c glsl/list: Add an exec_list_is_singular() helper.
Similar to list_is_singular() in util/list.h.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-04-28 22:35:42 -07:00
Andreas Baierl
b82de2b4d7 nir: add rcp(w) lowering for gl_FragCoord
On some hardware (e.g. Mali400) the shader needs to apply some
transformations for correct gl_FragCoord handling. The lowering
actions look like the following in pseudocode:
   gl_FragCoord.xyz = gl_FragCoord_orig.xyz
   gl_FragCoord.w = 1.0 / gl_FragCoord_orig.w
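
The pseudocode above, written as a small runnable sketch; lower_fragcoord is a hypothetical name, and the real pass operates on NIR instructions rather than tuples:

```python
def lower_fragcoord(frag_coord_orig):
    """Pass xyz through unchanged and replace w with its reciprocal."""
    x, y, z, w = frag_coord_orig
    return (x, y, z, 1.0 / w)


print(lower_fragcoord((10.5, 20.5, 0.25, 4.0)))
```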

Add this lowering as a nir pass in preparation for using it in the driver.

Signed-off-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-04-29 02:46:44 +00:00
Tapani Pälli
af06963d24 glsl: use empty brace initializer
Fixes the following warning with clang:
   warning: suggest braces around initialization of subobject

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-04-26 12:24:41 -07:00
Tapani Pälli
7a7f182dac nir: use braces around subobject in initializer
Used the same syntax as elsewhere in the Mesa sources, and verified the
result against MSVC with godbolt.org.

Fixes the following warning with clang:
   warning: suggest braces around initialization of subobject

v2: empty braces -> braces around subobject (Caio, Kristian)

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-04-26 12:01:22 -07:00
Jason Ekstrand
00d4e78ea9 nir/algebraic: Optimize integer cast-of-cast
These have been popping up more and more with the OpenCL work and other
bits causing extra conversions to/from 64-bit.
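
One illustrative case of a redundant cast-of-cast, assuming the simplest shape (widening to 64-bit and then narrowing again); the patterns actually added by this commit are not listed here, and i2i is a hypothetical model of the NIR conversion:

```python
def i2i(value, bits):
    """Convert value to a signed two's-complement integer of the given width."""
    value &= (1 << bits) - 1
    if value >> (bits - 1):
        value -= 1 << bits
    return value


samples = (-5, 0, 7, 2**31 - 1, -2**31, 2**40 + 3)
# Narrowing after widening gives the same result as a single conversion,
# so the intermediate 64-bit cast can be dropped.
print(all(i2i(i2i(x, 64), 32) == i2i(x, 32) for x in samples))
```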

Reviewed-by: Karol Herbst <kherbst@redhat.com>
2019-04-26 04:26:08 -05:00
Dave Airlie
d946cbe9f5 nir: fix bit_size in lower indirect derefs.
This fixes a case where we are expecting 64-bit but generate
32-bit consts and validate gets angry.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2019-04-26 12:59:43 +10:00
Marek Olšák
c5f65bfe6c glsl: fix shader_storage_blocks_write_access for SSBO block arrays (v2)
This fixes KHR-GL45.compute_shader.resources-max on radeonsi.

Fixes: 4e1e8f684b "glsl: remember which SSBOs are not read-only and pass it to gallium"

v2: use is_interface_array, protect against assertion failures in u_bit_consecutive

Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-04-25 18:57:38 -04:00
Rob Clark
2f0b9d2249 freedreno/ir3: lower load_barycentric_at_offset
Calculates i,j at specified offset within a pixel.  A new load_size_ir3
intrinsic is used in conjunction with fddx/fddy to translate the offset
into primitive space and adjust the i,j from load_barycentric_pixel
accordingly.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-04-25 14:13:31 -07:00
Rob Clark
c4f423aa36 freedreno/ir3: lower load_barycentric_at_sample
This lowers load_barycentric_at_sample to load_sample_pos_from_id plus
load_barycentric_at_offset.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-04-25 14:13:31 -07:00
Rob Clark
4d08c1b595 compiler: rename SYSTEM_VALUE_VARYING_COORD
And add corresponding enums for different sorts of varying
interpolation.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-04-25 14:13:31 -07:00
Caio Marcelo de Oliveira Filho
d5ac5d6e83 nir: Add option to lower tex to txl when shaders don't support implicit LOD
We already add the LOD src, so go ahead and update the texop as well
when this option is set.

v2: Make it an option. (Rob Clark)

v3: Use a more concise name suggested by Jason.

Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-04-25 12:13:06 -07:00
Timothy Arceri
b155f74d7b nir: fix nir_remove_unused_varyings()
We were only setting the used mask for the first component of a
varying. Since the linking opts split vectors into scalars this
has mostly worked ok.

However this causes an issue where for example if we split a
struct on one side of the interface but not the other, then we
can possibly end up removing the first components on the side
that was split and then incorrectly remove the whole struct
on the other side of the varying.

With this change we simply mark all 4 components for each slot
used by a struct. We could possibly make this more fine grained
but that would require a more complex change.

This fixes a bug in Strange Brigade on RADV when tessellation
is enabled, all credit goes to Samuel Pitoiset for tracking down
the cause of the bug.

Fixes: f1eb5e6399 ("nir: add component level support to remove_unused_io_vars()")

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-04-25 16:37:36 +10:00
Marek Olšák
45ca7798dc glsl: handle interactions between EXT_gpu_shader4 and texture extensions
Also, EXT_texture_buffer_object has to be enabled separately.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-04-24 20:45:15 -04:00