Commit graph

33423 commits

Author SHA1 Message Date
Timothy Arceri
dabff1cf7a radeonsi/nir: add primitive id to inputs scan
Fixes the following piglit tests:

arb_tessellation_shader/fs-primitiveid-instanced
glsl-1.50/primitive-id-no-gs
glsl-1.50/primitive-id-no-gs-first-vertex
glsl-1.50/primitive-id-no-gs-instanced
glsl-1.50/primitive-id-no-gs-strip
glsl-1.50/primitive-id-no-gs-strip-first-vertex

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-01-23 09:11:21 +11:00
Timothy Arceri
c6a0ce7e54 radeonsi/nir: add nir_intrinsic_load_sample_mask_in to ir scan
Fixes a bunch of ARB_sample_shading piglit tests.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-01-23 09:11:21 +11:00
Jose Fonseca
dcbb224c68 svga: Prevent use after free.
Courtesy of clang static analyzer.

I was hunting for potential sources of memory corruption using Mesa with
a GL trace, and happened to find this (unrelated) issue.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2018-01-22 21:12:41 +00:00
Lucas Stach
29a0ea699a etnaviv: dirty TS state when framebuffer has changed
When switching between framebuffers with and without TS, the TS state
needs to be flushed to the command stream even if the derived state
isn't changed.

Fixes: 4ee7c2c284 ("etnaviv: enable TS, but disable autodisable")
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2018-01-21 12:58:02 +01:00
Vinson Lee
e03c880971 broadcom/vc5: Fix source file name.
Fixes: c9b2cb7897 ("vc5: add missing files to the tarball")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2018-01-21 11:13:16 +08:00
Vinson Lee
14abbe604b broadcom/vc5: Add missing include paths.
Fixes: 954a704da3 ("broadcom/vc5: Port the RCL setup to V3D4.1.")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2018-01-21 11:05:33 +08:00
Dylan Baker
436ed65d38 autotools: include meson build files in tarball
This adds the meson.build, meson_options.txt, and a few scripts that are
used exclusively by the meson build.

v2: - Remove accidentally included changes needed to test make dist with
      LLVM > 3.9

Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Acked-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2018-01-19 16:30:51 -08:00
George Kyriazis
9d80ed0862 swr/rast: Fix llvm5 behavior
For some reason llvm5 is picky about accepting a void * type in the
case of building an argument list.

Since we don't care about the type (we ignore the argument for now),
pick another pointer type.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2018-01-19 17:08:30 -06:00
George Kyriazis
d335b32baf swr/rast: Enable early rasterization
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2018-01-19 16:52:43 -06:00
George Kyriazis
bacfbe5a32 swr/rast: Implement Early Rasterization optimization
Early Rasterization is an optimization for small triangles.

Scientific workloads often contain very small triangles that have non-zero
area and cannot be trivially rejected as falling between pixel centers,
but do not cover any pixel center. Those triangles can be rasterized as
early as in the binner and rejected if they cover no pixels. The
optimization can be disabled at compile time using the
KNOB_ENABLE_EARLY_RAST option in knobs.h.

The Early Rast is disabled by default.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2018-01-19 16:52:43 -06:00
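Note on bacfbe5a32: the test described in that message, a triangle with non-zero area whose bounding box may contain a pixel center even though the triangle itself does not, can be sketched as follows. This is an illustration only and not SWR's binner code: the names vec2, edge_fn and covers_any_pixel_center are hypothetical, pixel centers are assumed to sit at integer + 0.5, and fill rules and fixed-point snapping are ignored.

```c
#include <stdbool.h>

/* Hypothetical 2D point type for this sketch. */
typedef struct { float x, y; } vec2;

/* Edge function: positive when p lies to the left of the edge a->b. */
static float edge_fn(vec2 a, vec2 b, vec2 p)
{
    return (b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x);
}

/* Integer floor/ceil without libm. */
static int floor_to_int(float v) { int i = (int)v; return (v < (float)i) ? i - 1 : i; }
static int ceil_to_int(float v)  { int i = (int)v; return (v > (float)i) ? i + 1 : i; }

static float min3(float a, float b, float c) { float m = a < b ? a : b; return m < c ? m : c; }
static float max3(float a, float b, float c) { float m = a > b ? a : b; return m > c ? m : c; }

/* Returns true if at least one pixel center lies inside (or on) the triangle. */
bool covers_any_pixel_center(vec2 v0, vec2 v1, vec2 v2)
{
    float area = edge_fn(v0, v1, v2);
    if (area == 0.0f)
        return false; /* degenerate triangle */

    /* Range of pixel indices whose centers fall inside the bounding box. */
    int x0 = ceil_to_int(min3(v0.x, v1.x, v2.x) - 0.5f);
    int x1 = floor_to_int(max3(v0.x, v1.x, v2.x) - 0.5f);
    int y0 = ceil_to_int(min3(v0.y, v1.y, v2.y) - 0.5f);
    int y1 = floor_to_int(max3(v0.y, v1.y, v2.y) - 0.5f);

    /* An empty range is the "falls between pixel centers" trivial reject. */
    for (int y = y0; y <= y1; y++) {
        for (int x = x0; x <= x1; x++) {
            vec2 p = { (float)x + 0.5f, (float)y + 0.5f };
            float w0 = edge_fn(v1, v2, p);
            float w1 = edge_fn(v2, v0, p);
            float w2 = edge_fn(v0, v1, p);
            if (area < 0.0f) { w0 = -w0; w1 = -w1; w2 = -w2; } /* handle clockwise winding */
            if (w0 >= 0.0f && w1 >= 0.0f && w2 >= 0.0f)
                return true;
        }
    }
    return false; /* bounding box held centers, but none is covered */
}
```

A triangle for which this returns false can be dropped before it is binned. The loop stays cheap because the triangles of interest span only a pixel or two.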
George Kyriazis
be3cd7add1 swr/rast: Enable simd16 vertex shaders
Flip the switch(es) to enable simd16 vertex shaders:

USE_SIMD16_SHADERS and USE_SIMD16_VS

Both have to be enabled at the same time.  Currently, just setting
USE_SIMD16_SHADERS does not work correctly.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2018-01-19 16:52:42 -06:00
George Kyriazis
8c83d2d371 swr: Support simd16 vertex shaders
Supporting simd16 vertex shaders involves packing the output of the
fetch shader appropriately, especially the vertexID buffers that have to
be formatted in one simd16 register, needed by the VS.

As part of this support, we needed to remove the 2nd JitManager, since it
was not accounting for vector width correctly.

USE_SIMD16_SHADERS is also split into two defines.  The additional
one (USE_SIMD16_VS) controls the width of the vertex shader (VS), while
the original one (USE_SIMD16_SHADERS) controls overall front end width.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2018-01-19 16:52:42 -06:00
George Kyriazis
1874d95a8e swr/rast: changed jit debug magic number
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2018-01-19 16:52:41 -06:00
George Kyriazis
c719f62621 swr/rast: Added ICLAMP builder function
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2018-01-19 16:52:41 -06:00
George Kyriazis
f192502001 swr/rast: Jit debug work
Properly validate DLL matches OBJ for jitted function

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2018-01-19 16:52:41 -06:00
George Kyriazis
3c405e32b0 swr/rast: silence generated file warnings
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2018-01-19 16:52:40 -06:00
George Kyriazis
fe107e3c17 swr/rast: jit shader lib debug work
Create shader_lib during build, link with shaders at DLL generation time

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2018-01-19 16:52:40 -06:00
George Kyriazis
0cd9ad98a3 swr/rast: AVX-512 changes to enable 16-wide VS
Add a new define (USE_SIMD16_VS), to denote calling a 16-wide vertex shader.
This is needed because the mesa driver can do 16-wide shaders, but rasty
cannot yet, so we need to distinguish.

Create a new VertexID entry (VertexID16) for the USE_SIMD16_VS case, since
we need to format the vertex id in a way that is digestible by the 16-wide VS

Disabled for now.  To be enabled in a future checkin when driver work
is complete.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2018-01-19 16:52:40 -06:00
George Kyriazis
3140e714d2 swr/rast: x86 autogenerated macro work
Add name argument to x86 autogenerated macros.
Add useful variable names for DCL_inputVec implementation.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2018-01-19 16:52:39 -06:00
George Kyriazis
4cd6e2ebfd swr/rast: Shorten some filenames
in shader and fetch dump files

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2018-01-19 16:52:39 -06:00
George Kyriazis
3936044d07 swr/rast: work supporting optimizations in Debug builds.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2018-01-19 16:52:38 -06:00
George Kyriazis
c4a42f5add swr/rast: Add debugging type support for function types.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2018-01-19 16:52:38 -06:00
George Kyriazis
e9e7f3ce0a swr/rast: Shader debugging work
- Move debug .ll files to JIT_CACHE_DIR
- Don't link against jitter SRGBLut table, add global data to shader that needs it.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2018-01-19 16:52:34 -06:00
George Kyriazis
34bbcb5052 swr/rast: Debug Symbols work
Added support for Fetch / Sample / LD functions
Added DLL link to JitCache implementation

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2018-01-19 16:52:30 -06:00
George Kyriazis
01ab218bbc swr/rast: Initial work for debugging support.
Adds ability to step into jitted llvm IR in Visual Studio.
- Updated llvm type generation script to also generate corresponding debug types.
- New module pass inserts debug metadata into the IR for each function

Disabled by default.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2018-01-19 16:52:22 -06:00
George Kyriazis
4660e13152 swr/rast: Add private state parameter in fetcher
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2018-01-19 16:48:41 -06:00
George Kyriazis
079ae3c48d swr/rast: Added missing define for Linux/gcc
+ ZeroMemory() macro definition for non win32-compilation in common/os.h

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2018-01-19 16:48:41 -06:00
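Note on 079ae3c48d: ZeroMemory() is a Win32 macro, so code that uses it needs a fallback when built with gcc on Linux. A minimal sketch of such a fallback is shown below. The memset-based definition is an assumption about what the commit adds to common/os.h (the actual text may differ), and struct draw_state and reset_draw_state are hypothetical names used only to show a call site.

```c
#include <string.h>

/* Assumed shape of the fallback; on Win32 the macro comes from <windows.h>. */
#if !defined(_WIN32)
#define ZeroMemory(dst, size) memset((dst), 0, (size))
#endif

/* Hypothetical state struct, for illustration only. */
struct draw_state {
    int width;
    int height;
    unsigned flags;
};

/* Hypothetical caller: clears the whole struct through the portable macro. */
void reset_draw_state(struct draw_state *state)
{
    ZeroMemory(state, sizeof(*state));
}
```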
George Kyriazis
70f8eac603 swr/rast: Fix one more invalid object format for windows.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2018-01-19 16:48:41 -06:00
Chuck Atkins
a4be2bcee2 swr: allow a single swr architecture to be builtin
Part 2 of 2 (part 1 is autoconf changes, part 2 is C++ changes)

When only a single SWR architecture is being used, this allows that
architecture to be builtin rather than as a separate libswrARCH.so that
gets loaded via dlopen.  Since there are now several different code
paths for each detected CPU architecture, the log output is also
adjusted to convey where the backend is getting loaded from.

This allows SWR to be used for static mesa builds which are still
important for large HPC environments where shared libraries can impose
unacceptable application startup times as hundreds of thousands of copies
of the libs are loaded from a shared parallel filesystem.

Based on an initial implementation by Tim Rowley.

v2: Refactor repetitive preprocessor checks to reduce code duplication
v3: Formatting changes per Bruce C. Also delay screen creation until end
    to avoid leaks when failure conditions are hit.

Signed-off-by: Chuck Atkins <chuck.atkins@kitware.com>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
CC: Tim Rowley <timothy.o.rowley@intel.com>
2018-01-19 13:16:00 -06:00
Chuck Atkins
2ed8b6f827 swr: (autoconf) allow a single swr architecture to be builtin
Part 1 of 2 (part 1 is autoconf changes, part 2 is C++ changes)

When only a single SWR architecture is being used, this allows that
architecture to be builtin rather than as a separate libswrARCH.so that
gets loaded via dlopen.  Since there are now several different code
paths for each detected CPU architecture, the log output is also
adjusted to convey where the backend is getting loaded from.

This allows SWR to be used for static mesa builds which are still
important for large HPC environments where shared libraries can impose
unacceptable application startup times as hundreds of thousands of copies
of the libs are loaded from a shared parallel filesystem.

Based on an initial implementation by Tim Rowley.

v2: Fix comment placement pointed out by Bruce C.

Signed-off-by: Chuck Atkins <chuck.atkins@kitware.com>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
CC: Tim Rowley <timothy.o.rowley@intel.com>
2018-01-19 13:15:54 -06:00
Greg V
8ff8c82630 swr: fix clang 5 null cast warning
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2018-01-19 16:15:56 +00:00
Gert Wollny
d0e37599ab gallium: Make (num_)samples an unsigned int
According to the ARB_multisample num_samples is a non-negative integer.
Consequently define it as such, fail in glx/choose_visual if a negative
number is given.

v2: split patch into gallium and mesa part

Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2018-01-19 15:45:57 +00:00
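Note on d0e37599ab: once the sample count is stored as an unsigned value, a negative request has to be rejected before the conversion, otherwise it would wrap to a very large count. A minimal sketch of that check follows. The names visual_desc and choose_samples are hypothetical and do not correspond to the actual glx/choose_visual code.

```c
#include <stdbool.h>

/* Hypothetical visual description; only the unsigned sample count matters here. */
struct visual_desc {
    unsigned num_samples;
};

/*
 * Validate a caller-supplied (signed) sample count before storing it.
 * Returns false and leaves *out untouched when the request is negative.
 */
bool choose_samples(int requested, struct visual_desc *out)
{
    if (requested < 0)
        return false;

    out->num_samples = (unsigned)requested;
    return true;
}
```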
Grazvydas Ignotas
e6abc613e2 st/vdpau: release held lock in error path
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Cc: mesa-stable@lists.freedesktop.org
2018-01-19 13:30:22 +02:00
George Kyriazis
f76ca91ae0 gallivm: support avx512 (16x32) in interleave2_half
lp_build_interleave2_half was not doing the right thing for avx512-style
16-wide loads.

This path is hit in the swr driver with a 16-wide vertex shader. It is
called from lp_build_transpose_aos, when doing texel fetches and the
fetched data needs to be transposed to one component per output register.

Special-case the post-load swizzle operations for avx512 16x32 (16-wide
32-bit values) so that we move the xyzw components correctly to the outputs.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2018-01-18 17:07:06 -06:00
Dylan Baker
26bde1e354 meson: ensure that xmlpool_options.h is generated for targets that need it
Currently a couple of gallium targets race with xmlpool_options.h being
generated. Ensure the header is generated before those targets are built.

Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2018-01-18 13:31:47 -08:00
Dave Airlie
5758a8c402 r600: enable ARB_enhanced_layouts
Only one piglit test fails:
sso-vs-gs-fs-array-interleave

Three tests that use SSBOs without checking sizes also fail,
but those are test bugs.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2018-01-19 05:33:44 +10:00
Emil Velikov
c9b2cb7897 vc5: add missing files to the tarball
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2018-01-18 11:36:36 +00:00
Dave Airlie
44a27cdcec r600/sb: add lds related peepholes.
If an LDS instruction has no destination:
a) convert _RET instructions to their non-_RET variants
b) set src0 to undefined if it's a READ, so that DCE should then remove it.

Acked-By: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2018-01-18 03:38:17 +00:00
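Note on 44a27cdcec: the two peepholes listed in that message can be illustrated with a toy instruction record. This is a sketch under stated assumptions, not the r600/sb implementation: the opcode enum, struct lds_inst and lds_peephole are all hypothetical, and only one _RET/non-_RET pair is modelled.

```c
#include <stdbool.h>

/* Hypothetical opcodes for illustration; not the r600 ISA tables. */
enum lds_op { LDS_ADD, LDS_ADD_RET, LDS_READ_RET, LDS_WRITE };

/* Hypothetical, heavily simplified instruction record. */
struct lds_inst {
    enum lds_op op;
    bool has_dst;      /* is the returned value consumed anywhere? */
    bool src0_defined; /* false once src0 has been marked undefined */
};

/* Applies the peepholes to one instruction; returns true if it changed. */
bool lds_peephole(struct lds_inst *inst)
{
    if (inst->has_dst)
        return false; /* result is used, leave the instruction alone */

    switch (inst->op) {
    case LDS_ADD_RET:
        /* a) nobody reads the result: use the non-_RET variant */
        inst->op = LDS_ADD;
        return true;
    case LDS_READ_RET:
        /* b) an unused read: drop its source so DCE can remove it later */
        if (!inst->src0_defined)
            return false;
        inst->src0_defined = false;
        return true;
    default:
        return false;
    }
}
```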
Dave Airlie
3bb2b2cc45 r600/sb: use different stacks for tracking lds and queue usage.
The normal ssa renumbering isn't sufficient for LDS queue access,
so this uses two stacks, one for the lds queue, and one for the
lds r/w ordering.

The LDS oq values are incremented at their uses in a linear
fashion.
The LDS rw values are incremented at their definitions and used
in the next lds operation to ensure reordering doesn't occur.

Acked-By: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2018-01-18 03:38:09 +00:00
Dave Airlie
8cfec333c0 r600/sb: schedule LDS ops in appropriate places.
LDS ops have to be SLOT_X,
and LDS OQ reads have read port restrictions, so we try
to force those into only having one per slot and avoiding
bank swizzles.

Acked-By: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2018-01-18 03:38:05 +00:00
Dave Airlie
71a50de4fc r600/sb: hit the scheduler with a big hammer to avoid lds splits.
This tries to avoid an lds queue read getting scheduled separately
from an lds ret read. The non-sb code uses the same style of hammer.
This isn't foolproof.

We can do better, but it's a bit tricky, as you have to scan ahead
and schedule more lds oq moves and more lds reads, and that
could lead to you running out of space anyway.

Acked-By: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2018-01-18 03:37:56 +00:00
Dave Airlie
46549bd6b6 r600/sb: adding lds oq tracking to the scheduler
This adds support for tracking the lds oq reads/writes
so we can avoid scheduling other things in between.

This patch just adds the tracking and an assert to show
problems.

Acked-By: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2018-01-18 03:37:52 +00:00
Dave Airlie
5002dd4052 r600/sb: add gcm support to avoid clause between lds read/queue read
You have to schedule LDS_READ_RET _, x and MOV reg, LDS_OQ_A_POP
in the same basic block/clause. This makes sure that once we've issued
a MOV we don't add another block until we balance it with an
LDS read.

Acked-By: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2018-01-18 03:37:42 +00:00
Dave Airlie
046cf68cad r600/sb: handle lds special dest registers.
This adds lds to the geom emit handling

Acked-By: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2018-01-18 03:37:39 +00:00
Dave Airlie
d72590032f r600/sb: handle LDS operations in folding.
Don't try and fold LDS using expressions.

Acked-By: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2018-01-18 03:37:35 +00:00
Dave Airlie
c314b0a27a r600/sb: add finalising for lds output queue special values.
We need to convert these to the hw special registers.

Acked-By: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2018-01-18 03:37:27 +00:00
Dave Airlie
9f3a1e9b0c r600/sb: add initial support for parsing lds operations.
This handles parsing the LDS ops and queue accesses.

Acked-By: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2018-01-18 03:37:13 +00:00
Dave Airlie
795512b235 r600/sb: disable if conversion for hs
This fixes bad interactions with the LDS special values.

Acked-By: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2018-01-18 03:37:01 +00:00
Dave Airlie
1ca2eb3bf3 r600/sb: lds ops have no dst register.
Although these are op3s they don't have a dst reg.

Acked-By: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2018-01-18 03:36:52 +00:00
Dave Airlie
09c1c13c44 r600/sb: introduce special register values for lds support.
For LDS read/write ordering we use the LDS_RW value; reads
will wait on previous writes.
For LDS read/read from LDS queue ordering we use the LDS_OQ
values. We define two for now, though initially we'll just
support OQA.

Also add the check for the lds oq values.

Acked-By: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2018-01-18 03:36:47 +00:00