Courtesy of clang static analyzer.
I was hunting for potential sources of memory corruption using Mesa with
a GL trace, and happened to find this (unrelated) issue.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
When switching between framebuffers with and without TS, the TS state
needs to be flushed to the command stream even if the derived state
isn't changed.
Fixes: 4ee7c2c284 ("etnaviv: enable TS, but disable autodisable")
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Fixes: c9b2cb7897 ("vc5: add missing files to the tarball")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Fixes: 954a704da3 ("broadcom/vc5: Port the RCL setup to V3D4.1.")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This adds the meson.build, meson_options.txt, and a few scripts that are
used exclusively by the meson build.
v2: - Remove accidentally included changes needed to test make dist with
LLVM > 3.9
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Acked-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
For some reason llvm5 is picky about accepting a void * type in the
case of building an argument list.
Since we don't care about the type (we ignore the argument for now),
pick another pointer type
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Early Rasterization is an optimization for small triangles.
Scientific workloads often contain very small triangles that has non-zero
area and cannot be trivially rejected as falling between pixel centers,
but does not cover any pixel center. Those triangles can be initially
rasterized as early as in binner and rejected if they cover no pixels The
optimization can be disabled in compilation using KNOB_ENABLE_EARLY_RAST
option in knobs.h
The Early Rast is disabled by default.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Flip the switch(es) to enable simd16 vertex shaders:
USE_SIMD16_SHADERS and USE_SIMD16_VS
Both have to be enabled at the same time. Currently, just setting
USE_SIMD16_SHADERS does not work correctly.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Supporting simd16 vertex shaders involves packing the output of the
fetch shader appropriately, especially the vertexID buffers that have to
be formatted in one simd16 register, needed by the VS.
As part of this support, we needed to remove the 2nd JitManager, since it
was not accounting for vector width correctly.
USE_SIMD16_SHADERS is also split into two defines. The additional
one (USE_SIMD16_VS) controls the width of the vertex shader (VS), while
the original one (USE_SIMD16_SHADERS) controls overall front end width.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Add a new define (USE_SIMD16_VS), to denote calling a 16-wide vertex shader.
This is needed because the mesa driver can do 16-wide shaders, but rasty
cannot yet, so we need to distinguish.
Create a new VertexID entry (VertexID16) for the USE_SIMD16_VS case, since
we need to format the vertex id in a way that is digestible by the 16-wide VS
Disabled for now. To be enabled in a future checkin when driver work
is complete.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Add name argument to x86 autogenerated macros.
Add useful variable names for DCL_inputVec implementation.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
- Move debug .ll files to JIT_CACHE_DIR
- Don't link against jitter SRGBLut table, add global data to shader that needs it.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Adds ability to step into jitted llvm IR in Visual Studio.
- Updated llvm type generation script to also generate corresponding debug types.
- New module pass inserts debug metadata into the IR for each function
Disabled by default.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Part 2 of 2 (part 1 is autoconf changes, part 2 is C++ changes)
When only a single SWR architecture is being used, this allows that
architecture to be builtin rather than as a separate libswrARCH.so that
gets loaded via dlopen. Since there are now several different code
paths for each detected CPU architecture, the log output is also
adjusted to convey where the backend is getting loaded from.
This allows SWR to be used for static mesa builds which are still
important for large HPC environments where shared libraries can impose
unacceptable application startup times as hundreds of thousands of copies
of the libs are loaded from a shared parallel filesystem.
Based on an initial implementation by Tim Rowley.
v2: Refactor repetitive preprocessor checks to reduce code duplication
v3: Formatting changes per Bruce C. Also delay screen creation until end
to avoid leaks when failure conditions are hit.
Signed-off-by: Chuck Atkins <chuck.atkins@kitware.com>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
CC: Tim Rowley <timothy.o.rowley@intel.com>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Part 1 of 2 (part 1 is autoconf changes, part 2 is C++ changes)
When only a single SWR architecture is being used, this allows that
architecture to be builtin rather than as a separate libswrARCH.so that
gets loaded via dlopen. Since there are now several different code
paths for each detected CPU architecture, the log output is also
adjusted to convey where the backend is getting loaded from.
This allows SWR to be used for static mesa builds which are still
important for large HPC environments where shared libraries can impose
unacceptable application startup times as hundreds of thousands of copies
of the libs are loaded from a shared parallel filesystem.
Based on an initial implementation by Tim Rowley.
v2: Fix comment placement pointed out by Bruce C.
Signed-off-by: Chuck Atkins <chuck.atkins@kitware.com>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
CC: Tim Rowley <timothy.o.rowley@intel.com>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
According to the ARB_multisample num_samples is a non-negative integer.
Consequently define it as such, fail in glx/choose_visual if a negative
number is given.
v2: split patch into gallium and mesa part
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
lp_build_interleave2_half was not doing the right thing for avx512-style
16-wide loads.
This path is hit in the swr driver with a 16-wide vertex shader. It is
called from lp_build_transpose_aos, when doing texel fetches and the
fetched data needs to be transposed to one component per output register.
Special-case the post-load swizzle operations for avx512 16x32 (16-wide
32-bit values) so that we move the xyzw components correctly to the outputs.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Currently a couple of gallium targets race with xmlpool_options.h being
generated, don't do that.
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Only one piglit test fails,
sso-vs-gs-fs-array-interleave
There are 3 tests using ssbo without checking sizes failing also
but those are test bugs.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
if no destination:
a) convert _RET instructions to non _RET variants if no dst
b) set src0 to undefined if it's a READ, this should get DCE then.
Acked-By: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The normal ssa renumbering isn't sufficient for LDS queue access,
this uses two stacks, one for the lds queue, and one for the
lds r/w ordering.
The LDS oq values are incremented in their use in a linear
fashion.
The LDS rw values are incremented in their definitions and used
in the next lds operation to ensure reordering doesn't occur.
Acked-By: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
So LDS ops have to be SLOT_X,
and LDS OQ reads have read port restrictions so we try
and force those into only having one per slot and avoiding
bank swizzles.
Acked-By: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This tries to avoid an lds queue read getting scheduled separately
from an lds ret read, the non-sb code uses the same style of hammer,
this isn't foolproof.
We can do better, but it's a bit tricky, as you have to scan ahead
and either schedule more lds oq moves and more lds reads and that
could lead to you running out of space anyways.
Acked-By: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This adds support for tracking the lds oq read/writes
so can avoid scheduling other things in between.
This patch just adds the tracking and assert to show
problems.
Acked-By: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
You have to schedule LDS_READ_RET _, x and MOV reg, LDS_OQ_A_POP
in the same basic block/clause. This makes sure once we've issues
and MOV we don't add another block until we balance it with an
LDS read.
Acked-By: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
We need to convert these to the hw special registers.
Acked-By: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This handles parsing the LDS ops and queue accessess.
Acked-By: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This fixes bad interactions with the LDS special values.
Acked-By: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
For LDS read/write ordering we use the LDS_RW value, reads
will wait on previous writes.
For LDS read/read from LDS queue ordering we use the LDS_OQ
values, we define two for now, though initially we'll just
support OQA.
Also add the check for the lds oq values
Acked-By: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>