26778 commits

Author SHA1 Message Date
Samuel Pitoiset
bb4cdee9a4 nvc0: do not break the universe on GK110+
I removed that return 0 by mistake. Ooops.

Fixes: 6e23fd4 ("nvc0: allow to use compute support on GM200")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-04-14 21:57:21 +02:00
Samuel Pitoiset
6e23fd420d nvc0: allow to use compute support on GM200
This works like a charm, but please note that NVF0_COMPUTE has to be set
because compute support is still not enabled by default on GK110+. This
will require more testing to make sure it won't break the 3D state.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-04-14 21:01:51 +02:00
Emil Velikov
bb949e262c gallium/swr: fold the almost identical Makefiles
Rather than having two almost identical Makefiles with various VPATH
hacks, just fold them, using COMMON_* variables and actually getting
things buildable/shippable.

v2: whitespace fixes, remove Makefile.sources-arch

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-04-14 16:30:57 +01:00
Marek Olšák
112291964e radeonsi: don't overwrite the scratch offset in shader prologs
Prologs only look at num_input_sgprs.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-04-14 17:00:14 +02:00
Marek Olšák
ffe44d0283 radeonsi: fold num_user_sgprs where it is possible
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-04-14 17:00:14 +02:00
Marek Olšák
51c4034f9b radeonsi: fix SGPRS calculation once more
This fixes GS piglit failures after adding SI_PARAM_SHADER_BUFFERS,
which bumped NUM_USER_SGPRS and uncovered this bug on SI.

If this was fixed in LLVM, these workarounds wouldn't be needed.

LLVM would have to look at the calling convention to know how many SGPR
inputs are declared, and add VCC and the scratch wave offset (which is
enabled even if we spill SGPRs but not VGPRs, oh well).

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-04-14 17:00:14 +02:00
Marek Olšák
aaf5be4a29 radeonsi: disable hw ETC2 on Polaris
not supported by hw directly, but it's still fully supported by the driver

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-14 16:58:59 +02:00
Jose Fonseca
50ddf03ada scons: Add a "check" target to run all unit tests.
Except:
- u_cache_test -- too long
- translate_test -- unreliable (it's probably testing corner cases that
  translate module doesn't care about.)

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-04-14 07:19:04 +01:00
Jose Fonseca
9ae0e8ee3c test/unit: Make translate_test invoke translate_create by default.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-04-14 07:19:04 +01:00
Jose Fonseca
f8a51034bd test/unit: Make pipe_barrier_test actually check correct behavior.
So it can run unattended.

Also make it silent by default.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-04-14 07:19:04 +01:00
Michel Dänzer
171a570f38 clover: Fix build against LLVM SVN >= r266163
createInternalizePass now takes a callback instead of a StringSet.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-04-14 11:53:41 +09:00
Jason Ekstrand
b63a98b121 nir/dead_variables: Configurably work with any variable mode
The old version of the pass only worked on globals and locals and always
left inputs, outputs, uniforms, etc. alone.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-13 15:45:10 -07:00
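
As a rough illustration of the "configurable mode" idea in the nir/dead_variables commit above, the sketch below filters a variable list by a bitmask of modes. The names `VarMode`, `Variable` and `remove_dead_variables` are hypothetical and are not the NIR API; the only point is that variables whose mode is outside the requested mask are left alone.

```python
from dataclasses import dataclass
from enum import Flag, auto


class VarMode(Flag):
    # Hypothetical mode bits, loosely modelled on shader variable kinds.
    SHADER_IN = auto()
    SHADER_OUT = auto()
    GLOBAL = auto()
    LOCAL = auto()
    UNIFORM = auto()


@dataclass
class Variable:
    name: str
    mode: VarMode


def remove_dead_variables(variables, live_names, modes):
    """Drop unused variables, but only those whose mode is in `modes`."""
    return [
        v for v in variables
        if v.name in live_names or not (v.mode & modes)
    ]
```

Passing `VarMode.GLOBAL | VarMode.LOCAL` would mimic the old behaviour described in the commit message, while a wider mask lets the caller sweep inputs, outputs or uniforms as well.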
George Kyriazis
f69a61b1aa gallium/swr: Make flat shading tris work.
- Incorporate flatshade flag into the shader generation
- Use provoking vertex (vc) in shader when flat shading.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-04-13 13:46:37 -05:00
Rob Clark
c53a12fedc Revert "freedreno/a4xx: better occlusion/sample counting"
This reverts commit 62fa868728.

dEQP-GLES3.functional.occlusion_query.* was unhappy about that change.
Still not really sure *what* the other slots in the sample results
buffer are.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-04-13 14:16:40 -04:00
Rob Clark
46e9bbc918 freedreno/a4xx: rasterizer_discard support
This one is slightly annoying, since trying to write RBRC from draw
would clobber values set in the tiling/gmem code.  We could do command-
stream patching for RBRC, as is done on a3xx.  Although since it seems
to be a rarely used feature, it is easier just to do RMW to set/clear
the bit.

Fixes dEQP-GLES3.functional.rasterizer_discard.basic.write_depth_triangles
and related tests.

a3xx still needs the same feature, although there it probably makes more
sense to take advantage of the existing cmdstream patching which is
required for RBRC for other reasons.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-04-13 14:16:21 -04:00
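
The read-modify-write (RMW) approach mentioned in the rasterizer_discard commit above can be sketched as follows. This is a generic illustration, not the driver code: `rmw_bit` and the mask value used in the usage note are hypothetical, and the real register layout is not given in the commit message.

```python
def rmw_bit(reg_value: int, bit_mask: int, enable: bool) -> int:
    """Set or clear only the bits in `bit_mask`, preserving all other bits.

    This mirrors a read-modify-write: the current register value is read,
    one field is changed, and the result is written back, so values set by
    other code (for example tiling setup) are not clobbered.
    """
    if enable:
        return reg_value | bit_mask
    return reg_value & ~bit_mask
```

For example, with a made-up mask of `0b0100`, `rmw_bit(0b1010, 0b0100, True)` gives `0b1110`, and clearing the same bit again restores `0b1010`.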
Rob Clark
216225ce57 freedreno/ir3: fix array textures on a4xx
Seems like a4xx needs offset added to array index for all arrays,
whereas a3xx only for cubemap arrays.  Fixes a whole swath of dEQP fails
(roughly *sampler2darray*).

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-04-13 14:16:14 -04:00
Rob Clark
7e93b26b5d freedreno: fix stream-out offset handling for lines/tris
We need to increment offset by # of vertices, not by # of prims.  Fixes
a bunch of dEQP fails involving prims other than points.  For example,
dEQP-GLES3.functional.transform_feedback.position.lines_separate

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-04-13 14:16:02 -04:00
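
The difference between counting primitives and counting vertices, as described in the stream-out commit above, can be illustrated with a small sketch. It assumes independent points, lines and triangles (no strips or fans), and the names `VERTS_PER_PRIM` and `advance_offset` are hypothetical.

```python
# Vertices written to the stream-out buffer per primitive, assuming
# independent primitives only.
VERTS_PER_PRIM = {"points": 1, "lines": 2, "triangles": 3}


def advance_offset(offset: int, mode: str, num_prims: int) -> int:
    """Return the new stream-out offset, measured in vertices."""
    return offset + num_prims * VERTS_PER_PRIM[mode]
```

For points the two counts coincide, which would explain why a prims-based increment only misbehaves for lines and triangles.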
Rob Clark
6ca6e80f61 freedreno: fix handling for stream-out offsets
If changed && append, we shouldn't be resetting the internal offset back
to zero.  This fixes issues w/ sequences like:

   glBeginTransformFeedback()
   glDraw()
   glPauseTransformFeedback()
   glDraw()
   glResumeTransformFeedback()
   glDraw()
   glEndTransformFeedback()

Fixes dEQP-GLES3.functional.transform_feedback.array.separate.points.lowp_vec3
and related tests.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-04-13 14:15:54 -04:00
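
A minimal model of the pause/resume sequence from the commit above is sketched here. It assumes that pausing unbinds the target and resuming rebinds it in "append" mode; `StreamOutTarget`, `bind_target` and `run_sequence` are hypothetical names, not the freedreno implementation.

```python
class StreamOutTarget:
    def __init__(self):
        self.offset = 0  # vertices written so far


def bind_target(target, changed: bool, append: bool) -> None:
    # Only a fresh, non-append bind restarts writing at zero.
    # Resetting when `changed and append` is the bug described above.
    if changed and not append:
        target.offset = 0


def draw(target, num_verts: int) -> None:
    target.offset += num_verts


def run_sequence() -> int:
    t = StreamOutTarget()
    bind_target(t, changed=True, append=False)  # glBeginTransformFeedback
    draw(t, 3)
    # glPauseTransformFeedback: target unbound, the next draw writes nothing
    bind_target(t, changed=True, append=True)   # glResumeTransformFeedback
    draw(t, 3)
    return t.offset
```

With the append case handled this way, the second batch of vertices lands after the first instead of overwriting it.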
Rob Clark
0a4b0fc315 freedreno: fix prims-emitted query
This should only count when TF is not paused.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-04-13 14:15:47 -04:00
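
The rule stated in the prims-emitted commit above (count only while transform feedback is not paused) can be written as a tiny counter. The class and parameter names are hypothetical and only illustrate the condition.

```python
class PrimsEmittedQuery:
    def __init__(self):
        self.count = 0

    def on_draw(self, num_prims: int, tf_active: bool, tf_paused: bool) -> None:
        # Primitives are only "emitted" while transform feedback is
        # active and not paused.
        if tf_active and not tf_paused:
            self.count += num_prims
```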
Rob Clark
a7eb12d089 freedreno: fix max-line-width
dEQP noticed that we were advertising completely bogus values.  The
actual maximum is 127.0f.

*But* we have to use an artificially low maximum to work around a bug
in the dEQP test, which gets confused when the max line width is too
large and lines start going off-screen.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-04-13 14:15:31 -04:00
Rob Clark
6bf462a1ab freedreno: add flag to enable dEQP hacks
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-04-13 14:15:24 -04:00
Rob Clark
f68f6c0246 freedreno/ir3: hack to avoid getting stuck in a loop
There are still some edge cases which result in a neighbor-loop. That
needs to be fixed, but this hack at least makes dEQP tests finish.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-04-13 14:15:13 -04:00
Rob Clark
dd70945e09 freedreno/ir3: use (ss) instead of (sy) for ldlv
Fixes a bunch of flat-varying failures on a4xx (where we need to use ldlv to
read the un-interpolated varying).

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-04-13 14:15:05 -04:00
Rob Clark
b35ad6e701 freedreno/ir3: cleanup double cmps.s from frontend
Since we cannot mov into a predicate register, the frontend uses a
'cmps.s p0.x, cond, 0' as a stand-in for mov to p0.x.  It does this
since it has no way to know that the source cond instruction (ie.
for a kill, br, etc) will only be used to write the predicate reg.
Detect this, and re-write the instruction writing p0.x to skip the
original cmps.[sfu].  (It is done like this, rather than re-writing
the dest of the first cmps.[sfu] in case the first cmps.[sfu]
actually has other users.)

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-04-13 14:14:41 -04:00
Jose Fonseca
9586468c03 gallivm: Workaround LLVM PR 27332.
The credit for finding and isolating this bug goes to Vinson and Roland.

The buggy LLVM versions were found by doing

  opt -instcombine llvm-pr27332.ll > /dev/null

where llvm-pr27332.ll is the IR from
https://llvm.org/bugs/show_bug.cgi?id=27332#c3

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-04-13 16:42:55 +01:00
Marek Olšák
dd0a296895 gallium/radeon: move a comment to the correct place
trivial
2016-04-13 17:31:03 +02:00
Nicolai Hähnle
9e9a2bb44a radeonsi: gate PIPE_CAP_SHADER_BUFFER_OFFSET_ALIGNMENT by LLVM version
Otherwise we incorrectly claim ARB_ssbo support even with older LLVM versions.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94917
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-13 10:06:22 -05:00
Marek Olšák
04f15e491f gallium/radeon: add an env variable to force a level of aniso filtering
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-13 12:42:28 +02:00
Jose Fonseca
cc5d8b678e llvmpipe: Test rounding of x.5.
Leverage the nearbyintf function, which should be available on all C99
implementations.

Trivial.
2016-04-13 11:13:05 +01:00
Roland Scheidegger
cb438d8b3e gallivm: use llvm.nearbyint instead of llvm.round.
We used to use sse roundps intrinsic directly, but switched to use the llvm
intrinsics for rounding with e4f01da15d.
However, llvm semantics follows standard math lib round function which is
specced to do roundNearestAwayFromZero but we really want roundNearestEven
(moreover, using round generates atrocious code since the cpu can't do it
directly and it results in scalar calls to libm __roundf).
So, use llvm.nearbyint instead, which does exactly the right thing, and even
has the advantage of being available with llvm 3.3 too. (I've verified it
actually generates a roundps instruction with llvm 3.3.)

This fixes https://bugs.freedesktop.org/show_bug.cgi?id=94909

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-04-13 11:13:03 +01:00
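
The two tie-breaking rules contrasted in the gallivm commit above differ only for values exactly halfway between two integers. The sketch below models both in plain Python for illustration; the function names are made up, and it relies on the fact that Python's built-in `round()` on a float rounds ties to even.

```python
import math


def round_half_away_from_zero(x: float) -> float:
    """Ties go away from zero, like the C library round() function."""
    whole = math.floor(abs(x))
    # abs(x) - whole is exact for floats, so the tie test is reliable.
    if abs(x) - whole >= 0.5:
        whole += 1
    return math.copysign(whole, x)


def round_half_to_even(x: float) -> float:
    """Ties go to the even neighbour (the default IEEE 754 rounding mode)."""
    return float(round(x))


def compare(x: float):
    return round_half_away_from_zero(x), round_half_to_even(x)
```

For `x = 2.5` the first function yields `3.0` and the second `2.0`, while for `x = 1.5` both agree on `2.0`.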
Pierre Moreau
f525db6358 nv50/ra: isinf() is in namespace std since C++11.
This fixes a compile error while building Nouveau with C++11 enabled (and
glibc >= 2.23). This happens if SWR is enabled, as it forces C++11.

Signed-off-by: Pierre Moreau <pierre.morrow@free.fr>
Signed-off-by: Jose Fonseca <jfonseca@vmware.com>

https://bugs.freedesktop.org/show_bug.cgi?id=94907
2016-04-13 07:41:13 +01:00
Jose Fonseca
fa46848e51 scons: Allow building with Address Sanitizer.
libasan is never linked to shared objects (which doesn't go well with
-z,defs).  It must either be linked to the main executable, or (more
practically for OpenGL drivers) be pre-loaded via LD_PRELOAD.

Otherwise works.

I didn't find anything with llvmpipe.  I suspect the fact that the
JIT compiled code isn't instrumented means there are lots of errors it
can't catch.

But for non-JIT drivers, the Address/Leak Sanitizers seem like a faster
alternative to Valgrind.

Usage (Ubuntu 15.10):

   scons asan=1 libgl-xlib
   export LD_LIBRARY_PATH=$PWD/build/linux-x86_64-debug/gallium/targets/libgl-xlib
   LD_PRELOAD=libasan.so.2 any-opengl-application

Acked-by: Roland Scheidegger <sroland@vmware.com>
2016-04-13 06:54:32 +01:00
Jose Fonseca
46bfcd61f5 softpipe: Free tgsi.image elements on context destruction.
Courtesy of address sanitizer.

[airlied: free buffers as well]
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-04-13 13:21:37 +10:00
Edward O'Callaghan
5a3d928e2c softpipe: Enable ARB_framebuffer_no_attachments
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-04-13 13:21:37 +10:00
Eric Anholt
3b63301d9f vc4: Work around hardware limits on the number of verts in a single draw.
Fixes rendering failures in glmark2's refract and
bump:render-mode=high-poly demos, and partially in its terrain demo.
2016-04-12 19:10:51 -07:00
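
The vc4 commit above does not say how the limit is worked around. One common approach, shown here purely as an assumption, is to split a large non-indexed draw of independent primitives into several smaller draws whose vertex counts are multiples of the primitive size. `split_draw` and `max_verts` are hypothetical, and strips or fans would need extra care that this sketch does not attempt.

```python
def split_draw(start: int, count: int, verts_per_prim: int, max_verts: int):
    """Split one draw into (start, count) chunks no larger than max_verts.

    Each full chunk is a multiple of verts_per_prim so that no primitive
    straddles two draws. Only valid for independent primitives.
    """
    step = max_verts - (max_verts % verts_per_prim)
    if step <= 0:
        raise ValueError("max_verts is smaller than one primitive")
    chunks = []
    while count > 0:
        n = min(count, step)
        chunks.append((start, n))
        start += n
        count -= n
    return chunks
```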
Thomas Hindoe Paaboel Andersen
6d6525a377 softpipe: avoid buffer overflow
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-04-13 11:51:35 +10:00
Thomas Hindoe Paaboel Andersen
b89708f95f tgsi: fix buffer overflow
Increase r to four channels as rgba is written to it
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>

Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-04-13 11:51:34 +10:00
Tim Rowley
b9294bc345 swr: handle pci cap requests
Reviewed-by: George Kyriazis <george.kyriazis@intel.com>
2016-04-12 20:18:00 -05:00
Tim Rowley
b19d214b23 swr: support samplers in vertex shaders
Reviewed-by: George Kyriazis <george.kyriazis@intel.com>
2016-04-12 20:18:00 -05:00
Nicolai Hähnle
10cfd7a604 radeonsi: enable GLSL 4.20 and therefore OpenGL 4.2
This is the last necessary bit for OpenGL 4.2 support. All driver-specific
functionality has already been implemented as part of extensions.

Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-04-12 20:13:49 -05:00
Iurie Salomov
047e3264f6 va: check null context in vlVaDestroyContext
Signed-off-by: Iurie Salomov <iurcic@gmail.com>
Reviewed-by: Julien Isorce <j.isorce@samsung.com>
2016-04-13 00:52:53 +01:00
Marek Olšák
8e70a58af3 radeonsi: fix a critical SI hang since PIPELINESTAT_START/STOP was added
For some reason unknown to me, SI hangs if the event is written after
CONTEXT_CONTROL.
2016-04-13 01:05:15 +02:00
Nicolai Hähnle
a191e6b719 radeonsi: fix bounds check in si_create_vertex_elements
This was triggered by
dEQP-GLES3.functional.vertex_array_objects.all_attributes

Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-04-12 16:32:46 -05:00
Nicolai Hähnle
bfd11c5996 radeonsi: enable shader buffer pipe caps
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-04-12 16:30:48 -05:00
Nicolai Hähnle
4e81843b13 radeonsi: add shader buffer support to TGSI_OPCODE_RESQ
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-04-12 16:30:45 -05:00
Nicolai Hähnle
01109282ce radeonsi: add shader buffer support to TGSI_OPCODE_STORE
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-04-12 16:30:43 -05:00
Nicolai Hähnle
745014c502 radeonsi: add shader buffer support to TGSI_OPCODE_LOAD
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-04-12 16:30:41 -05:00
Nicolai Hähnle
68bc25c931 radeonsi: add shader buffer support to TGSI_OPCODE_ATOM*
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-04-12 16:30:38 -05:00
Nicolai Hähnle
c6f5d000db radeonsi: add offset parameter to buffer_append_args
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-04-12 16:30:35 -05:00
Nicolai Hähnle
c565466eea radeonsi: adjust buffer_append_args to take a 128 bit resource
Move the buffer resource extraction code out into its own function.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-04-12 16:30:32 -05:00