The algorithm used is different from both the naive suggestion from the
GLSL spec and the one used in GLSL IR today. Unfortunately, the GLSL IR
implementation that we have today doesn't handle denormals (for those that
care) or the case where the float source is +-inf.
Reviewed-by: Matt Turner <mattst88@gmail.com>
It's not really doing enough anymore to justify a helper function.
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
There are several passes where we need to specify some set of variable
modes that the pass needs to operate on. This lets us easily do that.
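As a rough illustration of passing a set of variable modes to a pass, the
sketch below uses one bit per mode so that sets can be OR'd together. The
enum values, struct, and function name are hypothetical and are not taken
from the actual code.

```c
#include <stdbool.h>

/* Hypothetical mode flags: one bit per mode, so a set is a plain bitmask. */
enum var_mode {
   VAR_MODE_SHADER_IN  = 1 << 0,
   VAR_MODE_SHADER_OUT = 1 << 1,
   VAR_MODE_GLOBAL     = 1 << 2,
   VAR_MODE_LOCAL      = 1 << 3,
   VAR_MODE_UNIFORM    = 1 << 4,
};

/* Hypothetical variable record carrying exactly one mode. */
struct variable {
   enum var_mode mode;
};

/* Returns true if a pass that was given `modes` should visit `var`. */
static bool
pass_handles_variable(const struct variable *var, unsigned modes)
{
   return (var->mode & modes) != 0;
}
```

A caller would then request, say, VAR_MODE_SHADER_IN | VAR_MODE_SHADER_OUT
to restrict a pass to shader inputs and outputs.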
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
- Incorporate flatshade flag into the shader generation
- Use provoking vertex (vc) in shader when flat shading.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
This reverts commit 62fa868728.
dEQP-GLES3.functional.occlusion_query.* was unhappy about that change.
Still not really sure *what* the other slots in the sample results
buffer are.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
This one is slightly annoying, since trying to write RBRC from draw
would clobber values set in the tiling/gmem code. We could do command-
stream patching for RBRC, as is done on a3xx. Although since it seems
to be a rarely used feature, it is easier just to do RMW to set/clear
the bit.
Fixes dEQP-GLES3.functional.rasterizer_discard.basic.write_depth_triangles
and related tests.
a3xx still needs the same feature, although there it probably makes more
sense to take advantage of the existing cmdstream patching which is
required for RBRC for other reasons.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Seems like a4xx needs offset added to array index for all arrays,
whereas a3xx only for cubemap arrays. Fixes a whole swath of dEQP fails
(roughly *sampler2darray*).
Signed-off-by: Rob Clark <robclark@freedesktop.org>
We need to increment offset by # of vertices, not by # of prims. Fixes
a bunch of dEQP fails involving prims other than points. For example,
dEQP-GLES3.functional.transform_feedback.position.lines_separate
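A minimal sketch of the arithmetic, assuming the offset is counted in
vertices and only independent points, lines, and triangles are drawn. The
enum and function names are hypothetical. It also shows why points were
unaffected: for points the vertex and primitive counts coincide.

```c
/* Hypothetical primitive types, for illustration only. */
enum prim_type { PRIM_POINTS, PRIM_LINES, PRIM_TRIANGLES };

/* Number of vertices written per independent primitive. */
static unsigned
vertices_per_prim(enum prim_type type)
{
   switch (type) {
   case PRIM_POINTS:    return 1;
   case PRIM_LINES:     return 2;
   case PRIM_TRIANGLES: return 3;
   }
   return 0;
}

/* Advance the offset by the number of vertices, not the number of prims. */
static unsigned
advance_offset(unsigned offset, enum prim_type type, unsigned num_prims)
{
   return offset + num_prims * vertices_per_prim(type);
}
```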
Signed-off-by: Rob Clark <robclark@freedesktop.org>
If changed && append, we shouldn't be resetting the internal offset back
to zero. This fixes issues w/ sequences like:
glBeginTransformFeedback()
glDraw()
glPauseTransformFeedback()
glDraw()
glResumeTransformFeedback()
glDraw()
glEndTransformFeedback()
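The intended behaviour can be sketched as follows. The struct and function
are hypothetical stand-ins for the driver state, assuming that "append"
means the target continues from where the previous draw left off.

```c
#include <stdbool.h>

/* Hypothetical per-target state: how far into the buffer we have written. */
struct so_target_state {
   unsigned offset;
};

/* Rebind a target. The internal offset is only reset when the binding
 * changed and the caller did not ask to append. */
static void
bind_target(struct so_target_state *t, bool changed, bool append)
{
   if (changed && !append)
      t->offset = 0;
}
```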
Fixes dEQP-GLES3.functional.transform_feedback.array.separate.points.lowp_vec3
and related tests.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
dEQP noticed that we were advertising completely bogus values. The
actual maximum is 127.0f.
*But* we have to use an artificially low maximum to work around a bug
in the dEQP test, which gets confused when the max line width is too
large and lines start going off-screen.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
There are still some edge cases which result in a neighbor-loop, which
needs to be fixed, but this hack at least makes dEQP tests finish.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Fixes a bunch of flat-varying fail on a4xx (where we need to use ldlv to
read the un-interpolated varying).
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Since we cannot mov into a predicate register, the frontend uses a
'cmps.s p0.x, cond, 0' as a stand-in for mov to p0.x. It does this
since it has no way to know that the source cond instruction (ie.
for a kill, br, etc) will only be used to write the predicate reg.
Detect this, and re-write the instruction writing p0.x to skip the
original cmps.[sfu]. (It is done like this, rather than re-writing
the dest of the first cmps.[sfu] in case the first cmps.[sfu]
actually has other users.)
Signed-off-by: Rob Clark <robclark@freedesktop.org>
vertex_input_slots would be an appropriate name for an integer, but not
a bool.
Also remove a cond ? true : false from a count_attribute_slots() call
site, noticed during the rename.
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
The credit for finding and isolating this bug goes to Vinson and Roland.
The buggy LLVM versions were found by doing
opt -instcombine llvm-pr27332.ll > /dev/null
where llvm-pr27332.ll is the IR from
https://llvm.org/bugs/show_bug.cgi?id=27332#c3
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Otherwise we incorrectly claim ARB_ssbo support even with older LLVM versions.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94917
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
We used to use sse roundps intrinsic directly, but switched to use the llvm
intrinsics for rounding with e4f01da15d.
However, llvm semantics follows standard math lib round function which is
specced to do roundNearestAwayFromZero but we really want roundNearestEven
(moreover, using round generates atrocious code since the cpu can't do it
directly and it results in scalar calls to libm __roundf).
So, use llvm.nearbyint instead, which does exactly the right thing, and even
has the advantage of being available with llvm 3.3 too. (I've verified it
actually generates a roundps instruction with llvm 3.3.)
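To make the difference between the two rounding rules concrete, here is a
small hand-written sketch of each. It is not the llvmpipe code: the function
names are made up, and it only handles values that fit in a long. The first
function matches what C's roundf() is specified to do; the second matches
nearbyintf() under the default rounding mode.

```c
/* Round to nearest, ties away from zero (roundf() semantics).
 * Sketch only: valid for values that fit in a long. */
static float
round_nearest_away(float x)
{
   float t = (float)(long)x;   /* truncate toward zero */
   float frac = x - t;         /* in (-1, 1), same sign as x */

   if (frac >= 0.5f)
      return t + 1.0f;
   if (frac <= -0.5f)
      return t - 1.0f;
   return t;
}

/* Round to nearest, ties to the even integer. */
static float
round_nearest_even(float x)
{
   long i = (long)x;           /* truncate toward zero */
   float t = (float)i;
   float frac = x - t;

   /* On an exact tie, only move away from zero if the truncated value is odd. */
   if (frac > 0.5f || (frac == 0.5f && (i & 1)))
      return t + 1.0f;
   if (frac < -0.5f || (frac == -0.5f && (i & 1)))
      return t - 1.0f;
   return t;
}
```

The two only disagree on exact ties: 2.5 rounds to 3 under the first rule
and to 2 under the second, while 3.5 rounds to 4 under both.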
This fixes https://bugs.freedesktop.org/show_bug.cgi?id=94909
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This fixes a compile error while building Nouveau with C++11 enabled (and
glibc >= 2.23). This happens if SWR is enabled, as it forces C++11.
Signed-off-by: Pierre Moreau <pierre.morrow@free.fr>
Signed-off-by: Jose Fonseca <jfonseca@vmware.com>
https://bugs.freedesktop.org/show_bug.cgi?id=94907
libasan is never linked to shared objects (which doesn't go well with
-z,defs). It must either be linked to the main executable, or (more
practically for OpenGL drivers) be pre-loaded via LD_PRELOAD.
Otherwise works.
I didn't find anything with llvmpipe. I suspect the fact that the
JIT compiled code isn't instrumented means there are lots of errors it
can't catch.
But for non-JIT drivers, the Address/Leak Sanitizers seem like a faster
alternative to Valgrind.
Usage (Ubuntu 15.10):
scons asan=1 libgl-xlib
export LD_LIBRARY_PATH=$PWD/build/linux-x86_64-debug/gallium/targets/libgl-xlib
LD_PRELOAD=libasan.so.2 any-opengl-application
Acked-by: Roland Scheidegger <sroland@vmware.com>
This is supposed to be INVALID_OPERATION in ES. We already did this
for the fv/iv variants, but not Iiv/Iuv, which are new in ES 3.2 (or
extensions).
Fixes:
ES31-CTS.texture_border_clamp.samplerparameteri_non_gen_sampler_error
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Courtesy of address sanitizer.
[airlied: free buffers as well]
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Increase r to four channels as rgba is written to it
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This is the last necessary bit for OpenGL 4.2 support. All driver-specific
functionality has already been implemented as part of extensions.
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
This is kind of a hack. We currently track precise requirements
by decorating ir_variables. Propagating or grafting the RHS of an
assignment to a precise value into some other expression tree can
lose those decorations.
In the long run, it might be better to replace these ir_variable
decorations with an "exact" decoration on ir_expression nodes,
similar to what NIR does.
In the short run, this is probably good enough. It preserves
enough information for glsl_to_nir to generate "exact" decorations,
and NIR will then handle optimizing these expressions reasonably.
Fixes ES31-CTS.gpu_shader5.precise_qualifier.
v2: Drop invariant handling, as it shouldn't be necessary (caught
by Jason Ekstrand).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Copy and paste error in commit eafeb8db66:
i965/tiled_memcpy: Unroll bytes==64 case.
Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
ARB_program_interface_query requires that we add struct fields
recursively down to basic types.
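The recursion can be pictured with a toy type tree. The struct below is a
hypothetical stand-in, not the real GLSL type representation: a node with no
fields is a basic type, and each basic-type leaf would become one resource.

```c
#include <stddef.h>

/* Hypothetical type node: num_fields == 0 means a basic type. */
struct type {
   const char *name;
   size_t num_fields;
   const struct type *fields;
};

/* Walk struct fields recursively and count the basic-type leaves, i.e. the
 * number of resources that would be added for this variable. */
static unsigned
count_basic_leaves(const struct type *t)
{
   unsigned n = 0;

   if (t->num_fields == 0)
      return 1;

   for (size_t i = 0; i < t->num_fields; i++)
      n += count_basic_leaves(&t->fields[i]);

   return n;
}
```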
Fixes 52 struct test cases in dEQP-GLES31.functional.program_interface_query.*
Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
No functional change here, but this now lets us recurse through structs
in add_shader_variable().
Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This lets us pass in the absolute location of a variable instead of
computing it in add_shader_variable() based on variable location and
bias. This is in preparation for recursing into struct variables.
Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This consolidates the combination of create_shader_variable() and
add_program_resource() into a new helper function. No functional
difference, but we'll expand add_shader_variable() in the next few
commits.
Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The existing code uses SSSE3, and because it isn't built in a separate
file compiled with SSSE3 enabled, it is usually not used (that, of
course, could be fixed...), whereas SSE2 is always present with 64-bit
builds. This should be pretty much as fast as the pshufb version,
albeit those code paths aren't really used on chips without llc in any
case.
v2: fix andnot argument order, add comments
v3: use pshuflw/hw instead of shifts (suggested by Matt Turner), cut comments
v4: [mattst88] Rebase
Reviewed-by: Matt Turner <mattst88@gmail.com>
Replaces four byte loads and four byte stores with a load, bswap,
rotate, store; or a movbe, rotate, store.
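A sketch of the idea, assuming the copy swaps bytes 0 and 2 of each 4-byte
pixel (the message does not say which swizzle is involved) and a
little-endian host. The helper names are made up, and a compiler is expected
to turn the portable bswap32()/ror32() below into single instructions.

```c
#include <stdint.h>
#include <string.h>

/* Portable byte swap of a 32-bit word. */
static uint32_t
bswap32(uint32_t v)
{
   return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
          ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* Rotate right by n bits, n in 1..31. */
static uint32_t
ror32(uint32_t v, unsigned n)
{
   return (v >> n) | (v << (32 - n));
}

/* Byte-at-a-time version: four byte loads and four byte stores. */
static void
swap_bytes_0_2_bytewise(uint8_t *dst, const uint8_t *src)
{
   dst[0] = src[2];
   dst[1] = src[1];
   dst[2] = src[0];
   dst[3] = src[3];
}

/* Word version: one load, bswap, rotate, one store.
 * Assumes a little-endian host. */
static void
swap_bytes_0_2_word(uint8_t *dst, const uint8_t *src)
{
   uint32_t v;

   memcpy(&v, src, sizeof(v));
   v = ror32(bswap32(v), 8);
   memcpy(dst, &v, sizeof(v));
}
```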
Reviewed-by: Roland Scheidegger <sroland@vmware.com>