Nowadays -1 for slots means that the semantic is not present, so
we need to store it in a signed variables, otherwise <0 comparisons
are pointless. Fixes
http://bugzilla.eng.vmware.com/show_bug.cgi?id=67811 (at least
with softpipe, edgeflags don't work wit llvmpipe)
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
And as a side effect fix a crash in the following piglit test:
general/attribs GL3
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Cc: "9.2 and 9.1" mesa-stable@lists.freedesktop.org
Fixes spurious 'Assertion `num_sgprs <= 104' failed.' with shaders using
all 104 SGPRs.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Christian König <christian.koenig@amd.com>
If gs is null, then freeing state->shader.tokens would result in a null
dereference.
Fixes "Dereference after null check" defect reported by Coverity.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Fixes "Uninitialized pointer read" defect reported by Coverity.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Some videos specify mb_adaptive_frame_field_flag instead of
field_pic_flag. This implies that the pic height needs to be halved, and
this field needs to be passed to the VP engine.
Cc: "9.2" mesa-stable@lists.freedesktop.org
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
The loop was iterating over all the fs inputs and setting them
to perspective interpolation, then after the loop we were
creating extra output slots with the correct interpolation. Instead
of injecting bogus extra outputs, just set the interpolation
on front face and prim id correctly when doing the initial scan
of fs inputs.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Draw module can decompose primitives into wireframe models, which
is a fancy word for 'lines', unfortunately that decomposition means
that we weren't able to preserve the original front-face info which
could be derived from the original primitives (lines don't have a
'face'). To fix it allow draw module to inject a fake face semantic
into outputs from which the backends can figure out the original
frontfacing info of the primitives.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The spec says that front-face is true if the value is >0 and false
if it's <0. To make sure that we follow the spec, lets just
subtract 0.5 from our value (llvmpipe did 1 for frontface and 0
otherwise), which will get us a positive num for frontface and
negative for backface.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
MP_TEMP_SIZE must be aligned to 0x8000, while TEMP_SIZE on NVE4_3D
must be aligned to 0x20000, so perform both alignments to be sure
we allocate enough space (actually the bo will most likely use 128
KiB pages and not aligning to that would be a waste anyway).
Cc: "9.2" mesa-stable@lists.freedesktop.org
We should probably be using map()/unmap() when accessing resource
data, but this is a little better.
v2: assert that the resource is not a display target, per Jose.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
This was never a problem since the Mesa state tracker always gives
us a user-space constant buffer with buffer_offset=0. But if another
state tracker ever gave us a "HW" constant buffer with non-zero
buffer_offset we'd mis-render.
Also, use the correct buffer size. And move an assertion to the
top of the function.
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
To have non-static buffers in local memory, it is necessary to pass them
as arguments to the kernel.
For r600, the correct lds size must be set to the SQ_LDS_ALLOC register.
The correct size is the clover size plus the size reported by the
compiler.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Looks like a thinko, "Hey, constant buffers can be at most 64 KiB
in size, offset can't be larger." But it can, of course.
I think piglit lacks a test for UBO and BindBufferRange that
tests if it actually works.
Test infs, zeros and nans with our arith functions to assure
correct/defined behavior with those values.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Usually with fixed point renderbuffers clamping is done as part of conversion.
However, since we blend in float format, we essentially skip all conversion
steps pre-blend but since this is still a fixed point renderbuffer we must
still clamp the inputs in this case. Makes no difference for piglit though.
Obviously we could skip this if fragment color clamping is enabled, but a)
this is deprecated in OpenGL (d3d never had it) and b) we don't support it
natively so it gets baked into the shader.
Also add some comment about logic ops being broken for srgb, luckily no test
tries to do that as there's no easy fix...
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Zack Rusin <zackr@vmware.com>
We were fixing up the blend factor to ZERO, however this only works correctly
with fixed point render buffers where the input values are clamped to 0/1
(because src_alpha_saturate is min(As, 1-Ad) so can be negative with unclamped
inputs). Haven't seen any failure anywhere due to that with fixed point SNORM
buffers (which clamp inputs to -1/1) but it should apply there as well (snorm
blending is rare, even opengl 4.3 doesn't require snorm rendertargets at all,
d3d10 requires them but they are not blendable).
Doesn't look like piglit hits this though (some internal testing hits the
float case at least). (With legacy OpenGL we could theoretically still use the
fixup to zero if the fragment color clamp is enabled, but we can't detect that
easily since we don't support native clamping hence it gets baked into the
shader.)
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Zack Rusin <zackr@vmware.com>
Adds H.264 and MPEG2 codec support via VP2, using firmware from the
blob. Acceleration is supported at the bitstream level for H.264 and
IDCT level for MPEG2.
Known issues:
- H.264 interlaced doesn't render properly
- H.264 shows very occasional artifacts on a small fraction of videos
- MPEG2 + VDPAU shows frequent but small artifacts, which aren't there
when using XvMC on the same videos
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Scheduler/register allocator in r600-sb was developed and optimized
on evergreen (VLIW-5) hardware, so currently it's not optimal for
VLIW-4 chips.
This patch should improve performance on cayman gpus due to better alu
packing, but also it tends to increase register usage, so overall positive
effect on performance has to be proven by real benchmarks yet.
Some results with bfgminer kernel on cayman:
source bytecode: 60 gprs, 3905 alu groups,
sbcl before the patch: 45 gprs, 4088 alu groups,
sbcl with this patch: 55 gprs, 3474 alu groups.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Ex-scalar instructions that became multislot on cayman do replicate result
to all channels - handle them similar to DOT4.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Mark values that are members of the 'same register' constraint as
preallocated in ra_init pass, this will prevent incorrect
reallocation in scheduler in some cases.
Should fix https://bugs.freedesktop.org/show_bug.cgi?id=66713
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Actually PS doesn't make sense for cayman and isn't even mentioned in
cayman docs, but llvm backend currently uses it in bytecode and, assuming
that hw seems to be mostly ok with it, this will allow sb to parse such
source bytecode correctly.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Just use the new conversion functions to do the work. The way it's plugged
in into the blend code is quite hacktastic but follows all the same hacks
as used by packed float format already.
Only support 4x8bit srgb formats (rgba/rgbx plus swizzle), 24bit formats never
worked anyway in the blend code and are thus disabled, and I don't think anyone
is interested in L8/L8A8. Would need even more hacks otherwise.
Unless I'm missing something, this is the last feature except MSAA needed for
OpenGL 3.0, and for OpenGL 3.1 as well I believe.
v2: prettify a bit, use separate function for packing.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>