This was a very serious bug. We were always doing the viewport
transformations on the first output of the vertex shader. That means
that every application that was storing position in anything but
OUT[0] was outputing untransformed vertices and had broken output
for whatever it was storing at OUT[0]. Correctly take into
consideration where the vertex position is actually stored.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
There can be more stream output decls than shader outputs because
individual components from them can be split and distributed
among different so buffers.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Before, we'd incorrectly generate an error if we we tried to
replace a non-4x4 block near the edge of a NPOT compressed texture.
For example, if the dest image was 15 texels wide and xoffset=12
and width=3 we'd incorrectly generate GL_INVALID_OPERATION.
Verified with new tests added to piglit s3tc-errors test.
Note: This is a candidate for the stable branches.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
That is just not supported by the hardware.
v2: fix compare
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
patch fixes a crash that happens if glTexSubImage2D is called with a
negative xoffset.
NOTE: This is a candidate for stable branches.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
After more thought/discussion, it seems it is better to handle this sort
of stuff in the state tracker.
So this reverts commit 12096f334b, except the
variant->key -> key shorthands.
This is a simple shader compiler that performs almost zero optimizations. The
generated code is usually much larger comparing to that generated by i965.
The generated code also requires many more registers.
Function-wise, it lacks register spilling and does not support most TGSI
indirections. Other than those, it works alright.
Courtesy of clang:
src/gallium/auxiliary/gallivm/lp_bld_sample.c:1483:10: warning: array index of '2' indexes past the end of an array (that contains 2 elements) [-Warray-bounds]
tmp[2] = lp_build_swizzle_aos(coord_bld, ddx_ddy[1], swizzle02);
^ ~
src/gallium/auxiliary/gallivm/lp_bld_sample.c:1430:10: note: array 'tmp' declared here
LLVMValueRef ddx_ddy[2], tmp[2], rho_vec;
^
src/gallium/auxiliary/gallivm/lp_bld_sample.c:1487:56: warning: array index of '2' indexes past the end of an array (that contains 2 elements) [-Warray-bounds]
rho_vec = lp_build_add(coord_bld, rho_vec, tmp[2]);
^ ~
src/gallium/auxiliary/gallivm/lp_bld_sample.c:1430:10: note: array 'tmp' declared here
LLVMValueRef ddx_ddy[2], tmp[2], rho_vec;
^
src/gallium/auxiliary/gallivm/lp_bld_sample.c:1491:56: warning: array index of '2' indexes past the end of an array (that contains 2 elements) [-Warray-bounds]
rho_vec = lp_build_max(coord_bld, rho_vec, tmp[2]);
^ ~
src/gallium/auxiliary/gallivm/lp_bld_sample.c:1430:10: note: array 'tmp' declared here
LLVMValueRef ddx_ddy[2], tmp[2], rho_vec;
^
Only 13 affected programs in shader-db, but they were all helped.
total instructions in shared programs: 368877 -> 368851 (-0.01%)
instructions in affected programs: 1576 -> 1550 (-1.65%)
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Eric Anholt <eric@anholt.net>
Three-source instructions have a vertical stride overloaded to 4, which
prevents directly using vec4 uniforms as arguments. Instead we need to
insert a MOV instruction to do the replication for the three-source
instruction.
With this in place, we can use three-source instructions in the vertex
shader. While some thought needs to go into deciding whether its better
to use a three-source instruction rather than a sequence of equivalent
instructions (when one or more sources are uniforms or immediates), this
will allow us to skip a lot of ugly lowering code and use the BFE and
BFI2 instructions directly.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Eric Anholt <eric@anholt.net>
This move the tracing timeout and printing into winsys and add
an debug environement variable for it (R600_DEBUG=trace_cs).
Lot of file touched because of winsys API changes.
v2: Do not write lockup file if ib uniq id does not match last one
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
There was a lot of code in evergreen_compute_internal.c that was not
being used at all and most of it was duplicating code from other parts
of the driver.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
v2:
- Fix usage of set_constant_buffer()
- Fix typo in comment
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
We don't support pre-2.6 kernels anyway - the install docs say 2.6.28
for DRI - and apparently this confuses ld.so's sorting when multiple
libGLs are installed. Just remove it.
Note: this is a candidate for the stable branches.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Adam Jackson <ajax@redhat.com>
New textures or vertex buffers don't always require patching and
re-emitting the shaders. So do a better job of figuring out when we
actually have to patch the shader.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Removes 75/78 state-dependent recompiles in GLB2.7 (the remaining 3 are
due to FBO-rendering size predictions). We currently expose
GL_ARB_color_buffer_float on GL core, so we may mis-predict there, but I'm
about to send a patch for removing that silly extension in that case.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>