This reverts commit 755c11dc5e.
We agreed that this is a band-aid that's not very useful and
the proper solution is to rewrite the rasterization algo
so that it operates on 64 bit values.
Signed-off-by: Zack Rusin <zackr@vmware.com>
When subdividing a triangle we're using a temporary array to store
the new coordinates for the subdivided triangles. Unfortunately
the array used for that was not aligned properly, causing
random crashes in the llvm jit code which was trying to load
vectors from it.
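
A minimal sketch of this kind of fix, assuming C11 alignas and a 16-byte
requirement for the vector loads. The names subdiv_coords and is_aligned16
are hypothetical and not taken from the tree:

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/* Hypothetical scratch storage for the coordinates of a subdivided
 * triangle: 3 vertices with 4 floats each. alignas(16) makes the array
 * safe for aligned 128-bit vector loads. */
struct subdiv_coords {
   alignas(16) float v[3][4];
};

/* Catch a lost alignment at compile time rather than as a random crash. */
static_assert(alignof(struct subdiv_coords) >= 16,
              "subdivision scratch array must be 16-byte aligned");

/* Returns 1 when the pointer is 16-byte aligned. */
static int is_aligned16(const void *p)
{
   return ((uintptr_t)p & 15u) == 0;
}
```

Each row of v is 16 bytes, so every vertex in the array starts on an
aligned address as long as the array itself does.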
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Unfortunately d3d10 requires much higher precision (e.g.
wgf11clipping tests for it). The smallest number of precision
bits with which it passes is 8. That means that we need to
decrease the maximum length of an edge that we can handle without
subdivision by 4 bits. Abstracted the code a bit to make it easier
to change once we switch to 64-bit rasterization.
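
The relation between precision bits and maximum edge length can be sketched
as below. The macro names and the "half the type width minus the sign bit"
budget are assumptions made for illustration, not a copy of the actual code:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; the point is that one constant drives the rest. */
#define FIXED_TYPE_WIDTH 32   /* width of the rasterizer's integer type */
#define FIXED_ORDER      8    /* subpixel precision bits */
#define FIXED_ONE        (1 << FIXED_ORDER)

/* Assumption: edge functions multiply two fixed-point deltas, so each
 * delta may use about half of the type width, minus the sign bit. */
#define MAX_FIXED_LENGTH (1 << ((FIXED_TYPE_WIDTH / 2 - 1) - FIXED_ORDER))

/* Returns 1 when an edge of the given length (in pixels) is too long to
 * rasterize directly and has to be subdivided first. */
static int edge_needs_subdivision(int32_t length_in_pixels)
{
   assert(length_in_pixels >= 0);
   return length_in_pixels > MAX_FIXED_LENGTH;
}
```

Under these assumptions 8 precision bits give a limit of 128 pixels, while
4 bits would give 2048, which is the 4-bit decrease mentioned above.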
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Share the winsys between different fd's if they point to the same device.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Waiting for an empty queue is nonsense and can lead to deadlocks if we have
multiple waiters or another thread that continuously sends down new commands.
Just post the cs to the queue and immediately wait for it to finish.
This is a candidate for the stable branch.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Kill the thread only after we checked that it's not used any more, not before.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Allocate a CMASK on demand and use it to fast clear single-sample
colorbuffers. Both FBOs and window system colorbuffers are fast
cleared. Expand as needed when colorbuffers are mapped or displayed
on screen.
v2: cosmetics, move transfer expansion into dma_blit
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
r600g needs explicit flushing before DRI2 buffers are presented on the screen.
v2: add (stub) implementations for all drivers, fix frontbuffer flushing
v3: fix galahad
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
We must take rounding into consideration when re-scaling to narrow
normalized channels, such as 2-bit normalized alpha.
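
A scalar sketch of the difference, assuming unsigned normalized channels of
at most 16 bits. The helpers unorm_rescale and unorm_truncate are
hypothetical and only illustrate the arithmetic; the real code operates on
vectors:

```c
#include <assert.h>
#include <stdint.h>

/* Rescale an unsigned normalized value from src_bits to dst_bits with
 * round-to-nearest. Both widths are limited to 16 bits so that the
 * intermediate product fits in 32 bits. */
static uint32_t unorm_rescale(uint32_t x, unsigned src_bits, unsigned dst_bits)
{
   assert(src_bits >= 1 && src_bits <= 16);
   assert(dst_bits >= 1 && dst_bits <= 16);
   const uint32_t src_max = (1u << src_bits) - 1;
   const uint32_t dst_max = (1u << dst_bits) - 1;
   assert(x <= src_max);
   return (x * dst_max + src_max / 2) / src_max;
}

/* Plain truncation, for comparison only: drops the low bits. */
static uint32_t unorm_truncate(uint32_t x, unsigned src_bits, unsigned dst_bits)
{
   assert(src_bits >= dst_bits);
   return x >> (src_bits - dst_bits);
}
```

For an 8-bit value of 43 going to 2 bits, the exact result is
43 * 3 / 255 = 0.506, so rounding yields 1 while truncation yields 0.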
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
xlib_sw_winsys.h:5:22: fatal error: X11/Xlib.h: No such file or directory
The compiler cannot find the Xlib.h in the installed system headers.
All supplied include directives point to inside the mesa module.
The X11_CFLAGS variable is undefined (not defined in config.status).
It appears the intent was to use X11_INCLUDES defined in configure.ac.
The Xlib.h file is not installed on my workstation. It is supplied in
the libx11-dev package. This allows an X developer control over which
version of this file is used for X development.
Signed-off-by: Gaetan Nadon <memsize@videotron.ca>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Technically without seamless filtering enabled GL allows any wrap mode, which
made sense when supporting true borders (can get seamless effect with border
and CLAMP_TO_BORDER), but gallium doesn't support borders and d3d9 requires
wrap modes to be ignored and it's a pain to fix up the sampler state (as it
makes it texture dependent). It is difficult to imagine a situation where an
app really wants another behavior so just cheat here. (It looks like some
graphics hw (intel) actually requires this too hence it should be safe.)
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This removes a lot of code, but not everything, as util_blit_pixels_tex
is still useful when one needs to override pipe_sampler_view::swizzle_?.
Reviewed-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
By calling util_map_texcoords2d_onto_cubemap.
A new parameter for util_blit_pixels_tex is necessary, as
pipe_sampler_view::first_layer is always supposed to point to the first
face when sampling from cubemaps.
Reviewed-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Only compile-tested but it seems straightforward.
Reviewed-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Simply adjust wrap mode to clamp_to_edge. This is all that's needed for a
correct implementation for nearest filtering, and it's way better than
using repeat wrap for instance for linear filtering (though obviously this
doesn't actually do seamless filtering).
v2: fix s/t wrap not r/s...
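
Roughly, and assuming the textures in question are cube maps, the
adjustment looks like the sketch below. The enum, struct and function names
are hypothetical stand-ins, not the driver's real sampler state:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical wrap modes and sampler state. */
enum wrap_mode { WRAP_REPEAT, WRAP_MIRROR_REPEAT, WRAP_CLAMP_TO_EDGE };

struct sampler_state {
   enum wrap_mode wrap_s, wrap_t, wrap_r;
};

/* Override the s and t wrap modes for cube maps. The r coordinate is left
 * alone, matching the v2 note above. */
static void fixup_cube_wrap(struct sampler_state *s, int is_cube)
{
   assert(s != NULL);
   if (is_cube) {
      s->wrap_s = WRAP_CLAMP_TO_EDGE;
      s->wrap_t = WRAP_CLAMP_TO_EDGE;
   }
}
```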
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Fix the return type and allow src and dst types for comparison
to be separate. This at least fixes the two test cases I've written.
v2: drop the u32->s32 change
Acked-by: Christoph Bumiller <christoph.bumiller@speed.at>
Signed-off-by: Dave Airlie <airlied@redhat.com>
When the old contents do not need to be preserved, it is faster to
create a new backing bo rather than stall.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
max_index may be 0xffffffff. The hardware does not need 1 + max_index
(although it does not hurt unless max_index wraps around to zero).
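
The wraparound is plain 32-bit unsigned arithmetic. A small illustration
with hypothetical helper names:

```c
#include <assert.h>
#include <stdint.h>

/* The 32-bit computation: wraps to zero for max_index == 0xffffffff. */
static uint32_t count_wrapping(uint32_t max_index)
{
   return max_index + 1u;
}

/* Widening to 64 bits before the add avoids the wrap, for callers that
 * really need a count rather than the maximum index itself. */
static uint64_t count_widened(uint32_t max_index)
{
   return (uint64_t)max_index + 1;
}
```

Passing max_index through unchanged, as done here, sidesteps the problem
entirely.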
Signed-off-by: Rob Clark <robclark@freedesktop.org>
For mem->gmem we don't sample depth/stencil as its native type. So we
need to set up the swizzle state for the sampler based on the format used
for sampling.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Needed by some games, like etuxracer and supertuxkart, which use alpha
test rather than blending, to handle texture transparency.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
With a debug option to force DIRECT (mainly to make it easier for
capturing cmdstream dumps). Using INDIRECT for large shaders at least
makes a noticeable reduction in CPU load, which helps for CPU limited
games.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Because of how the tiling works, we can't really flush at arbitrary
points very easily. So wraparound is handled by resetting to top of
ringbuffer. Previously this would stall until current rendering is
complete. Instead cycle through multiple ringbuffers to avoid a stall.
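
A simplified sketch of the idea. The struct layout, the ring count of 4 and
the ring_reserve helper are assumptions for illustration only; GPU
synchronization is left out:

```c
#include <assert.h>
#include <stddef.h>

#define NUM_RINGS 4   /* hypothetical number of ringbuffers */

struct ring {
   size_t used;   /* bytes already written */
   size_t size;   /* total capacity in bytes */
};

struct ctx {
   struct ring rings[NUM_RINGS];
   unsigned cur;  /* index of the ring currently being filled */
};

/* Reserve space for a command packet. On wraparound, move to the next
 * ring and start at its top instead of waiting for the current ring to
 * drain. */
static struct ring *ring_reserve(struct ctx *c, size_t bytes)
{
   struct ring *r = &c->rings[c->cur];
   assert(bytes <= r->size);
   if (r->used + bytes > r->size) {
      c->cur = (c->cur + 1) % NUM_RINGS;
      r = &c->rings[c->cur];
      /* A real driver would first make sure the GPU is done with this
       * ring; with several rings that is usually already the case. */
      r->used = 0;
   }
   r->used += bytes;
   return r;
}
```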
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Emit markers by writing to scratch registers in order to "triangulate"
gpu lockup position from post-mortem register dump. By comparing
register values in post-mortem dump to command-stream, it is possible to
narrow down which DRAW_INDX caused the lockup.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Have a single helper that all draws come through, mainly for a
convenient debug and instrumentation point.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
The varying-out config comes from the inputs of the frag shader (so that
we aren't exporting unneeded varyings). The varyings-count should come
from the frag shader as well, to avoid a discrepancy in configuration
and resulting gpu lockup.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
start_instance doesn't affect gl_InstanceID.
There's no piglit test, but it's kinda obvious the code was wrong.
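
To spell out the intended behaviour: the start instance offsets only the
fetch of instanced vertex attributes, not the ID seen by the shader. The
helper names below are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Value the shader sees in gl_InstanceID: counts from 0 for every draw,
 * regardless of start_instance. */
static uint32_t shader_instance_id(uint32_t instance, uint32_t start_instance)
{
   (void)start_instance;   /* deliberately ignored */
   return instance;
}

/* Element index used to fetch an instanced vertex attribute with the
 * given divisor: this is where start_instance does apply. */
static uint32_t attrib_instance_index(uint32_t instance,
                                      uint32_t start_instance,
                                      uint32_t divisor)
{
   assert(divisor > 0);
   return start_instance + instance / divisor;
}
```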
Reviewed-by: Christian König <christian.koenig@amd.com>
The shader is responsible for writing to streamout buffers using
the TBUFFER_STORE_FORMAT_* instructions.
The locations of some input SGPRs and VGPRs are assigned dynamically, because
the input SGPRs controlling streamout are not declared if they are not needed,
decreasing the indices of all following inputs.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>