When adding a new buffer to the beginning of the memory pool, we were
accidentally deleting the buffer that was first in the buffer list.
This was caused by a bug in the memory pool's linked list
implementation.
The fixup code emulates non-BGRA render targets by adding an
extra instruction at the end of fragment shaders to swizzle the
output. To do this, we also swizzle the blend function. However
an oversight until now was that the writemask wasn't getting
swizzled. This patch fixes that which fixes a bunch of piglit
tests.
This patch adds liveness analysis to i915g and a couple
optimizations which benefit from it. One interesting
optimization turns (fake) indirect texture accesses into direct
texture accesses (the i915 supports a maximum of 4 indirect
texture accesses). Among other things this fixes a bunch of
piglit tests.
Fixes assertion failues in 24 piglit tests with
MESA_GL_VERSION_OVERRIDE=3.0, 12 of which are now passing.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Based on calim's original fix in the nine branch.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Cc: "9.2 and 9.1" <mesa-stable@lists.freedesktop.org>
This mimics r600g. The R600_CONTEXT_xxx flags are added to rctx->b.flags
and si_emit_cache_flush emits the packets. That's it. The shared radeon code
tells us when the streamout cache should be flushed, so we have to check
the flags anyway.
There is a new atom "cache_flush", because caches must be flushed *after*
resource descriptors are changed in memory.
Functional changes:
* Write caches are flushed at the end of CS and read caches are flushed
at its beginning.
* Sampler view states are removed from si_state, they only held the flush
flags.
* Everytime a shader is changed, the I cache is flushed. Is this needed?
Due to a hw bug, this also flushes the K cache.
* The WRITE_DATA packet is changed to use TC, which fixes a rendering issue
in openarena. I'm not sure how TC interacts with CP DMA, but for now it
seems to work better than any other solution I tried. (BTW CIK allows us
to use TC for CP DMA.)
* Flush the K cache instead of the texture cache when updating resource
descriptors (due to a hw bug, this also flushes the I cache).
I think the K cache flush is correct here, but I'm not sure if the texture
cache should be flushed too (probably not considering we use TC
for WRITE_DATA, but we don't use TC for CP DMA).
* The number of resource contexts is decreased to 16. With all of these cache
changes, 4 doesn't work, but 8 works, which suggests I'm actually doing
the right thing here and the pipeline isn't drained during flushes.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Tested-by: Tom Stellard <thomas.stellard@amd.com>
There is a new "class" si_buffer_resources, which should be good enough for
implementing any kind of buffer bindings (constant buffers, vertex buffers,
streamout buffers, shader storage buffers, etc.)
I don't even keep a copy of pipe_constant_buffer - we don't need it.
The main motivation behind this is to have a well-tested infrastrusture
for setting up streamout buffers.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Tested-by: Tom Stellard <thomas.stellard@amd.com>
Also r600_hw_context_priv.h and si_state_streamout.c are removed, because
they are no longer needed.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Tested-by: Tom Stellard <thomas.stellard@amd.com>
This streamout state code will be used by radeonsi.
There are new structures r600_common_context and r600_common_screen.
What is inherited by what is shown here:
pipe_context -> r600_common_context -> r600_context
pipe_screen -> r600_common_screen -> r600_screen
The common structures reside in drivers/radeon. Currently they only contain
enough functionality to be able to handle streamout. Eventually I'd like
the whole pipe_screen implementation to be shared and some of the context
stuff too.
This is quite big, but most changes are because of the new structures and
the fact r600_write_value is replaced by radeon_emit.
Thanks to Tom Stellard for fixing the build for r600g/compute.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Tested-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Tested-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Tested-by: Tom Stellard <thomas.stellard@amd.com>
No idea if this is working right but copied straight from llvmpipe.
(Not only does this check the so_target but also use buffer->data instead
of buffer for the mapping.)
Just trying to get rid of a segfault testing something else...
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Zack Rusin <zackr@vmware.com>
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Fixes "Missing break in switch" defect reported by Coverity.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
We need to set the flag on all the .xyzw components that are written by
the instruction, not just on .x. Otherwise a later use of rN.y (for
example) will not trigger the appropriate sync bit to be set.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Seems like most/all instructions have some restrictions about const src
registers. In seems like the 2 src (cat2) instructions can take at most
one const, and the 3 src (cat3) instructions can take at most one const
in the first 2 arguments. And so on. Handle this properly now.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
MSAA was tested by one user on RS690 and it works for him with color
compression (CMASK) disabled. Our theory is that his chipset lacks CMASK RAM.
Since we don't have hardware documentation about which chipsets actually have
CMASK RAM, I had to take a guess based on the presence of HiZ.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
In particular noone is interested in the vertex count, so drop that,
and also drop the duplicated num_primitives_generated /
so.primitives_storage_needed variables in drivers. I am unable for now to figure
out if primitives_storage_needed in SO stats (used for d3d10) should
increase if SO is disabled, though the equivalent num_primitives_generated
used for OpenGL definitely should increase. In any case we were only counting
when SO is active both in softpipe and llvmpipe anyway so don't pretend there's
an independent num_primitives_generated counter which would count always.
(This means the PIPE_QUERY_PRIMITIVES_GENERATED count will still be wrong just
as before, should eventually fix this by doing either separate counting for this
query or adjust the code so it always counts this even if SO is inactive depending
on what's correct for d3d10.)
Reviewed-by: Brian Paul <brianp@vmware.com>
I assume this should have been part of commit
7727fbb7c5. This (obviously) fixes a lot tests.
Signed-off-by: Henri Verbeet <hverbeet@gmail.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Commit 53e20b8b introduced the use of a template to initialize some
common fields. Move this copying of fields to before the common vp3
fields are initialized.
Reported-by: Martin Peres <martin.peres@labri.fr>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Christian König <christian.koenig@amd.com>
The cmps.f.* instruction doesn't actually seem to give a float 1.0 or
0.0 output. It either needs a cov.u16f16 or add.s + sel.f16. This
makes SGT/SLT/etc more similar to CMP, so handle them in trans_cmp().
This fixes a bunch of piglit tests.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
It seems there are a number of cases where instructions have limitations
about taking reading src's from const register file, so make
get_unconst() a bit easier to use.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
We probably should get rid of assert() entirely, but at this stage it is
more useful for things to crash where we can catch it in a debugger.
With compile_error() we have a single place to set an error flag (to
bail out and return an error on the next instruction) so that will be a
small change later when enough of the compiler bugs are sorted.
But re-arrange/cleanup the error/assert stuff so we at least get a dump
of the TGSI that triggered it. So we see some useful output in piglit
logs.
Signed-off-by: Rob Clark <robclark@freedesktop.org>