r600_gpu_load.c: In function ‘r600_gpu_load_thread’:
../../../../src/util/os_time.h:82:7: warning: assuming signed overflow does not occur when assuming that (X + c) >= X is always true [-Wstrict-overflow]
if (start <= end)
They might lead to an unrecoverable GPU hang.
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: mesa-stable@lists.freedesktop.org
color_interp_vgpr_index was declared as a generic char value.
Because signed values are used in this variable, the result
was not safe across architectures and crashed on ppc64[el]
and arm.
Declare color_interp_vgpr_index as a signed type.
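
For illustration, a minimal sketch of the portability issue: the signedness of plain char is implementation-defined, and it is typically signed on x86 but unsigned on the Linux ABIs for arm and ppc64. The helper names and the -1 sentinel are assumptions made for this example, not the driver's code.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: reports whether plain `char` is signed on this ABI.
 * CHAR_MIN is 0 where plain char is unsigned. */
static bool plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Store -1 (an assumed "not assigned" sentinel) in a plain char and read
 * it back as int: -1 where char is signed, UCHAR_MAX where it is not. */
static int roundtrip_plain_char(void)
{
    char index = -1;
    return index;
}

/* With an explicitly signed type the result is -1 on every architecture. */
static int roundtrip_signed_char(void)
{
    signed char index = -1;
    return index;
}
```

A comparison such as "index == -1" therefore only behaves the same everywhere when the variable is declared with an explicitly signed type.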
Signed-off-by: Timothy Pearson <tpearson@raptorengineering.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Dependencies between rings are inserted correctly if a buffer is
represented by only one unique amdgpu_winsys_bo instance.
Use a hash table keyed by amdgpu_bo_handle to have exactly one
amdgpu_winsys_bo per amdgpu_bo_handle.
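
A minimal sketch of the one-object-per-handle idea, assuming a small chained hash table keyed by the handle pointer. The struct and function names are hypothetical stand-ins, not the winsys API, and a real implementation would also need locking and removal on destroy.

```c
#include <stdint.h>
#include <stdlib.h>

#define BO_TABLE_BUCKETS 64

/* Hypothetical stand-in for a winsys buffer object. */
struct bo {
    const void *handle;   /* kernel handle this object wraps */
    unsigned refcount;
    struct bo *next;      /* bucket chain */
};

struct bo_table {
    struct bo *buckets[BO_TABLE_BUCKETS];
};

static unsigned bucket_of(const void *handle)
{
    return (unsigned)(((uintptr_t)handle >> 4) % BO_TABLE_BUCKETS);
}

/* Return the unique bo for `handle`, creating it on first use.
 * Returns NULL only if allocation fails. */
static struct bo *bo_table_get(struct bo_table *t, const void *handle)
{
    unsigned b = bucket_of(handle);

    /* An existing entry is reused, so importing the same handle twice
     * yields the same object with a bumped reference count. */
    for (struct bo *bo = t->buckets[b]; bo; bo = bo->next) {
        if (bo->handle == handle) {
            bo->refcount++;
            return bo;
        }
    }

    struct bo *bo = calloc(1, sizeof(*bo));
    if (!bo)
        return NULL;
    bo->handle = handle;
    bo->refcount = 1;
    bo->next = t->buckets[b];
    t->buckets[b] = bo;
    return bo;
}
```

Because every lookup of the same handle returns the same pointer, per-buffer state such as fences lives in exactly one place.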
v2: return offset and stride properly
Tested-by: Leo Liu <leo.liu@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Core infrastructure for performance counters, using gallium's batch
query interface (to support AMD_performance_monitor).
Signed-off-by: Rob Clark <robdclark@gmail.com>
For batch queries we have N different query_type's for one query, so
mapping a single query_type to a sample_provider doesn't really work
out. Instead add a new constructor to construct a query directly
from a sample_provider.
Also, the sample buffer size needs to be determined at runtime, as
it depends on the number of query_types.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Take the query object, rather than the ctx. The ctx ptr isn't hugely
useful, but for batch queries we will need the query object to properly
get the results.
Signed-off-by: Rob Clark <robdclark@gmail.com>
For now it still goes to stdout, but this will make it easier to support
output on stderr, which is what frameretrace expects.
(If we eventually have a proper GL extension for this, implementation
probably looks like dumping shader disasm to a tmp file and then dumping
that out over whatever mechanism is used.)
Signed-off-by: Rob Clark <robdclark@gmail.com>
v2: reword comment about lower_helper_invocations to be more clear
that it might not work on all hardware
v3: add special variant of load_sample_id which does not imply per-
sample shading
Signed-off-by: Rob Clark <robdclark@gmail.com>
Abort all dependent events.
v2: Abort the current event as well.
CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Just a nice hint for both people and compilers.
Signed-off-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Just a nice hint for both people and compilers.
Signed-off-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Just a nice hint for both people and compilers.
Signed-off-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Ported from radeonsi. Improves windowed glxgears run as
vblank_mode=0 glxgears -info -geometry 0+0+512+512
from ≈2270 FPS to ≈2360 FPS. Tested with AMD TURKS.
v2: it turned out glxgears ignores the option above; the correct form
is "512x512+0+0". With that, 512x512 actually loses 30 FPS.
300×300, however, gains around a hundred FPS. To leave some room in
case results differ on other cards, simply keep 300×300 in the code
rather than searching for an optimum.
v3: remove redundant braces, and try harder for the mail to stick to
the rest of the series.
Signed-off-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>
Reviewed-by: Gert Wollny <gw.fossdev@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Annoyingly we still have to briefly drop the lock to unref resources,
but push the lock down into __fd_batch_destroy() so we can invalidate
the batch and reset resources before dropping the lock.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Re-allocate rather than re-use. Originally we had an unnecessarily
complex design to avoid re-allocating cmdstream buffers. But now that
support for "growable" cmdstream buffers has been in place for a couple
years, I guess we can care a bit less about the extra overhead on older
kernels.
But making the batches one-shot removes a class of potential race
conditions vs the flush_queue.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Instead of the reading batch setting a dependency on the writing batch,
simply flush the writing batch immediately. This avoids situations
where we have to flush the context's current batch later.
Signed-off-by: Rob Clark <robdclark@gmail.com>
This was basically to avoid a zero-dword IB (indirect-branch), but
instead just don't emit the IB packet in that case.
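
A minimal sketch of that guard, assuming a toy command stream. The struct, the opcode value and emit_ib() are hypothetical and only illustrate skipping the packet when the target is empty.

```c
#include <stdbool.h>
#include <stdint.h>

#define RING_DWORDS 64u
#define PKT_IB      0x3fu   /* made-up opcode, for illustration only */

/* Hypothetical command stream: a fixed array of dwords. */
struct cmdstream {
    uint32_t dwords[RING_DWORDS];
    unsigned count;
};

/* Emit a branch from `ring` to `target`, unless `target` is empty.
 * Returns true if a packet was written. */
static bool emit_ib(struct cmdstream *ring, const struct cmdstream *target)
{
    if (target->count == 0)
        return false;   /* nothing to execute: skip the packet entirely */
    if (ring->count + 2 > RING_DWORDS)
        return false;   /* no room; a real driver would grow or flush */

    ring->dwords[ring->count++] = PKT_IB;
    ring->dwords[ring->count++] = target->count;   /* size in dwords */
    return true;
}
```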
Signed-off-by: Rob Clark <robdclark@gmail.com>
pipe_framebuffer_state can have samples=0 in various cases, which is
actually the same thing as samples=1. So use the _get_num_samples()
helper to populate the key, to avoid this looking like two distinct
fb states to the cache.
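
A minimal sketch of the normalization, assuming trimmed-down state and key structs. All names here are hypothetical, and fb_num_samples() merely stands in for the helper mentioned above.

```c
#include <stdbool.h>

/* Hypothetical, trimmed-down framebuffer state and cache key. */
struct fb_state { unsigned width, height, samples; };
struct fb_key   { unsigned width, height, samples; };

/* Treat samples == 0 as single-sampled, the same as samples == 1. */
static unsigned fb_num_samples(const struct fb_state *fb)
{
    return fb->samples > 1 ? fb->samples : 1;
}

/* Build the key from the normalized sample count, never the raw field. */
static struct fb_key fb_key_from_state(const struct fb_state *fb)
{
    struct fb_key key = { fb->width, fb->height, fb_num_samples(fb) };
    return key;
}

static bool fb_key_equal(const struct fb_key *a, const struct fb_key *b)
{
    return a->width == b->width && a->height == b->height &&
           a->samples == b->samples;
}
```

With this, states that differ only in samples=0 versus samples=1 produce equal keys and share one cache entry.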
Signed-off-by: Rob Clark <robdclark@gmail.com>
It is possible for a batch to be freed under our feet when flushing, so
it is best to hold a reference to all of them up-front.
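
A minimal sketch of the pattern, assuming a toy refcounted batch. The struct and helpers are hypothetical; the "flush" is a stand-in, and real code would free the batch when the count reaches zero.

```c
#include <stddef.h>

/* Hypothetical refcounted batch. */
struct batch {
    int refcount;
    int flushed;
};

static void batch_ref(struct batch *b)
{
    b->refcount++;
}

static void batch_unref(struct batch *b)
{
    b->refcount--;   /* real code would destroy the batch at zero */
}

/* Flush every batch in `batches`. All references are taken before the
 * first flush, so flushing one batch cannot drop the last reference to
 * another batch that is still waiting in the array. */
static void flush_all(struct batch **batches, size_t n)
{
    for (size_t i = 0; i < n; i++)
        batch_ref(batches[i]);

    for (size_t i = 0; i < n; i++)
        batches[i]->flushed = 1;   /* stand-in for the real flush */

    for (size_t i = 0; i < n; i++)
        batch_unref(batches[i]);
}
```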
Signed-off-by: Rob Clark <robdclark@gmail.com>
The current code is buggy: if there are only 12 dwords left in cbuf,
we emit a zero data length command which will be rejected by virglrenderer.
Fix it by calling flush in this case.
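
A minimal sketch of the check, assuming a 12-dword command header and a toy command buffer. The capacity, struct and function names are hypothetical and do not reflect the encoder's actual API.

```c
#include <stdint.h>

#define CBUF_DWORDS 64u   /* assumed buffer capacity, for illustration */
#define HDR_DWORDS  12u   /* assumed per-command header size */

/* Hypothetical command buffer; `flushes` only exists so the behaviour
 * can be observed. */
struct cbuf {
    unsigned cdw;       /* dwords used so far */
    unsigned flushes;
};

static void cbuf_flush(struct cbuf *cb)
{
    cb->cdw = 0;
    cb->flushes++;
}

/* Write `payload` dwords, split into as many commands as needed.
 * Returns the number of commands emitted. */
static unsigned inline_write(struct cbuf *cb, unsigned payload)
{
    unsigned commands = 0;

    while (payload > 0) {
        /* If only the header would fit, the command would carry no
         * data, so start a fresh buffer first. */
        if (CBUF_DWORDS - cb->cdw <= HDR_DWORDS)
            cbuf_flush(cb);

        unsigned room = CBUF_DWORDS - cb->cdw - HDR_DWORDS;
        unsigned chunk = payload < room ? payload : room;

        cb->cdw += HDR_DWORDS + chunk;
        payload -= chunk;
        commands++;
    }
    return commands;
}
```

The comparison uses "<=" rather than "<" so that the case with exactly one header's worth of space left also triggers the flush.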
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Dave Airlie <airlied@redhat.com>
This allows us to implement glMinSampleShading correctly, which up
until now just got ignored.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Converting from a switch statement that would not allow intermediate sample counts
to use an if-else chain went a bit wrong, so that in some cases the range that
should be inclusive was exclusive and the line for 16 samples was copied wrongly.
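
A minimal sketch of an if-else chain with inclusive upper bounds, assuming tiers of 1, 2, 4, 8 and 16 samples. The function name and the tiers are hypothetical and only illustrate the comparison style.

```c
/* Map a sample count to the smallest assumed tier that can hold it.
 * Every upper bound is inclusive: 4 selects the 4-sample tier, 5 the
 * 8-sample tier. Returns 0 for unsupported counts. */
static unsigned sample_tier(unsigned nr_samples)
{
    if (nr_samples <= 1)
        return 1;
    else if (nr_samples <= 2)
        return 2;
    else if (nr_samples <= 4)
        return 4;
    else if (nr_samples <= 8)
        return 8;
    else if (nr_samples <= 16)
        return 16;
    return 0;
}
```

Writing "<" instead of "<=" in any branch would push an exact tier value such as 4 into the next larger tier.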
v2: elaborate commit message.
Fixes: 91f48cdfe5 ("virgl: Add support for glGetMultisample")
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com> (v1)
Fixes a couple of packed_pixel CTS tests. No regressions in a CTS run.
v2: simplify the changes a bit
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Karol Herbst <kherbst@redhat.com>