do it at bind time, so that pipe_sampler_view is immutable with regard to
buffer reallocations and we don't have to remember all existing buffer
views.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
similar to gl_buffer_object::UsageHistory
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
not used anymore. It was used when the polygon stipple texture was constant.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
We want to emit IA_MULTI_VGT_PARAM less often because it's a context reg.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Caches are always flushed at IB boundary.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
If primitive restart is enabled for two consecutive draws which use
different primitive restart indices, then the first draw's primitive
restart index was incorrectly used for the second draw.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98025
Cc: 11.1 11.2 12.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Without this fix, duplicated file descriptors leak into child processes.
See commit aaac913e90 for one instance
where the same fix was employed.
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Matt Whitlock <freedesktop@mattwhitlock.name>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Without this fix, duplicated file descriptors leak into child processes.
See commit aaac913e90 for one instance
where the same fix was employed.
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Matt Whitlock <freedesktop@mattwhitlock.name>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Without this fix, duplicated file descriptors leak into child processes.
See commit aaac913e90 for one instance
where the same fix was employed.
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Matt Whitlock <freedesktop@mattwhitlock.name>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Without this fix, duplicated file descriptors leak into child processes.
See commit aaac913e90 for one instance
where the same fix was employed.
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Matt Whitlock <freedesktop@mattwhitlock.name>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Previous fundamental change in stats gathering added a temporary
SwrWaitForIdle to begin_query and end_query. Code has been reworked to
remove stall.
Reviewed-by: George Kyriazis <george.kyriazis@intel.com>
Create worker pool now computes number of worker threads based on
things like topologies, etc. and creates the pool but doesn't actually
launch the threads. Instead there is a separate start thread pool
function. This allows thread resources to be constructed first before
threads start.
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
- Immediately sleep threads until thread data is initialized
- Fix some compile bugs with AR enabled
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
- Move most jitter functionality into SwrJit namespace
- Avoid global "using namespace llvm" in headers
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
Detect all of the CPUs in the system. Expose metrics
for min, max and current frequency in Hz.
Signed-off-by: Steven Toth <stoth@kernellabs.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Regression introduced by "gallium/radeon: zero all query buffers".
Cc: Michel Dänzer <michel@daenzer.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This is only needed for r600 which doesn't have ARB_query_buffer_object and
therefore wouldn't really need the fences, but let's be optimistic about
filling in this feature gap eventually.
Cc: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
We don't plan to use sub-allocated buffers with UVD, but just in case one
slips through, this increases the chances of things working out anyway.
Reviewed-by: Christian König <christian.koenig@amd.com>
We don't plan to use sub-allocated buffers with VCE, but just in case one
slips through, this increases the chances of things working out anyway.
Reviewed-by: Christian König <christian.koenig@amd.com>
Implement support for power based sensors, reporting units in
milli-watts and watts.
Also, minor cleanup - change the related if block to a switch.
Tested with two different power sensors, including the nouveau
'power1' sensors on a GTX950 card.
Signed-off-by: Steven Toth <stoth@kernellabs.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Commutativity is not allowed with SHLADD, but src2 can accept
loads. To allow the load propagation pass to do its job, add a
special case like for SUCLAMP because src1 is always an immediate.
This IMAD to SHLADD optimization helps a bunch of shaders from Tomb
Raider, Victor Vran, UE4 demos (+15% perf with Elemental) and Shadow
Warrior.
GF100/GK104:
total instructions in shared programs :2838045 -> 2834712 (-0.12%)
total gprs used in shared programs :396684 -> 396386 (-0.08%)
total local used in shared programs :34416 -> 34416 (0.00%)
local gpr inst bytes
helped 0 326 1105 1105
hurt 0 55 3 3
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Only and only if src1 is a power of 2 we can replace IMAD by SHLADD.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Unfortunately, we can't use the emit helpers for GF100/GK110
because src1 and src2 are swapped.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
This instruction is available since SM20 (Fermi) and allow to do
(a << b) + c in one shot. In some situations, IMAD should be
replaced by SHLADD when b is a power of 2, and ADD+SHL should be
replaced by SHLADD as well.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
envyas now uses a much better representation for those control
codes and it displays the different flags instead of an
unreadable hex number.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Most of the time, even the 512 bytes that we now get is more than sufficient
(pipeline stats queries are the largest at 184 bytes per shot).
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
To ensure that fences are properly initialized.
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
We will support the waiting option in ARB_query_buffer_object using
WAIT_REG_MEM on an appropriate fence-like dword. Some queries conveniently
write their results with the highest bit set, and we can just use that;
for others, we have to write a fence explicitly.
ZPASS_DONE for occlusion queries writes its results with the high bit
set, but it writes up to 8 pairs of results (one for each DB). We have
to wait for all of these results, so let's just add an explicit fence.
The new function provides summary information to be used by subsequent
patches.
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>