The line stipple pattern and factor only matter if line stippling is
actually enabled. Otherwise, we can safely ignore it.
PBO upload may give us zero for line stipple information, while normal
drawing tends to give us an actual stipple pattern such as 0xffff. This
was causing us to flag IRIS_DIRTY_LINE_STIPPLE way too often, leading to
useless 3DSTATE_LINE_STIPPLE commands, which are non-pipelined and thus
very expensive.
Improves performance in Manhattan 3.0 on Skylake GT4e by
0.149261% +/- 0.0380796% (n=210). On an Icelake 8x8 with the GPU
frequency locked at 700Mhz, improves by 0.423756% +/- 0.222843% (n=3).
These are supposed to be lowered into sge/slt/seq/sne equivalents.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Andreas Baierl <ichgeh@imkreisrum.de>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
GP doesn't support fceil so we need to lower it.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Andreas Baierl <ichgeh@imkreisrum.de>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
The entire point of schedule_first is that the node has to be scheduled
as soon as possible without any moves because it doesn't produce a
proper floating-point value, or its value changes depending on where you
read it. We were still introducing a move for preexp2 in some cases
though, even if it got scheduled as soon as possible, which broke some
exp() tests. Fix that.
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Tested-by: Vasily Khoruzhick <anarsoul@gmail.com>
The whole point of schedule_first nodes is that they need to be
scheduled as soon as possible, so if a schedule_first node is the
successor in a fake dependency that prevents it from being scheduled
after its parent, that can cause problems. We need to add these fake
dependencies to the parent as well, and we need to guarantee that the
pre-RA scheduler puts schedule_first nodes right before their parents in
order to prevent this from adding cycles to the dependency graph.
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Tested-by: Vasily Khoruzhick <anarsoul@gmail.com>
The idea was to make sure schedule_first nodes were always first in the
ready list. I made sure they were inserted first, but not that other
nodes wouldn't later be scheduled ahead of them. Fixes
spec@glsl-1.10@execution@built-in-functions@vs-exp-float and probably
others.
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Tested-by: Vasily Khoruzhick <anarsoul@gmail.com>
The point of the function is to avoid creating a complex move which is
used by certain slots in the next instruction, but unscheduled
successors will never be in the next instruction. Found while debugging
a crash that the previous commit fixed.
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Tested-by: Vasily Khoruzhick <anarsoul@gmail.com>
The scheduler assumes that load nodes are always duplicated so that they
can always be scheduled eventually and therefore they never need to be
spilled. But some lowerings were running after the pre-RA scheduler,
whereas duplication has to happen before then since it's needed for the
scheduler to do a better job reducing register pressure. This meant
that lowerings were introducing multiple uses of a load instruction,
which broke the scheduler's expectation and resulted in infinite loops
in situations where the only nodes available to spill were load nodes.
Spilling load nodes would be silly, so we want to fix the lowerings
rather than the scheduler. Just do all lowerings before the pre-RA
scheduler, which also helps with reducing pressure since the scheduler
can more accurately compute the pressure.
Fixeslima/mesa#104.
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Tested-by: Vasily Khoruzhick <anarsoul@gmail.com>
So we can move all the BO logic into this file instead of having it
spread over pan_resource.c, pan_drm.c and pan_bo_cache.c.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
The last users have been converted to use plain BOs. Let's get rid of
this abstraction. We can always consider adding it back if we need it
at some point.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Some fields in panfrost_context are unused (probably leftovers from
previous refactor). Let's get rid of them.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
ctx->{scratchpad,tiler_heap,tiler_dummy} are allocated using
panfrost_drm_allocate_slab() but they never any of the SLAB-based
allocation logic. Let's convert those fields to plain BOs.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Right now, the transient memory allocator implements its own BO caching
mechanism, which is not really needed since we already have a generic
BO cache. Let's simplify things a bit.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
The context can be retrieved from batch->ctx.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Given the function name it makes more sense to pass it a job batch
directly.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
What we currently call a job is actually a batch containing several jobs
all attached to a rendering operation targeting a specific FBO.
Let's rename structs, functions, variables and fields to reflect this
fact.
Suggested-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Fixes: 3844ed8d44 ("radv: Add adaptive_sync driconfig option and enable it by default.")
Fixes: e260493f2a ("radeonsi: Enable adaptive_sync by default for radeon")
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This bit redirects the state cache from the unified/RO sections of the
L3 cache to the "CS command buffer" section of the cache, which would
be set up via TCCNTLREG. The documentation says:
"Additionaly, this redirection should be enabled only if there is a
non-zero allocation for the CS command buffer section."
We don't allocate any cache to the CS command buffer section, so
enabling this redirection effectively disabled the state cache.
The Windows driver only sets up that section when using POSH, which
we do not currently use. So, leave it unallocated and disable the
redirection to get a functional state cache again.
Improves performance in Civilization VI by 18%, Manhattan 3.0 by 6%,
and Car Chase by 2%.
Jason pointed out that the caches likely refer to offsets from dynamic
and surface state base addresses, so when we change those, we need to
invalidate the caches.
Comment borrowed from src/intel/vulkan/genX_cmd_buffer.c.
The driver can't determine PIPE_QUERY_PRIMITIVES_GENERATED or
PIPE_QUERY_PRIMITIVES_EMITTED once we support geometry or
tessellation, since these stages add primitives at runtime. Use the
WRITE_PRIMITIVE_COUNTS event to write back the primitive counts and
implement a hw query for this.
Reviewed-by: Rob Clark <robdclark@gmail.com>
The GPU writes out streamout offsets as it goes to the FLUSH_BASE
pointer. We use that value with CP_MEM_TO_REG when appending to the
stream so that we don't have to track the offsets with the CPU in the
driver. This ensures that streamout continues to work once we enable
geometry and tessellation shader stages that add geometry.
Reviewed-by: Rob Clark <robdclark@gmail.com>
Should fix some issues we're seeing. And use REALLOC instead of realloc.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Sometimes LAVA jobs will timeout due to transient issues, and the Gitlab
job will fail in that case. Increase the timeouts to reduce the
likeliness of that happening and reduce false positives.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
So repositories don't need to be specially configured with a token to
access LAVA, store this token in a bind volume for a special runner.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
This commit removes the GLSL dependency in TTN by manually recording
the textures used and calling nir_lower_samplers
instead of its GL counterpart.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
This fixes a memory leak in the flush code:
Direct leak of 128 byte(s) in 1 object(s) allocated from:
#0 in __interceptor_realloc .../gcc-8.3.0/libsanitizer/asan/asan_malloc_linux.cc:105
#1 in si_buffer_do_flush_region src/gallium/drivers/radeonsi/si_buffer.c:573
#2 in si_buffer_flush_region src/gallium/drivers/radeonsi/si_buffer.c:608
#3 in si_buffer_flush_region src/gallium/drivers/radeonsi/si_buffer.c:597
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>