The mechanism implemented in the hardware to synchronize against Write
after Read hazards with the visibility stream for concurrent binning is
for BV and BR to keep track of the number of render passes they have
finished and BV waits until
BR_count >= BV_count - vis stream count. For example, if
there are two visibility streams and the user submits three
renderpasses, before starting renderpass #3 BV will wait for BR to
finish renderpass #1. It's assumed that renderpass #3 and #2 use
different visibility streams, so it's safe to start working on #3 once
#2 is done.
This mechanism is assumed to work across renderpasses and even submits,
and the only way to reset the BR/BV counts is via
CP_RESET_CONTEXT_STATE which is only done by the kernel when
switching contexts. This vastly complicates things for Vulkan,
where we have no idea what order command buffers will be submitted. This
means that we have to defer emitting the actual pointers until
submission time and create patchpoints instead. This gets unfortunately
very complicated with SIMULTANEOUS_USE_BIT where we have to update the
patchpoints on the GPU.
I've taken the liberty of also deferring the allocation of the
visibility stream until submit time. This will help us later move to
per-queue visibility streams, which will be necessary for supporting
multiple simultaneous queues.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36590>
Start introducing commands to setup BV. We need to run the initial
register setup with both BR and BV enabled, and similarly we need to
setup a bin preamble for BV. A few magic registers are BR-only and
should be skipped when initializing BV. The VPC attribute carveout
registers are a bit special because they must be initialized in BR and
BV bin preambles, so they are pulled into a separate function.
This commit also switches the "default" thread control from BR with
concurrent binning disabled to BR with concurrent binning enabled. GMEM
renderpasses now explicitly disable concurrent binning. This is
necessary because switching from CB enabled to disabled and vice versa
imposes a synchronization, and we want BV to be able to skip over
compute dispatches, sysmem renderpasses, etc. to find the next binning
pass. GMEM renderpasses re-enable concurrent binning at the end to keep
the "default" thread control and avoid having to sprinkle
THREAD_CONTROL(BR) around all the other Vulkan commands that can run
outside of a renderpass (compute dispatch, blits, query pool operations,
etc.).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36590>
With concurrent binning, reading directly from the VSC_STATE registers
for conditional loads/stores doesn't work because BR and BV have their
own copy of VSC registers and the BV is running ahead of BR. Instead,
the blob's solution is to allocate extra space in memory for the
contents of VSC_STATE and upload it there. It has to be allocated
together with draw stream/prim stream because it is also written by BV
and read by BR and therefore it also needs to be duplicated to avoid
hazards. The memory is then read in BR via CP_MEM_TO_SCRATCH_MEM, which
is purpose-made for this.
Make turnip use this strategy on a7xx.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36590>
The BV mempool is even further cut down compared to the "small mem pool"
layout which seems to be used by a610. It also shrinks the block size to
4 chunks instead of 8. This layout happens to be shared by a702, so
abstract out the layout into a "mempool size" enum.
While we're here, fix a bug with how the mempool offset for chunks is
printed. This accounts for the test diff.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36590>
This was already broken for BINDLESS_BASE, but we didn't notice it
because we weren't using the builders. We have to cast fields that we OR
in, and we need to return uint64_t from the legacy field functions.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36590>
The arr::in_bounds field was set unconditionally for every deref created
for a chain. For struct derefs, which don't have this field, this would
write to an unused memory location, which is probably why this never
caused issues.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Fixes: f19cbe98e3 ("nir,spirv: Preserve inbounds access information")
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38110>
memmove was slow on large number of descriptor when destroying some of
them.
util_vma_heap is perfect for the task, ANV and RADV already use it.
It also simplifies the code.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38053>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Signed-off-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37887>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Signed-off-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37887>
Some GPUs, like AMD, has specific stride align requirements in order to
display the content correctly.
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Signed-off-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37887>
As the BOs created in GPU needs to be accessible from the simulator,
create them in GTT memory, with CPU access.
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37887>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Signed-off-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37887>
Some GPUs, like AMD, require specific stride alignment in order to
display the content correctly.
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Signed-off-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37887>
As the BOs created in GPU needs to be accessible from the simulator,
create them in GTT memory, with CPU access.
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37887>
Disable the max-fails feature in deqp-runner for this job, since it
aborts the run due to failures that don't occur otherwise.
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38051>
This job runs on MT8196 Rauru Chromebooks.
Also remove the old G725 expectation files, as G725 is a smaller variant
of G925.
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38051>
VCN requires 64x16 alignment for HEVC. When the app requests non-aligned
resolutions, make up for it with conformance window cropping.
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38061>
doorbell does not require va address mapping from userspace and hence
amdgpu_bo_va_op_common() function is not called and therefore doorbell
bo->vm_timeline_point is not updated. Currently to wait for all mappings
to be ready doorbell vm_timeline_point is used which is incorrect.
This patch updates vm_timeline_point to wait for all bos. The bos
can be real bo or slab bo. slab bo can be from old buffer and hence
there is a check to update vm_timeline_point to wait only if it is
new.
Reported-by: Zhang, ShanYi (Ken) <ShanYi.Zhang@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38059>
We don't actually advertise compute-only or depth-only queues right now
but nothing in the spec says you have to advertise the queues in order
to advertise the bits. Setting them now ensures we don't forget them
when compute-only or transfer-only queues get added.
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38094>
This contains resolve modes which override the format-based defaults as
well as resolve flags to allow disabling sRGB conversion.
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38094>