We implement this by emitting a draw call, which should not be registered
during occlusion query counting.
Fixes:
dEQP-VK.query_pool.occlusion_query*clear*
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22131>
Currently, we postpone binning syncs until we record draw calls
and can validate if any of them require accessing protected
resources in the binning stage, however, if the draw calls are
recorded in a secondary command buffer and the barriers have
been recorded in the primary command buffer, we won't apply the
binning sync in the secondary when we record the draw calls
and so we must apply it when we execute the secondary in the
primary.
Fixes flakyness in:
dEQP-VK.api.command_buffers.record_many_draws_secondary_2
cc: mesa-stable
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21162>
Original patches wrote by Ella Stanforth.
Alejandro Piñeiro main changes (skipping the small fixes/typos):
* Reduced the list of supported formats to
VK_FORMAT_G8_B8_R8_3PLANE_420_UNORM and
VK_FORMAT_G8_B8R8_2PLANE_420_UNORM, that are the two only
mandatory by the spec.
* Fix format features exposed with YCbCr:
* Disallow some features not supported with YCbCr (like blitting)
* Disallow storage image support. Not clear if really useful. Even
if there are CTS tests, there is an ongoing discussion about the
possibility to remove them.
* Expose VK_FORMAT_FEATURE_COSITED_CHROMA_SAMPLES_BIT, that is
mandatory for the formats supported.
* Not expose VK_FORMAT_FEATURE_2_MIDPOINT_CHROMA_SAMPLES_BIT. Some
CTS tests are failing right now, and it is not mandatory. Likely
to be revisit later.
* We are keeping VK_FORMAT_FEATURE_2_DISJOINT_BIT and
VK_FORMAT_FEATURE_2_MIDPOINT_CHROMA_SAMPLES_BIT. Even if they
are optional, it is working with the two formats that we are
exposing. Likely that will need to be refined if we start to
expose more formats.
* create_image_view: don't use hardcoded 0x70, but instead doing an
explicit bit or of VK_IMAGE_ASPECT_PLANE_0/1/2_BIT
* image_format_plane_features: keep how supported aspects and
separate stencil check is done. Even if the change introduced was
correct (not sure about that though), that change is unrelated to
this work
* write_image_descriptor: add additional checks for descriptor type,
to compute properly the offset.
* Cosmetic changes (don't use // for comments, capital letters, etc)
* Main changes coming from the review:
* Not use image aliases. All the info is already on the image
planes, and some points of the code were confusing as it was
using always a hardcoded plane 0.
* Squashed the two original main patches. YCbCr conversion was
leaking on the multi-planar support, as some support needed
info coming from the ycbcr structs.
* Not expose the extension on Android, and explicitly assert that
we expect plane_count to be 1 always.
* For a full list of review changes see MR#19950
Signed-off-by: Ella Stanforth <estanforth@igalia.com>
Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19950>
So far we have been only restoring dirty dynamic states used by meta
pipelines however, static state from meta pipelines will also clear
dirty flags, preventing follow-up draw calls in the command buffer
to honor these if they are flagged as dynamic states in their
pipelines. Fix this by always resetting all dirty state flags after
a meta operation so we re-emit all the state we need with the next draw
call.
Fixes:
dEQP-VK.dynamic_state.monolithic.image.clear
cc: mesa-stable
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20356>
When falling back to handling subpass resolves via separate
image resolves we were resolving the entire attachment instead
of limiting the resolve to the render area defined for the render
pass.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20331>
If we can't use the TLB to do a subpass resolve we have a fallaback
that emits separate image resolves, but this fallback was only
handling color resolves. This adds depth/stencil as well.
Fixes some of the issues we have with CTS 1.3.4 in:
dEQP-VK.pipeline.monolithic.multisample.misc.*
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20331>
attachment state is only relevant during render passes, however,
there is a corner case: if we can't resolve an attachment in a
subpass using the hardware, we emit a manual image resolve in the
driver which can trigger a meta operation via blit. In this case,
we pretend we are not in a render pass (since vulkan disallows
blits/resolves in a render pass) but we really want to keep the
attachment state after the meta operation.
Fixes some of the issues we have with CTS 1.3.4 in:
dEQP-VK.pipeline.monolithic.multisample.misc.*
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20331>
Delete v3dv_CmdDispatch, v3dv_CmdSetDeviceMask, and
v3dv_GetDeviceGroupPeerMemoryFeatures so that the vk_common_*
versions will be used instead. This will avoid repeated code.
Signed-off-by: Rebecca Mckeever <rebecca.mckeever@collabora.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20218>
Our implementation was mostly CPU-based, with things such as query
resets and result copying handled in the CPU, as well as some aspects
of query availability tracking.
This new implementation handles all GPU-side query functions by
dispatching compute shaders to push the work to the GPU. This
involves query availability, reset and result copying.
For now, only occlusion queries are managed this way. Performance
queries can also be implemented in a similar fashion in the future
with some additional work, however, for timestamp queries our only
option to improve this would be to execute the actual timestamp in the
kernel, since we can't take a timestamp from a shader.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19770>
If we have any pending jobs queued in the command buffer state
to be emitted at the end of a given job, make sure we reset
that state once these have been processed.
cc: mesa-table
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19770>
This replaces our current implementation, which is 100% CPU based,
with an implementation that uses compute shaders for the GPU-side
event functions. The benefit of this solution is that we no longer
need to stall on the CPU when we need to handle GPU-side event
commands.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19313>
If the render area is not aligned to tile boundaries it means we have partially
covered tiles in the framebuffer. In this case, we always need to load the tile
buffer from memory in order to preserve the contents outside the render area
on the tile buffer store. However, if in this scenario we know we won't be
storing the tile buffer we can skip the load safely.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18570>
This is the standard pattern in the kernel for providing vfunc tables
for C objects. We're using it in the pipeline cache code but we're
about to start adding more stuff and so it really helps if we have it
for command buffers as well.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18324>
Most other init functions follow the Vulkan API convention of putting
the parent object first.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18324>
The common code in Mesa will rewrite the old entry points to these
if available.
Since we implement events and timestamps queries in the driver the
API changes don't quite affect us.
vkQueueSubmit2 is already implemented in the common synchronization
framework and the driver works in terms of a submit hook that is
independent of the actual entry point used by the application.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18290>
Added with KHR_synchronization2. The common code in Mesa will
rewrite vkCmdPipelineBarrier to vkCmdPipelineBarrier2.
With synchronization2 barriers now have a per-barrier stage
and access flags (previously these were shared by all the barriers
in a vkCmdPipelineBarrier commands), so we need to rewrite a bit
the logic to account for this.
Also, stage and access flag bits have been expanded and renamed.
Particularly, some new flags have been added that we need to account
for.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18290>
For some reason we were only checking the binning VS stage, but we
should also check the GS, like we do for other access types.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18290>
The main issue was the inconsistent use of `unlikely()`, but the macro
also simplifies the code a little bit.
Signed-off-by: Eric Engestrom <eric@igalia.com>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18086>
We improved this a long time ago and now it emits a clear rect inside
the current subpass job instead of emitting its own job with its own
RCL, so we no longer need to handle this as an exception to the rule.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17854>
For this we add a scoring system that evaluates various aspects of
the draw calls in a job.
If the cost of the geometry side of the pipeline is too high, then
we may pay too high a price in double-buffer mode because with smaller
tile size may will probably have more vertex shader invocations in the
render and binning stages.
On the other hand, if rendering cost is not high enough, we may not
have enough rendering work to hide the latency of tile stores in
double-buffer mode.
Also, because we need to make a decision after we know all the draw
calls in a job, but the double-buffer enable bit comes in the
TILE_BINNING_MODE_CFG that needs to be emitted first in the binning
command list before the draw calls are recorded, if we decide to
enable it we need to rewrite that packet and we need to size the
tile state properly to account for the extra tiles. For this
purpose we delay tile state setup for render pass jobs until we are
finishing a job.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17854>
We want to have control over the double-buffer setting here so we can control
explicitly from the driver when we want to enable this mode.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17854>
These are jobs for which we may want to enable double-buffering,
which affects tile state allocation. Since the idea is that we
want to decide about double buffering late, we also want to
postpone allocation of the tile state until we are about to
emit the RCL for the job.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17854>
If we enable double-buffer we are reducing the tile size, and thus,
we'll need more tiles and a larger tile state allocation, so we'll
need to call to this helper.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17854>
MSAA is not compatible with double-buffer mode. Also, jobs that emit
tile loads or that don't have any stores can't take advantage
of double-buffer mode.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17854>
We only want to cleant BCL barrier flags if we consume a BCL barrier.
For example, if the client records a barrier for an index buffer
it should apply to the next draw call that uses an index buffer
which may not be the current draw call but one coming after it.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17594>
This feature allows shaders to use pointers to buffers which may
not be bound via descriptor sets. Access to these buffers is done
via global intrinsics.
Because the buffers are not accessed through descriptor sets, any
live buffer flagged with VK_BUFFER_USAGE_SHADER_DEVICE_ADDRESS_BIT_KHR
can be accessed by any shader using global intrinsics, so the driver
needs to make sure all these buffers are mapped by the kernel when
it submits the job for execution.
We handle this by tracking if any draw call or compute dispatch in
a job uses a pipeline that has any such shaders. If so, the job is
flagged as using buffer device address and the kernel submission
for that job will add all live BOs bound to buffers flagged with the
buffer device address usage flag.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17275>
If we have 2 pipelines that consume the same push constant data
but where one of them only uses direct access and the other has
indirect access, a draw with the first pipeline would clear the
dirty flag without updating the UBO and by the time we bind and
draw with the second pipeline we won't upload the constants either
because the first draw cleared the dirty flag.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17536>
If the command buffer didn't have any push constants or the meta
operation didn't write any new constants we don't need to restore
the state.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17536>
This only works if the framebuffer config is exactly the same so
testing both subpasses have the same attachments is not enough,
they also need to be exactly in the same order.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17358>
When we switched to using structs to track barrier state we made a mistake
and started to overwrite barrier state in primary command buffers with
the pending state from secondary command buffers executed inside them, when we
should've been merging the state instead.
Fixes flakyness with some CTS barrier tests.
Fixes: f7ce42636c ('v3dv: use an explicit struct type to track barrier state')
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17020>
Until know when we consumed a barrier we would implement it by
setting the serialize flag on a job, which would cause it to
be serialized across all hardware queues (CL, CSD, TFU). However,
now that we track the source(s) of the barrier, we can restrict this
to only the relevant queue(s) instead (multisync path only).
It should be noted that we can implement transfers via TFU or CL
jobs, so if the source of a barrier is a transfer, we currently
synchronize against both the TFU and the CL queues, however, we
may be able to more effectively track this in the future to
restrict this to just one of the queues.
Also, for secondary command buffers we are taking the easy way
out and always synchronize against all queues, but we should
be able to do the same for secondaries without too much effort.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16743>
Until now we have been tracking the dstStageMask of barriers (where they
are consumed) but not where they are produced (the srcStageMask). With
this change we extend our barrier state to keep track of this as well.
This allows the driver to have better knowledge of the intended barrier
semantics so it can limit the amount of synchronization it does only
to the source stages involved with a barrier. We will do this in a
later patch.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16743>