Original patches wrote by Ella Stanforth.
Alejandro Piñeiro main changes (skipping the small fixes/typos):
* Reduced the list of supported formats to
VK_FORMAT_G8_B8_R8_3PLANE_420_UNORM and
VK_FORMAT_G8_B8R8_2PLANE_420_UNORM, that are the two only
mandatory by the spec.
* Fix format features exposed with YCbCr:
* Disallow some features not supported with YCbCr (like blitting)
* Disallow storage image support. Not clear if really useful. Even
if there are CTS tests, there is an ongoing discussion about the
possibility to remove them.
* Expose VK_FORMAT_FEATURE_COSITED_CHROMA_SAMPLES_BIT, that is
mandatory for the formats supported.
* Not expose VK_FORMAT_FEATURE_2_MIDPOINT_CHROMA_SAMPLES_BIT. Some
CTS tests are failing right now, and it is not mandatory. Likely
to be revisit later.
* We are keeping VK_FORMAT_FEATURE_2_DISJOINT_BIT and
VK_FORMAT_FEATURE_2_MIDPOINT_CHROMA_SAMPLES_BIT. Even if they
are optional, it is working with the two formats that we are
exposing. Likely that will need to be refined if we start to
expose more formats.
* create_image_view: don't use hardcoded 0x70, but instead doing an
explicit bit or of VK_IMAGE_ASPECT_PLANE_0/1/2_BIT
* image_format_plane_features: keep how supported aspects and
separate stencil check is done. Even if the change introduced was
correct (not sure about that though), that change is unrelated to
this work
* write_image_descriptor: add additional checks for descriptor type,
to compute properly the offset.
* Cosmetic changes (don't use // for comments, capital letters, etc)
* Main changes coming from the review:
* Not use image aliases. All the info is already on the image
planes, and some points of the code were confusing as it was
using always a hardcoded plane 0.
* Squashed the two original main patches. YCbCr conversion was
leaking on the multi-planar support, as some support needed
info coming from the ycbcr structs.
* Not expose the extension on Android, and explicitly assert that
we expect plane_count to be 1 always.
* For a full list of review changes see MR#19950
Signed-off-by: Ella Stanforth <estanforth@igalia.com>
Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19950>
Our implementation was mostly CPU-based, with things such as query
resets and result copying handled in the CPU, as well as some aspects
of query availability tracking.
This new implementation handles all GPU-side query functions by
dispatching compute shaders to push the work to the GPU. This
involves query availability, reset and result copying.
For now, only occlusion queries are managed this way. Performance
queries can also be implemented in a similar fashion in the future
with some additional work, however, for timestamp queries our only
option to improve this would be to execute the actual timestamp in the
kernel, since we can't take a timestamp from a shader.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19770>
If the render area is not aligned to tile boundaries it means we have partially
covered tiles in the framebuffer. In this case, we always need to load the tile
buffer from memory in order to preserve the contents outside the render area
on the tile buffer store. However, if in this scenario we know we won't be
storing the tile buffer we can skip the load safely.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18570>
For this we add a scoring system that evaluates various aspects of
the draw calls in a job.
If the cost of the geometry side of the pipeline is too high, then
we may pay too high a price in double-buffer mode because with smaller
tile size may will probably have more vertex shader invocations in the
render and binning stages.
On the other hand, if rendering cost is not high enough, we may not
have enough rendering work to hide the latency of tile stores in
double-buffer mode.
Also, because we need to make a decision after we know all the draw
calls in a job, but the double-buffer enable bit comes in the
TILE_BINNING_MODE_CFG that needs to be emitted first in the binning
command list before the draw calls are recorded, if we decide to
enable it we need to rewrite that packet and we need to size the
tile state properly to account for the extra tiles. For this
purpose we delay tile state setup for render pass jobs until we are
finishing a job.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17854>
When we switched to using structs to track barrier state we made a mistake
and started to overwrite barrier state in primary command buffers with
the pending state from secondary command buffers executed inside them, when we
should've been merging the state instead.
Fixes flakyness with some CTS barrier tests.
Fixes: f7ce42636c ('v3dv: use an explicit struct type to track barrier state')
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17020>
Until now we would only disable EZ globally if we had a depth or stencil
load operation or if we had no draw calls at all, but even if we have draw
calls if all of them disable EZ we should also us the global disable.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16794>
When we have a draw call that is incompatible with EZ we should only
disable EZ for the remaining of the job in the case that both of the
following conditions are met:
1. The cause for the incompatibility is an incompatible depth test
direction.
2. The pipeline does Z writes.
Otherwise it is enough to disable EZ temporarily only for draw calls
with the incompatible pipeline.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16794>
Until know when we consumed a barrier we would implement it by
setting the serialize flag on a job, which would cause it to
be serialized across all hardware queues (CL, CSD, TFU). However,
now that we track the source(s) of the barrier, we can restrict this
to only the relevant queue(s) instead (multisync path only).
It should be noted that we can implement transfers via TFU or CL
jobs, so if the source of a barrier is a transfer, we currently
synchronize against both the TFU and the CL queues, however, we
may be able to more effectively track this in the future to
restrict this to just one of the queues.
Also, for secondary command buffers we are taking the easy way
out and always synchronize against all queues, but we should
be able to do the same for secondaries without too much effort.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16743>
Until now, we have always consumed barriers with the next GPU job
recorded into the command buffer after the barrier even if the job
was not the target of the barrier itself. This works based on the
idea that when we consume a barrier in a job we serialize it against
all queues, so effectively we are ensuring that whatever came
before it has completed, so if the barrier was intended for an
even later job, it would have served its purpose anyway.
It should be noted that CL jobs are special because they are actually
split in two different queues: the binning queue and the render
queue, with a dependency between them to ensure render runs after
binning. With our current implementation, if we have 3 jobs (A, B,
C) and we have a barrier after job A which is intended to block job C
on A's completion, with our implementation we would instead block
B on A's completion. If C is a CL job, and the barrier was targetting
the binning stage then we can have the following scenarios:
1. If B) is a CL job, it will consume the barrier at its binning
stage, so we know that B's binning will not start until A has
completed. Then C's binning will not start until B's binning
has completed, and thus, will not start until A has completed,
as intended.
2. If B) is not a CL job, it will consume the barrier and will not
start until A has completed, however, C's binning job will be
submitted to the binning queue without any sync requirements
and since B did not put any jobs in the binning queue it will
start as soon as A's binning has completed, but not A's render,
which would be incorrect.
Further, since a981ac0539 we now skip consumming BCL barriers if
a job does not have draw calls that can be affected by them. In the
same scenarios as before, now case 1) would also be problematic,
since B may skip the binning sync in that case and start immediately,
and since C's binning would be allowe to start immediately after B's
binning, there is no guarantee that this doesn't happen in parallel
with A's render.
With this patch we fix this situation by tracking the intended
consumer of each barrier: graphics, compute or transfer, and we make
sure to consume them only with jobs that match those semantics.
This fixes flakyness in dEQP-VK.device_group.*
Fixes: a981ac0539 ('v3dv: skip binning sync if binning shaders don't access external resources')
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16743>
Until now we have been enabling binning sync if we found a barrier
involving geometry stages (a bcl barrier), however, if the actual
binning shaders involved with the job don't access any external
buffers or images there is no reason to sync at the binning stage.
In this patch we don't immediately consume the bcl barrier flag from
the command buffer state when we create a new job. Instead, we check
this state when we are about to emit a draw call by checking if the
shaders involved with binning may access external resources, such as
vertex buffers, UBOs, or textures. If none of the draw calls in the
job use binning shaders that access external resources then we never
enable binning sync for the job.
It is possible that a binning shader uses resources that are not
synchronized through a barrier though, so we keep track of the
access masks used with barriers for both buffers and images separately
to better identify if the binning shader is affected by the barrier.
If a serialized job never consumes the bcl barrier flag because none
of its draw calls ever required a bcl sync, then the flag will be
cleared when the job is finished.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16322>
When I originally added vk_image_view, I was overly clever when it came
to the format field. I decided to make it only contain the bits of the
format contained in the selected aspects. However, this is confusing
(not generally a good thing) and it's also not always what you want.
The Vulkan 1.3.204 spec says:
"When using an image view of a depth/stencil image to populate a
descriptor set (e.g. for sampling in the shader, or for use as an
input attachment), the aspectMask must only include one bit, which
selects whether the image view is used for depth reads (i.e. using a
floating-point sampler or input attachment in the shader) or stencil
reads (i.e. using an unsigned integer sampler or input attachment in
the shader). When an image view of a depth/stencil image is used as
a depth/stencil framebuffer attachment, the aspectMask is ignored
and both depth and stencil image subresources are used."
So, while the restricted format makes sense for texturing, it doesn't
for when the image is being used as an attachment. What we probably
actually want is both versions of the format. We'll call the one given
by the VkImageViewCreateInfo vk_image_view::format and the restricted
one vk_image_view::view_format.
This is just the first commit which switches format to view_format so
the compiler will make sure we get them all. The next commit will
re-add vk_image_view::format but this time unmodified.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15007>
Looks like 3 implementations already have that field in their private
command_buffer struct, and having it at the vk_command_buffer opens the
door for generic (but suboptimal) secondary command buffer support.
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14917>
Just as with all other TLB operations, we can only use the TLB if the render
area is aligned to tile boundaries. If it is not, then the operation would
overwrite pixels outside the render area, which is not allowed.
In this case, we can't even emit a previous TLB load to fix this because the
TLB has the multisampled attachment, not the resolve attachment, which is
just a destination buffer for the tile store.
Because the condition for tile alignment has to be determined for each
subpass, we handle this by storing this information in the attachment
state of the command buffer with the start of each subpass. We store
whether the attachment is to be resolved and whether it can use the
TLB (considering tile alignment restrictions).
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14752>
When we create a image view with D24S8 format we made a reformatting
to RGBA8UI if the aspect selected is just STENCIL. But when
configuring the stores we select the aspects based on the attachment
format. Quoting from cmd_buffer_render_pass_emit_stores:
/* From the Vulkan spec, VkImageSubresourceRange:
*
* "When an image view of a depth/stencil image is used as a
* depth/stencil framebuffer attachment, the aspectMask is ignored
* and both depth and stencil image subresources are used."
*
* So we ignore the aspects from the subresource range of the image
* view for the depth/stencil attachment, but we still need to restrict
* the to aspects compatible with the render pass and the image.
*/
const VkImageAspectFlags aspects =
vk_format_aspects(ds_attachment->desc.format);
So we could ending trying to store on a Z+Stencil buffer, using a
RGBA8UI format.
So far this only affected some tests when using the simulator
(assert). Those were working on the real hw, but probably would fail
on other situations, so lets use the original image format on that
case.
v2 (Iago)
* Improve comment grammar
* Do the same on load too (not just store)
v3 (Iago)
* Re-word comments.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14635>
This creates an internal shader_prim enum, I've fixed up most
users to use it instead of GL types.
don't store the enum in shader_info as it changes size, and confuses
other things.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14605>
Double buffer mode splits the tile buffer size in half so we can
start processing the next tile while the current one is being
stored to memory. This mode is available only if MSAA is not enabled
and can, in theory, improve performance by reducing tile store
overhead, however, it comes at the cost of reducing the tile size,
which also causes some overhead of its own.
Testing shows that this helps some cases (i.e the Vulkan Quake
ports) but hurts others (i.e. Unreal Engine 4), so for the time
being we don't enable this by default but we allow to enable it
selectively by using V3D_DEBUG.
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14551>
The early-Z test uses Z values produced from FEP, so when
we write Z from a shader we need to disable EZ. However, there
are some instances where want to write the FEP-Z from the shader,
in which case we would not need to disable EZ.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14037>
v3dv, radv, and turnip are using several C&P format helpers (most of
them wrappers over util_format_description based helpers). methods.
This commit moves the common helpers to the already existing common
vk_format.h. For the case of v3dv we were able to remove the vk_format
header. For turnip and radv, a local vk_format.h header remains, with
methods that are only used for those drivers.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13858>
When multiview is enabled, we no longer care about when a particular
attachment is first or last used in a render pass, since not all views
in the attachment will meet that criteria. Instead, we need to track
each individual view (layer) in each attachment and emit our stores,
loads and clears accordingly.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12034>
The Vulkan spec states that when multiview is enabled the number of
layers in the framebuffer is set to one and that each attachment
must then have at least as many layers as referenced by view masks
in the subpasses in which is used.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12034>
We implement multiview by replicating draw commands for all enabled
views and setting a command buffer state for the currently active
view we are broadcasting to.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12034>
OpenGL ES 3.1 specifies that a geometry shader can write to gl_PrimitiveID,
which can then be read by a fragment shader.
OpenGL ES 3.2 additionally adds the capacity for the fragment shader
to read gl_PrimitiveID even if there is no geometry shader. This
commit adds support for this feature, which is also implicitly
expected by the geometry shader feature in Vulkan 1.0.
Fixes:
dEQP-VK.pipeline.framebuffer_attachment.no_attachments
dEQP-VK.pipeline.framebuffer_attachment.no_attachments_ms
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11874>
This was required to handle the case of secondary command buffers that
did not have framebuffer information available from the primary, since
we used to have an implementation that required this information to
be available for the fallback path of vkCmdClearAttachments. Now that
we can handle our our attachment clears in the current subpass by
emitting draw calls, we no longer need the framebuffer information to
be available and we can remove this.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11783>
This includes code from:
* v3dv_cmd_buffer
* v3dv_meta_copy
* v3dv_meta_clear
v2: move some of the functions to source files that makes more sense
now (Iago).
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11310>