I saw this while inspecting CL dumps from the UE Shooter demo,
where they disable Z writes for occlusion queries. The hardware
is probably doing this internally, but it doesn't hurt
to do this explicitly and make CL traces consistent with intended
behavior.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8571>
If we have dirty descriptor set state we have to update our uniform
data to reference the new resources such as addresses for textures
or UBOs. This is known to have a high CPU cost, so we want to limit
this as much as we can.
It is a common rendering pattern in applications to render many objects
using the same pipeline, but modifying the descriptor sets bound to update
textures, UBOs, etc. In this scenario, we would be incurring in unnecessary
uniform stream updates for stages that don't access descriptor sets at all.
This change makes it so we track which shader stages in a pipeline
use descriptor set state and skips updating uniform streams for them
when dirty descriptor set state is the only reason requiring us to
generate new uniform streams for a draw call.
v2: reuse shader stage information from the pipeline set layouts
to track shader stages that use descriptor state.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8555>
Caused to return early wrongly on CmdPushConstants with some tests
using several calls to that method. As we are here we are also
replacing the (void *) casting at the memcpy below.
Fixes: e1c8041cde ("v3dv: try harder to skip emission of redundant state")
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7718>
Writing uniform streams is performance sensitive so we should try our
best to avoid writing new uniforms if they have not changed. Particularly,
if only the vertex buffers have changed, we should not write new uniforms.
This improves performance in vkQuake2 by about 11.15%.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7683>
Used as reference Hyujun's commit
5d3fdbc52b, that does the same for
turnip.
This commit also replaces in several cases alloc for zalloc, and adds
checks on more Destroy methods if the object to be free is NULL or
not. Most of them were needed to avoid crashes/weird behaviour due
trying to use un-initialized data. Note that now that vk_object_free
iterates over a array, making it more against un-initialized or just
NULL data.
Additionally, using zalloc we can also remove some memset to 0. In
fact we needed to remove them, as if not, they would override the
vk_object_base object to 0 (the alternative would me doing a memset
computing a pointer offset, but that's is not needed as we can just
use zalloc).
v2:
* Call memset(0) on reused descriptor sets when calling
ResetDescriptorPool, not when reallocating them (Iago)
* Add null check when calling DestroyImageView (detected by a full CTS run)
v3: Fixed rebase conflicts after last meta copy/clear changes
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7627>
In preparation to the changes that would allow to not need them.
It is worth to note that it is likely (we have some ideas in mind)
that we would need to bring back pre-generate variants on the
future. The approach is slightly different on v3dv_pipeline vs
v3dv_cmd_buffer:
* v3dv_pipeline: even after the clean-up, we had code for all the
functions they have, even if they were doing less things
(specifically, a second shader variant), so they still make sense
on their own, and serve as template for adding support of multiple
pre-generated shader variants in the future.
* v3dv_cmd_buffer: as we really don't need to fill up the key with
some after-pipeline data, we would end with some functions empty
(specifically cmd_buffer_populate_v3d_key). Even as a placeholder,
that would be odd. Additionally the current code has a lot of
boilerplate code (functions to fill up vs, cs and fs keys are
basically the same), and we already have in mind refactor them. So
it would be better to remove all of them, instead of keeping
around some code we would not be happy with. If in the future we
pregenerate more that one variant, hopefully the new code to chose
between them would be better.
v2: clarify the commit message, and fix typos on the comments (Iago)
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7545>
If a secondary command buffer has occlusion query inheritance then
draw calls recorded in it should update an active occlusion query
counter started in the primary command buffer.
If executing the secondary in a primary required to emit jobs and
not just a branch instruction, then we might need to create a new
job for the primary as well, and in that case we would lose the
occlusion query state, so we need to re-emit it at that point so
any additional draw calls recorded into the secondary that is being
executed continue to update the counter.
Fixes:
dEQP-VK.query_pool.concurrent_queries.secondary_command_buffer
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7373>
V3D doesn't provide any means to acquire timestamps from the GPU
so we have to implement these in the CPU.
v2: enable timestampComputeAndGraphics and set timestampPeriod (Piñeiro)
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7373>
So far for the formats E5B9G9R9_UFLOAT_PACK32 and
B10G11R11_UFLOAT_PACK32 we were using a XYZW swizzle. But from Vulkan
spec those are three-component, without alpha, formats. So we should
use XYZ1 instead, as we were already doing for other three-component
formats.
Curiously the only case where this raised a problem were when using
clamp to border with transparent black. This change allows us to
remove the code that handled only that specific case.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7355>
Our blit shader path allocates a descriptor pool to create
combined image sampler descriptors for blit source images. So
far, we had sized this pool statically and the driver would
fail if we ever need to allocate more descriptors than that.
With this change, we switch to using a dynamic allocation
mechanism instead where we allocate as many pools as we need to
meet descriptor set allocation requirements for the command buffer.
Also, every time a new pool needs to be created, we double its
size (up to a limit), so we can start small and avoid wasting
memory for command buffers that only have a small number of blits,
while trying to keep allocation overhead low for command buffers
that record a lot of blits.
v2: use existing framework for automatic destruction of private
driver objects to free allocated pools.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7311>
If we are blitting to tile boundaries we don't need to emit
tile loads. The exception to this is the case where we are
blitting only a subset of the pixel components in the image
(which we do for single aspect blits of D24S8), since in that
case we need to preserve the components we are not writing.
There is a corner case where some times we create framebuffers
that alias subregions of a larger image. In that case the edge
tiles are not padded and we can't skip the loads.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7247>
This patch addresses various issues, mostly from secondary command buffers
that recorded pipeline barriers that are not consumed in the secondary itself,
so they need to be applied to jobs that come right after the execution of the
secondary in a primary command buffer.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
If a subpass clears one aspect of Depth/Stencil but loads the other
the clear might get lost. Fix this by emitting the clear as a draw
call instead of relying on the TLB clear.
Fixes:
dEQP-VK.renderpass.suballocation.attachment.3.307
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
PTB assumes that instance id to be 0 at start of tile, but hw would
not do that, we need to set it.
This fixes some Vulkan CTS tests that start to fails after some other
tests used an instance id.
So for example, before this commit for the following tests, executed
in that order, we got the following behaviour:
dEQP-VK.pipeline.vertex_input.multiple_attributes.binding_one_to_many.attributes.float.mat2.mat3 => Pass
dEQP-VK.draw.indexed_draw.draw_instanced_indexed_triangle_strip => Pass
dEQP-VK.pipeline.vertex_input.multiple_attributes.binding_one_to_many.attributes.float.mat2.mat3 => Fails
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
So far we were pre-generating two variants, an all 16 bit return_size
and an all 32-bit return_size, as at pipeline creation time we don't
know the texture format that it would be used finally used.
But it is possible to override or at least refine the 32bit case, as
we know in advance that all shadow textures can (and in fact should)
use return_size 16bit.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
If the framebuffer has no attachments then multisample rasterization
is enabled based on the rasterizationSamples multisample state of
the pipelines. It should be noted that since we don't support
the variableMultisampleRate feature, all pipelines in the same
subpass must have matching number of samples.
V3D requires that we specifically setup our frames to enable
multisampling or not, and we do this when we create jobs inside
a subpass. Since we create the first job for a subpass as soon as
the subpas starts, this is problematic: if we don't have any
attachments, we don't won't enable MSAA at this point, but later
on we might bind an MSAA pipeline, since pipelines can be bound
at any point in the lifespan of a command buffer.
Here, we fix this by testing if the first draw call in a job uses
an MSAA pipeline but the job the was setup to not use MSAA, and in
that case we re-start the job with MSAA enabled.
We also take care of a corner case that seems to be tested by CTS
where a framebuffer with no attachments doesn't bind any pipelines
with MSAA enabled (so according to the Vulkan spec, multisample
rasterization must be disabled) but the fragment shader in use
reads gl_SampleID (which enables per-sample shading). This would
lead to enabling per-sample shading with single-sample rasterization,
which doesn't make sense and makes the simulator complain, so we just
disable per-sample shading in that case.
Fixes:
dEQP-VK.pipeline.multisample.mixed_count.*
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
The TLB multisample resolve feature is only limited to specific format types.
For everything else, including sfloat and integer formats, we need to
fallback to a blit resolve. This needs to be handled both for in-pass
resolves as well as for vkCmdResolveImage.
Because these blits would happen after the tile store operations, we need
to make sure we store the multisampled buffers so we can then read them for
the blit resolve.
Fixes the remaining test failures in:
dEQP-VK.renderpass.suballocation.multisample_resolve.*
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
In order to reduce the number of shader builds after pipeline creation
(that ideally shouldn't happen) we pre-generate two shader variants at
pipeline creation time. In addition to the default one, that set the
return size for all texture to 16 bit, we build another variant
setting the return size for all textures to 32-bit. cmd buffer selects
the latter if any of the textures requires 32bit.
So we are using an all 16-bit return size or an all 32-bit return size
variants. This could be slightly improved by pre-generating return
size combinations if the texture number is below a threshold. But that
would require more space, and bigger pipeline creation time, so would
need to be evaluated.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
This also includes being able to serialize them as part of
GetPipelineCacheData and to deserialize it as part of
CreatePipelineCache.
So now we can also upload the assembly of the variant as part of the
PipelineCache creation.
Note that from all this the tricky part was the prog_data
serialization. v3d_compile allocates and fill a new prog_data, with
rzalloc. Among other things because it also allocates internally the
uniform list. So we needed to replicate that when deserializating the
prog_data. Ideally we would like to avoid that, and allocate as much
resources as possible using vk_alloc, but that would mean a somewhat
deep change on the v3d_compiler, that we want to avoid as much
possible for now.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
We should always save state on a push before starting a meta operation,
even if we don't have a pipeline, since dynamic state can be set at any
time directly on the command buffer. Similarly, we should always restore
it if the pop after the meta operation signals that it has written any
state, not only if we have a graphics pipeline to restore.
Fixes a rendering artifact in VkQuake.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Since vkCmdClearAttachments executes inside a render pass, we would
benefit from converting it to a draw within the current subpass job to
improve batching and avoid expensive tile load/store operations.
This can dramatically improve performance for applications using this
command, however, we can only use this if we are clearing the base
layers of framebuffer attachments, since otherwise we would need to
use layered rendering, which we don't support yet.
This improves vkQuake3 performance dramatically (almost 100%
performance improvement at 1080p), which calls this twice per frame.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
This bit is helpful if we already need to store the buffer, but otherwise
we should not emit the store only to get the clear, we can use the global
clear packet for that and save us an expensive tile buffer store operation.
Also, we have not been using the per-buffer clear bit for depth/stencil
stores since "v3dv: fix depth/stencil clears on hardware", so we should
never been considering clearing needs to flag stores.
This improves vkQuake performance somewhere between 5%-15% by allowing
us to skip the store of the depth attachment.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
This gets vkQuake to render correctly, which creates render passes with
stencil load operations even when the depth/stencil attachment format
doesn't have a stencil aspect. While this is a bit weird, it seems to
be allowed by the spec:
"If the format has depth and/or stencil components, loadOp and storeOp
apply only to the depth data, while stencilLoadOp and stencilStoreOp
define how the stencil data is handled."
In our case we were not ignoring it and this was causing that we emitted a
Z buffer load that seemed to clobber the Z clear, preventing all draw calls
from passing the depth test.
While we are at it, also change the depth/stencil store operation (which
was already handling this scenario correctly) to use the format of the
render pass attachment description rather than the underlying image
format.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
So far we have been getting away with finishing the current job in the
presence of a pipeline barrier and relying on the RCL serialization,
but of course this is not always enough.
This patch addresses synchronization across different GPU units
(i.e. draw indirect after compute), as well as cases where we need to
sync before binning.
Fixes CTS failures in:
dEQP-VK.synchronization.op.single_queue.barrier.*
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
The order in which we emit the attributes is relevant, since
GL_SHADER_STATE_ATTRIBUTE_RECORD packets don't include an explicit
attribute index. This means that we need to emit them in driver
location order, since the compiler uses that location to compute
attribute offsets in the VPM.
Fixes ~1300 CTS tests in:
dEQP-VK.pipeline.vertex_input.multiple_attributes.out_of_order.*
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
For some reason, CTS expects E5B9G9R9 and B10G11R11 with
transparent black border clamping produce alpha 1 instead of 0.
Since border color takes precedence over the texture state swizzle,
the only way to fix this is to lower the texture swizzle in the shader
to set alpha to 1.
Fixes:
dEQP-VK.pipeline.sampler.view_type.*b10g11r11*clamp_to_border_transparent_black
dEQP-VK.pipeline.sampler.view_type.*e5b9g9r9*.clamp_to_border_transparent_black
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Currently, we end the current job whenever the user emits a
pipeline barrier, but we then expect to have a valid job when
we emit a draw call.
If by the time we have to emit a draw call we don't have a valid
job, we need to create one by resuming execution of the current
subpass.
Fixes some tests in:
dEQP-VK.renderpass.suballocation.attachment_allocation.input_output.*
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
The first attribute must be active if using builtins.
This fixes a lot of simulator crashes for vertex input CTS tests.
It should be noted that some of these tests still fail after this
fix though, so there may be some other bug.
Fixes crashes in:
dEQP-VK.pipeline.vertex_input.multiple_attributes.*
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
If the meta operation did not change descriptor state then we should keep it,
not reset it.
Fixes:
dEQP-VK.fragment_operations.early_fragment.early_fragment_tests_stencil
dEQP-VK.fragment_operations.early_fragment.no_early_fragment_tests_stencil
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
We are treating them as a special case of texture, so the commit is
mostly about integrating them with the existing
SAMPLER/SAMPLER_IMAGE/COMBINED_IMAGE_SAMPLER infrastructure.
This commit doesn't use in any special way the render pass
information, including the dependencies, so it is possible that we
would need to do something else. But this commit gets several CTS
tests, and two Sascha Willem Vulkan demos, so let's start with this
commit and handle any other use case for following commits.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
When rasterization is disabled there are a number of CreateInfo
structs that should be ignored. We were managing this correctly
for some cases, but not all of them. Specifically, viewport state
must be ignored and we weren't doing that.
Fixes:
dEQP-VK.api.descriptor_set.descriptor_set_layout_lifetime.graphics
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>