We were using the pipeline layout to discard uniform updates for
stages that don't use descriptors, but we can do better by keeping
track of the stages used by the specific dirty descriptor sets and
only update uniforms for stages that are included in those.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10283>
Dedicated BOs waste memory and are also a significant cause of CPU
overhead when applications use hundreds of them per frame due to
all the work the kernel has to do to page in all these BOs for a job.
The UE4 Vehicle demo was hitting this causing it to freeze and stutter
under 1fps.
The hardware allows us to setup groups of 16 queries in consecutive
4-byte addresses, requiring only that each group of 16 queries is
aligned to a 1024 byte boundary. With this change, we allocate all
the queries in a pool in a single BO and we assign them different
offsets based on the above restriction. This eliminates the freezes
and stutters in the Vehicle sample.
One caveat of this solution is that we can only wait or test for
completion of a query by testing if the GPU is still using its BO,
which basically means that we can only wait for all active queries
in a pool to complete and not just the ones being requested by the
API. Since the Vulkan recommendation is to use a different query
pool per frame this should not be a big issue though.
If this ever becomes a problem (for example if an application does't
follow the recommendation and instead allocates a single pool and
splits its queries between frames), we could try to group queries
in a pool into a number of BOs to try and find a balance, but for
now this should work fine in most cases.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10253>
In these cases we know that the BO has not been added to the job
before, so we can skip the usual process for adding the BO where
we check if we had already added it before to avoid duplicates.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10210>
Ensure subpass_idx has a valid value; we use "-1" as invalid one.
Fixes CID#1468096 "Macro compares unsigned to 0 (NO_EFFECT)"
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10203>
This can be called outside a render pass so we should not expect to have
a job available. Also, we should not be emitting state here, instead we
should do in the pre-draw handler with all the other draw call state.
Fixes cases of crashes in RenderDoc when selecting elements in the
Event Browser.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10130>
Fix defect reported by Coverity Scan.
Logically dead code (DEADCODE)
dead_error_line: Execution cannot reach this statement: return;.
Fixes: bdf93f4e3b ("v3dv/cmd_buffer: return early for draw commands if there is nothing to draw")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9890>
Do not assign to a variable that won't be used.
Fixes CID#1468098 "Unused value (UNUSED_VALUE)".
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9910>
We are providing a BO with the default attribute values for the
GL_SHADER_STATE_RECORD, that contains 16 vec4. Such default value for
each vec4 is (0, 0, 0, 1). As the attribute format could be int or
float, the "1" value needs to take into account the attribute format.
But in the practice, the most common case is all floats. So we create
one default attribute values BO assuming that all attributes will be
floats, and we store it at v3dv_device and only create a new one if a
int format type is defined. That allows to reduce the amount of BOs
needed.
Note that we could still try to reduce the amount of BOs used by the
pipelines if we create a bigger BO, and we just play with the
offsets. But as mentioned, that's not the usual, and would add an
extra complexity,so it is not a priority right now.
This makes the following test passing when disabling the pipeline
cache support:
dEQP-VK.api.object_management.max_concurrent.graphics_pipeline
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9845>
So for example, on v3dv_CmdDrawIndexed we can return early if
instanceCount is 0.
This fixes failures when using the simulator with tests with the
following pattern:
dEQP-VK.draw.instanced.draw_indexed_vk_primitive_topology*
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9820>
Until now we were always doing a two-step cache lookup, as we were
using the NIR shaders to fill up the key to lookup for the compiled
shaders. But since we were already generating the sha1 key with the
original SPIR-V shader (or its internal NIR representation) any info
we were collecting from from NIR is already implicit in the original
shader, so we can avoid using the NIR in most cases.
Because the v3d_key that is used to compile a shader is populated with
data coming directly from the NIR shader or produced during NIR
lowerings, we can't use it directly as part of the pipeline cache
entry. We could split them, but that would be confusing, so we add a
new struct, v3dv_pipeline_key used specifically to search for the
compiled shaders on the pipeline cache. v3d_key would be still used to
compile the shaders.
As we are using the same sha1 key for all compiled shaders in a
pipeline, we can also group all of them in the same cache entry, so we
don't need a lookup for each stage. This also allows to cache pipeline
data shared by all the stages (like the descriptor maps).
While we are here, we also create a single BO to store the assembly
for all the pipeline stages.
Finally, we remove the link to the variant on the pipeline stage
struct, to avoid the confusion of having two links to the same
data. This mostly means that we stop to use the pipeline stage
structures after the pipeline is created, so we can freed them.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9403>
As we plan to try to get directly the compiled variant from the cache,
it would be possible to not have available the nir shaders, so we add
this info on prog data.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9403>
Fix defect reported by Coverity Scan.
Side effect in assertion (ASSERT_SIDE_EFFECT)
assignment_where_comparison_intended: Assignment job->ez_state =
VC5_EZ_DISABLED has a side effect. This code will work differently in a
non-debug build.
Fixes: cec2ed7c80 ("v3dv: fix disabling Early Z for the whole frame")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8666>
From vkCmdBindPipeline spec:
"pipelineBindPoint is a VkPipelineBindPoint value specifying to
which bind point the pipeline is bound. Binding one does not disturb
the others."
But internally we were only handling one pipeline per command buffer,
so binding a pipeline of one type would override an alredy bound
pipeline of other type.
Note that for push constants, in the same way that we were keeping one
client array and one bo for the values, for all stages, independently
of the stageFlags specified by vkCmdPushConstants, we are keeping the
same idea here, so such client array and bo is still tied to the
command buffer, and used by the two pipeline bind points. That makes
far easier tracking the push constants. We could revisit in the future
if we want a more fine grained tracking.
Fixes the following crashes:
dEQP-VK.pipeline.push_constant.lifetime.pipeline_change_diff_range_bind_push_vert_and_comp
dEQP-VK.pipeline.push_constant.lifetime.pipeline_change_same_range_bind_push_vert_and_comp
v2 (from Iago review)
* Move removal of v3dv_resource definition to a different commit.
* Use the new v3dv_cmd_pipeline_state on the cmd buffer meta
sub-struct, call it gfx for consistency
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8613>
The documentation states that if we disable Early Z for the whole
frame in the RCL Tile Rendering Mode packet, then we should not
emit any draw calls with it enabled (which we can do by enabling
it in the CFG_BITS packet).
Since we emit our RCL after recording our draw calls in the BCL
and we were not considering there if any condition for global disable
would be met, it was possible that we end up with an incorrect
configuration when we decide for a global disable in the RCL, which
can cause rendering artifacts. This can be easily observed by simply
forcing the RCL bit to disable early Z in applications that are known
to enable it in CFG_BITS (such as the UE Shooter demo for example).
With this change we keep track of this scenario when we record
draw calls in the BCL and if decide that we need to disable EZ for
the entire job, we make sure we never enable it for any draw calls
in the frame.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8589>
This is an optimization that should make Z/S clears faster. To enable
this we can't have any Z/S loads or stores in the job. Also, it seems
that enabling early Z/S clearing is independent of whether early Z/S
testing is enabled.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8589>
There was a misunderstanding regarding the scope of some hardware bugs
that led us to think that:
1. The Clear Tile Buffer Z/S bit was broken
2. The Clear Tile Buffer RTs bit would also clear Z/S.
1) is not really true, what happened was that some other bugs for which
we need workarounds anyway would have that effect. 2) was only true
for V3D 4.1, so it doesn't affect v3dv.
This change makes proper use of the Z/S bit instead of falling back to
clearing all tile buffers every time we have a Z/S clear. This also
allows us to do color clears on the tile store (which is faster) rather
than falling back to the clear all RTs bit every time we have a Z/S clear.
v2: rewrite the original comment about the hardwarebug description to
include recent discussions with Broadcom instead of keeping it as
is and amending it with an update note.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8589>
I saw this while inspecting CL dumps from the UE Shooter demo,
where they disable Z writes for occlusion queries. The hardware
is probably doing this internally, but it doesn't hurt
to do this explicitly and make CL traces consistent with intended
behavior.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8571>
If we have dirty descriptor set state we have to update our uniform
data to reference the new resources such as addresses for textures
or UBOs. This is known to have a high CPU cost, so we want to limit
this as much as we can.
It is a common rendering pattern in applications to render many objects
using the same pipeline, but modifying the descriptor sets bound to update
textures, UBOs, etc. In this scenario, we would be incurring in unnecessary
uniform stream updates for stages that don't access descriptor sets at all.
This change makes it so we track which shader stages in a pipeline
use descriptor set state and skips updating uniform streams for them
when dirty descriptor set state is the only reason requiring us to
generate new uniform streams for a draw call.
v2: reuse shader stage information from the pipeline set layouts
to track shader stages that use descriptor state.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8555>
Caused to return early wrongly on CmdPushConstants with some tests
using several calls to that method. As we are here we are also
replacing the (void *) casting at the memcpy below.
Fixes: e1c8041cde ("v3dv: try harder to skip emission of redundant state")
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7718>
Writing uniform streams is performance sensitive so we should try our
best to avoid writing new uniforms if they have not changed. Particularly,
if only the vertex buffers have changed, we should not write new uniforms.
This improves performance in vkQuake2 by about 11.15%.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7683>
Used as reference Hyujun's commit
5d3fdbc52b, that does the same for
turnip.
This commit also replaces in several cases alloc for zalloc, and adds
checks on more Destroy methods if the object to be free is NULL or
not. Most of them were needed to avoid crashes/weird behaviour due
trying to use un-initialized data. Note that now that vk_object_free
iterates over a array, making it more against un-initialized or just
NULL data.
Additionally, using zalloc we can also remove some memset to 0. In
fact we needed to remove them, as if not, they would override the
vk_object_base object to 0 (the alternative would me doing a memset
computing a pointer offset, but that's is not needed as we can just
use zalloc).
v2:
* Call memset(0) on reused descriptor sets when calling
ResetDescriptorPool, not when reallocating them (Iago)
* Add null check when calling DestroyImageView (detected by a full CTS run)
v3: Fixed rebase conflicts after last meta copy/clear changes
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7627>
In preparation to the changes that would allow to not need them.
It is worth to note that it is likely (we have some ideas in mind)
that we would need to bring back pre-generate variants on the
future. The approach is slightly different on v3dv_pipeline vs
v3dv_cmd_buffer:
* v3dv_pipeline: even after the clean-up, we had code for all the
functions they have, even if they were doing less things
(specifically, a second shader variant), so they still make sense
on their own, and serve as template for adding support of multiple
pre-generated shader variants in the future.
* v3dv_cmd_buffer: as we really don't need to fill up the key with
some after-pipeline data, we would end with some functions empty
(specifically cmd_buffer_populate_v3d_key). Even as a placeholder,
that would be odd. Additionally the current code has a lot of
boilerplate code (functions to fill up vs, cs and fs keys are
basically the same), and we already have in mind refactor them. So
it would be better to remove all of them, instead of keeping
around some code we would not be happy with. If in the future we
pregenerate more that one variant, hopefully the new code to chose
between them would be better.
v2: clarify the commit message, and fix typos on the comments (Iago)
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7545>
If a secondary command buffer has occlusion query inheritance then
draw calls recorded in it should update an active occlusion query
counter started in the primary command buffer.
If executing the secondary in a primary required to emit jobs and
not just a branch instruction, then we might need to create a new
job for the primary as well, and in that case we would lose the
occlusion query state, so we need to re-emit it at that point so
any additional draw calls recorded into the secondary that is being
executed continue to update the counter.
Fixes:
dEQP-VK.query_pool.concurrent_queries.secondary_command_buffer
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7373>
V3D doesn't provide any means to acquire timestamps from the GPU
so we have to implement these in the CPU.
v2: enable timestampComputeAndGraphics and set timestampPeriod (Piñeiro)
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7373>
So far for the formats E5B9G9R9_UFLOAT_PACK32 and
B10G11R11_UFLOAT_PACK32 we were using a XYZW swizzle. But from Vulkan
spec those are three-component, without alpha, formats. So we should
use XYZ1 instead, as we were already doing for other three-component
formats.
Curiously the only case where this raised a problem were when using
clamp to border with transparent black. This change allows us to
remove the code that handled only that specific case.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7355>
Our blit shader path allocates a descriptor pool to create
combined image sampler descriptors for blit source images. So
far, we had sized this pool statically and the driver would
fail if we ever need to allocate more descriptors than that.
With this change, we switch to using a dynamic allocation
mechanism instead where we allocate as many pools as we need to
meet descriptor set allocation requirements for the command buffer.
Also, every time a new pool needs to be created, we double its
size (up to a limit), so we can start small and avoid wasting
memory for command buffers that only have a small number of blits,
while trying to keep allocation overhead low for command buffers
that record a lot of blits.
v2: use existing framework for automatic destruction of private
driver objects to free allocated pools.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7311>
If we are blitting to tile boundaries we don't need to emit
tile loads. The exception to this is the case where we are
blitting only a subset of the pixel components in the image
(which we do for single aspect blits of D24S8), since in that
case we need to preserve the components we are not writing.
There is a corner case where some times we create framebuffers
that alias subregions of a larger image. In that case the edge
tiles are not padded and we can't skip the loads.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7247>
This patch addresses various issues, mostly from secondary command buffers
that recorded pipeline barriers that are not consumed in the secondary itself,
so they need to be applied to jobs that come right after the execution of the
secondary in a primary command buffer.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
If a subpass clears one aspect of Depth/Stencil but loads the other
the clear might get lost. Fix this by emitting the clear as a draw
call instead of relying on the TLB clear.
Fixes:
dEQP-VK.renderpass.suballocation.attachment.3.307
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
PTB assumes that instance id to be 0 at start of tile, but hw would
not do that, we need to set it.
This fixes some Vulkan CTS tests that start to fails after some other
tests used an instance id.
So for example, before this commit for the following tests, executed
in that order, we got the following behaviour:
dEQP-VK.pipeline.vertex_input.multiple_attributes.binding_one_to_many.attributes.float.mat2.mat3 => Pass
dEQP-VK.draw.indexed_draw.draw_instanced_indexed_triangle_strip => Pass
dEQP-VK.pipeline.vertex_input.multiple_attributes.binding_one_to_many.attributes.float.mat2.mat3 => Fails
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
So far we were pre-generating two variants, an all 16 bit return_size
and an all 32-bit return_size, as at pipeline creation time we don't
know the texture format that it would be used finally used.
But it is possible to override or at least refine the 32bit case, as
we know in advance that all shadow textures can (and in fact should)
use return_size 16bit.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
If the framebuffer has no attachments then multisample rasterization
is enabled based on the rasterizationSamples multisample state of
the pipelines. It should be noted that since we don't support
the variableMultisampleRate feature, all pipelines in the same
subpass must have matching number of samples.
V3D requires that we specifically setup our frames to enable
multisampling or not, and we do this when we create jobs inside
a subpass. Since we create the first job for a subpass as soon as
the subpas starts, this is problematic: if we don't have any
attachments, we don't won't enable MSAA at this point, but later
on we might bind an MSAA pipeline, since pipelines can be bound
at any point in the lifespan of a command buffer.
Here, we fix this by testing if the first draw call in a job uses
an MSAA pipeline but the job the was setup to not use MSAA, and in
that case we re-start the job with MSAA enabled.
We also take care of a corner case that seems to be tested by CTS
where a framebuffer with no attachments doesn't bind any pipelines
with MSAA enabled (so according to the Vulkan spec, multisample
rasterization must be disabled) but the fragment shader in use
reads gl_SampleID (which enables per-sample shading). This would
lead to enabling per-sample shading with single-sample rasterization,
which doesn't make sense and makes the simulator complain, so we just
disable per-sample shading in that case.
Fixes:
dEQP-VK.pipeline.multisample.mixed_count.*
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
The TLB multisample resolve feature is only limited to specific format types.
For everything else, including sfloat and integer formats, we need to
fallback to a blit resolve. This needs to be handled both for in-pass
resolves as well as for vkCmdResolveImage.
Because these blits would happen after the tile store operations, we need
to make sure we store the multisampled buffers so we can then read them for
the blit resolve.
Fixes the remaining test failures in:
dEQP-VK.renderpass.suballocation.multisample_resolve.*
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>