We were directly inserting the local key variable used to search for
entries as the key, but hash_table expects a pointer that remains valid
while the entry is in the table. Fixed by using the array of keys that
we already had in v3dv_pipeline.
Fixed failures on the rpi4 like:
dEQP-VK.api.copy_and_blit.core.blit_image.all_formats.color.a1r5g5b5_unorm_pack16.a1r5g5b5_unorm_pack16.general_general_linear
but fwiw, this test on the simulator, and several other tests on both
the simulator and rpi4, were passing just by luck.
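
The pointer-lifetime issue described above can be illustrated with a
minimal sketch. It assumes a table that stores key pointers rather than
copies of the key contents. All the toy_* names are hypothetical and
only serve the illustration; this is not Mesa's hash_table API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_KEY_SIZE 8
#define TOY_MAX_ENTRIES 4

/* Hypothetical table that stores the key pointer itself, not a copy of
 * the bytes it points to. */
struct toy_table {
   const void *keys[TOY_MAX_ENTRIES];
   int values[TOY_MAX_ENTRIES];
   size_t count;
};

/* Hypothetical object that owns its keys, so they live as long as it does. */
struct toy_pipeline {
   unsigned char keys[TOY_MAX_ENTRIES][TOY_KEY_SIZE];
};

static void
toy_insert(struct toy_table *t, const void *key, int value)
{
   assert(t->count < TOY_MAX_ENTRIES);
   t->keys[t->count] = key; /* Only the pointer is kept. */
   t->values[t->count] = value;
   t->count++;
}

/* Returns the stored value, or -1 if no entry has matching key contents. */
static int
toy_lookup(const struct toy_table *t, const void *key)
{
   for (size_t i = 0; i < t->count; i++) {
      if (memcmp(t->keys[i], key, TOY_KEY_SIZE) == 0)
         return t->values[i];
   }
   return -1;
}

/* Builds a search key on the stack, then inserts using the copy owned by
 * the pipeline. Inserting local_key instead would leave a dangling
 * pointer in the table once this function returns. */
static void
toy_add_entry(struct toy_table *t, struct toy_pipeline *p, size_t slot,
              unsigned char id, int value)
{
   unsigned char local_key[TOY_KEY_SIZE] = { id };

   assert(slot < TOY_MAX_ENTRIES);
   memcpy(p->keys[slot], local_key, TOY_KEY_SIZE);
   toy_insert(t, p->keys[slot], value);
}
```

A lookup with a fresh stack key still works after toy_add_entry returns,
because the table only ever points at storage owned by the pipeline.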
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
If the framebuffer has no attachments then multisample rasterization
is enabled based on the rasterizationSamples multisample state of
the pipelines. It should be noted that since we don't support
the variableMultisampleRate feature, all pipelines in the same
subpass must have matching number of samples.
V3D requires that we specifically set up our frames to enable
multisampling or not, and we do this when we create jobs inside
a subpass. Since we create the first job for a subpass as soon as
the subpass starts, this is problematic: if we don't have any
attachments, we won't enable MSAA at this point, but later
on we might bind an MSAA pipeline, since pipelines can be bound
at any point in the lifespan of a command buffer.
Here, we fix this by testing if the first draw call in a job uses
an MSAA pipeline but the job was set up to not use MSAA, and in
that case we restart the job with MSAA enabled.
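
The check described above could be sketched roughly as follows. The
toy_job struct and both functions are hypothetical simplifications, not
the driver's actual job tracking.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified view of a job inside a subpass. */
struct toy_job {
   bool msaa;           /* Whether the frame was set up for multisampling. */
   unsigned draw_count; /* Draw calls recorded in the job so far. */
};

/* Returns true if the job must be restarted with MSAA enabled. This is
 * only the case for the first draw call, when the bound pipeline is
 * multisampled but the job was set up as single-sampled. */
static bool
toy_job_needs_msaa_restart(const struct toy_job *job, bool pipeline_msaa)
{
   assert(job);
   return job->draw_count == 0 && pipeline_msaa && !job->msaa;
}

/* Records a draw, restarting the job with MSAA first if needed.
 * Returns true if a restart happened. */
static bool
toy_job_draw(struct toy_job *job, bool pipeline_msaa)
{
   bool restarted = false;

   if (toy_job_needs_msaa_restart(job, pipeline_msaa)) {
      job->msaa = true; /* Stand-in for re-emitting the frame setup. */
      restarted = true;
   }
   job->draw_count++;
   return restarted;
}
```

Restricting the check to the first draw relies on the requirement that
all pipelines in the same subpass have a matching number of samples.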
We also take care of a corner case that seems to be tested by CTS
where a framebuffer with no attachments doesn't bind any pipelines
with MSAA enabled (so according to the Vulkan spec, multisample
rasterization must be disabled) but the fragment shader in use
reads gl_SampleID (which enables per-sample shading). This would
lead to enabling per-sample shading with single-sample rasterization,
which doesn't make sense and makes the simulator complain, so we just
disable per-sample shading in that case.
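
As a minimal sketch of this corner case, assuming the shader's request
for per-sample shading and the rasterization sample count are both known
when the state is emitted (the function name is hypothetical):

```c
#include <assert.h>
#include <stdbool.h>

/* Returns whether per-sample shading should actually be enabled.
 * With single-sample rasterization the shader's request is ignored. */
static bool
toy_enable_sample_rate_shading(bool shader_wants_per_sample,
                               unsigned rasterization_samples)
{
   assert(rasterization_samples >= 1);
   return shader_wants_per_sample && rasterization_samples > 1;
}
```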
Fixes:
dEQP-VK.pipeline.multisample.mixed_count.*
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
According to the spec, if a fragment shader reads gl_SampleID then the
shader must be evaluated per-sample.
Fixes:
dEQP-VK.pipeline.multisample_shader_builtin.write_sample_mask.4_samples
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
For example, regarding gl_SampleID, the GLSL spec states:
"Any static use of this variable in a fragment shader causes the
entire shader to be evaluated per-sample."
So we need to track if the fragment shader does anything that implicitly
enables per-sample shading in the compiler for the driver to
auto-enable sample rate shading if needed.
v2:
- Instead of tracking reads of gl_SampleID, check SYSTEM_BIT_SAMPLE_ID
and SYSTEM_BIT_SAMPLE_POS as well as the sample layout qualifier like
other drivers are doing to activate this behavior (Eric).
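
The v2 check could look roughly like the sketch below. The bit values
and the toy_fs_info struct are placeholders for illustration; the real
bits come from the compiler's system value definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions, for illustration only. */
#define TOY_SYSTEM_BIT_SAMPLE_ID  (UINT64_C(1) << 0)
#define TOY_SYSTEM_BIT_SAMPLE_POS (UINT64_C(1) << 1)

/* Hypothetical summary of what the compiler knows about a fragment shader. */
struct toy_fs_info {
   uint64_t system_values_read; /* Bitmask of system values the shader reads. */
   bool uses_sample_qualifier;  /* Any input uses the sample qualifier. */
};

/* Returns true if the shader implicitly enables per-sample shading. */
static bool
toy_fs_uses_sample_shading(const struct toy_fs_info *info)
{
   const uint64_t sample_bits =
      TOY_SYSTEM_BIT_SAMPLE_ID | TOY_SYSTEM_BIT_SAMPLE_POS;

   assert(info);
   return (info->system_values_read & sample_bits) != 0 ||
          info->uses_sample_qualifier;
}
```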
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> (v1)
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Now that we have added support for texel buffers, in all the cases
where we were checking for an image_view we end up checking for an
image_view or a buffer_view, so we stopped using it. Remove it, as it
has become superfluous.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
This is intended to return the sample location within the pixel.
Fixes:
dEQP-VK.pipeline.multisample_shader_builtin.sample_position.*
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
If the formats are not suitable as texture type, then they can't be
used as texel buffers.
Gets tests like the following one:
dEQP-VK.image.load_store.without_format.buffer.r32g32b32_sfloat_minalign_uniform
to be properly skipped (instead of crashing on the simulator).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
The TLB multisample resolve feature is limited to specific format types.
For everything else, including sfloat and integer formats, we need to
fall back to a blit resolve. This needs to be handled both for in-pass
resolves and for vkCmdResolveImage.
Because these blits would happen after the tile store operations, we need
to make sure we store the multisampled buffers so we can then read them for
the blit resolve.
Fixes the remaining test failures in:
dEQP-VK.renderpass.suballocation.multisample_resolve.*
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Our understanding is that texture accesses should be aligned to the UIF
block size.
Fixes several of the CTS tests under this pattern:
dEQP-VK.binding_model.shader_access.primary_cmd_buf.uniform_texel_buffer.*.offset_nonzero
dEQP-VK.binding_model.shader_access.primary_cmd_buf.storage_texel_buffer.*.offset_nonzero
Note: for those tests, using a lower value (64) was enough to get them
working, but again, we understand that the real alignment is the UIF
block size.
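
For illustration, an offset alignment check against the block size could
be sketched as below. The 256-byte value is an assumption made for this
example, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed value, for illustration only: a 256-byte UIF block. */
#define TOY_UIF_BLOCK_SIZE UINT64_C(256)

/* Returns true if offset is a multiple of a power-of-two alignment. */
static bool
toy_is_aligned(uint64_t offset, uint64_t alignment)
{
   assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
   return (offset & (alignment - 1)) == 0;
}

/* Rounds offset up to the next multiple of a power-of-two alignment. */
static uint64_t
toy_align_up(uint64_t offset, uint64_t alignment)
{
   assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
   return (offset + alignment - 1) & ~(alignment - 1);
}
```

An offset of 64 satisfies a 64-byte alignment but not the assumed
256-byte one, which is the kind of difference the note above refers to.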
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
There are several definitions for hw limits on v3dv_image that we want
to share, but v3dv_private was already growing bigger and messier.
So let's move them to a specific header. Note that there is already a
broadcom/common/v3d_limits.h. We are not putting them there because
right now they are only used by the Vulkan driver, but are candidates
to be moved.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
We didn't need this until now, since this was included with GLES 3.2,
but we need it for Vulkan.
Eric had already done the plumbing for it though, so we just need to
actually emit the mask.
Fixes some tests in:
dEQP-VK.renderpass.suballocation.multisample_resolve.*
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
This should be able to handle partial copies of multisampled images.
This change extends our blit shader interface to also handle multisampled
destinations so that if the blit destination is a multisampled image,
the blit will rely on sample rate shading to copy all samples from
the source image (which must have a matching number of samples).
I have not found any tests in CTS that do partial copies of
multisampled images, so I tested this with a full multisampled image
copy, using this test:
dEQP-VK.api.copy_and_blit.core.resolve_image.whole_copy_before_resolving.4_bit
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
This fallback is required when we have to do partial resolves. It
works the same way as other blit fallbacks for copy operations: it
will bind the source image as a source texture and blit the selected
region to the destination image.
The difference in this case is that the source image is multisampled
and the blit shader needs to fetch and average individual samples for
each texel.
This gets us to pass all the remaining test cases in
dEQP-VK.api.copy_and_blit.core.resolve_image.*
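
The per-texel averaging that the blit shader performs can be sketched on
the CPU as follows. This is only an illustration under simplifying
assumptions (a single float channel, samples of each texel stored
contiguously); the function names are hypothetical and the real work
happens in a generated shader.

```c
#include <assert.h>
#include <stddef.h>

/* Averages the samples of one texel for a single channel. */
static float
toy_resolve_texel(const float *samples, size_t sample_count)
{
   float sum = 0.0f;

   assert(samples && sample_count > 0);
   for (size_t i = 0; i < sample_count; i++)
      sum += samples[i];
   return sum / (float)sample_count;
}

/* Resolves texel_count texels from src (sample_count values per texel)
 * into dst (one value per texel). */
static void
toy_resolve_region(const float *src, float *dst, size_t texel_count,
                   size_t sample_count)
{
   for (size_t t = 0; t < texel_count; t++)
      dst[t] = toy_resolve_texel(&src[t * sample_count], sample_count);
}
```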
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
vkCmdCopyImage can be used to copy multisampled images. We can
easily support that on the TLB path, which copies full images.
For partial copies we will need to amend our blit shader path
to support multisampling resolve.
Fixes:
dEQP-VK.api.copy_and_blit.core.resolve_image.whole_copy_before_resolving.4_bit
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
So far we were just assuming that it would work (so we could try to
access a NULL pointer), and not freeing it properly.
Fixing that was somewhat messy due to pipeline_compile_graphics using a
temporary array and then updating the internal pipeline stages. While
we are here, we just remove the array and simplify the
pipeline_compile_graphics code.
Fixes the following tests:
dEQP-VK.api.object_management.alloc_callback_fail.graphics_pipeline
dEQP-VK.api.object_management.alloc_callback_fail_multiple.graphics_pipeline
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
In order to reduce the number of shader builds after pipeline creation
(which ideally shouldn't happen) we pre-generate two shader variants at
pipeline creation time. In addition to the default one, which sets the
return size for all textures to 16-bit, we build another variant
setting the return size for all textures to 32-bit. The command buffer
selects the latter if any of the textures requires 32-bit.
So we use either an all 16-bit return size variant or an all 32-bit
return size variant. This could be slightly improved by pre-generating
return size combinations if the number of textures is below a threshold.
But that would require more space and a longer pipeline creation time,
so it would need to be evaluated.
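
The selection between the two pre-generated variants could be sketched
like this. The enum and function are hypothetical names for
illustration, and the per-texture return sizes are assumed to be
available as a plain array.

```c
#include <assert.h>
#include <stddef.h>

enum toy_return_size {
   TOY_RETURN_SIZE_16 = 16,
   TOY_RETURN_SIZE_32 = 32,
};

/* Picks the all-32-bit variant if any texture requires a 32-bit return
 * size, and the default all-16-bit variant otherwise. */
static enum toy_return_size
toy_select_variant(const unsigned *texture_return_bits, size_t texture_count)
{
   for (size_t i = 0; i < texture_count; i++) {
      assert(texture_return_bits[i] == 16 || texture_return_bits[i] == 32);
      if (texture_return_bits[i] == 32)
         return TOY_RETURN_SIZE_32;
   }
   return TOY_RETURN_SIZE_16;
}
```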
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
So far, when checking for a variant fulfilling a specific v3d key, we
were checking the caches, and if that failed, we compiled a new
variant and updated the current variant.
But we could first check whether the current variant fulfills that key.
This was not really problematic so far, as checking the caches was
fast, but now that we could be running without any kind of shader cache
(through V3DV_ENABLE_PIPELINE_CACHE), it is far better to check the
current variant first.
Without this, vkQuake3 at 720p drops to 1fps when disabling the cache.
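
The lookup order described above (current variant, then cache, then
compile) can be sketched as follows. All types and names are
hypothetical, the cache is a toy array, and compilation is replaced by
simply recording the key.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_VARIANT_KEY_SIZE 16
#define TOY_CACHE_SIZE 4

struct toy_variant {
   unsigned char key[TOY_VARIANT_KEY_SIZE];
   bool valid;
};

enum toy_variant_source { TOY_FROM_CURRENT, TOY_FROM_CACHE, TOY_COMPILED };

struct toy_stage {
   struct toy_variant current;
   struct toy_variant cache[TOY_CACHE_SIZE];
   size_t cache_count;
   bool cache_enabled;
};

/* Makes stage->current match key and reports where the variant came from. */
static enum toy_variant_source
toy_get_variant(struct toy_stage *stage, const unsigned char *key)
{
   assert(stage && key);

   /* 1. Cheapest check: does the current variant already match? */
   if (stage->current.valid &&
       memcmp(stage->current.key, key, TOY_VARIANT_KEY_SIZE) == 0)
      return TOY_FROM_CURRENT;

   /* 2. Look in the cache, if there is one. */
   if (stage->cache_enabled) {
      for (size_t i = 0; i < stage->cache_count; i++) {
         if (memcmp(stage->cache[i].key, key, TOY_VARIANT_KEY_SIZE) == 0) {
            stage->current = stage->cache[i];
            return TOY_FROM_CACHE;
         }
      }
   }

   /* 3. Stand-in for the expensive compile of a new variant. */
   memcpy(stage->current.key, key, TOY_VARIANT_KEY_SIZE);
   stage->current.valid = true;
   if (stage->cache_enabled && stage->cache_count < TOY_CACHE_SIZE)
      stage->cache[stage->cache_count++] = stage->current;
   return TOY_COMPILED;
}
```

With the cache disabled, repeated requests for the same key are served
by step 1 instead of reaching the compile step every time.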
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
It would be used as a fallback. Three advantages:
* Having a cache for user operations even if the user doesn't
provide it.
* Having a cache for internal operations. v3dv_meta_copy creates
pipelines for some copy paths, so it is useful to have them
cached.
* Testing: so now the pipeline cache is tested by more CTS tests.
Like any other pipeline cache, it can be disabled with the
V3DV_ENABLE_PIPELINE_CACHE envvar. It was suggested that it would make
sense to have a specific envvar for the default pipeline cache, but for
now just one envvar is enough.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
So far for private pipelines we were creating dummy shader modules
where we directly provided the nir shader. But for the pipeline cache
we were using the SPIR-V to generate part of the cache key sha1.
The main use cases for private pipelines are meta_copy/clear. Those nir
shaders depend on parameters like the format etc, so we directly use
the serialized form of the NIR shader to generate the sha1.
The other case is the no-op fragment shader that we need to provide
if no fragment shader is defined by the user. For that case we can
just use the default shader name, as the no-op shader is always the
same.
This is required as we plan to add a default pipeline cache, which
would include our private shaders too.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
This also includes being able to serialize them as part of
GetPipelineCacheData and to deserialize them as part of
CreatePipelineCache.
So now we can also upload the assembly of the variant as part of the
PipelineCache creation.
Note that the tricky part of all this was the prog_data
serialization. v3d_compile allocates and fills a new prog_data with
rzalloc, among other things because it also internally allocates the
uniform list. So we needed to replicate that when deserializing the
prog_data. Ideally we would like to avoid that, and allocate as many
resources as possible using vk_alloc, but that would mean a somewhat
deep change in the v3d_compiler, which we want to avoid as much as
possible for now.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Heavily based on anv nir caching. One of the bigger differences is that
we don't create the nir shader using a ralloc_context local to the
main compile graphics method. On anv, after compiling the shader, they
discard the nir shader. We keep it, as we may need it to build shader
variants later.
Like anv, we introduce an environment variable to disable the cache:
V3DV_ENABLE_PIPELINE_CACHE
It is enabled by default. The main purpose of this envvar is debugging,
in order to provide an easy way to rule out a bug in the cache.
Serializing/deserializing the NIR shaders as part of
GetPipelineCacheData and PipelineCacheCreate is still pending. We also
plan to cache shader variants. We will do that in following patches.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
This means providing a proper cache object, and being able to
load/retrieve cache data with a proper header. It is not really caching
anything yet. That will be tackled in following patches.
Note that this no-op cache got all the specific pipeline_cache and
pipeline.cache tests passing on the rpi4.
The following tests are still crashing when using the simulator:
dEQP-VK.synchronization.internally_synchronized_objects.pipeline_cache_compute
dEQP-VK.synchronization.internally_synchronized_objects.pipeline_cache_graphics
But those are an issue with synchronization tests on the simulator, and
not related to the pipeline cache itself. In general, synchronization
tests should be run on the rpi4.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
We should always save state on a push before starting a meta operation,
even if we don't have a pipeline, since dynamic state can be set at any
time directly on the command buffer. Similarly, we should always restore
it if the pop after the meta operation signals that it has written any
state, not only if we have a graphics pipeline to restore.
Fixes a rendering artifact in VkQuake.
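
A rough sketch of the push/pop behavior described above, with
hypothetical types standing in for the command buffer state. The point
it illustrates is that neither saving nor restoring depends on a
pipeline being bound.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of dynamic state, for illustration. */
struct toy_dynamic_state {
   unsigned stencil_reference;
   float line_width;
};

struct toy_cmd_buffer {
   const void *pipeline; /* May be NULL: nothing bound yet. */
   struct toy_dynamic_state dynamic;
   const void *saved_pipeline;
   struct toy_dynamic_state saved_dynamic;
   bool has_saved_state;
};

/* Always saves state before a meta operation, pipeline or not. */
static void
toy_meta_push(struct toy_cmd_buffer *cmd)
{
   assert(!cmd->has_saved_state);
   cmd->saved_pipeline = cmd->pipeline;
   cmd->saved_dynamic = cmd->dynamic;
   cmd->has_saved_state = true;
}

/* Restores state whenever the meta operation wrote any, even if the
 * saved pipeline is NULL. */
static void
toy_meta_pop(struct toy_cmd_buffer *cmd, bool meta_wrote_state)
{
   assert(cmd->has_saved_state);
   if (meta_wrote_state) {
      cmd->pipeline = cmd->saved_pipeline;
      cmd->dynamic = cmd->saved_dynamic;
   }
   cmd->has_saved_state = false;
}
```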
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Since vkCmdClearAttachments executes inside a render pass, we would
benefit from converting it to a draw within the current subpass job to
improve batching and avoid expensive tile load/store operations.
This can dramatically improve performance for applications using this
command. However, we can only use this if we are clearing the base
layers of framebuffer attachments, since otherwise we would need to
use layered rendering, which we don't support yet.
This dramatically improves performance in vkQuake3 (almost 100%
improvement at 1080p), which calls this command twice per frame.
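
The restriction to base layers could be checked with something like the
sketch below. The struct mirrors the layer fields of VkClearRect so the
example is self-contained, and the exact condition (base layer 0, one
layer) is an assumption about what "base layers" means here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mirrors the layer fields of VkClearRect (baseArrayLayer, layerCount). */
struct toy_clear_rect {
   uint32_t base_array_layer;
   uint32_t layer_count;
};

/* Returns true if every rect only clears the base layer, so the clear
 * can be emitted as a draw in the current subpass job without layered
 * rendering. */
static bool
toy_can_clear_with_draw(const struct toy_clear_rect *rects,
                        uint32_t rect_count)
{
   assert(rects || rect_count == 0);
   for (uint32_t i = 0; i < rect_count; i++) {
      if (rects[i].base_array_layer != 0 || rects[i].layer_count != 1)
         return false;
   }
   return true;
}
```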
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>