Until now we would only disable EZ globally if we had a depth or stencil
load operation or if we had no draw calls at all, but even if we have draw
calls if all of them disable EZ we should also us the global disable.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16794>
When we have a draw call that is incompatible with EZ we should only
disable EZ for the remaining of the job in the case that both of the
following conditions are met:
1. The cause for the incompatibility is an incompatible depth test
direction.
2. The pipeline does Z writes.
Otherwise it is enough to disable EZ temporarily only for draw calls
with the incompatible pipeline.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16794>
Blending configuration needs to be adapted in case the RT format does
not have an alpha channel. This is handled so far correctly.
But when we have two RT, one with alpha and other without it, we need
to split the blend configuration, so one is adapted and the other not.
Otherwise we would be changing the blend config for the wrong RT.
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16747>
Until know when we consumed a barrier we would implement it by
setting the serialize flag on a job, which would cause it to
be serialized across all hardware queues (CL, CSD, TFU). However,
now that we track the source(s) of the barrier, we can restrict this
to only the relevant queue(s) instead (multisync path only).
It should be noted that we can implement transfers via TFU or CL
jobs, so if the source of a barrier is a transfer, we currently
synchronize against both the TFU and the CL queues, however, we
may be able to more effectively track this in the future to
restrict this to just one of the queues.
Also, for secondary command buffers we are taking the easy way
out and always synchronize against all queues, but we should
be able to do the same for secondaries without too much effort.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16743>
Until now we have been tracking the dstStageMask of barriers (where they
are consumed) but not where they are produced (the srcStageMask). With
this change we extend our barrier state to keep track of this as well.
This allows the driver to have better knowledge of the intended barrier
semantics so it can limit the amount of synchronization it does only
to the source stages involved with a barrier. We will do this in a
later patch.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16743>
Until now, we have always consumed barriers with the next GPU job
recorded into the command buffer after the barrier even if the job
was not the target of the barrier itself. This works based on the
idea that when we consume a barrier in a job we serialize it against
all queues, so effectively we are ensuring that whatever came
before it has completed, so if the barrier was intended for an
even later job, it would have served its purpose anyway.
It should be noted that CL jobs are special because they are actually
split in two different queues: the binning queue and the render
queue, with a dependency between them to ensure render runs after
binning. With our current implementation, if we have 3 jobs (A, B,
C) and we have a barrier after job A which is intended to block job C
on A's completion, with our implementation we would instead block
B on A's completion. If C is a CL job, and the barrier was targetting
the binning stage then we can have the following scenarios:
1. If B) is a CL job, it will consume the barrier at its binning
stage, so we know that B's binning will not start until A has
completed. Then C's binning will not start until B's binning
has completed, and thus, will not start until A has completed,
as intended.
2. If B) is not a CL job, it will consume the barrier and will not
start until A has completed, however, C's binning job will be
submitted to the binning queue without any sync requirements
and since B did not put any jobs in the binning queue it will
start as soon as A's binning has completed, but not A's render,
which would be incorrect.
Further, since a981ac0539 we now skip consumming BCL barriers if
a job does not have draw calls that can be affected by them. In the
same scenarios as before, now case 1) would also be problematic,
since B may skip the binning sync in that case and start immediately,
and since C's binning would be allowe to start immediately after B's
binning, there is no guarantee that this doesn't happen in parallel
with A's render.
With this patch we fix this situation by tracking the intended
consumer of each barrier: graphics, compute or transfer, and we make
sure to consume them only with jobs that match those semantics.
This fixes flakyness in dEQP-VK.device_group.*
Fixes: a981ac0539 ('v3dv: skip binning sync if binning shaders don't access external resources')
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16743>
VK_KHR_format_feature_flags2 is mostly about define a new 64-bit
VkFormatFeatureFlagBits2KHR format feature flag type, as 29 bits of
the 32-bit VkFormatFeatureFlagBits are already in use.
So all the bits from VkFormatFeatureFlagBits are being replicated, and
most of the work here consist on switch to the new flags.
From the new (not replicated from VkFormatFeatureFlagBits) flag bits,
we don't support
VK_FORMAT_FEATURE_2_STORAGE_READ_WITHOUT_FORMAT_BIT_KHR or
VK_FORMAT_FEATURE_2_STORAGE_WRITE_WITHOUT_FORMAT_BIT_KHR, as right now
we require the format on the shader for doing the read and stores.
We use now VK_FORMAT_FEATURE_2_SAMPLED_IMAGE_DEPTH_COMPARISON_BIT_KHR,
but only applying it for depth formats.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16718>
As it could happens that when a bo is reused from the cache, it is
being mapped with a smaller size that needed. So let's just unmap it,
and let be remapped with the needed size.
Even if we could try to be smarter when deciding when to unmap or not,
to avoid uneeded re-mappings later, it is also true that doing the
unmap would help to reduce the memory used.
This fixes an assert when running the following tests in a row (same
deqp-vk execution):
dEQP-VK.pipeline.creation_feedback.graphics_tests.vertex_stage_fragment_stage
dEQP-VK.pipeline.executable_properties.graphics.vertex_stage_geometry_stage_fragment_stage
dEQP-VK.pipeline.executable_properties.graphics.vertex_stage_fragment_stage_internal_representations
That hits the following assertion:
assert(qpu_bo && qpu_bo->map_size >= variant->assembly_offset +
variant->qpu_insts_size);
at v3dv_pipeline.c, pipeline_get_qpu.
v2: use just one call to v3dv_bo_unmap (move call at v3dv_bo_free,
replace call at bo_free for assert) (Iago)
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16678>
If the sample mask is being written it means we want to discard some of the
samples generated so we should not be promoting the fragment shader to
do early tests, since that would not take into account the sample mask
written from the shader.
Fixes:
dEQP-VK.fragment_operations.early_fragment.sample_count_early_fragment_tests_depth_samples_4
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16626>
In f99ac7f2de ("v3dv: Don't use color aspects for depth/stencil
images"), we stopped using color aspects for depth/stencil images in a
bunch of cases. This causes us to trigger an assert in
copy_buffer_to_image_shader where it assumes 16-bit is always color but
now it can also be D16_UNORM. The assert isn't protecting us from
anything we weren't already doing before so we can just loosen it a bit.
Fixes: f99ac7f2de ("v3dv: Don't use color aspects for depth/stencil images")
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16592>
We already had a little workaround for v3dv where, for some if its meta
ops, it had to bind a depth/stenicil image as color. Instead of
special-casing binding depth/stencil as color, let's flip on the
drier_internal flag and get rid of most of the checks in that case.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16376>
This will allow us to disable the GLSL IR loop unroller in a
following patch and rely on the NIR loop unroller instead.
This allows the piglit test spec@!opengl 2.0@max-samplers border
to pass on the v3d rpi4 driver.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16543>
Also document additional piglit failures and crashes with new tests.
Multiple changes, mostly notable:
- few new tests
- traces downloader improvements
Reviewed-by: Emma Anholt <emma@anholt.net>
Signed-off-by: David Heidelberg <david.heidelberg@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16428>
Typically we free them when we upload the QPU code from the variant
to the assembly BO in the pipeline, however, if there is an error
during pipeline compilation that may not happen and we would leak
the QPU code from the variants.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16370>
We can output the final NIR form (which we store in the pipeline
stage) and the final QPU (which we can retrive from the assembly BO).
We should be careful not to fetch the shaders from the cache when
VK_PIPELINE_CREATE_CAPTURE_INTERNAL_REPRESENTATIONS_BIT_KHR is present,
since we don't store NIR shader in the pipeline shader data that is
cached, so a cache hit would leave us without the NIR shader. The spec
already contemplates this scenario:
"Enabling this flag must not affect the final compiled pipeline but
may disable pipeline caching or otherwise affect pipeline creation
time."
We also prevent disposing of the pipeline stages the variants when this
flag is requested to ensure this information is available later when
calling vkGetPipelineExecutableInternalRepresentationsKHR.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16370>
This is actually required by Vulkan 1.2 and to expose the extension,
so let's conform to this requirement, we don't really care since
image layouts are not relevant to our current implementation.
Fixes: 1442d77bc5 ('v3dv: trivially implement VK_KHR_separate_depth_stencil_layouts')
Fixes: dEQP-VK.info.device_mandatory_features
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16398>
We don't currently benefit from seeing barriers and layout
transitions that affect just the depth or stencil aspects,
so we don't expose this feature.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16344>
Since we implement input attachments as textures we should check
support for input attachment usage the same way we check support
for sampled images.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16344>
Until now we have been enabling binning sync if we found a barrier
involving geometry stages (a bcl barrier), however, if the actual
binning shaders involved with the job don't access any external
buffers or images there is no reason to sync at the binning stage.
In this patch we don't immediately consume the bcl barrier flag from
the command buffer state when we create a new job. Instead, we check
this state when we are about to emit a draw call by checking if the
shaders involved with binning may access external resources, such as
vertex buffers, UBOs, or textures. If none of the draw calls in the
job use binning shaders that access external resources then we never
enable binning sync for the job.
It is possible that a binning shader uses resources that are not
synchronized through a barrier though, so we keep track of the
access masks used with barriers for both buffers and images separately
to better identify if the binning shader is affected by the barrier.
If a serialized job never consumes the bcl barrier flag because none
of its draw calls ever required a bcl sync, then the flag will be
cleared when the job is finished.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16322>
If a shader doesn't use any samplers (including default sampler states),
make sure we drop them so other parts of the driver can recognize that
the program doesn't actually use any samplers at all.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16322>
Fixes failures on tests like this when the on-disk-cache is enabled:
dEQP-VK.binding_model.descriptor_copy.compute.uniform_buffer_0
We only found them when running full CTS runs. What happens is that we
got a hit from the on-disk shader cache, for several tests using the
same shaders. But some tests seems to be using a uniform buffer, and
others a inline buffer. Right now inline buffers leads to some changes
on the final nir shader, and generated assembly, compared with uniform
buffers. So we got a wrong shader. Fortunately we only got an assert
instead of weird behaviour.
With this commit we include the pipeline layout on the pipeline sha1,
so those two cases would get different sha1. FWIW, this is what other
drivers are already doing.
Surprisingly that didn't cause a problem before.
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16313>
If we are calling pipeline_cache_upload_shared_data with
from_disk_cache, that means that we had used the disk-cache to found
that entry. And that should only happens if we didn't find the entry
on the cache. So on that case we can skip to search for it.
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16313>
Layout transitions are not relevant to us, we only care about barriers
that involve a sync point between read/write actions on the image across
GPU jobs.
Image transitions from undefined layout can only happen before the image
is ever used by the GPU, which means they are never relevant to our
implementation.
This improves performance in vkQuake.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16235>
This controls the whole lowering of "make tex ops with implicit
derivatives on non-implicit-derivative stages be tex ops with an explicit
lod of 0 instead", but it's really hard to describe that in a git commit
summary.
All existing callers get it added except:
- nir_to_tgsi which didn't want it.
- nouveau, which didn't want it (fixes regressions in shadowcube and
shadow2darray with NIR, since the shading languages don't expose txl of
those sampler types and thus it's not supported in HW)
- optional lowering passes in mesa/st (lower_rect, YUV lowering, etc)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16156>