Typically we free them when we upload the QPU code from the variant
to the assembly BO in the pipeline, however, if there is an error
during pipeline compilation that may not happen and we would leak
the QPU code from the variants.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16370>
We can output the final NIR form (which we store in the pipeline
stage) and the final QPU (which we can retrive from the assembly BO).
We should be careful not to fetch the shaders from the cache when
VK_PIPELINE_CREATE_CAPTURE_INTERNAL_REPRESENTATIONS_BIT_KHR is present,
since we don't store NIR shader in the pipeline shader data that is
cached, so a cache hit would leave us without the NIR shader. The spec
already contemplates this scenario:
"Enabling this flag must not affect the final compiled pipeline but
may disable pipeline caching or otherwise affect pipeline creation
time."
We also prevent disposing of the pipeline stages the variants when this
flag is requested to ensure this information is available later when
calling vkGetPipelineExecutableInternalRepresentationsKHR.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16370>
This is actually required by Vulkan 1.2 and to expose the extension,
so let's conform to this requirement, we don't really care since
image layouts are not relevant to our current implementation.
Fixes: 1442d77bc5 ('v3dv: trivially implement VK_KHR_separate_depth_stencil_layouts')
Fixes: dEQP-VK.info.device_mandatory_features
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16398>
We don't currently benefit from seeing barriers and layout
transitions that affect just the depth or stencil aspects,
so we don't expose this feature.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16344>
Since we implement input attachments as textures we should check
support for input attachment usage the same way we check support
for sampled images.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16344>
Until now we have been enabling binning sync if we found a barrier
involving geometry stages (a bcl barrier), however, if the actual
binning shaders involved with the job don't access any external
buffers or images there is no reason to sync at the binning stage.
In this patch we don't immediately consume the bcl barrier flag from
the command buffer state when we create a new job. Instead, we check
this state when we are about to emit a draw call by checking if the
shaders involved with binning may access external resources, such as
vertex buffers, UBOs, or textures. If none of the draw calls in the
job use binning shaders that access external resources then we never
enable binning sync for the job.
It is possible that a binning shader uses resources that are not
synchronized through a barrier though, so we keep track of the
access masks used with barriers for both buffers and images separately
to better identify if the binning shader is affected by the barrier.
If a serialized job never consumes the bcl barrier flag because none
of its draw calls ever required a bcl sync, then the flag will be
cleared when the job is finished.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16322>
If a shader doesn't use any samplers (including default sampler states),
make sure we drop them so other parts of the driver can recognize that
the program doesn't actually use any samplers at all.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16322>
Fixes failures on tests like this when the on-disk-cache is enabled:
dEQP-VK.binding_model.descriptor_copy.compute.uniform_buffer_0
We only found them when running full CTS runs. What happens is that we
got a hit from the on-disk shader cache, for several tests using the
same shaders. But some tests seems to be using a uniform buffer, and
others a inline buffer. Right now inline buffers leads to some changes
on the final nir shader, and generated assembly, compared with uniform
buffers. So we got a wrong shader. Fortunately we only got an assert
instead of weird behaviour.
With this commit we include the pipeline layout on the pipeline sha1,
so those two cases would get different sha1. FWIW, this is what other
drivers are already doing.
Surprisingly that didn't cause a problem before.
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16313>
If we are calling pipeline_cache_upload_shared_data with
from_disk_cache, that means that we had used the disk-cache to found
that entry. And that should only happens if we didn't find the entry
on the cache. So on that case we can skip to search for it.
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16313>
Layout transitions are not relevant to us, we only care about barriers
that involve a sync point between read/write actions on the image across
GPU jobs.
Image transitions from undefined layout can only happen before the image
is ever used by the GPU, which means they are never relevant to our
implementation.
This improves performance in vkQuake.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16235>
Dumb buffers do not work with AMD gpus. So use AMD ioctl to create
proper buffers.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16187>
Instead of calling later an ioctl to get the device id, let's store it
while initializing the physical device.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16187>
The key is created on stack, so as soon as the function returns this key
is lost, so the inserted key in the hashtable is invalid.
Rather, insert a duplicated version on heap.
This fixes a stack-buffer-overflow when running some Vulkan CTS tests.
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16083>
util_cpu_detect is an anti-pattern: it relies on callers high up in the call
chain initializing a local implementation detail. As a real example, I added:
...a Mali compiler unit test
...that called bi_imm_f16() to construct an FP16 immediate
...that calls _mesa_float_to_half internally
...that calls util_get_cpu_caps internally, but only on x86_64!
...that relies on util_cpu_detect having been called before.
As a consequence, this unit test:
...crashes on x86_64 with USE_X86_64_ASM set
...passes on every other architecture
...works on my local arm64 workstation and on my test board
...failed CI which runs on x86_64
...needed to have a random util_cpu_detect() call sprinkled in.
This is a bad design decision. It pollutes the tree with magic, it causes
mysterious CI failures especially for non-x86_64 developers, and it is not
justified by a micro-optimization.
Instead, let's call util_cpu_detect directly from util_get_cpu_caps, avoiding
the footgun where it fails to be called. This cleans up Mesa's design,
simplifies the tree, and avoids a class of a (possibly platform-specific)
failures. To mitigate the added overhead, wrap it all in a (fast) atomic
load check and declare the whole thing as ATTRIBUTE_CONST so the
compiler will CSE calls to util_cpu_detect.
Co-authored-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15580>
This is trivial thanks to the emulated timelines provided in common
code. "Real" timeline semaphores which can be shared across processes
will require kernel support.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15704>
Even if we're the first job on some queue, there may be no wait
semaphores but we still need to ensure things happen in-order. (See
the "Implicit Synchronization Guarantees" section of the Vulkan spec.)
The client can submit back-to-back command buffers with no semaphores
between them and it needs to adt the same as if there were a semaphore.
If job->serialize is set because of a barrier or something, we still
need to synchronize across HW queues by waiting on last_job_syncs.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15704>
In order to properly wait for a query to be complete, we need to first
wait for the end query job to flush through on the queue. Since query
end is always handled on the CPU, we can do this with a condition
variable. The 2s timeout is taken from ANV.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15704>
Vulkan requires that, once the device has been lost, you keep returning
VK_ERROR_DEVICE_LOST. We've got tracking for this in common code; it
just needs to be wired up.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15704>
This only works because c11/threads.h is typedeffing the c11 stuff to
ptrheads.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15704>
Instead of having the CPU job execute the CSD job, put both jobs on the
list with the CPU job first which modifies the GPU job which gets kicked
off next. This gives the queue code more visibility into what types of
jobs are actually in the list. In particular, if an indirect compute
job is the last job in a batch buffer, it currently appears as if the
batch ends with CPU work which isn't true because it kicks off GPU work.
In that case, the last job on the list is now a GPU job, which better
matches reality.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15704>
The array is allocated for VkDrmFormatModifierPropertiesEXT, so
writring entried with type VkDrmFormatModifierProperties2EXT is
bogus.
It seems this was a mistake added with a change intended to get
rid of VK_OUTARRAY_MAKE, that changed the type of the write by
mistake.
Fixes: 56a2ccf058 ('v3dv: Stop using VK_OUTARRAY_MAKE()')
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15819>
We don't support 'Update After Bind', however, the limits for this
model also include the ones without it. See the with or without remark
in the spec below:
"maxPerStageDescriptorUpdateAfterBindInlineUniformBlocks is similar to
maxPerStageDescriptorInlineUniformBlocks but counts descriptor bindings
from descriptor sets created with or without the
VK_DESCRIPTOR_SET_LAYOUT_CREATE_UPDATE_AFTER_BIND_POOL_BIT bit set."
Fixes:
dEQP-VK.api.info.vulkan1p2_limits_validation.ext_inline_uniform_block
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15732>
Thix fixes two bugs. First, we stop leaking in/out fences with
multisync. Because the in_syncs and out_syncs parameters to
set_multisync were arrays and not pointers to arrays, the caller's
in_syncs and out_syncs pointers never got set and remained NULL so
multisync_free() always sees to NULL pointers and does nothing, leaking
both arrays. Not sure how this isn't showing up in the dEQP leak check
tests.
Second, the struct drm_v3d_multi_sync was in the scope of the then
clause of the `if (device->pdevice->caps.multisync)` so it goes out of
scope before the ioctl. This is, effectively, a use-after-free and,
depending on stack allocation details, may result in the multisync
extension struct getting stompped before the ioctl.
Fixes: ff8586c345 ("v3dv: enable multiple semaphores on cl submission")
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15512>
The spec states that descriptor set layouts can be destroyed almost
at any time:
"VkDescriptorSetLayout objects may be accessed by commands that
operate on descriptor sets allocated using that layout, and those
descriptor sets must not be updated with vkUpdateDescriptorSets
after the descriptor set layout has been destroyed. Otherwise,
descriptor set layouts can be destroyed any time they are not in
use by an API command."
Based on a similar fix for RADV.
Gitlab: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5893
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15634>
Inline uniform blocks store their contents in pool memory rather
than a separate buffer, and are intended to provide a way in which
some platforms may provide more efficient access to the uniform
data, similar to push constants but with more flexible size
constraints.
We implement these in a similar way as push constants: for constant
access we copy the data in the uniform stream (using the new
QUNIFORM_UNIFORM_UBO_*) enums to identify the inline buffer from
which we need to copy and for indirect access we fallback to
regular UBO access.
Because at NIR level there is no distinction between inline and
regular UBOs and the compiler isn't aware of Vulkan descriptor
sets, we use the UBO index on UBO load intrinsics to identify
inline UBOs, just like we do for push constants. Particularly,
we reserve indices 1..MAX_INLINE_UNIFORM_BUFFERS for this,
however, unlike push constants, inline buffers are accessed
through descriptor sets, and therefore we need to make sure
they are located in the first slots of the UBO descriptor map.
This means we store them in the first MAX_INLINE_UNIFORM_BUFFERS
slots of the map, with regular UBOs always coming after these
slots.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15575>
We're trying to replace VK_OUTARRAY_MAKE() by VK_OUTARRAY_MAKE_TYPED()
so people don't get tempted to use it and make things incompatible with
MSVC (which doesn't support typeof()).
Suggested-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15522>
This was waiting for multisync support in our kernel interface so
we can wait on the actual imported payload of a semaphore rather
than the last job we submitted.
Reviewed-by: Melissa Wen <mwen@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15342>