Commit c11f47481a ("panvk: stop CPU mapping all index buffers on JM")
stopped mapping the buffer objects on v9-, but it forgot to remove
panvk_buffer::host_ptr and panvk_cmd_graphics_state::host_addr.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Olivia Lee <olivia.lee@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36934>
Images aren't always coherent with L2, and AMD generations have
different rules; see radv_image_is_l2_coherent() for the full picture.
This fixes a rendering issue on GFX9 because depth/stencil images
aren't coherent, but this also affects color images.
This also fixes a cache coherency issue with an ongoing extension.
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12274
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36815>
Based on the previous commit, we can also remove the dynamic allocation
for command memory from the submit path and use the new pool instead.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36609>
utrace requires some memory to allocate buffers for timestamps and
indirect data on each submit. It is expensive to allocate it from the
kernel each time. Instead, allocate a big BO upfront and hand it out
in small pieces later using util_vma_heap.
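
As a rough illustration of the idea (not the code in this commit), the
sketch below hands out fixed-size chunks of one pre-allocated buffer
and tracks them in a bitmask. The names cmd_pool_sketch, cmd_pool_alloc
and cmd_pool_free and the chunk size are hypothetical; the real change
uses util_vma_heap, which manages variable-size ranges.

```c
#include <assert.h>
#include <stdint.h>

#define POOL_CHUNK_SIZE  4096u /* assumed granularity, not from the commit */
#define POOL_CHUNK_COUNT 64u   /* the pool covers 64 * 4096 bytes */

/* Hypothetical pool: tracks which fixed-size chunks of one big,
 * pre-allocated buffer are in use. Offsets are relative to the buffer. */
struct cmd_pool_sketch {
   uint64_t used; /* bit i set => chunk i is handed out */
};

/* Returns the byte offset of a free chunk, or UINT64_MAX if the pool
 * is exhausted. No kernel allocation happens here. */
static uint64_t
cmd_pool_alloc(struct cmd_pool_sketch *pool)
{
   for (unsigned i = 0; i < POOL_CHUNK_COUNT; i++) {
      if (!(pool->used & (UINT64_C(1) << i))) {
         pool->used |= UINT64_C(1) << i;
         return (uint64_t)i * POOL_CHUNK_SIZE;
      }
   }
   return UINT64_MAX;
}

/* Gives a chunk back to the pool so a later submit can reuse it. */
static void
cmd_pool_free(struct cmd_pool_sketch *pool, uint64_t offset)
{
   assert(offset % POOL_CHUNK_SIZE == 0);
   const uint64_t i = offset / POOL_CHUNK_SIZE;
   assert(i < POOL_CHUNK_COUNT && (pool->used & (UINT64_C(1) << i)));
   pool->used &= ~(UINT64_C(1) << i);
}
```

The point of the pattern is that the expensive allocation is paid once
and each submit only flips a bit in the pool.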
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36609>
This should fix a deadlock I saw on
dEQP-EGL.functional.sharing.gles2.multithread.random.programs.link.7 on
radeonsi, where we were waiting on an invalid fence[1] value. This was
probably because between when we started setting up the fences for
u_queue_finish and when we waited on those fences, a second thread was
created.
Closes: #13738
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36919>
We pass the tests for exchange, load, and store on R32_SFLOAT, including
shared memory (which the proprietary driver does not advertise). The blob
does not support add operations either.
Passes:
dEQP-VK.glsl.atomic_operations.exchange_float*
dEQP-VK.image.atomic_operations.exchange*r32f*
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36907>
In some situations the implementation of `BITSET_EXTRACT` would read
beyond the size of the bitset due to an unconditional + 1 in the address
calculation.
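
For illustration only, a minimal sketch of a bounds-safe extraction is
shown below. It is not the actual BITSET_EXTRACT macro;
bitset_extract_sketch and the 32-bit word size are assumptions made
for this example. The key point is that the following word is read
only when the requested range really crosses a word boundary.

```c
#include <assert.h>
#include <stdint.h>

#define WORD_BITS 32u

/* Hypothetical helper, not Mesa's BITSET_EXTRACT: returns `count` bits
 * (1..32) starting at bit `start` from an array of 32-bit words. */
static uint32_t
bitset_extract_sketch(const uint32_t *words, unsigned start, unsigned count)
{
   assert(count >= 1 && count <= WORD_BITS);

   const unsigned word = start / WORD_BITS;
   const unsigned shift = start % WORD_BITS;
   const uint32_t mask =
      count == WORD_BITS ? UINT32_MAX : (1u << count) - 1u;

   uint32_t value = words[word] >> shift;

   /* Touch words[word + 1] only when the range spills into it. An
    * unconditional read here is what can run past the end of the set. */
   if (shift + count > WORD_BITS)
      value |= words[word + 1] << (WORD_BITS - shift);

   return value & mask;
}
```

With a one-word bitset, extracting bits 16..31 stays inside words[0],
while a range such as start = 28, count = 8 legitimately reads two
words.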
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Fixes: 0cc9443e9b ("util: Add BITSET_EXTRACT")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34605>
f2f640f35 introduces base variants, so we get 2 NIR shaders:
   glsl-to-nir -> base_serialized_nir
               -> serialized_nir
Then the right one is picked depending on who is using the shader:
* draw uses base_serialized_nir
* hw driver uses serialized_nir
ef0c9231a7 made sure that the base variant isn't used when the shader
is loaded from the cache, because it's missing there, so in this case
glsl-to-nir does:
   glsl-to-nir -> from-cache -> serialized_nir
The problem is that if draw tries to use this shader it may fail, for
the same reason the referenced commits were introduced: draw may not
be compatible with some NIR passes used by the driver.
To fix this we need to serialize both NIR shaders, and pick the right
one depending on the user.
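
The selection rule described above could be sketched as follows. The
struct, enum and function names are hypothetical and only mirror the
base_serialized_nir / serialized_nir naming used in this message; the
real state tracker code is organized differently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical consumer kinds for a serialized shader. */
enum nir_user_sketch {
   NIR_USER_DRAW,      /* the draw module */
   NIR_USER_HW_DRIVER, /* the hardware driver */
};

/* Hypothetical container: both blobs are stored, including in the
 * cache entry, so either consumer can be served after a cache hit. */
struct shader_blobs_sketch {
   const void *base_serialized_nir; /* before driver-specific passes */
   const void *serialized_nir;      /* after driver-specific passes */
};

/* Picks the blob matching the consumer. */
static const void *
pick_serialized_nir(const struct shader_blobs_sketch *s,
                    enum nir_user_sketch user)
{
   assert(s != NULL);
   return user == NIR_USER_DRAW ? s->base_serialized_nir
                                : s->serialized_nir;
}
```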
Fixes: ef0c9231a7 ("mesa/st: don't use serialized_nir for cached shaders")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36411>
Multiple contexts can use those, causing deadlocks if, e.g.,
fence_get_fd gets called before fence_server_signal on another thread
on the same pipe_fence_handle.
Cc: mesa-stable
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36819>
The GPFIFO class has changed a bit over the years and some things
don't parse well in those traces, so let's allow decoding with the
Ampere A GPFIFO class if we are decoding Ampere or later.
Signed-off-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36093>
Disk cache is disabled on Android because by default it is
managed by the EGL_ANDROID_blob_cache layer. However, there are cases
or custom Android builds where the disk cache is needed. It can then
be explicitly enabled via the `mesa.shader.cache.disable=false`
property, and the cache path must be set via `mesa.shader.cache.dir`.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36821>
This is an issue found while testing multiple mediafoundation MFTs
concurrently. Thanks to Jesse for the fix.
cc: mesa-stable
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Sil Vilerino <sivileri@microsoft.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36923>
Add it only on the external format code path so that no API level
guard is needed. It automatically works with gralloc impls that
support allocating such a format.
Reviewed-by: Lucas Fryzek <lfryzek@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36866>
ANV currently carries a partial copy of the gralloc mapper's format
resolving code, while the ground truth solely resides inside the
gralloc. The local copy is delicate and unable to maintain compatibility
with different gralloc implementations because AHB formats like
Y8Cb8Cr8_420 and IMPLEMENTATION_DEFINED are flexible formats, and can be
resolved to different underlying drm fourcc formats depending on the
usage and media IPs.
The common impl is more correct as it relies on the info from the
gralloc mapper side, and it only sets the minimal set of explicit
formats to avoid hitting the spec corner case of allocating an AHB
with flexible formats (missing half of the media usage bits might end
up allocating something different that potentially gets resolved to a
different VkFormat as well).
Reviewed-by: Lucas Fryzek <lfryzek@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36866>
The vk_image::ahb_format is for drivers that support more than the
common explicit AHB formats. It is used on the AHB image memory export
allocation path; more specifically, vk_device_memory_create will use
that AHB format to allocate the AHB from gralloc. Note that the export
allocation path only deals with explicit formats, not external
formats. So even for the obsolete HAL_PIXEL_FORMAT_NV12_Y_TILED_INTEL
private format we don't need it, as multi-planar formats are supposed
to be reported as external formats.
Reviewed-by: Lucas Fryzek <lfryzek@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36866>
The current impl misses the probe against the gralloc mapper, which is
the required handshake before advertising support. For simplicity,
just adopt the common AHB helper. It does not rely on driver-specific
format mapping, since the query doesn't allow external formats at all.
Reviewed-by: Lucas Fryzek <lfryzek@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36866>
AHB images are created with the right VkFormat when external format
isn't used. When external format does get used, the proper VkFormat has
already been set in the common runtime. Upon AHB props query, we
resolve the external format to a VkFormat and store it in the
externalFormat field to be used by the app. The app would then chain
the exact external format when creating the AHB image if it wants to go
down the external format code path instead of being explicit. So in the
end, the format we resolve is the format we get. Thus no need to set it
twice.
Reviewed-by: Lucas Fryzek <lfryzek@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36866>
An AHB with IMPLEMENTATION_DEFINED format is commonly backed by NV12 or
XBGR8888. The former is the usual pick for camera <-> GPU interop, while
the latter is mostly only seen in Android CTS. Ideally, we would rely
on the queried fourcc to resolve everything instead of being on the
fallback path, but keeping this a minimal fix makes it easy to port.
Cc: mesa-stable
Reviewed-by: Lucas Fryzek <lfryzek@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36866>
If we don't need any prime blits then there's no reason to submit
anything to the queue. We can just signal the signal semaphores and
fences with the wait semaphores and skip the queue.
This is only possible because we no longer need a vkQueueSubmit() for
implicit synchronization. The old ANV implicit synchronization path is
gone and all other drivers that do implicit sync do it per-bo so we can
assume that they synchronized somewhere else when writing to the BO and
that the present submit does nothing.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36827>
Vulkan WSI allows you to present to multiple swapchains at the same
time. If it weren't for some of the Vulkan sync rules, this wouldn't be
a big deal as we could just loop and do the present per-swapchain.
However, the whole present takes a single set of wait semaphores and we
can only wait on them once. Right now this works because we do at least
one vkQueueSubmit2() per-swapchain on the present queue. Since those
all happen on the same queue, we can just wait on the semaphores in the
first present and all the others will pick up that wait thanks to queue
ordering. However, this requires doing a lot of vkQueueSubmit() calls.
This commit changes the vkQueuePresent() flow so that we do a single
vkQueueSubmit2() on the present queue for all swapchains that consumes
the wait semaphores and signals any per-image semaphores and fences. In
the case where separate blit queues are used, we just signal blit
semaphores in the first vkQueueSubmit2() on the present queue and then
do a submit per-swapchain for each of the blit queue blits and signal
per-image semaphores and fences with that submit.
This significantly reduces the number of vkQueueSubmit2() calls being
made by vkQueuePresent() and also breaks the dependency on implicit
ordering of submits, which will be important in the next commit.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36827>
Instead of doing the throttle as we go through the present loop,
allocate throttle fences and wait on them at the top.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36827>