v3d_submit_cpu_ioctl() takes a separate ww_acquire_ctx for the cpu_job's
bo_handles[] and any embedded CSD's bo_handles[]; a BO appearing in both
lists makes the second lock wait on a reservation held by the first
context, deadlocking the ioctl.
We avoid adding a duplicate BO handle when it's already in the cpu_job's
list. This collided when an app suballocates an indirect VkBuffer and a
CSD bind-group VkBuffer out of one VkDeviceMemory.
Fixes: e404ccba5b ("v3dv: use the indirect CSD user extension")
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41616>
V3DV hardcoded maxFragmentOutputAttachments to 4, from
V3D 4.x when V3D_MAX_RENDER_TARGETS was 4. On V3D 7.x (RPi5)
V3D_MAX_RENDER_TARGETS is 8.
WebGPU's mandatory maxColorAttachments minimum is 8, and wgpu computes
max_color_attachments as min(maxColorAttachments,
maxFragmentOutputAttachments). With the previous value V3DV capped
WebGPU clients to 4 color attachments on RPi5.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41600>
This extension is part of Vulkan 1.2 core and the feature is already
exposed; we just weren't advertising the extension separately.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41624>
v3dv_CmdFillBuffer was passing only the user-supplied dstOffset to
meta_fill_buffer, ignoring the destination VkBuffer's mem_offset.
When several VkBuffers share one VkDeviceMemory at different offsets
(sub-allocation) the fill landed on whichever VkBuffer was
bound at offset 0 of the memory object instead of the requested one.
Fixes: 5ed78d91fe ("v3dv: implement vkCmdFillBuffer")
Assisted-by: Claude Opus 4.7
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41436>
V3D advertises maxComputeWorkGroupInvocations = 256 but ggml-vulkan
in many cases ignores this limit an creates compute pipelines with
over this limit. Although this is a bug in the application we can
take advantage of nir_lower_workgroup_size and make the application
work.
This issue was causing an assertion failure at nir_to_vir.c:
assert(c->local_invocation_index_bits <= 8);
The solution is lowering the oversized workgroups to a 256-invocation
workgroup loop, like radv and radeonsi are doing on GFX7, by running
nir_lower_workgroup_size(256) for this scenario.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41257>
Currently local shared memory is backed by a BO that is read/written
using the TMU.
ggml-vulkan probes the size of maxComputeSharedMemorySize and rejects
V3DV (falling back to CPU) when the value is below what its larger
compute pipelines request, although in the end the shaders ollama
runs don't actually use shared memory.
32 KB is what ggml-vulkan demands; the value can grow further with no
real per-op cost since shared memory currently goes through the TMU
like any other BO.
V3D OpenGL driver also has 32 KB for SharedMemory.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41257>
Expose VK_KHR_shader_integer_dot_product 4x8-bit packed dot
products using native HW instructions v8dot and setnnmode.
QPU instruction count for sdot_4x8_iadd compute shader:
Before (scalar decomposition): 18 ALU cycles
After (setnnmode + v8dot): 3 ALU cycles (6x)
We advertise integerDotProduct4x8BitPacked*Accelerated for V3D 7.1+
Assisted-by: Claude Opus 4.6
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41255>
Remove the allocate_tile_state_now parameter from v3dv_job_start_frame().
So v3dv_job_allocate_tile_state() is explicitly called after
job_emit_binning_flush() as we know the value of job->draw_count instead
of using always 0.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40554>
Replace the inline tile_alloc/TSDA sizing in v3dv_job_allocate_tile_state()
with a call to the new v3d_tile_alloc_sizes() helper. This switches from
64B to 128B initial tile alloc blocks (avoiding overflow for simple draws)
and from a flat 512KB headroom to a draw-proportional formula.
Set tile_allocation_initial_block_size and tile_allocation_block_size
in all TILE_BINNING_MODE_CFG emissions and update the
TILE_LIST_INITIAL_BLOCK_SIZE packets to match.
Benchmarked on RPi5 (V3D 7.1) with GfxBench Vulkan Aztec Ruins at
1920x1040. Average tile_alloc BO size dropped 75% (535 KB to 132 KB)
with 20% fewer OOM events (521 to 417) and no FPS regression.
This avoids exhausting GPU memory when multiple blit or fill jobs
are batched in the same command buffer, with a huge reduction of
the memory footprint avoiding the 512 KB of the tile_alloc per batched
job.
Reviewed-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40554>
Replace printf and nir_print_shaders by proper mesa_logX and
nir_log_shaderX functions, that provides better features (like logging
to a file, setting the logging verbosity, etc) and works better with
Android.
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40434>
The extension is implemented in shared Vulkan/WSI code and
not driver specific. The underlying kms driver needs to
support HDR metadata signalling on the drm connector, which
vc4 kms does for VideoCore 5 and later since April 2021.
Successfully tested on RaspberryPi 4/400 with a HDR-10
capable HDMI monitor.
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40696>
These extensions are implemented in shared Vulkan/WSI code and
not driver specific. A Vulkan driver just needs to support
VK_KHR_timeline_semaphore, which v3dv already supports via
emulated timeline semaphores since April 2022.
Successfully tested on RaspberryPi 4/400.
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40696>
We have 4 image intrinsic variants now. This enum is useful for
nir_rewrite_image_intrinsic() and it will be used by other NIR passes.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40709>
The variable doesn't store a granularity specific to CLE buffers. It
stores the granularity that the OS imposes on buffer allocations (that
is, the OS page size). Therefore, rename the variable to best reflect
its meaning.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40496>
When a resolve attachment is created with VK_IMAGE_CREATE_MUTABLE_FORMAT_BIT,
the render pass may use a view format that differs from the image creation
format (e.g. view=R16G16_SINT on an image created as B8G8R8A8_SRGB).
cmd_buffer_emit_resolve() was calling v3dv_CmdResolveImage2() which only
receives images but not the view format. This means that blit_shader()
will use the wrong format, causing miss-renderings.
So instead of using directly v3dv_CmdResolveImage2(), let's have an
intermediate function that receives both images and view formats to do
the resolve.
This fixes dEQP-VK.image.mutable.* failures.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40234>
Split the monolithic v3dv_private.h (~2600 lines) into self-contained
sub-headers so each .c file only includes what it needs:
v3dv_common.h, v3dv_device.h, v3dv_image.h, v3dv_pass.h,
v3dv_query.h, v3dv_pipeline.h, v3dv_descriptor_set.h,
v3dv_cmd_buffer.h, v3dv_version_dispatch.h
As part of this commit we remove v3dv_private.h.
We keep v3dvx_private.h as it is, because the gain would be really
small (a lot of really small sub-headers).
In addition to keep things more tidy, we made a quick performance
check. We measured how many files are re-compiled and the performance
difference when touching one of the headers, compared with keeping
just one monolithic header.
Header touch (incremental) Split Monolithic Speedup
-------------------------- ----- ---------- -------
v3dv_image.h 2369 (24f) 2436 (33f) 1.03x
v3dv_query.h 2357 (20f) 2436 (33f) 1.03x
v3dv_pass.h 2352 (20f) 2436 (33f) 1.04x
v3dv_cmd_buffer.h 2354 (20f) 2436 (33f) 1.03x
v3dv_descriptor_set.h 2436 (33f) 2436 (33f) 1.00x
v3dv_pipeline.h 2437 (33f) 2436 (33f) 1.00x
v3dv_device.h 2418 (31f) 2436 (33f) 1.01x
v3dv_common.h 2419 (33f) 2436 (33f) 1.01x
v3dv_version_dispatch.h 2371 (26f) 2436 (33f) 1.03x
Header touch (incremental) Split Monolithic Speedup
-------------------------- ---------- ---------- -------
v3dv_image.h 2377 (24f) 2443 (33f) 1.03x
v3dv_query.h 2346 (20f) 2443 (33f) 1.04x
v3dv_pass.h 2360 (20f) 2443 (33f) 1.04x
v3dv_cmd_buffer.h 2351 (20f) 2443 (33f) 1.04x
v3dv_descriptor_set.h 2438 (33f) 2443 (33f) 1.00x
v3dv_pipeline.h 2429 (33f) 2443 (33f) 1.01x
v3dv_device.h 2418 (31f) 2443 (33f) 1.01x
v3dv_common.h 2432 (33f) 2443 (33f) 1.00x
v3dv_version_dispatch.h 2373 (26f) 2443 (33f) 1.03x
The bigger gain is on the files recompiled for some headers (going
from 33 down to 20 in some cases). The performance gain is not so
relevant though.
Acked-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40169>
_mesa_sha1_format has a few remaining uses, so it's moved to build_id.c,
which is its last user.
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40383>
The runtime builds a final pipeline state with pointers to structures
coming from the associated pipelines libraries.
So far it has considered that the viewMask was part of a structure
together with the rest of the renderpass information. This information
can be specified in pre-raster, fragment & color-output state groups
and it was assumed would be consistent for all 3. And the runtime
currently takes the pointer to the structure from the last pipeline
library (color output).
Some coming spec/cts will clarify that the viewMask only needs to be
specified for pre-raster & fragment groups, making the value in the
color-output group untrustworthy.
This change creates a new state structure to hold the viewMask on its
own so it is only gather on pre-raster & fragment groups.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> (radv)
Reviewed-by: Aitor Camacho <aitor@lunarg.com> (kosmickrisp)
Reviewed-by: Connor Abbott <cwabbott0@gmail.com> (turnip)
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> (v3dv)
Reviewed-by: Frank Binns <frank.binns@imgtec.com> (powervr)
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com> (panvk)
Royaled-yes-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com> (lavapipe)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39940>
Replace manual string parsing for V3DV_ENABLE_PIPELINE_CACHE
in instance creation with parse_debug_string and a dedicated
debug_control table.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40202>
Refactor pipeline creation path to use the vk_graphics_pipeline_state
structures provided by runtime instead of raw Vulkan CreateInfo structs.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39834>
The Vulkan spec states:
"If logicOpEnable is VK_TRUE, then a logical operation selected by
logicOp is applied between each color attachment and the
fragment’s corresponding output value, and blending of all
attachments is treated as if it were disabled. Any attachments
using color formats for which logical operations are not supported
simply pass through the color values unmodified."
pack_blend() was only checking blendEnable from the attachment state,
causing hardware blending to be applied even when logic ops were enabled.
This is the v3dv equivalent of the RADV fix in commit c172f6ef01
("radv: fix disabling logic op for srgb/float formats when blending
is enabled").
Fixes: dEQP-VK.pipeline.monolithic.logic_op_na_formats.*_blend
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40025>
On V3D 4.2 (Raspberry Pi 4), there is a hardware bug where the binner
can trigger a GPU reset in some situations where primitives are
discarded, such as due to primitive restarts.
The way to avoid this is to force the binner to do always something, by
emitting the proper CL. In this case we decided to always set point
size, as it is a very simple and fast operation.
This fixes resets caused by
dEQP-VK.pipeline.monolithic.input_assembly.primitive_restart.*.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39826>
Use v3dv_job_apply_barrier_state to consume pending barriers when
executing secondary command buffers. This ensures we only serialize
against relevant stages, addressing FIXME.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39278>
Vulkan spec requires binding flags to be matched with the binding with
the same index, however currently bindings are sorted with flags not
properly sorted, which leads to bindings and flags mismatch.
Resolve this by adding optional flags info to the parameters of
vk_create_sorted_bindings(), and refactoring panvk/pvr (which really
pair bindings and flags instead of only iterating flags) to use sorted
flags.
Signed-off-by: Icenowy Zheng <uwu@icenowy.me>
Reviewed-by: Ryan Mckeever <ryan.mckeever@collabora.com>
Reviewed-by: Simon Perretta <simon.perretta@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38967>