Commit graph

1781 commits

Author SHA1 Message Date
Juan A. Suarez Romero
df96a100ae v3dv: fix assertion on push constants
Fixes a compiler warning regarding the assertion.

Fixes: 6d6a3ab679 ("v3dv: asserts push constants data is valid")
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42269>
2026-06-19 18:01:40 +00:00
Sid Pranjale
11c9032466 v3dv: use common multisync code
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Refactors the Vulkan submission path to gather incoming and outgoing
sync dependencies into lightweight stack allocations before handing
them off to the common initializer, preserving the exact original
synchronization execution order and OOM semantics.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42177>
2026-06-17 08:58:08 +00:00
Sid Pranjale
5abc4e3e1a v3dv: directly use v3d_has_feature instead of caps struct
v3d_has_feature() function was also moved to the top of the file
so get_device_extensions() and get_features() could use it

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42177>
2026-06-17 08:58:08 +00:00
Sid Pranjale
4fded54fd0 v3dv: replace single-field options struct with bool
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42177>
2026-06-17 08:58:08 +00:00
Jose Maria Casanova Crespo
cff8dbd452 v3dv: rename format_plane unorm/snorm flags to sw_unorm/sw_snorm
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42176>
2026-06-16 10:46:39 +02:00
Jose Maria Casanova Crespo
4edc659231 v3dv: route blending of UNORM16/SNORM16 RTs through software lowering
UNORM16/SNORM16 render targets are backed by 16-bit-integer TLB
formats, which V3D HW cannot blend. The compiler already supports
software blend lowering in NIR, but V3DV only enabled it for dual-src
blending. As a result format_supports_blending refused the BLEND_BIT
for these formats and Dawn could not advertise the WebGPU
Unorm16TextureFormats feature.

Set pipeline->blend.use_software when any color attachment uses a
software-normalised format so the existing NIR blend lowering kicks
in, and expose VK_FORMAT_FEATURE_COLOR_ATTACHMENT_BLEND_BIT for
those formats.

Assisted-by: Claude Opus 4.7
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42176>
2026-06-16 10:46:38 +02:00
Jose Maria Casanova Crespo
cdc6a0bfed v3dv: allow TFU readahead padding above maxMemoryAllocationSize
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Our get_buffer/image_memory_requirements() pad TRANSFER_SRC resources
with V3D_TFU_READAHEAD_SIZE, so allocating the reported requirements of
a resource of exactly maxMemoryAllocationSize failed with
VK_ERROR_OUT_OF_DEVICE_MEMORY.

Accept up to one extra page over the limit: since the allocation size
is page-aligned, that covers any sub-page padding.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42179>
2026-06-15 15:15:10 +00:00
Juan A. Suarez Romero
e8b5f93c31 v3dv: increase max push constants size
There is no hardware restriction that limits the current size, it was
selected manually.

Increase it to 256 as this aligns more with other hardware, and this is
the minimum requirement for Vulkan 1.4.

Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42212>
2026-06-12 12:20:37 +00:00
Jose Maria Casanova Crespo
519f631e6b v3dv: gate Dawn-required limits and features behind V3D_WEBGPU_OVERRIDE
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Exposes higher limits than the ones supported by the HW and several
ArrayDynamicIndexing features not yet implemented so the Dawn WebGPU
implementation can be used while it doesn't exercise these limits or
features.

The override is enabled using the V3D_WEBGPU_OVERRIDE=1 envvar. When
it is enabled it:

- Increases the framebuffer dimension limit from the real HW value
  (4096 on RPi4, 7680 on RPi5) to 8192.
- Bumps the advertised maxMipLevels reported per format from 13 to
  14 to match the bumped 8192-wide images and 15 for non-2D images.
  The TMU HW already supports that for sampling.
- Increases max_varying_components from 64 to 72 (HW limit is 64).
- Exposes features that are not actually implemented; CTS tests that
  exercise them will hit asserts in debug builds:
  - shaderUniformBufferArrayDynamicIndexing
  - shaderSampledImageArrayDynamicIndexing
  - shaderStorageBufferArrayDynamicIndexing
  - shaderStorageImageArrayDynamicIndexing
- Increases maxImageDimension1D   from 4096 to 16384
- Increases maxImageDimension2D   from 4096 to  8192
- Increases maxImageDimension3D   from 4096 to 16384
- Increases maxImageDimensionCube from 4096 to 16384

When V3D_WEBGPU_OVERRIDE is unset (the default), the driver
advertises the real HW limits already set up by the preceding
"use real HW framebuffer limit" change, so Vulkan CTS conformance
is unaffected.

To help diagnose applications that hit the over-advertised paths,
mesa_loge errors are emitted from three places:

- lower_vulkan_resource_index() warns before the existing UNREACHABLE
  for dynamic descriptor indexing, so the cause is visible in release
  builds where the assertion is compiled out.
- create_image() warns when vkCreateImage is called with attachment
  usage and dimensions above the real HW framebuffer limit. Storage
  and sampled-only images above that limit work fine via the TMU.
- job_compute_frame_tiling() erros when a render job width/height
  exceeds the real HW framebuffer limit.

The per-plane slices[] array in struct v3dv_image is sized at
V3D_MAX_MIP_LEVELS + 2 so the override case (which advertises 14/15
mip levels for the bumped 8192-wide 2D images and 16384 for 1D/3D images)
still fits without enlarging the default array.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42117>
2026-06-10 15:22:27 +00:00
Jose Maria Casanova Crespo
0ae28c9056 broadcom: raise framebuffer size to 7680 on V3D 7.1
Add a new max_framebuffer_size to devinfo so V3D 4.2 and V3D 7.1 can
expose different framebuffer dimensions: 4096 on RPi4 and 7680 on RPi5.
This is bounded by the maximum clip size supported by the framebuffer.

Take advantage of this to also raise maxImageDimensions* to
max_framebuffer_size.

A non-power-of-two framebuffer means framebuffer_size_for_pixel_count can
compute a height larger than max_framebuffer_size. Clamp the height to the
maximum and recompute the width from the division so w * h <= num_pixels.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42117>
2026-06-10 15:22:27 +00:00
Jose Maria Casanova Crespo
5242d4c171 broadcom: add and use max_render_targets to devinfo
Use the new devinfo value instead the V3D_MAX_RENDER_TARGETS
macro.

We only maintain the usage of the macro in devinfo initialization
and the V3D in the versioned file src/gallium/drivers/v3d/v3dx_state.c

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42117>
2026-06-10 15:22:27 +00:00
Jose Maria Casanova Crespo
94abf86561 v3dv: set non-zero array stride in null texture descriptor state
pack_null_texture_state(), introduced to support VK_KHR_robustness2
nullDescriptor for image bindings, left the TEXTURE_SHADER_STATE
"Array Stride (64-byte aligned)" field at 0.

On real V3D HW it is fine: a TMU read against a null descriptor
returns zero regardless of the descriptor contents, but V3D simulator
validates the TMU array stride before issuing the read.

Setting array_stride_64_byte_aligned = 1 (64 bytes raw) fixes failing
dEQP-VK.robustness.robustness2.bind.*.null_descriptor.samples_1.3d.*
tests case under the simulator.

Fixes: 990d76eae6 ("v3dv: Implement and enable nullDescriptor support")
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42112>
2026-06-10 14:38:50 +00:00
Samuel Pitoiset
86406ca87d v3dv: use drirc_gen
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41881>
2026-06-10 07:17:14 +00:00
Sid Pranjale
f6dd632b31 v3dv: drop legacy CPU queue fallback paths
We now require kernel side CPU queue support (introduced via
DRM_V3D_PARAM_SUPPORTS_CPU_QUEUE). If the underlying kernel lacks
this support i.e. is older than kernel 6.8, physical device
initialization will now fail.

With this requirement guaranteed, we can remove the userspace
fallback paths that manually managed and stalled on indirect
CSD dispatches and query resets.

Reviewed-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42087>
2026-06-09 16:53:48 +00:00
Maíra Canal
37ef45a8c0 v3dv: Drop legacy comments about single-sync support
After commit 16c96b0e93 ("v3dv: drop single sync kernel interface"), we
no longer use V3DV_QUEUE_ANY. Therefore, drop it and also remove the
legacy comments about single-sync support.

Signed-off-by: Maíra Canal <mcanal@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42119>
2026-06-09 16:34:47 +00:00
Emma Anholt
b6661df5f0 vulkan: Enable GOOGLE_display_timing on KHR_display across multiple drivers.
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
This covers some drivers which expose KHR_display and EXT_present_timing.

Based on Emma Anholt's work from 2025, rebased on current Mesa 26.2-devel,
tiny compile fixes and docs/features updates by Mario Kleiner.
See MR 38472 for reference of Emma's work, based on Keith's work.

Tested locally on AMD Polaris for radv, Intel Kabylake for anv, and on
Mesa CI's VK-CTS VK_GOOGLE_display_timing test case for AMD radv,
Intel anv, Qualcomm Adreno tu.

Original code of Emma is
Reviewed-by: Mario Kleiner <mario.kleiner.de@gmail.com>

Update of docs/features.txt + new_features.txt updates is

Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Reviewed-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41168>
2026-06-05 10:21:51 +00:00
Jose Maria Casanova Crespo
28e584b687 v3dv: enable lowered shaderFloat16/Int16/Int8 + VK_KHR_shader_float16_int8
V3D 7.1 now exposes shaderFloat16, shaderInt8, shaderInt16 and
VK_KHR_shader_float16_int8.

Partial native Float16 support is already available. But the rest of
sub-32-bit ALU operations are widened to 32-bit by nir_lower_bit_size
in v3d_lower_nir(); conversion and pack operations are kept at their
native bit width so the QPU's 16-bit pack/unpack paths on mul/mov can
be used.

Assisted-by: Claude Opus 4.7
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41810>
2026-06-04 13:29:39 +00:00
Jose Maria Casanova Crespo
54de903ae4 v3dv: lower flrp16 for consistency with flrp32
flrp32 is already lowered; mirror it for flrp16 so V3D's f16 ALU
path doesn't see an unsupported flrp@16 leftover after bit_size
widening. No measurable test impact on the current f16 sweep,
but matches the f32 behaviour and keeps the lowering surface
consistent across bit sizes.

Assisted-by: Claude Opus 4.7
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41810>
2026-06-04 13:29:38 +00:00
Jose Maria Casanova Crespo
0a5200d051 v3d: move nir_lower_frexp after nir_lower_bit_size
The frexp lowering decomposes frexp into bit manipulation (fabs, ushr,
iand, ior) that relies on implicit float-to-int bit reinterpretation.
When lowered at 16-bit, the subsequent nir_lower_bit_size pass widens
float operations with f2f32 (changing the bit pattern to IEEE fp32)
and integer operations with u2u32 (zero-extending 16-bit bits). This
breaks the reinterpretation: ushr on the fabs result gets f2f32-widened
float bits instead of the original fp16 bit pattern, causing the sign
bit to leak into the exponent extraction for negative inputs.

Moving nir_lower_frexp into v3d_lower_nir after nir_lower_bit_size.
This way frexp decomposition operates at 32-bit where float and integer
operations share the same bit width, and the bit manipulation masks use
the correct IEEE fp32 constants.

Assisted-by: Claude Opus 4.7
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41810>
2026-06-04 13:29:38 +00:00
Jose Maria Casanova Crespo
03dee27f48 v3dv: expose the full simulator memory to applications
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
On real hardware compute_heap_size() reserves a fraction of total_ram for
the rest of the system and compute_memory_budget() reports at most 90% of
the available memory, both because that RAM is shared between the GPU and
the CPU. In simulator mode the memory is instead a dedicated GPU pool
allocated by the simulator, so these reservations just hid memory: although
we allocate 1 GiB for the simulator, only 512 MiB was exposed as the heap
and as the budget.

Expose the full simulator allocation as both the heap size and the budget.
The simulator never allocates more than the 4 GiB the GPU MMU can address,
which we assert.

Before:
  memoryHeaps[0]:
    size   = 536870912 (0x20000000) (512.00 MiB)
    budget = 536870912 (0x20000000) (512.00 MiB)

After:
  memoryHeaps[0]:
    size   = 1073741824 (0x40000000) (1024.00 MiB)
    budget = 1073725536 (0x3fffc060) (1023.98 MiB)

Assisted-by: Claude Opus 4.8
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41898>
2026-06-04 09:35:38 +00:00
Daivik Bhatia
990d76eae6 v3dv: Implement and enable nullDescriptor support
Handle null descriptors by emitting zeroed descriptor state.
When the nullDescriptor feature is enabled, a dedicated null_bo is
allocated. Null image descriptors now pack a TEXTURE_SHADER_STATE whose
base address points to this BO, ensuring that the TMU reads from valid
memory.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40485>
2026-06-02 22:29:00 +00:00
Karmjit Mahil
72736c621a v3dv: Add heap_memory_percent driconf support
This also introduces a new tier since the common helper exposes
25% of memory as heap on devices with <=1GiB memory. Previously
50% was being used.

This also fixes `device->heap_used` not using atomic read.

Signed-off-by: Karmjit Mahil <karmjit.mahil@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41242>
2026-06-01 17:32:50 +00:00
Sid Pranjale
020a6bc282 vulkan: implement VK_EXT_debug_marker
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32722>
2026-06-01 15:31:38 +00:00
Utku Iseri
2263576f59 v3dv: close display_fd on incompatible_driver path
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Currently, display_fd gets leaked during vulkan loader driver
probing on platforms where there's no v3dv device, as nothing
closes this fd before returning with INCOMPATIBLE_DRIVER. As
the display_fd also holds MASTER, this in turn prevents the
actual driver from becoming master on the display node.

Close the fd before returning to prevent this.

Fixes: bb532a7a ("v3dv: Fix assertion failure for not-found primary_fd during enumeration.")

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41058>
2026-05-29 19:55:33 +00:00
Juan A. Suarez Romero
5a9e40f028 v3dv: disable threadeded submissions under drm-shim
Threaded submit relies on DRM syncobj wait ioctls blocking until the
GPU signals completion. Under drm-shim there is no real GPU, so
SYNCOBJ_WAIT returns immediately, creating a race between the submit
thread and vkQueueWaitIdle that leads to use-after-free crashes.

Detect if we are running under drm-shim by checking the DRM version
description, skip enabling threaded submit in that case.

Assisted-by: Cursor Agent (Opus 4.6)
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41779>
2026-05-27 10:19:51 +00:00
Juan A. Suarez Romero
1eae5ca94f v3dv: allow device with only render node
When using drm-shim there is no primary node for the driver. This is
fine, and hence we only mark that we don't have primary device.

This fixes using v3dv with drm-shim.

Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41779>
2026-05-27 10:19:50 +00:00
Jose Maria Casanova Crespo
ae604b4bdd v3dv: share zero-fill TFU staging BO at device level
The TFU stride-0 fill path allocates a 64 KiB staging BO
(V3D_TFU_MAX_DIM * cpp = 16384 * 4), maps it, fills it with the
pattern, and caches it on the command buffer. For non-zero patterns
the per-cmd-buffer cache works well, but WebGPU/Dawn workloads
issue many zero-fills (lazy buffer init) across separate command
buffers, so the cache misses almost every time and each fill pays
for a fresh alloc + mmap + memcpy.

Add a device-wide staging BO held in v3dv_device::meta.tfu_fill_zero,
lazily allocated under meta.mtx and used whenever data == 0. The BO
is read-only after init so it can be shared across queues without
extra synchronization, and it is freed in destroy_device_meta.

Measured on a Dawn/WebGPU zero-fill-heavy workload (RPi5, ~60
meta_fill_buffer calls, ~218 MiB total, all zero-fills):

  before: TFU branch total 7.328 ms, avg 115.55 us/call
  after:  TFU branch total 0.296 ms, avg   4.78 us/call  (~24x)

Non-zero patterns continue to use the per-cmd-buffer cache.

Assisted-by: Claude Opus 4.7
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41725>
2026-05-26 07:50:45 +00:00
Jose Maria Casanova Crespo
2a62490fa7 v3dv: relax buffer padding in TFU buffer<->image copy
Adjust eligibility check on imageExtent vs slice dimensions
rather than on the buffer addressing dimensions. The TFU codepath
here always writes/reads the full slice from its origin, so the
required invariant is 'imageExtent == slice'; bufferRowLength and
bufferImageHeight may be larger than imageExtent (the spec permits
this for non-zero values), in which case the TFU reads/writes at the
buffer's row/layer stride but only touches slice->width pixels per
row and slice->height rows per layer, leaving the trailing padding
untouched.

The previous combined check (width == slice->width && height ==
slice->height applied to the buffer dimensions) would reject any
caller that set bufferRowLength or bufferImageHeight larger than the
image (this is common for buffers shared across mip levels or
for alignment requirements like Dawn aligning bufferRowLength to 2
for 1-pixel-wide textures), forcing those copies through the slower
TLB / blit / compute paths.

For compressed formats, keep the strict equality check since
block-level stride semantics are more complex.

Assisted-by: Claude Opus 4.7
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41725>
2026-05-26 07:50:44 +00:00
Jose Maria Casanova Crespo
99bce54daa v3dv: implement TFU image-to-buffer copy on V3D 7.1
Generalize copy_buffer_image_tfu with a to_buffer flag selecting which
side is the raster destination, and wire it into v3dv_CmdCopyImageToBuffer2
before the TLB path.

The to_buffer=true direction has the same eligibility constraints as
buffer-to-image, except that V3D 4.2 is unsupported as its TFU cannot
produce raster output, and for image-to-buffer the destination is
always a raster buffer.

Assisted-by: Claude Opus 4.7
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41725>
2026-05-26 07:50:44 +00:00
Jose Maria Casanova Crespo
0054ff2cb7 v3dv: rename copy_buffer_to_image_tfu to copy_buffer_image_tfu
Drop the direction from the function name in preparation for sharing
this implementation with image-to-buffer copies in the next commit.

Pure rename, no functional change.

Assisted-by: Claude Opus 4.7
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41725>
2026-05-26 07:50:43 +00:00
Jose Maria Casanova Crespo
43ddd0c96f v3dv: extract TFU helpers for format-plane and slice-stride args
Assisted-by: Claude Opus 4.7
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41725>
2026-05-26 07:50:43 +00:00
Jose Maria Casanova Crespo
8e294e6aee v3dv: use TFU copy with stride-0 for vkCmdFillBuffer
Replace the TLB-based meta_fill_buffer path on V3D 7.1+ with a TFU
raster-to-raster copy that broadcasts a single staging row across
the output via iis=0 (stride-0 input). This eliminates the per-fill
CL render job and its tile_alloc/TSDA BO overhead, which is
substantial on workloads that issue many small fills (e.g. WebGPU
lazy buffer initialization in Dawn).

The staging BO holding one row of the fill pattern is cached on the
command buffer and reused across fills with the same data value, so
sequences of identical-pattern fills share a single staging BO.

The existing TLB-based fill is kept as a fallback and is also used
when V3D_DEBUG=disable_tfu is set, or on V3D simulator builds where
the stride-0 TFU input mode is not supported and would assert.

Assisted-by: Claude Opus 4.7
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41725>
2026-05-26 07:50:43 +00:00
Jose Maria Casanova Crespo
ed9fea6045 v3dv: move destroy_update_buffer_cb to a generic helper
Move from v3dv_meta_copy.c to a generic v3dv_cmd_buffer_destroy_bo_cb
in the cmd buffer module. This makes it reusable for different callers
that want to attach a v3dv_bo to a command buffer's private_objs list.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41725>
2026-05-26 07:50:42 +00:00
Jose Maria Casanova Crespo
9b131eb86e v3dv: Enable meta_copy_buffer with TFU for V3D 7.1
Buffer-to-buffer copies on V3D 7.1+ can be served by the TFU as a
raster-to-raster copy, avoiding the per-copy CL render job and
tile_alloc/TSDA BO overhead of the TLB-based path.

Treat the buffer as a raster texture and chunk the copy into TFU
jobs of up to 16384x16384 pixels. Pick the largest pixel size
(cpp in {4,2,1}) such that src/dst offsets and size are all
cpp-aligned: cpp=4 (R8G8B8A8_UINT) is the expected common case;
cpp=2 (R8G8_UINT) and cpp=1 (R8_UINT) handle Vulkan-permitted
unaligned vkCmdCopyBuffer regions that would otherwise fall back
to the slow TLB path. Skipped when V3D_DEBUG=disable_tfu is set;
emits perf_debug when the cpp=1/2 fallback is taken.

Drop the `if (copy_job)` guard on src_bo cleanup registration in
v3dv_CmdUpdateBuffer: the TFU path queues jobs without returning a
v3dv_job*, so the staging BO must be tracked unconditionally to
avoid leaking once the cmd buffer is submitted.

Assisted-by: Claude Opus 4.7
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41725>
2026-05-26 07:50:42 +00:00
Lishin
c41f88fb35 v3d/v3dv: use common compute limits
Move the compute workgroup count and shared memory limits shared by
v3d and v3dv to v3d_limits.h.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41791>
2026-05-26 07:13:22 +01:00
Samuel Pitoiset
54b71e9e77 util: pass a struct to driParseConfigFiles()
It would be easier to add more functionalities like shader hashes etc.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41657>
2026-05-19 19:51:45 +00:00
Karol Herbst
e9c1cce35f nir: remove ffma_old
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41165>
2026-05-19 18:13:42 +00:00
Jose Maria Casanova Crespo
e40989f451 v3dv: advertise VK_EXT_scalar_block_layout on V3D 7.1+
The scalarBlockLayout feature was already exposed via the Vulkan 1.2
feature struct, but Vulkan 1.1 clients (e.g. Dawn) need the EXT to
discover it.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41673>
2026-05-19 14:24:54 +00:00
Jose Maria Casanova Crespo
cd9f2648d3 v3dv: avoid 16F TLB usage for B10G11R11_UFLOAT copies
B10G11R11_UFLOAT_PACK32 maps to V3D_INTERNAL_TYPE_16F on the TLB,
which canonicalizes NaN bit patterns when arbitrary 32 bits are
reinterpreted as that format. The same canonicalization happens in
the blit shader when sampling a B10G11R11 source. Both break the
bit-exactness that vkCmdCopyImage, vkCmdCopyImageToBuffer and
vkCmdCopyBufferToImage require, since the spec defines them as raw
byte copies for any pair of texel-size compatible formats.

Fix it by aliasing the format to R32_UINT whenever B10G11R11 is
involved.

This fixes dEQP-VK.api.copy_and_blit.*b10g11r11*,
dEQP-VK.image.subresource_layout.*b10g11r11* and
dEQP-VK.api.image_clearing.*b10g11r11* failures on V3D 7.1.7 (rpi5)
and V3D 4.2 (rpi4).

Assisted-by: Claude Opus 4.7
Cc: mesa-stable
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41599>
2026-05-19 13:29:35 +00:00
Jose Maria Casanova Crespo
14b8d02130 v3dv: assert timestamp pool BO is disjoint from dst buffer BO
The two BOs come from distjoint allocation nowadays. So they
would never share the BO handle. In case this becomes false
in the future, the BO hanldes needs to be de-duped as happens
with TFU submisions.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41616>
2026-05-18 12:26:55 +00:00
Jose Maria Casanova Crespo
87a0eac718 v3dv: avoid duplicate bo_handles between cpu_job and CSD lists
v3d_submit_cpu_ioctl() takes a separate ww_acquire_ctx for the cpu_job's
bo_handles[] and any embedded CSD's bo_handles[]; a BO appearing in both
lists makes the second lock wait on a reservation held by the first
context, deadlocking the ioctl.

We avoid adding a duplicate BO handle when it's already in the cpu_job's
list. This collided when an app suballocates an indirect VkBuffer and a
CSD bind-group VkBuffer out of one VkDeviceMemory.

Fixes: e404ccba5b ("v3dv: use the indirect CSD user extension")
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41616>
2026-05-18 12:26:55 +00:00
Jose Maria Casanova Crespo
e3ff5d6cdb v3dv: expose maxFragmentOutputAttachments as max_rts
V3DV hardcoded maxFragmentOutputAttachments to 4, from
V3D 4.x when V3D_MAX_RENDER_TARGETS was 4. On V3D 7.x (RPi5)
V3D_MAX_RENDER_TARGETS is 8.

WebGPU's mandatory maxColorAttachments minimum is 8, and wgpu computes
max_color_attachments as min(maxColorAttachments,
maxFragmentOutputAttachments). With the previous value V3DV capped
WebGPU clients to 4 color attachments on RPi5.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41600>
2026-05-18 11:45:10 +00:00
Jose Maria Casanova Crespo
e1c03cb4f6 v3dv: Enable KHR_shader_subgroup_extended_types
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
This extension is part of Vulkan 1.2 core and the feature is already
exposed; we just weren't advertising the extension separately.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41624>
2026-05-18 11:26:11 +00:00
Jose Maria Casanova Crespo
8bd7f1d44b v3dv: include mem_offset in vkCmdFillBuffer destination
v3dv_CmdFillBuffer was passing only the user-supplied dstOffset to
meta_fill_buffer, ignoring the destination VkBuffer's mem_offset.
When several VkBuffers share one VkDeviceMemory at different offsets
(sub-allocation) the fill landed on whichever VkBuffer was
bound at offset 0 of the memory object instead of the requested one.

Fixes: 5ed78d91fe ("v3dv: implement vkCmdFillBuffer")
Assisted-by: Claude Opus 4.7
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41436>
2026-05-11 10:49:20 +02:00
Roman Stratiienko
60fdab22a5 v3dv: Emulate multi-queue support via vk_queue for Android
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Android14+ relies on at least 2 queues for vulkan skia/UI rendering.
More explained [here][1]

[1]: https://gitlab.freedesktop.org/mesa/mesa/-/work_items/11326

Signed-off-by: Roman Stratiienko <r.stratiienko@gmail.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41213>
2026-05-05 07:03:08 +00:00
Roman Stratiienko
16526e451e v3dv: move noop_job creation to device scope
Preparation step for multiple queue emulation support

Signed-off-by: Roman Stratiienko <r.stratiienko@gmail.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41213>
2026-05-05 07:03:07 +00:00
Jose Maria Casanova Crespo
d95076e581 v3dv: lower oversized compute workgroups to 256 invocations
V3D advertises maxComputeWorkGroupInvocations = 256 but ggml-vulkan
in many cases ignores this limit an creates compute pipelines with
over this limit. Although this is a bug in the application we can
take advantage of nir_lower_workgroup_size and make the application
work.

This issue was causing an assertion failure at nir_to_vir.c:

  assert(c->local_invocation_index_bits <= 8);

The solution is lowering the oversized workgroups to a 256-invocation
workgroup loop, like radv and radeonsi are doing on GFX7, by running
nir_lower_workgroup_size(256) for this scenario.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41257>
2026-04-30 13:59:19 +00:00
Jose Maria Casanova Crespo
c3ba5effe2 v3d/v3dv: Use new V3D_MAX_CSD_WG_SIZE = 256
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41257>
2026-04-30 13:59:18 +00:00
Jose Maria Casanova Crespo
e378a7d773 v3dv: bump maxComputeSharedMemorySize to 32 KB
Currently local shared memory is backed by a BO that is read/written
using the TMU.

ggml-vulkan probes the size of maxComputeSharedMemorySize and rejects
V3DV (falling back to CPU) when the value is below what its larger
compute pipelines request, although in the end the shaders ollama
runs don't actually use shared memory.

32 KB is what ggml-vulkan demands; the value can grow further with no
real per-op cost since shared memory currently goes through the TMU
like any other BO.

V3D OpenGL driver also has 32 KB for SharedMemory.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41257>
2026-04-30 13:59:18 +00:00
Roman Stratiienko
bdbf4ed739 v3dv/android: Add deferred ANB allocation support
Fixes:

dEQP-VK.wsi.android.maintenance1.deferred_alloc.mailbox#basic
dEQP-VK.wsi.android.maintenance1.deferred_alloc.mailbox#bind_image
dEQP-VK.wsi.android.maintenance1.deferred_alloc.fifo#basic
dEQP-VK.wsi.android.maintenance1.deferred_alloc.fifo#bind_image

Signed-off-by: Roman Stratiienko <r.stratiienko@gmail.com>
Acked-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41235>
2026-04-29 15:31:28 +00:00