Commit graph

3676 commits

Author SHA1 Message Date
Daivik Bhatia
990d76eae6 v3dv: Implement and enable nullDescriptor support
Handle null descriptors by emitting zeroed descriptor state.
When the nullDescriptor feature is enabled, a dedicated null_bo is
allocated. Null image descriptors now pack a TEXTURE_SHADER_STATE whose
base address points to this BO, ensuring that the TMU reads from valid
memory.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40485>
2026-06-02 22:29:00 +00:00
Daivik Bhatia
6aed7d2988 broadcom/compiler: add support for null descriptors
Add `v3d_nir_lower_null_descriptors` NIR pass to bypass operations
if the descriptor size is zero, returning 0 where necessary.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40485>
2026-06-02 22:29:00 +00:00
Karmjit Mahil
72736c621a v3dv: Add heap_memory_percent driconf support
This also introduces a new tier since the common helper exposes
25% of memory as heap on devices with <=1GiB memory. Previously
50% was being used.

This also fixes `device->heap_used` not being read atomically.

Signed-off-by: Karmjit Mahil <karmjit.mahil@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41242>
2026-06-01 17:32:50 +00:00
Sid Pranjale
020a6bc282 vulkan: implement VK_EXT_debug_marker
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32722>
2026-06-01 15:31:38 +00:00
Utku Iseri
2263576f59 v3dv: close display_fd on incompatible_driver path
Currently, display_fd gets leaked during vulkan loader driver
probing on platforms where there's no v3dv device, as nothing
closes this fd before returning with INCOMPATIBLE_DRIVER. As
the display_fd also holds MASTER, this in turn prevents the
actual driver from becoming master on the display node.

Close the fd before returning to prevent this.

Fixes: bb532a7a ("v3dv: Fix assertion failure for not-found primary_fd during enumeration.")

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41058>
2026-05-29 19:55:33 +00:00
Juan A. Suarez Romero
5a9e40f028 v3dv: disable threaded submissions under drm-shim
Threaded submit relies on DRM syncobj wait ioctls blocking until the
GPU signals completion. Under drm-shim there is no real GPU, so
SYNCOBJ_WAIT returns immediately, creating a race between the submit
thread and vkQueueWaitIdle that leads to use-after-free crashes.

Detect if we are running under drm-shim by checking the DRM version
description, and skip enabling threaded submit in that case.

Assisted-by: Cursor Agent (Opus 4.6)
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41779>
2026-05-27 10:19:51 +00:00
Juan A. Suarez Romero
788979594e v3d/drm-shim: add GPU selection
So far with drm-shim we were always emulating V3D 4.2.

Now we emulate V3D 7.1 by default, but we allow selecting 4.2 through an
envvar: `V3D_GPU_ID=(42|71)`

Borrowed from etnaviv.

Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41779>
2026-05-27 10:19:50 +00:00
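The selection described in commit 788979594e could look roughly like the sketch below. Only the `V3D_GPU_ID` variable, its two values and the 7.1 default come from the commit message; the function name and the fallback for unrecognized values are assumptions, not the actual drm-shim code.

```c
#include <string.h>

/* Hypothetical helper: maps the value of V3D_GPU_ID to a GPU version.
 * Callers would pass getenv("V3D_GPU_ID"), which is NULL when unset. */
static int
sketch_parse_gpu_id(const char *value)
{
   if (value != NULL && strcmp(value, "42") == 0)
      return 42; /* V3D 4.2, selected explicitly */

   /* Unset, "71", or anything else: emulate V3D 7.1. Treating unknown
    * values as the default is an assumption of this sketch. */
   return 71;
}
```

With this shape, running a test as `V3D_GPU_ID=42 <command>` would pick the 4.2 emulation, and leaving the variable unset would pick 7.1.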
Juan A. Suarez Romero
1eae5ca94f v3dv: allow device with only render node
When using drm-shim there is no primary node for the driver. This is
fine, so we only record that we don't have a primary device.

This fixes using v3dv with drm-shim.

Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41779>
2026-05-27 10:19:50 +00:00
Juan A. Suarez Romero
6f8a692abc broadcom/simulator: V3D is always 4.2 or above
This is a leftover from when we were supporting V3D 4.1 or below.

Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41779>
2026-05-27 10:19:50 +00:00
Jose Maria Casanova Crespo
ae604b4bdd v3dv: share zero-fill TFU staging BO at device level
The TFU stride-0 fill path allocates a 64 KiB staging BO
(V3D_TFU_MAX_DIM * cpp = 16384 * 4), maps it, fills it with the
pattern, and caches it on the command buffer. For non-zero patterns
the per-cmd-buffer cache works well, but WebGPU/Dawn workloads
issue many zero-fills (lazy buffer init) across separate command
buffers, so the cache misses almost every time and each fill pays
for a fresh alloc + mmap + memcpy.

Add a device-wide staging BO held in v3dv_device::meta.tfu_fill_zero,
lazily allocated under meta.mtx and used whenever data == 0. The BO
is read-only after init so it can be shared across queues without
extra synchronization, and it is freed in destroy_device_meta.

Measured on a Dawn/WebGPU zero-fill-heavy workload (RPi5, ~60
meta_fill_buffer calls, ~218 MiB total, all zero-fills):

  before: TFU branch total 7.328 ms, avg 115.55 us/call
  after:  TFU branch total 0.296 ms, avg   4.78 us/call  (~24x)

Non-zero patterns continue to use the per-cmd-buffer cache.

Assisted-by: Claude Opus 4.7
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41725>
2026-05-26 07:50:45 +00:00
Jose Maria Casanova Crespo
2a62490fa7 v3dv: relax buffer padding in TFU buffer<->image copy
Base the eligibility check on imageExtent vs slice dimensions
rather than on the buffer addressing dimensions. The TFU codepath
here always writes/reads the full slice from its origin, so the
required invariant is 'imageExtent == slice'; bufferRowLength and
bufferImageHeight may be larger than imageExtent (the spec permits
this for non-zero values), in which case the TFU reads/writes at the
buffer's row/layer stride but only touches slice->width pixels per
row and slice->height rows per layer, leaving the trailing padding
untouched.

The previous combined check (width == slice->width && height ==
slice->height applied to the buffer dimensions) would reject any
caller that set bufferRowLength or bufferImageHeight larger than the
image (this is common for buffers shared across mip levels or
for alignment requirements like Dawn aligning bufferRowLength to 2
for 1-pixel-wide textures), forcing those copies through the slower
TLB / blit / compute paths.

For compressed formats, keep the strict equality check since
block-level stride semantics are more complex.

Assisted-by: Claude Opus 4.7
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41725>
2026-05-26 07:50:44 +00:00
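A minimal sketch of the relaxed check from commit 2a62490fa7, under stated assumptions. The struct, the function name and the flat `compressed` flag are hypothetical, and all dimensions are taken to be in the same units; the real driver code is not shown in the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

struct sketch_extent {
   uint32_t width;
   uint32_t height;
};

/* Hypothetical eligibility test for the TFU buffer<->image path. */
static bool
sketch_tfu_copy_eligible(struct sketch_extent image_extent,
                         struct sketch_extent slice,
                         uint32_t buffer_row_length,
                         uint32_t buffer_image_height,
                         bool compressed)
{
   /* In Vulkan, 0 means "tightly packed according to imageExtent". */
   const uint32_t row_length =
      buffer_row_length ? buffer_row_length : image_extent.width;
   const uint32_t image_height =
      buffer_image_height ? buffer_image_height : image_extent.height;

   /* The TFU always transfers the full slice from its origin. */
   if (image_extent.width != slice.width ||
       image_extent.height != slice.height)
      return false;

   /* Compressed formats keep the strict equality check. */
   if (compressed)
      return row_length == slice.width && image_height == slice.height;

   /* Otherwise padding in the buffer is fine: only the stride changes. */
   return row_length >= slice.width && image_height >= slice.height;
}
```

The Dawn case mentioned in the commit (a 1-pixel-wide texture with bufferRowLength aligned to 2) passes this check for an uncompressed format, where the old combined equality check would have rejected it.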
Jose Maria Casanova Crespo
99bce54daa v3dv: implement TFU image-to-buffer copy on V3D 7.1
Generalize copy_buffer_image_tfu with a to_buffer flag selecting which
side is the raster destination, and wire it into v3dv_CmdCopyImageToBuffer2
before the TLB path.

The to_buffer=true direction has the same eligibility constraints as
buffer-to-image, except that V3D 4.2 is unsupported as its TFU cannot
produce raster output, and for image-to-buffer the destination is
always a raster buffer.

Assisted-by: Claude Opus 4.7
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41725>
2026-05-26 07:50:44 +00:00
Jose Maria Casanova Crespo
0054ff2cb7 v3dv: rename copy_buffer_to_image_tfu to copy_buffer_image_tfu
Drop the direction from the function name in preparation for sharing
this implementation with image-to-buffer copies in the next commit.

Pure rename, no functional change.

Assisted-by: Claude Opus 4.7
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41725>
2026-05-26 07:50:43 +00:00
Jose Maria Casanova Crespo
43ddd0c96f v3dv: extract TFU helpers for format-plane and slice-stride args
Assisted-by: Claude Opus 4.7
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41725>
2026-05-26 07:50:43 +00:00
Jose Maria Casanova Crespo
8e294e6aee v3dv: use TFU copy with stride-0 for vkCmdFillBuffer
Replace the TLB-based meta_fill_buffer path on V3D 7.1+ with a TFU
raster-to-raster copy that broadcasts a single staging row across
the output via iis=0 (stride-0 input). This eliminates the per-fill
CL render job and its tile_alloc/TSDA BO overhead, which is
substantial on workloads that issue many small fills (e.g. WebGPU
lazy buffer initialization in Dawn).

The staging BO holding one row of the fill pattern is cached on the
command buffer and reused across fills with the same data value, so
sequences of identical-pattern fills share a single staging BO.

The existing TLB-based fill is kept as a fallback and is also used
when V3D_DEBUG=disable_tfu is set, or on V3D simulator builds where
the stride-0 TFU input mode is not supported and would assert.

Assisted-by: Claude Opus 4.7
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41725>
2026-05-26 07:50:43 +00:00
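The stride-0 broadcast used by commit 8e294e6aee can be illustrated on the CPU. This is only a model of the idea, assuming strides are counted in 32-bit pixels; the function name is hypothetical and no TFU register programming is shown.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical CPU model of a raster-to-raster copy. Strides are in
 * pixels. With src_stride == 0 every output row is read from the same
 * input row, which is how a single staging row fills the whole output. */
static void
sketch_raster_copy(uint32_t *dst, size_t dst_stride,
                   const uint32_t *src, size_t src_stride,
                   size_t width, size_t height)
{
   for (size_t y = 0; y < height; y++) {
      memcpy(dst + y * dst_stride, src + y * src_stride,
             width * sizeof(uint32_t));
   }
}
```

Calling it with one row holding the fill pattern and `src_stride` set to 0 writes that pattern to every row of the destination, so the staging buffer only needs to be one row long.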
Jose Maria Casanova Crespo
ed9fea6045 v3dv: move destroy_update_buffer_cb to a generic helper
Move from v3dv_meta_copy.c to a generic v3dv_cmd_buffer_destroy_bo_cb
in the cmd buffer module. This makes it reusable for different callers
that want to attach a v3dv_bo to a command buffer's private_objs list.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41725>
2026-05-26 07:50:42 +00:00
Jose Maria Casanova Crespo
9b131eb86e v3dv: Enable meta_copy_buffer with TFU for V3D 7.1
Buffer-to-buffer copies on V3D 7.1+ can be served by the TFU as a
raster-to-raster copy, avoiding the per-copy CL render job and
tile_alloc/TSDA BO overhead of the TLB-based path.

Treat the buffer as a raster texture and chunk the copy into TFU
jobs of up to 16384x16384 pixels. Pick the largest pixel size
(cpp in {4,2,1}) such that src/dst offsets and size are all
cpp-aligned: cpp=4 (R8G8B8A8_UINT) is the expected common case;
cpp=2 (R8G8_UINT) and cpp=1 (R8_UINT) handle Vulkan-permitted
unaligned vkCmdCopyBuffer regions that would otherwise fall back
to the slow TLB path. Skipped when V3D_DEBUG=disable_tfu is set;
emits perf_debug when the cpp=1/2 fallback is taken.

Drop the `if (copy_job)` guard on src_bo cleanup registration in
v3dv_CmdUpdateBuffer: the TFU path queues jobs without returning a
v3dv_job*, so the staging BO must be tracked unconditionally to
avoid leaking once the cmd buffer is submitted.

Assisted-by: Claude Opus 4.7
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41725>
2026-05-26 07:50:42 +00:00
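The pixel-size selection described in commit 9b131eb86e can be sketched as follows. The function name is hypothetical; only the rule itself (largest cpp in {4,2,1} for which both offsets and the size are aligned) is taken from the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns the largest bytes-per-pixel value in
 * {4, 2, 1} for which src offset, dst offset and size are all aligned. */
static uint32_t
sketch_pick_copy_cpp(uint64_t src_offset, uint64_t dst_offset, uint64_t size)
{
   assert(size > 0); /* Vulkan does not allow zero-sized copy regions */

   /* OR-ing the values keeps any low bit set in at least one of them,
    * so a single mask test covers all three alignment checks. */
   const uint64_t bits = src_offset | dst_offset | size;

   if ((bits & 3) == 0)
      return 4; /* R8G8B8A8_UINT, the expected common case */
   if ((bits & 1) == 0)
      return 2; /* R8G8_UINT */
   return 1;    /* R8_UINT */
}
```

For example, a copy of 64 bytes from offset 2 to offset 16 would use cpp=2, because the source offset is only 2-byte aligned.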
Lishin
c41f88fb35 v3d/v3dv: use common compute limits
Move the compute workgroup count and shared memory limits shared by
v3d and v3dv to v3d_limits.h.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41791>
2026-05-26 07:13:22 +01:00
Juan A. Suarez Romero
3df406633e vc4/ci: update expected results
Add new failures and timeouts.

Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41778>
2026-05-25 10:22:12 +02:00
Collabora's Gfx CI Team
18ba81e5b6 Uprev Piglit to 6fd29fe44f8857b876a67bee962919635f22ecc8
11ce9eb56e...6fd29fe44f

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40989>
2026-05-20 21:37:44 +00:00
Juan A. Suarez Romero
a1c371517c broadcom/ci: update kernel for nightly runs
Update the RPi kernel used in CI-Tron.

Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41698>
2026-05-20 08:58:37 +00:00
Samuel Pitoiset
54b71e9e77 util: pass a struct to driParseConfigFiles()
This makes it easier to add more functionality, such as shader hashes.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41657>
2026-05-19 19:51:45 +00:00
Karol Herbst
e9c1cce35f nir: remove ffma_old
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41165>
2026-05-19 18:13:42 +00:00
Jose Maria Casanova Crespo
e40989f451 v3dv: advertise VK_EXT_scalar_block_layout on V3D 7.1+
The scalarBlockLayout feature was already exposed via the Vulkan 1.2
feature struct, but Vulkan 1.1 clients (e.g. Dawn) need the EXT to
discover it.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41673>
2026-05-19 14:24:54 +00:00
Jose Maria Casanova Crespo
cd9f2648d3 v3dv: avoid 16F TLB usage for B10G11R11_UFLOAT copies
B10G11R11_UFLOAT_PACK32 maps to V3D_INTERNAL_TYPE_16F on the TLB,
which canonicalizes NaN bit patterns when arbitrary 32 bits are
reinterpreted as that format. The same canonicalization happens in
the blit shader when sampling a B10G11R11 source. Both break the
bit-exactness that vkCmdCopyImage, vkCmdCopyImageToBuffer and
vkCmdCopyBufferToImage require, since the spec defines them as raw
byte copies for any pair of texel-size compatible formats.

Fix it by aliasing the format to R32_UINT whenever B10G11R11 is
involved.

This fixes dEQP-VK.api.copy_and_blit.*b10g11r11*,
dEQP-VK.image.subresource_layout.*b10g11r11* and
dEQP-VK.api.image_clearing.*b10g11r11* failures on V3D 7.1.7 (rpi5)
and V3D 4.2 (rpi4).

Assisted-by: Claude Opus 4.7
Cc: mesa-stable
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41599>
2026-05-19 13:29:35 +00:00
Jose Maria Casanova Crespo
bd496a6aad broadcom/ci: skip SSBO tests close to the 60s threshold on rpi4
Add them to the existing "Slow CTS tests" group in
broadcom-rpi4-skips.txt.

Assisted-by: Claude Opus 4.7
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41615>
2026-05-18 15:42:24 +00:00
Jose Maria Casanova Crespo
14b8d02130 v3dv: assert timestamp pool BO is disjoint from dst buffer BO
The two BOs come from disjoint allocations nowadays, so they
would never share a BO handle. If this stops being true in the
future, the BO handles need to be de-duplicated, as happens
with TFU submissions.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41616>
2026-05-18 12:26:55 +00:00
Jose Maria Casanova Crespo
87a0eac718 v3dv: avoid duplicate bo_handles between cpu_job and CSD lists
v3d_submit_cpu_ioctl() takes a separate ww_acquire_ctx for the cpu_job's
bo_handles[] and any embedded CSD's bo_handles[]; a BO appearing in both
lists makes the second lock wait on a reservation held by the first
context, deadlocking the ioctl.

We avoid adding a duplicate BO handle when it's already in the cpu_job's
list. This collided when an app suballocates an indirect VkBuffer and a
CSD bind-group VkBuffer out of one VkDeviceMemory.

Fixes: e404ccba5b ("v3dv: use the indirect CSD user extension")
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41616>
2026-05-18 12:26:55 +00:00
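The de-duplication in commit 87a0eac718 amounts to a membership test before appending a handle. The sketch below uses plain arrays and hypothetical function names; it does not reproduce the driver's job structures or the ioctl itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Linear search is enough for the short handle lists in this sketch. */
static bool
sketch_has_handle(const uint32_t *handles, uint32_t count, uint32_t handle)
{
   for (uint32_t i = 0; i < count; i++) {
      if (handles[i] == handle)
         return true;
   }
   return false;
}

/* Appends 'handle' to the CSD list unless the cpu_job list (or the CSD
 * list itself) already holds it. Returns the new CSD handle count. The
 * caller must provide room for one more entry in csd_handles. */
static uint32_t
sketch_add_csd_handle(const uint32_t *cpu_handles, uint32_t cpu_count,
                      uint32_t *csd_handles, uint32_t csd_count,
                      uint32_t handle)
{
   if (sketch_has_handle(cpu_handles, cpu_count, handle) ||
       sketch_has_handle(csd_handles, csd_count, handle))
      return csd_count;

   csd_handles[csd_count] = handle;
   return csd_count + 1;
}
```

When an indirect buffer and a CSD buffer are suballocated from one VkDeviceMemory they resolve to the same handle, so the second list simply skips it and each BO is reserved by only one of the two lists.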
Jose Maria Casanova Crespo
e3ff5d6cdb v3dv: expose maxFragmentOutputAttachments as max_rts
V3DV hardcoded maxFragmentOutputAttachments to 4, from
V3D 4.x when V3D_MAX_RENDER_TARGETS was 4. On V3D 7.x (RPi5)
V3D_MAX_RENDER_TARGETS is 8.

WebGPU's mandatory maxColorAttachments minimum is 8, and wgpu computes
max_color_attachments as min(maxColorAttachments,
maxFragmentOutputAttachments). With the previous value V3DV capped
WebGPU clients to 4 color attachments on RPi5.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41600>
2026-05-18 11:45:10 +00:00
Jose Maria Casanova Crespo
e1c03cb4f6 v3dv: Enable KHR_shader_subgroup_extended_types
This extension is part of Vulkan 1.2 core and the feature is already
exposed; we just weren't advertising the extension separately.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41624>
2026-05-18 11:26:11 +00:00
Juan A. Suarez Romero
f763ed0ae8 v3d/ci: add OpenCL regressions
Plus a flake.

Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41575>
2026-05-14 09:22:35 +00:00
Emma Anholt
76c39acad7 ci: Update VK CTS to 1.4.5.3 with fixes.
I've pulled in a pile of changes to reduce the overhead (runtime and
memory) when sharding for deqp-runner, along with a bunch of fixes for
KHR_display testing that we recently enabled, plus a few others that
affect our drivers.

The big new set of failures looks like it's from more complete coverage of
blitting between formats.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41243>
2026-05-14 04:22:21 +00:00
Jose Maria Casanova Crespo
8bd7f1d44b v3dv: include mem_offset in vkCmdFillBuffer destination
v3dv_CmdFillBuffer was passing only the user-supplied dstOffset to
meta_fill_buffer, ignoring the destination VkBuffer's mem_offset.
When several VkBuffers share one VkDeviceMemory at different offsets
(sub-allocation) the fill landed on whichever VkBuffer was
bound at offset 0 of the memory object instead of the requested one.

Fixes: 5ed78d91fe ("v3dv: implement vkCmdFillBuffer")
Assisted-by: Claude Opus 4.7
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41436>
2026-05-11 10:49:20 +02:00
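The offset fix in commit 8bd7f1d44b can be shown with a small sketch. The struct and function names are hypothetical; the only fact taken from the commit is that the buffer's `mem_offset` has to be added to the user-supplied `dstOffset`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a VkBuffer bound inside a larger memory object. */
struct sketch_buffer {
   uint64_t mem_offset; /* where the buffer starts inside the memory object */
   uint64_t size;
};

/* Offset of the fill destination relative to the start of the memory object. */
static uint64_t
sketch_fill_dst_offset(const struct sketch_buffer *buf, uint64_t dst_offset)
{
   assert(dst_offset < buf->size); /* required by vkCmdFillBuffer */
   return buf->mem_offset + dst_offset;
}
```

For a buffer bound at offset 4096, a fill at `dstOffset` 16 must land at byte 4112 of the memory object. Passing only 16 would hit whichever buffer is bound at offset 0, which is the bug the commit describes.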
Daivik Bhatia
a8c7a25fb1 broadcom/compiler: Add explicit NOP instruction at page boundaries
The QPU prefetches the next instruction during shader execution.
If the shader assembly size perfectly aligns with a page boundary,
the prefetching mechanism reads past the compiled boundary,
leading to an MMU error.

This commit inserts an explicit NOP instruction at the end of the shader
and increases the qpu_inst_count by one when the instruction count
exactly hits a page boundary. This ensures we don't fall off the end
of the last executable instruction page and into invalid memory.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40983>
2026-05-06 09:21:28 +00:00
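The padding rule from commit a8c7a25fb1 can be sketched as below. QPU instructions are 64 bits wide; the 4096-byte page size, the page-aligned shader start and the function names are assumptions of this sketch, not values taken from the commit.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE     4096u /* assumption: MMU page size in bytes */
#define SKETCH_QPU_INST_SIZE 8u    /* one QPU instruction is 64 bits */

/* True when the shader's last instruction ends exactly at a page
 * boundary, assuming the shader starts at a page-aligned address. */
static bool
sketch_ends_on_page_boundary(uint32_t inst_count)
{
   const uint64_t size = (uint64_t)inst_count * SKETCH_QPU_INST_SIZE;
   return size != 0 && size % SKETCH_PAGE_SIZE == 0;
}

/* Instruction count after appending the extra NOP when it is needed. */
static uint32_t
sketch_padded_inst_count(uint32_t inst_count)
{
   return sketch_ends_on_page_boundary(inst_count) ? inst_count + 1
                                                   : inst_count;
}
```

Under these assumptions a 512-instruction shader fills exactly one page and grows to 513 instructions, while a 511-instruction shader is left unchanged.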
Lishin
c50210ba7c broadcom/qpu: add V3D 7.1 disasm tests
Add QPU disassembler tests for V3D 7.1, covering
small immediates in both add and mul slots, as well
as setnnmode_uu paired with v8dot.

Assisted-by: OpenAI Codex
Acked-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41280>
2026-05-06 07:33:42 +00:00
Roman Stratiienko
60fdab22a5 v3dv: Emulate multi-queue support via vk_queue for Android
Android 14+ relies on at least 2 queues for Vulkan Skia/UI rendering.
More details are available [here][1].

[1]: https://gitlab.freedesktop.org/mesa/mesa/-/work_items/11326

Signed-off-by: Roman Stratiienko <r.stratiienko@gmail.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41213>
2026-05-05 07:03:08 +00:00
Roman Stratiienko
16526e451e v3dv: move noop_job creation to device scope
Preparation step for multiple queue emulation support

Signed-off-by: Roman Stratiienko <r.stratiienko@gmail.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41213>
2026-05-05 07:03:07 +00:00
Yiwei Zhang
26c870f173 broadcom: remove unused Android log utils
These are leftovers from
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40434

Acked-by: Valentine Burley <valentine.burley@collabora.com>
Reviewed-by: Dhruv Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41254>
2026-05-01 20:23:23 +00:00
Jose Maria Casanova Crespo
d95076e581 v3dv: lower oversized compute workgroups to 256 invocations
V3D advertises maxComputeWorkGroupInvocations = 256, but ggml-vulkan
often ignores this limit and creates compute pipelines that exceed
it. Although this is a bug in the application, we can take advantage
of nir_lower_workgroup_size to make the application work.

This issue was causing an assertion failure at nir_to_vir.c:

  assert(c->local_invocation_index_bits <= 8);

The solution is lowering the oversized workgroups to a 256-invocation
workgroup loop, like radv and radeonsi are doing on GFX7, by running
nir_lower_workgroup_size(256) for this scenario.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41257>
2026-04-30 13:59:19 +00:00
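The effect of the lowering in commit d95076e581 can be modelled on the CPU. The strided mapping of logical invocations to hardware invocations used here is an assumption chosen for illustration, the function names are hypothetical, and barriers are not modelled; the actual nir_lower_workgroup_size pass may distribute work differently.

```c
#include <stdint.h>

#define SKETCH_MAX_WG_SIZE 256u /* matches maxComputeWorkGroupInvocations */

/* Number of loop iterations each hardware invocation runs at most. */
static uint32_t
sketch_loop_iterations(uint32_t requested_size)
{
   return (requested_size + SKETCH_MAX_WG_SIZE - 1) / SKETCH_MAX_WG_SIZE;
}

/* Runs a workgroup of 'requested_size' logical invocations on 256
 * hardware invocations and counts how often each logical id is visited.
 * 'visits' must have requested_size zero-initialized entries. Returns the
 * total number of logical invocations executed. */
static uint32_t
sketch_run_lowered_workgroup(uint32_t requested_size, uint32_t *visits)
{
   uint32_t executed = 0;
   for (uint32_t hw_id = 0; hw_id < SKETCH_MAX_WG_SIZE; hw_id++) {
      /* hw_id fits in 8 bits, which is what the nir_to_vir assert needs. */
      for (uint32_t logical = hw_id; logical < requested_size;
           logical += SKETCH_MAX_WG_SIZE) {
         visits[logical]++;
         executed++;
      }
   }
   return executed;
}
```

A 1024-invocation workgroup then runs as 256 hardware invocations looping four times each, and every logical invocation is executed exactly once.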
Jose Maria Casanova Crespo
c3ba5effe2 v3d/v3dv: Use new V3D_MAX_CSD_WG_SIZE = 256
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41257>
2026-04-30 13:59:18 +00:00
Jose Maria Casanova Crespo
e378a7d773 v3dv: bump maxComputeSharedMemorySize to 32 KB
Currently local shared memory is backed by a BO that is read/written
using the TMU.

ggml-vulkan probes the size of maxComputeSharedMemorySize and rejects
V3DV (falling back to CPU) when the value is below what its larger
compute pipelines request, although in the end the shaders ollama
runs don't actually use shared memory.

32 KB is what ggml-vulkan demands; the value can grow further with no
real per-op cost since shared memory currently goes through the TMU
like any other BO.

The V3D OpenGL driver also exposes 32 KB of shared memory.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41257>
2026-04-30 13:59:18 +00:00
Jose Maria Casanova Crespo
2cd51a6efc broadcom/compiler: move nir_lower_undef_to_zero out of optimization loop
The combination of nir_opt_if and nir_lower_undef_to_zero running inside
the optimization loop could prevent the loop from converging.

This was exercised by ollama running gemma3 compute shaders.

Removing the pass from the optimization loop results in no changes in
shader-db.

Assisted-by: Claude Opus 4.6
Fixes: cbe24a0e9c ("broadcom/compiler: use nir_lower_undef_to_zero")
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41256>
2026-04-30 12:30:34 +02:00
Roman Stratiienko
bdbf4ed739 v3dv/android: Add deferred ANB allocation support
Fixes:

dEQP-VK.wsi.android.maintenance1.deferred_alloc.mailbox#basic
dEQP-VK.wsi.android.maintenance1.deferred_alloc.mailbox#bind_image
dEQP-VK.wsi.android.maintenance1.deferred_alloc.fifo#basic
dEQP-VK.wsi.android.maintenance1.deferred_alloc.fifo#bind_image

Signed-off-by: Roman Stratiienko <r.stratiienko@gmail.com>
Acked-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41235>
2026-04-29 15:31:28 +00:00
Jose Maria Casanova Crespo
3a8d5aeaa1 v3dv: Expose hardware-accelerated integer dot products on V3D 7.1+
Expose VK_KHR_shader_integer_dot_product 4x8-bit packed dot
products using native HW instructions v8dot and setnnmode.

QPU instruction count for sdot_4x8_iadd compute shader:

  Before (scalar decomposition):  18 ALU cycles
  After (setnnmode + v8dot):       3 ALU cycles  (6x)

We advertise integerDotProduct4x8BitPacked*Accelerated for V3D 7.1+

Assisted-by: Claude Opus 4.6
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41255>
2026-04-29 13:21:08 +00:00
Jose Maria Casanova Crespo
8f06961bf5 broadcom/compiler: Eliminate redundant setnnmode instructions
This new VIR optimization pass tracks the current NN signedness
mode per block and removes duplicate setnnmode instructions.

When consecutive dot products use the same signedness mode, the backend
emits one setnnmode per dot product. This pass removes the redundant
ones, keeping only the first.

Assisted-by: Claude Opus 4.6
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41255>
2026-04-29 13:21:08 +00:00
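The per-block tracking described in commit 8f06961bf5 can be sketched on a flat instruction array. The enum, the function names and the array representation are hypothetical stand-ins for VIR; the sketch assumes the mode is unknown at the start of each block.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical opcodes standing in for the VIR instructions involved. */
enum sketch_op {
   SKETCH_SETNNMODE_UU,
   SKETCH_SETNNMODE_SU,
   SKETCH_SETNNMODE_US,
   SKETCH_SETNNMODE_SS,
   SKETCH_V8DOT,
   SKETCH_OTHER,
};

static bool
sketch_is_setnnmode(enum sketch_op op)
{
   return op <= SKETCH_SETNNMODE_SS;
}

/* Compacts one block in place, dropping every setnnmode that selects the
 * mode already in effect. Returns the new instruction count. */
static size_t
sketch_remove_redundant_setnnmode(enum sketch_op *insts, size_t count)
{
   int current_mode = -1; /* unknown at block entry */
   size_t out = 0;

   for (size_t i = 0; i < count; i++) {
      const enum sketch_op op = insts[i];
      if (sketch_is_setnnmode(op)) {
         if ((int)op == current_mode)
            continue; /* same mode is already set: redundant */
         current_mode = (int)op;
      }
      insts[out++] = op;
   }
   return out;
}
```

A block with two unsigned dot products followed by two signed ones starts with four setnnmode instructions and ends with two, one for each mode change.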
Jose Maria Casanova Crespo
24ecc9cbcc broadcom/compiler: Add v8dot and setnnmode scheduler dependencies.
As the nnmode register is read by the v8dot instruction, we need to add
dependencies between setnnmode instructions and v8dot via the nnmode
register, so they are scheduled correctly using the last_nn_mode virtual
register.

Add a last_nn_mode virtual register to the scheduler state and create:
- Write dependencies for all SETNNMODE variants
- Read dependencies for V8DOT.

This follows the same pattern as the existing MULTOP/UMUL24 rtop tracking.

Assisted-by: Claude Opus 4.6
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41255>
2026-04-29 13:21:08 +00:00
Jose Maria Casanova Crespo
33a700be91 broadcom/compiler: hardware-accelerated 4x8-bit dot products on V3D 7.1+
VIR instructions and nir_to_vir implementation of 4x8-bit dot products
using native HW accelerated ALU instructions.

setnnmode instructions are marked as having side effects.

Assisted-by: Claude Opus 4.6
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41255>
2026-04-29 13:21:08 +00:00
Jose Maria Casanova Crespo
afe4e321e1 broadcom/compiler: Add V3D 7.1 v8dot dot product QPU instructions
Add QPU instruction definitions, metadata, and encoding for V3D 7.1
v8dot product instruction and the setnnmode instruction that allows
defining the signedness (UU/SU/US/SS) of the v8dot operation.

Assisted-by: Claude Opus 4.6
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41255>
2026-04-29 13:21:07 +00:00
Emma Anholt
3a8ff22336 ci: Delete references to various broken traces.
These are all being removed from the repos, so no need to leave the old
notes around.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40959>
2026-04-22 17:39:31 +00:00
Samuel Pitoiset
ebf2797da2 vulkan,treewide: stop passing vk_device to vk_pipeline_robustness_state_fill()
This will be helpful for RADV.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41029>
2026-04-21 17:29:04 +00:00