fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-06-21 18:58:22 +02:00

Author	SHA1	Message	Date
Juan A. Suarez Romero	df96a100ae	v3dv: fix assertion on push constants Fixes a compiler warning regarding the assertion. Fixes: `6d6a3ab679` ("v3dv: asserts push constants data is valid") Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42269>	2026-06-19 18:01:40 +00:00
Sid Pranjale	11c9032466	v3dv: use common multisync code Refactors the Vulkan submission path to gather incoming and outgoing sync dependencies into lightweight stack allocations before handing them off to the common initializer, preserving the exact original synchronization execution order and OOM semantics. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42177>	2026-06-17 08:58:08 +00:00
Sid Pranjale	5abc4e3e1a	v3dv: directly use v3d_has_feature instead of caps struct v3d_has_feature() function was also moved to the top of the file so get_device_extensions() and get_features() could use it Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42177>	2026-06-17 08:58:08 +00:00
Sid Pranjale	4fded54fd0	v3dv: replace single-field options struct with bool Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42177>	2026-06-17 08:58:08 +00:00
Jose Maria Casanova Crespo	cff8dbd452	v3dv: rename format_plane unorm/snorm flags to sw_unorm/sw_snorm Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42176>	2026-06-16 10:46:39 +02:00
Jose Maria Casanova Crespo	4edc659231	v3dv: route blending of UNORM16/SNORM16 RTs through software lowering UNORM16/SNORM16 render targets are backed by 16-bit-integer TLB formats, which V3D HW cannot blend. The compiler already supports software blend lowering in NIR, but V3DV only enabled it for dual-src blending. As a result format_supports_blending refused the BLEND_BIT for these formats and Dawn could not advertise the WebGPU Unorm16TextureFormats feature. Set pipeline->blend.use_software when any color attachment uses a software-normalised format so the existing NIR blend lowering kicks in, and expose VK_FORMAT_FEATURE_COLOR_ATTACHMENT_BLEND_BIT for those formats. Assisted-by: Claude Opus 4.7 Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42176>	2026-06-16 10:46:38 +02:00
Jose Maria Casanova Crespo	cdc6a0bfed	v3dv: allow TFU readahead padding above maxMemoryAllocationSize Our get_buffer/image_memory_requirements() pad TRANSFER_SRC resources with V3D_TFU_READAHEAD_SIZE, so allocating the reported requirements of a resource of exactly maxMemoryAllocationSize failed with VK_ERROR_OUT_OF_DEVICE_MEMORY. Accept up to one extra page over the limit: since the allocation size is page-aligned, that covers any sub-page padding. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42179>	2026-06-15 15:15:10 +00:00
Juan A. Suarez Romero	e8b5f93c31	v3dv: increase max push constants size There is no hardware restriction that limits the current size, it was selected manually. Increase it to 256 as this aligns more with other hardware, and this is the minimum requirement for Vulkan 1.4. Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com> Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42212>	2026-06-12 12:20:37 +00:00
Jose Maria Casanova Crespo	519f631e6b	v3dv: gate Dawn-required limits and features behind V3D_WEBGPU_OVERRIDE Exposes higher limits than the ones supported by the HW and several ArrayDynamicIndexing features not yet implemented so the Dawn WebGPU implementation can be used while it doesn't exercise these limits or features. The override is enabled using the V3D_WEBGPU_OVERRIDE=1 envvar. When it is enabled it: - Increases the framebuffer dimension limit from the real HW value (4096 on RPi4, 7680 on RPi5) to 8192. - Bumps the advertised maxMipLevels reported per format from 13 to 14 to match the bumped 8192-wide images and 15 for non-2D images. The TMU HW already supports that for sampling. - Increases max_varying_components from 64 to 72 (HW limit is 64). - Exposes features that are not actually implemented; CTS tests that exercise them will hit asserts in debug builds: - shaderUniformBufferArrayDynamicIndexing - shaderSampledImageArrayDynamicIndexing - shaderStorageBufferArrayDynamicIndexing - shaderStorageImageArrayDynamicIndexing - Increases maxImageDimension1D from 4096 to 16384 - Increases maxImageDimension2D from 4096 to 8192 - Increases maxImageDimension3D from 4096 to 16384 - Increases maxImageDimensionCube from 4096 to 16384 When V3D_WEBGPU_OVERRIDE is unset (the default), the driver advertises the real HW limits already set up by the preceding "use real HW framebuffer limit" change, so Vulkan CTS conformance is unaffected. To help diagnose applications that hit the over-advertised paths, mesa_loge errors are emitted from three places: - lower_vulkan_resource_index() warns before the existing UNREACHABLE for dynamic descriptor indexing, so the cause is visible in release builds where the assertion is compiled out. - create_image() warns when vkCreateImage is called with attachment usage and dimensions above the real HW framebuffer limit. Storage and sampled-only images above that limit work fine via the TMU. - job_compute_frame_tiling() errors when a render job width/height exceeds the real HW framebuffer limit. The per-plane slices[] array in struct v3dv_image is sized at V3D_MAX_MIP_LEVELS + 2 so the override case (which advertises 14/15 mip levels for the bumped 8192-wide 2D images and 16384 for 1D/3D images) still fits without enlarging the default array. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42117>	2026-06-10 15:22:27 +00:00
Jose Maria Casanova Crespo	0ae28c9056	broadcom: raise framebuffer size to 7680 on V3D 7.1 Add a new max_framebuffer_size to devinfo so V3D 4.2 and V3D 7.1 can expose different framebuffer dimensions: 4096 on RPi4 and 7680 on RPi5. This is bounded by the maximum clip size supported by the framebuffer. Take advantage of this to also raise maxImageDimensions* to max_framebuffer_size. A non-power-of-two framebuffer means framebuffer_size_for_pixel_count can compute a height larger than max_framebuffer_size. Clamp the height to the maximum and recompute the width from the division so w * h <= num_pixels. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42117>	2026-06-10 15:22:27 +00:00
Jose Maria Casanova Crespo	5242d4c171	broadcom: add and use max_render_targets to devinfo Use the new devinfo value instead of the V3D_MAX_RENDER_TARGETS macro. We only maintain the usage of the macro in devinfo initialization and the V3D in the versioned file src/gallium/drivers/v3d/v3dx_state.c Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42117>	2026-06-10 15:22:27 +00:00
Jose Maria Casanova Crespo	94abf86561	v3dv: set non-zero array stride in null texture descriptor state pack_null_texture_state(), introduced to support VK_KHR_robustness2 nullDescriptor for image bindings, left the TEXTURE_SHADER_STATE "Array Stride (64-byte aligned)" field at 0. On real V3D HW this is fine: a TMU read against a null descriptor returns zero regardless of the descriptor contents, but the V3D simulator validates the TMU array stride before issuing the read. Setting array_stride_64_byte_aligned = 1 (64 bytes raw) fixes the failing dEQP-VK.robustness.robustness2.bind..null_descriptor.samples_1.3d. test cases under the simulator. Fixes: `990d76eae6` ("v3dv: Implement and enable nullDescriptor support") Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42112>	2026-06-10 14:38:50 +00:00
Samuel Pitoiset	86406ca87d	v3dv: use drirc_gen Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41881>	2026-06-10 07:17:14 +00:00
Sid Pranjale	f6dd632b31	v3dv: drop legacy CPU queue fallback paths We now require kernel side CPU queue support (introduced via DRM_V3D_PARAM_SUPPORTS_CPU_QUEUE). If the underlying kernel lacks this support i.e. is older than kernel 6.8, physical device initialization will now fail. With this requirement guaranteed, we can remove the userspace fallback paths that manually managed and stalled on indirect CSD dispatches and query resets. Reviewed-by: Maíra Canal <mcanal@igalia.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42087>	2026-06-09 16:53:48 +00:00
Maíra Canal	37ef45a8c0	v3dv: Drop legacy comments about single-sync support After commit `16c96b0e93` ("v3dv: drop single sync kernel interface"), we no longer use V3DV_QUEUE_ANY. Therefore, drop it and also remove the legacy comments about single-sync support. Signed-off-by: Maíra Canal <mcanal@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42119>	2026-06-09 16:34:47 +00:00
Emma Anholt	b6661df5f0	vulkan: Enable GOOGLE_display_timing on KHR_display across multiple drivers. This covers some drivers which expose KHR_display and EXT_present_timing. Based on Emma Anholt's work from 2025, rebased on current Mesa 26.2-devel, tiny compile fixes and docs/features updates by Mario Kleiner. See MR 38472 for reference of Emma's work, based on Keith's work. Tested locally on AMD Polaris for radv, Intel Kabylake for anv, and on Mesa CI's VK-CTS VK_GOOGLE_display_timing test case for AMD radv, Intel anv, Qualcomm Adreno tu. Original code of Emma is Reviewed-by: Mario Kleiner <mario.kleiner.de@gmail.com> Update of docs/features.txt + new_features.txt updates is Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com> Reviewed-by: Hans-Kristian Arntzen <post@arntzen-software.no> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41168>	2026-06-05 10:21:51 +00:00
Jose Maria Casanova Crespo	28e584b687	v3dv: enable lowered shaderFloat16/Int16/Int8 + VK_KHR_shader_float16_int8 V3D 7.1 now exposes shaderFloat16, shaderInt8, shaderInt16 and VK_KHR_shader_float16_int8. Partial native Float16 support is already available. But the rest of sub-32-bit ALU operations are widened to 32-bit by nir_lower_bit_size in v3d_lower_nir(); conversion and pack operations are kept at their native bit width so the QPU's 16-bit pack/unpack paths on mul/mov can be used. Assisted-by: Claude Opus 4.7 Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41810>	2026-06-04 13:29:39 +00:00
Jose Maria Casanova Crespo	54de903ae4	v3dv: lower flrp16 for consistency with flrp32 flrp32 is already lowered; mirror it for flrp16 so V3D's f16 ALU path doesn't see an unsupported flrp@16 leftover after bit_size widening. No measurable test impact on the current f16 sweep, but matches the f32 behaviour and keeps the lowering surface consistent across bit sizes. Assisted-by: Claude Opus 4.7 Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41810>	2026-06-04 13:29:38 +00:00
Jose Maria Casanova Crespo	0a5200d051	v3d: move nir_lower_frexp after nir_lower_bit_size The frexp lowering decomposes frexp into bit manipulation (fabs, ushr, iand, ior) that relies on implicit float-to-int bit reinterpretation. When lowered at 16-bit, the subsequent nir_lower_bit_size pass widens float operations with f2f32 (changing the bit pattern to IEEE fp32) and integer operations with u2u32 (zero-extending 16-bit bits). This breaks the reinterpretation: ushr on the fabs result gets f2f32-widened float bits instead of the original fp16 bit pattern, causing the sign bit to leak into the exponent extraction for negative inputs. Move nir_lower_frexp into v3d_lower_nir, after nir_lower_bit_size. This way frexp decomposition operates at 32-bit where float and integer operations share the same bit width, and the bit manipulation masks use the correct IEEE fp32 constants. Assisted-by: Claude Opus 4.7 Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41810>	2026-06-04 13:29:38 +00:00
Jose Maria Casanova Crespo	03dee27f48	v3dv: expose the full simulator memory to applications On real hardware compute_heap_size() reserves a fraction of total_ram for the rest of the system and compute_memory_budget() reports at most 90% of the available memory, both because that RAM is shared between the GPU and the CPU. In simulator mode the memory is instead a dedicated GPU pool allocated by the simulator, so these reservations just hid memory: although we allocate 1 GiB for the simulator, only 512 MiB was exposed as the heap and as the budget. Expose the full simulator allocation as both the heap size and the budget. The simulator never allocates more than the 4 GiB the GPU MMU can address, which we assert. Before: memoryHeaps[0]: size = 536870912 (0x20000000) (512.00 MiB) budget = 536870912 (0x20000000) (512.00 MiB) After: memoryHeaps[0]: size = 1073741824 (0x40000000) (1024.00 MiB) budget = 1073725536 (0x3fffc060) (1023.98 MiB) Assisted-by: Claude Opus 4.8 Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41898>	2026-06-04 09:35:38 +00:00
Daivik Bhatia	990d76eae6	v3dv: Implement and enable nullDescriptor support Handle null descriptors by emitting zeroed descriptor state. When the nullDescriptor feature is enabled, a dedicated null_bo is allocated. Null image descriptors now pack a TEXTURE_SHADER_STATE whose base address points to this BO, ensuring that the TMU reads from valid memory. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40485>	2026-06-02 22:29:00 +00:00
Karmjit Mahil	72736c621a	v3dv: Add heap_memory_percent driconf support This also introduces a new tier since the common helper exposes 25% of memory as heap on devices with <=1GiB memory. Previously 50% was being used. This also fixes `device->heap_used` not using atomic read. Signed-off-by: Karmjit Mahil <karmjit.mahil@igalia.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41242>	2026-06-01 17:32:50 +00:00
Sid Pranjale	020a6bc282	vulkan: implement VK_EXT_debug_marker Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32722>	2026-06-01 15:31:38 +00:00
Utku Iseri	2263576f59	v3dv: close display_fd on incompatible_driver path Currently, display_fd gets leaked during vulkan loader driver probing on platforms where there's no v3dv device, as nothing closes this fd before returning with INCOMPATIBLE_DRIVER. As the display_fd also holds MASTER, this in turn prevents the actual driver from becoming master on the display node. Close the fd before returning to prevent this. Fixes: `bb532a7a` ("v3dv: Fix assertion failure for not-found primary_fd during enumeration.") Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41058>	2026-05-29 19:55:33 +00:00
Juan A. Suarez Romero	5a9e40f028	v3dv: disable threaded submissions under drm-shim Threaded submit relies on DRM syncobj wait ioctls blocking until the GPU signals completion. Under drm-shim there is no real GPU, so SYNCOBJ_WAIT returns immediately, creating a race between the submit thread and vkQueueWaitIdle that leads to use-after-free crashes. Detect if we are running under drm-shim by checking the DRM version description, and skip enabling threaded submit in that case. Assisted-by: Cursor Agent (Opus 4.6) Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41779>	2026-05-27 10:19:51 +00:00
Juan A. Suarez Romero	1eae5ca94f	v3dv: allow device with only render node When using drm-shim there is no primary node for the driver. This is fine, and hence we only mark that we don't have primary device. This fixes using v3dv with drm-shim. Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41779>	2026-05-27 10:19:50 +00:00
Jose Maria Casanova Crespo	ae604b4bdd	v3dv: share zero-fill TFU staging BO at device level The TFU stride-0 fill path allocates a 64 KiB staging BO (V3D_TFU_MAX_DIM * cpp = 16384 * 4), maps it, fills it with the pattern, and caches it on the command buffer. For non-zero patterns the per-cmd-buffer cache works well, but WebGPU/Dawn workloads issue many zero-fills (lazy buffer init) across separate command buffers, so the cache misses almost every time and each fill pays for a fresh alloc + mmap + memcpy. Add a device-wide staging BO held in v3dv_device::meta.tfu_fill_zero, lazily allocated under meta.mtx and used whenever data == 0. The BO is read-only after init so it can be shared across queues without extra synchronization, and it is freed in destroy_device_meta. Measured on a Dawn/WebGPU zero-fill-heavy workload (RPi5, ~60 meta_fill_buffer calls, ~218 MiB total, all zero-fills): before: TFU branch total 7.328 ms, avg 115.55 us/call after: TFU branch total 0.296 ms, avg 4.78 us/call (~24x) Non-zero patterns continue to use the per-cmd-buffer cache. Assisted-by: Claude Opus 4.7 Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41725>	2026-05-26 07:50:45 +00:00
Jose Maria Casanova Crespo	2a62490fa7	v3dv: relax buffer padding in TFU buffer<->image copy Adjust eligibility check on imageExtent vs slice dimensions rather than on the buffer addressing dimensions. The TFU codepath here always writes/reads the full slice from its origin, so the required invariant is 'imageExtent == slice'; bufferRowLength and bufferImageHeight may be larger than imageExtent (the spec permits this for non-zero values), in which case the TFU reads/writes at the buffer's row/layer stride but only touches slice->width pixels per row and slice->height rows per layer, leaving the trailing padding untouched. The previous combined check (width == slice->width && height == slice->height applied to the buffer dimensions) would reject any caller that set bufferRowLength or bufferImageHeight larger than the image (this is common for buffers shared across mip levels or for alignment requirements like Dawn aligning bufferRowLength to 2 for 1-pixel-wide textures), forcing those copies through the slower TLB / blit / compute paths. For compressed formats, keep the strict equality check since block-level stride semantics are more complex. Assisted-by: Claude Opus 4.7 Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41725>	2026-05-26 07:50:44 +00:00
Jose Maria Casanova Crespo	99bce54daa	v3dv: implement TFU image-to-buffer copy on V3D 7.1 Generalize copy_buffer_image_tfu with a to_buffer flag selecting which side is the raster destination, and wire it into v3dv_CmdCopyImageToBuffer2 before the TLB path. The to_buffer=true direction has the same eligibility constraints as buffer-to-image, except that V3D 4.2 is unsupported as its TFU cannot produce raster output, and for image-to-buffer the destination is always a raster buffer. Assisted-by: Claude Opus 4.7 Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41725>	2026-05-26 07:50:44 +00:00
Jose Maria Casanova Crespo	0054ff2cb7	v3dv: rename copy_buffer_to_image_tfu to copy_buffer_image_tfu Drop the direction from the function name in preparation for sharing this implementation with image-to-buffer copies in the next commit. Pure rename, no functional change. Assisted-by: Claude Opus 4.7 Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41725>	2026-05-26 07:50:43 +00:00
Jose Maria Casanova Crespo	43ddd0c96f	v3dv: extract TFU helpers for format-plane and slice-stride args Assisted-by: Claude Opus 4.7 Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41725>	2026-05-26 07:50:43 +00:00
Jose Maria Casanova Crespo	8e294e6aee	v3dv: use TFU copy with stride-0 for vkCmdFillBuffer Replace the TLB-based meta_fill_buffer path on V3D 7.1+ with a TFU raster-to-raster copy that broadcasts a single staging row across the output via iis=0 (stride-0 input). This eliminates the per-fill CL render job and its tile_alloc/TSDA BO overhead, which is substantial on workloads that issue many small fills (e.g. WebGPU lazy buffer initialization in Dawn). The staging BO holding one row of the fill pattern is cached on the command buffer and reused across fills with the same data value, so sequences of identical-pattern fills share a single staging BO. The existing TLB-based fill is kept as a fallback and is also used when V3D_DEBUG=disable_tfu is set, or on V3D simulator builds where the stride-0 TFU input mode is not supported and would assert. Assisted-by: Claude Opus 4.7 Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41725>	2026-05-26 07:50:43 +00:00
Jose Maria Casanova Crespo	ed9fea6045	v3dv: move destroy_update_buffer_cb to a generic helper Move from v3dv_meta_copy.c to a generic v3dv_cmd_buffer_destroy_bo_cb in the cmd buffer module. This makes it reusable for different callers that want to attach a v3dv_bo to a command buffer's private_objs list. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41725>	2026-05-26 07:50:42 +00:00
Jose Maria Casanova Crespo	9b131eb86e	v3dv: Enable meta_copy_buffer with TFU for V3D 7.1 Buffer-to-buffer copies on V3D 7.1+ can be served by the TFU as a raster-to-raster copy, avoiding the per-copy CL render job and tile_alloc/TSDA BO overhead of the TLB-based path. Treat the buffer as a raster texture and chunk the copy into TFU jobs of up to 16384x16384 pixels. Pick the largest pixel size (cpp in {4,2,1}) such that src/dst offsets and size are all cpp-aligned: cpp=4 (R8G8B8A8_UINT) is the expected common case; cpp=2 (R8G8_UINT) and cpp=1 (R8_UINT) handle Vulkan-permitted unaligned vkCmdCopyBuffer regions that would otherwise fall back to the slow TLB path. Skipped when V3D_DEBUG=disable_tfu is set; emits perf_debug when the cpp=1/2 fallback is taken. Drop the `if (copy_job)` guard on src_bo cleanup registration in v3dv_CmdUpdateBuffer: the TFU path queues jobs without returning a v3dv_job*, so the staging BO must be tracked unconditionally to avoid leaking once the cmd buffer is submitted. Assisted-by: Claude Opus 4.7 Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41725>	2026-05-26 07:50:42 +00:00
Lishin	c41f88fb35	v3d/v3dv: use common compute limits Move the compute workgroup count and shared memory limits shared by v3d and v3dv to v3d_limits.h. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41791>	2026-05-26 07:13:22 +01:00
Samuel Pitoiset	54b71e9e77	util: pass a struct to driParseConfigFiles() It would be easier to add more functionalities like shader hashes etc. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41657>	2026-05-19 19:51:45 +00:00
Karol Herbst	e9c1cce35f	nir: remove ffma_old Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41165>	2026-05-19 18:13:42 +00:00
Jose Maria Casanova Crespo	e40989f451	v3dv: advertise VK_EXT_scalar_block_layout on V3D 7.1+ The scalarBlockLayout feature was already exposed via the Vulkan 1.2 feature struct, but Vulkan 1.1 clients (e.g. Dawn) need the EXT to discover it. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41673>	2026-05-19 14:24:54 +00:00
Jose Maria Casanova Crespo	cd9f2648d3	v3dv: avoid 16F TLB usage for B10G11R11_UFLOAT copies B10G11R11_UFLOAT_PACK32 maps to V3D_INTERNAL_TYPE_16F on the TLB, which canonicalizes NaN bit patterns when arbitrary 32 bits are reinterpreted as that format. The same canonicalization happens in the blit shader when sampling a B10G11R11 source. Both break the bit-exactness that vkCmdCopyImage, vkCmdCopyImageToBuffer and vkCmdCopyBufferToImage require, since the spec defines them as raw byte copies for any pair of texel-size compatible formats. Fix it by aliasing the format to R32_UINT whenever B10G11R11 is involved. This fixes dEQP-VK.api.copy_and_blit.b10g11r11, dEQP-VK.image.subresource_layout.b10g11r11 and dEQP-VK.api.image_clearing.b10g11r11 failures on V3D 7.1.7 (rpi5) and V3D 4.2 (rpi4). Assisted-by: Claude Opus 4.7 Cc: mesa-stable Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41599>	2026-05-19 13:29:35 +00:00
Jose Maria Casanova Crespo	14b8d02130	v3dv: assert timestamp pool BO is disjoint from dst buffer BO The two BOs come from disjoint allocations nowadays, so they would never share the BO handle. In case this becomes false in the future, the BO handles need to be de-duped as happens with TFU submissions. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41616>	2026-05-18 12:26:55 +00:00
Jose Maria Casanova Crespo	87a0eac718	v3dv: avoid duplicate bo_handles between cpu_job and CSD lists v3d_submit_cpu_ioctl() takes a separate ww_acquire_ctx for the cpu_job's bo_handles[] and any embedded CSD's bo_handles[]; a BO appearing in both lists makes the second lock wait on a reservation held by the first context, deadlocking the ioctl. We avoid adding a duplicate BO handle when it's already in the cpu_job's list. This collided when an app suballocates an indirect VkBuffer and a CSD bind-group VkBuffer out of one VkDeviceMemory. Fixes: `e404ccba5b` ("v3dv: use the indirect CSD user extension") Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41616>	2026-05-18 12:26:55 +00:00
Jose Maria Casanova Crespo	e3ff5d6cdb	v3dv: expose maxFragmentOutputAttachments as max_rts V3DV hardcoded maxFragmentOutputAttachments to 4, from V3D 4.x when V3D_MAX_RENDER_TARGETS was 4. On V3D 7.x (RPi5) V3D_MAX_RENDER_TARGETS is 8. WebGPU's mandatory maxColorAttachments minimum is 8, and wgpu computes max_color_attachments as min(maxColorAttachments, maxFragmentOutputAttachments). With the previous value V3DV capped WebGPU clients to 4 color attachments on RPi5. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41600>	2026-05-18 11:45:10 +00:00
Jose Maria Casanova Crespo	e1c03cb4f6	v3dv: Enable KHR_shader_subgroup_extended_types This extension is part of Vulkan 1.2 core and the feature is already exposed; we just weren't advertising the extension separately. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41624>	2026-05-18 11:26:11 +00:00
Jose Maria Casanova Crespo	8bd7f1d44b	v3dv: include mem_offset in vkCmdFillBuffer destination v3dv_CmdFillBuffer was passing only the user-supplied dstOffset to meta_fill_buffer, ignoring the destination VkBuffer's mem_offset. When several VkBuffers share one VkDeviceMemory at different offsets (sub-allocation) the fill landed on whichever VkBuffer was bound at offset 0 of the memory object instead of the requested one. Fixes: `5ed78d91fe` ("v3dv: implement vkCmdFillBuffer") Assisted-by: Claude Opus 4.7 Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41436>	2026-05-11 10:49:20 +02:00
Roman Stratiienko	60fdab22a5	v3dv: Emulate multi-queue support via vk_queue for Android Android14+ relies on at least 2 queues for vulkan skia/UI rendering. More explained [here][1] [1]: https://gitlab.freedesktop.org/mesa/mesa/-/work_items/11326 Signed-off-by: Roman Stratiienko <r.stratiienko@gmail.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41213>	2026-05-05 07:03:08 +00:00
Roman Stratiienko	16526e451e	v3dv: move noop_job creation to device scope Preparation step for multiple queue emulation support Signed-off-by: Roman Stratiienko <r.stratiienko@gmail.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41213>	2026-05-05 07:03:07 +00:00
Jose Maria Casanova Crespo	d95076e581	v3dv: lower oversized compute workgroups to 256 invocations V3D advertises maxComputeWorkGroupInvocations = 256 but ggml-vulkan in many cases ignores this limit and creates compute pipelines that exceed it. Although this is a bug in the application we can take advantage of nir_lower_workgroup_size and make the application work. This issue was causing an assertion failure at nir_to_vir.c: assert(c->local_invocation_index_bits <= 8); The solution is lowering the oversized workgroups to a 256-invocation workgroup loop, like radv and radeonsi are doing on GFX7, by running nir_lower_workgroup_size(256) for this scenario. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41257>	2026-04-30 13:59:19 +00:00
Jose Maria Casanova Crespo	c3ba5effe2	v3d/v3dv: Use new V3D_MAX_CSD_WG_SIZE = 256 Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41257>	2026-04-30 13:59:18 +00:00
Jose Maria Casanova Crespo	e378a7d773	v3dv: bump maxComputeSharedMemorySize to 32 KB Currently local shared memory is backed by a BO that is read/written using the TMU. ggml-vulkan probes the size of maxComputeSharedMemorySize and rejects V3DV (falling back to CPU) when the value is below what its larger compute pipelines request, although in the end the shaders ollama runs don't actually use shared memory. 32 KB is what ggml-vulkan demands; the value can grow further with no real per-op cost since shared memory currently goes through the TMU like any other BO. V3D OpenGL driver also has 32 KB for SharedMemory. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41257>	2026-04-30 13:59:18 +00:00
Roman Stratiienko	bdbf4ed739	v3dv/android: Add deferred ANB allocation support Fixes: dEQP-VK.wsi.android.maintenance1.deferred_alloc.mailbox#basic dEQP-VK.wsi.android.maintenance1.deferred_alloc.mailbox#bind_image dEQP-VK.wsi.android.maintenance1.deferred_alloc.fifo#basic dEQP-VK.wsi.android.maintenance1.deferred_alloc.fifo#bind_image Signed-off-by: Roman Stratiienko <r.stratiienko@gmail.com> Acked-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41235>	2026-04-29 15:31:28 +00:00

1781 commits