fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-06-12 05:18:18 +02:00

Author	SHA1	Message	Date
Daivik Bhatia	990d76eae6	v3dv: Implement and enable nullDescriptor support Handle null descriptors by emitting zeroed descriptor state. When the nullDescriptor feature is enabled, a dedicated null_bo is allocated. Null image descriptors now pack a TEXTURE_SHADER_STATE whose base address points to this BO, ensuring that the TMU reads from valid memory. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40485>	2026-06-02 22:29:00 +00:00
Daivik Bhatia	6aed7d2988	broadcom/compiler: add support for null descriptors Add `v3d_nir_lower_null_descriptors` NIR pass to bypass operations if the descriptor size is zero, returning 0 where necessary. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40485>	2026-06-02 22:29:00 +00:00
Karmjit Mahil	72736c621a	v3dv: Add heap_memory_percent driconf support This also introduces a new tier since the common helper exposes 25% of memory as heap on devices with <=1GiB memory. Previously 50% was being used. This also fixes `device->heap_used` not using atomic read. Signed-off-by: Karmjit Mahil <karmjit.mahil@igalia.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41242>	2026-06-01 17:32:50 +00:00
Sid Pranjale	020a6bc282	vulkan: implement VK_EXT_debug_marker Some checks are pending macOS-CI / macOS-CI (dri) (push) Waiting to run Details macOS-CI / macOS-CI (xlib) (push) Waiting to run Details Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32722>	2026-06-01 15:31:38 +00:00
Utku Iseri	2263576f59	v3dv: close display_fd on incompatible_driver path Some checks are pending macOS-CI / macOS-CI (dri) (push) Waiting to run Details macOS-CI / macOS-CI (xlib) (push) Waiting to run Details Currently, display_fd gets leaked during vulkan loader driver probing on platforms where there's no v3dv device, as nothing closes this fd before returning with INCOMPATIBLE_DRIVER. As the display_fd also holds MASTER, this in turn prevents the actual driver from becoming master on the display node. Close the fd before returning to prevent this. Fixes: `bb532a7a` ("v3dv: Fix assertion failure for not-found primary_fd during enumeration.") Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41058>	2026-05-29 19:55:33 +00:00
Juan A. Suarez Romero	5a9e40f028	v3dv: disable threadeded submissions under drm-shim Threaded submit relies on DRM syncobj wait ioctls blocking until the GPU signals completion. Under drm-shim there is no real GPU, so SYNCOBJ_WAIT returns immediately, creating a race between the submit thread and vkQueueWaitIdle that leads to use-after-free crashes. Detect if we are running under drm-shim by checking the DRM version description, skip enabling threaded submit in that case. Assisted-by: Cursor Agent (Opus 4.6) Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41779>	2026-05-27 10:19:51 +00:00
Juan A. Suarez Romero	788979594e	v3d/drm-shim: add GPU selection So far with drm-shim we were always emulating V3D 4.2. Now we always emulate V3D 7.1, but we allow selecting 4.2 through an envvar: `V3D_GPU_ID=(42\|71)` Borrowed from etnaviv. Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41779>	2026-05-27 10:19:50 +00:00
Juan A. Suarez Romero	1eae5ca94f	v3dv: allow device with only render node When using drm-shim there is no primary node for the driver. This is fine, and hence we only mark that we don't have primary device. This fixes using v3dv with drm-shim. Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41779>	2026-05-27 10:19:50 +00:00
Juan A. Suarez Romero	6f8a692abc	broadcom/simulator: V3D is always 4.2 or above This is a leftover from when we were supporting V3D 4.1 or below. Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41779>	2026-05-27 10:19:50 +00:00
Jose Maria Casanova Crespo	ae604b4bdd	v3dv: share zero-fill TFU staging BO at device level The TFU stride-0 fill path allocates a 64 KiB staging BO (V3D_TFU_MAX_DIM * cpp = 16384 * 4), maps it, fills it with the pattern, and caches it on the command buffer. For non-zero patterns the per-cmd-buffer cache works well, but WebGPU/Dawn workloads issue many zero-fills (lazy buffer init) across separate command buffers, so the cache misses almost every time and each fill pays for a fresh alloc + mmap + memcpy. Add a device-wide staging BO held in v3dv_device::meta.tfu_fill_zero, lazily allocated under meta.mtx and used whenever data == 0. The BO is read-only after init so it can be shared across queues without extra synchronization, and it is freed in destroy_device_meta. Measured on a Dawn/WebGPU zero-fill-heavy workload (RPi5, ~60 meta_fill_buffer calls, ~218 MiB total, all zero-fills): before: TFU branch total 7.328 ms, avg 115.55 us/call after: TFU branch total 0.296 ms, avg 4.78 us/call (~24x) Non-zero patterns continue to use the per-cmd-buffer cache. Assisted-by: Claude Opus 4.7 Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41725>	2026-05-26 07:50:45 +00:00
Jose Maria Casanova Crespo	2a62490fa7	v3dv: relax buffer padding in TFU buffer<->image copy Adjust eligibility check on imageExtent vs slice dimensions rather than on the buffer addressing dimensions. The TFU codepath here always writes/reads the full slice from its origin, so the required invariant is 'imageExtent == slice'; bufferRowLength and bufferImageHeight may be larger than imageExtent (the spec permits this for non-zero values), in which case the TFU reads/writes at the buffer's row/layer stride but only touches slice->width pixels per row and slice->height rows per layer, leaving the trailing padding untouched. The previous combined check (width == slice->width && height == slice->height applied to the buffer dimensions) would reject any caller that set bufferRowLength or bufferImageHeight larger than the image (this is common for buffers shared across mip levels or for alignment requirements like Dawn aligning bufferRowLength to 2 for 1-pixel-wide textures), forcing those copies through the slower TLB / blit / compute paths. For compressed formats, keep the strict equality check since block-level stride semantics are more complex. Assisted-by: Claude Opus 4.7 Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41725>	2026-05-26 07:50:44 +00:00
Jose Maria Casanova Crespo	99bce54daa	v3dv: implement TFU image-to-buffer copy on V3D 7.1 Generalize copy_buffer_image_tfu with a to_buffer flag selecting which side is the raster destination, and wire it into v3dv_CmdCopyImageToBuffer2 before the TLB path. The to_buffer=true direction has the same eligibility constraints as buffer-to-image, except that V3D 4.2 is unsupported as its TFU cannot produce raster output, and for image-to-buffer the destination is always a raster buffer. Assisted-by: Claude Opus 4.7 Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41725>	2026-05-26 07:50:44 +00:00
Jose Maria Casanova Crespo	0054ff2cb7	v3dv: rename copy_buffer_to_image_tfu to copy_buffer_image_tfu Drop the direction from the function name in preparation for sharing this implementation with image-to-buffer copies in the next commit. Pure rename, no functional change. Assisted-by: Claude Opus 4.7 Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41725>	2026-05-26 07:50:43 +00:00
Jose Maria Casanova Crespo	43ddd0c96f	v3dv: extract TFU helpers for format-plane and slice-stride args Assisted-by: Claude Opus 4.7 Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41725>	2026-05-26 07:50:43 +00:00
Jose Maria Casanova Crespo	8e294e6aee	v3dv: use TFU copy with stride-0 for vkCmdFillBuffer Replace the TLB-based meta_fill_buffer path on V3D 7.1+ with a TFU raster-to-raster copy that broadcasts a single staging row across the output via iis=0 (stride-0 input). This eliminates the per-fill CL render job and its tile_alloc/TSDA BO overhead, which is substantial on workloads that issue many small fills (e.g. WebGPU lazy buffer initialization in Dawn). The staging BO holding one row of the fill pattern is cached on the command buffer and reused across fills with the same data value, so sequences of identical-pattern fills share a single staging BO. The existing TLB-based fill is kept as a fallback and is also used when V3D_DEBUG=disable_tfu is set, or on V3D simulator builds where the stride-0 TFU input mode is not supported and would assert. Assisted-by: Claude Opus 4.7 Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41725>	2026-05-26 07:50:43 +00:00
Jose Maria Casanova Crespo	ed9fea6045	v3dv: move destroy_update_buffer_cb to a generic helper Move from v3dv_meta_copy.c to a generic v3dv_cmd_buffer_destroy_bo_cb in the cmd buffer module. This makes it reusable for different callers that want to attach a v3dv_bo to a command buffer's private_objs list. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41725>	2026-05-26 07:50:42 +00:00
Jose Maria Casanova Crespo	9b131eb86e	v3dv: Enable meta_copy_buffer with TFU for V3D 7.1 Buffer-to-buffer copies on V3D 7.1+ can be served by the TFU as a raster-to-raster copy, avoiding the per-copy CL render job and tile_alloc/TSDA BO overhead of the TLB-based path. Treat the buffer as a raster texture and chunk the copy into TFU jobs of up to 16384x16384 pixels. Pick the largest pixel size (cpp in {4,2,1}) such that src/dst offsets and size are all cpp-aligned: cpp=4 (R8G8B8A8_UINT) is the expected common case; cpp=2 (R8G8_UINT) and cpp=1 (R8_UINT) handle Vulkan-permitted unaligned vkCmdCopyBuffer regions that would otherwise fall back to the slow TLB path. Skipped when V3D_DEBUG=disable_tfu is set; emits perf_debug when the cpp=1/2 fallback is taken. Drop the `if (copy_job)` guard on src_bo cleanup registration in v3dv_CmdUpdateBuffer: the TFU path queues jobs without returning a v3dv_job*, so the staging BO must be tracked unconditionally to avoid leaking once the cmd buffer is submitted. Assisted-by: Claude Opus 4.7 Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41725>	2026-05-26 07:50:42 +00:00
Lishin	c41f88fb35	v3d/v3dv: use common compute limits Move the compute workgroup count and shared memory limits shared by v3d and v3dv to v3d_limits.h. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41791>	2026-05-26 07:13:22 +01:00
Juan A. Suarez Romero	3df406633e	vc4/ci: update expected results Add new failures and timeouts. Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41778>	2026-05-25 10:22:12 +02:00
Collabora's Gfx CI Team	18ba81e5b6	Uprev Piglit to 6fd29fe44f8857b876a67bee962919635f22ecc8 Some checks are pending macOS-CI / macOS-CI (dri) (push) Waiting to run Details macOS-CI / macOS-CI (xlib) (push) Waiting to run Details `11ce9eb56e...6fd29fe44f` Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40989>	2026-05-20 21:37:44 +00:00
Juan A. Suarez Romero	a1c371517c	broadcom/ci: update kernel for nightly runs Update the RPi kernel used in CI-Tron. Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41698>	2026-05-20 08:58:37 +00:00
Samuel Pitoiset	54b71e9e77	util: pass a struct to driParseConfigFiles() It would be easier to add more functionalities like shader hashes etc. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41657>	2026-05-19 19:51:45 +00:00
Karol Herbst	e9c1cce35f	nir: remove ffma_old Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41165>	2026-05-19 18:13:42 +00:00
Jose Maria Casanova Crespo	e40989f451	v3dv: advertise VK_EXT_scalar_block_layout on V3D 7.1+ The scalarBlockLayout feature was already exposed via the Vulkan 1.2 feature struct, but Vulkan 1.1 clients (e.g. Dawn) need the EXT to discover it. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41673>	2026-05-19 14:24:54 +00:00
Jose Maria Casanova Crespo	cd9f2648d3	v3dv: avoid 16F TLB usage for B10G11R11_UFLOAT copies B10G11R11_UFLOAT_PACK32 maps to V3D_INTERNAL_TYPE_16F on the TLB, which canonicalizes NaN bit patterns when arbitrary 32 bits are reinterpreted as that format. The same canonicalization happens in the blit shader when sampling a B10G11R11 source. Both break the bit-exactness that vkCmdCopyImage, vkCmdCopyImageToBuffer and vkCmdCopyBufferToImage require, since the spec defines them as raw byte copies for any pair of texel-size compatible formats. Fix it by aliasing the format to R32_UINT whenever B10G11R11 is involved. This fixes dEQP-VK.api.copy_and_blit.b10g11r11, dEQP-VK.image.subresource_layout.b10g11r11 and dEQP-VK.api.image_clearing.b10g11r11 failures on V3D 7.1.7 (rpi5) and V3D 4.2 (rpi4). Assisted-by: Claude Opus 4.7 Cc: mesa-stable Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41599>	2026-05-19 13:29:35 +00:00
Jose Maria Casanova Crespo	bd496a6aad	broadcom/ci: skip SSBO tests close to the 60s threshold on rpi4 Some checks are pending macOS-CI / macOS-CI (dri) (push) Waiting to run Details macOS-CI / macOS-CI (xlib) (push) Waiting to run Details Add them to the existing "Slow CTS tests" group in broadcom-rpi4-skips.txt. Assisted-by: Claude Opus 4.7 Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41615>	2026-05-18 15:42:24 +00:00
Jose Maria Casanova Crespo	14b8d02130	v3dv: assert timestamp pool BO is disjoint from dst buffer BO The two BOs come from distjoint allocation nowadays. So they would never share the BO handle. In case this becomes false in the future, the BO hanldes needs to be de-duped as happens with TFU submisions. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41616>	2026-05-18 12:26:55 +00:00
Jose Maria Casanova Crespo	87a0eac718	v3dv: avoid duplicate bo_handles between cpu_job and CSD lists v3d_submit_cpu_ioctl() takes a separate ww_acquire_ctx for the cpu_job's bo_handles[] and any embedded CSD's bo_handles[]; a BO appearing in both lists makes the second lock wait on a reservation held by the first context, deadlocking the ioctl. We avoid adding a duplicate BO handle when it's already in the cpu_job's list. This collided when an app suballocates an indirect VkBuffer and a CSD bind-group VkBuffer out of one VkDeviceMemory. Fixes: `e404ccba5b` ("v3dv: use the indirect CSD user extension") Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41616>	2026-05-18 12:26:55 +00:00
Jose Maria Casanova Crespo	e3ff5d6cdb	v3dv: expose maxFragmentOutputAttachments as max_rts V3DV hardcoded maxFragmentOutputAttachments to 4, from V3D 4.x when V3D_MAX_RENDER_TARGETS was 4. On V3D 7.x (RPi5) V3D_MAX_RENDER_TARGETS is 8. WebGPU's mandatory maxColorAttachments minimum is 8, and wgpu computes max_color_attachments as min(maxColorAttachments, maxFragmentOutputAttachments). With the previous value V3DV capped WebGPU clients to 4 color attachments on RPi5. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41600>	2026-05-18 11:45:10 +00:00
Jose Maria Casanova Crespo	e1c03cb4f6	v3dv: Enable KHR_shader_subgroup_extended_types Some checks are pending macOS-CI / macOS-CI (dri) (push) Waiting to run Details macOS-CI / macOS-CI (xlib) (push) Waiting to run Details This extension is part of Vulkan 1.2 core and the feature is already exposed; we just weren't advertising the extension separately. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41624>	2026-05-18 11:26:11 +00:00
Juan A. Suarez Romero	f763ed0ae8	v3d/ci: add OpenCL regressions Some checks are pending macOS-CI / macOS-CI (dri) (push) Waiting to run Details macOS-CI / macOS-CI (xlib) (push) Waiting to run Details Plus a flake. Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41575>	2026-05-14 09:22:35 +00:00
Emma Anholt	76c39acad7	ci: Update VK CTS to 1.4.5.3 with fixes. Some checks are pending macOS-CI / macOS-CI (dri) (push) Waiting to run Details macOS-CI / macOS-CI (xlib) (push) Waiting to run Details I've pulled in a pile of changes to reduce the overhead (runtime and memory) when sharding for deqp-runner, along with a bunch of fixes for KHR_display testing that we recently enabled, plus a few others that affect our drivers. The big new set of failures looks like it's from more complete coverage of blitting between formats. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41243>	2026-05-14 04:22:21 +00:00
Jose Maria Casanova Crespo	8bd7f1d44b	v3dv: include mem_offset in vkCmdFillBuffer destination v3dv_CmdFillBuffer was passing only the user-supplied dstOffset to meta_fill_buffer, ignoring the destination VkBuffer's mem_offset. When several VkBuffers share one VkDeviceMemory at different offsets (sub-allocation) the fill landed on whichever VkBuffer was bound at offset 0 of the memory object instead of the requested one. Fixes: `5ed78d91fe` ("v3dv: implement vkCmdFillBuffer") Assisted-by: Claude Opus 4.7 Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41436>	2026-05-11 10:49:20 +02:00
Daivik Bhatia	a8c7a25fb1	broadcom/compiler: Add explicit NOP instruction at page boundaries The QPU prefetches the next instruction during shader execution. If the shader assembly size perfectly aligns with a page boundary, the prefetching mechanism reads past the compiled boundary, leading to an MMU error. This commit insert an explicit NOP instruction at the end of the shader and increases the qpu_inst_count by one when the instruction count exactly hits a page boundary. This ensures we don't fall off the end of the last executable instruction page and into invalid memory. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40983>	2026-05-06 09:21:28 +00:00
Lishin	c50210ba7c	broadcom/qpu: add V3D 7.1 disasm tests Add QPU disassembler tests for V3D 7.1, covering small immediates in both add and mul slots, as well as setnnmode_uu paired with v8dot. Assisted-by: OpenAI Codex Acked-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41280>	2026-05-06 07:33:42 +00:00
Roman Stratiienko	60fdab22a5	v3dv: Emulate multi-queue support via vk_queue for Android Some checks are pending macOS-CI / macOS-CI (dri) (push) Waiting to run Details macOS-CI / macOS-CI (xlib) (push) Waiting to run Details Android14+ relies on at least 2 queues for vulkan skia/UI rendering. More explained [here][1] [1]: https://gitlab.freedesktop.org/mesa/mesa/-/work_items/11326 Signed-off-by: Roman Stratiienko <r.stratiienko@gmail.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41213>	2026-05-05 07:03:08 +00:00
Roman Stratiienko	16526e451e	v3dv: move noop_job creation to device scope Preparation step for multiple queue emulation support Signed-off-by: Roman Stratiienko <r.stratiienko@gmail.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41213>	2026-05-05 07:03:07 +00:00
Yiwei Zhang	26c870f173	broadcom: remove unused Android log utils These are leftovers from https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40434 Acked-by: Valentine Burley <valentine.burley@collabora.com> Reviewed-by: Dhruv Mark Collins <mark@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41254>	2026-05-01 20:23:23 +00:00
Jose Maria Casanova Crespo	d95076e581	v3dv: lower oversized compute workgroups to 256 invocations V3D advertises maxComputeWorkGroupInvocations = 256 but ggml-vulkan in many cases ignores this limit an creates compute pipelines with over this limit. Although this is a bug in the application we can take advantage of nir_lower_workgroup_size and make the application work. This issue was causing an assertion failure at nir_to_vir.c: assert(c->local_invocation_index_bits <= 8); The solution is lowering the oversized workgroups to a 256-invocation workgroup loop, like radv and radeonsi are doing on GFX7, by running nir_lower_workgroup_size(256) for this scenario. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41257>	2026-04-30 13:59:19 +00:00
Jose Maria Casanova Crespo	c3ba5effe2	v3d/v3dv: Use new V3D_MAX_CSD_WG_SIZE = 256 Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41257>	2026-04-30 13:59:18 +00:00
Jose Maria Casanova Crespo	e378a7d773	v3dv: bump maxComputeSharedMemorySize to 32 KB Currently local shared memory is backed by a BO that is read/written using the TMU. ggml-vulkan probes the size of maxComputeSharedMemorySize and rejects V3DV (falling back to CPU) when the value is below what its larger compute pipelines request, although in the end the shaders ollama runs don't actually use shared memory. 32 KB is what ggml-vulkan demands; the value can grow further with no real per-op cost since shared memory currently goes through the TMU like any other BO. V3D OpenGL driver also has 32 KB for SharedMemory. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41257>	2026-04-30 13:59:18 +00:00
Jose Maria Casanova Crespo	2cd51a6efc	broadcom/compiler: move nir_lower_undef_to_zero out of optimization loop The combination of nir_opt_if and nir_lower_undef_to_zero running inside the optimization loop could make it to not converge. This was exercised by ollama running gemma3 compute shaders. Removing the pass from the optimization loop results in No changes in shader-db. Assisted-by: Claude Opus 4.6 Fixes: `cbe24a0e9c` ("broadcom/compiler: use nir_lower_undef_to_zero") Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41256>	2026-04-30 12:30:34 +02:00
Roman Stratiienko	bdbf4ed739	v3dv/android: Add deferred ANB allocation support Fixes: dEQP-VK.wsi.android.maintenance1.deferred_alloc.mailbox#basic dEQP-VK.wsi.android.maintenance1.deferred_alloc.mailbox#bind_image dEQP-VK.wsi.android.maintenance1.deferred_alloc.fifo#basic dEQP-VK.wsi.android.maintenance1.deferred_alloc.fifo#bind_image Signed-off-by: Roman Stratiienko <r.stratiienko@gmail.com> Acked-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41235>	2026-04-29 15:31:28 +00:00
Jose Maria Casanova Crespo	3a8d5aeaa1	v3dv: Expose hardware-accelerated integer dot products on V3D 7.1+ Expose VK_KHR_shader_integer_dot_product 4x8-bit packed dot products using native HW instructions v8dot and setnnmode. QPU instruction count for sdot_4x8_iadd compute shader: Before (scalar decomposition): 18 ALU cycles After (setnnmode + v8dot): 3 ALU cycles (6x) We advertise integerDotProduct4x8BitPacked*Accelerated for V3D 7.1+ Assisted-by: Claude Opus 4.6 Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41255>	2026-04-29 13:21:08 +00:00
Jose Maria Casanova Crespo	8f06961bf5	broadcom/compiler: Eliminate redundant setnnmode instructions This new VIR optimization pass tracks the current NN signedness mode per block and removes duplicate setnnmode instructions. When consecutive dot products use the same signedness mode, the backend emits one setnnmode per dot product. This pass removes the redundant ones, keeping only the first. Assisted-by: Claude Opus 4.6 Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41255>	2026-04-29 13:21:08 +00:00
Jose Maria Casanova Crespo	24ecc9cbcc	broadcom/compiler: Add v8dot and setnnmode scheduler dependencies. As nnmode register is read by v8dot instruction we need to add dependencies between setnnmode instructions and v8dot via the nnmode register, so they are scheduled correcty using last_nn_mode virtual register.. Add a last_nn_mode virtual register to the scheduler state and create: - Write dependencies for all SETNNMODE variants - Read dependencies for V8DOT. This follows the same pattern as the existing MULTOP/UMUL24 rtop tracking. Assisted-by: Claude Opus 4.6 Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41255>	2026-04-29 13:21:08 +00:00
Jose Maria Casanova Crespo	33a700be91	broadcom/compiler: hardware-accelerated 4x8-bit dot products on V3D 7.1+ VIR instructions and nir_to_vir implementation of 4x8-bit dot products using native HW accelerated ALU instructions. setnnmode instructions are marked as having side effects. Assisted-by: Claude Opus 4.6 Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41255>	2026-04-29 13:21:08 +00:00
Jose Maria Casanova Crespo	afe4e321e1	broadcom/compiler: Add V3D 7.1 v8dot dot product QPU instructions Add QPU instruction definitions, metadata, and encoding for V3D 7.1 v8dot product instruction and the setnnmode instruction that allows defining the signedness (UU/SU/US/SS) of the v8dot operation. Assisted-by: Claude Opus 4.6 Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41255>	2026-04-29 13:21:07 +00:00
Emma Anholt	3a8ff22336	ci: Delete references to various broken traces. These are all being removed from the repos, so no need to leave the old notes around. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40959>	2026-04-22 17:39:31 +00:00
Samuel Pitoiset	ebf2797da2	vulkan,treewide: stop passing vk_device to vk_pipeline_robustness_state_fill() This will be helpful for RADV. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41029>	2026-04-21 17:29:04 +00:00

1 2 3 4 5 ...

3676 commits