Previously, each sampler view allocated a dedicated BO for its
TEXTURE_SHADER_STATE packet (~24 bytes), which got rounded up to a
full 4KB page. This wastes memory and inflates the per-job BO handle
count.
Use u_upload_alloc_ref() to sub-allocate texture shader state from the
shared state_uploader, matching the pattern already used by image views.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40496>
From the documentation, the state uploader should be used inside the
driver for long-term state inside buffers, while the stream uploader
should be used by Gallium's internals. Since the image view texture
shader state qualifies as long-lived state data, use `state_uploader`
instead of `uploader` for consistency.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40496>
Searching device->queues by queueIndex and queueFamilyIndex alone can
return the wrong queue: if two queues A and B are created with the same
queueIndex and queueFamilyIndex but different flags, the
vk_foreach_queue loop returns A when the user asks for B, because A is
the first queue found with the requested queueIndex and
queueFamilyIndex.
So add a check of the queue flags and return the queue with matching
flags, queueIndex and queueFamilyIndex.
Signed-off-by: Julia Zhang <Julia.Zhang@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40669>
Because there is no way to know where the address has been allocated
(GTT or VRAM), the existing entrypoints aren't dropped and the sparse
bit is derived from VK_ADDRESS_COMMAND_FULLY_BOUND_BIT_KHR.
It would be nice to figure out if the CP DMA vs compute heuristic for
GTT BOs on dGPUs could be removed to simplify this implementation.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40386>
This doubles vkoverhead's draw_16vattrib_change_dynamic performance.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40603>
Only use LDS for VGPR spilling if we can use addtid access, to avoid having a VGPR addr.
Limit to single wave workgroups, to avoid needing the wave_id for the offset.
If we have a scratch stack pointer, don't use LDS at all.
Limit LDS spilling to not reduce occupancy further.
Note that in theory, this can still limit occupancy of other shaders running
on the CU at the same time, but that's unlikely and impossible to know at this point.
Removes all scratch usage in emulated FSR4 and parallel_rdp.
Besides that, only a single GoW shader is affected.
Foz-DB Navi31:
Totals from 9 (0.01% of 114641) affected shaders:
Instrs: 68863 -> 68830 (-0.05%); split: -0.07%, +0.02%
CodeSize: 416108 -> 416000 (-0.03%); split: -0.05%, +0.02%
LDS: 2048 -> 45056 (+2100.00%)
Scratch: 261888 -> 220672 (-15.74%)
Latency: 727951 -> 657155 (-9.73%); split: -9.73%, +0.00%
InvThroughput: 418644 -> 383269 (-8.45%)
VClause: 1506 -> 1200 (-20.32%)
Copies: 10651 -> 10624 (-0.25%)
VALU: 48700 -> 48684 (-0.03%)
SALU: 6200 -> 6199 (-0.02%); split: -0.05%, +0.03%
VMEM: 4139 -> 3589 (-13.29%)
VOPD: 580 -> 574 (-1.03%)
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36367>
PyTorch Conv2d without explicit bias produces a NULL bias_tensor
in the Gallium pipe_ml_operation. Guard against NULL dereferences
in two places:
- ethosu_lower.c: pass NULL to fill_coefs when bias_tensor is NULL
- ethosu_coefs.c: treat missing biases as zero
Fixes crashes when running Conv2d models without bias through the
Ethos-U NPU backend.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40578>
Add ethosu_ml_subgraph_deserialize(), which reconstructs a subgraph
from a serialized byte buffer. It parses the header (cmdstream size,
coefs size, io size, tensors size) and restores the tensor array,
cmdstream, and coefficient buffers.
DRM buffer object creation is deferred to prepare_for_submission(),
which is called lazily on the first invoke.
Wire up pctx->ml_subgraph_deserialize in ethosu_create_context().
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40578>
Add ml_subgraph_deserialize() to pipe_context for reconstructing
a previously-serialized ML subgraph at runtime. This complements
ml_subgraph_serialize() on pipe_ml_device and allows the runtime
to load pre-compiled subgraphs.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40578>
On LSC platforms, SLM writes are unfenced between workgroups. This
means a workgroup W1 might finish while some of its SLM writes are
still uncompleted. Another workgroup W2, dispatched after W1 and
allocated an overlapping SLM location, might then have writes that
race with the earlier W1 operations.
The solution is to fence all write operations (stores & atomics) of a
workgroup before ending its threads. We do this by emitting a single
SLM fence either at the end of the shader or, if there is only a
single unfenced write, at the end of that block.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13924
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40430>
Expose float32 atomic exchange support for buffer, shared, and image
operations on all architectures. The existing axchg instruction is
type-agnostic, so no compiler changes are needed. Image atomics are
already lowered to global atomics via nir_lower_image_atomics_to_global.
Also add R32_FLOAT to the STORAGE_IMAGE_ATOMIC format feature flag so
image atomic operations are accepted for r32f images.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40506>
Previously bifrost_nir_lower_shader_output grouped outputs in separate
if blocks and made a best-effort attempt to group them together. This
also assumed that pan_nir_lower_store_component wrote each output only
once and that nir_lower_io_vars_to_temporaries pulled them out of any
control flow.
Now all of these are handled by the new pan_nir_lower_vs_outputs pass
that handles write masks, control flow, per_view and grouping for IDVS.
This makes the overall dependencies much simpler, ensures that the
stores are grouped in the same ifs and should be more robust.
Signed-off-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40537>
The DDMADT instruction of PDS has out-of-bound test capability, which is
used for implementation of robust vertex input fetch.
According to the pseudocode in the comment block before the "LAST DDMAD"
mark in pvr_pipeline_pds.c, the check is between
`calculated_source_address + (burst_size << 2)` and `base_address +
buffer_size`, in which the `burst_size` seems to correspond to the BSIZE
field set in the low 32-bit of DDMAD(T) src3 and the `buffer_size`
corresponds to the MSIZE field set in the DDMADT-specific high 32-bit of
src3. As the calculated source address is just the base address plus
the multiplication result (the offset), the base address can be
eliminated from the check, resulting in a check between
`offset + (BSIZE * 4)` and `MSIZE`.
Naturally it's expected to just set the MSIZE field to the buffer size.
In addition, as the Vulkan spec says "Reads from a vertex input MAY
instead be bounds checked against a range rounded down to the nearest
multiple of the stride of its binding", the driver rounds down the
accessible buffer size before setting MSIZE to it.
However, when running the OpenGL ES 2.0 CTS, two problems with the
size used for this check are exhibited:
- dEQP-GLES2.functional.buffer.write.basic.array_stream_draw sets up a
VBO with 3 bytes per vertex (RGB colors, 1 byte per channel) and 340
vertices (resulting in a buffer size of 1020 = 0x3fc). However, as the
DMA request size specified by BSIZE is counted in dwords, the 3 bytes
are rounded up to 1 dword (which is 4 bytes). When the bounds check of
the last vertex happens, that vertex's DMA start offset is 0x3f9, so
the DDMADT check compares 0x3fd (0x3f9 + 1 * 4) against 0x3fc and
reports a failure. This prevents the last vertex, which is perfectly
in-bounds, from being fetched; this violates the Vulkan specification
and needs to be fixed.
- dEQP-GLES2.functional.vertex_arrays.single_attribute.strides.
buffer_0_32_float2_vec4_dynamic_draw_quads_1 sets up a VBO with a size
of 168 bytes and tries to draw 6 vertices (each consuming 2 floats,
thus 8 bytes, of attribute data) with a stride of 32 bytes using this
VBO. Zink then translates the VBO to a Vulkan vertex buffer bound with
size = 168B and stride = 32B. Here the optional rule about rounding
down the buffer size kicks in for the current PowerVR driver, and the
checked bound is rounded down to 160B, which prevents the last
vertex's 8B attribute from being fetched. This situation appears to be
handled in the codepath without DDMADT, but was omitted from the
codepath utilizing DDMADT for the bounds check.
So this patch mimics the intended behavior of DDMADT when setting its
MSIZE field, to prevent false out-of-bounds failures. It first
calculates the offset of the last valid vertex DMA, then adds the DMA
request size to it to form the final MSIZE value. Since the code
calculating the last valid DMA offset accounts for fetching an
attribute from the space after the last whole multiple of the stride,
both problems mentioned above are solved by this rework.
There are 99 GLES CTS testcases fixed by this change, and the Vulkan
CTS shows no regression in `dEQP-VK.robustness.robustness1_vertex_access.*`
tests.
Fixes: 4873903b56 ("pvr: Enable PDS_DDMADT")
Signed-off-by: Icenowy Zheng <zhengxingda@iscas.ac.cn>
Reviewed-by: Ella Stanforth <ella@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40528>
This memory padding is enforced by GetBufferMemoryRequirements2 and
may then be checked against to decide whether an allocation is big
enough. Move it to pvr_buffer.h so further assertions can use it.
Backport-to: 25.3
Signed-off-by: Icenowy Zheng <zhengxingda@iscas.ac.cn>
Reviewed-by: Ella Stanforth <ella@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40528>
Currently the size of a single component inside an attribute is saved
and checked against when checking DMA capability. However, vertex
attribute DMA happens for a whole attribute instead of individually
for its components, so checking against the component size is useless
-- the size of the whole attribute is what needs to be saved and
checked.
Rename all component_size_in_bytes fields to attrib_size_in_bytes, and
save the size of the whole attribute in them.
Fixes: 8991e64641 ("pvr: Add a Vulkan driver for Imagination Technologies PowerVR Rogue GPUs")
Signed-off-by: Icenowy Zheng <zhengxingda@iscas.ac.cn>
Reviewed-by: Ella Stanforth <ella@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40528>
The ddmadt_oob_buffer_size structure to be filled is named
`obb_buffer_size`, which is obviously a typo.
Change to `oob_buffer_size` to fix the typo.
Fixes: 8991e64641 ("pvr: Add a Vulkan driver for Imagination Technologies PowerVR Rogue GPUs")
Signed-off-by: Icenowy Zheng <zhengxingda@iscas.ac.cn>
Reviewed-by: Ella Stanforth <ella@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40528>
Instead of focusing on the number of components and bit sizes, focus
on the total number of bytes read.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40576>
These are just a fancy mov on Mali. We need to use bi_make_vec_to()
because it handles 64-bit movs as well.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40576>
These are the same as the split versions and vecN except they have one
source and they use src[0].swizzle[i] instead of src[i].swizzle[0].
While we're here, it's trivial to implement pack_64_4x16 as well.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40576>
They're literally the same thing since vectors are packed.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40576>
Previously, we used bi_src_index() directly and ignored the offset we
took all that care to calculate at the top of the function. For most
cases, this is fine since the offset is 0. But if we ever have an i8v8,
or larger, this doesn't work. It's not really more work to handle this
case. All we have to do is use the offset and &3 the swizzle. It just
means we can't have false code sharing with the bi_make_vec_to() case.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40576>
There's no real difference for us between robustness and robustness2.
The only thing robust_modes does in nir_opt_load_store_vectorize() is to
tell it to be a bit more careful about integer overflow in address
calculations so you don't end up wrapping something around and getting a
non-zero load when you should have gotten a zero from OOB. There's no
good reason why we should only set it for robustness2.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40576>
Now that we claim 16B robustness alignments, we can vectorize UBO
access, even when robustness2 is enabled.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40576>
We can't go any higher than 4B for SSBOs but we can go up to 16B for
UBOs. This will let us start vectorizing UBO access, even when robust,
because max-size loads (LD_PKA.i128) will never overrun a binding unless
they're entirely outside the binding.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40576>