Commit graph

220648 commits

Author SHA1 Message Date
Maíra Canal
bfe92d50ce v3d: sub-allocate sampler view texture state from state uploader
Previously, each sampler view allocated a dedicated BO for its
TEXTURE_SHADER_STATE packet (~24 bytes), which got rounded up to a
full 4KB page. This wastes memory and inflates the per-job BO handle
count.

Use u_upload_alloc_ref() to sub-allocate texture shader state from the
shared state_uploader, matching the pattern already used by image views.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40496>
2026-03-27 18:54:29 +00:00
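The memory argument in the commit above can be checked with a little arithmetic. The following is only a sketch, not the driver code: it does not call u_upload_alloc_ref(), the helper names are hypothetical, and the 32-byte packing alignment used in the usage note is an assumption rather than something the commit states.

```c
#include <assert.h>
#include <stddef.h>

/* Page granularity taken from the commit message (4KB). */
#define SKETCH_PAGE_SIZE 4096u

/* Round size up to a multiple of align (align must be a power of two). */
static size_t align_up(size_t size, size_t align)
{
   assert(align != 0 && (align & (align - 1)) == 0);
   return (size + align - 1) & ~(align - 1);
}

/* Bytes consumed when each of n states gets its own page-granular BO. */
static size_t dedicated_bytes(size_t n, size_t state_size)
{
   return n * align_up(state_size, SKETCH_PAGE_SIZE);
}

/* Bytes consumed when n states are packed back to back into shared pages.
 * state_align is a hypothetical per-state alignment. */
static size_t suballocated_bytes(size_t n, size_t state_size,
                                 size_t state_align)
{
   return align_up(n * align_up(state_size, state_align), SKETCH_PAGE_SIZE);
}
```

With 100 sampler views of 24 bytes each, dedicated_bytes(100, 24) is 409600, while suballocated_bytes(100, 24, 32) is 4096, and the sub-allocated states also share one buffer instead of occupying 100 separate ones.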
Maíra Canal
751e0d26ec v3d: use the state uploader for the image view texture shader state
From the documentation, the state uploader should be used inside the
driver for long-term state inside buffers, while the stream uploader
should be used by Gallium's internals. Considering that the image view
texture shader state can be considered long-lived state data, use
`state_uploader` instead of `uploader` for consistency.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40496>
2026-03-27 18:54:29 +00:00
Rob Clark
b76678cddd freedreno/a6xx: Fix supported-blit fmt check
Fixes some KHR-GLES*.core.internalformat.texture2d.* failures.

Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40665>
2026-03-27 17:48:21 +00:00
Julia Zhang
32d04bcdcd vulkan: return pQueue with matching flags
Searching device->queues by queueIndex and queueFamilyIndex alone can
return the wrong queue: if two queues A and B are created with the same
queueIndex and queueFamilyIndex but different flags, a request for B can
return A, because the vk_foreach_queue loop stops at the first queue whose
queueIndex and queueFamilyIndex match.

So add a check of the queue flags and return the queue with matching
flags, queueIndex and queueFamilyIndex.

Signed-off-by: Julia Zhang <Julia.Zhang@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40669>
2026-03-27 17:08:01 +00:00
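The lookup described in the commit above can be sketched as follows. This is a simplified model, not the Mesa runtime code: struct sketch_queue and find_queue() are hypothetical stand-ins, and a plain array replaces the list that vk_foreach_queue walks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a device queue. */
struct sketch_queue {
   uint32_t flags;
   uint32_t family_index;
   uint32_t index;
};

/* Return the queue matching all three keys, or NULL if there is none.
 * Comparing only family_index and index would return the first of two
 * queues that differ solely in their creation flags. */
static const struct sketch_queue *
find_queue(const struct sketch_queue *queues, size_t count,
           uint32_t flags, uint32_t family_index, uint32_t index)
{
   for (size_t i = 0; i < count; i++) {
      const struct sketch_queue *q = &queues[i];
      if (q->family_index == family_index && q->index == index &&
          q->flags == flags)
         return q;
   }
   return NULL;
}
```

Given two queues with family 0 and index 0, one created with flags 0 and one with flags 1, a request for flags 1 now yields the second queue rather than the first.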
Trigger Huang
007cfd138d vulkan/queue: pass protected submit info to driver
Pass the application's protected submission info to the driver.

Signed-off-by: Trigger Huang <Trigger.Huang@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40669>
2026-03-27 17:08:01 +00:00
Samuel Pitoiset
dede14cce3 radv: advertise VK_KHR_device_address_commands
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40386>
2026-03-27 16:17:02 +00:00
Samuel Pitoiset
a97c889a7b radv: implement VK_KHR_device_address_commands
Because there is no way to know where the address has been allocated
(GTT or VRAM), the existing entrypoints aren't dropped and the sparse
bit is derived from VK_ADDRESS_COMMAND_FULLY_BOUND_BIT_KHR.

It would be nice to figure out if the CP DMA vs compute heuristic for
GTT BOs on dGPUs could be removed to simplify this implementation.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40386>
2026-03-27 16:17:02 +00:00
Samuel Pitoiset
479a992b02 radv: replace radv_copy_flags by VkAddressCopyFlagsKHR
Same meaning.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40386>
2026-03-27 16:17:02 +00:00
Samuel Pitoiset
72ac5e6d29 radv/ci: fix a typo in radv-navi10-vkcts-full
Oops.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40679>
2026-03-27 15:53:39 +00:00
Samuel Pitoiset
566e4c25d9 radv/ci: fix radv-slow-skips.txt path
This was causing issues with personal branches.

Suggested-by: Martin Roukala (né Peres) <martin.roukala@mupuf.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40676>
2026-03-27 14:53:37 +00:00
Rhys Perry
3b52d61bb0 radv: don't copy radv_vertex_input_state in CmdSetVertexInputEXT
This doubles vkoverhead's draw_16vattrib_change_dynamic performance.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40603>
2026-03-27 13:38:29 +00:00
Georg Lehmann
ae2968c4ec aco: allow spilling to LDS in RT shaders without stack pointer
No Foz-DB changes because most RT shaders use function calls now.

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36367>
2026-03-27 13:08:44 +00:00
Georg Lehmann
133ef9f94b aco: spill VGPRs to LDS if it doesn't further limit occupancy
Only use LDS for VGPR spilling if we can use addtid access, to avoid having a VGPR addr.
Limit to single wave workgroups, to avoid needing the wave_id for the offset.
If we have a scratch stack pointer, don't use LDS at all.

Limit LDS spilling to not reduce occupancy further.
Note that in theory, this can still limit occupancy of other shaders running
on the CU at the same time, but that's unlikely and impossible to know at this point.

Removes all scratch usage in emulated FSR4 and parallel_rdp.
Besides that, only a single GoW shader is affected.

Foz-DB Navi31:
Totals from 9 (0.01% of 114641) affected shaders:
Instrs: 68863 -> 68830 (-0.05%); split: -0.07%, +0.02%
CodeSize: 416108 -> 416000 (-0.03%); split: -0.05%, +0.02%
LDS: 2048 -> 45056 (+2100.00%)
Scratch: 261888 -> 220672 (-15.74%)
Latency: 727951 -> 657155 (-9.73%); split: -9.73%, +0.00%
InvThroughput: 418644 -> 383269 (-8.45%)
VClause: 1506 -> 1200 (-20.32%)
Copies: 10651 -> 10624 (-0.25%)
VALU: 48700 -> 48684 (-0.03%)
SALU: 6200 -> 6199 (-0.02%); split: -0.05%, +0.03%
VMEM: 4139 -> 3589 (-13.29%)
VOPD: 580 -> 574 (-1.03%)

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36367>
2026-03-27 13:08:44 +00:00
Pavel Ondračka
56a6528744 r300/ci: expectation update
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40671>
2026-03-27 10:48:55 +01:00
Tomeu Vizoso
e23fcc1464 ethosu: implement ml_device_destroy for standalone ML device
Use ralloc_free to release the device allocated by
ethosu_ml_device_create().

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40578>
2026-03-27 09:35:40 +01:00
Tomeu Vizoso
f06b4dbe33 gallium: add ml_device_destroy callback to pipe_ml_device
Add a destroy callback so that standalone ML devices created via
*_ml_device_create() can properly free their resources.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40578>
2026-03-27 09:35:40 +01:00
Tomeu Vizoso
f0e4ccf664 ethosu: handle NULL bias tensor in convolution
PyTorch Conv2d without explicit bias produces a NULL bias_tensor
in the Gallium pipe_ml_operation. Guard against NULL dereferences
in two places:

- ethosu_lower.c: pass NULL to fill_coefs when bias_tensor is NULL
- ethosu_coefs.c: treat missing biases as zero

Fixes crashes when running Conv2d models without bias through the
Ethos-U NPU backend.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40578>
2026-03-27 09:33:52 +01:00
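The "treat missing biases as zero" rule from the commit above amounts to a NULL guard at the point where a bias is read. A minimal sketch, assuming 32-bit integer biases and a hypothetical helper name (the actual code in ethosu_coefs.c may be structured differently):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return the bias for one output channel, or 0 when the operation
 * carries no bias tensor (biases == NULL). Hypothetical helper. */
static int32_t bias_or_zero(const int32_t *biases, size_t channel)
{
   return biases ? biases[channel] : 0;
}
```

A caller can then use the same coefficient-filling loop for convolutions with and without a bias tensor.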
Tomeu Vizoso
e0b401aa87 ethosu: implement ml_subgraph_deserialize()
Add ethosu_ml_subgraph_deserialize() which reconstructs a subgraph
from a serialized byte buffer. Parses the header (cmdstream size,
coefs size, io size, tensors size), restores the tensor array,
cmdstream, and coefficient buffers.

DRM buffer object creation is deferred to prepare_for_submission()
which is called lazily on first invoke.

Wire pctx->ml_subgraph_deserialize in ethosu_create_context().

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40578>
2026-03-27 09:33:52 +01:00
Tomeu Vizoso
6bae0b55d0 gallium: add pipe_context::ml_subgraph_deserialize()
Add ml_subgraph_deserialize() to pipe_context for reconstructing
a previously-serialized ML subgraph at runtime. This complements
ml_subgraph_serialize() on pipe_ml_device and allows the runtime
to load pre-compiled subgraphs.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40578>
2026-03-27 09:33:52 +01:00
Tomeu Vizoso
aff92add98 ethosu: Specifying SRAM size in pipe_ml_device ID
The spec format is now GEN-MACS-SRAM, e.g. "65-256-4096".

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40578>
2026-03-27 09:07:12 +01:00
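A GEN-MACS-SRAM string such as "65-256-4096" can be split with sscanf(). The sketch below is illustrative only: struct sketch_npu_spec and parse_npu_spec() are hypothetical names, and the driver's own parsing may differ.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical container for the three fields of the spec string. */
struct sketch_npu_spec {
   unsigned gen;
   unsigned macs;
   unsigned sram;
};

/* Return 1 if s has the form GEN-MACS-SRAM, 0 otherwise. */
static int parse_npu_spec(const char *s, struct sketch_npu_spec *out)
{
   char trailing;

   /* The final %c only converts if something follows the third number,
    * so a clean match converts exactly three items. */
   return sscanf(s, "%u-%u-%u%c", &out->gen, &out->macs, &out->sram,
                 &trailing) == 3;
}
```

Parsing "65-256-4096" fills gen = 65, macs = 256 and sram = 4096, while a two-field string such as "65-256" is rejected.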
Samuel Pitoiset
eecb37962c radv/amdgpu: always return VK_ERROR_INVALID_EXTERNAL_HANDLE for host ptr imports
Less confusing than VK_ERROR_UNKNOWN.

Related to https://gitlab.freedesktop.org/mesa/mesa/-/issues/15144.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40574>
2026-03-27 07:07:31 +00:00
Lionel Landwerlin
fa523aedd0 brw: fence SLM writes between workgroups
On LSC platforms the SLM writes are unfenced between workgroups. This
means a workgroup W1 finishing might have uncompleted SLM writes.
Another workgroup W2 dispatched after W1 which gets allocated an
overlapping SLM location might have writes that race with the previous
W1 operations.

The solution to this is to fence all write operations (store & atomics)
of a workgroup before ending the threads. We do this by emitting a
single SLM fence either at the end of the shader or, if there is only a
single unfenced write, at the end of that block.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13924
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40430>
2026-03-26 22:38:55 +00:00
Christian Gmeiner
32ca98a26e panvk: Advertise VK_EXT_shader_atomic_float
Expose float32 atomic exchange support for buffer, shared, and image
operations on all architectures. The existing axchg instruction is
type-agnostic, so no compiler changes are needed. Image atomics are
already lowered to global atomics via nir_lower_image_atomics_to_global.

Also add R32_FLOAT to the STORAGE_IMAGE_ATOMIC format feature flag so
image atomic operations are accepted for r32f images.

Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40506>
2026-03-26 21:28:49 +00:00
Lorenzo Rossi
245d54397d pan/compiler: Make lower_vs_outputs write needs_extended_fifo
Signed-off-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40537>
2026-03-26 20:53:21 +00:00
Lorenzo Rossi
445a22acbd pan/compiler: Group outputs in lower_vs_outputs
Previously bifrost_nir_lower_shader_output grouped outputs in separate
if blocks and made a best-effort attempt to group them together.  This
also assumed that pan_nir_lower_store_component wrote each output only
once and that nir_lower_io_vars_to_temporaries pulled them out of any
control flow.

Now all of these are handled by the new pan_nir_lower_vs_outputs pass
that handles write masks, control flow, per_view and grouping for IDVS.
This makes the overall dependencies much simpler, ensures that the
stores are grouped in the same ifs and should be more robust.

Signed-off-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40537>
2026-03-26 20:53:21 +00:00
Lorenzo Rossi
66bee415ad pan/compiler: Split lower_varyings_io into fs_inputs and vs_outputs
Signed-off-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40537>
2026-03-26 20:53:21 +00:00
Lorenzo Rossi
b1acc1aa89 pan/compiler: Refactor va_shader_output_from_ in common code
Signed-off-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40537>
2026-03-26 20:53:20 +00:00
Lorenzo Rossi
024c66ec0f pan: Add PAN_MAX_MULTIVIEW_VIEW_COUNT
Signed-off-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40537>
2026-03-26 20:53:20 +00:00
Lorenzo Rossi
922405ab71 panfrost/bi: Separate va_shader_output from bitmasks
The new pass will need to iterate on the enum varyings

Signed-off-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40537>
2026-03-26 20:53:20 +00:00
Zan Dobersek
d5b9411331 fd: support a8xx in rddecompiler
Properly initialize rddecompiler's RNN instance for a8xx dumps.

Signed-off-by: Zan Dobersek <zdobersek@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40661>
2026-03-26 18:46:07 +00:00
emre
fe558d8328 nvk: fix barrier cache invalidation
Fixes: e1c1cdbd5f ("nvk: Implement vkCmdPipelineBarrier2 for real")
Reviewed-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40637>
2026-03-26 18:29:16 +00:00
Icenowy Zheng
252904f3d1 pvr: consider the size of DMA request when setting msize of DDMADT
The DDMADT instruction of PDS has out-of-bound test capability, which is
used for implementation of robust vertex input fetch.

According to the pseudocode in the comment block before the "LAST DDMAD"
mark in pvr_pipeline_pds.c, the check is between
`calculated_source_address + (burst_size << 2)` and `base_address +
buffer_size`, in which the `burst_size` seems to correspond to the BSIZE
field set in the low 32-bit of DDMAD(T) src3 and the `buffer_size`
corresponds to the MSIZE field set in the DDMADT-specific high 32-bit of
src3. As the calculated source address is just the base address plus the
multiplication result (the offset), the base address can be eliminated
from the check, resulting in a check between `offset + (BSIZE * 4)` and
`MSIZE`.

Naturally it's expected to just set the MSIZE field to the buffer size.
In addition, as the Vulkan spec says "Reads from a vertex input MAY
instead be bounds checked against a range rounded down to the nearest
multiple of the stride of its binding", the driver rounds down the
accessible buffer size before setting MSIZE to it.

However, when running the OpenGL ES 2.0 CTS, two problems show up in how
the size to check is set:

- dEQP-GLES2.functional.buffer.write.basic.array_stream_draw sets up a
  VBO with 3 bytes per vertex (RGB colors and 1B per color) and 340
  vertices (results in a buffer size of 1020 = 0x3fc). However as the
  DMA request size, which is specified by BSIZE, is counted by dwords,
  3 bytes are rounded up to 1 dword (which is 4 bytes). When the bound
  check of the last vertex happens, the vertex's DMA start offset is
  0x3f9, so the DDMADT check happens between 0x3fd (0x3f9 + 1 * 4) and
  0x3fc, and indicates a check failure. This prevents the last vertex,
  which is perfectly in-bound, from being properly fetched; this is
  against the Vulkan specification, and needs to be fixed.
- dEQP-GLES2.functional.vertex_arrays.single_attribute.strides.
  buffer_0_32_float2_vec4_dynamic_draw_quads_1 sets up a VBO with a size
  of 168 bytes, and tries to draw 6 vertices (each vertex consumes 2
  floats (thus 8 bytes) of attribute) with a stride of 32 bytes using
  this VBO. Zink then translates the VBO to a Vulkan vertex buffer bound
  with size = 168B, stride = 32B. Here the optional rule about rounding
  down buffer size happens in the current PowerVR driver, and the
  checked bound is rounded down to 160B, which prevents the last
  vertex's 8B attribute from being fetched. It looks like this kind of
  situation is considered in the codepath without DDMADT, but omitted
  for the codepath utilizing DDMADT for bound check.

So this patch tries to mimic the behavior of DDMADT when setting the
MSIZE field of it to prevent false out-of-bounds. It first calculates
the offset of the last valid vertex DMA, then adds the DMA request size
to it to form the final MSIZE value. With the code calculating the last
valid DMA offset considering the situation of fetching the attribute
from the space after the last whole multiple of stride, both problems
mentioned above are solved by this rework.

This change fixes 99 GLES CTS test cases, and Vulkan CTS
shows no regression on `dEQP-VK.robustness.robustness1_vertex_access.*`
tests.

Fixes: 4873903b56 ("pvr: Enable PDS_DDMADT")
Signed-off-by: Icenowy Zheng <zhengxingda@iscas.ac.cn>
Reviewed-by: Ella Stanforth <ella@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40528>
2026-03-26 18:12:06 +00:00
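The two CTS cases in the commit above can be replayed with a small model of the check. This is a sketch under stated assumptions, not the code in pvr_pipeline_pds.c: the function names are hypothetical, the check is assumed to pass when `offset + (BSIZE * 4)` does not exceed `MSIZE`, and base offsets and attribute offsets within a stride are ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes actually requested by one attribute DMA: BSIZE counts dwords,
 * so the attribute size is rounded up to a multiple of 4. */
static uint32_t dma_bytes(uint32_t attrib_size)
{
   return ((attrib_size + 3) / 4) * 4;
}

/* Assumed model of the DDMADT check: in bounds when the end of the
 * DMA request does not exceed MSIZE. */
static int ddmadt_in_bounds(uint32_t offset, uint32_t attrib_size,
                            uint32_t msize)
{
   return offset + dma_bytes(attrib_size) <= msize;
}

/* MSIZE as described in the commit: offset of the last valid vertex
 * DMA plus the DMA request size. A vertex is valid when its attribute
 * fits in the buffer, even past the last whole multiple of stride. */
static uint32_t msize_for_buffer(uint32_t buffer_size, uint32_t stride,
                                 uint32_t attrib_size)
{
   if (attrib_size == 0 || buffer_size < attrib_size)
      return 0;

   uint32_t last_index = stride ? (buffer_size - attrib_size) / stride : 0;
   return last_index * stride + dma_bytes(attrib_size);
}
```

For the first case (1020-byte buffer, 3-byte stride and attribute), the last vertex starts at 0x3f9 and the computed MSIZE is 0x3fd, so the fetch that failed against 0x3fc now passes. For the second case (168-byte buffer, 32-byte stride, 8-byte attribute), the computed MSIZE is 168, so the vertex at offset 160 is accepted instead of being cut off by the rounded-down bound of 160.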
Icenowy Zheng
d992474be9 pvr: move PVR_BUFFER_MEMORY_PADDING_SIZE definition to pvr_buffer.h
This memory padding is enforced by GetBufferMemoryRequirements2 and
may later be checked against to decide whether it is sufficient.

Move it to pvr_buffer.h for further assertions.

Backport-to: 25.3
Signed-off-by: Icenowy Zheng <zhengxingda@iscas.ac.cn>
Reviewed-by: Ella Stanforth <ella@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40528>
2026-03-26 18:12:06 +00:00
Icenowy Zheng
aa8dad141c pvr: save vertex attribute size for DMA checking
Currently the size of single components inside one attribute is saved
and checked against when checking DMA capability. However, the vertex
attribute DMA happens for a whole attribute instead of individually for
its components, so checking against the component size is useless -- the
size of the whole attribute is what needs to be saved and checked.

Rename all component_size_in_bytes fields to attrib_size_in_bytes, and
save the size of the whole attribute inside them.

Fixes: 8991e64641 ("pvr: Add a Vulkan driver for Imagination Technologies PowerVR Rogue GPUs")
Signed-off-by: Icenowy Zheng <zhengxingda@iscas.ac.cn>
Reviewed-by: Ella Stanforth <ella@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40528>
2026-03-26 18:12:06 +00:00
Icenowy Zheng
caea72cffc pvr: fix "obb" typo in oob_buffer_size when building vertex pds data
The ddmadt_oob_buffer_size structure to be filled is named
`obb_buffer_size`, which is obviously a typo.

Change to `oob_buffer_size` to fix the typo.

Fixes: 8991e64641 ("pvr: Add a Vulkan driver for Imagination Technologies PowerVR Rogue GPUs")
Signed-off-by: Icenowy Zheng <zhengxingda@iscas.ac.cn>
Reviewed-by: Ella Stanforth <ella@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40528>
2026-03-26 18:12:06 +00:00
Faith Ekstrand
c2fc7d49e8 pan/bi: Rework mem_vectorize_cb
Instead of focusing on numbers of components and bit sizes, focus on
the total number of bytes read.

Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40576>
2026-03-26 16:28:40 +00:00
Faith Ekstrand
a5801b1a23 pan/bi: Simplify unpack_64_2x32_split_*
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40576>
2026-03-26 16:28:40 +00:00
Faith Ekstrand
79f8c1ca9a pan/bi: Unify handling of unpack_*
These are just a fancy mov on Mali.  We need to use bi_make_vec_to()
because it handles 64-bit movs as well.

Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40576>
2026-03-26 16:28:40 +00:00
Faith Ekstrand
6cc10835b6 pan/bi: Unify handling of pack_*
These are the same as the split versions and vecN except they have one
source and they use src[0].swizzle[i] instead of src[i].swizzle[0].
While we're here, it's trivial to implement pack_64_4x16 as well.

Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40576>
2026-03-26 16:28:39 +00:00
Faith Ekstrand
56f5899786 pan/bi: Handle pack_*_split with vecN
They're literally the same thing since vectors are packed.

Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40576>
2026-03-26 16:28:39 +00:00
Faith Ekstrand
682ab923e6 pan/bi: Move nir_op_mov handling to the top
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40576>
2026-03-26 16:28:39 +00:00
Faith Ekstrand
0b029f319f pan/bi: Properly handle large 8-bit vectors in bi_alu_src_index()
Previously, we used bi_src_index() directly and ignored the offset we
took all that care to calculate at the top of the function.  For most
cases, this is fine since the offset is 0.  But if we ever have an i8v8,
or larger, this doesn't work.  It's not really more work to handle this
case.  All we have to do is use the offset and &3 the swizzle.  It just
means we can't have false code sharing with the bi_make_vec_to() case.

Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40576>
2026-03-26 16:28:38 +00:00
Faith Ekstrand
9950a98d5e pan/bi: Handle 64-bit sources in bi_alu_src_index()
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40576>
2026-03-26 16:28:38 +00:00
Faith Ekstrand
4be0e46e61 pan/bi: Allow 64-bit vectors in bi_make_vec_to()
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40576>
2026-03-26 16:28:38 +00:00
Faith Ekstrand
f09f080835 pan/bi: Vectorize SSBOs when not robust
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40576>
2026-03-26 16:28:37 +00:00
Faith Ekstrand
3c1a1d2006 panvk: Replace robust2_modes with robust_modes
There's no real difference for us between robustness and robustness2.
The only thing robust_modes does in nir_opt_load_store_vectorize() is to
tell it to be a bit more careful about integer overflow in address
calculations so you don't end up wrapping something around and getting a
non-zero load when you should have gotten a zero from OOB.  There's no
good reason why we should only set it for robustness2.

Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40576>
2026-03-26 16:28:37 +00:00
Faith Ekstrand
fb7e1fe81c pan/bi: Always vectorize UBO access
Now that we claim 16B robustness alignments, we can vectorize UBO
access, even when robustness2 is enabled.

Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40576>
2026-03-26 16:28:36 +00:00
Faith Ekstrand
3bbacfe8d7 panvk: Set min_ubo/ssbo_alignment in spirv_options
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40576>
2026-03-26 16:28:36 +00:00
Faith Ekstrand
e52e7019b9 panvk: Increase robust buffer access alignments
We can't go any higher than 4B for SSBOs but we can go up to 16B for
UBOs.  This will let us start vectorizing UBO access, even when robust
because max-size loads (LD_PKA.i128) will never overrun a binding unless
they're entirely outside the binding.

Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40576>
2026-03-26 16:28:36 +00:00
Faith Ekstrand
f350a69759 panvk: Track which dynamic buffers are SSBOs
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40576>
2026-03-26 16:28:36 +00:00