Commit graph

220480 commits

Author SHA1 Message Date
Benjamin Cheng
34e090ae11 radv/video: Add low-latency flags
radv equivalent of 62f07b8c.

Reviewed-by: David Rosca <david.rosca@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40524>
2026-03-29 15:56:50 +00:00
Benjamin Cheng
917dff0b22 ac: Update FW required for variable slice mode
There are some compatibility issues with variable slice mode and
preencode that are fixed with newer FW.

Fixes: d9ba641e28 ("ac: Add variable slice mode interface")
Reviewed-by: David Rosca <david.rosca@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40604>
2026-03-29 15:30:55 +00:00
Benjamin Cheng
bb6d57c90d radeonsi/vcn: Reorder get_slice_ctrl_param
This will need to depend on quality_modes.pre_encode_mode, so reorder
the calls to make it possible.

Fixes: d9ba641e28 ("ac: Add variable slice mode interface")
Reviewed-by: David Rosca <david.rosca@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40604>
2026-03-29 15:30:55 +00:00
Alyssa Rosenzweig
aebd76415b agx: drop NIR continue handling
Since 31af989270 ("nir/lower_continue_constructs: Simplify loops before
lowering continue constructs"), we never ingest loops with continues. That lets
us delete a bunch of now dead code (and outdated comments) around control flow.

This patch is part of the treewide effort to improve loops in NIR. I already
sent the Intel patch earlier this week and this weekend hit delete here too.
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40609

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenz.ca>
Reviewed-by: Mary Guillemard <mary@mary.zone>
Tested-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40690>
2026-03-29 14:06:14 +00:00
Kenneth Graunke
ca3cabd2f8 brw: Use nir_texop_resinfo_intel for query_levels and txs
This eliminates the need to special case query_levels.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40451>
2026-03-29 12:53:10 +00:00
Kenneth Graunke
0e143ae663 nir: Add nir_texop_resinfo_intel
This is a combination of txs and query_levels in a single vec4 result.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40451>
2026-03-29 12:53:09 +00:00
Georg Lehmann
e7077e8f5c nir/lower_non_uniform_access: fix fusing loops for same index but different array variable
struct nu_handle is hashed and deduplicated using struct nu_handle_key, which ignored
parent_deref. That means all instructions will use the first parent_deref when rewriting
the sources.

Avoid this by not including the parent deref in the struct, and instead querying it
when needed.

Fixes: 4d09cd7fa5 ("nir/lower_non_uniform_access: Group accesses using the same resource")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/work_items/15173
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40654>
2026-03-29 08:31:51 +00:00
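The bug fixed by e7077e8f5c above can be illustrated with a much simplified sketch. Everything here is a hypothetical stand-in (`struct access`, `struct handle`, `handle_set_add`, `rewrite_parent`), not the real NIR code: handles are deduplicated by index only, so any per-access data such as the parent array variable has to be read from the access being rewritten rather than stored in the shared handle.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an instruction that accesses a resource. */
struct access {
   int array_var; /* array variable the access goes through (the "parent") */
   int index;     /* non-uniform index into that array */
};

/* Shared, deduplicated handle: it holds only what the key compares. */
struct handle {
   int index;
};

#define MAX_HANDLES 8

struct handle_set {
   struct handle handles[MAX_HANDLES];
   size_t count;
};

/* Return the slot of the handle with this index, adding it if it is new. */
static size_t
handle_set_add(struct handle_set *set, int index)
{
   for (size_t i = 0; i < set->count; i++) {
      if (set->handles[i].index == index)
         return i;
   }
   assert(set->count < MAX_HANDLES);
   set->handles[set->count].index = index;
   return set->count++;
}

/* The parent is queried from the access itself, never from the shared
 * handle, so two accesses that share a handle keep their own parents. */
static int
rewrite_parent(const struct access *a)
{
   return a->array_var;
}
```

Two accesses with the same index but different array variables land in the same slot, yet each one is still rewritten against its own parent.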
Rob Clark
6fb261147b freedreno: Add a829
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/work_items/15124
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40655>
2026-03-28 21:19:58 +00:00
Rob Clark
04f9a82705 freedreno/common: Drop gen8 0x78000 offset
Initially I'd added the offset to make things match up to the blob driver on
x2-85/a840.  But this gets in the way on parts with smaller GMEM.

Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40655>
2026-03-28 21:19:58 +00:00
Yiwei Zhang
a2e42eff52 ci/panvk: update expectations with new flakes
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40682>
2026-03-28 20:16:09 +00:00
Yiwei Zhang
73c9d35644 panvk: hide swapchainMaintenance1 behind WSI guard
Fixes: 9ec387efb1 ("panvk: advertise wsi maintenance extensions")
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40682>
2026-03-28 20:16:09 +00:00
Marek Olšák
c361c82a5a radeonsi: draw using a single triangle in u_blitter
This fixes dEQP-GLES3.functional.fbo.blit.rect.nearest_consistency*
when not using the rectangle path.

Reviewed-by: Pierre-Eric
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40634>
2026-03-28 18:47:55 +00:00
Marek Olšák
6ce1b12a76 radeonsi: sink si_get_pipe_constant_buffer in si_blitter_begin
Reviewed-by: Pierre-Eric
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40634>
2026-03-28 18:47:55 +00:00
Marek Olšák
7f846bc50a radeonsi: remove always-set SI_SAVE_FRAGMENT_STATE
Reviewed-by: Pierre-Eric
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40634>
2026-03-28 18:47:55 +00:00
Marek Olšák
2dc65308f8 radeonsi: add 64K texture support to gfx blits
Reviewed-by: Pierre-Eric
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40634>
2026-03-28 18:47:55 +00:00
Marek Olšák
918e5764f4 radeonsi: disable streamout queries for u_blitter
Cc: mesa-stable
Reviewed-by: Pierre-Eric
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40634>
2026-03-28 18:47:55 +00:00
Marek Olšák
556ceb1b75 radeonsi: fix blits via util_blitter_draw_rectangle
It didn't save states properly. The only correct place to save them is
si_blitter_begin. Unfortunately, we can't skip saving and restoring
those states because we don't know in advance whether the rectangle path
will be used.

Cc: mesa-stable
Reviewed-by: Pierre-Eric
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40634>
2026-03-28 18:47:54 +00:00
Marek Olšák
ea9a31cc8c gallium/u_blitter: allow using the single triangle for scaled blits too
This should be faster because 2 triangles are inefficient on the diagonal,
generating helper invocations and potentially extra memory loads from dst
because tiles aren't fully covered.

Reviewed-by: Pierre-Eric
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40633>
2026-03-28 18:01:40 +00:00
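As background for ea9a31cc8c above, the usual way to cover a rectangle with one triangle can be sketched as follows. This is the general technique, not necessarily what u_blitter does internally; `struct vec2` and `rect_to_single_triangle` are hypothetical names, and the sketch assumes the scissor or viewport discards the part of the triangle outside the rectangle.

```c
#include <assert.h>

struct vec2 {
   float x, y;
};

/* Fill tri[3] with a single triangle that covers the rectangle
 * (x0, y0)-(x1, y1). The right-angle corner sits at (x0, y0) and both legs
 * are twice the rectangle's width and height, so the hypotenuse passes
 * through (x1, y1) and the whole rectangle lies inside the triangle. */
static void
rect_to_single_triangle(float x0, float y0, float x1, float y1,
                        struct vec2 tri[3])
{
   assert(x1 >= x0 && y1 >= y0);
   tri[0] = (struct vec2){ x0, y0 };
   tri[1] = (struct vec2){ x0 + 2.0f * (x1 - x0), y0 };
   tri[2] = (struct vec2){ x0, y0 + 2.0f * (y1 - y0) };
}
```

Because attribute interpolation is linear, extending the texture coordinates by the same factor of two leaves the values inside the rectangle unchanged, which is why the same trick can apply to scaled blits.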
Natalie Vock
1f9bc71051 radv/rt: Remove RADV_OFFSET_UNUSED
RADV_OFFSET_UNUSED became unused, itself.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39985>
2026-03-28 16:48:46 +01:00
Natalie Vock
579feda38b radv/rt: Fix cases in which the bound BVH build pipeline gets clobbered
The most egregious case was AS updates, in which case radv_copy_memory
would decide to use compute, which overwrites the bound pipeline with
a copy shader. Subsequent dispatches assumed the update pipeline to be
bound, but dispatched another copy shader instead.

There is also a chance of this happening for geometry info copying for
RRA, so add another pass for that.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39985>
2026-03-28 16:48:46 +01:00
Natalie Vock
e713527aa9 vulkan: Bump MAX_ENCODE_PASSES
RADV needs one more encode pass for a bugfix in the next commit.

Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39985>
2026-03-28 16:12:09 +01:00
Natalie Vock
6f80027447 vulkan: Rename {encode,update}_bind_pipeline to {encode,update}_prepare
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39985>
2026-03-28 16:12:09 +01:00
Icenowy Zheng
ee031d67b4 pvr: fix dirty tracking for stencil ops
The dirty state of stencil ops is not checked when deciding whether to
rebuild the ISP state, although the values are part of the ISP state
(the 27:16 bits of ISPB word).

Add MESA_VK_DYNAMIC_DS_STENCIL_OP to the condition for rebuilding ISP
control registers.

Fixes GLCTS tests when running on top of Zink:
dEQP-GLES2.functional.fragment_ops.stencil.zero_stencil_fail

Fixes: 88f1fad3f7 ("pvr: Use common pipeline & dynamic state frameworks")
Signed-off-by: Icenowy Zheng <zhengxingda@iscas.ac.cn>
Reviewed-by: Simon Perretta <simon.perretta@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40623>
2026-03-28 19:39:01 +08:00
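The class of bug fixed by ee031d67b4 above can be sketched with a plain dirty-bit mask. The bit names and the contents of the mask are hypothetical (the real driver uses the MESA_VK_DYNAMIC_* state bits); the point is only that every state packed into a hardware word must appear in the condition that triggers rebuilding that word.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical dirty bits, one per piece of dynamic state. */
enum {
   DIRTY_STENCIL_COMPARE_MASK = 1u << 0,
   DIRTY_STENCIL_WRITE_MASK   = 1u << 1,
   DIRTY_STENCIL_OP           = 1u << 2,
   DIRTY_VIEWPORT             = 1u << 3,
};

/* Every state that is encoded in the hardware word belongs in this mask.
 * Leaving DIRTY_STENCIL_OP out would reproduce the bug: a stencil op change
 * alone would not trigger a rebuild. */
#define ISP_STATE_DIRTY_MASK \
   (DIRTY_STENCIL_COMPARE_MASK | DIRTY_STENCIL_WRITE_MASK | DIRTY_STENCIL_OP)

/* Return true if any state feeding the hardware word has changed. */
static bool
needs_isp_rebuild(uint32_t dirty)
{
   return (dirty & ISP_STATE_DIRTY_MASK) != 0;
}
```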
Icenowy Zheng
71880a2911 pvr: support VK_EXT_non_seamless_cube_map
When running GLES2 conformance tests with Zink on the PowerVR driver, I
found that the PowerVR driver has the same kind of weird behavior as
Apple AGX of not ignoring wrap mode for seamless cubes (see !21978 for
the description of the quirk on AGX).

As GLES2 exposes non-seamless cubes, exposing non-seamless cube support
on PowerVR seems to help a lot with these GLES2 tests. Implementing
full GLES 3 and relying on the workaround for AGX is another choice, but
it's still too far off.

Implementing non-seamless cube seems to be as easy as setting a bit in
the sampler control word, so do it.

Signed-off-by: Icenowy Zheng <zhengxingda@iscas.ac.cn>
Reviewed-by: Simon Perretta <simon.perretta@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40660>
2026-03-28 11:17:12 +00:00
Zan Dobersek
468113efd4 fd/replay: kgsl context should use no-fault tolerance, report reset state
Use KGSL_CONTEXT_NO_FAULT_TOLERANCE to push context into an error state
when a GPU fault is detected. This is useful when dealing with replays of
captures that are producing a GPU fault but might seem to replay just fine
because of the KGSL kernel fault tolerance.

Signed-off-by: Zan Dobersek <zdobersek@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40667>
2026-03-28 07:58:05 +00:00
Olivia Lee
8d5ba04e65 panvk/csf: use different resource registers for precomp vs user dispatch
This allows us to avoid dirtying all of the state for user compute
dispatches when we run a precomp shader.

Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37970>
2026-03-28 03:53:41 +00:00
Icenowy Zheng
ea783b4691 vulkan/wsi/headless: implement wait_for_present for swapchain
The VK_KHR_present_wait extension contains no functionality to announce
(the lack of) support for vkWaitForPresentKHR() on a WSI (or WSI-bound
object) granularity.

On any driver advertising that extension and the headless WSI, the
application will expect vkWaitForPresentKHR() to be usable with the
headless WSI, which leads to an assertion failure in debug Mesa builds or
a crash in release builds.

Create a trivial wait_for_present implementation for the headless WSI,
which just assumes the image is presented immediately when
queue_present is called, so it only checks the WSI present semaphore.

Tested with `dEQP-VK.wsi.headless.present_id_wait.wait.*` on RADV
without any failures.

Signed-off-by: Icenowy Zheng <zhengxingda@iscas.ac.cn>
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40347>
2026-03-27 19:55:11 +00:00
Icenowy Zheng
2f540283b3 vulkan/wsi/headless: properly cleanup swapchain init failure
Currently the wsi_headless_surface_create_swapchain() function abuses
the corresponding destroy function to perform cleanup operations when
any failure happens during image creation. This practice sounds
fragile and prevents further changes to the swapchain creation
procedure.

Implement a proper cleanup sequence to reverse all operations.

As another cleanup codepath above already contains a call to vk_free(),
the call is changed to a goto targeting the corresponding label.

Regression tested with `dEQP-VK.wsi.headless.swapchain.simulate_oom.*`
on RADV.

Signed-off-by: Icenowy Zheng <zhengxingda@iscas.ac.cn>
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40347>
2026-03-27 19:55:11 +00:00
Lorenzo Rossi
c0e0591999 pan/compiler: Replace frag_coord_zw_pan with var_special_pan
Just a bit cleaner, and we can unify point size too.

Signed-off-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40677>
2026-03-27 19:23:02 +00:00
Lorenzo Rossi
5be2b03b88 pan/compiler: Add bound assert on emit_split_i32
This could've saved me a lot of time debugging stack corruption.

Signed-off-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40677>
2026-03-27 19:23:02 +00:00
Maíra Canal
691cfe40fa v3d: use devinfo->page_size for state uploader default size
The state uploader was hardcoded to 4096 bytes, which doesn't fill the
full page on systems with 16KB pages. Use devinfo->page_size instead so
the uploader default matches the actual allocation granularity.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40496>
2026-03-27 18:54:29 +00:00
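The arithmetic behind 691cfe40fa above can be sketched as follows. The helper names `align_up` and `unused_bytes` are hypothetical; the sketch only assumes that the backing allocation is rounded up to a whole number of OS pages, as the commit message describes.

```c
#include <assert.h>
#include <stdint.h>

/* Round size up to a multiple of alignment (must be a power of two). */
static uint32_t
align_up(uint32_t size, uint32_t alignment)
{
   assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
   return (size + alignment - 1) & ~(alignment - 1);
}

/* Bytes of the page-rounded backing allocation the uploader never uses. */
static uint32_t
unused_bytes(uint32_t uploader_size, uint32_t page_size)
{
   return align_up(uploader_size, page_size) - uploader_size;
}
```

With a 4096-byte uploader on a system with 16384-byte pages, `unused_bytes(4096, 16384)` is 12288, so three quarters of each allocation would go unused; sizing the uploader to the page size brings that to zero.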
Maíra Canal
4db32305ec v3d: Rename cle_buffer_min_size to page_size
The variable doesn't store a granularity specific to CLE buffers. It
stores the granularity that the OS imposes on buffer allocations (that
is, the OS page size). Therefore, rename the variable to best reflect
its meaning.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40496>
2026-03-27 18:54:29 +00:00
Maíra Canal
bfe92d50ce v3d: sub-allocate sampler view texture state from state uploader
Previously, each sampler view allocated a dedicated BO for its
TEXTURE_SHADER_STATE packet (~24 bytes), which got rounded up to a
full 4KB page. This wastes memory and inflates the per-job BO handle
count.

Use u_upload_alloc_ref() to sub-allocate texture shader state from the
shared state_uploader, matching the pattern already used by image views.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40496>
2026-03-27 18:54:29 +00:00
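The idea behind bfe92d50ce above, packing many small state records into one shared buffer, can be sketched with a minimal bump allocator. This is not the u_upload_mgr implementation; `struct suballocator` and `suballoc` are hypothetical, and the 32-byte alignment used in the example below is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bump allocator over one shared buffer. */
struct suballocator {
   uint32_t capacity; /* size of the shared buffer in bytes */
   uint32_t offset;   /* next free byte */
};

/* Reserve size bytes at the given power-of-two alignment. On success,
 * store the start offset in *out and return true; return false if the
 * shared buffer has no room left. */
static bool
suballoc(struct suballocator *s, uint32_t size, uint32_t alignment,
         uint32_t *out)
{
   assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
   uint32_t start = (s->offset + alignment - 1) & ~(alignment - 1);
   if (start > s->capacity || size > s->capacity - start)
      return false;
   *out = start;
   s->offset = start + size;
   return true;
}
```

Under these assumptions, a single 4096-byte buffer holds 128 records of 24 bytes at 32-byte alignment, where one dedicated BO per record would have needed 128 separate pages.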
Maíra Canal
751e0d26ec v3d: use the state uploader for the image view texture shader state
From the documentation, the state uploader should be used inside the
driver for long-term state inside buffers, while the stream uploader
should be used by Gallium's internals. Considering that the image view
texture shader state can be considered long-lived state data, use
`state_uploader` instead of `uploader` for consistency.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40496>
2026-03-27 18:54:29 +00:00
Rob Clark
b76678cddd freedreno/a6xx: Fix supported-blit fmt check
Fixes some KHR-GLES*.core.internalformat.texture2d.* failures.

Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40665>
2026-03-27 17:48:21 +00:00
Julia Zhang
32d04bcdcd vulkan: return pQueue with matching flags
Searching device->queues by queueIndex and queueFamilyIndex alone can
return the wrong queue. If two queues A and B are created with the same
queueIndex and queueFamilyIndex but different flags, a request for B
returns A whenever the vk_foreach_queue loop reaches A first, because A
already has the requested queueIndex and queueFamilyIndex.

Add a check of the queue flags so that the returned queue matches on
flags, queueIndex and queueFamilyIndex.

Signed-off-by: Julia Zhang <Julia.Zhang@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40669>
2026-03-27 17:08:01 +00:00
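The lookup described in 32d04bcdcd above can be sketched as below. `struct queue` and `find_queue` are hypothetical, simplified stand-ins rather than the real vk_queue code; the sketch only shows the extra flags comparison.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified queue record. */
struct queue {
   uint32_t family_index;
   uint32_t index;
   uint32_t flags;
};

/* Return the queue matching family index, queue index and flags, or NULL.
 * Without the flags comparison, two queues that differ only in flags would
 * both resolve to whichever one comes first in the array. */
static const struct queue *
find_queue(const struct queue *queues, size_t count,
           uint32_t family_index, uint32_t index, uint32_t flags)
{
   assert(queues != NULL || count == 0);
   for (size_t i = 0; i < count; i++) {
      const struct queue *q = &queues[i];
      if (q->family_index == family_index && q->index == index &&
          q->flags == flags)
         return q;
   }
   return NULL;
}
```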
Trigger Huang
007cfd138d vulkan/queue: pass protected submit info to driver
Pass the application's protected submission info to the driver.

Signed-off-by: Trigger Huang <Trigger.Huang@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40669>
2026-03-27 17:08:01 +00:00
Samuel Pitoiset
dede14cce3 radv: advertise VK_KHR_device_address_commands
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40386>
2026-03-27 16:17:02 +00:00
Samuel Pitoiset
a97c889a7b radv: implement VK_KHR_device_address_commands
Because there is no way to know where the address has been allocated
(GTT or VRAM), the existing entrypoints aren't dropped and the sparse
bit is derived from VK_ADDRESS_COMMAND_FULLY_BOUND_BIT_KHR.

It would be nice to figure out if the CP DMA vs compute heuristic for
GTT BOs on dGPUs could be removed to simplify this implementation.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40386>
2026-03-27 16:17:02 +00:00
Samuel Pitoiset
479a992b02 radv: replace radv_copy_flags by VkAddressCopyFlagsKHR
Same meaning.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40386>
2026-03-27 16:17:02 +00:00
Samuel Pitoiset
72ac5e6d29 radv/ci: fix a typo in radv-navi10-vkcts-full
Oops.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40679>
2026-03-27 15:53:39 +00:00
Samuel Pitoiset
566e4c25d9 radv/ci: fix radv-slow-skips.txt path
This was causing issues with personal branches.

Suggested-by: Martin Roukala (né Peres) <martin.roukala@mupuf.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40676>
2026-03-27 14:53:37 +00:00
Rhys Perry
3b52d61bb0 radv: don't copy radv_vertex_input_state in CmdSetVertexInputEXT
This doubles vkoverhead's draw_16vattrib_change_dynamic performance.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40603>
2026-03-27 13:38:29 +00:00
Georg Lehmann
ae2968c4ec aco: allow spilling to LDS in RT shaders without stack pointer
No Foz-DB changes because most RT shaders use function calls now.

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36367>
2026-03-27 13:08:44 +00:00
Georg Lehmann
133ef9f94b aco: spill VGPRs to LDS if it doesn't further limit occupancy
Only use LDS for VGPR spilling if we can use addtid access, to avoid having a VGPR addr.
Limit to single wave workgroups, to avoid needing the wave_id for the offset.
If we have a scratch stack pointer, don't use LDS at all.

Limit LDS spilling to not reduce occupancy further.
Note that in theory, this can still limit occupancy of other shaders running
on the CU at the same time, but that's unlikely and impossible to know at this point.

Removes all scratch usage in emulated FSR4 and parallel_rdp.
Besides that, only a single GoW shader is affected.

Foz-DB Navi31:
Totals from 9 (0.01% of 114641) affected shaders:
Instrs: 68863 -> 68830 (-0.05%); split: -0.07%, +0.02%
CodeSize: 416108 -> 416000 (-0.03%); split: -0.05%, +0.02%
LDS: 2048 -> 45056 (+2100.00%)
Scratch: 261888 -> 220672 (-15.74%)
Latency: 727951 -> 657155 (-9.73%); split: -9.73%, +0.00%
InvThroughput: 418644 -> 383269 (-8.45%)
VClause: 1506 -> 1200 (-20.32%)
Copies: 10651 -> 10624 (-0.25%)
VALU: 48700 -> 48684 (-0.03%)
SALU: 6200 -> 6199 (-0.02%); split: -0.05%, +0.03%
VMEM: 4139 -> 3589 (-13.29%)
VOPD: 580 -> 574 (-1.03%)

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36367>
2026-03-27 13:08:44 +00:00
Pavel Ondračka
56a6528744 r300/ci: expectation update
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40671>
2026-03-27 10:48:55 +01:00
Tomeu Vizoso
e23fcc1464 ethosu: implement ml_device_destroy for standalone ML device
Use ralloc_free to release the device allocated by
ethosu_ml_device_create().

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40578>
2026-03-27 09:35:40 +01:00
Tomeu Vizoso
f06b4dbe33 gallium: add ml_device_destroy callback to pipe_ml_device
Add a destroy callback so that standalone ML devices created via
*_ml_device_create() can properly free their resources.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40578>
2026-03-27 09:35:40 +01:00
Tomeu Vizoso
f0e4ccf664 ethosu: handle NULL bias tensor in convolution
PyTorch Conv2d without explicit bias produces a NULL bias_tensor
in the Gallium pipe_ml_operation. Guard against NULL dereferences
in two places:

- ethosu_lower.c: pass NULL to fill_coefs when bias_tensor is NULL
- ethosu_coefs.c: treat missing biases as zero

Fixes crashes when running Conv2d models without bias through the
Ethos-U NPU backend.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40578>
2026-03-27 09:33:52 +01:00
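The "treat missing biases as zero" guard from f0e4ccf664 above can be sketched like this. The function names and the int8/int32 element types are assumptions made for illustration, not the actual ethosu_coefs.c code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return the bias for one output channel. A missing bias tensor
 * (biases == NULL) contributes zero instead of being dereferenced. */
static int32_t
bias_or_zero(const int32_t *biases, size_t channel)
{
   return biases ? biases[channel] : 0;
}

/* Accumulate a dot product for one output channel and add its bias. */
static int32_t
accumulate(const int8_t *weights, const int8_t *inputs, size_t n,
           const int32_t *biases, size_t channel)
{
   assert(weights != NULL && inputs != NULL);
   int32_t acc = bias_or_zero(biases, channel);
   for (size_t i = 0; i < n; i++)
      acc += (int32_t)weights[i] * inputs[i];
   return acc;
}
```

Callers can then pass NULL for a convolution without bias and take the same code path as the biased case.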
Tomeu Vizoso
e0b401aa87 ethosu: implement ml_subgraph_deserialize()
Add ethosu_ml_subgraph_deserialize() which reconstructs a subgraph
from a serialized byte buffer. Parses the header (cmdstream size,
coefs size, io size, tensors size), restores the tensor array,
cmdstream, and coefficient buffers.

DRM buffer object creation is deferred to prepare_for_submission()
which is called lazily on first invoke.

Wire pctx->ml_subgraph_deserialize in ethosu_create_context().

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40578>
2026-03-27 09:33:52 +01:00
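Header parsing of the kind described in e0b401aa87 above might look like the sketch below. The layout is an assumption: four 32-bit sizes in host byte order, with the cmdstream, coefficient and tensor sections following the header and io_size describing only a buffer to allocate later. `struct subgraph_header` and `parse_header` are hypothetical names, not the real serialization format.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header layout; the real format may differ. */
struct subgraph_header {
   uint32_t cmdstream_size;
   uint32_t coefs_size;
   uint32_t io_size;
   uint32_t tensors_size;
};

/* Copy the header out of buf and check that the sections it describes fit
 * in the remaining bytes. Returns false on truncated input. */
static bool
parse_header(const uint8_t *buf, size_t len, struct subgraph_header *hdr)
{
   assert(buf != NULL && hdr != NULL);
   if (len < sizeof(*hdr))
      return false;
   memcpy(hdr, buf, sizeof(*hdr));

   /* A 64-bit sum of three 32-bit sizes cannot overflow. */
   uint64_t payload = (uint64_t)hdr->cmdstream_size + hdr->coefs_size +
                      hdr->tensors_size;
   return payload <= len - sizeof(*hdr);
}
```

Validating the sizes before touching the payload keeps a truncated or corrupted buffer from causing out-of-bounds reads when the sections are restored.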