PyTorch Conv2d without explicit bias produces a NULL bias_tensor
in the Gallium pipe_ml_operation. Guard against NULL dereferences
in two places:
- ethosu_lower.c: pass NULL to fill_coefs when bias_tensor is NULL
- ethosu_coefs.c: treat missing biases as zero
Fixes crashes when running Conv2d models without bias through the
Ethos-U NPU backend.
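The zero-bias fallback can be sketched as below. fill_bias_values() is a hypothetical helper, not the driver's fill_coefs(), which also encodes per-channel scales. The point is only that a NULL bias pointer is read as "every channel has bias 0" instead of being dereferenced.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy per-channel biases into the coefficient
 * stream, treating a NULL bias tensor (Conv2d without bias) as zeros. */
static void
fill_bias_values(int32_t *dst, const int32_t *bias, unsigned channels)
{
   assert(dst != NULL);
   for (unsigned i = 0; i < channels; i++)
      dst[i] = bias ? bias[i] : 0; /* missing bias => 0, no dereference */
}
```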
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40578>
Add ethosu_ml_subgraph_deserialize() which reconstructs a subgraph
from a serialized byte buffer. Parses the header (cmdstream size,
coefs size, io size, tensors size), restores the tensor array,
cmdstream, and coefficient buffers.
DRM buffer object creation is deferred to prepare_for_submission()
which is called lazily on first invoke.
Wire pctx->ml_subgraph_deserialize in ethosu_create_context().
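Header parsing with bounds checks can be sketched as below. The layout is an assumption made for illustration: four native-endian uint32_t sizes in the order listed above, followed by the payloads. struct blob_header and parse_blob_header() are hypothetical names, not the actual serialization format.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header: four native-endian sizes, payloads follow. */
struct blob_header {
   uint32_t cmdstream_size;
   uint32_t coefs_size;
   uint32_t io_size;
   uint32_t tensors_size;
};

static bool
parse_blob_header(const uint8_t *buf, size_t len, struct blob_header *hdr)
{
   assert(buf != NULL && hdr != NULL);
   if (len < sizeof(*hdr))
      return false;
   memcpy(hdr, buf, sizeof(*hdr)); /* memcpy avoids unaligned loads */

   /* Reject buffers shorter than the declared payloads; 64-bit math
    * keeps the sum from overflowing. */
   uint64_t needed = (uint64_t)sizeof(*hdr) + hdr->cmdstream_size +
                     hdr->coefs_size + hdr->io_size + hdr->tensors_size;
   return needed <= len;
}
```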
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40578>
Move target-specific fields (is_u65, ifm_ublock, ofm_ublock,
max_concurrent_blocks, sram_size) from ethosu_screen into
ethosu_ml_device. This decouples the compilation phase from the DRM
file descriptor and pipe_screen, allowing ahead-of-time compilation
where the target NPU is not present on the compilation host.
The ethosu_device_screen() helper is retained only for runtime paths
that need the DRM fd (buffer allocation, job submission, destroy).
Compilation code now accesses hardware parameters through an
ethosu_ml_device() cast of the pipe_ml_device, which can be created
either from a DRM-backed screen or standalone via
ethosu_ml_device_create() with a target string like "65-256".
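A standalone device needs a target string parsed into hardware parameters. A minimal sketch follows, assuming the format is "<variant>-<MACs>" (as in "65-256") and that 55, 65 and 85 are the accepted variants. struct npu_target and parse_target() are hypothetical and do not reproduce ethosu_ml_device_create().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical parsed form of a target string such as "65-256". */
struct npu_target {
   unsigned variant; /* 55, 65 or 85 (assumed set) */
   unsigned macs;    /* MACs per cycle, e.g. 256 */
   bool is_u65;
};

static bool
parse_target(const char *s, struct npu_target *t)
{
   assert(s != NULL && t != NULL);
   char *end;

   if (*s < '0' || *s > '9')
      return false;
   unsigned long variant = strtoul(s, &end, 10);
   if (*end != '-')
      return false;

   const char *macs_str = end + 1;
   if (*macs_str < '0' || *macs_str > '9')
      return false;
   unsigned long macs = strtoul(macs_str, &end, 10);
   if (*end != '\0' || macs == 0)
      return false;

   if (variant != 55 && variant != 65 && variant != 85)
      return false;

   t->variant = (unsigned)variant;
   t->macs = (unsigned)macs;
   t->is_u65 = variant == 65;
   return true;
}
```

No DRM fd or pipe_screen appears in this path. That is what allows compiling on a host without the NPU.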
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40647>
Thread timeline_point through si_add_fence_dependency and
si_add_syncobj_signal to the winsys. Remove the assert(!value)
guards in si_fence_server_sync and si_fence_server_signal so that
non-zero timeline point values are passed through to the winsys
fence dependency and signal lists.
Add PIPE_FD_TYPE_TIMELINE_SEMAPHORE_VK handling in si_create_fence_fd,
importing the fd as a syncobj (the timeline point is applied at
wait/signal time, not at import time).
Author: Claude Opus 4.6 <noreply@anthropic.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40526>
Add a parallel uint64_t *points array to amdgpu_fence_list to store
timeline semaphore point values alongside each fence. Point=0 means
binary semaphore (preserving existing behavior).
Update cs_add_fence_dependency and cs_add_syncobj_signal winsys
interfaces to accept a timeline_point parameter, and thread it
through to the fence lists. All existing callers pass 0.
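The parallel-array layout can be sketched as below. struct fence_list and its helpers are hypothetical stand-ins for amdgpu_fence_list, with void * in place of the real fence type. Index i of points always belongs to index i of fences, and a point of 0 keeps binary-semaphore behavior.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical fence list: points[i] is the timeline point for
 * fences[i]; 0 means binary semaphore. */
struct fence_list {
   void **fences;
   uint64_t *points;
   unsigned num, max;
};

static int
fence_list_add(struct fence_list *list, void *fence, uint64_t point)
{
   assert(list != NULL);
   if (list->num == list->max) {
      unsigned new_max = list->max ? list->max * 2 : 4;
      void **f = realloc(list->fences, new_max * sizeof(*f));
      if (!f)
         return -1;
      list->fences = f;
      /* If this fails, fences is merely larger than max: still valid. */
      uint64_t *p = realloc(list->points, new_max * sizeof(*p));
      if (!p)
         return -1;
      list->points = p;
      list->max = new_max;
   }
   list->fences[list->num] = fence;
   list->points[list->num] = point;
   list->num++;
   return 0;
}

static void
fence_list_fini(struct fence_list *list)
{
   free(list->fences);
   free(list->points);
}
```

Both arrays grow together, so one index is enough to pair each fence with its point.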
Author: Claude Opus 4.6 <noreply@anthropic.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40526>
Overwrite the whole framebuffer cbuf rather than copying it from the
stack; this fixes util_framebuffer_get_num_samples reading uninitialized
stack contents during validation.
Suggested-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Alyssa Milburn <amilburn@zall.org>
Fixes: 2eb45daa9c ("gallium: de-pointerize pipe_surface")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14082
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39138>
Compiling models doesn't really require a context for a real device.
To support ML frameworks in which model compilation happens
ahead of time (AoT), add an API for compilation that doesn't require a
pipe_context.
Add struct pipe_ml_device with function pointers for:
- ml_operation_supported: query operation support
- ml_subgraph_create: compile a subgraph
- ml_subgraph_serialize: serialize a compiled subgraph
- ml_subgraph_destroy: free subgraph resources
Move ml_operation_supported, ml_subgraph_create, and
ml_subgraph_destroy from pipe_context to pipe_ml_device.
Add pipe_screen::get_ml_device() to obtain the device.
Change pipe_ml_subgraph.context (pipe_context*) to
pipe_ml_subgraph.device (pipe_ml_device*).
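The shape of such a device-level vtable can be sketched as below. Every name and parameter list here is hypothetical (struct ml_device_sketch, the dummy_* functions) and does not reproduce Gallium's real signatures. The sketch shows that a subgraph can be created, serialized and destroyed while holding only a device pointer, never a context.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct ml_operation; /* opaque in this sketch */
struct ml_subgraph_sketch;

/* Hypothetical vtable mirroring the four entry points listed above. */
struct ml_device_sketch {
   bool (*operation_supported)(struct ml_device_sketch *dev,
                               const struct ml_operation *op);
   struct ml_subgraph_sketch *(*subgraph_create)(struct ml_device_sketch *dev,
                                                 const struct ml_operation *ops,
                                                 unsigned count);
   size_t (*subgraph_serialize)(struct ml_subgraph_sketch *sg,
                                void *buf, size_t size);
   void (*subgraph_destroy)(struct ml_subgraph_sketch *sg);
};

/* The subgraph points back at the device, not at a context. */
struct ml_subgraph_sketch {
   struct ml_device_sketch *device;
   unsigned num_ops;
};

static bool
dummy_supported(struct ml_device_sketch *dev, const struct ml_operation *op)
{
   (void)dev;
   return op != NULL;
}

static struct ml_subgraph_sketch *
dummy_create(struct ml_device_sketch *dev, const struct ml_operation *ops,
             unsigned count)
{
   (void)ops;
   struct ml_subgraph_sketch *sg = calloc(1, sizeof(*sg));
   if (sg) {
      sg->device = dev;
      sg->num_ops = count;
   }
   return sg;
}

/* Pretend the compiled form is just the op count; returns bytes needed. */
static size_t
dummy_serialize(struct ml_subgraph_sketch *sg, void *buf, size_t size)
{
   assert(sg != NULL);
   if (buf && size >= sizeof(sg->num_ops))
      memcpy(buf, &sg->num_ops, sizeof(sg->num_ops));
   return sizeof(sg->num_ops);
}

static void
dummy_destroy(struct ml_subgraph_sketch *sg)
{
   free(sg);
}

static struct ml_device_sketch dummy_device = {
   .operation_supported = dummy_supported,
   .subgraph_create = dummy_create,
   .subgraph_serialize = dummy_serialize,
   .subgraph_destroy = dummy_destroy,
};
```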
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40167>
Replace the boolean padding_same field in pipe_ml_operation.conv
and .pooling with explicit per-side padding fields: padding_top,
padding_bottom, padding_left, padding_right.
Frontends always compute these from their own padding representation
(e.g. TFLite same/valid, PyTorch (pad_h, pad_w)). Drivers use
them directly, removing the need for drivers to derive padding.
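A frontend might derive the per-side values as below. The sketch uses TFLite's SAME convention for one spatial dimension: output size is ceil(in / stride), and any odd leftover goes to the bottom or right. VALID padding is simply all zeros, and a PyTorch (pad_h, pad_w) pair maps to padding_top = padding_bottom = pad_h. struct padding and same_padding() are hypothetical names.

```c
#include <assert.h>

struct padding {
   unsigned before; /* top or left */
   unsigned after;  /* bottom or right */
};

/* TFLite-style SAME padding for one spatial dimension. */
static struct padding
same_padding(unsigned in, unsigned kernel, unsigned stride, unsigned dilation)
{
   assert(in > 0 && kernel > 0 && stride > 0 && dilation > 0);
   unsigned out = (in + stride - 1) / stride;          /* ceil(in / stride) */
   unsigned effective_k = (kernel - 1) * dilation + 1; /* dilated kernel */
   unsigned needed = (out - 1) * stride + effective_k;
   unsigned total = needed > in ? needed - in : 0;
   struct padding p = { total / 2, total - total / 2 }; /* extra goes after */
   return p;
}
```

For a 224-wide input with a 3-wide kernel and stride 2, total padding is 1, all of it on the right. A single boolean could not express that asymmetry, which is why the explicit fields help.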
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40167>
Change the tensor backing storage from pipe_resource* to uint8_t*.
This simplifies tensor data management by using raw memory pointers
instead of pipe_resource objects. Frontends allocate tensor data with
malloc() and drivers access it directly, removing the need for
pipe_buffer_map/unmap for tensor data access.
We initially used resources thinking that the NPU would want to directly
access the data in those tensors. It is clear now that all NPUs will
need the data to be compressed and reformatted in some way, so let's
drop the inconvenient resources and just use allocated memory.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40167>
We used to lower multisampled arrays to 3D images by adjusting the
height and the Y coordinate so that addressing samples became
addressing into the new base image. This worked for gallium, but
was never implemented for vulkan, and also had the disadvantages
that (a) we handled arrays and non-arrays differently, and
(b) the image height was restricted to 4096.
Change this so that we lower samples into the Z coordinate instead,
adding new layers for each sample. This requires that we know the
number of samples (so we have to save a sysval for this in gallium)
but means that we handle arrays and non-arrays the same. More
importantly, we can fit 3 bits to indicate the number of samples
into the attribute descriptor in Vulkan, so this scheme works
there as well as in OpenGL.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40460>
Not really used yet, but we will need it later when we change how we
lower multisampled image arrays.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40460>
The sc7180-trogdor-lazor-limozeen devices are having issues, so move the
job to a different device with available capacity.
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40566>
These functions should no longer be used by serious drivers. Drivers that
do use them now need to pass their own destructor function.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40462>
Variants can modify which outputs get written, so we must update
these fields; otherwise spi_shader_col_format will be incorrect.
This can happen for instance with uniforms inlining:
    uniform bool depth_only;
    void main() {
        if (depth_only) return;
        ...
    }
When depth_only is true, this shader becomes empty after uniform
inlining, but spi_shader_col_format wasn't updated accordingly,
causing a hang.
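Keeping the color export format in sync with a variant's outputs can be sketched as below. The sketch assumes the register packs a 4-bit export format per color target (eight targets) and that format 0 means "no export". col_format_for_written_outputs() is a hypothetical helper, not the radeonsi function that the fix touches.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: keep each target's 4-bit format only if the (possibly
 * specialized) variant still writes that target; clear it otherwise. */
static uint32_t
col_format_for_written_outputs(uint32_t col_format, unsigned written_mask)
{
   assert(written_mask <= 0xff); /* eight color targets */
   uint32_t result = 0;
   for (unsigned i = 0; i < 8; i++) {
      if (written_mask & (1u << i))
         result |= col_format & (0xfu << (i * 4));
   }
   return result;
}
```

In the depth_only example above, the inlined variant writes no color outputs. With a written mask of 0 the result is 0, rather than the stale value computed for the original shader.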
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14737
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40372>
On U85, both NPU_SET_IFM_BROADCAST and NPU_SET_IFM2_BROADCAST must be
emitted for elementwise operations, matching Vela's GenerateInputBroadcast.
Add calc_broadcast_mode() matching Vela's CalculateBroadcast(): broadcasts
a dimension of shape1 when it is 1 and shape2 is larger, producing a
broadcast_mode bitmask (H=1, W=2, C=4, SCALAR=8).
Split emit_ifm2_broadcast into U65 (legacy bitfields) and U85 paths.
The U85 path emits both IFM_BROADCAST and IFM2_BROADCAST using
calc_broadcast_mode in each direction.
Also fix emit_eltwise to call emit_ifm2_precision instead of
emit_ifm_broadcast for U85, which was emitting 0 instead of the
required IFM2_PRECISION register.
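The per-direction broadcast mask can be sketched as below. It uses the bit values given above (H=1, W=2, C=4, SCALAR=8). How SCALAR interacts with the other bits is an assumption here: it is added when the first shape is 1x1x1 and actually needs broadcasting. calc_broadcast_mode_sketch() and struct shape_hwc are hypothetical names, not Vela's or Mesa's code.

```c
#include <assert.h>

enum {
   BCAST_H = 1,
   BCAST_W = 2,
   BCAST_C = 4,
   BCAST_SCALAR = 8,
};

struct shape_hwc {
   unsigned h, w, c;
};

/* Set a bit for each dimension where a is 1 and b is larger, i.e. where
 * a must be broadcast to match b. SCALAR handling is an assumption. */
static unsigned
calc_broadcast_mode_sketch(struct shape_hwc a, struct shape_hwc b)
{
   assert(a.h && a.w && a.c && b.h && b.w && b.c);
   unsigned mode = 0;
   if (a.h == 1 && b.h > 1)
      mode |= BCAST_H;
   if (a.w == 1 && b.w > 1)
      mode |= BCAST_W;
   if (a.c == 1 && b.c > 1)
      mode |= BCAST_C;
   if (a.h == 1 && a.w == 1 && a.c == 1 && mode != 0)
      mode |= BCAST_SCALAR;
   return mode;
}
```

Following the description of the U85 path, IFM_BROADCAST would come from calc_broadcast_mode_sketch(ifm, ifm2) and IFM2_BROADCAST from calc_broadcast_mode_sketch(ifm2, ifm).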
Signed-off-by: Tomeu Vizoso <tomeu@tomeuvizoso.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39611>
Map DRM buffer objects once at resource_create and unmap at
resource_destroy, instead of mapping them in buffer_map where they
were never unmapped. This fixes a virtual memory leak that caused
SIGBUS under heavy workloads by exhausting CMA.
Also remove unused phys_addr and obj_addr fields from ethosu_resource,
and add asserts on pipe_buffer_create return values.
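The map-once lifetime can be sketched as below. An anonymous mmap() stands in for mapping a DRM buffer object, which a real driver would do by mapping the DRM fd at the offset the kernel returns. struct resource_sketch and its functions are hypothetical. The mapping is created at create time, every buffer_map returns the cached pointer, and the mapping is released exactly once at destroy time.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS with strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>

/* Hypothetical resource: the CPU mapping is created once and cached. */
struct resource_sketch {
   size_t size;
   void *map;
};

static struct resource_sketch *
resource_create(size_t size)
{
   struct resource_sketch *res = calloc(1, sizeof(*res));
   if (!res)
      return NULL;
   /* Stand-in for mapping a DRM BO. */
   res->map = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
   if (res->map == MAP_FAILED) {
      free(res);
      return NULL;
   }
   res->size = size;
   return res;
}

/* No new mapping per call, so repeated maps cannot leak address space. */
static void *
resource_buffer_map(struct resource_sketch *res)
{
   assert(res != NULL && res->map != NULL);
   return res->map;
}

static void
resource_destroy(struct resource_sketch *res)
{
   if (!res)
      return;
   munmap(res->map, res->size); /* the one and only unmap */
   free(res);
}
```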
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39611>
For U85-256 with 8-bit IFM, Vela's _uBlockToOpTable restricts which
microblocks are valid per operation type:
{2,2,8} and {4,1,8}: conv, matmul, vectorprod, reducesum, eltwise, resize
{2,1,16}: depthwise, pool, eltwise, reduceminmax, argmax, resize
Mesa's find_ublock() was not enforcing these constraints, allowing
{4,1,8} or {2,2,8} to be selected for depthwise/pooling based on
minimum waste. For depthwise ops with OFM shapes that aligned better
to {4,1,8}, the wrong ublock was chosen, causing incorrect weight
encoding and NPU hangs.
Fix by skipping {4,1,8} and {2,2,8} for depthwise/pooling operations,
matching Vela's operation-validity table.
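The constrained selection can be sketched as below. The operation kinds are collapsed to four, and padding waste on the OFM is used as the "minimum waste" metric. Both simplifications are assumptions. find_ublock_sketch() and its helpers are hypothetical, not Mesa's find_ublock(). The validity check that skips disallowed candidates is the part corresponding to this fix.

```c
#include <assert.h>
#include <stdbool.h>

enum op_kind { OP_CONV, OP_DEPTHWISE, OP_POOL, OP_ELTWISE };

struct ublock {
   unsigned h, w, c;
};

/* U85-256, 8-bit IFM candidates from the table above. */
static const struct ublock u85_256_ublocks[] = {
   {2, 2, 8}, {4, 1, 8}, {2, 1, 16},
};

/* {2,2,8}/{4,1,8}: conv-like ops; {2,1,16}: depthwise/pool; eltwise: all. */
static bool
ublock_valid_for(struct ublock ub, enum op_kind op)
{
   bool is_c16 = ub.c == 16;
   switch (op) {
   case OP_DEPTHWISE:
   case OP_POOL:
      return is_c16;
   case OP_CONV:
      return !is_c16;
   case OP_ELTWISE:
      return true;
   }
   return false;
}

static unsigned
round_up(unsigned v, unsigned a)
{
   return (v + a - 1) / a * a;
}

/* Pick the valid ublock with the least padding waste on an h x w x c OFM. */
static struct ublock
find_ublock_sketch(unsigned h, unsigned w, unsigned c, enum op_kind op)
{
   struct ublock best = {0, 0, 0};
   unsigned long best_waste = (unsigned long)-1;
   unsigned n = sizeof(u85_256_ublocks) / sizeof(u85_256_ublocks[0]);
   for (unsigned i = 0; i < n; i++) {
      struct ublock ub = u85_256_ublocks[i];
      if (!ublock_valid_for(ub, op))
         continue; /* never pick a ublock the op table forbids */
      unsigned long padded = (unsigned long)round_up(h, ub.h) *
                             round_up(w, ub.w) * round_up(c, ub.c);
      unsigned long waste = padded - (unsigned long)h * w * c;
      if (waste < best_waste) {
         best_waste = waste;
         best = ub;
      }
   }
   assert(best.c != 0);
   return best;
}
```

A 4x1x8 OFM fits {4,1,8} with zero waste. Without the validity check, a depthwise op would pick it. With the check, a depthwise op gets {2,1,16}, while a conv still gets {4,1,8}.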
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39611>