PyTorch Conv2d without explicit bias produces a NULL bias_tensor
in the Gallium pipe_ml_operation. Guard against NULL dereferences
in two places:
- ethosu_lower.c: pass NULL to fill_coefs when bias_tensor is NULL
- ethosu_coefs.c: treat missing biases as zero
Fixes crashes when running Conv2d models without bias through the
Ethos-U NPU backend.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40578>
Add ethosu_ml_subgraph_deserialize() which reconstructs a subgraph
from a serialized byte buffer. Parses the header (cmdstream size,
coefs size, io size, tensors size), restores the tensor array,
cmdstream, and coefficient buffers.
DRM buffer object creation is deferred to prepare_for_submission()
which is called lazily on first invoke.
Wire pctx->ml_subgraph_deserialize in ethosu_create_context().
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40578>
Add ml_subgraph_deserialize() to pipe_context for reconstructing
a previously-serialized ML subgraph at runtime. This complements
ml_subgraph_serialize() on pipe_ml_device and allows the runtime
to load pre-compiled subgraphs.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40578>
Move target-specific fields (is_u65, ifm_ublock, ofm_ublock,
max_concurrent_blocks, sram_size) from ethosu_screen into
ethosu_ml_device. This decouples the compilation phase from the DRM
file descriptor and pipe_screen, allowing ahead-of-time compilation
where the target NPU is not present on the compilation host.
The ethosu_device_screen() helper is retained only for runtime paths
that need the DRM fd (buffer allocation, job submission, destroy).
Compilation code now accesses hardware parameters through
ethosu_ml_device() cast of pipe_ml_device, which can be created
either from a DRM-backed screen or standalone via
ethosu_ml_device_create() with a target string like "65-256".
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40647>
Thread timeline_point through si_add_fence_dependency and
si_add_syncobj_signal to the winsys. Remove the assert(!value)
guards in si_fence_server_sync and si_fence_server_signal so that
non-zero timeline point values are passed through to the winsys
fence dependency and signal lists.
Add PIPE_FD_TYPE_TIMELINE_SEMAPHORE_VK handling in si_create_fence_fd,
importing the fd as a syncobj (the timeline point is applied at
wait/signal time, not at import time).
Author: Claude Opus 4.6 <noreply@anthropic.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40526>
When has_timeline_syncobj is available, use AMDGPU_CHUNK_ID_SYNCOBJ_TIMELINE_WAIT
with drm_amdgpu_cs_chunk_syncobj for dependencies and
AMDGPU_CHUNK_ID_SYNCOBJ_TIMELINE_SIGNAL for signals in kernelq submission.
This passes timeline point values from the fence lists through to the kernel.
Keep the existing binary SYNCOBJ_IN/SYNCOBJ_OUT path as fallback when
timeline syncobj is not available.
Author: Claude Opus 4.6 <noreply@anthropic.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40526>
Add a parallel uint64_t *points array to amdgpu_fence_list to store
timeline semaphore point values alongside each fence. Point=0 means
binary semaphore (preserving existing behavior).
Update cs_add_fence_dependency and cs_add_syncobj_signal winsys
interfaces to accept a timeline_point parameter, and thread it
through to the fence lists. All existing callers pass 0.
Author: Claude Opus 4.6 <noreply@anthropic.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40526>
Overwrite the whole framebuffer cbuf rather than copying it from the
stack; fixes util_framebuffer_get_num_samples getting uninitialized
stack contents during validation.
Suggested-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Alyssa Milburn <amilburn@zall.org>
Fixes: 2eb45daa9c ("gallium: de-pointerize pipe_surface")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14082
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39138>
For compiling models, we don't really need a context for a real device.
To support ML frameworks models in which compilation happens
ahead-of-time (AoT), add API for compilation that doesn't require a
pipe_context.
Add struct pipe_ml_device with function pointers for:
- ml_operation_supported: query operation support
- ml_subgraph_create: compile a subgraph
- ml_subgraph_serialize: serialize a compiled subgraph
- ml_subgraph_destroy: free subgraph resources
Move ml_operation_supported, ml_subgraph_create, and
ml_subgraph_destroy from pipe_context to pipe_ml_device.
Add pipe_screen::get_ml_device() to obtain the device.
Change pipe_ml_subgraph.context (pipe_context*) to
pipe_ml_subgraph.device (pipe_ml_device*).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40167>
Replace the boolean padding_same field in pipe_ml_operation.conv
and .pooling with explicit per-side padding fields: padding_top,
padding_bottom, padding_left, padding_right.
Frontends always compute these from their own padding representation
(e.g. TFLite same/valid, PyTorch (pad_h, pad_w)). Drivers use
them directly, removing the need for drivers to derive padding.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40167>
Change the tensor backing storage from pipe_resource* to uint8_t*.
This simplifies tensor data management by using raw memory pointers
instead of pipe_resource objects. Frontends allocate tensor data with
malloc() and drivers access it directly, removing the need for
pipe_buffer_map/unmap for tensor data access.
We initially used resources thinking that the NPU would want to directly
access the data in those tensors. It is clear now that all NPUs will
need the data to be compressed and reformatted in some way, so let's
drop the incovenient resources and just use allocated memory.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40167>
We used to lower multisampled arrays to 3D images by adjusting the
height and the Y coordinate so that addressing samples became
addressing into the new base image. This worked for gallium, but
was never implemented for vulkan, and also had the disadvantages
that (a) we handled arrays and non-arrays differently, and
(b) the image height was restricted to 4096.
Change this so that we lower samples into the Z coordinate instead,
adding new layers for each sample. This requires that we know the
number of samples (so we have to save a sysval for this in gallium)
but means that we handle arrays and non-arrays the same. More
importantly, we can fit 3 bits to indicate the number of samples
into the attribute descriptor in Vulkan, so this scheme works
there as well as in OpenGL.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40460>
Not really used yet, but we will need it later when we change how we
lower multisampled image arrays.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40460>
The referenced commit switched from a passthrough shader
to fs_clear_color[write_all_cbufs=0]. It shouldn't matter since
the shader isn't supposed to be executed - it's only setup to get
the first color output active.
On some chips (gfx8) it seems to cause issues (hangs or page fault)
for some piglit tests, eg:
framebuffer-blit-levels draw stencil
To fix this, introduce a 3rd variant, where a constant buffer isn't
required and instead the color is hardcoded in the shader.
Fixes: ca09c173f6 ("gallium/u_blitter: remove UTIL_BLITTER_ATTRIB_COLOR, use a constant buffer")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40486>
The sc7180-trogdor-lazor-limozeen devices are having issues, so move the
job to a different device with available capacity.
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40566>
If the FD passed in isn't from a render node, try to determine the
corresponding render node and open it. If that succeeds, pass the
render node FD to ac_drm_device_initialize.
The existing code already detects when ac_drm_device_get_fd doesn't
return the FD passed in, and handles that case correctly.
This avoids issues with unauthenticated FDs from card nodes.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7289
v2:
* Always close render_fd after calling ac_drm_device_initialize for it.
(Pierre-Eric Pelloux-Prayer)
* Formatting tweaks for logging when ac_drm_device_initialize fails for
render_fd.
v3: (Pierre-Eric Pelloux-Prayer)
* Log render_device path when ac_drm_device_initialize fails for
render_fd.
* Fix render_device string leak.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40519>
these functions should no longer be used by serious drivers. for those that
do use them, they now need to pass their own destructor function
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40462>