Add ml_subgraph_deserialize() to pipe_context for reconstructing
a previously-serialized ML subgraph at runtime. This complements
ml_subgraph_serialize() on pipe_ml_device and allows the runtime
to load pre-compiled subgraphs.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40578>
Add a parallel uint64_t *points array to amdgpu_fence_list to store
timeline semaphore point values alongside each fence. Point=0 means
binary semaphore (preserving existing behavior).
Update cs_add_fence_dependency and cs_add_syncobj_signal winsys
interfaces to accept a timeline_point parameter, and thread it
through to the fence lists. All existing callers pass 0.
Author: Claude Opus 4.6 <noreply@anthropic.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40526>
For compiling models, we don't really need a context for a real device.
To support ML frameworks models in which compilation happens
ahead-of-time (AoT), add API for compilation that doesn't require a
pipe_context.
Add struct pipe_ml_device with function pointers for:
- ml_operation_supported: query operation support
- ml_subgraph_create: compile a subgraph
- ml_subgraph_serialize: serialize a compiled subgraph
- ml_subgraph_destroy: free subgraph resources
Move ml_operation_supported, ml_subgraph_create, and
ml_subgraph_destroy from pipe_context to pipe_ml_device.
Add pipe_screen::get_ml_device() to obtain the device.
Change pipe_ml_subgraph.context (pipe_context*) to
pipe_ml_subgraph.device (pipe_ml_device*).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40167>
Replace the boolean padding_same field in pipe_ml_operation.conv
and .pooling with explicit per-side padding fields: padding_top,
padding_bottom, padding_left, padding_right.
Frontends always compute these from their own padding representation
(e.g. TFLite same/valid, PyTorch (pad_h, pad_w)). Drivers use
them directly, removing the need for drivers to derive padding.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40167>
Change the tensor backing storage from pipe_resource* to uint8_t*.
This simplifies tensor data management by using raw memory pointers
instead of pipe_resource objects. Frontends allocate tensor data with
malloc() and drivers access it directly, removing the need for
pipe_buffer_map/unmap for tensor data access.
We initially used resources thinking that the NPU would want to directly
access the data in those tensors. It is clear now that all NPUs will
need the data to be compressed and reformatted in some way, so let's
drop the incovenient resources and just use allocated memory.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40167>
Allows a uniform name to be passed to force_explicit_uniform_loc_zero
allowing us to set that uniform to an explicit location of zero.
Cc: mesa-stable
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40448>
In some cases CL has higher precision requirements for format support.
Add a PIPE_BIND_x flag so that drivers can expose formats in GL(ES) that
they cannot expose in CL.
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40028>
The original MR switched to use a float raw_timestamp_period to scale
the raw timestamp outside of the gallium driver. This better matched
how vulkan works.
But unlike vulkan, gallium has timestamp related queries/APIs that
return already scaled time, resulting in small errors if the way the
scaling is done differs between driver scaling and frontend scaling.
The important thing is that any error introduced by scaling must be
the same error across APIs.
(In particular, a f64 value cannot preciesly represent an arbitrary
u64 value. OTOH the driver's scaling could be simply multiply be an
integer. But differing precision errors of the two approachs causes
problems when comparing between timestamps that are converted in
different ways.)
In some, but not all, cases this could be addressed by changing the
driver to use the same scaling function, but this is not always possible
(if, for ex, the scaling is done on the GPU CP). So switch back to
the original approach from !39995, using a pscreen->convert_timestamp()
callback, to put the control back in the hands of the driver.
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40051>
This is intended to enable rusticl to use get_query_result_resource()
for timestamp queries, for hw which cannot convert ticks to us on the
GPU (or for which doing the conversion on the GPU is expensive). In
this case, the query result buffer is not exposed to the app, so we
can still do the necessary conversion on the CPU.
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39995>
Add a 2-bit field to select which geometry shader output stream
feeds the rasterizer. Only meaningful when a geometry shader
with multiple output streams is active.
Reviewed-by: Roland Scheidegger <roland.scheidegger@broadcom.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39984>
Not all HW can do linear interpolation natively, and eumlating it
can come at a cost. This adds a cap that the gallium driver can
expose, that makes us try to avoid it when we can.
Reviewed-by: Loïc Molinari <loic.molinari@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39426>
Add a new PIPE_CAP_CLEAR_MASKED capability that allows drivers to
handle buffer clears with color and stencil masks directly, instead
of falling back to drawing a quad in Mesa.
This patch introduces several changes:
1. Add the new pipe cap PIPE_CAP_CLEAR_MASKED to pipe_defines.h and
document it in the Gallium screen documentation.
2. Add color_clear_mask and stencil_clear_mask parameters to the
pipe_context::clear() hook:
- color_clear_mask (uint32_t): contains 4 color mask bits per draw buffer
(max 8 buffers = 32 bits)
- stencil_clear_mask (uint8_t): contains the stencil write mask (8 bits)
3. Update the state tracker to use the masked clear path when the
driver supports it:
- Pass ctx->Color.ColorMask for color buffer clears
- Pass ctx->Stencil.WriteMask for stencil clears
- Allow both color and stencil clears to avoid the quad path when
masks are present and the driver advertises support
4. Update all existing driver clear() hooks to accept the new
color_clear_mask and stencil_clear_mask parameter.
This optimization allows drivers that can efficiently handle masked
clears in hardware to do so, improving performance for applications
that frequently clear buffers with masks enabled.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31512>
The bit values are taken from Vulkan to make it easy for Zink. Those new
subgroup features will be used by rusticl for cl_khr_subgroup_rotate.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38015>
This commit also checks for and issues errors for the following:
INVALID_OPERATION is generated if the application attempts enable pixel
local storage while the value of SAMPLE_BUFFERS is one.
INVALID_OPERATION is generated if the application attempts to enable pixel
local storage while the current draw framebuffer is a user-defined frame-
buffer object and has an image attached to any color attachment other than
color attachment zero.
INVALID_OPERATION is generated if the application attempts to enable pixel
local storage while the current draw framebuffer is a user-defined frame-
buffer and the draw buffer for any color output other than color
output zero is not NONE.
INVALID_FRAMEBUFFER_OPERATION is generated if the application attempts to
enable pixel local storage while the current draw framebuffer is
incomplete.
INVALID_OPERATION is generated if pixel local storage is disabled and the
application attempts to issue a rendering command while a program object
that accesses pixel local storage is bound.
INVALID_OPERATION is generated if pixel local storage is enabled and the
application attempts to bind a new draw framebuffer, delete the currently
bound draw framebuffer, change color buffer selection via DrawBuffers, or
modify any attachment of the currently bound draw framebuffer including
their underlying storage.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37110>
There is no disable_screen_content_tools in AV1 spec, instead this
should be seq_choose_screen_content_tools. But we don't need that either
as we keep the effective value in force_screen_content_tools.
Same for seq_choose_integer_mv and force_integer_mv.
Also stop overriding these values and instead fix frame header coding
to work with all combinations.
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38260>
Commit a4ffd2395f ("mesa: Implement label sharing from GL objects with
UM drivers") enabled GL clients to tag objects at a UM driver level. In
the case of Panfrost, and for both KMDs, maximum label size is set to
4096, but the Mesa limit is much lower.
Since glObjectLabel() allocates object labels dynamically, there's no
need to have this value chiseled in stone, so allow Gallium driver
implementers to set their own limit through a pipe screen capability.
Keep the same default maximum label length as before.
Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38027>
Per ARB_vertex_program spec result registers are 4-component and initially
undefined, and the FF fragment program expects its intputs to be
4-component too. So, if the client's vertex program does not write the
whole vector it will cause misrenderings unless the same client also
supplies fragment program that expects less than 4 componens.
This commit adds a workaround that initializes results to vec4(0, 0, 0, 1)
which seems to be an expected behavior for such clients.
Cc: mesa-stable
Signed-off-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38295>