to lower bindless handles to 32 bits before nir_opt_varyings, so that
the high 32 bits of (input) loads of bindless handles are eliminated early.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41170>
Reported by clang tools.
See: https://clangd.llvm.org/guides/include-cleaner
struct ac_cmdbuf had to be moved to ac_cmdbuf_base.h because we can't
include ac_cmdbuf.h->sid.h->amdgfxregs.h in radeon_winsys.h for r300.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41091>
Add ST_INVALIDATE_* flags for all states that meta-operations
(bitmap, clear, drawpixels, etc.) modify behind the state tracker's
back. This enables meta-ops to use st_context_invalidate_state()
instead of CSO save/restore for state re-derivation.
The atoms already know how to re-derive Gallium state from GL context,
so marking states dirty via st_context_invalidate_state() is
sufficient - st_validate_state() will call the atoms before the next
draw.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40592>
Add ml_subgraph_deserialize() to pipe_context for reconstructing
a previously-serialized ML subgraph at runtime. This complements
ml_subgraph_serialize() on pipe_ml_device and allows the runtime
to load pre-compiled subgraphs.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40578>
Add a parallel uint64_t *points array to amdgpu_fence_list to store
timeline semaphore point values alongside each fence. Point=0 means
binary semaphore (preserving existing behavior).
Update cs_add_fence_dependency and cs_add_syncobj_signal winsys
interfaces to accept a timeline_point parameter, and thread it
through to the fence lists. All existing callers pass 0.
Author: Claude Opus 4.6 <noreply@anthropic.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40526>
For compiling models, we don't really need a context for a real device.
To support ML frameworks models in which compilation happens
ahead-of-time (AoT), add API for compilation that doesn't require a
pipe_context.
Add struct pipe_ml_device with function pointers for:
- ml_operation_supported: query operation support
- ml_subgraph_create: compile a subgraph
- ml_subgraph_serialize: serialize a compiled subgraph
- ml_subgraph_destroy: free subgraph resources
Move ml_operation_supported, ml_subgraph_create, and
ml_subgraph_destroy from pipe_context to pipe_ml_device.
Add pipe_screen::get_ml_device() to obtain the device.
Change pipe_ml_subgraph.context (pipe_context*) to
pipe_ml_subgraph.device (pipe_ml_device*).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40167>
Replace the boolean padding_same field in pipe_ml_operation.conv
and .pooling with explicit per-side padding fields: padding_top,
padding_bottom, padding_left, padding_right.
Frontends always compute these from their own padding representation
(e.g. TFLite same/valid, PyTorch (pad_h, pad_w)). Drivers use
them directly, removing the need for drivers to derive padding.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40167>
Change the tensor backing storage from pipe_resource* to uint8_t*.
This simplifies tensor data management by using raw memory pointers
instead of pipe_resource objects. Frontends allocate tensor data with
malloc() and drivers access it directly, removing the need for
pipe_buffer_map/unmap for tensor data access.
We initially used resources thinking that the NPU would want to directly
access the data in those tensors. It is clear now that all NPUs will
need the data to be compressed and reformatted in some way, so let's
drop the incovenient resources and just use allocated memory.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40167>
Allows a uniform name to be passed to force_explicit_uniform_loc_zero
allowing us to set that uniform to an explicit location of zero.
Cc: mesa-stable
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40448>
In some cases CL has higher precision requirements for format support.
Add a PIPE_BIND_x flag so that drivers can expose formats in GL(ES) that
they cannot expose in CL.
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40028>
The original MR switched to use a float raw_timestamp_period to scale
the raw timestamp outside of the gallium driver. This better matched
how vulkan works.
But unlike vulkan, gallium has timestamp related queries/APIs that
return already scaled time, resulting in small errors if the way the
scaling is done differs between driver scaling and frontend scaling.
The important thing is that any error introduced by scaling must be
the same error across APIs.
(In particular, a f64 value cannot preciesly represent an arbitrary
u64 value. OTOH the driver's scaling could be simply multiply be an
integer. But differing precision errors of the two approachs causes
problems when comparing between timestamps that are converted in
different ways.)
In some, but not all, cases this could be addressed by changing the
driver to use the same scaling function, but this is not always possible
(if, for ex, the scaling is done on the GPU CP). So switch back to
the original approach from !39995, using a pscreen->convert_timestamp()
callback, to put the control back in the hands of the driver.
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40051>
This is intended to enable rusticl to use get_query_result_resource()
for timestamp queries, for hw which cannot convert ticks to us on the
GPU (or for which doing the conversion on the GPU is expensive). In
this case, the query result buffer is not exposed to the app, so we
can still do the necessary conversion on the CPU.
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39995>
Add a 2-bit field to select which geometry shader output stream
feeds the rasterizer. Only meaningful when a geometry shader
with multiple output streams is active.
Reviewed-by: Roland Scheidegger <roland.scheidegger@broadcom.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39984>
Not all HW can do linear interpolation natively, and eumlating it
can come at a cost. This adds a cap that the gallium driver can
expose, that makes us try to avoid it when we can.
Reviewed-by: Loïc Molinari <loic.molinari@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39426>
Add a new PIPE_CAP_CLEAR_MASKED capability that allows drivers to
handle buffer clears with color and stencil masks directly, instead
of falling back to drawing a quad in Mesa.
This patch introduces several changes:
1. Add the new pipe cap PIPE_CAP_CLEAR_MASKED to pipe_defines.h and
document it in the Gallium screen documentation.
2. Add color_clear_mask and stencil_clear_mask parameters to the
pipe_context::clear() hook:
- color_clear_mask (uint32_t): contains 4 color mask bits per draw buffer
(max 8 buffers = 32 bits)
- stencil_clear_mask (uint8_t): contains the stencil write mask (8 bits)
3. Update the state tracker to use the masked clear path when the
driver supports it:
- Pass ctx->Color.ColorMask for color buffer clears
- Pass ctx->Stencil.WriteMask for stencil clears
- Allow both color and stencil clears to avoid the quad path when
masks are present and the driver advertises support
4. Update all existing driver clear() hooks to accept the new
color_clear_mask and stencil_clear_mask parameter.
This optimization allows drivers that can efficiently handle masked
clears in hardware to do so, improving performance for applications
that frequently clear buffers with masks enabled.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31512>
The bit values are taken from Vulkan to make it easy for Zink. Those new
subgroup features will be used by rusticl for cl_khr_subgroup_rotate.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38015>