224626 commits

Author SHA1 Message Date
Autumn Ashton
0dfe4aaa24 nvk: Allow nvk_cmd_upload_qmd to take a custom root descriptor
The cubin kernel launches need to use a root descriptor that's
compatible with the bytecode that nvcc generates which contains block
dim, grid dim and the kernel params at specific layouts which can be
influenced by ELF .nv.info attributes.

Thus, expose the ability to input custom root descriptors
in nvk_cmd_upload_qmd.

Signed-off-by: Autumn Ashton <misyl@froggi.es>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Reviewed-by: Mohamed Ahmed <mohamedahmedegypt2001@gmail.com>
Reviewed-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Tested-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40686>
2026-06-19 19:01:38 +00:00
Autumn Ashton
acfea9e03f nak: Expose max_warps_per_sm
Previously this was only accessible from Rust,
but VK_NVX_binary_import needs to calculate this
for imported cubin kernels from EIATTR_REGCOUNT.

Signed-off-by: Autumn Ashton <misyl@froggi.es>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Reviewed-by: Mohamed Ahmed <mohamedahmedegypt2001@gmail.com>
Reviewed-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Tested-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40686>
2026-06-19 19:01:38 +00:00
Adrián Larumbe
c95edade04 panvk: Talk directly to pankmod when binding sparse resources
There's no longer any need for the panvk_sparse library, or for panvk to care
about whether the KMD can do native sparse mapping. Submit sparse VM
bindings as a single operation and let pankmod handle the gory details.

Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40400>
2026-06-19 18:20:31 +00:00
Adrián Larumbe
19cf49f02f panvk: Use pankmod instead of panthor drm interfaces in bind queues
On top of that, leverage the new push/flush interface so that management of
the black hole in older KMD versions can be handled by the pankmod layer.

Merging of operations is now done in conjunction with buffering the latest
submission, so that the very last operation can have its signal syncs
assigned before being delivered to the pankmod layer.

Co-developed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40400>
2026-06-19 18:20:31 +00:00
Adrián Larumbe
306539a3c7 pan/kmod: Introduce vm_op buffering and sparse mapping emulation
The goal is moving the need for prebuffering when the total number of
vm_bind operations isn't known in advance away from panvk, and into the
pankmod layer, and also to consolidate that treatment in a single place. At
the moment, both panvk_vX_bind_queue.c and panvk_sparse.c roll their own
workarounds for the blackhole-mapping sparse bind mechanism.

For older KMD versions with no sparse mapping support, emulate it by
cyclically mapping over a dummy BO, which is allocated on demand and per
VM. This behaviour is similar to that of the Panthor KMD.
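
The cyclic mapping described above can be sketched roughly as follows. This is an illustrative Python model, not the pankmod C code: the function name, the (va, bo_offset, length) tuple layout and the dummy BO size are all hypothetical.

```python
def emulate_sparse_map(va, size, dummy_size):
    """Cover [va, va + size) with map ops that all point at the same dummy BO.

    Hypothetical model: each op is (virtual address, BO offset, length), and
    every op restarts at offset 0 of the dummy BO, so the BO is reused
    cyclically across the whole range.
    """
    ops = []
    offset = 0
    while offset < size:
        chunk = min(dummy_size, size - offset)  # last chunk may be shorter
        ops.append((va + offset, 0, chunk))
        offset += chunk
    return ops


ops = emulate_sparse_map(va=0x10000, size=0x5000, dummy_size=0x2000)
```

With these made-up numbers a 0x5000-byte range needs three mappings of the 0x2000-byte dummy BO, the last one partial.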

This moves responsibility over whether to use native KMD sparse mapping or
the blackhole method into the pankmod layer, so that the sparse mapping
mechanism is transparent to the Vulkan driver. Also disallow automatic VA
assignment when sparse emulation is required, because relaying auto VAs
back to the caller is both cumbersome and unsafe, and also not a practical
use case.

Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40400>
2026-06-19 18:20:31 +00:00
Adrián Larumbe
831bc9bb65 pan/kmod: Introduce sparse binding
Register whether the underlying KMD supports sparse mappings in a device
property. Add a new VM operation field that holds flags; for the time being,
sparse is the only valid operation modifier. Disallow sparse operations when
an automatic VA is requested or when a BO is provided accidentally.

Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40400>
2026-06-19 18:20:31 +00:00
Adrián Larumbe
f20b7adbca pan/kmod: Handle sync object signals in Panthor's vm_bind
A future commit will want to have a binary sync object attached to a
vm_bind operation or a sync operation only, so rather than creating a
separate pankmod flag for it, we simply check the point (always 0).

Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40400>
2026-06-19 18:20:31 +00:00
Adrián Larumbe
1dd5ea7feb pan/kmod: Pass signal and wait syncs separately
This is done in preparation for kmod support for blackhole sparse mappings.

Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40400>
2026-06-19 18:20:31 +00:00
Adrián Larumbe
9ded6f3d38 pan/kmod: Use kernel-reported page sizes for new VM when available
Instead of hard-coding available page sizes in UM, have pan_kmod
backends query the KMD when these are exposed by the kernel.

This is not yet done for Panfrost, but it might be added soon.

Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40400>
2026-06-19 18:20:30 +00:00
Adrián Larumbe
1ca84d67ac drm-uapi: Sync the panthor header
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40400>
2026-06-19 18:20:30 +00:00
Juan A. Suarez Romero
df96a100ae v3dv: fix assertion on push constants
Fixes a compiler warning regarding the assertion.

Fixes: 6d6a3ab679 ("v3dv: asserts push constants data is valid")
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42269>
2026-06-19 18:01:40 +00:00
David Rosca
08c2bb3b31 radeonsi/mm: Set correct usage in si_dec_fill_surface
Fixes: 26979becec ("radeonsi/video: Add video decoder using ac_video_dec")
Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42149>
2026-06-19 15:18:03 +00:00
David Rosca
6cd7dd852a radeonsi/mm: Only setup ref surfaces with tier3
For lower tiers this adds unnecessary dependency on the ref surface.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/work_items/15630
Fixes: 26979becec ("radeonsi/video: Add video decoder using ac_video_dec")
Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42149>
2026-06-19 15:18:03 +00:00
David Rosca
61d2f8d0f1 radeonsi/mm: Return error when decoding H264 P/B frame with no refs
The firmware expects at least one valid reference when decoding P and B
frames, otherwise it may pagefault.
If the app doesn't handle missing references by using dummy surfaces,
error out when trying to decode such frame.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/work_items/15659
Fixes: 26979becec ("radeonsi/video: Add video decoder using ac_video_dec")
Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42256>
2026-06-19 14:57:39 +00:00
Matt Turner
5bb025f953 gallivm: fix small_unorm -> unorm8 fetch path on big-endian
Two bugs in lp_build_fetch_rgba_aos's small_unorm fast path:

- vector_justify=true shifted the loaded value into the MSB of the wider
  type on big-endian. The format_desc already carries
  big-endian-corrected channel shifts, so the extra shift broke channel
  extraction for sub-32-bit formats (e.g. R8G8B8, B5G5R5).

- The output OR-loop packed channels assuming little-endian byte order
  (shift = j * width), so after bitcast to vec4-u8 on big-endian the
  alpha channel landed at byte[0] instead of byte[3].

The fix is simple: gather with vector_justify=false so format_desc
shifts apply directly; use (3-j)*width on UTIL_ARCH_BIG_ENDIAN to match
the memory layout that big-endian bitcast produces.
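
The second bug can be illustrated with a small Python model of the OR-loop. This is only a sketch of the shift arithmetic, not the gallivm code: pack_channels is a hypothetical helper, and struct.pack stands in for the bitcast to vec4-u8 on each byte order.

```python
import struct


def pack_channels(channels, width=8, big_endian=False):
    """OR the channels into one word.

    Channel j goes at shift j * width for little-endian, or at
    (n - 1 - j) * width for big-endian, which is (3 - j) * width for
    four channels.
    """
    n = len(channels)
    word = 0
    for j, c in enumerate(channels):
        shift = (n - 1 - j) * width if big_endian else j * width
        word |= (c & ((1 << width) - 1)) << shift
    return word


rgba = [0x11, 0x22, 0x33, 0x44]  # R, G, B, A

# Little-endian: j * width puts R at byte[0] in memory.
le_bytes = struct.pack("<I", pack_channels(rgba))
# Big-endian with the corrected shift: same memory layout as above.
be_bytes = struct.pack(">I", pack_channels(rgba, big_endian=True))
# Big-endian with the old little-endian shift: alpha lands at byte[0].
be_wrong = struct.pack(">I", pack_channels(rgba))
```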

This fixes the lp_test_format test on big-endian platforms.

Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42228>
2026-06-19 14:17:27 +00:00
Matt Turner
a9d70225ec gallivm: fix lp_build_round on altivec/VSX
The software fallback in lp_build_round (used when
arch_rounding_available returns false, e.g. altivec with length < 4)
used lp_build_iround's bias-and-truncate path, which rounds
half-away-from-zero due to float32 rounding of the (a + nextafterf(0.5))
sum. This caused lp_test_arit failures for v1 and v2 vector widths on
ppc64.

For altivec/VSX, llvm.nearbyint lowers to vrfin (AltiVec) or xvrspic
(VSX) — both single instructions that round to nearest-even — for any
vector width. Use it in the else branch when has_altivec is set,
preserving the lp_build_iround path for x86 pre-SSE4.1 where
llvm.nearbyint would expand to scalar nearbyintf calls.
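
The difference between the two rounding behaviours can be reproduced in a few lines of Python. This is a sketch under assumptions, not the gallivm implementation: f32 emulates float32 rounding through the struct module, the bias is taken to be nextafterf(0.5, 0.0), and Python's built-in round stands in for a round-to-nearest-even instruction.

```python
import math
import struct


def f32(x):
    """Round a Python float (double) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


# Largest float32 below 0.5, assumed to match nextafterf(0.5, 0.0).
BIAS = 0.5 - 2.0 ** -25


def round_bias_truncate(a):
    """Add a signed bias in float32, then truncate toward zero."""
    return math.trunc(f32(a + math.copysign(BIAS, a)))


def round_nearest_even(a):
    """Python's round() resolves ties to the nearest even integer."""
    return round(a)


ties = [0.5, 1.5, 2.5, -2.5]
biased = [round_bias_truncate(a) for a in ties]
nearest_even = [round_nearest_even(a) for a in ties]
```

For the tie inputs, the float32 sum rounds up to the next integer before truncation, so the biased path behaves as half-away-from-zero while the reference rounds to even.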

Update the length==2 expected-failure condition in lp_test_arit to
exclude altivec (now fixed), keeping it for other platforms that still
use the software fallback.

This fixes the lp_test_arit test on ppc64.

Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42227>
2026-06-19 13:46:11 +00:00
Juan A. Suarez Romero
07b53cd328 v3d/ci: update expected results and document failures
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42292>
2026-06-19 12:31:56 +00:00
Job Noorman
99a268c889 ir3/lower_vars_to_scratch_global: use stable sort for variables
To ensure we pick variables to spill deterministically.

Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42168>
2026-06-19 12:00:49 +00:00
Job Noorman
3596d63338 nir/lower_vars_to_scratch_global: make callback deterministic
We pass the found variables as a pointer set to the driver. Since the
callback is supposed to be used for global decisions, the driver might
end up picking different variables based on the (non-deterministic)
iteration order of the set. Fix this by passing the variables as a
util_dynarray instead.

To make sure the contents of the util_dynarray don't have to be shuffled
around every time the driver wants to remove a variable from it,
introduce nir_variable::pass_flags that we use to create an intrusive
ordered set using a util_dynarray.

Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42168>
2026-06-19 12:00:49 +00:00
Eric Engestrom
eec4f5712d ci: fix the fix for perfetto download in make-git-archive nightly job
The previous fix used `grep -P` which is not supported by the grep
implementation used in this job, so replace it with `grep -E` + `cut`
which is supported by that implementation.

Fixes: df3756e6dc ("ci: fix perfetto download in `make-git-archive` nightly job")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42331>
2026-06-19 11:22:49 +00:00
Christian Gmeiner
ad4e3cad54 etnaviv: Gate 128-bit render targets on HALF_FLOAT
128-bit render targets are emulated as paired G32R32F targets. There is no
integer 64-bit PE format, so the integer formats also render through
G32R32F, as the blob does. The real hardware requirement is the half-float
pipe that provides G32R32F, so gate on HALF_FLOAT instead of the
conservative halti5 level. This enables the formats on older GPUs that have
the pipe.

Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42201>
2026-06-19 10:49:24 +00:00
Christian Gmeiner
57f5acf849 etnaviv: Support split sampler for 128-bit formats on the state path
128-bit formats (RGBA32) are emulated as two stacked G32R32 planes. The
bound sampler reads the RG plane and a companion sampler reads the BA plane,
which etna_nir_lower_128bit(..) reassembles in the shader. Only the
descriptor path set up the companion, so the state path could not sample
these formats. Set up the companion on the state path too and share
companion_slot(..) between both paths.

The real requirement is the plane format, not the descriptors. The float
plane G32R32F samples through the half-float pipe, so gate it on HALF_FLOAT
and advertise GL_OES_texture_float, also on halti2 GPUs like GC3000. The
integer plane G32R32I needs halti5, so keep the integer formats there.

The KHR-GLES2 internalformat tests for sized RGB32F/RGBA32F need an ES3
context, so list them as expected fails on GC3000 too.

Verified on GC7000 with and without ETNA_MESA_DEBUG=no_texdesc.

Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42201>
2026-06-19 10:49:24 +00:00
Christian Gmeiner
0783eaf6d6 etnaviv: Set per-RT sRGB bit on non-zero render target slots
sRGB encoding was only handled through the global PE.LOGIC_OP SRGB bit,
which the hardware applies to the primary render target alone. An sRGB
surface bound to any other MRT slot was written as linear.

Fixes dEQP-GLES3.functional.fragment_out.random.{1,17,39,64,86,93,96}.

Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Daniel Lang <dalang@gmx.at>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42201>
2026-06-19 10:49:23 +00:00
Christian Gmeiner
2e0b5f4b96 etnaviv: Update headers from rnndb
Update to rnndb commit 0fd26f92cfd7

Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Daniel Lang <dalang@gmx.at>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42201>
2026-06-19 10:49:23 +00:00
Christian Gmeiner
37eb09ed06 etnaviv: Disable TS per render target on mixed TS modes
PE.MEM_CONFIG.COLOR_TS_MODE is a single global field, so every TS-enabled
color render target in a framebuffer has to share one TS mode. With
CACHE128B256BPERLINE the mode is picked per resource (256B for compressible
formats, 128B otherwise), so a compressible format bound next to an integer
format disagrees and the odd target gets decoded in the wrong mode, reading
back as the clear color.

The blob keeps TS on the targets that match the global mode and disables it
only on the odd one, instead of giving up TS for the whole framebuffer.
Compute a per-RT TS mask once in etna_set_framebuffer_state(..), store it in
etna_framebuffer_state and reuse it when arming the BLT fast clear, so the
two consumers stay consistent by construction. A disabled target keeps its
tile status allocated, so it recovers once a later framebuffer is compatible
again.
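
A minimal sketch of the per-RT mask idea, in Python rather than the driver's C. The function name, the string mode labels and the way the global mode is passed in are hypothetical; the commit text does not say how the global mode is selected, so it is simply a parameter here.

```python
def per_rt_ts_mask(rt_modes, global_mode):
    """Return a bitmask with bit i set when render target i keeps TS.

    rt_modes holds one entry per render target: the TS mode the resource
    wants, or None when the target has no tile status at all. Only targets
    whose mode equals the single global mode keep TS enabled.
    """
    mask = 0
    for i, mode in enumerate(rt_modes):
        if mode is not None and mode == global_mode:
            mask |= 1 << i
    return mask


# RT0 and RT2 are compressible (256B), RT1 is an integer format (128B),
# RT3 has no TS. With a 256B global mode only the odd target loses TS.
mask = per_rt_ts_mask(["256B", "128B", "256B", None], global_mode="256B")
```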

Fixes 23 dEQP-GLES3.functional.draw_buffers_indexed.random.* cases that mix
integer and unorm render targets, with no regression in fbo.color or fbo.blit.

Fixes: d70531ca93 ("etnaviv: Extend etna_update_ts_config(..) for MRTs")
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Daniel Lang <dalang@gmx.at>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42201>
2026-06-19 10:49:23 +00:00
Christian Gmeiner
47a2f9e420 etnaviv: Advertise 128-bit color formats as renderable and samplable
The 128-bit emulation now covers the clear, blit, copy and sample paths,
so stop rejecting the three emulated RGBA32 formats. The format table is
the remaining filter. Sampling still relies on the halti5 texture
descriptors, so halti5 is the gate.

Sampling RGBA32F enables GL_OES_texture_float, and with the existing
half-float support also GL_ARB_texture_float, so advertise both.

The KHR-GLES2 internalformat tests for sized RGB32F/RGBA32F need an ES3
context, so they fail on the ES2 driver. List them as expected fails, as
other ES2 drivers do.

Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42201>
2026-06-19 10:49:23 +00:00
Christian Gmeiner
9013b56a22 etnaviv: Limit nir_lower_fragcolor(..) to advertised render targets
nir_lower_fragcolor(..) expands a broadcast gl_FragColor into one store
per render target. It was passed specs->num_rts, the physical HW count,
but on HALTI2 only half of them are advertised (caps.max_render_targets)
since the upper half is reserved for float and 128-bit format emulation
companions.

A broadcast shader thus wrote into the reserved slots. For a 128-bit
target the clear meta shader stores to every gl_FragData and overwrote
the BA companion plane filled by etna_nir_lower_128bit(..), so the clear
came back with the RG half replicated into BA.

Pass the advertised count instead to keep the broadcast inside the user
visible range.

Fixes: 928a276b78 ("etnaviv: Limit max supported render targets")
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Daniel Lang <dalang@gmx.at>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42201>
2026-06-19 10:49:23 +00:00
Christian Gmeiner
d75d08437b etnaviv: rs: Support 128-bit color clears
A 128-bit color level is laid out as two stacked G32R32F planes, so
clear it with two 64bpp RS fills, the RG half at the level offset and
the BA half at the second-plane offset.

A cache flush and stall separate the two fills. etna_clear_rs(..) needs
the same flush between its color and depth clears to avoid a GC600 hang,
and the blob brackets every RS operation this way. The blob clears
RGBA32F render targets through RS with the same plane split, verified
with a cmdstream capture on a faked GC7000 rev 6204 identity.

Fixes dEQP-GLES3.functional.fbo.color.repeated_clear.* for 128-bit
formats on RS-only halti5 hardware.

Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Daniel Lang <dalang@gmx.at>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42201>
2026-06-19 10:49:22 +00:00
Christian Gmeiner
3d8a718181 etnaviv: Save the framebuffer without 128-bit companion slots
etna_blit_save_state(..) saved the expanded framebuffer including the
appended companion slots. The util_blitter restore goes through
etna_set_framebuffer_state(..), which appends companions again, so every
blitter round trip with a 128-bit color buffer bound grew nr_cbufs until
the expansion assert fired.

Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Daniel Lang <dalang@gmx.at>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42201>
2026-06-19 10:49:22 +00:00
Christian Gmeiner
38eb315407 etnaviv: blt: Use block-layout offset for 128-bit second-plane blit
The 128-bit emulation stores all RG halves in the first half of the BO
and all BA halves in the second half. The sampler descriptors, the CPU
upload and the BLT clear all compute the second plane as
(size * depth) / 2.

etna_try_blt_blit(..) advanced source and destination by layer_stride
instead, an interleaved layout nothing else uses. For single-layer 2D
targets both formulas coincide, so plain blits worked, but per-layer
blits of a multi-layer 128-bit array texture corrupted the BA half of
every layer. Use the same (size * depth) / 2 offset as the rest of the
emulation.

Fixes: 1f60a0397b ("etnaviv: blt: Support 128 bit blit operations")
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Daniel Lang <dalang@gmx.at>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42201>
2026-06-19 10:49:22 +00:00
Christian Gmeiner
7436c8ad0b etnaviv: NIR pass to lower 128-bit color RT and texture access
The hardware reads and writes the emulated 128-bit formats as two
G32R32F planes, so one store or sample in the shader has to become two.
Add etna_nir_lower_128bit to do that split at the NIR level, driven by
per-slot masks and companion tables in the shader key.

A store is split into an RG store to the user output and a BA store to a
companion output above the application visible range. A sample is cloned
to the companion sampler slot and the two results are reassembled into
the full vec4, with textureGather matching the blob's 16-bit halves. The
write mask is split alongside the data so partial writes keep their
meaning, and missing channels are padded with a typed zero so the
integer formats work.
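
The store split can be modelled in Python as below. This is a sketch of the data and write-mask handling only, not the NIR pass: split_store and its tuple return shape are hypothetical, and the write mask is assumed to use bit 0 for x through bit 3 for w.

```python
def split_store(value, write_mask, zero=0.0):
    """Split one vec4 store into an RG store and a BA companion store.

    Missing channels are padded with a zero of the caller's type (0.0 for
    float formats, 0 for integer formats). The low two mask bits stay with
    the RG store; the high two move down for the BA store.
    """
    comps = list(value) + [zero] * (4 - len(value))
    rg = (comps[0:2], write_mask & 0x3)
    ba = (comps[2:4], (write_mask >> 2) & 0x3)
    return rg, ba


# A vec3 float store writing x, y and z: the BA store only writes its
# first component, which carries the original z.
rg, ba = split_store([1.0, 2.0, 3.0], write_mask=0b0111)
```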

set_sampler_views(..) computes the per-stage 128-bit mask once and the
per-draw shader key setup reads it back, so draws without 128-bit
textures only pay for a key compare.

Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42201>
2026-06-19 10:49:22 +00:00
Christian Gmeiner
a499067877 etnaviv: Emit paired 128-bit sampler descriptors
Sampling a 128-bit color texture needs two G32R32F reads because the TE
knows nothing about the two-plane emulation. Emit a second sampler
descriptor at a companion slot that points at the BA plane. The shader
lowering later samples both slots and reassembles the vec4.

The BA descriptors reuse the descriptor slots of the RB_SWAP
native-order descriptors. The two cases cannot collide because no
128-bit format is RB_SWAP.

The companion of user slot i is the slot right after the bound views,
nr + i. A 128-bit view is only usable when nr + i fits the sampler pool,
asserted in debug builds. An out-of-pool companion stays at ~0U so the
emit path skips it and the shader lowering leaves the sample untouched,
the only fallout being wrong .ba data instead of a wild state write.
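
The slot arithmetic above amounts to the following Python sketch. The name companion_slot appears elsewhere in this series, but the signature, the pool_size parameter and the NO_COMPANION constant used here are assumptions for illustration.

```python
NO_COMPANION = 0xFFFFFFFF  # models the ~0U sentinel for a 32-bit unsigned


def companion_slot(i, nr, pool_size):
    """Companion of user slot i sits right after the nr bound views.

    Returns NO_COMPANION when nr + i does not fit the sampler pool, so a
    caller can skip emitting the descriptor instead of writing out of range.
    """
    slot = nr + i
    return slot if slot < pool_size else NO_COMPANION


in_pool = companion_slot(i=1, nr=4, pool_size=16)
out_of_pool = companion_slot(i=10, nr=8, pool_size=16)
```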

Dummy descriptors now cover every inactive dirty slot, not only the
previously-active ones, because a companion slot from an earlier bind
may never have been active.

Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Daniel Lang <dalang@gmx.at>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42201>
2026-06-19 10:49:22 +00:00
Christian Gmeiner
7467461277 etnaviv: Lay out 128-bit color FBO as paired G32R32F render targets
The PE cannot write 128-bit color formats (RGBA32F/UI/SI) natively. A
pair of 64-bit G32R32F render targets covers the same memory: the
user's RG channels go to RT[i] and the BA channels go to a companion
RT that points at the second plane of the same BO.

etna_set_framebuffer_state(..) appends one companion RT per 128-bit
user RT and records the mapping for the shader lowering and the
sampler side. The companion takes an extra resource reference so it is
released cleanly on unbind. The companion colormask is derived from
the user's blend state in etna_update_blend(..), the upper two bits
(B,A) move down to (R,G), so glColorMask works on both halves.

The per-RT restructuring drops the RT_CONFIG_UNK27 programming for
non-TS secondary RTs on CACHE128B256BPERLINE hardware. The bit must
not be set on the 128-bit companion RTs and its exact meaning needs
more reverse engineering. It can come back once understood.

Drawing still needs the NIR lowering from a later commit to split the
shader output into the RG and BA halves.

Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Daniel Lang <dalang@gmx.at>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42201>
2026-06-19 10:49:22 +00:00
Christian Gmeiner
7d853e01b4 etnaviv: Wrap pipe_framebuffer_state in etna_framebuffer_state
The 128-bit color emulation needs driver-private tracking next to the
framebuffer state. Introduce an etnaviv-private wrapper struct with
pipe_framebuffer_state as its base member. The tracking fields come
with the next commit.

All ctx->framebuffer_s.X accesses become ctx->framebuffer_s.base.X.
No behavior change.

Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Daniel Lang <dalang@gmx.at>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42201>
2026-06-19 10:49:21 +00:00
Christian Gmeiner
d5c507c6db etnaviv: Add a helper for the 128-bit second-plane offset
The BA plane of an emulated 128-bit color level starts at
(size * depth) / 2. That offset is open-coded in the blt clear and the
transfer map/unmap paths, and the upcoming 128-bit render target and
sampler support adds more sites. Add
etna_resource_level_second_plane_offset(..) and use it, so the
invariant lives in one place.
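
As a rough Python illustration of the invariant, assuming size and depth are the per-level fields the message refers to (the function name and the sample numbers below are hypothetical, and the real helper is C):

```python
def second_plane_offset(size, depth):
    """Byte offset of the BA plane inside an emulated 128-bit level.

    The RG halves fill the first half of the level and the BA halves the
    second half, so the BA plane starts at (size * depth) / 2.
    """
    return (size * depth) // 2


def plane_offsets(level_offset, size, depth):
    """Start offsets of the RG and BA planes for one level."""
    return level_offset, level_offset + second_plane_offset(size, depth)


rg_start, ba_start = plane_offsets(level_offset=0x1000, size=4096, depth=1)
```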

Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Daniel Lang <dalang@gmx.at>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42201>
2026-06-19 10:49:21 +00:00
Christian Gmeiner
695fec92f7 etnaviv: Flush texture caches after clears
A clear writes the surface through the BLT or RS engine while the
texture caches can still hold texels from an earlier draw that sampled
the cleared resource. Nothing invalidates them when the sampler view
stays bound, so the next draw samples stale data.

Clears of TS'd levels are not affected as sampling them goes through
the sampler TS or a resolve in etna_update_sampler_source(..), which
flushes the texture caches as a side effect.

Mark the texture caches dirty after clearing a samplable surface, like
etna_blit(..) and etna_transfer_unmap(..) already do.

Fixes dEQP-GLES3.functional.fbo.color.repeated_clear.sample.* with
ETNA_MESA_DEBUG=no_ts.

Cc: mesa-stable
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Daniel Lang <dalang@gmx.at>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42201>
2026-06-19 10:49:21 +00:00
David Rosca
88baf64496 va: Use RGB format with matching bit depth for YUV->YUV matrices
The bit depth used for the matrix calculation is derived from the input
format.

Fixes: 5bc0df5aad ("vl,frontends/va: Implement YUV->YUV matrix coeff conversion")
Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42181>
2026-06-19 10:26:15 +00:00
David Rosca
5d6e5a895f va: Always reset compositor chroma location
Otherwise it would incorrectly use it for RGB formats if there
was a previous conversion with YUV formats.

Fixes: d0eec62831 ("frontends/va: Change vlVaPostProcCompositor to take pipe_vpp_desc arg")
Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42181>
2026-06-19 10:26:15 +00:00
David Rosca
f13f439049 vl: Skip transfer function and primaries conversion when not needed
cs_trc_apply clamps the value to 0 which causes issues in YUV->YUV conversions
that can have negative intermediate values after conversion to RGB.
If no transfer function and primaries conversion is needed, this step
can be skipped.

Fixes: 69717c257f ("vl,frontends/va: Implement gamma and primaries conversion")
Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42181>
2026-06-19 10:26:15 +00:00
Collabora's Gfx CI Team
46bf8c2568 Uprev VVL to e17d63f8fcd967b2ff91efcb8607d2c9ab962e23
2ab77a0165...e17d63f8fc

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42140>
2026-06-19 09:23:36 +00:00
Job Noorman
44b82c7576 ir3/opt_prefetch_descriptors: rematerialize defs at preamble start
When rematerializing defs in the preamble, we make sure to insert them
in a block dominated by all their sources. When inserting a def that
doesn't have any sources we have to make sure to insert them as early as
possible. This is important for sequences like this:

32      %34 = load_const (0x00000007 = 0.000000)
...
if ... {
    32     %184 = @load_preamble (base=8)
    32     %185 = @bindless_resource_ir3 (%184) (desc_set=0)
    32     %186 = @bindless_resource_ir3 (%34 (0x7)) (desc_set=1)
    32x4   %187 = (float32)tex %186 (texture_handle), %185 (sampler_handle), ...
    ...
 }

%185 has to be rematerialized in control flow since its source is
defined there. %186 does not as its source is defined outside control
flow. We used to insert %186 as late as possible
(nir_after_impl(preamble)) but this causes issues as we cannot find a
valid block (i.e., a block that is dominated by both) to insert the
descriptor prefetch for (%185, %186). Fix this by setting the default
block to insert rematerialized defs as the preamble's start block.

Totals from 520 (0.30% of 176258) affected shaders:
MaxWaves: 5796 -> 5794 (-0.03%)
Instrs: 715314 -> 715248 (-0.01%); split: -0.02%, +0.01%
CodeSize: 1547680 -> 1547182 (-0.03%); split: -0.17%, +0.14%
NOPs: 157057 -> 157005 (-0.03%); split: -0.07%, +0.04%
Full: 9399 -> 9415 (+0.17%)
(ss): 18718 -> 18715 (-0.02%); split: -0.03%, +0.01%
(sy): 8183 -> 8178 (-0.06%)
(ss)-stall: 79780 -> 79818 (+0.05%); split: -0.02%, +0.06%
(sy)-stall: 221660 -> 221591 (-0.03%); split: -0.05%, +0.02%
STPs: 232 -> 234 (+0.86%)
LDPs: 232 -> 234 (+0.86%)
Preamble Instrs: 242817 -> 242695 (-0.05%); split: -0.79%, +0.74%
Cat0: 176573 -> 176523 (-0.03%); split: -0.06%, +0.03%
Cat7: 18945 -> 18929 (-0.08%); split: -0.09%, +0.01%

Signed-off-by: Job Noorman <job@noorman.info>
Fixes: 4e2a0a5ad0 ("ir3: Add descriptor prefetching optimization on a7xx")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42220>
2026-06-19 08:47:11 +00:00
Toshinari Morikawa
0b1479eb6a virgl: fix memory leak on shader translation
virgl_create_compute_state and virgl_shader_encoder were unnecessarily
cloning NIR. Since NIR is passed from st/mesa and the driver is
responsible for freeing it, the virgl driver doesn't have to preserve
the passed NIR and should free it through nir_to_tgsi_options.

Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13822
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Signed-off-by: Toshinari Morikawa <morikawa.toshinari@jp.panasonic.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40236>
2026-06-19 08:31:39 +00:00
squidbus
1b48450128 kk: Implement draw-related commands using device addresses
Replaces several draw commands with their `VK_KHR_device_address_commands`
equivalent, using device addresses directly with Metal 4. The Vulkan runtime
handles lifting the older commands up for us. We can also remove most index
robustness handling, as Metal 4 provides the necessary guarantees.

This does not fully implement `VK_KHR_device_address_commands` yet as the
copy/fill/update memory commands do not have as straightforward Metal 4
or meta equivalents.

Reviewed-by: Aitor Camacho <aitor@lunarg.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42316>
2026-06-19 06:38:20 +00:00
Casey Bowman
5fe14149a9 anv: Set anv_disable_hiz for Sons of the Forest
Until we make greater use of the COMMON_SLICE_CHICKEN1 register for Xe1-Xe3
platforms, we'll disable HiZ for SOTF to avoid redundant plane expansions.

In turn, this will avoid a major FPS drop when encountering scenes
where this condition takes place.

A test scene with the player's camera near an environmental rock
showed FPS gains greater than 183%. This should keep the FPS
stable during gameplay.

TODO: Revert once a proper solution is in place. See
https://gitlab.freedesktop.org/mesa/mesa/-/work_items/11782

Signed-off-by: Casey Bowman <casey.g.bowman@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42206>
2026-06-19 05:58:12 +00:00
Casey Bowman
8c689e9bcb anv: Add option to disable HiZ via drirc
Allows per-app disabling of HiZ if a case arises where HiZ is
significantly slowing down a workload.

This option should not be the final fix when such cases
arise, but can serve as a temporary band-aid while a proper
solution is worked out.

Signed-off-by: Casey Bowman <casey.g.bowman@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42206>
2026-06-19 05:58:12 +00:00
Aitor Camacho
c08dba8302 kk: Move to Metal4 command encoding
Tested-by: squidbus <squidbus@proton.me>
Signed-off-by: Aitor Camacho <aitor@lunarg.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42268>
2026-06-19 05:18:55 +00:00
Gu, Wangfeng
56588ef066 radv/sqtt: emit pending barrier end before API markers
Flush any delayed RGP barrier-end marker before writing the next general API marker
so no-op barriers cannot incorrectly cover subsequent draw or dispatch events.
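
The ordering described above can be sketched as a small model. This is only an illustration of the "flush the deferred end marker before the next API marker" idea; the class and method names are hypothetical and do not come from the radv source.

```python
class MarkerStream:
    """Toy model of a marker stream with a delayed barrier-end marker."""

    def __init__(self):
        self.events = []
        self.pending_barrier_end = False

    def barrier_start(self):
        self.events.append("barrier_start")

    def defer_barrier_end(self):
        # The end marker is not written yet, only remembered.
        self.pending_barrier_end = True

    def flush_pending_barrier_end(self):
        if self.pending_barrier_end:
            self.events.append("barrier_end")
            self.pending_barrier_end = False

    def api_marker(self, name):
        # Flush first, so the barrier cannot cover the event that follows.
        self.flush_pending_barrier_end()
        self.events.append("api:" + name)


stream = MarkerStream()
stream.barrier_start()
stream.defer_barrier_end()
stream.api_marker("draw")
print(stream.events)
```

In this model the barrier end always lands before the draw marker, so a barrier that did no work cannot appear to span the draw.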

Signed-off-by: Gu, Wangfeng <Wangfeng.Gu@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42289>
2026-06-19 04:57:12 +00:00
Valentine Burley
d2ac94d4d3 perfetto: Centralize perfetto header include in u_perfetto.h
Move the ANDROID_LIBPERFETTO conditional include (<perfetto.h> vs
<perfetto/tracing.h>) into util/perf/u_perfetto.h to eliminate
duplicated #ifdef blocks scattered across every driver.

Files that need additional pbzero proto headers for Android's modular
perfetto still include those individually under #ifdef
ANDROID_LIBPERFETTO.

This enables ninja-to-soong to generate an Android.bp that builds Mesa
against Android's libperfetto_client_experimental library.

Following:
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36561

Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Acked-by: Rob Clark <rob.clark@oss.qualcomm.com>
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42271>
2026-06-19 02:58:16 +00:00
Karol Herbst
d470487004 nak/instr_sched_prepass: Take predicate spilling into account when scheduling instructions
As long as the GPR usage never exceeds the threshold, the instruction
scheduling doesn't take the spilling costs of other register files into
account. Given we only have 7 available predicates, the overhead of spilling
those can be significant in int64 math heavy shaders.
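
As a rough illustration of the idea, a scheduler can charge a cost for exceeding the predicate budget as well as the GPR threshold. This is a minimal sketch and not the actual NAK heuristic: the function names, the candidate layout and the cost weights are all assumptions; only the limit of 7 predicates comes from the text above.

```python
NUM_PREDICATES = 7  # predicate budget stated in the commit message


def spill_penalty(live_gprs, live_preds, gpr_threshold, pred_spill_cost=1):
    """Hypothetical cost: number of live values over each file's budget."""
    gpr_excess = max(0, live_gprs - gpr_threshold)
    pred_excess = max(0, live_preds - NUM_PREDICATES)
    return gpr_excess + pred_spill_cost * pred_excess


def pick_candidate(candidates, gpr_threshold):
    """Pick the ready instruction whose resulting pressure costs least."""
    return min(
        candidates,
        key=lambda c: spill_penalty(c["gprs"], c["preds"], gpr_threshold),
    )


# Both candidates stay under the GPR threshold, so a GPR-only check would
# see no difference; only the predicate term separates them.
candidates = [
    {"name": "a", "gprs": 40, "preds": 9},
    {"name": "b", "gprs": 44, "preds": 6},
]
print(pick_candidate(candidates, gpr_threshold=64)["name"])
```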

I also refactored the code a bit to make it easier to do the same for
other register files, however it seems to hurt occupancy when doing so.

Fixes a performance regression with `vkpeak int64-scalar`.

Totals:
CodeSize: 8373514512 -> 8372304992 (-0.01%); split: -0.02%, +0.01%
Number of GPRs: 48332322 -> 48328427 (-0.01%); split: -0.02%, +0.01%
Static cycle count: 4781363047 -> 4779946979 (-0.03%); split: -0.11%, +0.08%
Spills to reg: 197662 -> 159098 (-19.51%); split: -21.19%, +1.68%
Fills from reg: 195767 -> 161305 (-17.60%); split: -18.62%, +1.01%
Max warps/SM: 53043448 -> 53044352 (+0.00%); split: +0.00%, -0.00%

Totals from 15734 (1.30% of 1213129) affected shaders:
CodeSize: 465771008 -> 464561488 (-0.26%); split: -0.36%, +0.10%
Number of GPRs: 1277105 -> 1273210 (-0.30%); split: -0.75%, +0.44%
Static cycle count: 570906432 -> 569490364 (-0.25%); split: -0.90%, +0.65%
Spills to reg: 121361 -> 82797 (-31.78%); split: -34.52%, +2.74%
Fills from reg: 107186 -> 72724 (-32.15%); split: -34.00%, +1.85%
Max warps/SM: 448816 -> 449720 (+0.20%); split: +0.38%, -0.18%
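
For reference, the headline percentages above are plain relative changes, (after - before) / before. A small sketch, with a hypothetical helper name, that recomputes them from the counts listed:

```python
def percent_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0


# Counts copied from the statistics above.
print(round(percent_change(197662, 159098), 2))  # Spills to reg, all shaders
print(round(percent_change(121361, 82797), 2))   # Spills to reg, affected shaders
```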

Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42323>
2026-06-19 02:01:18 +00:00
Ahmed Hesham
13d2b058cd clc: fix fp16 fallback mask for remquo
The fp16 libclc fallback indexes NIR function arguments including the
hidden return pointer. For remquo, this makes the parameters:
0: return pointer
1: x
2: y
3: quotient pointer

The quotient pointer (3) is an integer pointer and should not be
converted when building the wrapper. Update the mask for
OpenCLstd_Remquo to exclude the third argument.
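
The indexing can be illustrated with a small bitmask sketch. It assumes one bit per parameter slot and that every slot other than the quotient pointer is converted; both are assumptions made for illustration, and none of the names below are taken from the Mesa source.

```python
# Parameter slots as listed above, including the hidden return pointer.
REMQUO_PARAMS = ["return pointer", "x", "y", "quotient pointer"]
QUOTIENT_INDEX = 3


def mask_excluding(num_params, excluded_index):
    """Set one bit per parameter slot, then clear the excluded slot."""
    all_bits = (1 << num_params) - 1
    return all_bits & ~(1 << excluded_index)


def converted_params(mask, params):
    """List the parameters whose bit is set in the mask."""
    return [name for i, name in enumerate(params) if mask & (1 << i)]


remquo_mask = mask_excluding(len(REMQUO_PARAMS), QUOTIENT_INDEX)
print(bin(remquo_mask))
print(converted_params(remquo_mask, REMQUO_PARAMS))
```

Because the hidden return pointer occupies slot 0, a mask written against the OpenCL-level argument positions would be off by one, which is the kind of error this fix addresses.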

Signed-off-by: Ahmed Hesham <ahmed.hesham@arm.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42309>
2026-06-19 01:24:45 +00:00