Commit graph

221721 commits

Author SHA1 Message Date
Loïc Molinari
29d19a6119 panfrost: Save render cond state in panfrost_blitter_clear()
Fixes: 689f38b2b4 ("panfrost: fix refcnt imbalance related to blitter")
Signed-off-by: Loïc Molinari <loic.molinari@collabora.com>
Reviewed-by: Ashley Smith <ashley.smith@collabora.com>
Acked-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40124>
2026-04-28 11:43:35 +00:00
Loïc Molinari
6789f9e47a panfrost: Refactor blitter states saving
Use a single enum for states, declare the flags directly in the
functions that use them, and follow the same pattern in all of them
to improve readability. Rename PAN_DISABLE_RENDER_COND to
PAN_SAVE_RENDER_COND so that it can use a non-negated check like all
the other states.
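
The flag pattern described in this message might look roughly like the sketch below. Only PAN_SAVE_RENDER_COND is named in the commit message; the other enumerators, the struct and the function are hypothetical names chosen for illustration and are not taken from the Panfrost sources.

```c
#include <assert.h>
#include <stdbool.h>

/* Only PAN_SAVE_RENDER_COND comes from the commit message; every other
 * name in this sketch is hypothetical. */
enum pan_save_flags {
   PAN_SAVE_TEXTURES    = 1 << 0, /* hypothetical */
   PAN_SAVE_FRAMEBUFFER = 1 << 1, /* hypothetical */
   PAN_SAVE_RENDER_COND = 1 << 2,
   PAN_SAVE_ALL         = (1 << 3) - 1,
};

/* Hypothetical record of which states were saved. */
struct saved_states {
   bool textures;
   bool framebuffer;
   bool render_cond;
};

static struct saved_states
save_states(unsigned flags)
{
   /* Reject bits outside the known flag set. */
   assert((flags & ~(unsigned)PAN_SAVE_ALL) == 0);

   /* Every state uses the same positive "is the flag set?" test, which
    * is what a SAVE-style name allows and a DISABLE-style name does not. */
   struct saved_states s = {
      .textures    = (flags & PAN_SAVE_TEXTURES) != 0,
      .framebuffer = (flags & PAN_SAVE_FRAMEBUFFER) != 0,
      .render_cond = (flags & PAN_SAVE_RENDER_COND) != 0,
   };
   return s;
}
```

A caller would then list the states it needs next to the call, for example save_states(PAN_SAVE_TEXTURES | PAN_SAVE_RENDER_COND).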

Signed-off-by: Loïc Molinari <loic.molinari@collabora.com>
Reviewed-by: Ashley Smith <ashley.smith@collabora.com>
Acked-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40124>
2026-04-28 11:43:34 +00:00
Loïc Molinari
d5bd00dba8 panfrost: Test render cond in legalized blit func
Add a note to the blit supported check for the S8_UINT format that's
specially handled by Panfrost.

Signed-off-by: Loïc Molinari <loic.molinari@collabora.com>
Reviewed-by: Ashley Smith <ashley.smith@collabora.com>
Acked-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40124>
2026-04-28 11:43:34 +00:00
Loïc Molinari
0620b7746b panfrost: Shorten legalized blit func name
Signed-off-by: Loïc Molinari <loic.molinari@collabora.com>
Reviewed-by: Ashley Smith <ashley.smith@collabora.com>
Acked-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40124>
2026-04-28 11:43:34 +00:00
Loïc Molinari
b13b27bef4 panfrost: Make legalization less verbose
Signed-off-by: Loïc Molinari <loic.molinari@collabora.com>
Reviewed-by: Ashley Smith <ashley.smith@collabora.com>
Acked-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40124>
2026-04-28 11:43:34 +00:00
Loïc Molinari
0a36a33f53 panfrost: Move clear funcs to pan_blitter
Move panfrost_clear_render_target() and panfrost_clear_depth_stencil()
to pan_blitter.c. This allows pan_blitter_save_states() to be made
static and avoids exposing the saved states enums.

Signed-off-by: Loïc Molinari <loic.molinari@collabora.com>
Reviewed-by: Ashley Smith <ashley.smith@collabora.com>
Acked-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40124>
2026-04-28 11:43:33 +00:00
Loïc Molinari
e30d95309f panfrost: Move blit funcs to pan_blitter
Add a dedicated header and prefix blit funcs accordingly.

Signed-off-by: Loïc Molinari <loic.molinari@collabora.com>
Reviewed-by: Ashley Smith <ashley.smith@collabora.com>
Acked-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40124>
2026-04-28 11:43:33 +00:00
Loïc Molinari
547c4ff8cb panfrost: Fix MTK tiled resources mapped as shader images
MTK tiled resources, like AFBC/AFRC resources, must be converted to
linear before being used as a shader image in a CS.

Signed-off-by: Loïc Molinari <loic.molinari@collabora.com>
Reviewed-by: Ashley Smith <ashley.smith@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40124>
2026-04-28 11:43:32 +00:00
Loïc Molinari
23ab18e9e3 panfrost: Blit with the RUN_FULLSCREEN instruction (v13)
Extend RUN_FULLSCREEN support to architecture v13.

Draw call descriptor flags must now be copied to input staging
registers for the tiler to avoid dereferences in the fragment
pre-pass.

Signed-off-by: Loïc Molinari <loic.molinari@collabora.com>
Reviewed-by: Ashley Smith <ashley.smith@collabora.com>
Acked-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40124>
2026-04-28 11:43:32 +00:00
Loïc Molinari
712f2534b9 panfrost: Blit with the RUN_FULLSCREEN instruction (v12)
Extend RUN_FULLSCREEN support to architecture v12.

Signed-off-by: Loïc Molinari <loic.molinari@collabora.com>
Reviewed-by: Ashley Smith <ashley.smith@collabora.com>
Acked-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40124>
2026-04-28 11:43:32 +00:00
Loïc Molinari
1e9c175ea0 panfrost: Blit with the RUN_FULLSCREEN instruction (v10)
The GPU (tiler) only writes the VertexArrayDescriptor fields
"pointer", "vertex_packet_stride" and "vertex_attribute_stride" in
RUN_IDVS malloc mode. These fields can also be programmed in
non-malloc modes. Update the XML spec file comment accordingly.

Fixes: 69ddb910e0 ("panfrost/ci: Enable G610 piglit job")
Signed-off-by: Loïc Molinari <loic.molinari@collabora.com>
Reviewed-by: Ashley Smith <ashley.smith@collabora.com>
Acked-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40124>
2026-04-28 11:43:31 +00:00
Loïc Molinari
ed05834caf panfrost: Blit with the FullScreenJob descriptor (v9)
A vertex array is pre-allocated in order to interpolate varyings.

The GPU (tiler) only writes the VertexArrayDescriptor fields
"pointer", "vertex_packet_stride" and "vertex_attribute_stride" in
MallocVertexShaderJob mode. These fields can also be programmed with
IndexedVertexShaderJob and FullScreenJob modes. Update the XML spec
file comment accordingly.

Signed-off-by: Loïc Molinari <loic.molinari@collabora.com>
Reviewed-by: Ashley Smith <ashley.smith@collabora.com>
Acked-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40124>
2026-04-28 11:43:31 +00:00
Loïc Molinari
82306a6bd7 panfrost: Emit draw call flags with dedicated func
The new function will be used in the next commit to emit fullscreen
draw call flags.

Signed-off-by: Loïc Molinari <loic.molinari@collabora.com>
Reviewed-by: Ashley Smith <ashley.smith@collabora.com>
Acked-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40124>
2026-04-28 11:43:31 +00:00
Loïc Molinari
338b937d01 panfrost: Pass primitive mode to funcs instead of full draw info
panfrost_update_active_prim() and prepare_draw() both take a
pipe_draw_info struct pointer only for accessing the primitive
mode. Directly pass the primitive mode instead in order to ease the
addition of new draw functions.

Signed-off-by: Loïc Molinari <loic.molinari@collabora.com>
Reviewed-by: Ashley Smith <ashley.smith@collabora.com>
Acked-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40124>
2026-04-28 11:43:30 +00:00
Loïc Molinari
6ad3944290 panfrost: Add infrastructure for fullscreen draw calls
Panfrost blits are implemented using u_blitter which exposes the
draw_rectangle handler for drivers to blit with dedicated GPU
support. By default, it ends up blitting with a draw_vbo call on the
pipe. In Panfrost, draw_vbo then emits a full-featured draw call using
an indexed or malloc vertex shader job on pre-CSF GPUs or a RUN_IDVS
instruction on CSF GPUs. Since v9, Mali GPUs expose a lighter draw
call with the FullScreenJob descriptor on pre-CSF GPUs or with the
RUN_FULLSCREEN instruction on CSF GPUs. These draw calls emit a quad
primitive into the polygon list and run tiling of a fullscreen
fragment job without vertex processing. This commit adds the
infrastructure to implement blits using fullscreen draw calls.

It supports all types of blits apart from the instanced ones. Partial
blits are supported using scissors and textured blits are supported
using varying interpolation and fragment shading.
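
The choice between the draw paths described above can be sketched as follows. This is a simplified illustration under stated assumptions: the enum, the function name and the has_csf parameter are hypothetical, and the real driver decides this inside its draw_rectangle handling rather than in a helper like this one.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for the three paths mentioned in the message. */
enum blit_draw_path {
   BLIT_DRAW_VBO,       /* full-featured draw call via draw_vbo */
   BLIT_FULLSCREEN_JOB, /* FullScreenJob descriptor (pre-CSF, v9+) */
   BLIT_RUN_FULLSCREEN, /* RUN_FULLSCREEN instruction (CSF) */
};

static enum blit_draw_path
select_blit_draw_path(unsigned arch, bool has_csf, unsigned instance_count)
{
   /* A draw always covers at least one instance. */
   assert(instance_count >= 1);

   /* Fullscreen draw calls exist since v9 and do not cover instanced
    * blits, so both cases fall back to the regular draw. */
   if (arch < 9 || instance_count > 1)
      return BLIT_DRAW_VBO;

   return has_csf ? BLIT_RUN_FULLSCREEN : BLIT_FULLSCREEN_JOB;
}
```

Partial and textured blits do not need a separate path in this sketch because, as the message notes, they are handled with scissors and with varying interpolation inside the fullscreen draw.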

Signed-off-by: Loïc Molinari <loic.molinari@collabora.com>
Reviewed-by: Ashley Smith <ashley.smith@collabora.com>
Acked-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40124>
2026-04-28 11:43:30 +00:00
Loïc Molinari
d83e090f12 u_blitter: Fix out-dated draw_rectangle handler doc
Remove mention of UTIL_BLITTER_ATTRIB_COLOR and
UTIL_BLITTER_ATTRIB_TEXCOORD and add a few words about
UTIL_BLITTER_ATTRIB_XY and UTIL_BLITTER_ATTRIB_XYZW.

Fixes: 22ed1ba01a ("gallium/u_blitter: use draw_rectangle for all blits except cubemaps")
Fixes: ca09c173f6 ("gallium/u_blitter: remove UTIL_BLITTER_ATTRIB_COLOR, use a constant buffer")
Signed-off-by: Loïc Molinari <loic.molinari@collabora.com>
Reviewed-by: Ashley Smith <ashley.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40124>
2026-04-28 11:43:30 +00:00
Nick Hamilton
b205c7d592 pvr: Enable shaderImageGatherExtended
Enables the shaderImageGatherExtended feature and sets the
{min,max}TexelGatherOffset physical device properties.

The properties are queried via Zink and are expected to be non-zero.

Signed-off-by: Nick Hamilton <nick.hamilton@imgtec.com>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40687>
2026-04-28 12:04:09 +01:00
Simon Perretta
57791c4a99 pco: track how many tg4/raw sample comps are needed
Rather than always emitting and swizzling 16 components for raw samples,
scale it by the number actually needed as defined by the selected tg4
channel/components.

Signed-off-by: Simon Perretta <simon.perretta@imgtec.com>
Acked-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40687>
2026-04-28 12:04:03 +01:00
Nick Hamilton
b80a5f9b7d pco: fix clamping the array index when shaderImageGatherExtended is enabled
The array index value is a signed integer, but the compiler was using
the unsigned version of the clamp helper function, so the value was
not being clamped to 0 when it was negative.
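
The effect of picking the unsigned clamp for a signed index can be reproduced in plain C. The helpers below are hypothetical stand-ins and not the compiler's own clamp helpers; they only show why a negative index ends up at the upper bound instead of 0.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for an unsigned and a signed clamp helper. */
static uint32_t
clamp_u32(uint32_t v, uint32_t lo, uint32_t hi)
{
   assert(lo <= hi);
   return v < lo ? lo : (v > hi ? hi : v);
}

static int32_t
clamp_i32(int32_t v, int32_t lo, int32_t hi)
{
   assert(lo <= hi);
   return v < lo ? lo : (v > hi ? hi : v);
}
```

With a layer range of 0..7, clamp_u32((uint32_t)-1, 0, 7) sees 0xFFFFFFFF and yields 7, whereas clamp_i32(-1, 0, 7) yields 0.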

Fixes the following dEQP test cases when shaderImageGatherExtended is enabled:
dEQP-VK.glsl.texture_gather.basic.2d_array.*
dEQP-VK.glsl.texture_gather.offset.*.2d_array.*
dEQP-VK.glsl.texture_gather.offset_dynamic.*.2d_array.*
dEQP-VK.glsl.texture_gather.offsets.*.2d_array.*

Fixes: 854563f0f8 ("pco: fully switch over to common smp emission code")
Signed-off-by: Nick Hamilton <nick.hamilton@imgtec.com>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40687>
2026-04-28 12:03:51 +01:00
Simon Perretta
56b8dc92a9 pco: amend tg4 lowering
Use lower_tg4_offsets to take care of explicit offsets, and just swizzle
the texels in the order defined by textureGather*

Fixes: 46c9239c11 ("pvr, pco: initial texture gather support with gather sampler")
Signed-off-by: Simon Perretta <simon.perretta@imgtec.com>
Acked-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40687>
2026-04-28 12:03:30 +01:00
Arjob Mukherjee
35f57a2739 pvr: increase value of maxPerStageDescriptorStorageBuffers
Increase past the minimum required by the Vulkan Spec to fix tests. This
is needed because Zink splits `maxPerStageDescriptorStorageBuffers`
between atomic buffers and `MaxShaderStorageBlocks`.

Fixes the following GLES conformance tests:
  KHR-GLES31.core.compute_shader.resources-max
  KHR-GLES31.core.draw_indirect.advanced-twoPass-Compute-arrays
  KHR-GLES31.core.shader_image_load_store.advanced-sync-vertexArray
  KHR-GLES31.core.shader_image_load_store.basic-allTargets-store-cs
  KHR-GLES31.core.shader_image_load_store.basic-allTargets-store-fs
  KHR-GLES31.core.shader_storage_buffer_object.advanced-unsizedArrayLength-cs-int
  KHR-GLES31.core.shader_storage_buffer_object.basic-stdLayout_UBO_SSBO-case1-cs
  KHR-GLES31.core.shader_storage_buffer_object.basic-stdLayout_UBO_SSBO-case2-cs
  dEQP-GLES31.functional.draw_indirect.compute_interop.combined.drawelements_compute_cmd_and_data_and_indices
  dEQP-GLES31.functional.synchronization.in_invocation.ssbo_alias_overwrite
  dEQP-GLES31.functional.synchronization.in_invocation.ssbo_alias_write
  dEQP-GLES31.functional.synchronization.in_invocation.ssbo_atomic_alias_overwrite
  dEQP-GLES31.functional.synchronization.in_invocation.ssbo_atomic_alias_write
  dEQP-GLES31.functional.synchronization.inter_call.with_memory_barrier.ssbo_atomic_multiple_write_read
  dEQP-GLES31.functional.synchronization.inter_call.with_memory_barrier.ssbo_multiple_write_read
  dEQP-GLES31.functional.synchronization.inter_invocation.ssbo_alias_overwrite
  dEQP-GLES31.functional.synchronization.inter_invocation.ssbo_alias_write
  dEQP-GLES31.functional.synchronization.inter_invocation.ssbo_atomic_alias_overwrite
  dEQP-GLES31.functional.synchronization.inter_invocation.ssbo_atomic_alias_write

Backport-to: 26.0
Signed-off-by: Arjob Mukherjee <arjob.mukherjee@imgtec.com>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41156>
2026-04-28 08:36:48 +00:00
Christian Gmeiner
7d59c62fde panvk: Wire up VK_EXT_conservative_rasterization on v11+
Mali >= v11 has a Conservative Rast Mode field in DCD Flags 0 with
values Disabled and Over Estimate. Wire it to vk_runtime's
rasterization state and expose the extension on PAN_ARCH >= 11, with
caps restricted to overestimate only, since the HW has no
underestimate value and no overestimation-size granularity.

On v11-v13, degenerate triangles produce a wrong fragment w when
overestimate is enabled, so cull_zero_area is forced on alongside
the mode bit and degenerateTrianglesRasterized is reported as false.

Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41189>
2026-04-28 09:34:28 +02:00
Samuel Pitoiset
df3de4acbb ac,radv,radeonsi: replace mesh_fast_launch_2 by gfx_level checks
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41204>
2026-04-28 06:50:43 +00:00
Samuel Pitoiset
94ae99f16f radv: replace use_ngg_streamout by gfx_level checks
There is no way to enable or disable it via debug options; it is only
used on GFX11+.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41204>
2026-04-28 06:50:43 +00:00
Karol Herbst
4b66258717 nak: call nir_opt_algebraic_distribute_src_mods
Totals from 134863 (11.12% of 1212873) affected shaders:
CodeSize: 2109574320 -> 2105266608 (-0.20%); split: -0.23%, +0.02%
Number of GPRs: 7199115 -> 7194107 (-0.07%); split: -0.13%, +0.06%
SLM Size: 201728 -> 201720 (-0.00%); split: -0.01%, +0.00%
Static cycle count: 2037608114 -> 2035165858 (-0.12%); split: -0.17%, +0.05%
Spills to memory: 22063 -> 22035 (-0.13%); split: -0.14%, +0.01%
Fills from memory: 22063 -> 22035 (-0.13%); split: -0.14%, +0.01%
Spills to reg: 78193 -> 78139 (-0.07%); split: -0.17%, +0.10%
Fills from reg: 83383 -> 83335 (-0.06%); split: -0.15%, +0.09%
Max warps/SM: 5188428 -> 5188840 (+0.01%); split: +0.03%, -0.02%

Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41214>
2026-04-28 03:08:01 +02:00
Karol Herbst
a9eac010dd nak: call nir_opt_fp_math_ctrl
Totals from 77360 (6.38% of 1212873) affected shaders:
CodeSize: 1255332672 -> 1250129888 (-0.41%); split: -0.44%, +0.03%
Number of GPRs: 4233257 -> 4226625 (-0.16%); split: -0.20%, +0.05%
Static cycle count: 937314398 -> 935865851 (-0.15%); split: -0.22%, +0.07%
Spills to memory: 11371 -> 11373 (+0.02%)
Fills from memory: 11371 -> 11373 (+0.02%)
Spills to reg: 24245 -> 24262 (+0.07%); split: -0.65%, +0.72%
Fills from reg: 23689 -> 23742 (+0.22%); split: -0.55%, +0.77%
Max warps/SM: 2912604 -> 2916096 (+0.12%); split: +0.15%, -0.03%

Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41214>
2026-04-28 03:07:51 +02:00
Marek Olšák
3dcba87ca3 nir/opt_licm: hoist instructions across multiple levels of nested loops
radv gfx12:

Totals:
Instrs: 42861311 -> 42861476 (+0.00%); split: -0.00%, +0.00%
CodeSize: 227917476 -> 227918160 (+0.00%); split: -0.00%, +0.00%
Latency: 265381068 -> 265373506 (-0.00%); split: -0.00%, +0.00%
InvThroughput: 42954018 -> 42952350 (-0.00%)
VClause: 819026 -> 819024 (-0.00%)
SClause: 1210348 -> 1210293 (-0.00%)
Copies: 2919525 -> 2919597 (+0.00%); split: -0.00%, +0.00%
PreSGPRs: 2889432 -> 2889406 (-0.00%)
VALU: 23757371 -> 23757377 (+0.00%); split: -0.00%, +0.00%
SALU: 5981417 -> 5981485 (+0.00%); split: -0.00%, +0.00%
VOPD: 8966 -> 8964 (-0.02%)
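
As a source-level illustration of what hoisting across several loop levels means, the two functions below compute the same sum. This is plain C and not NIR, and the function names are hypothetical; it only shows the kind of transformation loop-invariant code motion performs, not how the pass is implemented.

```c
#include <assert.h>
#include <stddef.h>

/* Before: scale * bias is recomputed in the innermost loop although it
 * depends on neither r nor c, and r * cols is recomputed for every c. */
static long
sum_naive(const int *m, size_t rows, size_t cols, int scale, int bias)
{
   assert(m != NULL);
   long sum = 0;
   for (size_t r = 0; r < rows; r++) {
      for (size_t c = 0; c < cols; c++) {
         int k = scale * bias;
         sum += (long)m[r * cols + c] * k;
      }
   }
   return sum;
}

/* After: k is invariant in both loops and moves out of both levels;
 * row_base is invariant only in the inner loop and moves out one level. */
static long
sum_hoisted(const int *m, size_t rows, size_t cols, int scale, int bias)
{
   assert(m != NULL);
   long sum = 0;
   int k = scale * bias;
   for (size_t r = 0; r < rows; r++) {
      size_t row_base = r * cols;
      for (size_t c = 0; c < cols; c++)
         sum += (long)m[row_base + c] * k;
   }
   return sum;
}
```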

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41220>
2026-04-27 23:58:21 +00:00
Marek Olšák
8e036fcaec nir/opt_licm: use nir_metadata_control_flow
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41220>
2026-04-27 23:58:21 +00:00
Marek Olšák
e0112be522 nir/opt_licm: add a private state structure for the pass
The structure will grow in later commits.

The major change is that the preheader and exit blocks are replaced
by tracking just the innermost optimized nir_loop * and getting the
predecessor and successor blocks out of it.

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41220>
2026-04-27 23:58:20 +00:00
Yiwei Zhang
90c4f1f9dc util/android_stub: drop legacy atrace
The legacy atrace support was added for lack of perfetto C bindings in:
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13255

Now that perfetto has matured while atrace has compatibility issues with
C++, let's drop legacy atrace support in favor of perfetto even for Android.

Reviewed-by: Dhruv Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41219>
2026-04-27 23:29:27 +00:00
Timothy Arceri
a42c55da46 amd/radeonsi: dont clamp packed user varyings
ac_nir_optimize_outputs() might pack user varyings into the color
built-ins. If this happens we skip adding clamping to the
components that contain the user varying.

This change also fixes a second bug where a color built-in can be
packed into a non-color slot and was no longer being clamped.

Fixes: 3777a5d7 ("radeonsi: assign param export indices before compilation")
Closes: #14443

Reviewed-by: Marek Olšák <maraeo@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40594>
2026-04-27 22:59:58 +00:00
Marek Olšák
0684976de8 ac/nir: add ac_nir_assign_fs_input_locations to set PS input locations in stone
No intended functional change.

This prevents possible breakage due to DCE removing input loads followed
by nir_shader_gather_info updating input masks and changing the result of
ac_nir_get_io_driver_location after PS input register contents are already
determined.

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41175>
2026-04-27 21:05:53 +00:00
Mel Henning
4b0a0ed7b6 nak: Use NIR_LOOP_PASS
This is similar to
75ede9d9bc ("intel/brw: track last successful pass and leave the loop early")
except that it uses the common nir helpers.

Note that I've also marked nir_opt_peephole_select as NOT_IDEMPOTENT
because I'm skeptical that it actually is idempotent. This differs from
both brw and radv.

I'm also marking gcm as not idempotent because it isn't idempotent in
practice on one of the shaders in my shader-db:
2bf4ba7133/fossils/blender
pipeline hash 0e972f8e349af903

This is about a 4% geomean compile time speedup on my local collection
of shaders.
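
The idea of tracking the last successful pass and leaving the loop early can be sketched independently of NIR. The code below is a generic illustration and not the NIR helper macros: the struct, the two toy passes and run_passes() are hypothetical. It also shows why marking a pass as not idempotent costs an extra run.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef bool (*pass_fn)(int *state); /* returns true on progress */

struct pass {
   pass_fn run;
   bool idempotent; /* a second run right after itself changes nothing */
};

/* Two toy passes working on an integer "shader". */
static bool clear_low_bit(int *s)
{
   if (*s & 1) { *s &= ~1; return true; }
   return false;
}

static bool cap_at_100(int *s)
{
   if (*s > 100) { *s = 100; return true; }
   return false;
}

/* Runs the passes to a fixed point and returns how many pass
 * invocations were needed. */
static unsigned
run_passes(const struct pass *passes, size_t count, int *state)
{
   assert(passes != NULL && state != NULL);
   const struct pass *last_progress = NULL;
   unsigned runs = 0;
   bool progress;

   do {
      progress = false;
      for (size_t i = 0; i < count; i++) {
         /* Every pass run since this one last made progress was a
          * no-op. If it is idempotent, running it again cannot change
          * anything either, so the fixed point has been reached. */
         if (last_progress == &passes[i] && passes[i].idempotent)
            return runs;

         runs++;
         if (passes[i].run(state)) {
            progress = true;
            last_progress = &passes[i];
         }
      }
   } while (progress);

   return runs;
}
```

Starting from 205 with both passes marked idempotent, the loop stops after three invocations; marking cap_at_100 as not idempotent forces a fourth, which is the conservative choice the message makes for passes whose idempotence is in doubt.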

Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41118>
2026-04-27 20:14:05 +00:00
Mel Henning
75fc9e2704 nak: Use shader_info->var_copies_lowered
This mirrors the change from
ba0bc7182d ("anv: use shader_info->var_copies_lowered")

Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41118>
2026-04-27 20:14:05 +00:00
Danylo Piliaiev
5b5bc956df tu/perfetto: Move away from single timeline for all apps
This moves from deprecated stage_id/hw_queue_id to per-context
stage_iid/hw_queue_iid, which leads to separate timelines per app.
There are several benefits to this:
- Different driver versions could be used by different apps and perfetto
  won't confuse tracepoints.
- Tracepoints from different apps may not align perfectly, so previously
  we got a fair amount of weird vertical ordering of tracepoints.

The downside is that info is spread across several timelines multiplied
by queues, but I think that's better since it is easier to understand
which tracepoints correspond to which app.

The changes are mostly copied from radeon/intel perfetto integration.

This also fixes app_event emission along the way: previously,
debug_marker_stage was called _before_ SEQ_INCREMENTAL_STATE_CLEARED.

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41105>
2026-04-27 19:45:42 +00:00
Serdar Kocdemir
117f3cb1fc gfxstream: allow VK_KHR_maintenance extensions
Add latest maintenance extensions required by Android Vulkan
requirements, except VK_KHR_maintenance5 which enables dynamic rendering
on ANGLE and causes issues with some cuttlefish targets.

Test: CI

Reviewed-by: Aaron Ruby <aruby@qnx.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41210>
2026-04-27 18:57:43 +00:00
Serdar Kocdemir
e8773b96df gfxstream: Add VK_EXT_pipeline_protected_access
Test: dEQP-VK.protected_memory.*
Test: dEQP-VK.pipeline.monolithic.image.dedicated_allocation.*
Test: dEQP-VK.api.info.get_physical_device_properties2.features.*

Reviewed-by: Aaron Ruby <aruby@qnx.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41210>
2026-04-27 18:57:43 +00:00
Ian Romanick
e301817753 brw: Don't lower phis involved in DPAS instructions to scalar
On my Arc A380 (DG2), this more than doubles the performance of Jeff
Bolz's cooperative matrix benchmark. With llama.cpp modified to use
cooperative matrix on DG2, performance is improved by 37%.

Closes: #15311
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Matt Corallo <git@bluematt.me>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41172>
2026-04-27 18:09:16 +00:00
Ian Romanick
09b43966ba brw: Lower all phis to scalar
The next commit will cause some very specific phis to not be lowered to
scalar, and that's the reason the callback is used instead of
nir_lower_all_phis_to_scalar.

It's worth noting that the comment in nir_lower_phis_to_scalar.c
specifically calls out Deus Ex as the reason some phis should not be
lowered. At least on current BRW, zero shaders from Deus Ex trace were
affected for spills or fills on any Intel platform.

shader-db:

All Intel platforms had similar results. (Lunar Lake shown)
total instructions in shared programs: 17050005 -> 17051449 (<.01%)
instructions in affected programs: 41032 -> 42476 (3.52%)
helped: 29 / HURT: 159

total cycles in shared programs: 876411976 -> 876433702 (<.01%)
cycles in affected programs: 1455550 -> 1477276 (1.49%)
helped: 40 / HURT: 150

fossil-db:

All Intel platforms had similar results. (Lunar Lake shown)
Totals:
Instrs: 916599633 -> 916694854 (+0.01%); split: -0.00%, +0.01%
CodeSize: 14705971792 -> 14708302384 (+0.02%); split: -0.00%, +0.02%
Send messages: 40870114 -> 40870113 (-0.00%)
Cycle count: 102360965889 -> 102364169753 (+0.00%); split: -0.00%, +0.01%
Spill count: 3460669 -> 3460240 (-0.01%)
Fill count: 4988325 -> 4987891 (-0.01%)
Max live registers: 192914542 -> 192918153 (+0.00%); split: -0.00%, +0.00%
Max dispatch width: 48848112 -> 48848128 (+0.00%)
Non SSA regs after NIR: 141633613 -> 141671589 (+0.03%); split: -0.00%, +0.03%

Totals from 5713 (0.28% of 2010434) affected shaders:
Instrs: 5215921 -> 5311142 (+1.83%); split: -0.09%, +1.91%
CodeSize: 88940784 -> 91271376 (+2.62%); split: -0.20%, +2.82%
Send messages: 284751 -> 284750 (-0.00%)
Cycle count: 275671864 -> 278875728 (+1.16%); split: -0.74%, +1.90%
Spill count: 857 -> 428 (-50.06%)
Fill count: 845 -> 411 (-51.36%)
Max live registers: 667776 -> 671387 (+0.54%); split: -0.86%, +1.40%
Max dispatch width: 160416 -> 160432 (+0.01%)
Non SSA regs after NIR: 1127904 -> 1165880 (+3.37%); split: -0.10%, +3.47%

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Matt Corallo <git@bluematt.me>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41172>
2026-04-27 18:09:16 +00:00
Suresh Guttula
71508d90aa ac: Add vcn_5_3_0 support
Enable hardware decode/encode capabilities for VCN 5.3.0 by
configuring the supported codec list. This allows vainfo to
properly enumerate available codec capabilities.

Signed-off-by: Suresh Guttula <suresh.guttula@amd.com>
Reviewed-by: David Rosca <david.rosca@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41202>
2026-04-27 17:13:18 +00:00
Benjamin Cheng
0e04954c9a radv/video_enc: Use correct swizzle mode for VCN5 with GFX11
Signed-off-by: Suresh Guttula <suresh.guttula@amd.com>
Reviewed-by: David Rosca <david.rosca@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41202>
2026-04-27 17:13:18 +00:00
Benjamin Cheng
922d04c9a5 ac/vcn: Rename VCN5 swizzle mode to GFX12
The original naming is inaccurate: the swizzle mode depends on the GFX
version, not the VCN version.

Signed-off-by: Suresh Guttula <suresh.guttula@amd.com>
Reviewed-by: David Rosca <david.rosca@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41202>
2026-04-27 17:13:18 +00:00
Matt Turner
acba4c9fd8 radv: expose VK_KHR_performance_query on GFX11
Enable VK_KHR_performance_query on GFX11 (RDNA3 / RDNA3.5) now that the
selector tables and packet emission are in place.

Tested on Strix Halo with dEQP-VK.query_pool.performance_query.* (6 pass,
6 not-supported for the allowCommandBufferQueryCopies cases).

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41157>
2026-04-27 16:16:00 +00:00
Matt Turner
8499d86b94 radv/perfcounter: add GFX11 performance counter selectors
GFX11 reorganizes the shader perfcounter blocks: wave counts move from
SQ to the SQG registers (still mapped as the SQ block in ac/), while
per-instruction counters move from SQ to the new SQ_WGP block.

Add GFX11-specific selector enums using the new block assignments and
branch radv_query_perfcounter_descs to select them on GFX11+. GL2C,
GL1C, and TCP selectors are unchanged between GFX10.3 and GFX11.

The "Instructions" (total count) counter is dropped on GFX11 as there
is no direct SQ_WGP equivalent for INSTS_ALL.

Selector indices verified against gpu_performance_api's
gpa_hw_counter_gfx11.cc.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41157>
2026-04-27 16:15:59 +00:00
Matt Turner
703de21af8 radv/perfcounter: guard select1 access in radv_emit_select
Some perfcounter blocks (e.g. SQ_WGP on GFX11) define num_spm_modules
but have no select1 register array. Skip the select1 loop when the
array is NULL.
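
A guard of this kind could look like the following sketch. The struct layout and the function are hypothetical and are not the radv or ac definitions; the point is only that the loop bound (num_spm_modules) can be non-zero while the select1 array is absent.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of a perfcounter block's registers. */
struct perf_block_regs {
   unsigned num_spm_modules;
   const uint32_t *select1; /* may be NULL, e.g. for SQ_WGP on GFX11 */
};

/* Copies the select1 register offsets into out and returns how many
 * were written. Blocks without a select1 array contribute nothing. */
static size_t
emit_select1(const struct perf_block_regs *regs, uint32_t *out, size_t out_len)
{
   assert(regs != NULL && out != NULL);

   if (regs->select1 == NULL)
      return 0; /* skip the loop entirely */

   size_t n = 0;
   for (unsigned i = 0; i < regs->num_spm_modules && n < out_len; i++)
      out[n++] = regs->select1[i];
   return n;
}
```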

This is a prerequisite for enabling performance queries on GFX11.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41157>
2026-04-27 16:15:59 +00:00
Matt Turner
2595940b0d radv: fix UB in radv_format_pack_clear_color for snorm formats
Casting a negative float to uint64_t is undefined behavior. GCC 15 with
-O2 produces 0xFFFFFFFFFFFFFFFF for (uint64_t)(-32767.5f), causing snorm
clear values to be packed incorrectly (e.g. 0xFFFF instead of 0x8001 for
snorm16 -1.0). This results in wrong DCC comp-to-single clear colors and
~966 CTS snorm multisample_resolve test failures.

Fix by casting through int64_t first, which is well-defined (truncation
toward zero) and preserves the two's complement bit pattern.
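
The difference between the two casts can be shown with a small stand-alone helper. pack_snorm16() is a hypothetical function written for this illustration and is not radv_format_pack_clear_color(); the rounding scheme is an assumption chosen so that -1.0 reaches the -32767.5 value quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: packs a float into a 16-bit snorm value. */
static uint64_t
pack_snorm16(float f)
{
   assert(f == f); /* NaN is not handled by this sketch */

   if (f < -1.0f) f = -1.0f;
   if (f > 1.0f) f = 1.0f;

   /* Scale and round half away from zero: -1.0 becomes -32767.5. */
   float scaled = f * 32767.0f;
   scaled += scaled >= 0.0f ? 0.5f : -0.5f;

   /* (uint64_t)scaled would be undefined for negative values. Going
    * through int64_t truncates toward zero and is well-defined here
    * because the value is in range. */
   int64_t i = (int64_t)scaled;

   /* Converting a negative int64_t to uint64_t is defined (modulo
    * 2^64), so the low 16 bits hold the two's complement pattern. */
   return (uint64_t)i & 0xFFFF;
}
```

With this helper, pack_snorm16(-1.0f) yields 0x8001, the value the message gives as the correct packing for snorm16 -1.0.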

Fixes: 585c25be1e ("radv: fix color conversions for normalized uint/sint formats")
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41164>
2026-04-27 15:44:09 +00:00
Jaishankar Rajendran
12f43d048e anv: tune parameters of the ASTC software decoding
Signed-off-by: Prakhar Vishwakarma <prakhar.vishwakarma@intel.com>
Signed-off-by: Jaishankar Rajendran <jaishankar.rajendran@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41205>
2026-04-27 15:17:04 +00:00
Jaishankar Rajendran
cd941d3970 vulkan/runtime: enable parametrization of ASTC software decode
Enable the driver to select:
  - LUT allocation alignment
  - LUT memory flags selection

Signed-off-by: Prakhar Vishwakarma <prakhar.vishwakarma@intel.com>
Signed-off-by: Jaishankar Rajendran <jaishankar.rajendran@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41205>
2026-04-27 15:17:04 +00:00
Yiwei Zhang
0f2a42afcf lvp/android: use common ANB implementations
This has been unblocked by
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40211.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41138>
2026-04-27 14:58:18 +00:00
David Rosca
7b5277ce5c frontends/va: Fix out of bounds write in AV1 decode tile info
For invalid streams tile cols and rows may be higher than 64.
This would overwrite data after the height_in_sbs array, but since
the maximum amount of bytes overwritten is bound by the maximum
supported decode resolution, this can't overwrite any important
fields and thus won't cause any observable issue.
As this can only happen with invalid streams, it still won't decode
correctly with this fixed.
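
One way to keep such a write in bounds is to cap the count at the array size, as in the sketch below. The message does not say how the commit fixes the issue, so this is an assumption; the struct layout, MAX_TILES and the function name are hypothetical, with only the limit of 64 and the height_in_sbs name taken from the message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_TILES 64 /* limit mentioned in the message */

/* Hypothetical layout: "next_field" stands in for whatever follows
 * the array in the real structure. */
struct tile_info {
   uint16_t height_in_sbs[MAX_TILES];
   uint16_t next_field;
};

/* Copies at most MAX_TILES entries and returns how many were stored,
 * so an invalid stream with more rows cannot write past the array. */
static size_t
set_tile_heights(struct tile_info *ti, const uint16_t *heights, size_t rows)
{
   assert(ti != NULL && heights != NULL);

   size_t n = rows < MAX_TILES ? rows : MAX_TILES;
   for (size_t i = 0; i < n; i++)
      ti->height_in_sbs[i] = heights[i];
   return n;
}
```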

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/work_items/15290
Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41016>
2026-04-27 14:29:34 +00:00