Commit graph

7723 commits

Author SHA1 Message Date
Lorenzo Rossi
d2f7b8db9d pan/compiler: Collect nopersp varyings in lower_noperspective_fs
Now that lower_noperspective_fs and varying collection are closer
together we can merge nopersp collection in lower_noperspective_fs
without fear of desyncrhonization, making everything also a bit cleaner.

Signed-off-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40924>
2026-04-30 18:26:13 +00:00
Lorenzo Rossi
dfdb9f1d41 pan/compiler: Sort postprocess
Now that we removed a lot of upcoming bugs using time-travel, we can
reorders the passes in postprocess to be more in-line with modern
compilers.  We also lift a lot of passes from compile_shader_nir into
postprocess.

Signed-off-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Co-authored-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40924>
2026-04-30 18:26:13 +00:00
Lorenzo Rossi
312603b2fa pan/compiler: Rename bifrost_optimize_nir
Signed-off-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40924>
2026-04-30 18:26:12 +00:00
Lorenzo Rossi
6f05b27b9a panvk: Remove pan_optimize_nir call
The shader will be optimized a few passes later in preprocess, this way
we can have the same pipeline as in Gallium

Signed-off-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40924>
2026-04-30 18:26:12 +00:00
Lorenzo Rossi
39f54ddea2 panvk,panfrost: Pass inputs and info to postprocess
This is needed if we want postprocess to decide IDVS and layout later in
the series

Signed-off-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40924>
2026-04-30 18:26:12 +00:00
Lorenzo Rossi
01e6a0555c pan/compiler: Rework scratch memory strategy
Before this commit, all scartch memory was allocated in 16-byte chunks
and indirect references where always lowered into if-else trees.  This
patch tries to clean this up a little bit, by using a more compact layout
that is still TLS friendly, allowing indirect accesses and only lowering
them for optimizations and using the newer nir_lower_explicit_io.

The patches should improve performance on some shaders, but lifts a lot
of dust off the compiler uncovering some new bugs.  They have been kept
at bay by disabling local memory vectorization.

Signed-off-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40924>
2026-04-30 18:26:11 +00:00
Lorenzo Rossi
f0d2ad9840 panvk/jm: Fix tls_size overwrite in indirect draws
Only caused problems when the VS/FS has more TLS than our internal shaders
that doesn't usually happen but will cause bugs when we start to
compress local memory.

Fixes: 005703e5b5 ("panvk: Move TLS preparation logic to cmd_dispatch_prepare_tls")
Signed-off-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40924>
2026-04-30 18:26:11 +00:00
Lorenzo Rossi
768d7cb149 pan/compiler: Sort preprocess
Reorders the preprocess passes to be more in-line with modern compilers.

Signed-off-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Co-authored-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40924>
2026-04-30 18:26:11 +00:00
Lorenzo Rossi
dd96a1514b pan/compiler: Handle ssbo_atomics in lower_vs_atomics
This way the pass does not depend on lower_ssbo anymore

Signed-off-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40924>
2026-04-30 18:26:10 +00:00
Lorenzo Rossi
408d03291d pan/compiler: Lower unaligned scratch memory accesses
Using OpenCL size/alignment requirements we might get some types
with a size bigger than their alignment.  This breaks the current TLS
load/stores that expect 16-byte alignment for 16-byte load/stores. This
problem probably hasn't surfaced yet because we reassigned OpenCL scratch
in 16-byte slots, but will break if we compact the layout.

Signed-off-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40924>
2026-04-30 18:26:10 +00:00
Lorenzo Rossi
ac23e3c6c5 pan/compiler: Fix WaRaR hazard in pressure scheduler
A common memory swap operation might be compiled as:
%v1 = LOAD %a1  # L1
%v2 = LOAD %a2  # L2
STORE %v2, %a1  # S1
STORE %v1, %a2  # S2

The current pressure scheduler just records the last load/store
operation for dependencies, thus the dependency chain becomes L2 -> S1
-> S2.  The compiler might thus reorder them as L2, S1, L1, S2, i.e
                #    L1:
%v2 = LOAD %a2  # L2 |
STORE %v2, %a1  # S1 |
%v1 = LOAD %a1  # L1<-
STORE %v1, %a2  # S2

This is incorrect as S1 depends on L1 too.  The fix makes all loads also
depend on each other, restricting load reordering.  The proper fix that
NAK has is to track all loads and make each store depend on every load,
building a more correct DAG.  This doesn't matter as much in panfrost
since all loads are serialized by the scoreboard.  We might still want
to implement it for register pressure in the future.

Signed-off-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40924>
2026-04-30 18:26:09 +00:00
Lorenzo Rossi
abde403a7c pan/compiler: Allow 16-bit alpha for atest_pan
We just need to handle it while translating NIR to BIR, the hardware can
do automatic widening to 32-bits.

Signed-off-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41096>
2026-04-30 17:33:09 +00:00
Valentine Burley
ca92f8697e panfrost/ci: Update kernel to pick up ZSTD support for ZRAM
No other changes.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/work_items/15342
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Acked-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41217>
2026-04-29 07:24:18 +00:00
Olivia Lee
72e0eda260 pan/bi: fix memory access alignment
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Memory accesses need to be aligned up to the next power of two of the
full access size. Component count and bit-size don't matter to the
hardware, only the total size.

shader-db results are pretty much what you would expect, there are a few
shaders that have increased LS instructions as a result of splitting
accesses to satisfy alignment requirements that were previously ignored.
The one surprising thing is that there are several shaders that have
reduced uniform usage. Looking at some of these individually, what
happened is that splitting UBO loads early allowed the compiler to
eliminate loads from unused ranges of the access.

total instrs in shared programs: 719166 -> 719174 (<.01%)
instrs in affected programs: 2355 -> 2363 (0.34%)
helped: 4
HURT: 6
helped stats (abs) min: 1.0 max: 9.0 x̄: 3.00 x̃: 1
helped stats (rel) min: 0.36% max: 6.52% x̄: 1.99% x̃: 0.54%
HURT stats (abs)   min: 1.0 max: 4.0 x̄: 3.33 x̃: 4
HURT stats (rel)   min: 0.65% max: 2.13% x̄: 1.38% x̃: 1.48%
95% mean confidence interval for instrs value: -2.14 3.74
95% mean confidence interval for instrs %-change: -1.76% 1.82%
Inconclusive result (value mean confidence interval includes 0).

total cycles in shared programs: 30210.83 -> 30218.81 (0.03%)
cycles in affected programs: 50 -> 57.99 (15.97%)
helped: 2
HURT: 6
helped stats (abs) min: 0.0078129999999999589 max: 0.070312000000000041 x̄: 0.04 x̃: 0
helped stats (rel) min: 1.10% max: 10.23% x̄: 5.66% x̃: 5.66%
HURT stats (abs)   min: 0.03125 max: 5.0 x̄: 1.34 x̃: 1
HURT stats (rel)   min: 2.38% max: 25.00% x̄: 13.05% x̃: 14.26%
95% mean confidence interval for cycles value: -0.42 2.41
95% mean confidence interval for cycles %-change: -1.74% 18.49%
Inconclusive result (value mean confidence interval includes 0).

total cvt in shared programs: 2385.91 -> 2385.91 (<.01%)
cvt in affected programs: 11.14 -> 11.14 (<.01%)
helped: 5
HURT: 4
helped stats (abs) min: 0.0078119999999999301 max: 0.070312000000000041 x̄: 0.02 x̃: 0
helped stats (rel) min: 0.27% max: 10.23% x̄: 2.61% x̃: 0.82%
HURT stats (abs)   min: 0.01562600000000014 max: 0.03125 x̄: 0.03 x̃: 0
HURT stats (rel)   min: 1.31% max: 2.75% x̄: 2.21% x̃: 2.40%
95% mean confidence interval for cvt value: -0.02 0.02
95% mean confidence interval for cvt %-change: -3.51% 2.58%
Inconclusive result (value mean confidence interval includes 0).

total ls in shared programs: 25871 -> 25879 (0.03%)
ls in affected programs: 46 -> 54 (17.39%)
helped: 0
HURT: 4
HURT stats (abs)   min: 1.0 max: 5.0 x̄: 2.00 x̃: 1
HURT stats (rel)   min: 10.00% max: 25.00% x̄: 18.38% x̃: 19.26%
95% mean confidence interval for ls value: -1.18 5.18
95% mean confidence interval for ls %-change: 8.46% 28.30%
Inconclusive result (value mean confidence interval includes 0).

total code size in shared programs: 6302848 -> 6302976 (<.01%)
code size in affected programs: 1536 -> 1664 (8.33%)
helped: 0
HURT: 1

total registers used in shared programs: 117324 -> 117329 (<.01%)
registers used in affected programs: 45 -> 50 (11.11%)
helped: 1
HURT: 2
helped stats (abs) min: 1.0 max: 1.0 x̄: 1.00 x̃: 1
helped stats (rel) min: 6.25% max: 6.25% x̄: 6.25% x̃: 6.25%
HURT stats (abs)   min: 2.0 max: 4.0 x̄: 3.00 x̃: 3
HURT stats (rel)   min: 12.50% max: 30.77% x̄: 21.63% x̃: 21.63%

total uniforms used in shared programs: 78538 -> 78274 (-0.34%)
uniforms used in affected programs: 2688 -> 2424 (-9.82%)
helped: 104
HURT: 4
helped stats (abs) min: 1.0 max: 18.0 x̄: 2.65 x̃: 2
helped stats (rel) min: 1.96% max: 54.55% x̄: 12.78% x̃: 11.11%
HURT stats (abs)   min: 1.0 max: 5.0 x̄: 3.00 x̃: 3
HURT stats (rel)   min: 3.70% max: 16.13% x̄: 9.92% x̃: 9.92%
95% mean confidence interval for uniforms used value: -3.01 -1.88
95% mean confidence interval for uniforms used %-change: -14.15% -9.74%
Uniforms used are helped.


Total CPU time (seconds): 73.26 -> 74.48 (1.67%)

Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Fixes: 2f2738dc90 (pan/bi: Use nir_lower_mem_access_bit_sizes)
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41033>
2026-04-29 01:26:47 +00:00
Faith Ekstrand
11399b15e0 pan/bi: Improve swizzle propagation
Instead of only propagating when we have a full word, always attempt to
find a propagation, only considering the bytes actually consumed by the
instruction.  This is especially important for v2i8 sources.

Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41247>
2026-04-28 20:27:16 +00:00
Yiwei Zhang
ad857ba7cd panvk: drop panvk_android_create_deferred_image
No need a separate helper anymore since
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41145.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41216>
2026-04-28 17:46:48 +00:00
Christian Gmeiner
7d59c62fde panvk: Wire up VK_EXT_conservative_rasterization on v11+
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Mali >= v11 has a Conservative Rast Mode field in DCD Flags 0 with
values Disabled and Over Estimate. Wire it to vk_runtime's
rasterization state and expose the extension on PAN_ARCH >= 11, with
caps restricted to overestimate only — HW has no underestimate value
and no overestimation-size granularity.

On v11-v13, degenerate triangles produce a wrong fragment w when
overestimate is enabled, so cull_zero_area is forced on alongside
the mode bit and degenerateTrianglesRasterized is reported as false.

Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41189>
2026-04-28 09:34:28 +02:00
Erik Faye-Lund
4f2de63a27 pan/ci: add a flake from nightly
This failed here:
https://gitlab.freedesktop.org/mesa/mesa/-/jobs/98061409

Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41121>
2026-04-27 09:27:02 +00:00
Faith Ekstrand
9c8e8ed655 panvk/csf: Emit INDEX_BUFFER[_SIZE] even for non-indexed draws
The index buffer and index buffer size don't affect whether or not we're
actually doing indexed rendering so we should just emit them whenever
they change.  Otherwise, if someone sets an index buffer and then does a
non-indexed draw and then an indexed draw, the first draw will clear the
dirty bits without setting the index buffer registers and the second
draw won't know to re-emit them.

Fixes: 5544d39f44 ("panvk: Add a CSF backend for panvk_queue/cmd_buffer")
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Marc Alcala Prieto <marc.alcalaprieto@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40997>
2026-04-27 08:40:43 +00:00
Christian Gmeiner
3d7d2115f8 panvk: Implement vkCmdFillBuffer with panlib kernels
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Replace the vk_meta_fill_buffer call with direct panlib precomp
dispatches: a KERNEL(32) uint4 bulk path for 16-byte-aligned fills and a
KERNEL(32) uint32 path otherwise, each with a KERNEL(1) scalar tail for
sub-workgroup remainders.

gpu-ratemeter vk.bufbw on Mali-G610 MC4 shows a 1.15-1.18x median
speedup across alignment classes and roughly 5x on fills <= 512 B,
thanks to the removed pipeline bind / descriptor-set setup that
vk_meta_fill_buffer pays per call.

Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41079>
2026-04-27 08:19:20 +00:00
Yiwei Zhang
0b99d1db0b panvk: adopt common ANB helpers
Below are adopted:
- vk_android_import_anb
- vk_android_import_anb_memory

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41145>
2026-04-24 16:25:36 +00:00
Christian Gmeiner
aed60946a1 panvk: Advertise VK_EXT_dynamic_rendering_unused_attachments
The Vulkan runtime and panvk already handle unused attachments
correctly. Enable the extension and feature flags.

Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40920>
2026-04-24 07:09:33 +00:00
Valentine Burley
8d4fb52919 panvk: Use vk_android deferred image helper
Switch to using the new helper.

Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40635>
2026-04-23 21:21:31 +00:00
Christoph Pillmayer
b7f9974f3e pan/bi: Fix format in bi_repair_ssa
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40969>
2026-04-23 15:49:48 +00:00
Christoph Pillmayer
fcfc580f67 pan/bi: Fix source swizzle in bi_repair_ssa
Repairing SSA was creating invalid PHI nodes with source swizzles !=
BI_SWIZZLE_H01. PHI sources can't have non-identity swizzles.

In most cases the repair logic only replaces sources, in which case the
swizzle is taken from the old source that is getting replaced. However,
in add_phi_operands there is no old source because the phi is new, and
so the result from resolve_read is assigned directly. This falsely
carries over the destination swizzle to the source.

Since it never makes sense for resolve_read to carry over the swizzle
from the instruction writing the value, we can make it so that
resolve_read always returns the identity swizzle on indices.
resolve_read returns one of:
- An index stored by record_write
- An index created by bi_temp_like
- The result of a recursive resolve_read call
bi_temp_like already correctly sets the swizzle to H01. Setting it in
record_write leads to both base cases returning the desired swizzle.

Fixes: dd94d183 ("pan/bi: Fixup bi_repair_ssa.c for bi")
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40969>
2026-04-23 15:49:47 +00:00
Lars-Ivar Hesselberg Simonsen
82592433e6 panvk: Fix debug flag overlap
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
PANVK_DEBUG_HSR_PREPASS and PANVK_DEBUG_NO_EXTENDED_VA_RANGE have the
same value, meaning they both get toggled when one is.

This commit moves PANVK_DEBUG_HSR_PREPASS to the following value.

Fixes: 2d9be41706 ("panvk/v13: Support HSR Prepass")
Reviewed-by: John Anthony <john.anthony@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41106>
2026-04-22 15:14:23 +00:00
Lars-Ivar Hesselberg Simonsen
98c298cf4d pan/va/disasm: Align indentation
The disassembly file had a lot of inconsitencies in indentation, so
align on the standard IndentWidth: 3

Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41062>
2026-04-22 08:31:01 +00:00
Lars-Ivar Hesselberg Simonsen
17f1a2c184 pan/va/disasm: Align FAU printing
The current implementation prints FAU entries as 32-bit entries. While
this works, it does not align with the DDK.

Rather than treating FAU as a set of 32-bit entries, treat is as 64-bit
entries that can be split in two words.

This aligns with the DDK and has allows for differentiating 32-bit and
64-bit reads based on whether a word modifier is used.

Finally, add entry values to FAU printing to easily look up specific
reads.

For example:

Vertex FAU @ffd93950:
  43000000 43000000
  3F800000 43000000
  43000000 00000000
  C7000000 47000000
  00000001 00000000

FMAX.f32 r3, r3^, u6
FMIN.f32 r3, r3^, u7

vs

Vertex FAU @ffd93950:
u0  43000000 43000000
u1  3F800000 43000000
u2  43000000 00000000
u3  C7000000 47000000
u4  00000001 00000000

FMAX.f32 r3, r3^, u3.w0
FMIN.f32 r3, r3^, u3.w1

Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41062>
2026-04-22 08:31:01 +00:00
Lars-Ivar Hesselberg Simonsen
829eafa076 pan/va/disasm: Print 64 bit src/dest regs as reg pairs
This makes it clear that both registers are read/written, and aligns
with DDK disassembly.

For example:

STORE.i128.istream.slot2.reconverge @r0:r1:r2:r3, r4^, offset:0
vs
STORE.i128.istream.slot2.reconverge @r0:r1:r2:r3, [r4^:r5^], offset:0

Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41062>
2026-04-22 08:31:01 +00:00
Lars-Ivar Hesselberg Simonsen
9f049032be pan/genxml: Print shader hex in trace for Valhall
Enable verbose disassembly for Valhall in traces, which adds hex values
to shader printing. Useful for debugging.

For example:

Shader 0xffffbe3ec000 (GPU VA ffdd3000) sz 16384
   LD_ATTR_IMM.v4.f32.slot0.wait0 @r0:r1:r2:r3, r60^, r61^, index:0x0, table:0x0
   FRCP.f32 r3, r3^
   FMAX.f32 r3, r3^, u6

vs

Shader 0xffffa8bf7000 (GPU VA ffdd3000) sz 16384
7c 7d 00 32 08 80 66 08    LD_ATTR_IMM.v4.f32.slot0.wait0 @r0:r1:r2:r3, r60^, r61^, index:0x0, table:0x0
43 00 00 00 00 c3 9c 00    FRCP.f32 r3, r3^
43 86 03 00 00 c3 a4 00    FMAX.f32 r3, r3^, u6

Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41062>
2026-04-22 08:31:00 +00:00
Samuel Pitoiset
9d17a7bdb4 spirv,treewide: rework specialization constant
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
With SPV_KHR_constant_data, it's allowed to specialize array of
constants.

RustiCL changes are from Karol Herbst <kherbst@redhat.com>.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41046>
2026-04-22 06:57:55 +00:00
Erik Faye-Lund
c8ae72f51d panvk: do not enable extension without required feature
The Vulkan spec states that if VK_KHR_shader_clock is supported,
shaderSubgroupClock is a required feature. So let's not enable that
extension unless we can...

Fixes: e9c2c32409 ("panvk: enable VK_KHR_shader_clock")
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Ashley Smith <ashley.smith@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40944>
2026-04-20 17:36:20 +00:00
Erik Faye-Lund
8cb89853b8 panvk: do not enable extension without required feature
The Vulkan spec states that if VK_ARM_shader_core_builtins is supported,
shaderCoreBuiltins is a required feature. So let's not enable that
extension unless we can...

Fixes: dff1d91c64 ("panvk: Enable VK_ARM_shader_core_builtins")
Reviewed-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40944>
2026-04-20 17:36:20 +00:00
Erik Faye-Lund
36983b50fe panvk: increase maxBufferSize on v11 and later
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
The HW supports larger buffer-sizes on v11 and later, so let's bump
this up.

Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40999>
2026-04-20 16:00:56 +00:00
Erik Faye-Lund
bd2646482b panvk: increase maxResourceSize on v11 and later
The HW supports larger resource-sizes on v11 and later, so let's bump
this up. But since we have a knob to limit the usable VA space, we need
to take that into account here as well.

Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40999>
2026-04-20 16:00:56 +00:00
Erik Faye-Lund
f3d3102143 panvk: do not artificially limit image dimensions
While we used to need this, we no longer do, thanks to handling
maxResourceSize. From the Vulkan spec:

> If the size of the resultant image would exceed maxResourceSize,
> then vkCreateImage must fail and return VK_ERROR_OUT_OF_DEVICE_MEMORY.
> This failure may occur even when all image creation parameters satisfy
> their valid usage requirements.

Handling of this was added in 86068ad1ee ("panvk: implement sparse
resources"), so we no longer need to make sure of this when reporting
the limits.

The hardware-fields for these are 16 bits, so let's allow the full range
for all of these.

This is effectively a revert of e25a91d919 ("panvk: Lower
maxImageDimension{2D,3D,Cube} to match the HW caps").

Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40999>
2026-04-20 16:00:56 +00:00
Erik Faye-Lund
d76e4f6054 pan/lib: validate data_size_B in drivers
In order to be able to properly check for maxResourceSize on Vulkan, we
need to be able to report the size even for resources that overflow that
limit. Otherwise we end up failing to find a usable modifier rather than
properly report the problem to the application. This means we need to move
the check out of the mod-handler.

There's no need to validate the slice-stride. The reason is a little bit
complicated, but we have two possible cases:

1. V10 and before: the image-size and the slice-stride are both limited
   to UINT32_MAX. Since the image-size is always at least as large as the
   slice-stride, it's enough to check the image-stride.
2. V11 and later: 37 bits is large enough to store any valid
   slice-stride. The only way we could blow this one up, would be to
   pass out-of-range width or height, which is already either validated
   by higher-level logic (gallium) or UB (vulkan). This is important,
   because we don't have another mandate to reject large resources on
   Vulkan; we can only reject due to maxResourceSize, not an individual
   plane.

So let's move this out to the call-site. We don't need to do anything
for PanVK, becuase it already checks for maxResourceSize.

To keep the Gallium and Vulkan driver as similar as reasonably possible,
check against the whole resource even in Gallium, where we could have
gotten away with checking a plane at the time instead.

Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40999>
2026-04-20 16:00:56 +00:00
Erik Faye-Lund
57a80ff78c pan/lib: emit high bits of buffer-size
We can't expose large texel-buffers if we don't emit the high bits.
Whoopsie!

Fixes: 4db7958edc ("pan/bi: Change texel buffer limits")
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40999>
2026-04-20 16:00:56 +00:00
Erik Faye-Lund
69b8372fbf pan/lib: fix up afbc and linear layout
A few cases of UINT32_MAX were missed, whoops.

Fixes: c2c91e78fd ("pan/layout: Allow bigger size/surface stride on v12+")
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40999>
2026-04-20 16:00:55 +00:00
Christian Gmeiner
917c3dc77a panvk: Advertise VK_EXT_shader_uniform_buffer_unsized_array
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
The extension permits a SPIR-V OpTypeRuntimeArray as the trailing
member of a UBO block.

panvk's compiler path handles this correctly without changes: UBO
access goes through nir_lower_explicit_io with address formats that
carry no compile-time size (bounds are enforced by the hardware UBO
descriptor at runtime), so a runtime array inside a UBO is
indistinguishable from any other dynamically-indexed UBO access.

Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40963>
2026-04-20 12:10:19 +02:00
Erik Faye-Lund
ebe4a56650 panvk: use perf-trilinear when doing anisotropic sampling
This should be faster, and matches what the DDK does.

Reviewed-by: Marc Alcala Prieto <marc.alcalaprieto@arm.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40869>
2026-04-17 12:52:17 +00:00
Marc Alcala Prieto
a073fb193e pan/genxml: Add performance-trilinear enum values
The HW supports these, so let's define the enum values.

Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40869>
2026-04-17 12:52:17 +00:00
Ryan Zhang
62e7120384 panvk: add VK_IMAGE_LAYOUT_DEPTH_READ_ONLY_OPTIMAL to host copy layouts
Add the missing layout which do not need implemented anything in
mali gpu.

Fixed: dEQP-VK.image.host_image_copy.properties.properties
unifiedImageLayouts feature is supported, but layout
VK_IMAGE_LAYOUT_DEPTH_READ_ONLY_OPTIMAL was not included in
VkPhysicalDeviceHostImageCopyProperties::pCopySrcLayouts.

Fixes: 1cd61ee ("panvk: implement VK_EXT_host_image_copy for linear color images")

Signed-off-by: Ryan Zhang <ryan.zhang@nxp.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40899>
2026-04-17 10:19:44 +00:00
Erik Faye-Lund
f137207108 panvk: drop out-of-date TODO
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
We already did this, so let's drop this TODO.

Fixes: d36e6af329 ("panvk: Bump the max image size on v11+")
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40990>
2026-04-16 11:21:48 +00:00
Christian Gmeiner
713cecb1df panvk: Advertise VK_EXT_rgba10x6_formats
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Map X6R10X6G10X6B10X6A10_UNORM to the native R10X6G10X6B10X6A10X6_UNORM
HW format on PAN_ARCH >= 11 where it is supported.

Enable the extension with formatRgba10x6WithoutYCbCrSampler in the
physical device, allowing VK_FORMAT_R10X6G10X6B10X6A10X6_UNORM_4PACK16
to be used as a regular color format without YCbCr sampler conversion.

Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40653>
2026-04-15 12:16:53 +00:00
Lorenzo Rossi
7ccca9f972 pan/compiler: Document compilation pipeline expectations
Signed-off-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40844>
2026-04-15 10:32:19 +00:00
Lorenzo Rossi
43ba475d4c panfrost,panvk: Move lower_texture_early inside preproc
Signed-off-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40844>
2026-04-15 10:32:19 +00:00
Lorenzo Rossi
e24228e327 panfrost,panvk: Move lower_texture_late inside postproc
Signed-off-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40844>
2026-04-15 10:32:19 +00:00
Lorenzo Rossi
eafc822dbd panfrost,panvk: Move postprocess near shader_compile
Ideally there should be only sysval lowering in the middle.

Signed-off-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40844>
2026-04-15 10:32:18 +00:00
Lorenzo Rossi
83fd45aa5a pan/compiler: Fix noperspective int varyings
Ints and floats do not need to match between VS and FS, some crazy
shaders might write an uint from the VS and read a noperspective float
from the FS.  There will be new tests in the conformance tests that
check that too shortly.

Is this a performance regression? yes.
Can we fix this easily? No, we'll need dynamic prolog/epilog linking.

Since maybe_noperspective is almost useless after this fix, the whole
logic has been removed

Signed-off-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40844>
2026-04-15 10:32:18 +00:00