In order to assign bind_ops[i].syncs a slice of the sync_ops array,
op_sync_cnt must record the exact number sync operations for that vm_bind
operation, so that &sync_ops[syncop_ptr - op_sync_cnt] will give us the
right start of its slice.
Fixes: 97f6a62f7e ("pan/kmod: Add a backend for panthor")
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41274>
It has been available in the Panthor KMD since 1.5
Fixes: 590ad83b98 ("panfrost: Use pan_image_test_modifier_with_format() to do our modifier check")
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41274>
When using ANDROID_external_format_resolve if a valid
colorAttachmentFormat is returned by
VkAndroidHardwareBufferFormatResolvePropertiesAndroid, the
formatFeatures should include VK_FORMAT_FEATURE_COLOR_ATTACHMENT_BIT so
strict applications can use the buffer as a render target.
Signed-off-by: Allen Ballway <ballway@chromium.org>
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org>
Reviewed-by: Lucas Fryzek <lfryzek@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41142>
Now that lower_noperspective_fs and varying collection are closer
together we can merge nopersp collection in lower_noperspective_fs
without fear of desyncrhonization, making everything also a bit cleaner.
Signed-off-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40924>
Now that we removed a lot of upcoming bugs using time-travel, we can
reorders the passes in postprocess to be more in-line with modern
compilers. We also lift a lot of passes from compile_shader_nir into
postprocess.
Signed-off-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Co-authored-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40924>
The shader will be optimized a few passes later in preprocess, this way
we can have the same pipeline as in Gallium
Signed-off-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40924>
This is needed if we want postprocess to decide IDVS and layout later in
the series
Signed-off-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40924>
Before this commit, all scartch memory was allocated in 16-byte chunks
and indirect references where always lowered into if-else trees. This
patch tries to clean this up a little bit, by using a more compact layout
that is still TLS friendly, allowing indirect accesses and only lowering
them for optimizations and using the newer nir_lower_explicit_io.
The patches should improve performance on some shaders, but lifts a lot
of dust off the compiler uncovering some new bugs. They have been kept
at bay by disabling local memory vectorization.
Signed-off-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40924>
Only caused problems when the VS/FS has more TLS than our internal shaders
that doesn't usually happen but will cause bugs when we start to
compress local memory.
Fixes: 005703e5b5 ("panvk: Move TLS preparation logic to cmd_dispatch_prepare_tls")
Signed-off-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40924>
Reorders the preprocess passes to be more in-line with modern compilers.
Signed-off-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Co-authored-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40924>
This makes panfrost IO lowering similar to panvk and avoids subtle
bugs later in the series.
Signed-off-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40924>
panfrost has float16 point size, handling that precision too allows the
compiler to call lower_point_size later in the compilation pipeline
Signed-off-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40924>
This way the pass does not depend on lower_ssbo anymore
Signed-off-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40924>
Using OpenCL size/alignment requirements we might get some types
with a size bigger than their alignment. This breaks the current TLS
load/stores that expect 16-byte alignment for 16-byte load/stores. This
problem probably hasn't surfaced yet because we reassigned OpenCL scratch
in 16-byte slots, but will break if we compact the layout.
Signed-off-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40924>
A common memory swap operation might be compiled as:
%v1 = LOAD %a1 # L1
%v2 = LOAD %a2 # L2
STORE %v2, %a1 # S1
STORE %v1, %a2 # S2
The current pressure scheduler just records the last load/store
operation for dependencies, thus the dependency chain becomes L2 -> S1
-> S2. The compiler might thus reorder them as L2, S1, L1, S2, i.e
# L1:
%v2 = LOAD %a2 # L2 |
STORE %v2, %a1 # S1 |
%v1 = LOAD %a1 # L1<-
STORE %v1, %a2 # S2
This is incorrect as S1 depends on L1 too. The fix makes all loads also
depend on each other, restricting load reordering. The proper fix that
NAK has is to track all loads and make each store depend on every load,
building a more correct DAG. This doesn't matter as much in panfrost
since all loads are serialized by the scoreboard. We might still want
to implement it for register pressure in the future.
Signed-off-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40924>
We just need to handle it while translating NIR to BIR, the hardware can
do automatic widening to 32-bits.
Signed-off-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41096>
Since inline parameter is the last field of the thread payload, the
backend can always assume they may exist. They won't affect the
position of other payload fields and the register allocator will
reuse any unused space.
In Anv, also update EmitInlineParameter for Task/Mesh/CS to reflect
previous changes in inline parameter setup. Remove/Update some stale
comments since we are here.
Finally, remove the prog_key/prog_data bits that tracked whether inline
data or a push address was needed.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41230>
NVK is only going to use it for `fmul_rtz(frcp(ipa), ipa)` patterns, so
try not too hard to optimize this.
Totals from 10 (0.00% of 1212873) affected shaders:
CodeSize: 34480 -> 34288 (-0.56%); split: -0.60%, +0.05%
Static cycle count: 6225 -> 6132 (-1.49%); split: -1.57%, +0.08%
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41179>
This contains new tests for DGC+multiview which are valid in DX12
but invalid in Vulkan, unless RADV allows support for it. Important
to have coverage for us because it's used for Crimson Desert.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41193>
V3D advertises maxComputeWorkGroupInvocations = 256 but ggml-vulkan
in many cases ignores this limit an creates compute pipelines with
over this limit. Although this is a bug in the application we can
take advantage of nir_lower_workgroup_size and make the application
work.
This issue was causing an assertion failure at nir_to_vir.c:
assert(c->local_invocation_index_bits <= 8);
The solution is lowering the oversized workgroups to a 256-invocation
workgroup loop, like radv and radeonsi are doing on GFX7, by running
nir_lower_workgroup_size(256) for this scenario.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41257>
Currently local shared memory is backed by a BO that is read/written
using the TMU.
ggml-vulkan probes the size of maxComputeSharedMemorySize and rejects
V3DV (falling back to CPU) when the value is below what its larger
compute pipelines request, although in the end the shaders ollama
runs don't actually use shared memory.
32 KB is what ggml-vulkan demands; the value can grow further with no
real per-op cost since shared memory currently goes through the TMU
like any other BO.
V3D OpenGL driver also has 32 KB for SharedMemory.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41257>
`__builtin_ctz` is not well defined for `0`, so provide a default value
of `0` for this case. The other sensible choice would be `64`, but that
does not fit in the 5 bit `divisor_r` field (which is how I noticed this,
we were triggering a run time assert in a debug build).
We should skip `launch_draw` if there are no vertices to draw.
This avoids a crash in some indirect rendering tests on Bifrost.
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41117>
The combination of nir_opt_if and nir_lower_undef_to_zero running inside
the optimization loop could make it to not converge.
This was exercised by ollama running gemma3 compute shaders.
Removing the pass from the optimization loop results in No changes in
shader-db.
Assisted-by: Claude Opus 4.6
Fixes: cbe24a0e9c ("broadcom/compiler: use nir_lower_undef_to_zero")
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41256>