Required for SM6.6 in vkd3d-proton and used in a number of UE5 titles.
From descriptor side R64 images are R32G32_UINT, and to get storage_descriptor
we have to move early-return if format doesn't support rendering after
storage_descriptor setup.
Passes vkd3d-proton test:
test_shader_sm66_64bit_atomics
CTS tests:
dEQP-VK.image.atomic_operations.*.r64*
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39932>
The Vulkan runtime provides the dynamic state infrastructure via
vk_common_CmdSetAttachmentFeedbackLoopEnableEXT(). This builds on the
attachment feedback loop layout support.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40498>
PanVK treats image layouts as no-ops and already disables Forward Pixel
Kill when the same render target is both read and written.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40498>
The LLVM backend is unmaintained. Let's not encourage users to swap out
entire parts of the driver with an unsupported codepath. Enabling this
option is a footgun nowadays anyway, given that it disables many
features and thus may trigger bigger changes in behavior than intended.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40815>
Reviewed-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Martin Roukala <martin.roukala@mupuf.org>
Reviewed-by: Eric Engestrom <eric@igalia.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenz.ca>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40617>
Reviewed-by: Martin Roukala <martin.roukala@mupuf.org>
Reviewed-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Eric Engestrom <eric@igalia.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenz.ca>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40617>
Certain games tend to use rendering patterns that strongly prefer
one mode over the other, and thus we're better off not bothering
with profiling them.
Signed-off-by: Dhruv Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37802>
This introduces a new option that makes autotune optimize for low
preemption latency which is crucial to ensure responsiveness on
systems with GPU-based composition. A large enough draw can entirely
block the compositor from running with draw-level preemption, this can
be mitigated by preferring to use GMEM which breaks up the draw into
smaller pieces and generally has a lower latency for preemption.
As a further mitigation, tiles in GMEM are then divided into smaller
and smaller pieces which lowers the non-preemptible duration. There
are static checks in place to avoid doing this when it would incur a
cost that is too large.
Uses performance counters read during ambles to detect preemption
latency events while rendering in SYSMEM. This approach is superior
to using RBBM draw time thresholds which could be imprecise as only
the average was calculated rather than true maximum draw time.
However, converting the preemption latency performance counter value
from CP ticks to wall clock is based on the average GPU frequency of
the whole period from the start of the RP until the switch-away amble
while the preemption latency stars counting from the request. Thus, if
the GPU frequency shifts rapidly throughout the RP, it may cause the
estimated wall clock time to be inaccurate, but it should be good enough
in the vast majority of cases.
Signed-off-by: Dhruv Mark Collins <mark@igalia.com>
Co-authored-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37802>
Tuning these small renderpasses is difficult due to their high
variability across command buffers and low impact on overall performance
in most cases. This change disables autotuning for renderpasses with 5
or fewer draw calls unless the TUNE_SMALL modifier flag is explicitly
set.
Signed-off-by: Dhruv Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37802>
This algo measures the time taken by each RP as a whole, and uses that
to move a probability distribution of whether to use GMEM or SYSMEM for
that RP. This is done with a delta of 5% per run, and the probability is
clamped to 5% and 95% to avoid getting stuck when conditions change.
Additionally, an "immediate resolve" variant which tries to work off a
single data point in SYSMEM and GMEM, then immediately resolves to the
faster path. This is useful for usage in CI which runs a single frame
multiple times where the performance isn't varying change from frame to
frame.
Signed-off-by: Dhruv Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37802>
Make it possible to trigger a emulated GPU reset by creating a file at
the path of the `LP_CONTEXT_RESET_FILE` env var. All llvmpipe contexts
using the `LOSE_CONTEXT_ON_RESET` strategy that where created before
the that file will start returning `PIPE_UNKNOWN_CONTEXT_RESET` in
`get_device_reset_status()`, while newer contexts - most notably those
created by apps in reaction to the device reset - continue to return
`PIPE_NO_RESET`.
Note that, because we're dealing with files, rtime is used instead of
mtime as the later
Usage:
```
LP_CONTEXT_RESET_FILE=~/llvmpipe_reset LIBGL_ALWAYS_SOFTWARE=1
someapp
touch ~/llvmpipe_reset
```
Signed-off-by: Robert Mader <robert.mader@collabora.com>
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40681>
The extension is implemented in shared Vulkan/WSI code and
not driver specific. The underlying kms driver needs to
support HDR metadata signalling on the drm connector, which
vc4 kms does for VideoCore 5 and later since April 2021.
Successfully tested on RaspberryPi 4/400 with a HDR-10
capable HDMI monitor.
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40696>
These extensions are implemented in shared Vulkan/WSI code and
not driver specific. A Vulkan driver just needs to support
VK_KHR_timeline_semaphore, which v3dv already supports via
emulated timeline semaphores since April 2022.
Successfully tested on RaspberryPi 4/400.
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40696>
Enable the VK_KHR_shader_untyped_pointers extension and the
shaderUntypedPointers feature for Valhall and newer (v9+).
Bifrost (v6/v7) has issues with 8-bit vector loads through untyped
pointers combined with 16-bit storage, so restrict the extension
to architectures where it's fully functional.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40457>
Resizing the buffer isn't possible for per-submit captures and it
doesn't make sense either. This introduces a new envvar called
RADV_CACHE_COUNTERS_BUFFER_SIZE to increase the SPM BO size if needed.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40535>
VK_EXT_non_seamless_cube_map was implemented by pvr previously to fix
some GLES2 CTS failures on Zink (because the driver is not yet ready
to advertise GLES3 on Zink).
As it was thought as a fix, the document bit is missing, although a new
extension is surely advertised.
Add the missing document bit.
Fixes: 71880a2911 ("pvr: support VK_EXT_non_seamless_cube_map")
Signed-off-by: Icenowy Zheng <zhengxingda@iscas.ac.cn>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40699>
Abusing RADV_PERFTEST for experimental features doesn't make real
sense, and I think we should stop doing that.
The existing RADV_PERFTEST options like RADV_PERFTEST=transfer_queue
still exists but they are marked as deprecated, they will be removed
in future Mesa releases.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40646>
Expose float32 atomic exchange support for buffer, shared, and image
operations on all architectures. The existing axchg instruction is
type-agnostic, so no compiler changes are needed. Image atomics are
already lowered to global atomics via nir_lower_image_atomics_to_global.
Also add R32_FLOAT to the STORAGE_IMAGE_ATOMIC format feature flag so
image atomic operations are accepted for r32f images.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40506>
The list in the documentation still doesn’t go higher than v10, and it
isn’t clear from that list of GPU IDs which one actually corresponds to
the newer generations, but at least users can test them.
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39564>
The global BO list for app allocations has been enabled by default
since Mesa 25.3 and we didn't find any blockers, so let's make it the
default for real. Note that vkd3d-proton and Zink always used that
path and DXVK started to use it in August 2025 after requiring BDA.
This removes RADV_DEBUG=nobolist which was added only for debugging
purposes since the global BO list was enabled by default for app
allocations.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40466>
These are already implemented by common code, so there's nothing to be
done here, really.
A few tests fail due to timeouts. But this seems no different than on
other drivers, we just skip less WSI tests than most drivers does. Skip
those for now.
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40502>
Allow adjusting the location of RD dumps and trigger file through the
FD_RD_DUMP_PATH environment variable. When not present, the existing
defaults will be used.
Signed-off-by: Zan Dobersek <zdobersek@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40532>
Enable the extension and feature bits on CSF (v10+). Map the conditional
rendering pipeline stage to all three subqueues for barrier handling.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40452>