This will signal shaders that require concurrent workgroup dispatch,
until we get a proper Vulkan extension.
Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Danylo Piliaiev <danylo.piliaiev@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41562>
It seems we weren't actually using the opcode, but be consistent with
the other place we call OpExtInst handlers.
Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Danylo Piliaiev <danylo.piliaiev@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41562>
According to the SPIR-V spec OpExtInst cannot appear before types,
constants, and global variable declarations. We were handling it anyway,
which is wrong.
Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Danylo Piliaiev <danylo.piliaiev@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41562>
This encapsulates the forward progress guarantee required by PRAGMATA,
the so-called "occupancy bounded execution" over workgroups. On
Adreno we need to be aware of this and compile the shader differently.
There isn't yet a Vulkan extension for this, so we will set this via a
hack in coordination with vkd3d-proton.
Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Danylo Piliaiev <danylo.piliaiev@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41562>
Convert the two nightly panfrost trace replay jobs to @anholt's new GPU
trace snapshot comparison tool.
This allows running a few traces on t860 that couldn't be replayed
before.
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42018>
With the trace plumbing in place, fill in the wrappers from
gl_and_es_API.xml so the trace tracks new entrypoints.
Generated-by: Claude
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41147>
Reintroduce VERBOSE_API as a bit that will cause a parallel trace
dispatch table to be installed at context create. The table itself is
populated in a follow-up commit. This commit only wires the TLS-publish
indirection via _mesa_set_dispatch(..).
With Trace NULL (the common case), the helper is equivalent to a plain
_mesa_glapi_set_dispatch — no behaviour change yet.
Assisted-by: Claude
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41147>
This covers some drivers which expose KHR_display and EXT_present_timing.
Based on Emma Anholt's work from 2025, rebased on current Mesa 26.2-devel,
tiny compile fixes and docs/features updates by Mario Kleiner.
See MR 38472 for reference of Emma's work, based on Keith's work.
Tested locally on AMD Polaris for radv, Intel Kabylake for anv, and on
Mesa CI's VK-CTS VK_GOOGLE_display_timing test case for AMD radv,
Intel anv, Qualcomm Adreno tu.
Original code of Emma is
Reviewed-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Update of docs/features.txt + new_features.txt updates is
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Reviewed-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41168>
This adds the common plumbing and support for the VK_GOOGLE_display_timing
extension. A followup commit will enable it for KHR_display. It should
also optionally work for suitable other backends like Wayland and X11
on suitable Wayland and X11 servers, if those servers and backends
mostly support VK_EXT_present_timing (minus the relative timing support,
which is not needed for this extension to work). However, fully conformant
use on Wayland or X11 is not possible, as the extension lacks the ability
to report per-VkSurfaceKHR capabilities wrt. timing. Therefore the extension
should only be enabled for Wayland or X11 via explicit opt-in, not by
default.
The extension provides two things:
1) Detailed information about when frames are displayed, including
slack time between GPU execution and display frame.
2) Absolute time control over swapchain queue processing. This allows
the application to request frames be displayed at specific
absolute times, using the same timebase as that used in 1).
It is implemented on top of the VK_EXT_present_timing extension
infrastructure.
This code is inspired by Emma Anholt's work from late 2025, which itself
is based on Keith Packard's original work from 2018. Only a few lines of
their code are left though after an almost complete rewrite on top of
EXT_present_timing. Specifically, calculation of .earliestPresentTime
and .presentMargin in fixed refresh rate (FRR) mode is based on Keith's
original math, and the followup commit for driver enable is a modified
version of Emma's commit.
See MR 38472 for reference of Emma's work, based on Keith's work.
The final implementation as a whole is so far successfully tested on top
of an AMD Polaris GPU (radv), an Intel Kabylake GPU (anv), and Mesa CI
for direct display mode on AMD radv, Intel anv, and Qualcomm Adreno turnip.
Both VK_EXT_present_timing and VK_GOOGLE_display_timing can be enabled
at the same time on a VkDevice, but only one of the extensions can be
used on a given swapchain for that device. If both extensions are enabled
on a device and VK_EXT_present_timing is requested on some swapchains, it
will be used on those swapchains, whereas on all other swapchains the
VK_GOOGLE_display_timing will be used.
On drivers which don't support queue timestamps, reported values for
earliestPresentTime are identical to actualPresentTime, and presentMargin
is reported as zero, which is a reasonable fallback behaviour. Currently
drivers with this limitation would be pvr, panvk and kk.
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Reviewed-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41168>
This is a tiny simplification which should make no practical
difference, except for slightly simpler enablement of
VK_GOOGLE_display_timing in a followup commit, and dropping one
line of code.
The time_domain can always be assigned, even if present_timing
is not enabled (and neither is GOOGLE_display_timing), because
1. The field isn't used if none of these extensions is enabled.
2. The field would default to a valid initial value of zero ==
VK_TIME_DOMAIN_DEVICE_KHR anyway.
3. Even if wp_presentation_feedback is unavailable, and therefore
presentation_clock_id has an "unknown clock" value of -1, the
mapping through clock_id_to_vk_time_domain(-1) would again map
to VK_TIME_DOMAIN_DEVICE_KHR.
In other words, with or without the dropped if statement, the field
gets a nominally valid value, and a value that does not get used
can do no harm.
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Reviewed-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41168>
wsi_swapchain_present_timing_sample_query_pool() queries the timestamp
queue_done_time, corresponding to VK_EXT_present_timing's present stage
VK_PRESENT_STAGE_QUEUE_OPERATIONS_END_BIT_EXT.
The time domain of returned timestamps depends on the number of
available bits for queue timestamps. The same timestamp is needed
as input to calculate timestamps and headroom for some elements
returned by VK_GOOGLE_display_timing, but that extension always
requires timestamps in the host time domain. To allow use of this
function also for a future implementation of GOOGLE_display_timing
in a followup commit, add a flag that asks to always return queue
timestamps in the host time domain.
The flag is set to false to keep current behaviour for use by
VK_EXT_present_timing.
Also clamp a returned queue_done_time in host time domain to the
provided upper_bound, as useful for upcoming GOOGLE_display_timing.
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Reviewed-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41168>
The spec allows requesting a present at a specific target time or duration
without actually storing + querying any present records about completion
time. Iow. it allows VkPresentTimingInfoEXT.presentStageQueries == 0.
In this case, skip allocation and processing of a timing history record,
but still assign a VkPresentTimingInfoEXT.targetTime for timed present.
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Fixes: 47d69664d8 ("vulkan/wsi: Add common infrastructure for EXT_present_timing.")
Reviewed-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41168>
- Queueing a present with VkPresentTimingsInfoEXT in the .pNext chain of
VkPresentInfoKHR, but VkPresentTimingsInfoEXT.pTimingInfos == NULL is
allowed and must not crash, just no-op.
- VkPresentTimingInfoEXT.targetTime == 0 means to ignore targetTime and
to simply present as soon as possible. This is achieved by setting
info->targetTime == 0 ==> target_time = 0. Make sure target_time stays
also 0 if targetTimeDomainPresentStage is set to
VK_PRESENT_STAGE_QUEUE_OPERATIONS_END_BIT_EXT, ie. skip the device->cpu
conversion via wsi_swapchain_present_convert_device_to_cpu(), as that
might map a zero info->targetTime device time to a non zero cpu
target_time.
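The second point can be sketched roughly as follows. This is only an
illustration: resolve_target_time() and device_to_cpu_sketch() are
hypothetical names, and the fixed offset stands in for whatever
wsi_swapchain_present_convert_device_to_cpu() really computes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the device->cpu conversion. The fixed offset is
 * an assumption for illustration; it shows why converting a zero device
 * time could yield a non-zero cpu time. */
static uint64_t
device_to_cpu_sketch(uint64_t device_time)
{
   /* Zero means "no target time" and must be filtered out by the caller. */
   assert(device_time != 0);
   return device_time + 123456789ull;
}

/* Resolve the cpu-domain target time for a present request. */
static uint64_t
resolve_target_time(uint64_t info_target_time, bool device_domain)
{
   /* targetTime == 0: present as soon as possible, skip any conversion. */
   if (info_target_time == 0)
      return 0;

   return device_domain ? device_to_cpu_sketch(info_target_time)
                        : info_target_time;
}
```

With this shape, a zero targetTime stays zero regardless of the requested
time domain.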
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Fixes: 47d69664d8 ("vulkan/wsi: Add common infrastructure for EXT_present_timing.")
Reviewed-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41168>
Some hw + kms driver combos do not support vblank related functions
at all, ie. no drmCrtc[Get/Queue]Sequence() ioctl, no crtc sequence
events, no vblank of pageflip completion reported in pageflip events.
Most notable among the Vulkan drivers that support present_timing is
Asahi Linux on Apple Silicon Macs, which has no such support: only
pageflip events with a valid flip timestamp are supported.
To deal with this, we detect lack of vblank support and instead
use the current "vrr timing" path, which doesn't use vblanks, but
absolute time and timed waits. This also required a slight restructuring
of the setup logic.
Also fix semantics of requested relative timed presents via
VK_PRESENT_TIMING_INFO_PRESENT_AT_RELATIVE_TIME_BIT_EXT. The
spec states that the given target time should be relative to
the most recently presented image on a swapchain, and that if
no such image was presented yet (during the first present on
a swapchain), the relative target present time should be ignored.
Take care of this by tracking vblank count and time of the most
recent completed swapchain present separately from the most recent
known vblank count and time of the connector. Choose the swapchain's
most recent present vblank data as baseline for relative timed
presents, to optimally implement spec semantics, but the connector's
vblank data for absolute timed presents to minimize rounding errors
and drift when converting between time and vblank cycle counts.
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Fixes: 5e2814c8a4 ("wsi/display: Implement present timing on KHR_display.")
Reviewed-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41168>
Both local testing and the Mesa CI's VK-CTS VK_GOOGLE_display_timing
test cases show some oddity of amdgpu-kms driven AMD gpu's wrt.
VK_EXT_present_timing and upcoming VK_GOOGLE_display_timing:
The very first present (atomic commit / pageflip) after a full
modeset commit which turns the associated output / connector from
fully off to on (powering up the display hw) will not send a
regular pageflip completion event, but instead send a pageflip
completion event after display hw programming is completed, with
all-zero vblank sequence count and present timestamp. This would
cause invalid timestamps for this very first present reported to
clients, and trips up the VK-CTS VK_GOOGLE_display_timing conformance
tests, because the first present is signalled as completed before it
was even queued. This failure can be observed with AMD gpu's in the
CI, but not with Intel or Qualcomm gpu CI, where CTS is successful.
Note this quirk doesn't happen for regular modesets on an already
running output, ie. one with at least one active hw plane. It does
happen for the CTS, as it seems to start from a powered off output.
Work around this AMD quirk:
1. Detect a pageflip event with all zero frame count and timestamp.
2. Try to query the count and timestamp of the most recent vblank,
as a likely good substitute for the "completed" pageflip, given
that pageflip and vblank counts and timestamps must always match
for the vblank of actual flip completion.
3. If the query fails or also reports nonsensical values, e.g.,
completed before queued, fall back to current system time as an
acceptable approximation. Note that during my local testing on AMD Polaris11
with DCE-11.2 display engine this 3rd case was not ever observed,
and 2 did a good job. This is just a fallback for the fallback.
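The three steps can be sketched as below. This is a simplified model under
stated assumptions, not the actual handler: struct flip_stamp and
fixup_flip_stamp() are hypothetical names, and the vblank query result and
current time are passed in as plain parameters instead of being obtained
from DRM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical pair of vblank sequence count and timestamp. */
struct flip_stamp {
   uint64_t seq;
   uint64_t nsec;
};

static struct flip_stamp
fixup_flip_stamp(struct flip_stamp event,
                 bool vblank_query_ok, struct flip_stamp vblank,
                 uint64_t queued_nsec, uint64_t now_nsec)
{
   /* The flip cannot have been queued in the future. */
   assert(now_nsec >= queued_nsec);

   /* 1. A regular event carries non-zero data and is used as is. */
   if (event.seq != 0 || event.nsec != 0)
      return event;

   /* 2. Substitute the most recent vblank, unless the query failed or
    *    reports completion before the flip was even queued. */
   if (vblank_query_ok && vblank.nsec >= queued_nsec)
      return vblank;

   /* 3. Fallback for the fallback: current system time. The sequence
    *    count is left at zero here, since the sketch has no better value. */
   struct flip_stamp fallback = { .seq = 0, .nsec = now_nsec };
   return fallback;
}
```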
For reference, after digging through lots of amdgpu DC Linux source
code: the code that decides whether a regular pageflip event is
dispatched from the pageflip completion interrupt handler can be
found by searching for the call site of the function
prepare_flip_isr(). The fallback code for the special case "full modeset
to power on the display engine and skip regular pageflip event" is
at the call site of the function drm_send_event_locked().
Successfully tested on AMD Polaris 11, DCE 11.2 display engine, and
also by Mesa CI's VK-CTS VK_GOOGLE_display_timing test cases for
direct display mode on AMD gpu's.
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Fixes: 5e2814c8a4 ("wsi/display: Implement present timing on KHR_display.")
Reviewed-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41168>
This improves reliability of VK_EXT_present_timing on wsi/display
and should be backported to Mesa 26.1-rc.
When latching connector->last_nsec timestamps from either a
drmCrtcGetSequence() query, or from a vblank sequence event
timestamp extracted as part of wsi_display_sequence_handler()
-> wsi_display_fence_event_handler() call sequence, the vblank
timestamps are in nanosecond precision/granularity, whereas
latched timestamps from a pageflip completion event in a call
to wsi_display_page_flip_handler2() are in increments of 1000
nsecs, based on microsecond precision/granularity timestamps.
All timestamp sources are based on the same DRM/KMS timestamps,
but the different interfaces/APIs expose those in different
precision.
This could cause a connector timestamp from the sequence path
in nanoseconds to be overwritten by a new timestamp from a
pageflip completion event that is truncated down to the next
lowest microsecond, causing time in connector->last_nsec to
go backwards by up to 999 nsecs. A MAX2 operator prevents
this.
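A minimal sketch of the idea, assuming a simplified connector struct
(struct connector_sketch and latch_flip_time() are hypothetical names; only
last_nsec and the use of a MAX2-style operator come from the description
above):

```c
#include <assert.h>
#include <stdint.h>

/* Same idea as Mesa's MAX2 macro, redefined so the sketch stands alone. */
#define MAX2(a, b) ((a) > (b) ? (a) : (b))

/* Hypothetical stand-in for the connector state. */
struct connector_sketch {
   uint64_t last_nsec;
};

/* Latch a pageflip completion timestamp, which only has microsecond
 * granularity, without letting last_nsec go backwards. */
static void
latch_flip_time(struct connector_sketch *c, uint64_t sec, uint64_t usec)
{
   assert(usec < 1000000);

   uint64_t flip_nsec = sec * 1000000000ull + usec * 1000ull;

   /* A nanosecond-precision value from the sequence path may be up to
    * 999 nsecs ahead of the truncated flip timestamp; keep the later one. */
   c->last_nsec = MAX2(c->last_nsec, flip_nsec);
}
```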
Additionally, this also updates connector->last_nsec from a
successful Vulkan client call to vkGetSwapchainCounterEXT(),
allowing for a potentially more recent and thereby accurate
connector->last_nsec timestamp to be used as baseline for
scheduling timed FRR presents via VK_GOOGLE_display_timing or
VK_EXT_present_timing.
This is an improvement originally made by Keith Packard in his
original VK_GOOGLE_display_timing KHR_Display implementation,
just forward ported by myself, adding a slightly more descriptive
comment in the code.
See MR 38472 for reference of Emma's work, based on Keith's work.
The code from the original commit was...
Signed-off-by: Keith Packard <keithp@keithp.com>
Reviewed-and-tested-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Fixes: 5e2814c8a4 ("wsi/display: Implement present timing on KHR_display.")
Cc: mesa-stable
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Reviewed-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41168>
Some wsi/display VK-CTS test cases, e.g., for VK_GOOGLE_display_timing, select
swapchain imageUsage flags which are incompatible with the color format
VK_FORMAT_B8G8R8A8_SRGB that was returned as the first ("default") swapchain
image color format by vulkan/wsi/display, but not properly validated for
compatibility by the CTS test cases. This ends badly - with a crash due to
assert(), also in Mesa's CI pipeline, e.g.,
../src/vulkan/wsi/wsi_common_drm.c:710: wsi_configure_native_image: Assertion
`!"Failed to find a supported modifier! This should never " "happen because
LINEAR should always be available"' failed.
Reorder VK_FORMAT_B8G8R8A8_UNORM into the first slot, as this is safe to use,
and make VK_FORMAT_B8G8R8A8_SRGB a safe second. This should be fine, as the
spec doesn't mandate VK_FORMAT_B8G8R8A8_SRGB or any specific format be first,
and vulkan/wsi/wayland regularly exposes other formats on various Wayland
compositors. The macOS Khronos MoltenVK Vulkan ICD also uses unorm first
ordering, as common MS-Windows Vulkan ICDs seem to do.
I assume that apps which really want to specifically test SRGB color formats
will explicitly select such a format, so no harm is done by reordering.
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Reviewed-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41168>
In kgsl_bo_init(), tu_dump_bo_init() should be called for tu_bo after
it's initialized and before it's possibly mapped, since the mapping
can fail and cause kgsl_bo_finish() to call tu_dump_bo_del() for tu_bo
with an improperly initialized dump_bo_list_idx, leading to crashes.
Signed-off-by: Zan Dobersek <zdobersek@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41983>
si_create_context checks for contexts that need recreation but only
destroys them rather than recreating them. Creation now belongs to a
single function: si_get_aux_context.
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41966>
The shared sdma code used the "sdma_supports_compression" field from info
but radeonsi code still relied on gfx level checks.
Fixes: f5ecc5ffd5 ("ac,radv,radeonsi: add ac_emit_sdma_copy_tiled_sub_window()")
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41966>
If the vaapi application submits SPS with pic_width_in_luma_samples
not aligned to be divisible by 64, the driver overwrites it to an
aligned value. But if it does so, then it should also recalculate
the conformance window.
Example from real life: gstreamer vah265enc built with libva < 1.21
or vaapih265enc transcoding a video of width == 854
gst-launch-1.0 uridecodebin
uri=https://media.w3.org/2010/05/sintel/trailer.mp4 ! vaapih265enc !
filesink location=out.h265
The code uploads an SPS with pic_width_in_luma_samples == 864, and
the driver overwrites it to 896.
The conformance window provided in the SPS was 10 : 864 - 10 = 854.
So after encoding the output width results in a wrong value:
896 - 10 = 886
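The arithmetic of the fix can be illustrated as follows. This is a sketch
in plain luma samples with hypothetical helper names; real SPS conformance
window offsets are expressed in chroma units, which is ignored here to
match the numbers in the example.

```c
#include <assert.h>
#include <stdint.h>

/* Round value up to the next multiple of alignment. */
static uint32_t
align_up(uint32_t value, uint32_t alignment)
{
   assert(alignment > 0);
   return (value + alignment - 1) / alignment * alignment;
}

/* Right-hand crop that keeps the visible width unchanged after the coded
 * width grows from app_width (set by the application) to drv_width
 * (set by the driver). */
static uint32_t
recalc_right_crop(uint32_t app_width, uint32_t app_crop, uint32_t drv_width)
{
   assert(app_crop <= app_width && drv_width >= app_width);

   uint32_t visible = app_width - app_crop; /* 864 - 10 = 854 */
   return drv_width - visible;              /* 896 - 854 = 42 */
}
```

For the values above, the recalculated crop is 42, so the output width is
896 - 42 = 854 as intended.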
Reviewed-by: David Rosca <david.rosca@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41997>
Similar to WSI options.
It's possible to ignore options that aren't implemented by a driver and
to set different default values.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41843>
For zero inputs, we end up with intermediate infinities from
frcp(0.0). The final output is not infinity though, so this has to
be well defined even when applications don't request preserving infinities
on their own. Also preserve signed zeros to make the sign of the infinities
well defined.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/work_items/15585
Cc: mesa-stable
Tested-by: Tapani Pälli <tapani.palli@intel.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41953>
Not all GPUs would support this, but it shouldn't affect our shader
compiles, and it does mean that we can support replay of traces with
userspace iova allocation buffer addresses on larger GPUs.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37323>