The spec allows requesting a present at a specific target time or
duration without storing or querying any present records about
completion time. In other words, it allows
VkPresentTimingInfoEXT.presentStageQueries == 0.
In this case, skip allocation and processing of a timing history record,
but still assign a VkPresentTimingInfoEXT.targetTime for timed present.
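A minimal sketch of that decision, assuming simplified stand-in types:
struct timing_request, struct present_plan and plan_present() are
hypothetical names for illustration, not the actual vulkan/wsi code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for one present timing request. */
struct timing_request {
   uint32_t present_stage_queries; /* mirrors presentStageQueries */
   uint64_t target_time;           /* mirrors targetTime */
};

/* Hypothetical result of preparing one present. */
struct present_plan {
   bool needs_history_record; /* allocate and process a timing record? */
   uint64_t target_time;      /* forwarded for the timed present */
};

static struct present_plan
plan_present(const struct timing_request *req)
{
   struct present_plan plan;

   /* No queried stages: nobody will ever read a history record. */
   plan.needs_history_record = req->present_stage_queries != 0;

   /* The target time does not depend on whether results are queried. */
   plan.target_time = req->target_time;

   return plan;
}
```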
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Fixes: 47d69664d8 ("vulkan/wsi: Add common infrastructure for EXT_present_timing.")
Reviewed-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41168>
- Queueing a present with VkPresentTimingsInfoEXT in the .pNext chain of
VkPresentInfoKHR, but with VkPresentTimingsInfoEXT.pTimingInfos == NULL,
is allowed and must not crash, just no-op.
- VkPresentTimingInfoEXT.targetTime == 0 means to ignore targetTime and
to simply present as soon as possible. This is achieved by setting
info->targetTime == 0 ==> target_time = 0. Make sure target_time also
stays 0 if targetTimeDomainPresentStage is set to
VK_PRESENT_STAGE_QUEUE_OPERATIONS_END_BIT_EXT, i.e. skip the device->cpu
conversion via wsi_swapchain_present_convert_device_to_cpu(), as that
might map a zero info->targetTime device time to a non-zero cpu
target_time.
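Both rules can be sketched as follows. This is an illustration under
stated assumptions, not the actual code: struct timing_info,
resolve_target_time() and convert_device_to_cpu() are hypothetical, and
the fixed offset in the conversion only serves to show that a device
time of 0 would not map to a cpu time of 0.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the device->cpu time conversion. */
static uint64_t
convert_device_to_cpu(uint64_t device_time)
{
   return device_time + 123456;
}

/* Hypothetical simplified request; only the fields discussed above. */
struct timing_info {
   uint64_t target_time;      /* 0 means "present as soon as possible" */
   int target_is_device_time; /* nonzero for the queue-operations-end stage */
};

/* Returns the cpu-domain target time, or 0 for "no target time". */
static uint64_t
resolve_target_time(const struct timing_info *infos, size_t index)
{
   if (infos == NULL)
      return 0; /* pTimingInfos == NULL: no-op, must not crash */

   const struct timing_info *info = &infos[index];
   if (info->target_time == 0)
      return 0; /* keep 0 as 0, skip any time domain conversion */

   return info->target_is_device_time ?
          convert_device_to_cpu(info->target_time) : info->target_time;
}
```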
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Fixes: 47d69664d8 ("vulkan/wsi: Add common infrastructure for EXT_present_timing.")
Reviewed-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41168>
Some hw + kms driver combos do not support vblank related functions
at all, i.e. no drmCrtc[Get/Queue]Sequence() ioctl, no crtc sequence
events, and no vblank of pageflip completion reported in pageflip events.
Most notable among the Vulkan drivers that support present_timing is
Asahi Linux on Apple Silicon Macs, which has no such support: only
pageflip events with a valid flip timestamp are supported.
To deal with this, detect the lack of vblank support and instead
use the current "vrr timing" path, which doesn't use vblanks, but
absolute time and timed waits. This also required a slight restructuring
of the setup logic.
Also fix semantics of requested relative timed presents via
VK_PRESENT_TIMING_INFO_PRESENT_AT_RELATIVE_TIME_BIT_EXT. The
spec states that the given target time should be relative to
the most recently presented image on a swapchain, and that if
no such image was presented yet (during the first present on
a swapchain), the relative target present time should be ignored.
Take care of this by tracking the vblank count and time of the most
recent completed swapchain present separately from the most recent
known vblank count and time of the connector. Choose the swapchain's
most recent present vblank data as baseline for relative timed
presents, to optimally implement spec semantics, but the connector's
vblank data for absolute timed presents, to minimize rounding errors
and drift when converting between time and vblank cycle counts.
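The baseline selection for relative presents can be sketched like this.
struct timing_state and absolute_target_nsec() are hypothetical names,
and the sketch only covers the time domain, not the conversion to
vblank counts.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified timing state. */
struct timing_state {
   uint64_t connector_last_nsec; /* most recent known vblank of the connector */
   uint64_t swapchain_last_nsec; /* most recent completed swapchain present */
   bool swapchain_has_presented; /* false until the first present completes */
};

/* Returns the absolute target time in ns, or 0 for "as soon as possible". */
static uint64_t
absolute_target_nsec(const struct timing_state *s, uint64_t target,
                     bool relative)
{
   if (!relative)
      return target; /* absolute targets are used as given */

   if (!s->swapchain_has_presented)
      return 0; /* first present: the relative target is ignored */

   /* Relative targets count from the swapchain's last present,
    * not from the connector's last known vblank. */
   return s->swapchain_last_nsec + target;
}
```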
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Fixes: 5e2814c8a4 ("wsi/display: Implement present timing on KHR_display.")
Reviewed-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41168>
Both local testing and the Mesa CI's VK-CTS VK_GOOGLE_display_timing
test cases show an oddity of amdgpu-kms driven AMD GPUs with respect to
VK_EXT_present_timing and the upcoming VK_GOOGLE_display_timing:
The very first present (atomic commit / pageflip) after a full
modeset commit which turns the associated output / connector from
fully off to on (powering up the display hw) will not send a
regular pageflip completion event, but instead send a pageflip
completion event after display hw programming is completed, with
all-zero vblank sequence count and present timestamp. This would
cause invalid timestamps for this very first present reported to
clients, and trips up the VK-CTS VK_GOOGLE_display_timing conformance
tests, because the first present is signalled as completed before it
was even queued. This failure can be observed with AMD GPUs in the
CI, but not with Intel or Qualcomm GPUs, where the CTS is successful.
Note this quirk doesn't happen for regular modesets on an already
running output, i.e. one with at least one active hw plane. It does
happen for the CTS, as it seems to start from a powered off output.
Work around this AMD quirk:
1. Detect a pageflip event with all zero frame count and timestamp.
2. Try to query the count and timestamp of the most recent vblank,
as a likely good substitute for the "completed" pageflip, given
that pageflip and vblank counts and timestamps must always match
for the vblank of actual flip completion.
3. If the query fails or also reports nonsensical values, e.g.,
completed before queued, fall back to the current system time as an
ok'ish result. Note that during my local testing on AMD Polaris11
with DCE-11.2 display engine this 3rd case was never observed,
and 2 did a good job. This is just a fallback for the fallback.
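The three steps above can be sketched as a pure function. Everything
here is a hypothetical simplification: struct flip_stamp and
fixup_flip_stamp() are illustrative names, the vblank query result is
passed in by the caller instead of being queried, and reporting a
sequence count of 0 in the last-resort case is an assumption of this
sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical pair of vblank sequence count and timestamp. */
struct flip_stamp {
   uint64_t seq;
   uint64_t nsec;
};

static struct flip_stamp
fixup_flip_stamp(struct flip_stamp event,
                 bool vblank_ok, struct flip_stamp vblank,
                 uint64_t queued_nsec, uint64_t now_nsec)
{
   /* 1. Only an all-zero pageflip event needs fixing up. */
   if (event.seq != 0 || event.nsec != 0)
      return event;

   /* 2. Substitute the most recent vblank if the query succeeded and
    *    does not claim completion before the present was queued. */
   if (vblank_ok && vblank.nsec >= queued_nsec)
      return vblank;

   /* 3. Last resort: current system time (seq of 0 is an assumption). */
   struct flip_stamp fallback = { 0, now_nsec };
   return fallback;
}
```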
For reference, after digging through lots of amdgpu DC Linux source
code: the code that decides whether a regular pageflip event is
dispatched from the pageflip completion interrupt handler can be
found by searching for the call site of the function
prepare_flip_isr(). The fallback code for the special "full modeset
to power on the display engine and skip regular pageflip event" case
is the call site of the function drm_send_event_locked().
Successfully tested on AMD Polaris 11, DCE 11.2 display engine, and
also by Mesa CI's VK-CTS VK_GOOGLE_display_timing test cases for
direct display mode on AMD GPUs.
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Fixes: 5e2814c8a4 ("wsi/display: Implement present timing on KHR_display.")
Reviewed-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41168>
This improves reliability of VK_EXT_present_timing on wsi/display
and should be backported to Mesa 26.1-rc.
When latching connector->last_nsec timestamps from either a
drmCrtcGetSequence() query, or from a vblank sequence event
timestamp extracted as part of wsi_display_sequence_handler()
-> wsi_display_fence_event_handler() call sequence, the vblank
timestamps are in nanosecond precision/granularity, whereas
latched timestamps from a pageflip completion event in a call
to wsi_display_page_flip_handler2() are in increments of 1000
nsecs, based on microsecond precision/granularity timestamps.
All timestamp sources are based on the same DRM/KMS timestamps,
but the different interfaces/APIs expose those in different
precision.
This could cause a connector timestamp from the sequence path
in nanoseconds to be overwritten by a new timestamp from a
pageflip completion event that is truncated down to the next
lowest microsecond, causing time in connector->last_nsec to
go backwards by up to 999 nsecs. A MAX2 operator prevents
this.
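A small sketch of the clamp, assuming the pageflip timestamp arrives as
seconds plus microseconds. latch_flip_timestamp() is a hypothetical
helper, and MAX2 is defined locally here so the example is
self-contained.

```c
#include <stdint.h>

/* Local definition for this sketch. */
#define MAX2(a, b) ((a) > (b) ? (a) : (b))

/* Returns the new value for the latched connector timestamp. */
static uint64_t
latch_flip_timestamp(uint64_t last_nsec, uint32_t tv_sec, uint32_t tv_usec)
{
   /* Pageflip events only carry microsecond granularity. */
   uint64_t flip_nsec = (uint64_t)tv_sec * 1000000000ull +
                        (uint64_t)tv_usec * 1000ull;

   /* Never let the latched time go backwards. */
   return MAX2(last_nsec, flip_nsec);
}
```

For example, a latched value of 5.001234567 s followed by a pageflip
event for the same vblank reported as 5 s + 1234 us keeps the
nanosecond precise value instead of dropping back by 567 nsecs.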
Additionally, this also updates connector->last_nsec from a
successful Vulkan client call to vkGetSwapchainCounterEXT(),
allowing for a potentially more recent and thereby accurate
connector->last_nsec timestamp to be used as baseline for
scheduling timed FRR presents via VK_GOOGLE_display_timing or
VK_EXT_present_timing.
This improvement was made by Keith Packard in his original
VK_GOOGLE_display_timing KHR_Display implementation. I only forward
ported it and added a slightly more descriptive comment in the code.
See MR 38472 for reference of Emma's work, based on Keith's work.
The code from the original commit was...
Signed-off-by: Keith Packard <keithp@keithp.com>
Reviewed-and-tested-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Fixes: 5e2814c8a4 ("wsi/display: Implement present timing on KHR_display.")
Cc: mesa-stable
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Reviewed-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41168>
Some wsi/display VK-CTS test cases, e.g., for VK_GOOGLE_display_timing, select
swapchain imageUsage flags which are incompatible with the color format
VK_FORMAT_B8G8R8A8_SRGB that was returned as the first ("default") swapchain
image color format by vulkan/wsi/display, but not properly validated for
compatibility by the CTS test cases. This ends badly - with a crash due to
assert(), also in Mesa's CI pipeline, e.g.,
../src/vulkan/wsi/wsi_common_drm.c:710: wsi_configure_native_image: Assertion
`!"Failed to find a supported modifier! This should never " "happen because
LINEAR should always be available"' failed.
Reorder VK_FORMAT_B8G8R8A8_UNORM into the first slot, as this is safe to use,
and make VK_FORMAT_B8G8R8A8_SRGB a safe second. This should be fine, as the
spec doesn't mandate VK_FORMAT_B8G8R8A8_SRGB or any specific format be first,
and vulkan/wsi/wayland regularly exposes other formats on various Wayland
compositors. The macOS Khronos MoltenVK Vulkan ICD also uses unorm first
ordering, as common MS-Windows Vulkan ICDs seem to do.
I assume that apps which really want to specifically test SRGB color formats
will explicitly select such a format, so no harm is done by reordering.
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Reviewed-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41168>
In kgsl_bo_init(), tu_dump_bo_init() should be called for tu_bo after
it's initialized and before it's possibly mapped, since the mapping
can fail and cause kgsl_bo_finish() to call tu_dump_bo_del() for tu_bo
with an improperly initialized dump_bo_list_idx, leading to crashes.
Signed-off-by: Zan Dobersek <zdobersek@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41983>
si_create_context checks for contexts that need recreation, but it
should only destroy them rather than create them. Creation now belongs
to a single function: si_get_aux_context.
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41966>
The shared sdma code used the "sdma_supports_compression" field from info
but radeonsi code still relied on gfx level checks.
Fixes: f5ecc5ffd5 ("ac,radv,radeonsi: add ac_emit_sdma_copy_tiled_sub_window()")
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41966>
If the vaapi application submits SPS with pic_width_in_luma_samples
not aligned to be divisible by 64, the driver overwrites it to an
aligned value. But if it does so, then it should also recalculate
the conformance window.
Example from real life: gstreamer vah265enc built with libva < 1.21
or vaapih265enc transcoding a video of width == 854
gst-launch-1.0 uridecodebin
uri=https://media.w3.org/2010/05/sintel/trailer.mp4 ! vaapih265enc !
filesink location=out.h265
The code uploads an SPS with pic_width_in_luma_samples == 864, and
the driver overwrites it to 896.
The conformance window provided in the SPS crops 10 samples:
864 - 10 = 854.
So after encoding, the output width results in a wrong value:
896 - 10 = 886
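The recalculation can be sketched as below. struct sps_width and
realign_width() are hypothetical names, and the crop is counted in luma
samples as in the numbers above; the actual SPS conformance window
offsets are expressed in chroma units, which this sketch ignores.

```c
#include <stdint.h>

static uint32_t
align_up(uint32_t value, uint32_t alignment)
{
   return (value + alignment - 1) / alignment * alignment;
}

/* Hypothetical, simplified view of the SPS width fields. */
struct sps_width {
   uint32_t pic_width_in_luma_samples;
   uint32_t crop_right_luma; /* right crop, in luma samples */
};

static struct sps_width
realign_width(struct sps_width in, uint32_t alignment)
{
   /* The visible width must not change when the coded width grows. */
   uint32_t visible = in.pic_width_in_luma_samples - in.crop_right_luma;
   struct sps_width out;

   out.pic_width_in_luma_samples =
      align_up(in.pic_width_in_luma_samples, alignment);
   out.crop_right_luma = out.pic_width_in_luma_samples - visible;
   return out;
}
```

With the values from the example, a coded width of 864 with a crop of
10 becomes a coded width of 896 with a crop of 42, and the visible
width stays 896 - 42 = 854.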
Reviewed-by: David Rosca <david.rosca@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41997>
Similar to WSI options.
It's possible to ignore options that aren't implemented by a driver and
to set different default values.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41843>
For zero inputs, we end up with intermediate infinities from
frcp(0.0). The final output is not infinity though, so this has to
be well defined even when applications don't request preserving infinities
on their own. Also preserve signed zeros to make the sign of the infinities
well defined.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/work_items/15585
Cc: mesa-stable
Tested-by: Tapani Pälli <tapani.palli@intel.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41953>
Not all GPUs would support this, but it shouldn't affect our shader
compiles, and it does mean that we can support replay of traces with
userspace iova allocation buffer addresses on larger GPUs.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37323>
Gallivm runs shaders that are originally compiled with another backend's
compiler options, which may have optimizations that introduce opcodes
that gallivm does not support. Add a pass to lower these.
Assisted-by: Claude Opus 4.6
Signed-off-by: Benjamin Cheng <benjamin.cheng@amd.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41302>
Previously, the mem alloc wait barrier was issued via a separate
renderer submission (e.g. execbuf for the virtgpu backend). Instead,
leverage the cmd payload in resource_create_blob to avoid the extra
submission. This would help the downstream win32 backend as well.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42003>
This is to leverage drm_virtgpu_resource_create_blob::cmd for expressing
the blob mem host resource dependency in the virtgpu backend, which can
avoid the execbuf. Similar for vtest backend.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42003>
Currently, when hashing a pipeline stage, the final hash is different
when the module is passed as VkPipelineShaderStageCreateInfo::module
(the module's hash is hashed) or as a VkShaderModuleCreateInfo in its
pNext chain (the module's code is hashed). This causes unnecessary cache
misses. To prevent this, hash the code first in the latter case and add
that hash to the stage's hash.
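A toy sketch of the idea: both ways of passing the module end up
feeding the same module hash into the stage hash. FNV-1a is used here
only to keep the example self-contained and is not the hash the driver
uses; all function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FNV_OFFSET 0xcbf29ce484222325ull
#define FNV_PRIME  0x100000001b3ull

/* Toy FNV-1a update; stands in for the real hash function. */
static uint64_t
fnv1a(uint64_t h, const void *data, size_t size)
{
   const unsigned char *p = data;
   for (size_t i = 0; i < size; i++) {
      h ^= p[i];
      h *= FNV_PRIME;
   }
   return h;
}

/* Hash of the shader code, as a shader module would store it. */
static uint64_t
hash_code(const uint32_t *code, size_t size)
{
   return fnv1a(FNV_OFFSET, code, size);
}

/* Module passed as a handle: its stored hash is mixed in. */
static uint64_t
hash_stage_from_module(uint64_t module_hash, const char *entry)
{
   uint64_t h = fnv1a(FNV_OFFSET, &module_hash, sizeof(module_hash));
   return fnv1a(h, entry, strlen(entry));
}

/* Module passed via pNext: hash the code first, then reuse the path
 * above, so both variants produce the same stage hash. */
static uint64_t
hash_stage_from_code(const uint32_t *code, size_t size, const char *entry)
{
   return hash_stage_from_module(hash_code(code, size), entry);
}
```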
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42014>
The current size of prev_refs is 8, which is only the number of
reference frames, but it needs to match the full size of the dpb,
which is 9.
Also, prev_refs is now indexed by dpb slot and holds the last intra
frame written to that slot.
This fixes visible artifacts on AV1 streams that mix super-res and
non-super-res frames in a hierarchical reference structure.
Closes: mesa/mesa#15503
Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41846>
This may delete existing pointer flags coming from the instance if the
traversal loop is exited and then restarted, as is done with ray
queries.
Fixes geometry being incorrectly culled due to FLIP_FACING flags going
missing.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41965>
V3D 7.1 now exposes shaderFloat16, shaderInt8, shaderInt16 and
VK_KHR_shader_float16_int8.
Partial native Float16 support is already available. The remaining
sub-32-bit ALU operations are widened to 32-bit by nir_lower_bit_size
in v3d_lower_nir(); conversion and pack operations are kept at their
native bit width so the QPU's 16-bit pack/unpack paths on mul/mov can
be used.
Assisted-by: Claude Opus 4.7
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41810>
Keep f16 fadd/fsub/fmul/fmin/fmax/fneg/fabs at 16-bit through
nir_lower_bit_size on V3D 7.1+ and emit the matching VF* op in
nir_to_vir, instead of widening to f32 with f16<->f32 round-trip
movs that pack-fold can absorb into hints. The native path saves
the absorption overhead in f16-heavy shaders.
Only the lower half of each VF* result is consumed; the upper half
is computed but unused.
New VIR helpers vir_VFADD, vir_VFSUB, vir_VFCMP, vir_VFMIN,
vir_VFMUL, vir_VFMOV, vir_VFABS, vir_VFNEG, vir_VFNAB were added.
Assisted-by: Claude Opus 4.7
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41810>
Add the V3D 7.1+ 2x16-bit f16 add-pipe ops (VFADD/VFSUB/VFCMP and
the sign-manipulation family VFMOV/VFABS/VFNEG/VFNAB), wire VFMAX
into v3d71_add_ops, and complete the V3D 7.1 decode/encode for
VFMIN/VFMAX/VFMUL.
Assisted-by: Claude Opus 4.7
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41810>
The liveness analysis treated any output-pack write (D.l /
D.h) as a partial definition, refusing to mark the variable as
defined in the block. That extended live ranges all the way to the
top of the program for every f16 temporary, artificially increasing
register pressure.
D.l/h only modifies the written bits, leaving the unwritten half bits
preserved. So a pack write is a full definition whenever no
consumer ever observes the unwritten half, or when both halves are
written before the variable is used.
This scans every instruction into a per-temp read-flag array
(TEMP_READ_LO / TEMP_READ_HI, with FULL = LO | HI) by inspecting
each source's input unpack, and recognizes two patterns as full
definitions:
* Both PACK_L and PACK_H written unconditionally in the same block.
* The instruction's pack writes the half that covers every observed
read of the variable across the program (the unwritten half is never
read).
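The two patterns can be sketched as a predicate. This is a hypothetical
simplification: write_is_full_def() is an illustrative name, and reusing
the TEMP_READ_* bits to describe which halves were already written
unconditionally in the block is an assumption of the sketch.

```c
#include <stdbool.h>
#include <stdint.h>

enum {
   TEMP_READ_LO   = 1 << 0,
   TEMP_READ_HI   = 1 << 1,
   TEMP_READ_FULL = TEMP_READ_LO | TEMP_READ_HI,
};

enum pack { PACK_NONE, PACK_L, PACK_H };

/* read_flags: how the temp is read anywhere in the program.
 * written_in_block: halves already written unconditionally earlier in
 * the same block (expressed with the TEMP_READ_* bits). */
static bool
write_is_full_def(enum pack pack, uint8_t read_flags,
                  uint8_t written_in_block)
{
   if (pack == PACK_NONE)
      return true; /* a plain write defines the whole register */

   uint8_t written = pack == PACK_L ? TEMP_READ_LO : TEMP_READ_HI;

   /* Pattern 1: both halves are written in this block. */
   if ((written | written_in_block) == TEMP_READ_FULL)
      return true;

   /* Pattern 2: the unwritten half is never read anywhere. */
   return (read_flags & TEMP_READ_FULL & ~written) == 0;
}
```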
Assisted-by: Claude Opus 4.7
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41810>
nir_lower_subgroups lowers reduce/scan to a tree of shuffle + ALU
chains over the source data type. When the source is sub-32-bit
(int8, int16, float16, or vector forms) those new ALU ops escape
the bit_size widening done earlier in v3d_lower_nir, leaving the
QPU codegen to emit raw min/max/etc. on 32-bit channel registers
whose upper bits are unspecified. The result is wrong reductions
for signed integer min/max (the upper bits make a signed int8 look
like a positive int32), wrong unsigned reductions (high-bit garbage
mixes into the result), and wrong f16 reductions.
Re-run nir_lower_bit_size after nir_lower_subgroups so the
generated sub-32-bit ALU ops are widened with the correct
sign/zero extension on inputs and the matching narrow on outputs.
Also widen vote_feq/vote_ieq when the source operand is sub-32-bit:
the V3D backend emits ALLFEQ/ALLEQ on full 32-bit channels (it does
not yet use the f16 vfcmp/vfmin/vfmax HW path), so the comparison input
must be 32-bit.
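The signed min/max failure can be reproduced in plain C by modelling a
32-bit channel register that holds an int8 value in its low byte. All
helper names are hypothetical; the sketch only mirrors the
"sign-extend inputs, narrow outputs" idea, not the NIR pass itself.

```c
#include <stdint.h>

/* 32-bit signed min, as the hardware would compute it on a channel. */
static int32_t
imin32(int32_t a, int32_t b)
{
   return a < b ? a : b;
}

/* Sign-extend the low 8 bits of a channel register. */
static int32_t
sext8(uint32_t reg)
{
   int32_t v = (int32_t)(reg & 0xffu);
   return (v ^ 0x80) - 0x80;
}

/* int8 min done correctly: extend inputs, then narrow the output. */
static uint8_t
imin8_widened(uint32_t a, uint32_t b)
{
   return (uint8_t)imin32(sext8(a), sext8(b));
}
```

An int8 value of -1 sits in the register as 0xff. The raw 32-bit min of
0xff and 0x01 picks 0x01, because 0xff looks like +255, while the
widened variant returns 0xff (-1) even with garbage in the upper bits.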
Assisted-by: Claude Opus 4.7
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41810>
flrp32 is already lowered; mirror it for flrp16 so V3D's f16 ALU
path doesn't see an unsupported flrp@16 leftover after bit_size
widening. No measurable test impact on the current f16 sweep,
but matches the f32 behaviour and keeps the lowering surface
consistent across bit sizes.
Assisted-by: Claude Opus 4.7
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41810>
The frexp lowering decomposes frexp into bit manipulation (fabs, ushr,
iand, ior) that relies on implicit float-to-int bit reinterpretation.
When lowered at 16-bit, the subsequent nir_lower_bit_size pass widens
float operations with f2f32 (changing the bit pattern to IEEE fp32)
and integer operations with u2u32 (zero-extending 16-bit bits). This
breaks the reinterpretation: ushr on the fabs result gets f2f32-widened
float bits instead of the original fp16 bit pattern, causing the sign
bit to leak into the exponent extraction for negative inputs.
Move nir_lower_frexp into v3d_lower_nir, after nir_lower_bit_size.
This way frexp decomposition operates at 32-bit where float and integer
operations share the same bit width, and the bit manipulation masks use
the correct IEEE fp32 constants.
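For illustration, this is roughly what a bit-level frexp looks like when
done consistently at 32-bit. It is a simplified sketch, not the NIR
lowering: frexp_bits_f32() is a hypothetical name and only normal,
finite, non-zero inputs are handled.

```c
#include <stdint.h>
#include <string.h>

/* Returns the mantissa in [0.5, 1.0) with the sign of x, and stores the
 * exponent so that x == mantissa * 2^exponent. */
static float
frexp_bits_f32(float x, int *exp_out)
{
   uint32_t bits;
   memcpy(&bits, &x, sizeof(bits));

   /* Drop the sign bit first, so it cannot leak into the exponent. */
   uint32_t abs_bits = bits & 0x7fffffffu;
   *exp_out = (int)(abs_bits >> 23) - 126;

   /* Keep sign and fraction, force the exponent field to 126 (2^-1). */
   uint32_t mant_bits = (bits & 0x807fffffu) | 0x3f000000u;

   float mant;
   memcpy(&mant, &mant_bits, sizeof(mant));
   return mant;
}
```

For x = -8.0 this yields a mantissa of -0.5 and an exponent of 4. The
shift and mask constants only make sense for the fp32 bit layout, which
is why the decomposition must see real 32-bit values.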
Assisted-by: Claude Opus 4.7
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41810>
itof and utof natively support packing the f32 result to f16
(.l/.h), but the encode/decode paths fell through to the default
case and rejected any non-NONE pack, breaking nir_op_i2f16 /
nir_op_u2f16 codegen with "Failed to pack instruction: itof rfN.l".
Assisted-by: Claude Opus 4.7
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41810>