Commit graph

223789 commits

Author SHA1 Message Date
Mario Kleiner
2beb0c8820 wsi/common: Allow VK_EXT_present_timing present without presentStageQueries.
The spec allows requesting a present at a specific target time or
duration without actually storing + querying any present records about
completion time. In other words, it allows
VkPresentTimingInfoEXT.presentStageQueries == 0.

In this case, skip allocation and processing of a timing history record,
but still assign a VkPresentTimingInfoEXT.targetTime for timed present.

Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Fixes: 47d69664d8 ("vulkan/wsi: Add common infrastructure for EXT_present_timing.")
Reviewed-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41168>
2026-06-05 10:21:51 +00:00
Mario Kleiner
784a41cb8b wsi/common: Small compliance fixes for VK_EXT_present_timing.
- Queueing a present with VkPresentTimingsInfoEXT in the .pNext chain of
VkPresentInfoKHR, but VkPresentTimingsInfoEXT.pTimingInfos == NULL is
allowed and must not crash, just no-op.

- VkPresentTimingInfoEXT.targetTime == 0 means to ignore targetTime and
to simply present as soon as possible. This is achieved by setting
info->targetTime == 0 ==> target_time = 0. Make sure target_time stays
also 0 if targetTimeDomainPresentStage is set to
VK_PRESENT_STAGE_QUEUE_OPERATIONS_END_BIT_EXT, ie. skip the device->cpu
conversion via wsi_swapchain_present_convert_device_to_cpu(), as that
might map a zero info->targetTime device time to a non zero cpu
target_time.
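
The targetTime == 0 rule above can be sketched as follows. This is a
minimal illustration, not the Mesa code: example_device_to_cpu() is a
hypothetical stand-in for wsi_swapchain_present_convert_device_to_cpu(),
and the fixed offset it adds is an assumption made only to show that a
conversion can turn zero into a non-zero value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the device->cpu time conversion. The fixed
 * offset models two clocks with different epochs (an assumption). */
static uint64_t
example_device_to_cpu(uint64_t device_time_ns)
{
   return device_time_ns + 5000;
}

/* Returns the cpu-domain target time. A requested time of 0 means
 * "present as soon as possible" and must stay 0 in every time domain. */
static uint64_t
example_resolve_target_time(uint64_t info_target_time, bool device_domain)
{
   if (info_target_time == 0)
      return 0; /* skip the conversion, it could yield a non-zero time */

   return device_domain ? example_device_to_cpu(info_target_time)
                        : info_target_time;
}
```

Here device_domain corresponds to targetTimeDomainPresentStage being set
to VK_PRESENT_STAGE_QUEUE_OPERATIONS_END_BIT_EXT.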

Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Fixes: 47d69664d8 ("vulkan/wsi: Add common infrastructure for EXT_present_timing.")
Reviewed-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41168>
2026-06-05 10:21:51 +00:00
Mario Kleiner
d7b23e9f3a wsi/display: Deal with vblank-less systems for VK_EXT_present_timing.
Some hw + kms driver combos do not support vblank related functions
at all, ie. no drmCrtc[Get/Queue]Sequence() ioctl, no crtc sequence
events, no vblank of pageflip completion reported in pageflip events.

Most notable under the present_timing supported Vulkan drivers is
Asahi Linux on Apple Silicon Macs, with no such support: Only pageflip
events with a valid flip timestamp are supported.

To deal with this, we detect lack of vblank support and instead
use the current "vrr timing" path, which doesn't use vblanks, but
absolute time and timed waits. This also required a slight restructuring
of the setup logic.

Also fix semantics of requested relative timed presents via
VK_PRESENT_TIMING_INFO_PRESENT_AT_RELATIVE_TIME_BIT_EXT. The
spec states that the given target time should be relative to
the most recently presented image on a swapchain, and that if
no such image was presented yet (during the first present on
a swapchain), the relative target present time should be ignored.
Take care of this by tracking vblank count and time of the most
recent completed swapchain present separately from the most recent
known vblank count and time of the connector. Choose the swapchain
most recent present vblank data as baseline for relative timed
presents, to optimally implement spec semantics, but the connectors
vblank data for absolute timed presents to minimize rounding errors
and drift when converting between time and vblank cycle counts.
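
The relative-present rule described above might look roughly like this.
The struct and function names are hypothetical and only the time part of
the tracked state is modelled; the real code also tracks vblank counts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-swapchain state; field names are illustrative. */
struct example_swapchain_timing {
   bool has_presented;         /* false until a first present completed */
   uint64_t last_present_nsec; /* completion time of most recent present */
};

/* Returns the absolute target time in nsec, or 0 for "as soon as
 * possible" when there is no earlier present to be relative to. */
static uint64_t
example_relative_target(const struct example_swapchain_timing *sc,
                        uint64_t relative_nsec)
{
   assert(sc != NULL);

   if (!sc->has_presented)
      return 0; /* first present: the relative target time is ignored */

   return sc->last_present_nsec + relative_nsec;
}
```

The baseline is the swapchain's own last present, not the connector's
most recent vblank, which is the distinction the message draws.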

Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Fixes: 5e2814c8a4 ("wsi/display: Implement present timing on KHR_display.")
Reviewed-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41168>
2026-06-05 10:21:51 +00:00
Mario Kleiner
b4a1756fd1 wsi/display: Add workaround for all-zero valued pageflip events.
Both local testing and the Mesa CI's VK-CTS VK_GOOGLE_display_timing
test cases show some oddity of amdgpu-kms driven AMD gpu's wrt.
VK_EXT_present_timing and upcoming VK_GOOGLE_display_timing:

The very first present (atomic commit / pageflip) after a full
modeset commit which turns the associated output / connector from
fully off to on (powering up the display hw) will not send a
regular pageflip completion event, but instead send a pageflip
completion event after display hw programming is completed, with
all-zero vblank sequence count and present timestamp. This would
cause invalid timestamps for this very first present reported to
clients, and trips up the VK-CTS VK_GOOGLE_display_timing conformance
tests, because the first present is signalled as completed before it
was even queued. This failure can be observed with AMD gpu's in the
CI, but not with Intel or Qualcomm gpu CI, where CTS is successful.

Note this quirk doesn't happen for regular modesets on an already
running output, ie. one with at least one active hw plane. It does
happen for the CTS, as it seems to start from a powered off output.

Work around this AMD quirk:

1. Detect a pageflip event with all zero frame count and timestamp.
2. Try to query the count and timestamp of the most recent vblank,
   as a likely good substitute for the "completed" pageflip, given
   that pageflip and vblank counts and timestamps must always match
   for the vblank of actual flip completion.
3. If the query should fail or also report nonsensical values, e.g.,
   completed before queued, fall back to current system time as an
   ok'ish result. Note that during my local testing on AMD Polaris11
   with DCE-11.2 display engine this 3rd case was never observed,
   and 2 did a good job. This is just a fallback for the fallback.
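
The three steps above can be sketched as a single selection function.
All names are hypothetical; the vblank query result and the current time
are passed in as plain arguments so the sketch stays self-contained, and
leaving the sequence at 0 in the last fallback is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct example_stamp {
   uint64_t sequence; /* vblank count */
   uint64_t nsec;     /* timestamp in nanoseconds */
};

static struct example_stamp
example_fix_flip_stamp(struct example_stamp flip, bool query_ok,
                       struct example_stamp vblank, uint64_t queued_nsec,
                       uint64_t now_nsec)
{
   /* Step 1: only an all-zero event needs the workaround. */
   if (flip.sequence != 0 || flip.nsec != 0)
      return flip;

   /* Step 2: prefer the most recent vblank, if the query worked and
    * the result is not "completed before queued". */
   if (query_ok && vblank.nsec >= queued_nsec)
      return vblank;

   /* Step 3: last resort, current system time. The sequence is left
    * at 0 here, which is an assumption of this sketch. */
   struct example_stamp fallback = { 0, now_nsec };
   return fallback;
}
```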

For reference, after digging through lots of amdgpu DC Linux source
code, the relevant decision code for deciding for a regular pageflip
event dispatched from the pageflip completion interrupt handler is
to be found by searching for the call site of the function
prepare_flip_isr(). The fallback code for the special "full modeset
to power on the display engine and skip regular pageflip event" is
the call site of the function drm_send_event_locked().

Successfully tested on AMD Polaris 11, DCE 11.2 display engine, and
also by Mesa CI's VK-CTS VK_GOOGLE_display_timing test cases for
direct display mode on AMD gpu's.

Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Fixes: 5e2814c8a4 ("wsi/display: Implement present timing on KHR_display.")
Reviewed-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41168>
2026-06-05 10:21:51 +00:00
Mario Kleiner
f7fb4dd5ff wsi/display: Improve connector->last_nsec timestamping.
This improves reliability of VK_EXT_present_timing on wsi/display
and should be backported to Mesa 26.1-rc.

When latching connector->last_nsec timestamps from either a
drmCrtcGetSequence() query, or from a vblank sequence event
timestamp extracted as part of wsi_display_sequence_handler()
-> wsi_display_fence_event_handler() call sequence, the vblank
timestamps are in nanosecond precision/granularity, whereas
latched timestamps from a pageflip completion event in a call
to wsi_display_page_flip_handler2() are in increments of 1000
nsecs, based on microsecond precision/granularity timestamps.

All timestamp sources are based on the same DRM/KMS timestamps,
but the different interfaces/APIs expose those in different
precision.

This could cause a connector timestamp from the sequence path
in nanoseconds to be overwritten by a new timestamp from a
pageflip completion event that is truncated down to the next
lowest microsecond, causing time in connector->last_nsec to
go backwards by up to 999 nsecs. A MAX2 operator prevents
this.
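
A small worked example of the truncation problem and the MAX2 guard,
under the assumption that the pageflip event delivers seconds plus
microseconds. EXAMPLE_MAX2 is a local stand-in for Mesa's MAX2 macro and
the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define EXAMPLE_MAX2(a, b) ((a) > (b) ? (a) : (b))

/* Pageflip events carry microsecond precision, so the resulting
 * nanosecond value is always a multiple of 1000. */
static uint64_t
example_flip_event_to_nsec(uint32_t sec, uint32_t usec)
{
   return (uint64_t)sec * 1000000000ull + (uint64_t)usec * 1000ull;
}

/* Latch a new timestamp without ever letting the stored one go
 * backwards. */
static uint64_t
example_latch_last_nsec(uint64_t last_nsec, uint64_t new_nsec)
{
   return EXAMPLE_MAX2(last_nsec, new_nsec);
}
```

With a stored value of 5000000999 nsec from the sequence path, a
pageflip event for the same vblank reports 5 sec + 0 usec, i.e.
5000000000 nsec. Without the guard the stored time would drop by
999 nsec; with it the stored value is kept.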

Additionally, this also updates connector->last_nsec from a
successful Vulkan client call to vkGetSwapchainCounterEXT(),
allowing for a potentially more recent and thereby accurate
connector->last_nsec timestamp to be used as baseline for
scheduling timed FRR presents via VK_GOOGLE_display_timing or
VK_EXT_present_timing.

This is an improvement originally made by Keith Packard in his
original VK_GOOGLE_display_timing KHR_Display implementation,
just forward ported by myself, adding a slightly more descriptive
comment in the code.

See MR 38472 for reference of Emma's work, based on Keith's work.

The code from the original commit was...

Signed-off-by: Keith Packard <keithp@keithp.com>
Reviewed-and-tested-by: Mario Kleiner <mario.kleiner.de@gmail.com>

Fixes: 5e2814c8a4 ("wsi/display: Implement present timing on KHR_display.")
Cc: mesa-stable
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Reviewed-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41168>
2026-06-05 10:21:51 +00:00
Mario Kleiner
d22e0844ef wsi/display: Expose VK_FORMAT_B8G8R8A8_UNORM before VK_FORMAT_B8G8R8A8_SRGB
Some wsi/display VK-CTS test cases, e.g., for VK_GOOGLE_display_timing, select
swapchain imageUsage flags which are incompatible with the color format
VK_FORMAT_B8G8R8A8_SRGB that was returned as the first ("default") swapchain
image color format by vulkan/wsi/display, but not properly validated for
compatibility by the CTS test cases. This ends badly - with a crash due to
assert(), also in Mesa's CI pipeline, e.g.,

../src/vulkan/wsi/wsi_common_drm.c:710: wsi_configure_native_image: Assertion
`!"Failed to find a supported modifier!  This should never " "happen because
LINEAR should always be available"' failed.

Reorder VK_FORMAT_B8G8R8A8_UNORM into the first slot, as this is safe to use,
and make VK_FORMAT_B8G8R8A8_SRGB a safe second. This should be fine, as the
spec doesn't mandate VK_FORMAT_B8G8R8A8_SRGB or any specific format be first,
and vulkan/wsi/wayland regularly exposes other formats on various Wayland
compositors. The macOS Khronos MoltenVK Vulkan ICD also uses unorm-first
ordering, as common MS-Windows Vulkan ICDs seem to do.

I assume that apps which really want to specifically test SRGB color formats
will explicitly select such a format, so no harm is done by reordering.

Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Reviewed-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41168>
2026-06-05 10:21:51 +00:00
Zan Dobersek
fbdc5814ad tu/kgsl: initialize dump bo state in kgsl_bo_init sooner
In kgsl_bo_init(), tu_dump_bo_init() should be called for tu_bo after
it's initialized and before it's possibly mapped, since the mapping
can fail and cause kgsl_bo_finish() to call tu_dump_bo_del() for tu_bo
with an improperly initialized dump_bo_list_idx, leading to crashes.

Signed-off-by: Zan Dobersek <zdobersek@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41983>
2026-06-05 09:39:36 +00:00
Pierre-Eric Pelloux-Prayer
bed8008c9d ac/parse_ib: initialize data variables to 0
Avoids "warning: ‘data1’ may be used uninitialized" messages.

Fixes: 2aec2e8dba ("ac/parse_ib: Add VCN decode queue parsing")
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41966>
2026-06-05 09:16:57 +00:00
Pierre-Eric Pelloux-Prayer
c68e4d229b radeonsi: use aux context locks in si_destroy_screen
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41966>
2026-06-05 09:16:57 +00:00
Pierre-Eric Pelloux-Prayer
9487c1b0e9 radeonsi: consolidate aux context creation into si_get_aux_context
si_create_context checks for contexts that need recreation but only
destroys them rather than recreating them. Creation now belongs to a
single function: si_get_aux_context.

Reviewed-by: Marek Olšák <maraeo@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41966>
2026-06-05 09:16:57 +00:00
Pierre-Eric Pelloux-Prayer
3b3181a14d radeonsi: fix sdma copy for gfx10
The shared sdma code used the "sdma_supports_compression" field from info
but radeonsi code still relied on gfx level checks.

Fixes: f5ecc5ffd5 ("ac,radv,radeonsi: add ac_emit_sdma_copy_tiled_sub_window()")
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41966>
2026-06-05 09:16:57 +00:00
Alexander Slobodeniuk
7e040483ec radeonsi: fix conformance window emission in the SPS
If the vaapi application submits SPS with pic_width_in_luma_samples
not aligned to be divisible by 64, the driver overwrites it to an
aligned value. But if it does so, then it should also recalculate
the conformance window.

Example from real life: gstreamer vah265enc built with libva < 1.21
or vaapih265enc transcoding a video of width == 854

gst-launch-1.0 uridecodebin
uri=https://media.w3.org/2010/05/sintel/trailer.mp4 ! vaapih265enc !
filesink location=out.h265

The code uploads an SPS with pic_width_in_luma_samples == 864, and
the driver overwrites it to 896.

The conformance window offset provided in the SPS was 10, giving a
visible width of 864 - 10 = 854.

So after encoding, the output width comes out wrong:
896 - 10 = 886 instead of 854.
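
The arithmetic can be checked with a short sketch. It works purely in
luma samples, as the numbers in this message do; the real SPS field may
be expressed in other units, and both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Round value up to a power-of-two alignment. */
static uint32_t
example_align_up(uint32_t value, uint32_t alignment)
{
   assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
   return (value + alignment - 1) & ~(alignment - 1);
}

/* Right-hand crop, in luma samples, that restores the visible width
 * after the coded width was changed. */
static uint32_t
example_right_crop(uint32_t coded_width, uint32_t visible_width)
{
   assert(coded_width >= visible_width);
   return coded_width - visible_width;
}
```

For the case above, aligning 864 to 64 gives 896, and the crop that
keeps the visible width at 854 is 896 - 854 = 42 rather than the
original 10.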

Reviewed-by: David Rosca <david.rosca@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41997>
2026-06-05 08:58:56 +00:00
Samuel Pitoiset
e984014d56 turnip: declare common VK drirc options using the helper
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41843>
2026-06-05 09:14:45 +02:00
Samuel Pitoiset
64e63051dc anv: declare common VK drirc options using the helper
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41843>
2026-06-05 09:14:45 +02:00
Samuel Pitoiset
4e436fbd3d radv: declare common VK drirc options using the helper
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41843>
2026-06-05 09:14:45 +02:00
Samuel Pitoiset
38ce035860 util/drirc_gen: add a function to declare common VK options
Similar to WSI options.

It's possible to ignore options that aren't implemented by a driver and
to set different default values.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41843>
2026-06-05 09:14:45 +02:00
Samuel Pitoiset
9237656171 util: remove declared but unused DRIC_CONF_VK_REQUIRE_ASTC
Only used by RADV and ANV and options are auto-generated.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41843>
2026-06-05 09:14:45 +02:00
Alessandro Astone
e84e9dc582 gallivm: Fix armhf build against LLVM 22
StringMapIterator<bool> became StringMapIterBase<bool, false /* IsConst */>;
Use `auto` to handle either case.

Reviewed-by: Icenowy Zheng <zhengxingda@iscas.ac.cn>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40161>
2026-06-05 05:47:39 +00:00
Karol Herbst
a6172f19a0 vtn/opencl: fix edge case behavior for tanpi
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41726>
2026-06-04 21:39:02 +00:00
Karol Herbst
1e9b1075b6 vtn/opencl: fix edge case behavior for sinpi
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41726>
2026-06-04 21:39:02 +00:00
Karol Herbst
8c109381ed vtn/opencl: fix edge case behavior for cospi
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41726>
2026-06-04 21:39:01 +00:00
Karol Herbst
3531a05fdd vtn/opencl: convert libclc workaround handling to a switch statement
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41726>
2026-06-04 21:39:00 +00:00
Georg Lehmann
3a815a5969 nir: preserve infinities and signed zero during atan2
For zero inputs, we end up with intermediate infinities from
frcp(0.0). The final output is not infinity though, so this has to
be well defined even when applications don't request preserving infinities
on their own. Also preserve signed zeros to make the sign of the infinities
well defined.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/work_items/15585
Cc: mesa-stable

Tested-by: Tapani Pälli <tapani.palli@intel.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41953>
2026-06-04 21:02:11 +00:00
Emma Anholt
c634bf11ce drm-shim/freedreno: Report VM_BIND support.
This lets me replay .rdcs traced with sparse support so I can look at
their shaders.  We don't have to do anything with the iovas being bound,
since we don't execute anything.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37323>
2026-06-04 20:17:38 +00:00
Emma Anholt
053c8025a2 drm-shim/freedreno: report a 48-bit address space.
Not all GPUs would support this, but it shouldn't affect our shader
compiles, and it does mean that we can support replay of traces with
userspace iova allocation buffer addresses on larger GPUs.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37323>
2026-06-04 20:17:36 +00:00
Emma Anholt
2b89d3da70 drm-shim/freedreno: Provide a dummy set of UBWC config params.
This reduces noise from drm-shim with current tu.  These values don't
affect shader compiles, so just pick some values from 750.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37323>
2026-06-04 20:17:35 +00:00
Emma Anholt
37ae0e4255 drm-shim: Include the hex of the driver ioctl for unimplemented ioctls.
Some headers #define them in hex, so make it easier to look up which one
isn't implemented.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37323>
2026-06-04 20:17:34 +00:00
Marek Olšák
1375ba209d ac: add basic HTILE dword printing
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42004>
2026-06-04 19:55:19 +00:00
Marek Olšák
9f3af96552 ac/surface: print the modifier in ac_surface_print_info
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42004>
2026-06-04 19:55:19 +00:00
Benjamin Cheng
a989ca8c8f mesa/st: run the lower_opcodes pass for draw shaders
Fixes: 5eb0136a3c ("mesa/st: when creating draw shader variants, use the base nir and skip driver opts")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/work_items/15304
Signed-off-by: Benjamin Cheng <benjamin.cheng@amd.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41302>
2026-06-04 19:29:57 +00:00
Benjamin Cheng
a4a862a605 draw: Add lower_opcodes NIR pass
Gallivm runs shaders that are originally compiled with another backend's
compiler options, which may have optimizations that introduce opcodes
that gallivm does not support. Add a pass to lower these.

Assisted-by: Claude Opus 4.6
Signed-off-by: Benjamin Cheng <benjamin.cheng@amd.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41302>
2026-06-04 19:29:57 +00:00
Faith Ekstrand
364b5f806c compiler/rust/smallvec: Optimize extend()
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42005>
2026-06-04 18:09:19 +00:00
Yiwei Zhang
4e8595da21 venus: let resource_create_blob wait for mem alloc
Previously, the mem alloc wait barrier is via a separate renderer
submission (e.g. execbuf for virtgpu backend). In fact, we can leverage
the cmd payload in resource_create_blob to avoid the extra submission.
This would help downstream win32 backend as well.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42003>
2026-06-04 16:33:02 +00:00
Yiwei Zhang
77b73d8595 venus: update create_from_device_memory to take a cmd payload
This is to leverage drm_virtgpu_resource_create_blob::cmd for expressing
the blob mem host resource dependency in the virtgpu backend, which can
avoid the execbuf. Similar for vtest backend.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42003>
2026-06-04 16:33:02 +00:00
Job Noorman
2b37a0b410 vulkan: use consistent module hashing for pipeline stages
Currently, when hashing a pipeline stage, the final hash is different
when the module is passed as VkPipelineShaderStageCreateInfo::module
(the module's hash is hashed) or as a VkShaderModuleCreateInfo in its
pNext chain (the module's code is hashed). This causes unnecessary cache
misses. To prevent this, hash the code first in the latter case and add
that hash to the stage's hash.
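
The idea can be illustrated with a toy hash. FNV-1a is used here only as
a stand-in for whatever hash the real code uses, and every function name
below is hypothetical; the point is that both entry paths feed the same
bytes (the module hash) into the stage hash.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EXAMPLE_FNV_OFFSET 0xcbf29ce484222325ull
#define EXAMPLE_FNV_PRIME  0x100000001b3ull

/* 64-bit FNV-1a, continuing from an existing hash state. */
static uint64_t
example_fnv1a(uint64_t hash, const void *data, size_t size)
{
   const uint8_t *bytes = data;
   for (size_t i = 0; i < size; i++) {
      hash ^= bytes[i];
      hash *= EXAMPLE_FNV_PRIME;
   }
   return hash;
}

/* Hash of the module code, as a module object would cache it. */
static uint64_t
example_module_hash(const uint32_t *code, size_t code_size)
{
   return example_fnv1a(EXAMPLE_FNV_OFFSET, code, code_size);
}

/* Path 1: module handle given, its cached hash goes into the stage. */
static uint64_t
example_stage_hash_from_module(uint64_t module_hash, const char *entry,
                               size_t entry_len)
{
   uint64_t h = example_fnv1a(EXAMPLE_FNV_OFFSET, &module_hash,
                              sizeof(module_hash));
   return example_fnv1a(h, entry, entry_len);
}

/* Path 2: code given inline, hash it first, then reuse path 1. */
static uint64_t
example_stage_hash_from_code(const uint32_t *code, size_t code_size,
                             const char *entry, size_t entry_len)
{
   return example_stage_hash_from_module(
      example_module_hash(code, code_size), entry, entry_len);
}
```

Hashing the raw code directly into the stage in path 2 would produce a
different stage hash for the same shader, which is the cache miss this
commit avoids.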

Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42014>
2026-06-04 16:01:55 +00:00
Job Noorman
0a60a53c81 vulkan: add vk_shader_module_hash helper
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42014>
2026-06-04 16:01:55 +00:00
Hyunjun Ko
bea1212ee7 anv/video: Change size of the cached array of recently decoded AV1 frames.
prev_refs currently has 8 entries, which only covers the reference
frames, but it needs to match the full DPB size, which is 9.
prev_refs is also now indexed by DPB slot and holds the last intra frame
written to that slot.

This fixes visible artifacts on AV1 streams that mix super-res and
non-super-res frames in a hierarchical reference structure.

Closes: mesa/mesa#15503

Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41846>
2026-06-04 15:43:54 +00:00
Hyunjun Ko
11c8930e2b anv/video: define ANV_VIDEO_AV1_MAX_DPB_SLOTS
This is prep work for the following fix.

Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41846>
2026-06-04 15:43:54 +00:00
Hyunjun Ko
6875286159 anv/video: Add to check size mismatch during motion field estimation.
Because of super resolution, the frame size can change, so we need to
keep the coded size and check whether it changed during motion field
estimation.

Closes: mesa/mesa#15503

Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41846>
2026-06-04 15:43:54 +00:00
Natalie Vock
1a8953c956 radv: Dump printf buffer after detecting a GPU hang
This allows us to use printf debugging when the GPU hangs.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41961>
2026-06-04 15:22:07 +00:00
Natalie Vock
c8518581bf radv/rt: Don't overwrite bvh_base at the start of the traversal loop
This may delete existing pointer flags coming from the instance if the
traversal loop is exited and then restarted, as is done with ray
queries.

Fixes geometry being incorrectly culled due to FLIP_FACING flags going
missing.

Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41965>
2026-06-04 14:55:30 +00:00
Karmjit Mahil
10c914693d freedreno/computerator: Remove VLA giving a build warning
```
../src/freedreno/computerator/main.cc:327:24: warning: variable length arrays in C++ are a Clang extension [-Wvla-cxx-extension]
  327 |       uint64_t results[num_perfcntrs];
      |                        ^~~~~~~~~~~~~
../src/freedreno/computerator/main.cc:327:24: note: read of non-const variable 'num_perfcntrs' is not allowed in a constant expression
../src/freedreno/computerator/main.cc:206:13: note: declared here
  206 |    unsigned num_perfcntrs = 0;
      |             ^
```
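
The change itself is not shown in this message. As a rough sketch of the
usual remedy, a runtime-sized array can be heap-allocated instead of
declared as a VLA. The sketch is written in C although main.cc is C++
(where a std::vector would be the more likely choice), and the helper
name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Allocates a zeroed results array whose size is only known at
 * runtime. The caller owns the memory and must free() it. Returns
 * NULL for a count of 0 or on allocation failure. */
static uint64_t *
example_alloc_results(unsigned num_perfcntrs)
{
   if (num_perfcntrs == 0)
      return NULL;

   return calloc(num_perfcntrs, sizeof(uint64_t));
}
```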

Signed-off-by: Karmjit Mahil <karmjit.mahil@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42017>
2026-06-04 14:38:43 +00:00
Jose Maria Casanova Crespo
28e584b687 v3dv: enable lowered shaderFloat16/Int16/Int8 + VK_KHR_shader_float16_int8
V3D 7.1 now exposes shaderFloat16, shaderInt8, shaderInt16 and
VK_KHR_shader_float16_int8.

Partial native Float16 support is already available. But the rest of
sub-32-bit ALU operations are widened to 32-bit by nir_lower_bit_size
in v3d_lower_nir(); conversion and pack operations are kept at their
native bit width so the QPU's 16-bit pack/unpack paths on mul/mov can
be used.

Assisted-by: Claude Opus 4.7
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41810>
2026-06-04 13:29:39 +00:00
Jose Maria Casanova Crespo
4c5b0fa7f4 v3d: emit packed-f16 ALU ops natively on V3D 7.1
Keep f16 fadd/fsub/fmul/fmin/fmax/fneg/fabs at 16-bit through
nir_lower_bit_size on V3D 7.1+ and emit the matching VF* op in
nir_to_vir, instead of widening to f32 with f16<->f32 round-trip
movs that pack-fold can absorb into hints. The native path saves
the absorption overhead in f16-heavy shaders.

Only the lower half of each VF* result is consumed; the upper half
is computed but unused.

New VIR helpers vir_VFADD, vir_VFSUB, vir_VFCMP, vir_VFMIN,
vir_VFMUL, vir_VFMOV, vir_VFABS, vir_VFNEG, vir_VFNAB were added.

Assisted-by: Claude Opus 4.7
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41810>
2026-06-04 13:29:39 +00:00
Jose Maria Casanova Crespo
16856adff5 broadcom/qpu: expose V3D 7.1 packed-f16 instructions
Add the V3D 7.1+ 2x16-bit f16 add-pipe ops (VFADD/VFSUB/VFCMP and
the sign-manipulation family VFMOV/VFABS/VFNEG/VFNAB), wire VFMAX
into v3d71_add_ops, and complete the V3D 7.1 decode/encode for
VFMIN/VFMAX/VFMUL.

Assisted-by: Claude Opus 4.7
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41810>
2026-06-04 13:29:39 +00:00
Jose Maria Casanova Crespo
5a575cca8e v3d: improve liveness analysis for packed partial writes
The liveness analysis treated any output-pack write (D.l /
D.h) as a partial definition, refusing to mark the variable as
defined in the block. That extended live ranges all the way to the
top of the program for every f16 temporary, artificially increasing
register pressure.

D.l/h only modifies the written bits, leaving the unwritten half bits
preserved. So a pack write is a full definition whenever no
consumer ever observes the unwritten half, or when both halves are
written before the variable is used.

This scans every instruction into a per-temp read-flag array
(TEMP_READ_LO / TEMP_READ_HI, with FULL = LO | HI) by inspecting
each source's input unpack, and recognizes two patterns as full
definitions:

 * Both PACK_L and PACK_H written unconditionally in the same block.
 * The instruction's pack writes the half that covers every observed
   read of the variable across the program (the unwritten half is never
   read).

Assisted-by: Claude Opus 4.7
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41810>
2026-06-04 13:29:39 +00:00
Jose Maria Casanova Crespo
66ac3b55af v3d: widen sub-32-bit subgroup arithmetic and vote ops
nir_lower_subgroups lowers reduce/scan to a tree of shuffle + ALU
chains over the source data type. When the source is sub-32-bit
(int8, int16, float16, or vector forms) those new ALU ops escape
the bit_size widening done earlier in v3d_lower_nir, leaving the
QPU codegen to emit raw min/max/etc. on 32-bit channel registers
whose upper bits are unspecified. The result is wrong reductions
for signed integer min/max (the upper bits make a signed int8 look
like a positive int32), wrong unsigned reductions (high-bit garbage
mixes into the result), and wrong f16 reductions.

Re-run nir_lower_bit_size after nir_lower_subgroups so the
generated sub-32-bit ALU ops are widened with the correct
sign/zero extension on inputs and the matching narrow on outputs.
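
The signed min failure is easy to reproduce on the CPU. The sketch below
models a 32-bit channel register holding an int8 value in its low byte;
the helper names are hypothetical and the register contents are made-up
examples.

```c
#include <assert.h>
#include <stdint.h>

/* Signed minimum as a 32-bit ALU would compute it. */
static int32_t
example_imin32(int32_t a, int32_t b)
{
   return a < b ? a : b;
}

/* Sign-extend the low 8 bits of a register, ignoring the upper bits. */
static int32_t
example_sext8(uint32_t reg)
{
   int32_t low = (int32_t)(reg & 0xffu);
   return (low ^ 0x80) - 0x80;
}

/* Narrow a 32-bit result back to its 8-bit representation. */
static uint8_t
example_narrow8(int32_t value)
{
   return (uint8_t)((uint32_t)value & 0xffu);
}
```

An int8 -1 sitting in a register as 0x000000ff compares as 255, so
example_imin32(0xff, 5) yields 5 instead of -1. Sign-extending both
inputs first gives -1, which narrows back to 0xff.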

Also widen vote_feq/vote_ieq when the source operand is sub-32-bit:
the V3D backend emits ALLFEQ/ALLEQ on full 32-bit channels (it does
not yet use the f16 vfcmp/vfmin/vfmax HW path), so the comparison input
must be 32-bit.

Assisted-by: Claude Opus 4.7
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41810>
2026-06-04 13:29:39 +00:00
Jose Maria Casanova Crespo
54de903ae4 v3dv: lower flrp16 for consistency with flrp32
flrp32 is already lowered; mirror it for flrp16 so V3D's f16 ALU
path doesn't see an unsupported flrp@16 leftover after bit_size
widening. No measurable test impact on the current f16 sweep,
but matches the f32 behaviour and keeps the lowering surface
consistent across bit sizes.

Assisted-by: Claude Opus 4.7
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41810>
2026-06-04 13:29:38 +00:00
Jose Maria Casanova Crespo
0a5200d051 v3d: move nir_lower_frexp after nir_lower_bit_size
The frexp lowering decomposes frexp into bit manipulation (fabs, ushr,
iand, ior) that relies on implicit float-to-int bit reinterpretation.
When lowered at 16-bit, the subsequent nir_lower_bit_size pass widens
float operations with f2f32 (changing the bit pattern to IEEE fp32)
and integer operations with u2u32 (zero-extending 16-bit bits). This
breaks the reinterpretation: ushr on the fabs result gets f2f32-widened
float bits instead of the original fp16 bit pattern, causing the sign
bit to leak into the exponent extraction for negative inputs.

Move nir_lower_frexp into v3d_lower_nir, after nir_lower_bit_size.
This way frexp decomposition operates at 32-bit where float and integer
operations share the same bit width, and the bit manipulation masks use
the correct IEEE fp32 constants.
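
For illustration, exponent extraction at 32 bits with the IEEE fp32
constants can be sketched as below. This is not the NIR lowering and not
the exact 16-bit failure described above; the "leaky" variant only shows
what a sign bit reaching the exponent field does to the result. The
function names are hypothetical and only normal, finite, non-zero inputs
are handled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a float as its raw IEEE fp32 bit pattern. */
static uint32_t
example_float_bits(float f)
{
   uint32_t bits;
   memcpy(&bits, &f, sizeof(bits));
   return bits;
}

/* Exponent e such that |x| = m * 2^e with m in [0.5, 1). */
static int
example_frexp_exp32(float x)
{
   uint32_t abs_bits = example_float_bits(x) & 0x7fffffffu; /* fabs */
   return (int)(abs_bits >> 23) - 126;
}

/* Same shift without clearing the sign: bit 31 lands in the exponent. */
static int
example_frexp_exp32_leaky(float x)
{
   return (int)(example_float_bits(x) >> 23) - 126;
}
```

For 8.0f and -8.0f the first function returns 4 (8 = 0.5 * 2^4), while
the leaky variant returns 260 for -8.0f.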

Assisted-by: Claude Opus 4.7
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41810>
2026-06-04 13:29:38 +00:00
Jose Maria Casanova Crespo
cac92fecac broadcom/qpu: support output pack on itof/utof
itof and utof natively support packing the f32 result to f16
(.l/.h), but the encode/decode paths fell through to the default
case and rejected any non-NONE pack, breaking nir_op_i2f16 /
nir_op_u2f16 codegen with "Failed to pack instruction: itof rfN.l".

Assisted-by: Claude Opus 4.7
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41810>
2026-06-04 13:29:38 +00:00