Use fd after dup instead of the one before dup to avoid
drm_syncobj_find failed in guest kernel when dev is found in
dev_list.
When dev is not found in dev_list, it uses device fd which is
duplicated, to init sync provider. And when it's found, the same
device fd should be used. Otherwise, it would caused inconsistency
and failures like in the Android domU CTS test where the guest
kernel attempts to locate a syncobj. This occurs because
vdrm_device_connect and VIRTGPU_EXECBUFFER ioctl use fd after dup
while util_sync_provider_drm uses the one before dup.
The fix has been validated with the CtsSdkSandboxWebkitTestCases in
Android domU, and the previously failing test cases no longer occur.
Signed-off-by: Ruitang.Wang@amd.com
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39520>
Similar to the low latency option for encode, this reduces latency
of decoding at the cost of increased power usage.
Can be enabled with AMD_DEBUG=lowlatencydec
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39450>
Some apps (old FFmpeg, contemporary CTS) send down pMi{Col,Row}Starts in
SB units, not MI units. Instead of dependening on those values which
could be unreliable, derive the tile sizes in SB using other parameters.
Cc: mesa-stable
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39492>
radv_dump_shader_stats() printed stats for every shader with a certain
stage, and we called this function each time an RT shader is compiled.
This means we could repeat the stats for a shader.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39484>
This can happen if a loop has no continues, and the later code should work
fine in this situation.
This fixes war_thunder/0013a69e097b2471 on navi21.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Fixes: 6b9d28ab9b ("aco/insert_fp_mode: insert fp mode in reverse")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39481>
Compute shaders are the fastest for all copies and some clears.
Note that this is a very different compute shader than the one in RADV.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39290>
Now that all larger workgroup sizes are lowered to 256,
the regalloc hang cannot mess up the compute queues anymore.
Still don't allow compute queues on GFX6 though,
they are prone to hangs. Needs further investigation.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39288>
PAL always set WD_SWITCH_ON_EOP for pre gfx10 when primitive
restart is enabled to prevent gpu hang.
It only happens when specific index stream with primitive
restart. Since we don't know what's the exact problem,
just follow PAL to disable 4x primitive rate when primitive
restart is enabled.
GFX10+ does not use this function.
Cc: mesa-stable
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39292>
Query the signature of the traversal function stored in the any-hit
shader and make the parameter locations between the two match up, to
remove unnecessary movs inside the traversal loop.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39314>
Parameter assignment hints allow to influence parameter register
assignment logic with user-specified affinities. If there is an affinity
declared for a parameter, the assignment logic will try to match the
registers a parameter and its affinity are assigned.
It also allows to hint that certain registers are not suitable for
assigning parameters to and should be avoided.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39314>
We can express any-hit/intersection shaders as functions, too.
Any-hit/Intersection shaders need the usual parameters like launch
IDs/descriptor data/ray properties, origin, direction/etc., but also
some special parameters related to traversal state. Any-hit/intersection
shaders need to return whether the hit was accepted and/or traversal
should be terminated, as well as the intersection T value (for
intersection shaders). Both any-hit and intersection shaders also need
to be passed hit attributes via parameters. Closest-Hit shaders need
those too, but we pass them out-of-band via LDS. LDS is used for the
traversal stack when any-hit/intersection shaders, so we need to pass
them via parameters.
Hit attributes are similar to ray payloads in the sense that they're
dynamically sized depending on how much space the application uses.
However, unlike ray payloads, hit attribute sizes have a strict upper
bound of 8 dwords. To make managing parameters easier, we put all hit
attributes in a single vector parameter with 0-8 components. This
prevents having a function with two sets of arbitrary numbers of
parameters.
This commit sets up ahit/isec function signatures and implements
lowering for ahit/isec-specific intrinsics in the context of these
functions. Subsequent commits will merely have to call into these
functions to execute a separate-compiled any-hit/intersection shader.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39314>
terminate_ray should only return from any-hit shaders, it should not
skip the intersection shader. If we insert a nir_jump_return when
processing the already-inlined any-hit shader, the intersection shader
will be skipped.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39314>
comp-to-single is supported since GFX10, it's a new type of DCC fast
clear which doesn't require FCE and doesn't require to set fast clear
registers (ie. comp-to-reg). This means that it's possible to fast clear
even if not all slices are bound, because the clear code is stored in
the main image.
This improves performance in Dirt Rally 2.0 by +2-5%. Other games that
have layered clears would also benefit on GFX10+.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39394>
Like vkCmdClearAttachments(). This is a preliminary change for the
next commit which will enable these fast clears when comp-to-single
is supported.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39394>
Vulkan spec requires binding flags to be matched with the binding with
the same index, however currently bindings are sorted with flags not
properly sorted, which leads to bindings and flags mismatch.
Resolve this by adding optional flags info to the parameters of
vk_create_sorted_bindings(), and refactoring panvk/pvr (which really
pair bindings and flags instead of only iterating flags) to use sorted
flags.
Signed-off-by: Icenowy Zheng <uwu@icenowy.me>
Reviewed-by: Ryan Mckeever <ryan.mckeever@collabora.com>
Reviewed-by: Simon Perretta <simon.perretta@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38967>
This is possible because no comp-to-reg and no FCE. This probably helps
a bunch on GFX11+ if GENERAL is widely used with color images. And
since VK_KHR_unified_image_layout it's likely the case on GFX11-11.5
GFX10-10.3 could also benefit from this but some MSAA with DCC
fast-clears are currently broken and they need to be fixed first.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39396>