HiZ must only be cleared when the full HiZ workaround is enabled. This
means that the previous slow clear draw would disable HiZ because it
hits the conditions (ie. depth/stencil enable and depth writes enabled).
So, the draw and the dispatch can run in parallel by moving the barrier
earlier.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39433>
Needs to consider the base offset, otherwise it's resolving to the
first 3D slice.
Fixes very recent VKCTS coverage dEQP-VK.pipeline.*.multisample.m10_resolve.*.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39393>
Use fd after dup instead of the one before dup to avoid
drm_syncobj_find failed in guest kernel when dev is found in
dev_list.
When dev is not found in dev_list, it uses device fd which is
duplicated, to init sync provider. And when it's found, the same
device fd should be used. Otherwise, it would caused inconsistency
and failures like in the Android domU CTS test where the guest
kernel attempts to locate a syncobj. This occurs because
vdrm_device_connect and VIRTGPU_EXECBUFFER ioctl use fd after dup
while util_sync_provider_drm uses the one before dup.
The fix has been validated with the CtsSdkSandboxWebkitTestCases in
Android domU, and the previously failing test cases no longer occur.
Signed-off-by: Ruitang.Wang@amd.com
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39520>
Similar to the low latency option for encode, this reduces latency
of decoding at the cost of increased power usage.
Can be enabled with AMD_DEBUG=lowlatencydec
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39450>
Some apps (old FFmpeg, contemporary CTS) send down pMi{Col,Row}Starts in
SB units, not MI units. Instead of dependening on those values which
could be unreliable, derive the tile sizes in SB using other parameters.
Cc: mesa-stable
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39492>
radv_dump_shader_stats() printed stats for every shader with a certain
stage, and we called this function each time an RT shader is compiled.
This means we could repeat the stats for a shader.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39484>
This can happen if a loop has no continues, and the later code should work
fine in this situation.
This fixes war_thunder/0013a69e097b2471 on navi21.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Fixes: 6b9d28ab9b ("aco/insert_fp_mode: insert fp mode in reverse")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39481>
Compute shaders are the fastest for all copies and some clears.
Note that this is a very different compute shader than the one in RADV.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39290>
Now that all larger workgroup sizes are lowered to 256,
the regalloc hang cannot mess up the compute queues anymore.
Still don't allow compute queues on GFX6 though,
they are prone to hangs. Needs further investigation.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39288>
PAL always set WD_SWITCH_ON_EOP for pre gfx10 when primitive
restart is enabled to prevent gpu hang.
It only happens when specific index stream with primitive
restart. Since we don't know what's the exact problem,
just follow PAL to disable 4x primitive rate when primitive
restart is enabled.
GFX10+ does not use this function.
Cc: mesa-stable
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39292>
Query the signature of the traversal function stored in the any-hit
shader and make the parameter locations between the two match up, to
remove unnecessary movs inside the traversal loop.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39314>
Parameter assignment hints allow to influence parameter register
assignment logic with user-specified affinities. If there is an affinity
declared for a parameter, the assignment logic will try to match the
registers a parameter and its affinity are assigned.
It also allows to hint that certain registers are not suitable for
assigning parameters to and should be avoided.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39314>
We can express any-hit/intersection shaders as functions, too.
Any-hit/Intersection shaders need the usual parameters like launch
IDs/descriptor data/ray properties, origin, direction/etc., but also
some special parameters related to traversal state. Any-hit/intersection
shaders need to return whether the hit was accepted and/or traversal
should be terminated, as well as the intersection T value (for
intersection shaders). Both any-hit and intersection shaders also need
to be passed hit attributes via parameters. Closest-Hit shaders need
those too, but we pass them out-of-band via LDS. LDS is used for the
traversal stack when any-hit/intersection shaders, so we need to pass
them via parameters.
Hit attributes are similar to ray payloads in the sense that they're
dynamically sized depending on how much space the application uses.
However, unlike ray payloads, hit attribute sizes have a strict upper
bound of 8 dwords. To make managing parameters easier, we put all hit
attributes in a single vector parameter with 0-8 components. This
prevents having a function with two sets of arbitrary numbers of
parameters.
This commit sets up ahit/isec function signatures and implements
lowering for ahit/isec-specific intrinsics in the context of these
functions. Subsequent commits will merely have to call into these
functions to execute a separate-compiled any-hit/intersection shader.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39314>
terminate_ray should only return from any-hit shaders, it should not
skip the intersection shader. If we insert a nir_jump_return when
processing the already-inlined any-hit shader, the intersection shader
will be skipped.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39314>
comp-to-single is supported since GFX10, it's a new type of DCC fast
clear which doesn't require FCE and doesn't require to set fast clear
registers (ie. comp-to-reg). This means that it's possible to fast clear
even if not all slices are bound, because the clear code is stored in
the main image.
This improves performance in Dirt Rally 2.0 by +2-5%. Other games that
have layered clears would also benefit on GFX10+.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39394>