If the main CS is SDMA and the gang CS is ACE, this would emit an
SDMA_FENCE packet on ACE, which just hangs.
Fixes: b1938901d0 ("radv: Use SDMA fence packet when flushing gang semaphores")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39211>
For transfer queue operations that aren't supported by SDMA,
implement them with ACE (Async Compute Engine) using the
pre-existing compute copy functions.
Add a helper radv_get_pm4_cs that returns the ACE gang CS for
transfer command buffers and the main CS for graphics/compute
command buffers. Use radv_get_pm4_cs so that compute commands are
emitted to the correct command stream.
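The selection logic described above can be sketched roughly as follows. This is only an illustration: the `sketch_` types, the queue family enum and the struct fields are hypothetical stand-ins, not the actual RADV definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver's types; not the real RADV structs. */
enum sketch_queue_family {
   SKETCH_QUEUE_GENERAL,
   SKETCH_QUEUE_COMPUTE,
   SKETCH_QUEUE_TRANSFER,
};

struct sketch_cs {
   const char *name;
};

struct sketch_cmd_buffer {
   enum sketch_queue_family qf;
   struct sketch_cs *main_cs; /* GFX, compute or SDMA stream */
   struct sketch_cs *gang_cs; /* ACE follower stream, may be NULL */
};

/* Return the command stream that accepts PM4 compute packets. */
static struct sketch_cs *
sketch_get_pm4_cs(const struct sketch_cmd_buffer *cmd_buffer)
{
   if (cmd_buffer->qf == SKETCH_QUEUE_TRANSFER) {
      /* Assumption: the caller has already initialized the gang CS. */
      assert(cmd_buffer->gang_cs != NULL);
      return cmd_buffer->gang_cs;
   }

   /* Graphics and compute command buffers use their main stream. */
   return cmd_buffer->main_cs;
}
```

With a helper of this shape, a compute copy function can stay agnostic of the queue type and write its packets to whatever stream the helper returns.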
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25594>
They will be called from the transfer copy functions.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39057>
We need to use gang semaphores in the following two scenarios:
1. Leader to follower semaphore:
Increment the leader to follower semaphore when the leader wants
to block the follower: a transfer operation on ACE needs to wait
for a previous operation on SDMA.
2. Follower to leader semaphore:
Increment the follower to leader semaphore when the follower wants
to block the leader: a transfer operation on SDMA needs to wait
for a previous operation on ACE.
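The two scenarios above can be modeled on the CPU with a small counter-based sketch. It is a simplified illustration under stated assumptions: the `sketch_` names, the fields and the "flush" step are hypothetical, and the GPU-visible semaphore dword is modeled as a plain integer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one semaphore direction; not the RADV data layout. */
struct sketch_gang_sem {
   uint32_t value;   /* incremented each time a block is requested */
   uint32_t emitted; /* last value for which signal/wait packets were emitted */
   uint32_t memory;  /* models the GPU-visible semaphore dword */
};

struct sketch_gang {
   struct sketch_gang_sem leader2follower; /* e.g. ACE waits for SDMA */
   struct sketch_gang_sem follower2leader; /* e.g. SDMA waits for ACE */
};

/* The signaling queue has prior work that the other queue must wait for. */
static void
sketch_sem_dirty(struct sketch_gang_sem *sem)
{
   sem->value++;
}

/* Returns true if packets had to be emitted. In a driver, this is the point
 * where the signaling queue would get a write of the new value and the
 * waiting queue a wait for "memory >= value". */
static bool
sketch_sem_flush(struct sketch_gang_sem *sem)
{
   if (sem->value == sem->emitted)
      return false;

   sem->emitted = sem->value;
   return true;
}

/* Models the signaling queue executing its write. */
static void
sketch_sem_execute_signal(struct sketch_gang_sem *sem)
{
   sem->memory = sem->emitted;
}

/* Models the condition the waiting queue polls. */
static bool
sketch_sem_wait_satisfied(const struct sketch_gang_sem *sem)
{
   return sem->memory >= sem->emitted;
}
```

Keeping a separate counter per direction means a wait in one direction never forces a signal in the other, so the leader and the follower only synchronize when a dependency actually exists.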
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39057>
Change the explanation to use "leader" and "follower" terminology.
Explain better how it is used with GFX/ACE and SDMA/ACE.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39057>
It is enough to compute them after upload.
This saves some disk space and eliminates an unlikely
bug where the shader cache is shared between two GPUs
with the same chip but a different number of enabled CUs.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38970>
GFX10 hangs when drawing from a 0-sized index buffer.
GFX6 has a HW bug when the index buffer address is 0.
Looking at VK CTS runs, GFX6 still triggers VM faults despite the
current mitigation, and it also tries to access memory when the
index buffer is zero sized. So it looks like GFX6 and GFX10
really have the same bug.
Let's share the mitigation between the two.
Use a zero-filled BO instead of the upload buffer.
This fixes VM faults on GFX6, and should speed up GFX10 a bit.
Note that the zero-filled BO is also going to be used for
other bug mitigations on GFX6-7.
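A minimal sketch of the shared mitigation, assuming a hypothetical binding struct and a zero-filled BO whose address is passed in by the caller. The replacement index count of 1 is an assumption made for illustration, not something stated above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of a bound index buffer; not the RADV state. */
struct sketch_index_binding {
   uint64_t va;              /* GPU virtual address */
   uint32_t max_index_count; /* number of indices the HW may fetch */
};

/* Redirect problematic bindings to a zero-filled BO so that the hardware
 * never fetches from address 0 or from a zero-sized buffer. */
static struct sketch_index_binding
sketch_fixup_index_buffer(struct sketch_index_binding in, uint64_t zero_bo_va)
{
   assert(zero_bo_va != 0);

   if (in.va == 0 || in.max_index_count == 0) {
      in.va = zero_bo_va;
      in.max_index_count = 1; /* assumption: one zero index is safe to fetch */
   }

   return in;
}
```

Because the zero-filled BO is allocated once, this path avoids a per-draw upload, which is the likely source of the small GFX10 speedup mentioned above.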
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38958>
This fixes a real issue when ESO uses fbfetch output because this
was determined after instead of before.
This solution isn't the most elegant one but binding graphics shaders
earlier would require more work. Let's just handle this specific corner
case for now.
This fixes
dEQP-VK.renderpasses.dynamic_rendering.primary_cmd_buff.custom_resolve.shader_objects.fragment_region*
on some GPUs.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38617>
Unlike GFX10.3, on GFX11+ VRS override is part of PA_SC_VRS_OVERRIDE_CNTL
which also controls whether the VRS surface is enabled or not. This
new dirty state will allow us to re-emit that state without re-emitting
the complete framebuffer for VRS flat shading.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38527>
Previously, legacy_gs_info was calculated based on
gs_info->legacy_gs_info.esgs_itemsize, which is itself calculated from
the GS input varyings.
However, when using ESO, the VS/TES can have outputs that are not read
by the GS, which leads to underestimating LDS usage.
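The size mismatch can be illustrated with a small arithmetic sketch. The helper names and the slot counts in the usage note are hypothetical; the only fixed quantity is that a vec4 slot of 32-bit components occupies 16 bytes.

```c
#include <assert.h>
#include <stdint.h>

/* One vec4 slot of 32-bit components is 16 bytes. */
#define SKETCH_BYTES_PER_SLOT 16u

/* Per-vertex ES->GS item size for a given number of varying slots. */
static uint32_t
sketch_esgs_itemsize(uint32_t num_slots)
{
   return num_slots * SKETCH_BYTES_PER_SLOT;
}

/* LDS bytes needed to hold the ES outputs of a group of vertices. */
static uint32_t
sketch_esgs_lds_bytes(uint32_t num_slots, uint32_t num_es_verts)
{
   assert(num_es_verts > 0);
   return sketch_esgs_itemsize(num_slots) * num_es_verts;
}
```

For example, if the GS reads 4 slots but an unlinked VS writes 6, sizing from the GS inputs gives 64 bytes per vertex while the VS actually needs 96, so an estimate for 64 vertices would be 4096 bytes instead of 6144.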
Cc: mesa-stable
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38514>
When there are no color outputs in the rendering state, but color write
enable/write aren't masked out (which seems legal with
VK_EXT_dynamic_rendering_unused_attachments), the driver must emit
CB_DISABLE to disable CB rendering completely.
Otherwise, if there is also a depth/stencil attachment in the rendering
state, CB0 is always set to 32_R for RB+. That means the pixel shader
would still export fragments but to the previously bound color
attachment.
VKCTS is missing coverage.
Fixes: 4580293ab2 ("radv: implement RB+ depth-only rendering for better perf")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14319
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38509>
VGT_OUTPRIM_TYPE should be programmed correctly when PointMode is only
set in TCS with ESO.
Fixes dEQP-VK.shader_object.tessellation.hlsl.point_mode.
Fixes: c6d9b9b4e0 ("radv: support more tessellation parameters with TCS for ESO unlinked shaders")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38376>
The Vulkan spec change hasn't been released yet but the VKCTS test
is public, so let's merge the fix to make VKCTS green again locally.
Fixes dEQP-VK.shader_object.tessellation.*.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38209>