To not print a warning about missing SPM by default on < GFX10.
Also move the function to radv_physical_device.c and make it non-static.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39065>
Initialize gang CS on unsupported transfer operations.
Add a wait when:
- SDMA needs to wait for previous transfer operations on ACE
- ACE needs to wait for previous transfer operations on SDMA
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39057>
They will be called from the transfer copy functions.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39057>
We need to use gang semaphores in the following two scenarios:
1. Leader to follower semaphore:
Increment the leader to follower semaphore when the leader wants
to block the follower: a transfer operation on ACE needs to wait
for a previous operation on SDMA.
2. Follower to leader semaphore:
Increment the follower to leader semaphore when the follower wants
to block the leader: a transfer operation on SDMA needs to wait
for a previous operation on ACE.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39057>
Change the explanation to use "leader" and "follower" terminology.
Explain better how it is used with GFX/ACE and SDMA/ACE.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39057>
RADV's transfer queue implementation will use compute for
the transfer operations that aren't supported by the SDMA,
so we'll need gang submissions for that.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39057>
The following are not supported by SDMA:
- Sparse images (aka. PRT) on older GPUs
- Multisampled images
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39057>
When the "gang leader" is SDMA, we need to ensure that the
gang semaphores BO is coherent between SDMA and CP.
To achieve this, we need bypass the L2 cache when either SDMA
or CP are connected to L2.
Suggested-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39057>
The pass now uses the ring descriptors to figure these out.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39032>
This reduces the number of initialized VGPRs by 1 when no barycentric
coordinates are used.
I have verified with zink that this indeed increases performance for
cases where sysvals like frag_coord and front_face are used without
interpolated PS inputs.
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38936>
Because the tracked registers are really driver dependant, the driver
is expected to handle the tracked_registers struct itself.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38740>
It is enough to compute them after upload.
This saves some disk space and eliminates an unlikely
bug where the shader cache is shared between two GPUs
with the same chip but a different number of enabled CUs.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38970>
Fixes the following building error happening with clang:
FAILED: src/amd/vulkan/libvulkan_radeon.so.p/nir_radv_nir_rt_traversal_shader.c.o
...
../src/amd/vulkan/nir/radv_nir_rt_traversal_shader.c:1159:49: error: use of GNU empty initializer extension [-Werror,-Wgnu-empty-initializer]
struct radv_nir_rt_traversal_params params = {};
^
1 error generated.
Fixes: f692ac76 ("radv/rt: Use traversal vars for object origin/direction in ahit/isec")
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38954>
GFX10 hangs when drawing from a 0-sized index buffer.
GFX6 has a HW bug when the index buffer address is 0.
Looking at VK CTS runs, GFX6 still triggers VM faults despite the
current mitigation, and it also tries to access memory when the
index buffer is zero sized. So it looks like GFX6 and GFX10
really have the same bug.
Let's share the mitigation between the two.
Use a zero-filled BO instead of the upload buffer.
This fixes VM faults on GFX6, and should speed up GFX10 a bit.
Note that the zero-filled BO is also going to be used for
other bug mitigations on GFX6-7.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38958>
We should not manipulate the session buffers at command recording time.
It shouldn't cause any problems as these initialized probability tables
are not modified by firmware, but moving these to bind time should be
safer and also faster if an application frequently RESETs.
Reviewed-by: David Rosca <david.rosca@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38926>
It's not good for performance, but it's possible to use for debugging.
Running single-wave GS workgroups could work around any LDS race conditions.
Setting the workgroup size to 64 reliably works around
GLCTS *primitive_counter*line failures, indicating streamout data
corruption with multi-wave GS workgroups.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38328>
Replace the duplicated swapchain image detection pattern across all
Vulkan drivers with the new wsi_common_is_swapchain_image() helper.
Since the swapchain handle can be extracted from VkImageCreateInfo's
pNext chain inside wsi_common_create_swapchain_image(), remove the
now-redundant VkSwapchainKHR parameter from that function.
This removes the #ifdef guards for Android/WSI platforms from each
driver, as the helper now handles this uniformly.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Yonggang Luo <luoyonggang@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38541>
This calculation needs to happen in the same loop as the
geometry/triangle id calculations in case the selected invocation is
before all invocations that were already selected.
Totals from 1269 (15.10% of 8406) affected BVHs:
compacted_size: 137581888 -> 137606464 (+0.02%); split: -0.08%, +0.10%
sah: 6496048424 -> 6496048450 (+0.00%); split: -0.00%, +0.00%
primitive_node_count: 604384 -> 604656 (+0.05%); split: -0.14%, +0.19%
Fixes: c18a7d0 ("radv: Emit compressed primitive nodes on GFX12")
Reviewed-by: Natalie Vock <natalie.vock@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38462>