SQTT stands for SQ Thread Trace but it's shorter.
Note that environment variables aren't renamed because this might
break external applications.
This renames:
- ac_thread_trace_data to ac_sqtt (this is the main struct)
- ac_thread_trace_info to ac_sqtt_data_info
- ac_thread_trace_se to ac_sqtt_data_se
- ac_thread_trace to ac_sqtt_trace (this contains trace only)
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22732>
RGP seems to use that to calibrate timestamps. This also introduces
a new helper to reset thread data between captures.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22594>
This change updates the sqtt layer to add mesh shader specific RGP
instrumentation logic. This should allow RGP to correctly identify GPU
work derived from vkCmdDrawMeshTasksEXT, vkCmdDrawMeshTasksIndirectEXT,
and vkCmdDrawMeshTasksIndirectCountEXT API calls.
This change also updates the mesa-to-RGP shader stage translation logic
to handle the mesh & task stages.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21917>
This introduces tracking of the required semaphore values in pipelines,
which is then propagated to cmd_buffers on bind. Each queue also keeps
track the maximum count it has waited for, so that we can avoid the waiting
overhead once all the shaders are loaded and referenced.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16271>
Following PAL's implementation, this patch avoids allocating shader code
buffers in BAR and use SDMA to upload them to invisible VRAM
directly.
For some games like HZD, shaders can take as much as 400MB, which exceeds
the non-resizable BAR size (256MB) and cause inconsistent spilling
behavior. The kernel will normally move these to invisible VRAM on its own,
but there are a few cases that it does not reliably happen. This patch does
the moving explicitly in the driver to ensure predictable results.
In this patch, we upload the shaders synchronously; so the shader will be
ready as soon as vkCreate*Pipeline returns. A following patch will make
this asynchronous and don't block until we see a use of the pipeline.
As a side effect, when SQTT is used we now store the shaders on a cacheable
buffer which would speed up writing the trace to the disk.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16271>
RGP traces need a dump of shader code in order to display ISA and
instruction trace. Previously, this was read back from GPU at trace
creation time. However, for future changes that implements upload shader
to invisible VRAM, the upload destination will be a temporary staging
buffer and will be only accessible during shader creation.
To allow dumping in such cases, copy the shader code to a separate buffer
at creation time, if thread tracing is enabled.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21513>
This change is wrong for two reasons:
- it hangs most of the time maybe, because changing PSTATE when the
application is running is broken somehow
- it increases the time between triggering and generating the capture
considerably, because there is a delay for changing PSTATE
This restores previous logic where PSTATE is set to profile_peak at
logical device creation. Though, it also re-introduces an issue with
multiple logical devices (kernel returns -EBUSY) but this will be
fixed in the next commit.
This fixes GPU hangs when trying to record RGP captures on my NAVI21.
Note that profile_peak is only required for some RDNA2 chips (including
VanGogh).
Cc: mesa-stable
This reverts commit 923a864d94.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21222>
They aren't executable pipelines and they might not contain all
shader stages.
This fixes a crash when generating RGP captures with GPL.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21235>
Keeping around copies of the BVHs in CPU memory can cause issues with
Applications creating a large amount of acceleration structures (Control).
This commit adds back the old path of copying acceleration structures
while still keeping the deferred, possibly more accurate path around.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20700>
RGP requires shaders to be uploaded consecutively inside the same
buffer object. Otherwise, either it makes the driver generating
huge traces (ie. in GiB) or it fails to load traces at all. Hopefully,
this will be improved soon when AMDGPU drivers will have GPL support.
To workaround this, the driver relocates graphics shaders in the same
buffer object when a pipeline is created. Then at draw time, it
overwrites SPI_SHADER_PGM_xxx registers to make sure SQTT can match
between emitted and exported shaders. It's a bit suboptimal because
graphics shaders are uploaded twice but it's the best solution I found.
This will allow to implement GPL caching without breaking capturing
shaders with RGP.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21078>
If the buffer hasn't been bound to memory yet, we will dereference a
NULL pointer in radv_CreateAccelerationStructureKHR.
cc: mesa-stable
Closes: #8199
Reviewed-by: Friedrich Vock <friedrich.vock@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21019>
This game seems to incorrectly set the render area and since we switched
to full dynamic rendering, the framebuffer dimensions is no longer used.
Forcing the render area to be the framebuffer dimensions restore the
previous logic and it fixes rendering issues.
Fixes: c7d0d328d5 ("radv: Set the window scissor to the render area, not framebuffer")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20900>
Every layer has its own dispatch table that it can use to call down the
layer stack. This allows us to use RRA and RGP tracing simultaneously.
Using application layers with tracing should work as well.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20439>
Delete radv_CmdDispatch, radv_CmdSetDeviceMask, and
radv_GetDeviceGroupPeerMemoryFeatures so that the vk_common_*
versions will be used instead. This will avoid repeated code.
Signed-off-by: Rebecca Mckeever <rebecca.mckeever@collabora.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20218>
Such cursed behavior is almost non existent in practise. When capturing
a Doom Eternal, this warning spams the output for no reason.
The warning is also unnecessary since we copy acceleration structures
right after building them now.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20047>
AMDGPU pstate is per-device, not per Vulkan logical devices. The same
AMDGPU device is shared accross logical devices because the driver
creates only one winsys per fd. The kernel only allows one context
at a time per AMDGPU device, otherwise it returns -EBUSY.
Fixes this by acquiring pstate on-demand to avoid this multiple
logical device problem.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Tatsuyuki Ishi <ishitatsuyuki@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17712>
Gets rid of a bit of code and fixes the RRA accel_struct_vas table if
the BO is freed before vkDestroyAccelerationStructureKHR is called.
Signed-off-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18530>
When validating a BVH, rra_validate_node uses _mesa_hash_table_u64_search to lookup, whether a BLAS pointer is valid. Since _mesa_hash_table_u64_search returns the data field of the found entry, we need to populate it. Otherwise, the NULL-check won't work.
Fixes: 5749806 ("radv: Add Radeon Raytracing Analyzer trace dumping utilities")
Signed-off-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Friedrich Vock <friedrich.vock@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18530>
Also, update list of expected failures.
dEQP-VK.image.sample_texture.*_bit_compressed_format_two_samplers_*
now reliably pass on Polaris10 (GFX8) and Pitcairn (GFX6).
Stoney has new failures but given there is already a lot of
depth/stencil resolve failures, we shouldn't worry about them.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15587>
This reports RT commands like vkCmdTraceRaysKHR and
vkCmdBuildAccelerationStructuresKHR in RGP.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18496>
They aren't supported and lead to GPU hangs.
Reported-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17256>
If we introduce another queue type (video decode) we can have a
disconnect between the RADV_QUEUE_ enum and the API queue_family_index.
currently the driver has
GENERAL, COMPUTE, TRANSFER which would end up at QFI 0, 1, <nothing>
since we don't create transfer.
Now if I add VDEC we get
GENERAL, COMPUTE, TRANSFER, VDEC at QFI 0, 1, <nothing>, 2
or if you do nocompute
GENERAL, COMPUTE, TRANSFER, VDEC at QFI 0, <nothing>, <nothing>, 1
This means we have to add a remapping table between the API qfi
and the internal qf.
This patches tries to do that, in theory right now it just adds
overhead, but I'd like to exercise these paths.
v2: add radv_queue_ring abstraction, and pass physical device in,
as it makes adding uvd later easier.
v3: rename, and drop one direction as unneeded now, drop queue_family_index
from cmd_buffers.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13687>
This is incorrect, it will calls the function recursively.
It seems there is random build failures with common runtime code
and MSVC but that's a different issue.
Closes: #5815
Fixes: 46c59e8fd6 ("radv: Remove dependencies on vk_common entrypoints.")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14374>