Make sure to take the db alignment into account when sizing the underlying
images.
Fixes a 360p sample from Lynne.
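A minimal sketch of the sizing rule described above, not the driver's actual
code. The helper names, the `sketch_extent` struct, and the 64-pixel alignment
used in the usage note are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: round value up to a power-of-two alignment. */
static uint32_t
sketch_align_pot(uint32_t value, uint32_t alignment)
{
   /* The bit trick below is only valid for power-of-two alignments. */
   assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
   return (value + alignment - 1) & ~(alignment - 1);
}

/* Hypothetical extent type for the backing image. */
struct sketch_extent {
   uint32_t width;
   uint32_t height;
};

/* Size the underlying image from the coded size plus the alignment
 * requirement, instead of using the coded size directly. */
static struct sketch_extent
sketch_aligned_extent(uint32_t width, uint32_t height,
                      uint32_t align_w, uint32_t align_h)
{
   struct sketch_extent e;
   e.width = sketch_align_pot(width, align_w);
   e.height = sketch_align_pot(height, align_h);
   return e;
}
```

With an assumed 64-pixel alignment, a 640x360 frame would need a 640x384
backing image, which is the kind of case a 360p sample exercises.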
Cc: mesa-stable
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Lynne <dev@lynne.ee>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25168>
I don't know if these can be done properly, but for now just don't
emit the standard cp stuff since it hangs the GPU.
"Fixes" dEQP-VK.video.synchronizat*
Cc: mesa-stable
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25170>
The third parameter of PKT3 is the predicate bit and this was wrong.
PAL sets the RESET_FILTER_CAM bit when emitting SQTT userdata.
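To illustrate why passing the wrong third argument matters, here is a hedged
sketch of how a type-3 packet header might be packed. The `SKETCH_*` macros
are local stand-ins, not the driver's definitions; the field positions
(including RESET_FILTER_CAM at bit 2) and the example opcode are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed field layout of a type-3 packet header (illustrative only). */
#define SKETCH_PKT_TYPE(x)              (((uint32_t)(x) & 0x3) << 30)
#define SKETCH_PKT_COUNT(x)             (((uint32_t)(x) & 0x3FFF) << 16)
#define SKETCH_PKT3_OPCODE(x)           (((uint32_t)(x) & 0xFF) << 8)
#define SKETCH_PKT3_RESET_FILTER_CAM(x) (((uint32_t)(x) & 0x1) << 2)
#define SKETCH_PKT3_PREDICATE(x)        (((uint32_t)(x) & 0x1) << 0)

/* The third parameter is the predicate bit and nothing else. */
static uint32_t
sketch_pkt3(uint32_t opcode, uint32_t count, uint32_t predicate)
{
   assert(predicate <= 1);
   return SKETCH_PKT_TYPE(3) | SKETCH_PKT_COUNT(count) |
          SKETCH_PKT3_OPCODE(opcode) | SKETCH_PKT3_PREDICATE(predicate);
}

/* Extra header bits are OR'ed in explicitly rather than being smuggled
 * through the predicate argument. */
static uint32_t
sketch_pkt3_reset_filter_cam(uint32_t opcode, uint32_t count)
{
   return sketch_pkt3(opcode, count, 0) | SKETCH_PKT3_RESET_FILTER_CAM(1);
}
```

Under these assumptions, a non-zero value in the third argument would turn
the packet into a predicated one instead of setting the intended bit.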
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25158>
In order to compile monolithic shaders with pipeline libraries, we need
to keep the NIR around for inlining recursive stages.
Reviewed-by: Friedrich Vock <friedrich.vock@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21929>
VK_PIPELINE_CREATE_2_RETAIN_LINK_TIME_OPTIMIZATION_INFO_BIT_EXT is only
allowed for pipeline libs, so VK_PIPELINE_CREATE_2_LIBRARY_BIT_KHR
should also be set.
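The rule above can be sketched as a small flag check. The `SKETCH_*`
constants are local stand-ins for the two Vulkan flags and their numeric
values are assumptions; real code should use the definitions from the Vulkan
headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for the two pipeline create flags (values assumed). */
#define SKETCH_CREATE_LIBRARY_BIT    UINT64_C(0x00000800)
#define SKETCH_CREATE_RETAIN_LTO_BIT UINT64_C(0x00800000)

/* RETAIN_LINK_TIME_OPTIMIZATION_INFO is only meaningful on a library. */
static bool
sketch_flags_valid(uint64_t flags)
{
   if (flags & SKETCH_CREATE_RETAIN_LTO_BIT)
      return (flags & SKETCH_CREATE_LIBRARY_BIT) != 0;
   return true;
}

/* Add the library bit whenever the retain bit is requested. */
static uint64_t
sketch_fix_flags(uint64_t flags)
{
   if (flags & SKETCH_CREATE_RETAIN_LTO_BIT)
      flags |= SKETCH_CREATE_LIBRARY_BIT;
   assert(sketch_flags_valid(flags));
   return flags;
}
```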
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25110>
Instead of failing the copy we can use multiple chunks.
This codepath shouldn't really be used since the source
image should usually be tiled, but it is still better not to
fail when possible.
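The general idea of splitting one oversized copy into several smaller ones
can be sketched as follows. This is a generic illustration, not the code from
this change: the byte-based units, the `sketch_*` names, and the notion of a
single maximum chunk size are all assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of one sub-copy. */
struct sketch_chunk {
   uint64_t offset;
   uint64_t size;
};

/* Number of chunks needed to cover total_size (rounding up). */
static uint64_t
sketch_chunk_count(uint64_t total_size, uint64_t max_chunk_size)
{
   assert(max_chunk_size > 0);
   return total_size / max_chunk_size + (total_size % max_chunk_size != 0);
}

/* The index-th chunk; only the last one may be shorter than the maximum. */
static struct sketch_chunk
sketch_chunk_at(uint64_t total_size, uint64_t max_chunk_size, uint64_t index)
{
   assert(index < sketch_chunk_count(total_size, max_chunk_size));
   struct sketch_chunk c;
   c.offset = index * max_chunk_size;
   c.size = total_size - c.offset;
   if (c.size > max_chunk_size)
      c.size = max_chunk_size;
   return c;
}
```

A caller would loop from 0 to `sketch_chunk_count()` and emit one copy per
chunk, rather than rejecting a copy that exceeds the per-copy limit.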
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24771>
Some RGP data shows that a large number of NOPs might be a performance
concern.
Some data from a Granite demo repurposed as a benchmark:
- with max_count = 16, actual draw count 1-4, the new path is ~5% slower.
- with max_count = 2048, actual draw count 1-4, the new path is >2x as fast.
- with max_count = 16384, actual draw count 1-4, the new path is >7x as fast.
Because the new path is slower in e.g. small cmdbuffers, I added a heuristic.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25046>
Starfield has a lot of empty ExecuteIndirect() calls. This optimizes
them by using the indirect sequence count as predicate.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25060>
cs->base.cdw here is the size of the last CS in the chain, but we are
passing in the first CS in the chain to begin decoding. Hence,
cs->ib_buffers[0].cdw is the correct size here.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25061>
SAMPLE_STREAMOUTSTATS requires PIPELINESTAT_START to be enabled,
otherwise the hw doesn't count anything.
This fixes
dEQP-VK.transform_feedback.primitives_generated_query.concurrent.pipeline_statistics_2.*
on GFX8. GFX6-9 are probably also affected by this bug, but with NGG
these queries are slightly different and don't use legacy streamout.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25049>
Add a driconf to force the swapchain size to match
`VkSurfaceCapabilities2KHR::currentExtent` as a workaround for
misbehaving games.
Fixes: 6139493ae3 ("vulkan/wsi: return VK_SUBOPTIMAL_KHR for sw/x11 on window resize")
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24818>
Otherwise, if a secondary CS is grown and then executed without IB2,
the INDIRECT_BUFFER packet would have been copied when it shouldn't have been.
This fixes a regression that introduced GPU hangs with
gl_vk_meshlet_cadscene on RDNA2.
Fixes: df0c742543 ("radv/amdgpu: rework growing a CS with the chained IB path slightly")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24891>
If a secondary cmdbuf has been grown and is executed without IB2
(e.g. on a compute queue or when it's not allowed), the ib size ptr
contains chaining info, which means the IB size was wrong.
This fixes CPU crashes when running gl_vk_meshlet_cadscene.
Fixes: 277b2afd70 ("radv/amdgpu: add support for executing DGC cmdbuf with RADV_DEBUG=noibs")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24891>
Looks like indirect dispatches require an event marker instead of an
event marker with dims. That makes sense given that the block size
is not known at record time with indirect dispatches.
This allows RGP to report correct block sizes.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24994>
It's not disallowed by spec to load instance-related data in case of a
miss where no instance was ever visited. Such loads make no sense, so we
can return garbage, but it mustn't hang the GPU. Initialize the instance
addresses to the TLAS base to make sure we always have valid memory to load from.
Partially fixes GPU hangs in RTX Remix games.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24971>
amdgpu_create_bo_from_user_mem() may fail for multiple reasons.
Only return VK_ERROR_INVALID_EXTERNAL_HANDLE if the kernel
returned EINVAL, which indicates a bad input parameter.
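The error mapping described above could look roughly like the sketch below.
The `sketch_result` enum is a local stand-in for the Vulkan result codes, and
the choice of a generic "unknown" error for every non-EINVAL failure is an
assumption, since the text does not say what the other cases return.

```c
#include <assert.h>
#include <errno.h>

/* Local stand-ins for the Vulkan result codes (illustrative only). */
enum sketch_result {
   SKETCH_SUCCESS,
   SKETCH_ERROR_INVALID_EXTERNAL_HANDLE,
   SKETCH_ERROR_UNKNOWN,
};

/* r follows the kernel convention: 0 on success, negative errno on failure. */
static enum sketch_result
sketch_map_user_mem_error(int r)
{
   assert(r <= 0);

   if (r == 0)
      return SKETCH_SUCCESS;

   /* Only a bad input parameter means the host pointer itself is invalid. */
   if (r == -EINVAL)
      return SKETCH_ERROR_INVALID_EXTERNAL_HANDLE;

   /* Assumption: any other failure is reported as a generic error. */
   return SKETCH_ERROR_UNKNOWN;
}
```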
Signed-off-by: Simon Ser <contact@emersion.fr>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24858>
Otherwise, if e.g. PSIZ is exported, the ESGS stride is wrong. This isn't
optimal yet but let's start with this to support separate compilation
of VS/TCS/TES/GS correctly first.
This fixes a bunch of issues when forcing separate compilation on RDNA2.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24908>