It's not possible when they aren't all inlined because they need to be
copied to the upload BO and the DGC shader also copies the ones that
come from the indirect layout.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25935>
With DGC, push constants can be set from the cmdbuf (CmdPushConstants())
or from the indirect layout. Instead of always emitting inlined push
constants from the DGC shader, just update the ones that come from the
indirect layout and rely on cmdbuf updates for the other ones.
With that, it should be possible to preprocess push constants with
graphics when all can be inlined in shaders.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25935>
No change in behavior. The previous overalignment is preserved.
It sets ib_pad_dw_mask sooner.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25578>
This re-introduces "radv: fix alignment of DGC command buffers" and
"radv/amdgpu: fix alignment of command buffers" which were valid
changes.
IBs need to be aligned to the IB size requirement, not the number of
padded NOPs.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25588>
Otherwise, DGC command buffers might not be correctly aligned.
This fixes a regression with the vkd3d-proton DGC tests.
Fixes: 4f660f9937 ("ac/gpu_info: pad IBs according to ib_size_alignment")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25500>
Some RGP data showing that a large amount of NOPs might be a performance
concern.
Some data from a Granite demo repurposed as benchmark:
- with max_count = 16, actual draw count 1-4, the new path is ~5% slower
- with max_count = 2048, actual draw count 1-4, the new path is >2x as fast.
- with max_count = 16384, actual draw count 1-4, the new path is >7x as fast.
Due to the new path being slower in e.g. small cmdbuffers I added a heuristic.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25046>
The preprocess buffer is the buffer used to generate the cmdbuf. It
was aligned to 256 bytes but the correct alignment is actually
ac_gpu_info::ib_alignment.
Otherwise, if a DGC IB is executed like a IB1, this hits an assertion
in radv_amdgpu_cs_submit() because the alignment is incorrect.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23764>
For emitting VK_INDIRECT_COMMANDS_TOKEN_TYPE_STATE_FLAGS_NV.
The scissor workaround for GFX9 is only needed if the state is emitted,
so move it there as well.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23584>
Via Coccinelle patches
@@
expression a, b, c;
@@
-nir_channels(b, a, (1 << c) - 1)
+nir_trim_vector(b, a, c)
@@
expression a, b, c;
@@
-nir_channels(b, a, BITFIELD_MASK(c))
+nir_trim_vector(b, a, c)
@@
expression a, b;
@@
-nir_channels(b, a, 3)
+nir_trim_vector(b, a, 2)
@@
expression a, b;
@@
-nir_channels(b, a, 7)
+nir_trim_vector(b, a, 3)
Plus a fixup for pointless trimming an immediate in RADV and radeonsi.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23352>
Need to explicitly clamp the indirect count against maxSequencesCount,
or we risk writing bogus commands into spill region.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23252>
radv_get_user_sgpr() no longer relies on radv_pipeline which is
another step for moving the shaders array outside of it.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21878>
Instead, check if the cache is the meta shader cache. This catches the
shaders created by radv_create_radix_sort_u64().
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21606>
Delete radv_CmdDispatch, radv_CmdSetDeviceMask, and
radv_GetDeviceGroupPeerMemoryFeatures so that the vk_common_*
versions will be used instead. This will avoid repeated code.
Signed-off-by: Rebecca Mckeever <rebecca.mckeever@collabora.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20218>