This has been tested again with vkoverhead on 4 different CPUs and
using 32-bit counters is the fastest combination overall.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34229>
A begin/end sequence is something like (it's all macros based):
radeon_begin(cs);
radeon_emit(PKT3(PKT3_DRAW_INDEX_AUTO, 1, cmd_buffer->state.predicating));
radeon_emit(vertex_count);
radeon_emit(V_0287F0_DI_SRC_SEL_AUTO_INDEX | use_opaque);
radeon_end();
This is loosely based on RadeonSI (see !8653 (a0978fff)) and it seems
indeed faster overall.
The main goal of this rework is to re-use the same logic as RadeonSI
for paired packets on GFX12 (also GFX11 dGPUs) because it's supposed
to be way faster, especially on GFX12 where the CP is slow. The other
goal is to share more cmdbuf emission between both drivers in the near
future.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34229>
This is the base helper for emitting packets, it will be much more
closer to the new command buffer recording mecanishm if we decide to
use the same helpers as RadeonSI.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29192>
We already track a couple of registers per cmdbuf and this introduces
a generic mechanism, instead of having a bunch of last_xxx fields.
Loosely based on RadeonSI.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28644>
The vast majority of AMD GPUs (except the very first GCN) have
the same SDMA packet format, so let's just call it SDMA instead
of CIK_SDMA.
(And leave the oldest GPUs with SI_SDMA as they are now.)
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Tatsuyuki Ishi <ishitatsuyuki@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26110>
Create a version of this function that takes a CS and queue family.
move it to radv_cs.h so it can be called from multiple other files.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25770>
The third parameter of PKT3 is the predicate bit and this was wrong.
PAL sets the RESET_FILTER_CAM bit when emitting SQTT userdata.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25158>
The check for max_dw means that none of checks triggered reliably
when we had an issue. Use a stricter reserved dw measure to increase
the probability of catching issues.
Adds a radeon_check_space to some places after cs_create as they
previously relied on the min. cs size, but that would still trigger
the checks.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20152>
Instead of having ac_set_reg_cu_en that sets the register, replace it with
ac_apply_cu_en that only returns the modified register value,
which allows a large simplification in both drivers because a lot of code
becomes duplicated after it's switched to ac_apply_cu_en.
RADV also didn't apply it to a few registers. Fixed.
This removes 82 lines of code in total.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21641>
RMW context registers have been removed in RadeonSI a while ago
because they don't seem good for performance.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18234>
For emitting RMW packets in the command stream. This new helper
will be useful for implementing extended dynamic states to only
overwrite the fields that need to be updated instead of storing
more values in the pipeline.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4531>
Using the wrong bounds
Fixes: "219d6939df8 radv: add more assertions to make sure packets are correctly emitted"
Reviewed-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
This is ported from AMDVLK, it's probably not requires unless
we want to use "real time queues", but it might be nice to just have
in place.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>