Commit graph

523 commits

Author SHA1 Message Date
Pierre-Eric Pelloux-Prayer
df1224c8b2 radv: rework VM_ALWAYS_VALID handling
Instead of assuming that VM_ALWAYS_VALID is always available,
make its use conditionnal on its support.

This allows to remove the virtio nctx special case (where
VM_ALWAYS_VALID is only possible with virtio for buffers that
also have the NO_CPU_ACCESS flag since CPU access is implemented
through dmabuf on the host).

Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Acked-by: Rob Clark <robdclark@chromium.org>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34470>
2025-06-27 08:15:50 +00:00
Pierre-Eric Pelloux-Prayer
999d5098b4 radv/virtio: support vpipe
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Acked-by: Rob Clark <robdclark@chromium.org>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34470>
2025-06-27 08:15:50 +00:00
Pierre-Eric Pelloux-Prayer
5d63d2fb04 ac/drm: replace direct ioctl calls by util_sync_provider
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Acked-by: Rob Clark <robdclark@chromium.org>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34470>
2025-06-27 08:15:50 +00:00
Eve
f4ad6e6d4a radv: add RADV_PERFTEST option to turn off gtt spilling
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/8107
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35355>
2025-06-17 06:46:27 +00:00
Samuel Pitoiset
25eb836eec radv: fix CP DMA with NULL PRT pages on GFX8-9
On GFX8-9 (starting from Polaris10), CP DMA is broken with NULL PRT
pages. It doesn't read 0 and doesn't discard writes which can cause
GPU hangs.

Fix that by always using the compute path when a BO is sparse.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12828
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35071>
2025-05-21 09:41:23 +00:00
Konstantin Seurer
84b9c281fe radv: Return VK_ERROR_INCOMPATIBLE_DRIVER for unsupported devices
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
VK_ERROR_INITIALIZATION_FAILED will fail physical device enumeration.
Returning VK_ERROR_INCOMPATIBLE_DRIVER means that the driver can still
be used on supported GPUs when multiple GPUs are installed.

cc: mesa-stable

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34783>
2025-05-07 08:26:33 +02:00
Paul Gofman
96765935e8 radv/amdgpu: Fix hash key in radv_amdgpu_winsys_destroy().
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34774>
2025-05-02 07:51:23 +00:00
Samuel Pitoiset
410f7f9f6e radv: only enable DCC for invisible VRAM on GFX12
DCC should only be allowed on invisible VRAM, otherwise the CPU could
read the data and it will read garbage if it's compressed.

This also caused GPU hangs after suspend/resume probably because
some buffers were compressed when moved back from GTT to VRAM.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12962
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12922
Fixes: 9af11bf306 ("radv: add initial DCC support on GFX12")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34347>
2025-04-14 07:39:33 +00:00
Samuel Pitoiset
042770ceea ac,radv: remove has_scheduled_fence_dependency
This isn't used.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34375>
2025-04-07 06:44:22 +00:00
Samuel Pitoiset
f0b3a6f9d4 radv: rework command buffer emission with begin/end sequences
A begin/end sequence is something like (it's all macros based):

   radeon_begin(cs);
   radeon_emit(PKT3(PKT3_DRAW_INDEX_AUTO, 1, cmd_buffer->state.predicating));
   radeon_emit(vertex_count);
   radeon_emit(V_0287F0_DI_SRC_SEL_AUTO_INDEX | use_opaque);
   radeon_end();

This is loosely based on RadeonSI (see !8653 (a0978fff)) and it seems
indeed faster overall.

The main goal of this rework is to re-use the same logic as RadeonSI
for paired packets on GFX12 (also GFX11 dGPUs) because it's supposed
to be way faster, especially on GFX12 where the CP is slow. The other
goal is to share more cmdbuf emission between both drivers in the near
future.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34229>
2025-04-01 06:18:28 +00:00
Rhys Perry
0619cc45b7 radv/winsys: set has_distributed_tess for null winsys
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33978>
2025-03-26 20:52:53 +00:00
Rhys Perry
ee0be147b9 radv/winsys: set gart_page_size for null winsys
Fixes assertion failure when initializing memory types for devices without
dedicated vram.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33978>
2025-03-26 20:52:53 +00:00
Rhys Perry
4632ca258b radv/winsys: increase gfx12 vgprs for null winsys
LLVM has Feature1_5xVGPRs for both gfx1200 and gfx1201.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33978>
2025-03-26 20:52:53 +00:00
Bas Nieuwenhuizen
61feea6954 radv: Move support check out of winsys.
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
To get the right error code. Mostly shouldn't be winsys dependent
anyway, outside of the idea that if we explicitly emulate a device
we should just assume th euser knows what they're doing.

Fixes: c942d957b0 ("radv: fail to initialize when the AMD GPU generation is unsupported")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12792
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33964>
2025-03-14 23:18:13 +00:00
Samuel Pitoiset
b8e3f66328 radv/winsys: enable has_timeline_syncobj for the null winsys
For testing the dedicated sparse queue drirc with the null winsys.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33875>
2025-03-12 09:07:16 +00:00
Samuel Pitoiset
c627097841 radv/amdgpu: fix device deduplication
To correctly deduplicate device inside the winsys, it should use the
fd or amdgpu_device_handle. Using the allocated ac_drm_device as key
is obviously broken.

Not deduplicating devices breaks memory budget and a bunch of games
were broken.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12686
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12775
Fixes: a565f2994f ("amd: move all uses of libdrm_amdgpu to ac_linux_drm")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34005>
2025-03-11 22:35:46 +00:00
Samuel Pitoiset
01f92acf10 radv/winsys: use real info for GFX12 in the null winsys
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33970>
2025-03-11 06:50:49 +00:00
Ivan Avdeev
ff6504d4c0 radv: add experimental support for AMD BC-250 board
AMD BC-250 is a mining board based on an AMD APU with an integrated GPU
that kernel recognizes as Cyan Skillfish.

It is basically RDNA1/GFX10, but with added hardware ray tracing
support. LLVM calls it GFX1013, see
https://llvm.org/docs/AMDGPU/AMDGPUAsmGFX1013.html

Support for this GPU hasn't been extensively tested. Some games are
known to work, some non-trivial ray query compute and ray tracing
pipeline rendering works too. Q2RTX works.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33116>
2025-03-04 08:07:31 +00:00
Julia Zhang
313aa44bf1 radv: add obj_id to radeon_winsys_bo
mem->bo->obj_id will be used by device memory report.

Signed-off-by: Julia Zhang <julia.zhang@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33088>
2025-03-03 08:26:51 +00:00
Samuel Pitoiset
9993f3dd6a ac,radv,radeonsi: add new GFX12_DCC_WRITE_COMPRESS_DISABLE tiling flag
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33301>
2025-02-03 21:12:07 +00:00
Timur Kristóf
e1be943f10 ac/nir/ngg: Add and use a has_ngg_passthru_no_msg field to ac_gpu_info.
Instead of using the chip family field.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33218>
2025-01-30 15:26:45 +00:00
Timur Kristóf
a40000b85b ac/nir/ngg: Add and use a has_ngg_fully_culled_bug field to ac_gpu_info.
Better than applying the workaround ad-hoc based on GFX level.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33218>
2025-01-30 15:26:45 +00:00
Timur Kristóf
cad0d26dbf ac/nir/ngg: Add and use a has_attr_ring field to ac_gpu_info.
While theoretically all GFX11+ GPUs have an attribute ring, it is
nicer to have this property instead of deciding ad-hoc based on
the GFX level.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33218>
2025-01-30 15:26:45 +00:00
Timur Kristóf
b163ce51b1 ac/nir/ngg: Add and use a has_attr_ring_wait_bug field to ac_gpu_info.
And apply the attribute ring wait workaround based on the new field.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33218>
2025-01-30 15:26:45 +00:00
Samuel Pitoiset
d6f9c19755 radv/amdgpu: add support for AMDGPU_GEM_CREATE_GFX12_DCC
This flags will be used to set PTE.DCC to VRAM allocations
(ie. compression).

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33284>
2025-01-30 08:18:22 +00:00
Samuel Pitoiset
ee4a1021d1 radv: add support for BO metadata on GFX12
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33193>
2025-01-25 02:10:02 +00:00
Rhys Perry
5cc977bee4 radv: set has_image_bvh_intersect_ray for null winsys
This is needed for fossilize-replay.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Fixes: 14e3231b56 ("radv: add a flag to indicate ray tracing support")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33201>
2025-01-24 18:08:12 +00:00
Samuel Pitoiset
ede0d534ef radv: add GFX12 support to the null winsys
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33063>
2025-01-17 10:51:49 +00:00
Samuel Pitoiset
4526f2692e radv: mark AMD CDNA as unsupported
No access to the hw and likely broken.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33031>
2025-01-17 01:30:16 +00:00
Samuel Pitoiset
c942d957b0 radv: fail to initialize when the AMD GPU generation is unsupported
Better to be conservative than allowing something that isn't supposed
to be working.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33031>
2025-01-17 01:30:16 +00:00
Pierre-Eric Pelloux-Prayer
1278d5286c radeonsi, radv, virtio: use AMDGPU_GEM_CREATE_VIRTIO_SHARED
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Reviewed-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21658>
2025-01-16 12:24:33 +00:00
Pierre-Eric Pelloux-Prayer
6b6340fea1 radv/virtio: disable syncobj timeline support
The virtio-gpu guest kmd reports timelines as supported, so querying
DRM_CAP_SYNCOBJ_TIMELINE as vk_drm_syncobj_get_type() does will return
true.

The native context code on the other hand doesn't support timelines,
and support is disabled in the "ac/virtio: disable timeline_syncobj support"
commit.

Fix the inconsistency by manually disabling timeline support when
info.has_timeline_syncobj is false.

Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Reviewed-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21658>
2025-01-16 12:24:33 +00:00
Pierre-Eric Pelloux-Prayer
9de728c4d0 radv: enable virtio native context support
No big code changes are needed to support virtio. The main caveat with
radv is about buffer allocations.

Allocating a cpu visible buffer requires the host process (eg qemu) to
create a dmabuf, mmap it, then map the host CPU address into the guest
application CPU address space.

The first issue is about the number of dmabuf created because we
might hit the number of open file limit. The host process limit can
be raised but we would hit the second issue - at least on qemu:
there's a limit on how many sections can be mapped and ultimately
we hit this assert:

   assert(map->sections_nb < TARGET_PAGE_SIZE);

(the third issue is a performance one: these operations have a cost,
and this increases some Vulkan app loading times)

radeonsi is not really affected because it's using pb_slab to
suballocate small buffers from larger ones.
radv on the other hand doesn't, so if an app decides to allocate
lots of cpu visible small buffers, we're likely to fail.

Earlier versions of the amdgpu nctx code had a suballoctor, but it
was removed to simplified the code. It could be restored later; or
radv could be modified to use a suballocator (like anv does AFAICT).

Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Reviewed-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21658>
2025-01-16 12:24:32 +00:00
Pierre-Eric Pelloux-Prayer
22263616ed amd: amdgpu-virtio implementation
Native context support is implemented by diverting the libdrm_amdgpu functions
into new functions that use virtio-gpu.
VA allocations are done directly in the guest, using newly exposed libdrm_amdgpu
helpers (retrieved through dlopen/dlsym).

Guest <-> Host roundtrips can be expensive so we try to avoid them as much as
possible. When possible we also don't wait for the host reply in case where
it's not needed to get correct result.

Implicit sync works because virtio-gpu commands are submitted in order to the
host (there a single queue per device, shared by all the guest processes).

virtio-gpu also only supports one context per file description (but multiple
file descriptions per process) while amdgpu only allows one fd per process,
but multiple contexts per fd. This causes synchronization problems, because
virtio-gpu drops all sync primitive if they belong to the same fd/context/ring:
ie the amdgpu_ctx can't be expressed in virtio-gpu terms.

For now the solution is to only allocate a single amdgpu_ctx per application.

Contrary to radeonsi/radv, amdgpu_virtio can use libdrm_amdgpu directly: the
ones that don't rely on ioctl() are safe to use here.

Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Reviewed-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21658>
2025-01-16 12:24:32 +00:00
Pierre-Eric Pelloux-Prayer
a565f2994f amd: move all uses of libdrm_amdgpu to ac_linux_drm
This is required to implement virtio native-context.

In a virtualized environment, most of the functions provided
by libdrm_amdgpu will be implemented using virtio.
This allows to implement efficient virtualization, by
forwarding the kernel API to the host, instead of the GL/VK calls.

Similarly, the raw 'fd' or 'gem_handle' arguments are replaced
by opaque types. This allows to encapsulate all the needed
state in the handle, and use unmodified API between baremetal
and virtualized contexts.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21658>
2025-01-16 12:24:32 +00:00
David Rosca
96cb12ac68 radv/amdgpu: Set VCN version for ac_parse_ib
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32760>
2024-12-27 08:17:16 +00:00
Samuel Pitoiset
64101baecf radv: promote VK_KHR_global_priority to core 1.4 API
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32432>
2024-12-03 10:21:54 +00:00
Hans-Kristian Arntzen
cb15b34295 radv/winsys: Report VA mappings in bo_log too.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32146>
2024-11-29 12:57:42 +00:00
Marek Olšák
a3516dafc9 util,amd: add inlinable versions of drmIoctl/drmCommandWrite*
The reason for this is to inline those calls in drivers.
They are very trivial, so why not.

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32067>
2024-11-26 00:16:02 -05:00
Marek Olšák
049641ca54 amd: import libdrm_amdgpu ioctl wrappers
This imports 35 libdrm_amdgpu functions into Mesa.

The following 15 functions are still in use:
   amdgpu_bo_alloc
   amdgpu_bo_cpu_map
   amdgpu_bo_cpu_unmap
   amdgpu_bo_export
   amdgpu_bo_free
   amdgpu_bo_import
   amdgpu_create_bo_from_user_mem
   amdgpu_device_deinitialize
   amdgpu_device_get_fd
   amdgpu_device_initialize
   amdgpu_get_marketing_name
   amdgpu_query_sw_info
   amdgpu_va_get_start_addr
   amdgpu_va_range_alloc
   amdgpu_va_range_free

We can't import them because they make sure that we only use 1 VMID
per process shared by all APIs. (except the marketing name)

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32067>
2024-11-25 21:03:41 -05:00
Konstantin Seurer
0963a0a2b4 radv: Move ac_addrlib to the physical device
There is nothing amdgpu specific here so this does not need to be
abstracted away. max_alignment also is not used in winsys code.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31643>
2024-10-28 20:06:38 +00:00
Samuel Pitoiset
2643c48700 radv/amdgpu: remove unused code about external IBs in the submit path
Now that everything is chained, the driver no longer uses external IBs.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30809>
2024-10-10 14:08:39 +00:00
Samuel Pitoiset
d686ba36a9 radv/amdgpu: simplify cs_execute_ib()
It's only used for executing IB2 on GFX.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30809>
2024-10-10 14:08:39 +00:00
Samuel Pitoiset
c1b2cb6ef7 radv: implement IB chaining for DGC when it's executed on compute
The IB2 packet is only supported on the graphics queue. To execute DGC
IB on compute, the previous solution was to submit it separately
without any chaining. Though this solution was incomplete because it's
easy to reach the maximum number of IBs per submit when there is a lot
of ExecuteIndirect() calls.

To fix that, the proposed solution is to implement DGC IB chaining when
it's executed on the compute only. The idea is to add a trailer that is
added at the beginning of the DGC IB (to know the offset). This trailer
is used to chain back back the DGC IB to a normal CS, it's patched at
execution time. Patching is fine because it's not allowed to execute
the same DGC IB concurrently and the entire solution relies on that.

When the DGC IB is executed on graphics, the trailer isn't patched and
it only contains NOPs padding. Performance should be mostly similar.

This fixes
dEQP-VK.dgc.nv.compute.misc.execute_many_*_primary_cmd_compute_queue.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30809>
2024-10-10 14:08:39 +00:00
Samuel Pitoiset
e76a26579a radv/amdgpu: add assertions to check the IB size
This can be triggered with DGC if the maximum number of sequences count
is too high. Luckily, vkd3d-proton doesn't do that, but it should be
fixed for EXT DGC.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31464>
2024-10-02 08:44:47 +00:00
Samuel Pitoiset
d1f3a92671 radv/amdgpu: do not use a constant value for the IB size in dwords
Better to avoid magic number.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31464>
2024-10-02 08:44:47 +00:00
Samuel Pitoiset
a7547a9781 radv/amdgpu: assert that the DGC IB VA is correctly aligned
It must be aligned to what the kernel returns.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30768>
2024-08-26 08:22:06 +00:00
Samuel Pitoiset
28c957409f radv/amdgpu: do not check that a CS is aligned if no padding is added
Some video queues don't require padding.

Fixes: d5efbc7f1c ("radv/amdgpu: fix CS padding for non-GFX/COMPUTE queues")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30795>
2024-08-23 13:48:51 +00:00
Samuel Pitoiset
d5efbc7f1c radv/amdgpu: fix CS padding for non-GFX/COMPUTE queues
I forgot that SDMA and VIDEO existed somehow.

Fixes: d690f293c6 ("radv/winsys: pad gfx and compute IBs with only one NOP")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30769>
2024-08-22 11:10:29 +00:00
Samuel Pitoiset
d690f293c6 radv/winsys: pad gfx and compute IBs with only one NOP
1-dword NOPs are slow and it's better to emit a sized NOP packet when
possible.

Based on RadeonSI.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30743>
2024-08-21 14:55:04 +00:00