Instead of assuming that VM_ALWAYS_VALID is always available,
make its use conditionnal on its support.
This allows to remove the virtio nctx special case (where
VM_ALWAYS_VALID is only possible with virtio for buffers that
also have the NO_CPU_ACCESS flag since CPU access is implemented
through dmabuf on the host).
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Acked-by: Rob Clark <robdclark@chromium.org>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34470>
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Acked-by: Rob Clark <robdclark@chromium.org>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34470>
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Acked-by: Rob Clark <robdclark@chromium.org>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34470>
VK_ERROR_INITIALIZATION_FAILED will fail physical device enumeration.
Returning VK_ERROR_INCOMPATIBLE_DRIVER means that the driver can still
be used on supported GPUs when multiple GPUs are installed.
cc: mesa-stable
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34783>
A begin/end sequence is something like (it's all macros based):
radeon_begin(cs);
radeon_emit(PKT3(PKT3_DRAW_INDEX_AUTO, 1, cmd_buffer->state.predicating));
radeon_emit(vertex_count);
radeon_emit(V_0287F0_DI_SRC_SEL_AUTO_INDEX | use_opaque);
radeon_end();
This is loosely based on RadeonSI (see !8653 (a0978fff)) and it seems
indeed faster overall.
The main goal of this rework is to re-use the same logic as RadeonSI
for paired packets on GFX12 (also GFX11 dGPUs) because it's supposed
to be way faster, especially on GFX12 where the CP is slow. The other
goal is to share more cmdbuf emission between both drivers in the near
future.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34229>
Fixes assertion failure when initializing memory types for devices without
dedicated vram.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33978>
AMD BC-250 is a mining board based on an AMD APU with an integrated GPU
that kernel recognizes as Cyan Skillfish.
It is basically RDNA1/GFX10, but with added hardware ray tracing
support. LLVM calls it GFX1013, see
https://llvm.org/docs/AMDGPU/AMDGPUAsmGFX1013.html
Support for this GPU hasn't been extensively tested. Some games are
known to work, some non-trivial ray query compute and ray tracing
pipeline rendering works too. Q2RTX works.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33116>
While theoretically all GFX11+ GPUs have an attribute ring, it is
nicer to have this property instead of deciding ad-hoc based on
the GFX level.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33218>
This is needed for fossilize-replay.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Fixes: 14e3231b56 ("radv: add a flag to indicate ray tracing support")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33201>
The virtio-gpu guest kmd reports timelines as supported, so querying
DRM_CAP_SYNCOBJ_TIMELINE as vk_drm_syncobj_get_type() does will return
true.
The native context code on the other hand doesn't support timelines,
and support is disabled in the "ac/virtio: disable timeline_syncobj support"
commit.
Fix the inconsistency by manually disabling timeline support when
info.has_timeline_syncobj is false.
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Reviewed-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21658>
No big code changes are needed to support virtio. The main caveat with
radv is about buffer allocations.
Allocating a cpu visible buffer requires the host process (eg qemu) to
create a dmabuf, mmap it, then map the host CPU address into the guest
application CPU address space.
The first issue is about the number of dmabuf created because we
might hit the number of open file limit. The host process limit can
be raised but we would hit the second issue - at least on qemu:
there's a limit on how many sections can be mapped and ultimately
we hit this assert:
assert(map->sections_nb < TARGET_PAGE_SIZE);
(the third issue is a performance one: these operations have a cost,
and this increases some Vulkan app loading times)
radeonsi is not really affected because it's using pb_slab to
suballocate small buffers from larger ones.
radv on the other hand doesn't, so if an app decides to allocate
lots of cpu visible small buffers, we're likely to fail.
Earlier versions of the amdgpu nctx code had a suballoctor, but it
was removed to simplified the code. It could be restored later; or
radv could be modified to use a suballocator (like anv does AFAICT).
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Reviewed-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21658>
Native context support is implemented by diverting the libdrm_amdgpu functions
into new functions that use virtio-gpu.
VA allocations are done directly in the guest, using newly exposed libdrm_amdgpu
helpers (retrieved through dlopen/dlsym).
Guest <-> Host roundtrips can be expensive so we try to avoid them as much as
possible. When possible we also don't wait for the host reply in case where
it's not needed to get correct result.
Implicit sync works because virtio-gpu commands are submitted in order to the
host (there a single queue per device, shared by all the guest processes).
virtio-gpu also only supports one context per file description (but multiple
file descriptions per process) while amdgpu only allows one fd per process,
but multiple contexts per fd. This causes synchronization problems, because
virtio-gpu drops all sync primitive if they belong to the same fd/context/ring:
ie the amdgpu_ctx can't be expressed in virtio-gpu terms.
For now the solution is to only allocate a single amdgpu_ctx per application.
Contrary to radeonsi/radv, amdgpu_virtio can use libdrm_amdgpu directly: the
ones that don't rely on ioctl() are safe to use here.
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Reviewed-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21658>
This is required to implement virtio native-context.
In a virtualized environment, most of the functions provided
by libdrm_amdgpu will be implemented using virtio.
This allows to implement efficient virtualization, by
forwarding the kernel API to the host, instead of the GL/VK calls.
Similarly, the raw 'fd' or 'gem_handle' arguments are replaced
by opaque types. This allows to encapsulate all the needed
state in the handle, and use unmodified API between baremetal
and virtualized contexts.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21658>
This imports 35 libdrm_amdgpu functions into Mesa.
The following 15 functions are still in use:
amdgpu_bo_alloc
amdgpu_bo_cpu_map
amdgpu_bo_cpu_unmap
amdgpu_bo_export
amdgpu_bo_free
amdgpu_bo_import
amdgpu_create_bo_from_user_mem
amdgpu_device_deinitialize
amdgpu_device_get_fd
amdgpu_device_initialize
amdgpu_get_marketing_name
amdgpu_query_sw_info
amdgpu_va_get_start_addr
amdgpu_va_range_alloc
amdgpu_va_range_free
We can't import them because they make sure that we only use 1 VMID
per process shared by all APIs. (except the marketing name)
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32067>
There is nothing amdgpu specific here so this does not need to be
abstracted away. max_alignment also is not used in winsys code.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31643>
The IB2 packet is only supported on the graphics queue. To execute DGC
IB on compute, the previous solution was to submit it separately
without any chaining. Though this solution was incomplete because it's
easy to reach the maximum number of IBs per submit when there is a lot
of ExecuteIndirect() calls.
To fix that, the proposed solution is to implement DGC IB chaining when
it's executed on the compute only. The idea is to add a trailer that is
added at the beginning of the DGC IB (to know the offset). This trailer
is used to chain back back the DGC IB to a normal CS, it's patched at
execution time. Patching is fine because it's not allowed to execute
the same DGC IB concurrently and the entire solution relies on that.
When the DGC IB is executed on graphics, the trailer isn't patched and
it only contains NOPs padding. Performance should be mostly similar.
This fixes
dEQP-VK.dgc.nv.compute.misc.execute_many_*_primary_cmd_compute_queue.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30809>
This can be triggered with DGC if the maximum number of sequences count
is too high. Luckily, vkd3d-proton doesn't do that, but it should be
fixed for EXT DGC.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31464>