Previously, we just used the next block after a loop that
has a back-edge. This assumes that loop-exit blocks can
only be removed when falling through to the next block,
when in fact it can also be a jump to somewhere else,
in future even to some block before the actual loop.
12 (0.02% of 79395) affected shaders.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32477>
They might be needed as convergence point in order to
insert code (e.g. for loop alignment, wait states, etc.).
Totals from 1 (0.00% of 79395) affected shaders:
CodeSize: 12672 -> 12716 (+0.35%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32477>
This returns the underlying device pointer but as an opaque
uintptr_t.
This will be required because libdrm_amdgpu will return the
same device when called multiple times from the same process.
radeonsi relies on the pointer value to identify if the device
are the same and adjust the synchronisation logic based on that.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33081>
Requests with no reply will report errors by default to the event
loop, which then usually cause the not very useful log like this
to be printed:
X Error of failed request: BadAlloc (insufficient resources for operation)
Major opcode of failed request: 149 ()
Minor opcode of failed request: 2
Serial number of failed request: 33
Current serial number in output stream: 34
This commit introduce some helpers to handle the xcb errors in Mesa,
and be able to report errors properly.
For instance the same error will now log:
MESA: error: dri3_alloc_render_buffer:1634 xcb_dri3_pixmap_from_buffer[s] failed
MESA: error: X error: 11
It's not fixing the underlying issue, but at least now tests like
"glx-visuals-stencil -pixmap" and "glx-visuals-depth pixmap" fail
properly.
cc: mesa-stable
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric Engestrom <eric@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33036>
* GLES3.x is only valid for x <= 2
* The expected error is GLXBadProfileARB, not BadValue
cc: mesa-stable
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric Engestrom <eric@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33036>
GDS is gone on GFX12 and generated/written primitives queries need to
be emulated using global atomics. This new user SGPR will be used to
pass the 32-bit VA of the shader query buffer.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33041>
The original implementation based on RadeonSI was broken for
pause/resume and for indirect draws with a counter buffer basically.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33017>
This shouldn't be needed because GE_GS_OREDERD_ID is always reset to 0
when streamout is started. Thus it's technically impossible that the
ordered ID is more than 12-bit.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33017>
There is nothing to do for
VK_PIPELINE_CACHE_CREATE_INTERNALLY_SYNCHRONIZED_MERGE_BIT_KHR because
the vulkan/runtime code already locks the dstCache unconditionally.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33091>
The opposite is already supported. Note that only one aspect (depth or
stencil) is supported when it's a copy<->depth/stencil copy, and
multiplanar images aren't supported.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33091>
This is pure guess but I think GFX12 now uses 48-bits VAs for
configuring the SQTT buffer. This isn't yet enough to generate a
capture because it's missing some info I don't know, but it's a start.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33068>
Determine whether the device has hardware raytracing support early, and
then use this result where needed, instead of checking for `gfx_level`
every time.
This is a prerequisite for CYAN_SKILLFISH chip enablement. This chip is
still GFX10, not GFX10_3, but has hardware support for accelerated
`image_bvh{,64}_intersect_ray` instructions. Just checking for `gfx_level`
is insufficient for it.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33109>
RadeonSI compiles shader variants with streamout disabled but RADV
doesn't do that. The alternative solution is to set the streamout
buffer size to 0 to indicate that streamout isn't bound.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33058>
Without this, WQM is only used for the lds_param_load like this:
s_wqm_b64 exec, exec
lds_param_load v5, attr0.x wait_vdst:15
s_mov_b64 exec, s[0:1]
v_mov_b32_dpp v5, v5 quad_perm:[0,0,0,0] row_mask:0xf bank_mask:0xf
With this change we get:
s_wqm_b64 exec, exec
lds_param_load v5, attr0.x wait_vdst:15
s_mov_b64 exec, s[0:1]
...
s_wqm_b64 exec, exec
v_mov_b32_dpp v5, v5 quad_perm:[0,0,0,0] row_mask:0xf bank_mask:0xf
s_mov_b64 exec, s[0:1]
This fixes KHR-GL46.shaders.uniform_block.random.nested_structs_instance_arrays.0
and other similar tests with LLVM.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32959>
And to ac_build_dpp because it's used from quad_swizzle.
No functional changes but will be used in the next commit.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32959>
Update expectation files for the test
runs with kernel 6.13-rc4.
Signed-off-by: Vignesh Raman <vignesh.raman@collabora.com>
Reviewed-by: David Heidelberg <None>
Reviewed-by: Sergi Blanch Torné <None>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32788>
On the host side virglrenderer creates dmabuf on demand when:
* cpu mapping is requested
* setting up scan out
* sharing buffers between guest processes
On-demand dmabuf creation only works if the ctx that created the
BO still exists and knows about this BO. This assumption works ok for
the first 2 cases, but can break with the last one (and it does cause
issues on Android). eg:
* process A allocates BO and exports it as a guest dmabuf
* process A closes its handle to the BO (-> detach_resource)
* process B imports the guest dmabuf -> this triggers the attach_resource
function in virglrenderer. If the given resource isn't a
VIRGL_RESOURCE_FD_DMABUF it'll try to get one... But for this to work,
process A needs to be used -> this fails because this resource was
detached from it.
The reason we create dmabuf on demand is to avoid hitting the number of
open file descriptor limit. So to cover the 3rd case, we'll use the
VIRTGPU_BLOB_FLAG_USE_SHAREABLE flag, but try to limit to as few possible
buffers as possible.
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Reviewed-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21658>
The virtio-gpu guest kmd reports timelines as supported, so querying
DRM_CAP_SYNCOBJ_TIMELINE as vk_drm_syncobj_get_type() does will return
true.
The native context code on the other hand doesn't support timelines,
and support is disabled in the "ac/virtio: disable timeline_syncobj support"
commit.
Fix the inconsistency by manually disabling timeline support when
info.has_timeline_syncobj is false.
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Reviewed-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21658>
No big code changes are needed to support virtio. The main caveat with
radv is about buffer allocations.
Allocating a cpu visible buffer requires the host process (eg qemu) to
create a dmabuf, mmap it, then map the host CPU address into the guest
application CPU address space.
The first issue is about the number of dmabuf created because we
might hit the number of open file limit. The host process limit can
be raised but we would hit the second issue - at least on qemu:
there's a limit on how many sections can be mapped and ultimately
we hit this assert:
assert(map->sections_nb < TARGET_PAGE_SIZE);
(the third issue is a performance one: these operations have a cost,
and this increases some Vulkan app loading times)
radeonsi is not really affected because it's using pb_slab to
suballocate small buffers from larger ones.
radv on the other hand doesn't, so if an app decides to allocate
lots of cpu visible small buffers, we're likely to fail.
Earlier versions of the amdgpu nctx code had a suballoctor, but it
was removed to simplified the code. It could be restored later; or
radv could be modified to use a suballocator (like anv does AFAICT).
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Reviewed-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21658>
They're not supported yet so let's not pretend they are.
In particular use_local_buffers can cause VM_ALWAYS_VALID to be
used, which then prevents the creation of a dmabuf.
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Reviewed-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21658>
Native context support is implemented by diverting the libdrm_amdgpu functions
into new functions that use virtio-gpu.
VA allocations are done directly in the guest, using newly exposed libdrm_amdgpu
helpers (retrieved through dlopen/dlsym).
Guest <-> Host roundtrips can be expensive so we try to avoid them as much as
possible. When possible we also don't wait for the host reply in case where
it's not needed to get correct result.
Implicit sync works because virtio-gpu commands are submitted in order to the
host (there a single queue per device, shared by all the guest processes).
virtio-gpu also only supports one context per file description (but multiple
file descriptions per process) while amdgpu only allows one fd per process,
but multiple contexts per fd. This causes synchronization problems, because
virtio-gpu drops all sync primitive if they belong to the same fd/context/ring:
ie the amdgpu_ctx can't be expressed in virtio-gpu terms.
For now the solution is to only allocate a single amdgpu_ctx per application.
Contrary to radeonsi/radv, amdgpu_virtio can use libdrm_amdgpu directly: the
ones that don't rely on ioctl() are safe to use here.
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Reviewed-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21658>
This is required to implement virtio native-context.
In a virtualized environment, most of the functions provided
by libdrm_amdgpu will be implemented using virtio.
This allows to implement efficient virtualization, by
forwarding the kernel API to the host, instead of the GL/VK calls.
Similarly, the raw 'fd' or 'gem_handle' arguments are replaced
by opaque types. This allows to encapsulate all the needed
state in the handle, and use unmodified API between baremetal
and virtualized contexts.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21658>