stable/linux-dmabuf: allow compositors to advertise multiple devices

On multi-GPU systems, if the client renders on a device other than the
main_device advertised by the compositor, the only way for it to find out
whether an import will succeed is to allocate a buffer, attempt to import it
into the target device and, if that fails, try again with different formats,
modifiers and possibly allocation flags.
Similarly, compositors that want to support buffers from non-main devices need
to attempt importing each buffer into every device they support. In some cases,
this import may even trigger expensive operations, such as the kernel copying
the buffer into system memory.

This protocol addition allows compositors to explicitly advertise support for
multiple devices on the system, and to specify which formats and modifiers they
can sample from on each of them.
For buffer imports, the client can explicitly specify a device for the
compositor to import the buffer to, which allows the compositor to avoid
additional copies.

Signed-off-by: Xaver Hugl <xaver.hugl@kde.org>
Xaver Hugl 2023-12-10 20:49:45 +01:00
parent c364bf61ec
commit 5b5d60d495
2 changed files with 85 additions and 23 deletions


@@ -31,10 +31,12 @@ linux-dmabuf feedback introduces the following concepts:
appropriate format/modifier and also to avoid allocating in private device
memory when cross-device operations are going to happen.
linux-dmabuf feedback implementation notes
==========================================
3. Starting with version 6, the assumption of a single main device is removed.
Clients should instead use the target device of any tranche marked with the
``sampling`` flag.
This section contains recommendations for client and compositor implementations.
linux-dmabuf feedback implementation notes for version 4 and 5
==============================================================
For clients
-----------
@@ -162,6 +164,28 @@ slower than direct scan-out but faster than texturing. For instance, a
compositor could insert an intermediate tranche if it's possible to use a
mem2mem device to convert buffers to be able to use scan-out.
linux-dmabuf feedback implementation notes for version 6
========================================================
With version 6 of the protocol, most of the logic described for version 4 still
stands, with a few exceptions:
- the main device is no longer sent. If the client needs an equivalent to the
main device, the target device of the first tranche with the ``sampling``
flag can be used.
- compositors should send tranches with the ``sampling`` flag for all devices
they can sample from without triggering buffer migrations.
- clients are strongly recommended to set the device they allocated buffers for
with ``zwp_linux_buffer_params_v1.set_sampling_device``. This avoids both
implicit buffer migrations and unnecessary copies by the compositor.
- compositors should avoid directly importing buffers into devices other than the
one set by the client with ``set_sampling_device`` whenever possible, to avoid
implicit buffer migrations by the graphics driver, which can reduce performance
significantly.
- if the compositor wants a client to switch to a different rendering device,
it can reorder the tranches (and thus devices) in the per-surface feedback.
In response, clients that are capable of switching devices should re-allocate
on the first device in the new order that they can use.
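As a rough illustration of the first and last points above, a client could pick its rendering device by walking the tranches in the order the compositor sent them. This C sketch assumes the client has already recorded each tranche's target device and flags; ``struct tranche``, ``client_can_use`` and ``pick_render_device`` are hypothetical names, not part of the protocol or of libwayland. The flag values come from the ``tranche_flags`` enum.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

/* Values of the tranche_flags bitfield enum. */
#define TRANCHE_FLAG_SCANOUT  1u
#define TRANCHE_FLAG_SAMPLING 2u

/* Hypothetical client-side record of one tranche, built from its
 * tranche_target_device and tranche_flags events (formats omitted). */
struct tranche {
    dev_t target_device;
    uint32_t flags;
};

/* Hypothetical helper: true if the client can render on `dev`. */
static bool client_can_use(dev_t dev, const dev_t *usable, size_t n_usable)
{
    for (size_t i = 0; i < n_usable; i++) {
        if (usable[i] == dev)
            return true;
    }
    return false;
}

/* Walks the tranches in the order the compositor sent them and stores in
 * *out the target device of the first sampling tranche the client can
 * render on. With usable == NULL every device is accepted, which yields
 * the version 6 equivalent of the old main device. Returns false if no
 * tranche qualifies. */
static bool pick_render_device(const struct tranche *tranches, size_t n,
                               const dev_t *usable, size_t n_usable,
                               dev_t *out)
{
    assert(out != NULL);
    for (size_t i = 0; i < n; i++) {
        if (!(tranches[i].flags & TRANCHE_FLAG_SAMPLING))
            continue;
        if (usable != NULL &&
            !client_can_use(tranches[i].target_device, usable, n_usable))
            continue;
        *out = tranches[i].target_device;
        return true;
    }
    return false;
}
```

Calling it with a NULL usable list gives the stand-in for the main device; calling it again with the client's own device list after the compositor reorders the per-surface feedback gives the device to re-allocate on.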
``dev_t`` encoding
==================


@@ -24,7 +24,7 @@
DEALINGS IN THE SOFTWARE.
</copyright>
<interface name="zwp_linux_dmabuf_v1" version="5">
<interface name="zwp_linux_dmabuf_v1" version="6">
<description summary="factory for creating dmabuf-based wl_buffers">
This interface offers ways to create generic dmabuf-based wl_buffers.
@@ -184,7 +184,7 @@
</request>
</interface>
<interface name="zwp_linux_buffer_params_v1" version="5">
<interface name="zwp_linux_buffer_params_v1" version="6">
<description summary="parameters for creating a dmabuf-based wl_buffer">
This temporary object is a collection of dmabufs and other
parameters that together form a single logical buffer. The temporary
@@ -222,6 +222,8 @@
<entry name="invalid_wl_buffer" value="7"
summary="invalid wl_buffer resulted from importing dmabufs via
the create_immed request on given buffer_params"/>
<entry name="invalid_dev_t_size" value="8"
summary="an array with mismatching size for a dev_t was used"/>
</enum>
<request name="destroy" type="destructor">
@@ -394,9 +396,28 @@
<arg name="format" type="uint" summary="DRM_FORMAT code"/>
<arg name="flags" type="uint" enum="flags" summary="see enum flags"/>
</request>
<request name="set_sampling_device" since="6">
<description summary="set the target device of the wl_buffer">
Set the device the compositor should import the dmabufs to for sampling
in the next create or create_immed request.
To avoid race conditions when the compositor removes a device from the
tranches, it is not a protocol error if the device hasn't been advertised
by the compositor in a tranche with the sampling flag, but the import is
likely to fail in that case.
If the client doesn't know a suitable target device, it shouldn't set one,
and the compositor should then attempt the import on all devices it
supports.
If the array size does not match the size of a dev_t, the
invalid_dev_t_size error is raised.
</description>
<arg name="device" type="array" summary="device dev_t value"/>
</request>
</interface>
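A minimal sketch of how the set_sampling_device argument could be built and checked, assuming the dev_t travels as its raw bytes in native byte order, as the dev_t encoding section of the README describes. To stay self-contained it uses a hypothetical struct dev_array in place of libwayland's struct wl_array, and encode_dev and decode_dev are illustrative names; a real client would hand the array to the scanner-generated request for set_sampling_device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical stand-in for libwayland's struct wl_array (only the size
 * and data fields), so the sketch builds without libwayland. */
struct dev_array {
    size_t size;
    void *data;
};

/* Client side: wrap a dev_t (for example st_rdev of the render node the
 * buffer was allocated on) as raw bytes in native byte order. The array
 * borrows *dev, so *dev must stay valid until the request is sent. */
static struct dev_array encode_dev(dev_t *dev)
{
    struct dev_array arr = { .size = sizeof(*dev), .data = dev };
    return arr;
}

/* Compositor side: returns false if the array size does not match
 * sizeof(dev_t), in which case the compositor would post the
 * invalid_dev_t_size error rather than import anything. */
static bool decode_dev(const struct dev_array *arr, dev_t *out)
{
    assert(arr != NULL && out != NULL);
    if (arr->size != sizeof(*out))
        return false;
    memcpy(out, arr->data, sizeof(*out));
    return true;
}
```

Checking for an exact size match, rather than "at least sizeof(dev_t)", mirrors the rule above that both too-small and too-large arrays are errors.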
<interface name="zwp_linux_dmabuf_feedback_v1" version="5">
<interface name="zwp_linux_dmabuf_feedback_v1" version="6">
<description summary="dmabuf feedback">
This object advertises dmabuf parameters feedback. This includes the
preferred devices and the supported formats/modifiers.
@@ -419,10 +440,13 @@
descending order of preference. All formats and modifiers in the same
tranche have the same preference.
To send parameters, the compositor sends one main_device event, tranches
(each consisting of one tranche_target_device event, one tranche_flags
event, tranche_formats events and then a tranche_done event), then one
done event.
To send parameters, the compositor sends one main_device event (unless
the client bound version 6 or above), tranches (each consisting of one
tranche_target_device event, one tranche_flags event, tranche_formats
events and then a tranche_done event), then one done event.
With version 6 and above, the compositor must always advertise at least
one tranche with the sampling flag set.
</description>
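To make the version-dependent event order above concrete, here is a rough C sketch of a check a client might run over one batch of feedback events. The event model (enum fb_event, struct fb_msg) and feedback_batch_valid are hypothetical simplifications: format_table and tranche_formats are left out, and the sketch reads "one tranche_target_device event, one tranche_flags event" as exactly one of each before every tranche_done.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Value of the sampling entry in the tranche_flags enum. */
#define TRANCHE_FLAG_SAMPLING 2u

/* Hypothetical, simplified model of the feedback events; format_table and
 * tranche_formats carry no state needed here and are omitted. */
enum fb_event {
    FB_MAIN_DEVICE,
    FB_TRANCHE_TARGET_DEVICE,
    FB_TRANCHE_FLAGS,
    FB_TRANCHE_DONE,
    FB_DONE,
};

struct fb_msg {
    enum fb_event type;
    uint32_t flags; /* only used for FB_TRANCHE_FLAGS */
};

/* Checks one batch of feedback events against the bound version:
 * main_device first (before version 6 only), then tranches that each
 * contain exactly one target device and one flags event before
 * tranche_done, then a final done. From version 6 on, at least one
 * tranche must have the sampling flag. */
static bool feedback_batch_valid(const struct fb_msg *msgs, size_t n,
                                 uint32_t version)
{
    assert(msgs != NULL || n == 0);
    size_t i = 0;
    if (version < 6) {
        if (n == 0 || msgs[0].type != FB_MAIN_DEVICE)
            return false;
        i = 1;
    }
    bool have_dev = false, have_flags = false, saw_sampling = false;
    for (; i < n; i++) {
        switch (msgs[i].type) {
        case FB_MAIN_DEVICE:
            return false; /* misplaced, or sent to a version 6 client */
        case FB_TRANCHE_TARGET_DEVICE:
            if (have_dev)
                return false;
            have_dev = true;
            break;
        case FB_TRANCHE_FLAGS:
            if (have_flags)
                return false;
            have_flags = true;
            if (msgs[i].flags & TRANCHE_FLAG_SAMPLING)
                saw_sampling = true;
            break;
        case FB_TRANCHE_DONE:
            if (!have_dev || !have_flags)
                return false;
            have_dev = have_flags = false;
            break;
        case FB_DONE:
            /* done must be the last event and must not cut a tranche short */
            if (i != n - 1 || have_dev || have_flags)
                return false;
            return version < 6 || saw_sampling;
        }
    }
    return false; /* batch never reached done */
}
```

The same batch can be valid for one version and invalid for another: a batch that starts with main_device is required before version 6 and rejected from version 6 on.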
<request name="destroy" type="destructor">
@@ -463,7 +487,7 @@
<arg name="size" type="uint" summary="table size, in bytes"/>
</event>
<event name="main_device">
<event name="main_device" deprecated-since="6">
<description summary="preferred main device">
This event advertises the main device that the server prefers to use
when direct scan-out to the target device isn't possible. The
@@ -488,6 +512,9 @@
If explicit modifiers are not supported and the client performs buffer
allocations on a different device than the main device, then the client
must force the buffer to have a linear layout.
With version 6 and above, this event is no longer sent. Clients should
use a device with the sampling flag in the tranches instead.
</description>
<arg name="device" type="array" summary="device dev_t value"/>
</event>
@@ -516,9 +543,9 @@
The client can use this hint to allocate the buffer in a way that makes
it accessible from the target device, ideally directly. The buffer must
still be accessible from the main device, either through direct import
or through a potentially more expensive fallback path. If the buffer
can't be directly imported from the main device then clients must be
still be accessible from a device with the sampling flag, either through
direct import or a potentially more expensive fallback path. If the
buffer can't be directly imported for sampling, then clients must be
prepared for the compositor changing the tranche priority or making
wl_buffer creation fail (see the zwp_linux_buffer_params_v1.create and
create_immed requests for details).
@@ -564,19 +591,30 @@
</event>
<enum name="tranche_flags" bitfield="true">
<entry name="scanout" value="1" summary="direct scan-out tranche"/>
<entry name="scanout" value="1" since="4">
<description summary="direct scan-out tranche">
The scanout flag is a hint that direct scan-out may be attempted by
the compositor on the target device if the client appropriately
allocates a buffer. How to allocate a buffer that can be scanned out
on the target device is implementation-defined.
</description>
</entry>
<entry name="sampling" value="2" since="6">
<description summary="sampling tranche">
The sampling flag indicates that the compositor can efficiently
sample from buffers imported to the target device if the client
allocates them appropriately. How to allocate a buffer that can be
efficiently sampled on the target device is implementation-defined.
</description>
</entry>
</enum>
<event name="tranche_flags">
<description summary="tranche flags">
This event sets tranche-specific flags.
The scanout flag is a hint that direct scan-out may be attempted by the
compositor on the target device if the client appropriately allocates a
buffer. How to allocate a buffer that can be scanned out on the target
device is implementation-defined.
This event is tied to a preference tranche, see the tranche_done event.
This event sets tranche-specific flags. This event is tied to a
preference tranche, see the tranche_done event.
With version 6 and above, the compositor must set at least one flag
in each tranche.
</description>
<arg name="flags" type="uint" enum="tranche_flags" summary="tranche flags"/>
</event>
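The flag rules above could be checked with a strict helper like the following hypothetical C sketch (tranche_flags_valid is an illustrative name). It treats a bit that the bound version does not define, for example sampling sent to a version 5 client, as invalid; that strictness is an assumption of this sketch, and a lenient client might simply ignore unknown bits instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values from the tranche_flags bitfield enum. */
enum tranche_flag {
    TRANCHE_FLAG_SCANOUT  = 1 << 0, /* since version 4 */
    TRANCHE_FLAG_SAMPLING = 1 << 1, /* since version 6 */
};

/* Returns false when a tranche_flags value breaks the rules for the bound
 * version: from version 6 on, at least one flag must be set, and the
 * sampling bit is only defined from version 6. */
static bool tranche_flags_valid(uint32_t flags, uint32_t version)
{
    uint32_t known = TRANCHE_FLAG_SCANOUT;
    if (version >= 6)
        known |= TRANCHE_FLAG_SAMPLING;
    if (flags & ~known)
        return false; /* bit not defined for this version */
    if (version >= 6 && flags == 0)
        return false; /* version 6 requires at least one flag */
    return true;
}
```

Note that a flags value of 0 stays valid for versions 4 and 5, where a tranche without the scanout flag was the normal case.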