From 5b5d60d495ef0ca6717ea0bc94043096d0b0a62e Mon Sep 17 00:00:00 2001 From: Xaver Hugl Date: Sun, 10 Dec 2023 20:49:45 +0100 Subject: [PATCH] stable/linux-dmabuf: allow compositors to advertise multiple devices With multi-GPU systems, if the client is rendering on a device other than the main_device advertised by the compositor, there's no way for the client to know whether or not the import will be successful except for allocating a buffer and attempting to import the buffer into the target device, and if that fails, trying again with different formats, modifiers and potentially allocation flags. Similarly, compositors that want to support buffers from non-main devices need to attempt importing the buffer to every device they support. In some cases, this import may even trigger expensive operations such as the buffer being copied into system memory by the kernel. This protocol addition allows compositors to explicitly advertise support for multiple devices on the system, and specify which formats and modifiers they can sample from. For buffer imports, the client explicitly specifies a device for the compositor to import the buffer to, which allows the compositor to avoid additional copies. Signed-off-by: Xaver Hugl --- stable/linux-dmabuf/feedback.rst | 30 +++++++++- stable/linux-dmabuf/linux-dmabuf-v1.xml | 78 ++++++++++++++++++------- 2 files changed, 85 insertions(+), 23 deletions(-) diff --git a/stable/linux-dmabuf/feedback.rst b/stable/linux-dmabuf/feedback.rst index a3f94ed..8f09058 100644 --- a/stable/linux-dmabuf/feedback.rst +++ b/stable/linux-dmabuf/feedback.rst @@ -31,10 +31,12 @@ linux-dmabuf feedback introduces the following concepts: appropriate format/modifier and also to avoid allocating in private device memory when cross-device operations are going to happen. -linux-dmabuf feedback implementation notes -========================================== +3. Starting with version 6, the assumption of a single main device is removed.
+ Clients should use the target device of a tranche marked with the + ``sampling`` flag instead. -This section contains recommendations for client and compositor implementations. +linux-dmabuf feedback implementation notes for versions 4 and 5 +=============================================================== For clients ----------- @@ -162,6 +164,28 @@ slower than direct scan-out but faster than texturing. For instance, a compositor could insert an intermediate tranche if it's possible to use a mem2mem device to convert buffers to be able to use scan-out. +linux-dmabuf feedback implementation notes for version 6 +======================================================== + +With version 6 of the protocol, most of the logic described for versions 4 and 5 +still applies, with a few exceptions: +- the main device is no longer sent. If the client needs an equivalent to the + main device, the target device of the first tranche with the ``sampling`` + flag can be used. +- compositors should send tranches with the ``sampling`` flag for all devices + they can sample from without triggering buffer migrations. +- clients are strongly recommended to set the device they allocated buffers for + with ``zwp_linux_buffer_params_v1.set_sampling_device``. This avoids both + implicit buffer migrations and unnecessary copies by the compositor. +- compositors should avoid directly importing buffers into devices other than the + one set by the client with ``set_sampling_device`` whenever possible, to avoid + implicit buffer migrations by the graphics driver, which can reduce performance + significantly. +- if the compositor wants a client to change the device it renders on, it can + change the order of tranches (and thus devices) in the per-surface feedback. + In response, clients that are capable of switching devices should re-allocate + on the first device they can use.
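The client-side rule for version 6 (use the first ``sampling`` tranche, and re-allocate when the compositor reorders tranches) can be sketched in C. This is a minimal illustration under stated assumptions, not part of the protocol: ``struct tranche``, ``pick_sampling_tranche`` and ``should_reallocate`` are hypothetical helpers, and the numeric flag values are assumed to mirror the ``tranche_flags`` enum in the XML.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h> /* dev_t */

/* Assumed bit values mirroring the tranche_flags enum; check the XML. */
#define TRANCHE_FLAG_SCANOUT  (1u << 0)
#define TRANCHE_FLAG_SAMPLING (1u << 1) /* hypothetical value */

/* Hypothetical client-side record of one received tranche. */
struct tranche {
    dev_t target_device;
    unsigned int flags;
};

/* Is `dev` one of the devices this client is able to render on? */
static bool device_in_list(dev_t dev, const dev_t *list, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (list[i] == dev)
            return true;
    return false;
}

/* Tranches arrive in descending order of preference, so the first sampling
 * tranche whose target device the client can render on wins.
 * Returns its index, or -1 if no usable sampling tranche exists. */
static int pick_sampling_tranche(const struct tranche *t, size_t n,
                                 const dev_t *usable, size_t n_usable)
{
    assert(t != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if ((t[i].flags & TRANCHE_FLAG_SAMPLING) &&
            device_in_list(t[i].target_device, usable, n_usable))
            return (int)i;
    }
    return -1;
}

/* After a feedback update, a client that can switch devices re-allocates
 * when its preferred usable device differs from the one it renders on. */
static bool should_reallocate(const struct tranche *t, size_t n,
                              const dev_t *usable, size_t n_usable,
                              dev_t current)
{
    int idx = pick_sampling_tranche(t, n, usable, n_usable);
    return idx >= 0 && t[idx].target_device != current;
}
```

Matching against the list of devices the client can actually render on (rather than blindly taking the first sampling tranche) is what lets a client that cannot switch to the compositor's top choice still settle on the best device it does support.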
+ ``dev_t`` encoding ================== diff --git a/stable/linux-dmabuf/linux-dmabuf-v1.xml b/stable/linux-dmabuf/linux-dmabuf-v1.xml index 9bdb73c..7d727f7 100644 --- a/stable/linux-dmabuf/linux-dmabuf-v1.xml +++ b/stable/linux-dmabuf/linux-dmabuf-v1.xml @@ -24,7 +24,7 @@ DEALINGS IN THE SOFTWARE. - + This interface offers ways to create generic dmabuf-based wl_buffers. @@ -184,7 +184,7 @@ - + This temporary object is a collection of dmabufs and other parameters that together form a single logical buffer. The temporary @@ -222,6 +222,8 @@ + @@ -394,9 +396,28 @@ + + + + Set the device the compositor should import the dmabufs to for sampling + in the next create or create_immed request. + + To avoid race conditions when the compositor removes a device from the + tranches, it is not a protocol error if the device hasn't been advertised + by the compositor in a tranche with the sampling flag, but the import is + likely to fail in that case. + + If the client doesn't know a suitable target device, it shouldn't set one, + and the compositor should attempt import on all devices it supports. + + If the array is too small to contain a dev_t or larger than required, the + invalid_dev_t_size error will be emitted. + + + - + This object advertises dmabuf parameters feedback. This includes the preferred devices and the supported formats/modifiers. @@ -419,10 +440,13 @@ descending order of preference. All formats and modifiers in the same tranche have the same preference. - To send parameters, the compositor sends one main_device event, tranches - (each consisting of one tranche_target_device event, one tranche_flags - event, tranche_formats events and then a tranche_done event), then one - done event. + To send parameters, the compositor sends one main_device event (unless + the client bound version 6 or above), tranches (each consisting of one + tranche_target_device event, one tranche_flags event, tranche_formats + events and then a tranche_done event), then one done event. 
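The size rule for the ``set_sampling_device`` argument can be illustrated with a small encode/decode pair. This sketch assumes the memcpy-based, native-byte-order ``dev_t`` encoding used for other ``dev_t`` arguments in this protocol; ``encode_dev_t`` and ``decode_dev_t`` are hypothetical names, and the ``wl_array`` plumbing and the actual error posting (for example via libwayland-server's ``wl_resource_post_error``) are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h> /* dev_t */

/* Encode a dev_t into the byte array carried on the wire
 * (assumed: raw bytes in native byte order). `buf` must hold sizeof(dev_t). */
static void encode_dev_t(dev_t dev, unsigned char *buf)
{
    assert(buf != NULL);
    memcpy(buf, &dev, sizeof(dev));
}

/* Decode a device argument such as the one of set_sampling_device.
 * The array must be exactly sizeof(dev_t) bytes: anything smaller or larger
 * is the case the invalid_dev_t_size error covers, so return false and let
 * the caller raise that error. */
static bool decode_dev_t(const void *data, size_t size, dev_t *out)
{
    if (size != sizeof(dev_t))
        return false;
    memcpy(out, data, sizeof(dev_t));
    return true;
}
```

Copying with ``memcpy`` instead of casting the array pointer avoids relying on the array data being suitably aligned for ``dev_t``.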
+ + With version 6 and above, the compositor must always advertise at least + one tranche with the sampling flag set. @@ -463,7 +487,7 @@ - + This event advertises the main device that the server prefers to use when direct scan-out to the target device isn't possible. The @@ -488,6 +512,9 @@ If explicit modifiers are not supported and the client performs buffer allocations on a different device than the main device, then the client must force the buffer to have a linear layout. + + With version 6 and above, this event is no longer sent. Clients should + use the target device of a tranche with the sampling flag instead. @@ -516,9 +543,9 @@ The client can use this hint to allocate the buffer in a way that makes it accessible from the target device, ideally directly. The buffer must - still be accessible from the main device, either through direct import - or through a potentially more expensive fallback path. If the buffer - can't be directly imported from the main device then clients must be + still be accessible from the target device of a sampling tranche, either + through direct import or a potentially more expensive fallback path. If + the buffer can't be directly imported for sampling, then clients must be prepared for the compositor changing the tranche priority or making wl_buffer creation fail (see the zwp_linux_buffer_params_v1.create and create_immed requests for details). @@ -564,19 +591,30 @@ - + + + The scanout flag is a hint that direct scan-out may be attempted by + the compositor on the target device if the client appropriately + allocates a buffer. How to allocate a buffer that can be scanned out + on the target device is implementation-defined. + + + + + The sampling flag indicates that the compositor can efficiently sample + from buffers imported to the target device if the client appropriately + allocates a buffer. How to allocate a buffer that can be efficiently + sampled on the target device is implementation-defined.
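Because ``main_device`` is only sent to clients that bound version 5 or below, a client supporting both protocol generations needs a version-dependent way to find a "main device equivalent". The sketch below assumes a hypothetical ``struct feedback`` snapshot collected up to the ``done`` event; the type names, helper name and flag values are illustrative assumptions, not protocol API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h> /* dev_t */

/* Assumed bit values mirroring the tranche_flags enum; check the XML. */
#define TRANCHE_FLAG_SCANOUT  (1u << 0)
#define TRANCHE_FLAG_SAMPLING (1u << 1) /* hypothetical value */

struct tranche {
    dev_t target_device;
    unsigned int flags;
};

/* Hypothetical snapshot of one complete feedback batch. */
struct feedback {
    bool has_main_device; /* main_device event received (versions 4 and 5) */
    dev_t main_device;
    const struct tranche *tranches; /* in the order they were sent */
    size_t n_tranches;
};

/* Before version 6, use the main_device event; from version 6 on, use the
 * target device of the first tranche with the sampling flag. Returns false
 * if no such device was advertised. */
static bool main_device_equivalent(uint32_t bound_version,
                                   const struct feedback *fb, dev_t *out)
{
    assert(fb != NULL && out != NULL);
    if (bound_version < 6) {
        if (!fb->has_main_device)
            return false;
        *out = fb->main_device;
        return true;
    }
    for (size_t i = 0; i < fb->n_tranches; i++) {
        if (fb->tranches[i].flags & TRANCHE_FLAG_SAMPLING) {
            *out = fb->tranches[i].target_device;
            return true;
        }
    }
    /* A version 6 compositor must send at least one sampling tranche. */
    return false;
}
```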
+ + - This event sets tranche-specific flags. - - The scanout flag is a hint that direct scan-out may be attempted by the - compositor on the target device if the client appropriately allocates a - buffer. How to allocate a buffer that can be scanned out on the target - device is implementation-defined. - - This event is tied to a preference tranche, see the tranche_done event. + This event sets tranche-specific flags. This event is tied to a + preference tranche, see the tranche_done event. + With version 6 and above, the compositor must set at least one flag + in each tranche.
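The two version 6 constraints (every tranche sets at least one flag, and at least one tranche sets the sampling flag) lend themselves to a self-check a compositor could run before sending ``done``. This is a hypothetical sketch: ``check_v6_feedback``, its result codes and the flag values are assumptions for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h> /* dev_t */

/* Assumed bit values mirroring the tranche_flags enum; check the XML. */
#define TRANCHE_FLAG_SCANOUT  (1u << 0)
#define TRANCHE_FLAG_SAMPLING (1u << 1) /* hypothetical value */

struct tranche {
    dev_t target_device;
    unsigned int flags;
};

enum feedback_check {
    FEEDBACK_OK,
    FEEDBACK_TRANCHE_WITHOUT_FLAGS, /* some tranche sets no flag at all */
    FEEDBACK_NO_SAMPLING_TRANCHE,   /* no tranche sets the sampling flag */
};

/* Validate a tranche list against the version 6 rules before it is sent. */
static enum feedback_check check_v6_feedback(const struct tranche *t, size_t n)
{
    assert(t != NULL || n == 0);
    bool any_sampling = false;
    for (size_t i = 0; i < n; i++) {
        if (t[i].flags == 0)
            return FEEDBACK_TRANCHE_WITHOUT_FLAGS;
        if (t[i].flags & TRANCHE_FLAG_SAMPLING)
            any_sampling = true;
    }
    return any_sampling ? FEEDBACK_OK : FEEDBACK_NO_SAMPLING_TRANCHE;
}
```

An empty tranche list fails the check as well, since it cannot contain a sampling tranche.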