stable/linux-dmabuf: allow compositors to advertise multiple devices

With multi-GPU systems, if the client is rendering on a device other than the main_device advertised by the compositor, there's no way for the client to know whether or not the import will be successful, except by allocating a buffer, attempting to import it into the target device and, if that fails, trying again with different formats, modifiers and potentially allocation flags.

Similarly, compositors that want to support buffers from non-main devices need to attempt importing the buffer to every device they support. In some cases, this import may even trigger expensive operations such as the kernel copying the buffer into system memory.

This protocol addition allows compositors to explicitly advertise support for multiple devices on the system, and to specify which formats and modifiers they can sample from. For buffer imports, the client explicitly specifies a device for the compositor to import the buffer to, which allows the compositor to avoid additional copies.

Signed-off-by: Xaver Hugl <xaver.hugl@kde.org>
2026-05-22 00:48:11 +02:00 · 2023-12-10 20:49:45 +01:00 · 2023-12-10 20:49:45 +01:00 · 5b5d60d495
commit 5b5d60d495
parent c364bf61ec
2 changed files with 85 additions and 23 deletions
--- a/stable/linux-dmabuf/feedback.rst
+++ b/stable/linux-dmabuf/feedback.rst
@@ -31,10 +31,12 @@ linux-dmabuf feedback introduces the following concepts:
   appropriate format/modifier and also to avoid allocating in private device
   memory when cross-device operations are going to happen.

-linux-dmabuf feedback implementation notes
-==========================================
+3. Starting with version 6, the assumption of a single main device is removed.
+   Clients should use the target device of any tranche marked with the
+   ``sampling`` flag instead.

-This section contains recommendations for client and compositor implementations.
+linux-dmabuf feedback implementation notes for versions 4 and 5
+===============================================================

 For clients
 -----------
@@ -162,6 +164,28 @@ slower than direct scan-out but faster than texturing. For instance, a
 compositor could insert an intermediate tranche if it's possible to use a
 mem2mem device to convert buffers to be able to use scan-out.

+linux-dmabuf feedback implementation notes for version 6
+========================================================
+
+Version 6 keeps most of the logic described above, with a few exceptions:
+
+- the main device is no longer sent. If the client needs an equivalent to the
+  main device, the target device of the first tranche with the ``sampling``
+  flag can be used.
+- compositors should send tranches with the ``sampling`` flag for all devices
+  they can sample from without triggering buffer migrations.
+- clients are strongly recommended to set the device they allocated buffers for
+  with ``wp_linux_buffer_params.set_sampling_device``. This avoids both implicit
+  buffer migrations and unnecessary copies by the compositor.
+- whenever possible, compositors should avoid directly importing buffers into
+  devices other than the sampling device set by the client, to avoid implicit
+  buffer migrations by the graphics driver (which can reduce performance
+  significantly).
+- if the compositor wants a client to switch the device it renders on, it can
+  change the order of tranches (and thus devices) in the per-surface feedback.
+  In response, clients that are capable of switching devices should reallocate
+  their buffers on the first device in the new tranche order that they can use.
+
 ``dev_t`` encoding
 ==================

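The version 6 notes tell clients to treat the first tranche with the ``sampling`` flag as the equivalent of the old main device, and to reallocate when the compositor reorders tranches. What follows is a minimal C sketch of that selection step. `struct tranche`, `pick_sampling_tranche` and the list of usable devices are hypothetical client-side names. They are not part of the protocol or of libwayland.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

/* Bit values of the tranche_flags enum in linux-dmabuf-v1.xml. */
#define TRANCHE_FLAG_SCANOUT  (1u << 0)
#define TRANCHE_FLAG_SAMPLING (1u << 1)

/* Hypothetical client-side record of one tranche, filled from the
 * tranche_target_device and tranche_flags events. */
struct tranche {
    dev_t target_device;
    uint32_t flags;
};

/* Hypothetical helper: nonzero if `dev` is in the list of devices this
 * client can allocate on (how that list is built is up to the client). */
static int device_usable(dev_t dev, const dev_t *usable, size_t usable_count)
{
    for (size_t i = 0; i < usable_count; i++) {
        if (usable[i] == dev)
            return 1;
    }
    return 0;
}

/* Walks the tranches in the order they were received (most preferred first)
 * and returns the index of the first sampling tranche whose target device is
 * usable, or -1 if there is none. */
static int pick_sampling_tranche(const struct tranche *tranches, size_t count,
                                 const dev_t *usable, size_t usable_count)
{
    assert(count == 0 || tranches != NULL);
    for (size_t i = 0; i < count; i++) {
        if (!(tranches[i].flags & TRANCHE_FLAG_SAMPLING))
            continue; /* scanout-only tranches do not name a sampling device */
        if (device_usable(tranches[i].target_device, usable, usable_count))
            return (int)i;
    }
    return -1;
}
```

A client could rerun this selection after every `done` event. It would then reallocate only when the selected target device differs from the one its buffers currently live on.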
--- a/stable/linux-dmabuf/linux-dmabuf-v1.xml
+++ b/stable/linux-dmabuf/linux-dmabuf-v1.xml
@@ -24,7 +24,7 @@
    DEALINGS IN THE SOFTWARE.
  </copyright>

-  <interface name="zwp_linux_dmabuf_v1" version="5">
+  <interface name="zwp_linux_dmabuf_v1" version="6">
    <description summary="factory for creating dmabuf-based wl_buffers">
      This interface offers ways to create generic dmabuf-based wl_buffers.

@@ -184,7 +184,7 @@
    </request>
  </interface>

-  <interface name="zwp_linux_buffer_params_v1" version="5">
+  <interface name="zwp_linux_buffer_params_v1" version="6">
    <description summary="parameters for creating a dmabuf-based wl_buffer">
      This temporary object is a collection of dmabufs and other
      parameters that together form a single logical buffer. The temporary
@@ -222,6 +222,8 @@
      <entry name="invalid_wl_buffer" value="7"
             summary="invalid wl_buffer resulted from importing dmabufs via
               the create_immed request on given buffer_params"/>
+      <entry name="invalid_dev_t_size" value="8"
+             summary="a dev_t array with the wrong size was used"/>
    </enum>

    <request name="destroy" type="destructor">
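The new ``invalid_dev_t_size`` error covers a ``dev_t`` array whose length is wrong. The following C sketch shows one way to encode and decode such an array. `struct dev_array` is a hypothetical stand-in for the `size` and `data` fields of libwayland's `struct wl_array`, and the function names are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical stand-in for the size/data fields of libwayland's
 * struct wl_array, so the sketch compiles without wayland headers. */
struct dev_array {
    size_t size;
    const void *data;
};

/* Client side: write a dev_t into a buffer of exactly sizeof(dev_t) bytes,
 * the payload carried by set_sampling_device and the device events. */
static void encode_dev_t(dev_t dev, unsigned char out[sizeof(dev_t)])
{
    memcpy(out, &dev, sizeof(dev));
}

/* Compositor side: returns false if the array size is wrong, in which case a
 * compositor would post invalid_dev_t_size on the buffer_params object. */
static bool decode_dev_t(const struct dev_array *array, dev_t *out)
{
    assert(array != NULL && out != NULL);
    if (array->size != sizeof(dev_t))
        return false;
    /* memcpy makes no assumption about the alignment of the array data. */
    memcpy(out, array->data, sizeof(dev_t));
    return true;
}
```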
@@ -394,9 +396,28 @@
      <arg name="format" type="uint" summary="DRM_FORMAT code"/>
      <arg name="flags" type="uint" enum="flags" summary="see enum flags"/>
    </request>
+
+    <request name="set_sampling_device" since="6">
+      <description summary="set the target device of the wl_buffer">
+        Set the device the compositor should import the dmabufs to for sampling
+        in the next create or create_immed request.
+
+        To avoid race conditions when the compositor removes a device from the
+        tranches, it is not a protocol error if the device hasn't been advertised
+        by the compositor in a tranche with the sampling flag, but the import is
+        likely to fail in that case.
+
+        If the client doesn't know a suitable target device, it shouldn't set one,
+        and the compositor should attempt the import on all devices it supports.
+
+        If the array size is not exactly the size of a dev_t, the
+        invalid_dev_t_size protocol error is raised.
+      </description>
+      <arg name="device" type="array" summary="device dev_t value"/>
+    </request>
  </interface>

-  <interface name="zwp_linux_dmabuf_feedback_v1" version="5">
+  <interface name="zwp_linux_dmabuf_feedback_v1" version="6">
    <description summary="dmabuf feedback">
      This object advertises dmabuf parameters feedback. This includes the
      preferred devices and the supported formats/modifiers.
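``set_sampling_device`` tells the compositor where to import. Without it, the compositor should try every device it supports. The following C sketch shows that decision, assuming a hypothetical `struct params_state` and `plan_import_order` helper. It only plans the order in which imports are attempted and leaves the actual import out. Restricting the import strictly to the named device is a simplification, since the feedback notes only ask compositors to avoid other devices whenever possible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical per-zwp_linux_buffer_params_v1 state kept by a compositor. */
struct params_state {
    bool has_sampling_device; /* set_sampling_device was received */
    dev_t sampling_device;    /* the dev_t it carried */
};

/* Fills `order` (room for device_count entries) with indices into `devices`,
 * in the order the import should be attempted, and returns how many were
 * written. Zero means there is nothing to try, so the import fails. */
static size_t plan_import_order(const struct params_state *params,
                                const dev_t *devices, size_t device_count,
                                size_t *order)
{
    assert(params != NULL && (device_count == 0 || order != NULL));
    size_t n = 0;
    if (params->has_sampling_device) {
        /* The client named a device: import only there, to avoid implicit
         * buffer migrations by the driver. */
        for (size_t i = 0; i < device_count; i++) {
            if (devices[i] == params->sampling_device) {
                order[n++] = i;
                break;
            }
        }
        return n;
    }
    /* No device named: try every supported device in preference order. */
    for (size_t i = 0; i < device_count; i++)
        order[n++] = i;
    return n;
}
```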
@@ -419,10 +440,13 @@
      descending order of preference. All formats and modifiers in the same
      tranche have the same preference.

-      To send parameters, the compositor sends one main_device event, tranches
-      (each consisting of one tranche_target_device event, one tranche_flags
-      event, tranche_formats events and then a tranche_done event), then one
-      done event.
+      To send parameters, the compositor sends one main_device event (unless
+      the client bound version 6 or above), tranches (each consisting of one
+      tranche_target_device event, one tranche_flags event, tranche_formats
+      events and then a tranche_done event), then one done event.
+
+      With version 6 and above, the compositor must always advertise at least
+      one tranche with the sampling flag set.
    </description>

    <request name="destroy" type="destructor">
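The event ordering now depends on the bound version. The following C sketch models one feedback batch as a list of hypothetical event tags. It omits ``main_device`` for version 6 and above, and refuses to build a version 6 batch that lacks a sampling tranche. It assumes one ``tranche_formats`` event per tranche. Like the ordering paragraph it illustrates, it leaves out ``format_table``.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TRANCHE_FLAG_SAMPLING (1u << 1)

/* Hypothetical tags for the events in one feedback batch. */
enum feedback_event {
    EV_MAIN_DEVICE,
    EV_TRANCHE_TARGET_DEVICE,
    EV_TRANCHE_FLAGS,
    EV_TRANCHE_FORMATS,
    EV_TRANCHE_DONE,
    EV_DONE,
};

/* Writes the event sequence for one batch into `out` and returns its length,
 * or 0 if a version 6 batch would lack the mandatory sampling tranche. */
static size_t feedback_sequence(uint32_t bound_version,
                                const uint32_t *tranche_flags,
                                size_t tranche_count,
                                enum feedback_event *out, size_t cap)
{
    bool v6 = bound_version >= 6;
    if (v6) {
        bool has_sampling = false;
        for (size_t i = 0; i < tranche_count; i++)
            has_sampling = has_sampling ||
                           (tranche_flags[i] & TRANCHE_FLAG_SAMPLING);
        if (!has_sampling)
            return 0;
    }
    assert(cap >= (v6 ? 0 : 1) + 4 * tranche_count + 1);
    size_t n = 0;
    if (!v6)
        out[n++] = EV_MAIN_DEVICE; /* only for clients bound below version 6 */
    for (size_t i = 0; i < tranche_count; i++) {
        out[n++] = EV_TRANCHE_TARGET_DEVICE;
        out[n++] = EV_TRANCHE_FLAGS;
        out[n++] = EV_TRANCHE_FORMATS;
        out[n++] = EV_TRANCHE_DONE;
    }
    out[n++] = EV_DONE;
    return n;
}
```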
@@ -463,7 +487,7 @@
      <arg name="size" type="uint" summary="table size, in bytes"/>
    </event>

-    <event name="main_device">
+    <event name="main_device" deprecated-since="6">
      <description summary="preferred main device">
        This event advertises the main device that the server prefers to use
        when direct scan-out to the target device isn't possible. The
@@ -488,6 +512,9 @@
        If explicit modifiers are not supported and the client performs buffer
        allocations on a different device than the main device, then the client
        must force the buffer to have a linear layout.
+
+        With version 6 and above, this event is no longer sent. Clients should
+        instead use the target device of a tranche with the sampling flag.
      </description>
      <arg name="device" type="array" summary="device dev_t value"/>
    </event>
@@ -516,9 +543,9 @@

        The client can use this hint to allocate the buffer in a way that makes
        it accessible from the target device, ideally directly. The buffer must
-        still be accessible from the main device, either through direct import
-        or through a potentially more expensive fallback path. If the buffer
-        can't be directly imported from the main device then clients must be
+        still be accessible from the target device of a sampling tranche, either
+        through direct import or a potentially more expensive fallback path. If
+        the buffer can't be directly imported for sampling, then clients must be
        prepared for the compositor changing the tranche priority or making
        wl_buffer creation fail (see the zwp_linux_buffer_params_v1.create and
        create_immed requests for details).
@@ -564,19 +591,30 @@
    </event>

    <enum name="tranche_flags" bitfield="true">
-      <entry name="scanout" value="1" summary="direct scan-out tranche"/>
+      <entry name="scanout" value="1" since="4">
+        <description summary="direct scan-out tranche">
+          The scanout flag is a hint that direct scan-out may be attempted by
+          the compositor on the target device if the client appropriately
+          allocates a buffer. How to allocate a buffer that can be scanned out
+          on the target device is implementation-defined.
+        </description>
+      </entry>
+      <entry name="sampling" value="2" since="6">
+        <description summary="sampling tranche">
+          The sampling flag indicates that the compositor can efficiently sample
+          from buffers imported to the target device if the client appropriately
+          allocates a buffer. How to allocate a buffer that can be efficiently
+          sampled on the target device is implementation-defined.
+        </description>
+      </entry>
    </enum>

    <event name="tranche_flags">
      <description summary="tranche flags">
-        This event sets tranche-specific flags.
-
-        The scanout flag is a hint that direct scan-out may be attempted by the
-        compositor on the target device if the client appropriately allocates a
-        buffer. How to allocate a buffer that can be scanned out on the target
-        device is implementation-defined.
-
-        This event is tied to a preference tranche, see the tranche_done event.
+        This event sets tranche-specific flags. It is tied to a preference
+        tranche, see the tranche_done event.
+
+        With version 6 and above, each tranche must have at least one flag set.
      </description>
      <arg name="flags" type="uint" enum="tranche_flags" summary="tranche flags"/>
    </event>
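With the ``sampling`` entry added, the valid contents of ``tranche_flags`` depend on the bound version. The following small C check is a sketch. The macro values come from the enum in this diff. Rejecting bits that the bound version does not define is an assumption of this example, not wording from the protocol.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TRANCHE_FLAG_SCANOUT  (1u << 0) /* since version 4 */
#define TRANCHE_FLAG_SAMPLING (1u << 1) /* since version 6 */

/* Bits of tranche_flags that exist at a given bound interface version. */
static uint32_t known_tranche_flags(uint32_t version)
{
    uint32_t known = 0;
    if (version >= 4)
        known |= TRANCHE_FLAG_SCANOUT;
    if (version >= 6)
        known |= TRANCHE_FLAG_SAMPLING;
    return known;
}

/* Checks one tranche_flags value: no bits beyond the bound version (an
 * assumption of this sketch), and at least one flag from version 6 on. */
static bool tranche_flags_valid(uint32_t version, uint32_t flags)
{
    assert(version >= 4); /* the feedback object exists since version 4 */
    if (flags & ~known_tranche_flags(version))
        return false;
    if (version >= 6 && flags == 0)
        return false;
    return true;
}
```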