Merge branch 'dactions' into 'main'

Draft: Add triggerable desktop actions

See merge request wayland/wayland-protocols!478
DorotaC 2026-05-13 15:34:44 +00:00
commit e55884d76c
2 changed files with 373 additions and 0 deletions


@@ -0,0 +1,4 @@
Desktop actions protocol
Maintainers:
Dorota Czaplejewicz <gilaac.dcz@porcupinefactory.org>


@@ -0,0 +1,369 @@
<?xml version="1.0" encoding="UTF-8"?>
<protocol name="desktop_actions_experimental_v1">
<copyright>
Copyright 2018 Mike Blumenkrantz
Copyright 2018 Samsung Electronics Co., Ltd
Copyright 2018 Red Hat Inc.
Copyright 2025 DorotaC
Permission is hereby granted, free of charge, to any person obtaining a
copy of this software and associated documentation files (the "Software"),
to deal in the Software without restriction, including without limitation
the rights to use, copy, modify, merge, publish, distribute, sublicense,
and/or sell copies of the Software, and to permit persons to whom the
Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice (including the next
paragraph) shall be included in all copies or substantial portions of the
Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
DEALINGS IN THE SOFTWARE.
</copyright>
<!-- TODO: mirrored input-method side protocol -->
<description summary="Shortcuts for focused applications">
This protocol allows a client to advertise some actions it can perform, and lets the compositor trigger them directly.
That directness of triggering actions is the main point of this protocol. Traditionally, an action is triggered by a keyboard combination or a mouse motion, which is then interpreted by the application.
In order to trigger such an action without a keyboard or a mouse, a keyboard or mouse must be emulated. That's an unnecessary difficulty affecting on-screen input methods.
An application supporting desktop-actions lets the input method trigger shortcuts without the emulation baggage.
This protocol also helps avoid clashing shortcuts by letting the compositor detect what the application expects and by assigning another shortcut if it knows of any collisions.
This protocol can be implemented on its own if the compositor wishes to trigger the actions directly. It can also be accompanied by an "input" protocol to let other applications trigger actions, similar to how input methods submit text.
High-level overview of the interfaces:
The xx_desktop_actions_manager_v1 exposes the get_desktop_actions request, which creates an xx_desktop_actions_v1 object for a wl_seat.
The resulting xx_desktop_actions_v1 object can then be used to advertise the available actions and to receive the events that trigger them.
This document adheres to RFC 2119 when using words like "must",
"should", "may", etc.
Warning! The protocol described in this file is currently in the
experimental phase. Backwards incompatible major versions of the
protocol are to be expected. Exposing this protocol without an opt-in
mechanism is discouraged.
</description>
<interface name="xx_desktop_actions_v1" version="1">
<description summary="Advertise and trigger actions">
Advertise and trigger actions executed by a seat.
The ability to input actions follows keyboard focus.
The application may ignore events which arrive while it is not holding keyboard focus.
</description>
<enum name="universal_action">
<description summary="Action that is not application-specific">
This enum is for actions that are very often seen in different kinds of applications, with shortcuts ingrained so deeply into collective hacker memory that they are almost standardized.
<!-- Of course, there's no standard for shortcuts. That's why this protocol was created.
The criterion for inclusion is thus based on gut feel. -->
<!-- XKB keysyms serve the same purpose, unfortunately with text typing mixed in. Typing should be done using text input instead. -->
</description>
<entry name="cut" value="0" summary="Cut into clipboard" />
<entry name="copy" value="0" summary="Copy into clipboard" />
<entry name="paste" value="0" summary="Paste from clipboard" />
<entry name="paste_unformatted" value="0" summary="Paste from clipboard, specifically without formatting" />
<entry name="paste_formatted" value="0" summary="Paste from clipboard, specifically trying to retain formatting" />
<entry name="input_next" value="0" summary="Focus next interactive element" />
<entry name="input_previous" value="0" summary="Focus previous interactive element" />
<entry name="document_new" value="0" summary="New document" />
<entry name="document_open" value="0" summary="Select a document to open" />
<entry name="document_reload" value="0" summary="Reload current document" />
<entry name="document_save" value="0" summary="Save current document" />
<entry name="document_save_as" value="0" summary="Save current document into a new copy" />
<entry name="document_close" value="0" summary="Close current document" />
<entry name="document_previous" value="0" summary="Switch to previous document" />
<entry name="document_next" value="0" summary="Switch to next document" />
<entry name="document_share" value="0" summary="Share current document" />
<entry name="search" value="0" summary="Begin search" />
<entry name="search_previous" value="0" summary="Go to previous search result" />
<entry name="search_next" value="0" summary="Go to next search result" />
<entry name="undo" value="0" summary="Undo previous action" />
<entry name="redo" value="0" summary="Redo previously undone action" />
<entry name="confirm" value="0" summary="Ok, confirm, accept, yes" />
<entry name="reject" value="0" summary="Reject, cancel, no" />
<entry name="fullscreen_toggle" value="0" summary="Toggle fullscreen mode" />
<entry name="fullscreen_enter" value="0" summary="Enter fullscreen mode" />
<entry name="fullscreen_leave" value="0" summary="Leave fullscreen mode" />
<entry name="magnify_reset" value="0" summary="Use default magnification level" />
<entry name="magnify_bigger" value="0" summary="Zoom in" />
<entry name="magnify_smaller" value="0" summary="Zoom out" />
<entry name="navigate_back" value="0" summary="Select previous view or go back in hierarchy" />
<entry name="navigate_forward" value="0" summary="Select next view, go forward in hierarchy, follow current link" />
<entry name="start" value="0" summary="Start activity, execute, run" />
<entry name="pause" value="0" summary="Pause activity, allowing it to be resumed" />
<entry name="stop" value="0" summary="Stop activity, without possibility to resume" />
<entry name="details" value="0" summary="Display details, document properties" />
<entry name="remove_selected" value="0" summary="Remove or delete selected item(s)" />
<entry name="select_all" value="0" summary="Select all items" />
<entry name="select_none" value="0" summary="Clear selection" />
<!-- Actions copying XKB keysyms. Not including anything that belongs to the realm of typing text, desktop management. -->
<entry name="select" value="0" summary="Select, mark item(s)" />
<entry name="print" value="0" summary="Print document" />
<entry name="insert" value="0" summary="Insert, insert here" />
<entry name="menu" value="0" summary="menu" />
<entry name="help" value="0" summary="View help" />
<entry name="break" value="0" summary="break" />
</enum>
<request name="set_available_universal_actions">
<description summary="announce the available actions">
Announces the actions available to trigger.
Values set with this request are double-buffered. They will get applied
on the next .commit request.
They get reset to the initial value on every keyboard .enter event.
The initial value is an empty set: no actions are available.
Values in the available_actions array come from the .universal_action enum.
</description>
<arg name="available_actions" type="array" summary="available actions"/>
</request>
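<!-- Illustration only, not part of the protocol: a minimal Python sketch of how a compositor might track the double-buffered set described in set_available_universal_actions. The class and method names are hypothetical, and actions are written as names rather than numbers because the enum values are still placeholders. Whether a staged but uncommitted value survives a keyboard enter is not stated in the description; the sketch assumes it is dropped.

```python
class UniversalActionState:
    """Models the double-buffered available_actions set for one client."""

    def __init__(self):
        self.pending = None   # value staged by set_available_universal_actions
        self.current = set()  # initial value: no actions are available

    def set_available_universal_actions(self, actions):
        # Staged only; nothing becomes visible until commit.
        self.pending = set(actions)

    def commit(self):
        # Apply the staged value, if there is one.
        if self.pending is not None:
            self.current = self.pending
            self.pending = None

    def keyboard_enter(self):
        # Reset to the initial (empty) value on every keyboard enter.
        self.current = set()
        # Assumption: a staged value is dropped as well.
        self.pending = None
```

With this model, a client that regains keyboard focus has to send set_available_universal_actions and commit again before any universal action can be triggered.
-->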
<!-- this loosely follows XDG Global Shortcuts:
https://flatpak.github.io/xdg-desktop-portal/docs/doc-org.freedesktop.portal.GlobalShortcuts.html
GlobalShortcuts had trouble with shortcuts being added dynamically.
This doesn't consider this at all. Maybe it works, maybe it doesn't. It should work, though, because the user doesn't manually need to approve shortcuts before they take effect - the application is already in focus.
-->
<!-- Why is this even needed?
It's because keyboard shortcuts don't make sense and can't be triggered at will.
Consider the shortcut for closing the current tab in Kate on the same keyboard, different layouts.
Layout | Keysym | Keycode | Location
QWERTY | Ctrl+W | 17 | button 3 row 3
AZERTY | Ctrl+W | 44 | button 2 row 5
RU | Ctrl+В | 17 | button 3 row 3
How can this action be triggered using a keyboard? The issuer must track the current system layout and inject key events directly, depending on the layout. This is a lot of complexity, but it's possible (although I have received complaints that implementations in the wild are broken in practice, which spurred this investigation).
Even worse, input methods have extra difficulties implementing this right. From private chats, an input method wants to send arbitrary characters as shortcuts (Ctrl+Ü).
If we tie the input method to the current layout (keymap), arbitrary characters won't be sendable any more.
If we let the input method choose arbitrary key syms by the means of a custom key map, normal shortcuts will be broken (RU depends on a specific key code, ignoring the key sym).
If we create a special protocol just to send arbitrary key syms, shortcuts will still be broken (there's no reason to think Ctrl+W keysym triggers anything on the RU layout if it's not in the layout).
Or maybe adding a special keysym protocol can help in most cases, like in Latin alphabet lands?
But if we're adding a special protocol, then it won't apply to old applications. The compositor will have to convert those keysyms back to keycodes before submitting, with the same pitfalls.
If we're modifying protocols for this use case, let's create something dedicated to actions, without the pitfalls above.
This is the desktop actions protocol.
-->
<request name="add_available_app_action">
<description summary="announce application-specific action">
Adds an application-specific action to the available list.
Conceptually, this adds an action to the compositor's "app_action" set.
Values set with this request are double-buffered. They will get applied
on the next .commit request.
The "app_action" set persists until the client exits.
The initial value is an empty set: no actions are available.
</description>
<arg name="name" type="string" summary="Human-readable description"/>
<arg name="handle" type="uint" summary="Handle to trigger the action"/>
<arg name="existing_trigger" type="string">
<description summary="Optional trigger description">
Describes the default trigger(s) for this action.
The compositor should attempt to use this trigger.
The empty string means the action has no default keyboard shortcut. Upon receiving it, the compositor may not assign one.
If the string is not empty, it takes the form {format}:{trigger}, where {trigger} is a string defined by {format}.
Currently supported formats:
"xdg", with trigger specified in https://specifications.freedesktop.org/shortcuts/latest/ .
<!-- xdg only supports shortcuts triggered by keyboards.
The "format" field was added so that compositors can come up with something else, like touch gestures, game controllers, voice control, etc.
In case the xdg protocol doesn't get extended.
-->
Formats not recognized by the compositor must be ignored.
Multiple suggestions may be present. Multiple suggestions must be separated by newline characters.
<!-- This lets an application spam unstandardized formats, hoping that the compositor understands one of them, before falling back to the keyboard. It might be useful if some action is *especially* well suited to a gesture or other non-keyboard trigger. -->
<!-- I do hope no one has the bright idea to include a newline in the format spec... Otherwise suggestions would need to go in a separate request, which would be annoying. -->
</description>
</arg>
</request>
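<!-- Illustration only, not part of the protocol: a minimal Python sketch of how a compositor might read the existing_trigger argument, assuming it only understands the "xdg" format. The function name, the "gesture" format, and the trigger strings in the usage note are made up for the example. Skipping a line that has no colon is my assumption; the description does not say what to do with malformed suggestions.

```python
KNOWN_FORMATS = {"xdg"}  # formats this hypothetical compositor understands


def parse_trigger_suggestions(existing_trigger, known_formats=KNOWN_FORMATS):
    """Return the recognized (format, trigger) pairs, in the order given."""
    suggestions = []
    if existing_trigger == "":
        return suggestions  # the action has no default trigger
    # Multiple suggestions are separated by newline characters.
    for line in existing_trigger.split("\n"):
        # Split on the first colon only, so the trigger may contain colons.
        fmt, sep, trigger = line.partition(":")
        if not sep:
            continue  # assumption: a line without a colon is skipped
        if fmt not in known_formats:
            continue  # unrecognized formats must be ignored
        suggestions.append((fmt, trigger))
    return suggestions
```

For example, parse_trigger_suggestions("gesture:swipe-left\nxdg:CTRL+w") keeps only the ("xdg", "CTRL+w") pair, because this compositor does not know the "gesture" format.
-->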
<request name="remove_available_app_action">
<description summary="application-specific action becomes invalid">
The application no longer needs this action.
This removes the action from the "app_action" set.
</description>
<arg name="handle" type="uint" summary="Handle to trigger the action"/>
</request>
<request name="commit">
<description summary="commit an update">
Applies the values of available_actions and stores the "app_action" list.
</description>
</request>
<!-- Should this be optionally synchronized with text_input? What kind of race conditions can happen? Under what circumstances will an action be sent alongside a text input commit? -->
<event name="perform_universal_action">
<description summary="action requested">
The compositor requested a universal action to be performed.
The application should ignore this event if it didn't advertise the action in the most recent `available_actions`.
</description>
<arg name="action" type="uint" enum="universal_action" summary="action to perform"/>
</event>
<!-- There are two ways to trigger shortcuts: let the compositor do it or make the application do it.
Keyboard shortcuts are what historically gave trouble, so it deserves special attention.
Better let compositors detect and trigger keyboard shortcuts:
- Compositors are likely to already have a robust shortcut assignment and detection (?).
- Not every application has a shortcut dialog.
- There are a lot more applications than compositors, meaning more work in total.
- No need to care whether the application recognizes the format.
- When replacing a shortcut client-side, there must be an easy way for the application to end up with no conflicts.
Better let applications themselves detect keyboard shortcuts that were assigned by compositor:
- Detecting a shortcut in the compositor doesn't prevent the application from detecting some key combo from the same shortcut and acting on it.
- The application can persist keyboard shortcuts for itself without needing an app-identifying protocol. (but it can't persist shortcuts it can't parse/detect)
Compositor-side wins.
-->
<enum name="assignment_flags" bitfield="true">
<description summary="Control how shortcuts are assigned"/>
<entry name="none" value="0x0" summary="no special behavior"/>
<entry name="replace_keyboard" value="0x1">
<description summary="Set new keyboard combo">
The compositor takes over the responsibility for triggering the keyboard action.
This allows the compositor to resolve keyboard shortcut conflicts.
The client should use the assigned keyboard shortcut description instead of its internal default.
The application must stop reacting to keyboard shortcuts triggering this action.
If an "xdg" trigger is not provided, the client must discard the old shortcut.
</description>
</entry>
</enum>
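<!-- Illustration only, not part of the protocol: a minimal Python sketch of how a client might test the assignment_flags bitfield. The class and function names are hypothetical; the two values mirror the entries of the enum.

```python
from enum import IntFlag


class AssignmentFlags(IntFlag):
    NONE = 0x0
    REPLACE_KEYBOARD = 0x1


def client_keeps_own_shortcut(flags):
    """True if the client may keep reacting to its own keyboard shortcut."""
    # With replace_keyboard set, the compositor takes over triggering,
    # so the client must stop reacting to its internal default.
    return not (flags & AssignmentFlags.REPLACE_KEYBOARD)
```

Testing the bit with a mask, rather than comparing the whole value, keeps the check valid if more flags are added to the bitfield later.
-->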
<event name="assign_app_action_trigger">
<description summary="Assigns a trigger to the action">
Notifies the client which shortcut(s) the compositor will actually use to trigger the action.
This is intended to disable conflicting shortcuts and for the client to display user-readable hints in-application, e.g. when clicking "File" menu, there's "New Window: Ctrl+Shift+N".
The compositor may assign multiple shortcuts to the action, see {number} in the description of the trigger argument.
The client must ignore this event if it didn't advertise the handle in the most recently submitted "app_action" set.
Sending this event always replaces the old value.
The value persists until the client disconnects.
<!-- Resolving shortcut conflicts should happen only once.
It would be bad if our shortcut was stolen (by compositor itself? by a global shortcut?) while our window was out of focus. The user shouldn't need to check every time.-->
Once assigned, the compositor should save the assignments and use them the next time the client connects.
Examples:
If the shortcut needs to be triggered for an input method or an automation, setting flags to none is appropriate. The compositor can then put a description in "text", which the client could display where appropriate.
If the shortcut is assigned to get rid of a shortcut conflict, the compositor will want the replace_keyboard flag on all the conflicts to disable previous combos. The action then remains triggerable by an input method.
</description>
<arg name="handle" type="uint" summary="handle to action to describe"/>
<arg name="flags" type="uint" enum="assignment_flags" summary="flags for this assignment"/>
<arg name="trigger" type="string">
<description summary="Trigger description">
The description follows the format:
{number}:{format}:{trigger}
where {number} is the ordinal number of the shortcut.
An additional {format} is available:
"text", where trigger is a textual, human-readable description.
If this string is empty, the old shortcut remains unmodified.
</description>
</arg>
</event>
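<!-- Illustration only, not part of the protocol: a minimal Python sketch of how a client might split the trigger argument of assign_app_action_trigger. The function name is hypothetical and the ordinal in the usage note is a placeholder, since the description does not say where numbering starts. A malformed string raises ValueError here, which is my choice and not something the protocol specifies.

```python
def parse_assigned_trigger(trigger):
    """Split '{number}:{format}:{trigger}' into (number, format, trigger).

    Returns None for the empty string, which leaves the old shortcut
    unmodified.
    """
    if trigger == "":
        return None
    # Split on the first two colons only, so the trigger may contain colons.
    number, fmt, description = trigger.split(":", 2)
    return int(number), fmt, description
```

For example, parse_assigned_trigger("0:text:Ctrl+Shift+N") gives (0, "text", "Ctrl+Shift+N"), which a client could show next to the "New Window" menu entry.
-->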
<event name="assign_action_finish">
<description summary="Finish assigning triggers">
This marks all unassigned actions with an empty trigger string and the none flag.
In practice, any compositor which resolves shortcut conflicts is going to (re)assign all actions with a default shortcut, to make sure the application doesn't react to something already triggering an action on the compositor side.
The compositor must send this in order for shortcut assignments to become active.
</description>
</event>
<event name="perform_app_specific_action">
<description summary="app-specific action requested">
The compositor requested an app-specific action to be performed.
The application must ignore this event if it didn't advertise the handle in the most recently submitted "app_action" set.
The application must ignore this event if the compositor didn't assign the shortcuts using .assign_action_finish.
</description>
<arg name="handle" type="uint" summary="handle to action to perform"/>
</event>
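<!-- Illustration only, not part of the protocol: a minimal Python sketch of the two checks a client makes before acting on perform_app_specific_action. The class and method names are hypothetical. Treating remove_available_app_action as double-buffered, and keeping the "assigned" state across later commits, are my assumptions; the descriptions do not settle either point.

```python
class AppActionClient:
    """Client-side bookkeeping for app-specific actions."""

    def __init__(self):
        self.pending = set()    # handles staged by add/remove requests
        self.committed = set()  # handles in the last committed "app_action" set
        self.assigned = False   # becomes True after assign_action_finish

    def add_available_app_action(self, handle):
        self.pending.add(handle)

    def remove_available_app_action(self, handle):
        # Assumption: removal takes effect on the next commit.
        self.pending.discard(handle)

    def commit(self):
        self.committed = set(self.pending)

    def on_assign_action_finish(self):
        self.assigned = True

    def on_perform_app_specific_action(self, handle):
        """Return True if the action should run, False if it is ignored."""
        return self.assigned and handle in self.committed
```

A handle that was added but not yet committed, or committed before any assign_action_finish arrived, is ignored in this model.
-->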
<request name="destroy" type="destructor">
<description summary="destroy the actions object">
Destroys the desktop_actions object.
</description>
</request>
</interface>
<interface name="xx_desktop_actions_manager_v1" version="1">
<description summary="desktop actions manager">
A factory for desktop actions objects. This object is a global singleton.
</description>
<request name="get_desktop_actions">
<description summary="create a new desktop actions object">
Creates a new desktop actions object for a given seat.
</description>
<arg name="id" type="new_id" interface="xx_desktop_actions_v1"/>
<arg name="seat" type="object" interface="wl_seat"/>
</request>
<request name="destroy" type="destructor">
<description summary="destroy the actions manager">
Destroys the xx_desktop_actions_manager_v1 object.
The xx_desktop_actions_v1 objects originating from it remain unaffected.
</description>
</request>
</interface>
</protocol>