libei
=====

**libei** is a library for Emulated Input, primarily aimed at the Wayland
stack. It provides three parts:

- 🥚 EI (Emulated Input) for the client side (`libei`)
- 🍦 EIS (Emulated Input Server) for the server side (`libeis`)
- 🍚 REIS (Restrictions for the EIS) for the portal in between (`libreis`)

The communication between these parts is an implementation detail; neither
client nor server needs to care about the details. Let's call it the BRidge
for EI, or 🥣 brei.

For the purpose of this document, **libei** refers to the project,
`libei`/`libeis`/`libreis` to the libraries provided.

In the Wayland stack, the EIS server component is part of the
compositor and the EI client component is part of the Wayland client.

```
+--------------------+             +------------------+
| Wayland compositor |---wayland---| Wayland client B |
+--------------------+\            +------------------+
| libinput | libeis  | \_wayland______
+----------+---------+                \
    |          |           +-------+------------------+
/dev/input/    +---brei----| libei | Wayland client A |
                           +-------+------------------+
```

The use-cases **libei** attempts to solve are:

- on-demand input device emulation, e.g. `xdotool` or more generically the
  XTEST extension
- input forwarding, e.g. `synergy`

**libei** provides three benefits:

- separation
- distinction
- control

**libei** provides **separation** of emulated input from normal input.
Emulated input is a distinct channel for the compositor and can thus be
handled accordingly. For example, the compositor may show a warning sign in
the task bar while emulated input is active.

The second benefit is **distinction**. Each **libei** client has its own
input device set, so the server is always aware of which client is requesting
input at any time. It is possible for the server to treat input from
different emulated input devices differently.

The server is in **control** of emulated input - it can filter or discard
input at will. For example, if the current focus window is a password
prompt, the server can simply discard any emulated input. If the screen is
locked, the server can suspend all emulated input devices.

For the use-case of forwarding input (e.g. `synergy`) **libei** provides
capability monitoring. As with input emulation, the same benefits apply:
input can only be forwarded if the compositor explicitly does so.

Why not $foo?
-------------

We start from the baseline of: "there is no emulated input in Wayland (the
protocol)".

There is emulated input in X through XTEST but it provides neither
separation, distinction nor control in a useful manner. There are however
many X clients that require XTEST to work.

There are several suggestions that overlap with **libei**, with the main
proposals being:

- a Wayland protocol for virtual input
- a (compositor-specific) DBus interface for virtual input

Emulated input is not specifically Wayland-y. Clients that emulate input
generally don't care about Wayland itself. It's not needed to emulate
events on their own surfaces and Wayland does not provide global state. The
only connection to Wayland is merely that input events are *received*
through the Wayland protocol. So a Wayland protocol for emulating input is
not a great fit; it merely ticks the convenient box of "we already have IPC
through the wayland protocol, why not just do it there".

DBus is the most prevalent generic IPC channel on the Linux desktop but it's
not available in some compositors. Any other specific side-channel requires
an IPC mechanism to be implemented in the sender and receiver.

As things stand, neither proposal looks like it will be universally
available. Wayland clients (including Xwayland) would need to support any
combination of methods.

**libei** side-steps this issue by making the *communication* itself an
implementation detail and providing different *negotiation* backends.
A client can attempt to establish a **libei** context through a Flatpak
Portal first, fall back onto a public DBus interface and finally fall back
onto e.g. a named UNIX socket, all with only a few lines of code. There is
only one spot where the client has to care about this; the actual emulation
of input is identical regardless of backend.
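
The fallback order described above can be sketched as a loop over candidate backends. This is a toy model in Python, not the `libei` API: `try_portal`, `try_dbus`, `try_socket` and `negotiate` are hypothetical names, and each backend here simply pretends to succeed or fail.

```python
from typing import Callable, Optional, Sequence

# Hypothetical stand-ins for the negotiation backends named above. None of
# these are libei functions; each returns a connection label on success or
# None if that backend is unavailable.
def try_portal() -> Optional[str]:
    return None  # pretend no portal is running

def try_dbus() -> Optional[str]:
    return None  # pretend the compositor offers no DBus interface

def try_socket() -> Optional[str]:
    return "socket"  # pretend the named UNIX socket exists

def negotiate(backends: Sequence[Callable[[], Optional[str]]]) -> str:
    """Return the first connection any backend can establish, in order."""
    for backend in backends:
        connection = backend()
        if connection is not None:
            return connection
    raise ConnectionError("no negotiation backend available")

# The only backend-specific spot in the client; everything after this
# point would be identical regardless of which backend succeeded.
connection = negotiate([try_portal, try_dbus, try_socket])
print(connection)
```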

High-level summary
------------------

Pseudo-code implementations for server and client are available in
the [`examples/`](https://gitlab.freedesktop.org/whot/libei/-/tree/master/examples)
directory.

The server starts a `libeis` context (which can be integrated with flatpak
portals) and uses the `libeis` file descriptor to monitor for
client requests.

A client starts a `libei` context and connects to the server - either
directly, via DBus or via a portal. The server (or the portal) approves or
denies the client. After successful authentication the client can request
the creation of a device with capabilities `pointer`, `keyboard` or `touch`.
The client triggers input events on this device, the server receives those
as events through `libeis` and can forward them as if they were libinput
events. The server has control of the client stream. If the stream is
paused, events from the client are discarded. If the stream is resumed, the
server will receive the events (but may discard them anyway depending on
local state).
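
The pause/resume behaviour can be illustrated with a small toy model. `ClientStream` and its methods are hypothetical names, not `libeis` API, and the assumption that a stream starts out paused is made only for this sketch. Events sent while paused are dropped rather than queued.

```python
from dataclasses import dataclass, field

@dataclass
class ClientStream:
    """Toy model of one client's event stream as seen by the server."""
    paused: bool = True  # assumption for this sketch: streams start paused
    delivered: list = field(default_factory=list)

    def pause(self) -> None:
        self.paused = True

    def resume(self) -> None:
        self.paused = False

    def receive(self, event: str, screen_locked: bool = False) -> bool:
        """Return True if the event was accepted, False if discarded."""
        if self.paused:
            return False  # paused: discarded, not queued for later
        if screen_locked:
            return False  # resumed, but local state still vetoes the event
        self.delivered.append(event)
        return True

stream = ClientStream()
stream.receive("motion-1")                      # discarded: stream is paused
stream.resume()
stream.receive("motion-2")                      # delivered
stream.receive("motion-3", screen_locked=True)  # discarded by local state
print(stream.delivered)
```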

The above caters for the `xdotool` use-case.

The client may request to monitor a capability. When the server deems the
client to be in-focus, it forwards events from real devices to the client.
The decision of what constitutes logical focus and what events to forward
is up to the server.

For a `synergy` use-case, the setup requires:

- `synergy-client` on host A monitoring the mouse and keyboard capabilities
- `synergy-server` on host B requesting a mouse/keyboard capability device
- when `synergy-client` receives events via `libei` from compositor A it
  forwards those to the remote `synergy-server` which sends them via `libei`
  to the compositor B.

The compositor may choose to implement a hotkey to start/stop the events or
it may implement the screen edges to be the hot key.
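
The `synergy` data flow above can be modelled in a few lines. This is a sketch only: the `Compositor` class, its `forwarding` flag (standing in for the hotkey or screen edge) and the two relay functions are hypothetical, and the network hop between the hosts is reduced to a direct function call.

```python
class Compositor:
    """Toy compositor: forwards real events to a monitor, records emulated ones."""

    def __init__(self):
        self.monitors = {}       # capability -> callback of the monitoring client
        self.forwarding = False  # flipped by a hotkey or a screen edge
        self.emulated = []       # events injected by an emulating client

    def monitor(self, capability, callback):
        self.monitors[capability] = callback

    def real_event(self, capability, event):
        """Return True if the event went to the monitoring client."""
        callback = self.monitors.get(capability)
        if self.forwarding and callback is not None:
            callback(capability, event)
            return True
        return False  # handled locally as usual

    def emulate(self, capability, event):
        self.emulated.append((capability, event))

compositor_a = Compositor()
compositor_b = Compositor()

def synergy_server(capability, event):
    # Host B: replay the event on an emulated device in compositor B.
    compositor_b.emulate(capability, event)

def synergy_client(capability, event):
    # Host A: relay the monitored event to host B (network hop omitted).
    synergy_server(capability, event)

for capability in ("pointer", "keyboard"):
    compositor_a.monitor(capability, synergy_client)

compositor_a.real_event("pointer", "motion")  # not forwarding yet: stays local
compositor_a.forwarding = True
compositor_a.real_event("pointer", "motion")  # now relayed to host B
print(compositor_b.emulated)
```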

Using REIS
----------

`libreis` is designed to allow a third-party that does not have a full
context to manage restrictions. This is aimed at portals that only have the
file descriptor to the EIS implementation but cannot initiate a full EI
context.

`libreis` works so that this third-party can configure the EIS
implementation to restrict the abilities of an EI client (later) connected
to this implementation. For example, a portal can set the client name
using `libreis` based on the app-id. A client cannot override this name
later.
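
As an illustration of the name restriction, the following toy model shows a name that is fixed before the client connects and cannot be changed afterwards. `EisConnection`, `restrict_name` and `client_set_name` are hypothetical names chosen for this sketch; they are not `libreis` or `libeis` functions.

```python
class EisConnection:
    """Toy model of a connection to an EIS implementation."""

    def __init__(self):
        self.name = None
        self.name_locked = False

    def restrict_name(self, name: str) -> None:
        """What a portal would do through a libreis-like interface."""
        self.name = name
        self.name_locked = True

    def client_set_name(self, name: str) -> bool:
        """What the EI client may try later; False means it was ignored."""
        if self.name_locked:
            return False
        self.name = name
        return True

conn = EisConnection()
conn.restrict_name("org.example.App")      # portal derives this from the app-id
accepted = conn.client_set_name("Totally Trustworthy")
print(accepted, conn.name)
```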

Open questions
--------------

### Flatpak integration

Where flatpak portals are in use, `libei` can communicate with
the portal through a custom backend. The above diagram modified for
Flatpak would be:

```
+--------------------+
| Wayland compositor |_
+--------------------+  \
| libinput | libeis  |   \_wayland______
+----------+---------+                  \
    |     [eis-0.socket]                 \
/dev/input/     /   \\       +-------+------------------+
                |     ======>| libei | Wayland client A |
                |      after +-------+------------------+
         initial|     handover   /
      connection|               / initial request
                |              /  dbus[org.freedesktop.portal.EmulatedInput]
        +--------------------+
        | xdg-desktop-portal |
        +--------------------+
```

The current approach works so that

- the compositor starts a `libeis` socket backend at `$XDG_RUNTIME_DIR/eis-0`
- `xdg-desktop-portal` provides `org.freedesktop.portal.EmulatedInput`
- a client connects to the `xdg-desktop-portal` to request emulated input
- `xdg-desktop-portal` authenticates a client and opens the initial
  connection to the `libeis` socket. It restricts the capabilities available
  on that socket (e.g. sets the client name based on `app-id` using
  `libreis`).
- `xdg-desktop-portal` hands over the file descriptor to the client which
  can initialize a `libei` context
- from then on, `libei` and `libeis` talk directly to each other, the portal
  has no further influence.
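
The file descriptor handover in the steps above relies on ordinary UNIX fd passing. The sketch below demonstrates that mechanism with Python's `socket.send_fds` and `socket.recv_fds` (Python 3.9+, UNIX only). All three roles run in one process, a socket pair stands in for the `eis-0` socket and another for the DBus channel, and the `name=...` and `pointer-motion` lines are a made-up wire format, not the brei protocol.

```python
import socket

# A connected pair stands in for the portal's initial connection to the
# compositor's eis-0 socket; server_end is the compositor's side of it.
server_end, portal_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

# A second pair stands in for the portal<->client channel (DBus in reality).
portal_chan, client_chan = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

# Portal: apply a restriction on the connection, hand the fd to the
# client, then drop out of the picture.
portal_end.sendall(b"name=org.example.App\n")  # made-up wire format
socket.send_fds(portal_chan, [b"fd"], [portal_end.fileno()])
portal_end.close()

# Client: receive the fd and talk to the compositor directly.
_, fds, _, _ = socket.recv_fds(client_chan, 16, 1)
client_end = socket.socket(fileno=fds[0])
client_end.sendall(b"pointer-motion\n")        # made-up wire format

# Compositor: sees the portal's restriction first, then the client's event.
reader = server_end.makefile("rb")
restriction = reader.readline()
event = reader.readline()
print(restriction, event)
```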

This describes the **current** implementation. Changes to this approach are
likely, e.g. the portal **may** control suspending/resuming devices (in addition to the
server). The UI for this is not yet sorted.

### Authentication

Sandboxing is addressed via flatpak portals but a further level is likely
desirable, esp. outside flatpak. The simplest solution is the client
announcing the name so the UI can be adjusted accordingly. API-wise, maybe an
opaque key/value system so the exact auth can be left to the implementation.

### Triggers

For `synergy` we need capability monitoring started by triggers, e.g. the
client requests a pointer capability monitoring when the real pointer hits
the screen edge, or in response to a keyboard shortcut.

### Keyboard layouts

The emulated input may require a specific keyboard layout, for example
for softtokens (usually: constant layout "us") or for the `synergy` case
where the remote keyboard should have the same keymap as the local one, even
where the remote host is configured otherwise.

libei provides keymap negotiation: the client can pick a keymap, the server
can accept it, refuse it, or override it with its own. In the latter two
cases it is up to the client to handle the result.
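
The three possible outcomes of keymap negotiation can be sketched as follows. `Result` and `negotiate_keymap` are hypothetical names, and the server policy shown here (an allow-list plus an optional forced keymap) is an assumption made for the example, not how `libeis` decides.

```python
from enum import Enum

class Result(Enum):
    ACCEPTED = "accepted"
    REFUSED = "refused"
    OVERRIDDEN = "overridden"

def negotiate_keymap(requested, allowed, forced=None):
    """Server-side decision: return (result, keymap the device ends up with)."""
    if forced is not None and forced != requested:
        return Result.OVERRIDDEN, forced   # server imposes its own keymap
    if requested in allowed:
        return Result.ACCEPTED, requested
    return Result.REFUSED, None            # device gets no client keymap

# Client side: in the refused and overridden cases the client must adapt.
result, keymap = negotiate_keymap("us", allowed={"us", "de"})
print(result, keymap)
result, keymap = negotiate_keymap("us", allowed={"us"}, forced="fr")
print(result, keymap)
```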

Modifier state handling, group handling, etc. is still a private
implementation detail, even where the server supports individual keymaps,
so it remains to be seen if this approach is sufficient.

### Xwayland and XTEST

There are PoC implementations of using `libei` within Xwayland and
connecting it to a `libeis` context in the compositor (PoC with Weston).
This allows Xwayland to intercept XTEST events and route those through
the compositor instead.

```
+--------------------+             +------------------+
| Wayland compositor |---wayland---| Wayland client B |
+--------------------+\            +------------------+
| libinput | libeis  | \_wayland______
+----------+---------+                \
    |          |           +-------+------------------+
/dev/input/    +---brei----| libei |     XWayland     |
                           +-------+------------------+
                                            |
                                            | XTEST
                                            |
                                      +-----------+
                                      | X client  |
                                      +-----------+
```

Of course, Xwayland is just another Wayland client, so the connection
between libei and libeis could be handled through a portal.

### Short-lived applications

**libei** is not designed for short-lived fire-and-forget-type applications
like `xdotool`. It provides context and device negotiation between the
server and the client - the latter must be able to adjust to limitations the
server imposes.

The current implementation of the protocol does not allow for a `libei` client
to send all requests in bulk and exit. The decision on whether to accept a
device is ultimately made by the caller implementation and is
non-deterministic. For **libei** to support a batch request, *someone* would
have to wait. It cannot be the server as the exact requirements are unknown: do
we pause processing on the client altogether? Then we may miss a disconnect
event. Do we pause processing for one device only? But then we may be
re-ordering input events and cause havoc.

`libei` itself could implement these event queues but this too can
mess with the input order. And implementing an event queue is not hard, so
this issue is punted to the caller instead. Xwayland in its current
implementation already does this.
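
A caller-side event queue of the kind mentioned above might look like the following. This is a minimal sketch, not what Xwayland does: `QueuedDevice`, its states and the `send` callback are hypothetical names. Events are held while the server's decision on the device is pending, flushed in their original order on acceptance, and dropped on rejection.

```python
from collections import deque

class QueuedDevice:
    """Caller-side queue: hold events until the server decides on the device."""

    def __init__(self, send):
        self._send = send          # callable that actually emits an event
        self._pending = deque()
        self.state = "requested"   # becomes "accepted" or "rejected"

    def emit(self, event):
        if self.state == "accepted":
            self._send(event)
        elif self.state == "requested":
            self._pending.append(event)  # decision pending: keep in order
        # rejected: the event is silently dropped

    def on_server_decision(self, accepted: bool):
        self.state = "accepted" if accepted else "rejected"
        if accepted:
            while self._pending:
                self._send(self._pending.popleft())  # flush in FIFO order
        else:
            self._pending.clear()

sent = []
device = QueuedDevice(sent.append)
device.emit("key-down")
device.emit("key-up")
device.on_server_decision(accepted=True)
device.emit("motion")
print(sent)
```

A short-lived tool built this way still has to stay alive until the server's decision arrives; the queue only preserves ordering, it does not remove the wait.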