We currently use the systemd LLDP client, which we consume by forking
the systemd code. That is a maintenance burden, because what we use is
not a self-contained, stable library. Hence we either need a separate
library or we need to properly integrate the fork in our tree.
Ideally, we would create a new nettools project with an LLDP library.
That was not done because:
- nettools may want to be dual licensed with LGPL-2.1+ and Apache.
Systemd code is LGPL-2.1+ so it is fine for NetworkManager but
possibly not for nettools.
- nettools provides independent libraries. As such, they don't have an
event loop; instead, they expose an epoll file descriptor that the user
needs to integrate. systemd and NetworkManager, on the other hand,
have their established event loops (sd_event and GMainContext,
respectively). It's simpler to implement the library on those terms,
in particular to port the systemd library from sd_event to
GMainContext.
- NetworkManager uses glib and has various helper utils. While it's
possible to do without them, it's more work.
The main reason not to write a new NetworkManager-agnostic library from
scratch is that it's much simpler to fork the systemd library and make
it part of NetworkManager than to make it a nettools library.
Do it.
The boottime argument might come from the system, and we should not
assert that it's reasonably small. It might be infinity. In that
case, keep it at infinity.
It belongs there, beside NMEtherAddr. Maybe NMEtherAddr should be moved to a
separate header, but keep it here for now.
The only oddity is that nm_ether_addr_zero actually aliases nm_ip_addr_zero,
which is in "libnm-glib-aux/nm-inet-utils.h". We can work around that.
Taken from systemd's "Prioq".
Differences from Prioq:
- It is glib-ized, so certain operations cannot fail since g_malloc()
never fails.
- Unlike Prioq, this structure is stack allocated. I think that makes
sense, because we basically always want to embed the data structure
in another object. There is never a need for passing this around as a
pointer. And if you really want, you can box it yourself.
- The queue accepts either a GCompareFunc or a GCompareDataFunc. This
is for convenience. The prioq_ensure_allocated() and
prioq_ensure_put() functions are consequently dropped, as they would be
cumbersome with this pattern and don't seem useful.
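As a rough illustration of the pattern described above, here is a minimal,
self-contained sketch of a stack-allocatable priority queue (a binary
min-heap) driven by a GCompareFunc. The ExamplePrioq type and the
example_prioq_*() names are made up for this sketch and are not
NetworkManager's actual nm_prioq API.

#include <glib.h>

typedef struct {
    GPtrArray   *heap; /* backed by g_malloc(), which never returns NULL */
    GCompareFunc cmp;
} ExamplePrioq;

static void
example_prioq_init(ExamplePrioq *q, GCompareFunc cmp)
{
    /* the queue itself lives on the caller's stack (or embedded in
     * another struct); only the heap storage is heap-allocated. */
    q->heap = g_ptr_array_new();
    q->cmp  = cmp;
}

static void
example_prioq_destroy(ExamplePrioq *q)
{
    g_clear_pointer(&q->heap, g_ptr_array_unref);
}

static void
_swap(GPtrArray *a, guint i, guint j)
{
    gpointer tmp = a->pdata[i];

    a->pdata[i] = a->pdata[j];
    a->pdata[j] = tmp;
}

static void
example_prioq_put(ExamplePrioq *q, gpointer data)
{
    guint i;

    g_ptr_array_add(q->heap, data);
    for (i = q->heap->len - 1; i > 0;) {
        guint parent = (i - 1) / 2;

        if (q->cmp(q->heap->pdata[i], q->heap->pdata[parent]) >= 0)
            break;
        _swap(q->heap, i, parent);
        i = parent;
    }
}

static gpointer
example_prioq_pop(ExamplePrioq *q)
{
    gpointer top;
    guint    i = 0;

    if (q->heap->len == 0)
        return NULL;
    top = q->heap->pdata[0];
    q->heap->pdata[0] = q->heap->pdata[q->heap->len - 1];
    g_ptr_array_set_size(q->heap, q->heap->len - 1);
    for (;;) {
        guint l = 2 * i + 1, r = 2 * i + 2, min = i;

        if (l < q->heap->len && q->cmp(q->heap->pdata[l], q->heap->pdata[min]) < 0)
            min = l;
        if (r < q->heap->len && q->cmp(q->heap->pdata[r], q->heap->pdata[min]) < 0)
            min = r;
        if (min == i)
            break;
        _swap(q->heap, i, min);
        i = min;
    }
    return top;
}

Usage is then simply "ExamplePrioq q; example_prioq_init(&q, cmp);" with a
matching example_prioq_destroy(&q): no separate allocation step and no
failure path to handle.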
Instead of always re-requesting secrets on authentication failure, ask the
NMSetting if this is really needed. Currently only the case "802.1x with TLS"
behaves differently, i.e. there is no re-request.
When an authentication attempt fails, NetworkManager re-requests new secrets
from agents before retrying. This is currently decided outside of the NMSetting
objects. With this change, the decision whether a re-request of new secrets is
really needed is moved down to the NMSetting implementations.
For the case "802.1x authentication with TLS", a certificate with a password is
configured, and the assumption is that this can never be wrong, so no re-request
is needed.
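As a rough sketch of that decision, using only public libnm getters (the
helper name and the exact rule are illustrative; the real change lives in
the NMSetting implementations):

#include <NetworkManager.h>

/* Illustrative only: decide whether a failed authentication should trigger
 * a re-request of secrets from agents. With EAP-TLS, a certificate plus its
 * password is configured; per the reasoning above, that is assumed to never
 * be "wrong", so asking the agent again would not help. */
static gboolean
example_should_rerequest_secrets(NMSetting8021x *s_8021x)
{
    guint32 i, n;

    if (!s_8021x)
        return TRUE;

    n = nm_setting_802_1x_get_num_eap_methods(s_8021x);
    for (i = 0; i < n; i++) {
        if (g_strcmp0(nm_setting_802_1x_get_eap_method(s_8021x, i), "tls") == 0)
            return FALSE;
    }
    return TRUE;
}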
The pager_fallback() runs in the forked child process.
As such, it can only use functions listed in `man signal-safety`
or functions that are otherwise explicitly allowed.
We are mostly good, but g_printerr() is not allowed. It can deadlock.
Just avoid it. It's not very important to print those error messages anyway.
setenv() cannot be called after fork, because it might allocate memory,
which can deadlock.
Instead, prepare the environment and use execvpe().
`man 2 fork` says:
After a fork() in a multithreaded program, the child can safely call
only async-signal-safe functions (see signal-safety(7)) until such time
as it calls execve(2).
This means we are quite strongly limited in what can be done in the child
process before exec. setenv() is not listed as async-signal-safe, obviously
because it allocates memory, and malloc() isn't async-signal-safe either.
See also glib's documentation of GSpawnChildSetupFunc ([1]) about what
can be done in the child process.
[1] 08cb200aec/glib/gspawn.h (L124)
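A minimal sketch of the pattern these constraints lead to: build argv and
envp before fork(), then restrict the child to async-signal-safe calls plus
the exec itself. The PAGER/LESS values here are placeholders, not the ones
NetworkManager actually sets.

#define _GNU_SOURCE /* execvpe() is a GNU extension */
#include <stdlib.h>
#include <unistd.h>

int
main(void)
{
    char *const argv[] = {"less", NULL};
    char *const envp[] = {"PAGER=less", "LESS=FRX", NULL}; /* placeholder values */
    pid_t       pid;

    pid = fork();
    if (pid < 0)
        return 1;
    if (pid == 0) {
        /* child: no setenv(), no malloc(), no g_printerr() -- only
         * async-signal-safe functions until exec. */
        execvpe("less", argv, envp);
        _exit(127); /* exec failed; _exit() is async-signal-safe */
    }
    /* the parent would continue here, e.g. feeding data to the pager */
    return 0;
}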
Currently, when performing DNS resolution with systemd-resolved,
NetworkManager tells systemd-resolved to consider only DNS configuration
for the network interface that the connectivity check request will be
routed through. But this is not correct because DNS and routing are
configured entirely separately. For example, say we have a VPN that
receives all DNS but only a subset of routing. NetworkManager will
configure systemd-resolved with no DNS servers on any interface except
for the VPN interface, but will still route traffic through other
interfaces. This is entirely legitimate and works fine in practice,
except for the connectivity check.
To fix this, we just drop the restriction and allow systemd-resolved to
consider its full configuration, which is what gets used normally
anyway. This allows our connectivity check to match the real
configuration instead of failing spuriously.
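To make the change concrete, this is roughly what it amounts to at the D-Bus
level (a hedged sketch, not NetworkManager's actual code): ask
org.freedesktop.resolve1's ResolveHostname with ifindex 0, i.e. "use the
full DNS configuration", instead of pinning it to the interface the request
would be routed through.

#include <gio/gio.h>
#include <sys/socket.h>

/* Sketch only: resolve the connectivity-check hostname via systemd-resolved
 * without restricting the lookup to one interface. Reply handling omitted. */
static void
example_resolve_for_connectivity_check(GDBusConnection *bus, const char *hostname)
{
    g_dbus_connection_call(bus,
                           "org.freedesktop.resolve1",
                           "/org/freedesktop/resolve1",
                           "org.freedesktop.resolve1.Manager",
                           "ResolveHostname",
                           g_variant_new("(isit)",
                                         0,           /* ifindex 0: no per-interface restriction */
                                         hostname,
                                         AF_UNSPEC,   /* any address family */
                                         (guint64) 0  /* flags */),
                           NULL, /* reply type: not checked in this sketch */
                           G_DBUS_CALL_FLAGS_NONE,
                           -1,
                           NULL,  /* cancellable */
                           NULL,  /* callback omitted in this sketch */
                           NULL);
}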
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/issues/1107
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1415
Fix the following crash:
$ nmcli device monitor a
Error: Device 'a' not found.
Segmentation fault (core dumped)
Found by coverity:
1. NetworkManager-1.41.3/src/nmcli/devices.c:0: scope_hint: In function 'do_devices_monitor'
2. NetworkManager-1.41.3/src/nmcli/devices.c:2932:28: warning[-Wanalyzer-null-dereference]: dereference of NULL 'devices'
2930| }
2931|
2932|-> for (i = 0; i < devices->len; i++)
2933| device_watch(nmc, g_ptr_array_index(devices, i));
2934|
Fixes: 2074b28976 ('nmcli/devices: return GPtrArray instead of GSList from get_device_list()')
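The fix boils down to the usual pattern for such lookups: treat a NULL array
as "lookup failed" and bail out before iterating. A generic, self-contained
illustration (the names below are not the nmcli code):

#include <glib.h>

static gboolean
example_watch_devices(GPtrArray *devices /* may be NULL on lookup failure */)
{
    guint i;

    if (!devices)
        return FALSE; /* the error was already reported by the lookup */

    for (i = 0; i < devices->len; i++)
        g_print("watching %s\n", (const char *) g_ptr_array_index(devices, i));
    return TRUE;
}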
g_memdup()'s size argument is a guint. There was CVE-2021-27219
about an integer overflow, which results in a buffer overflow.
In response to that, g_memdup2() was introduced in 2.68.
We can't use g_memdup2(), because our currently required glib
version is still 2.40.
There was no bug in those two places where g_memdup() was used.
It's just that g_memdup() is a code smell. Prevent any questions that
a reader of the code might have regarding the correctness of g_memdup()
(w.r.t. integer/buffer overflow) by not using it.
Instead use our internal nm_memdup() variant, which exists exactly for
this reason.
See-also: https://gitlab.gnome.org/GNOME/glib/-/issues/2319
The "device ... not available because device is strictly unmanaged" is
almost certainly the least interesting of the reasons why connection
can't be activated on a device.
Invent a new error level for it and demote it.
Before:
Error: Connection activation failed: No suitable device found
for this connection (device lo not available because
device is strictly unmanaged).
After:
Error: Connection activation failed: No suitable device found
for this connection (device eth0 not available because
profile is not compatible with device (...)).
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1433
When there are many VFs, the default buffer size of one memory page is
not enough. Each VF can take up to ~120 bytes, and so with a page
size of 4KiB at most ~34 VFs can be added.
Specify the buffer size when allocating the message.
Add a len argument to nlmsg_alloc() and nlmsg_alloc_simple(). After
that, nlmsg_alloc_size() can be dropped. Also, rename
nlmsg_alloc_simple() to nlmsg_alloc_new().
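Sketched as declarations (the parameter order and types are assumptions,
not the actual code):

#include <stddef.h>

struct nl_msg;

/* assumed shape of the reworked allocators */
struct nl_msg *nlmsg_alloc(size_t len);                               /* buffer size now explicit */
struct nl_msg *nlmsg_alloc_new(size_t len, int nlmsgtype, int flags); /* renamed from nlmsg_alloc_simple() */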
It is not possible to configure a VLAN interface on an unmanaged NIC.
This forces users who only want to create a VLAN interface to take
ownership of a possibly shared underlying NIC.
In OpenShift, the SR-IOV operator is currently not using
NetworkManager to configure VFs. When it starts working with a NIC,
it explicitly makes it unmanaged. Then, users cannot create a VLAN
interface on PFs managed by the operator.
This commit eliminates the issue by allowing a VLAN to be configured on
an interface without requiring that interface to be managed by NetworkManager.
This commit is part of a broader change that eliminates inheriting
the unmanaged condition from the parent of a device, for all device
types:
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1418
https://bugzilla.redhat.com/show_bug.cgi?id=2110307
ensure_teamd_connection() is called from multiple spots. Sometimes we
call it opportunistically without having started teamd (e.g. from
update_connection() when generating a connection for a team device that
was already created) and handle the failure to connect gracefully.
Let's not pollute the logs with things on ERROR level that are not
actually serious. Replace the logging statements with DEBUG or WARN
depending on whether we expect ensure_teamd_connection() to actually
succeed.
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1422
Call teamdctl_port_config_update_raw() when we're attaching a port even
if all of the team-slave setting's properties are at their defaults.
This is done to ensure teamd "knows" about the port (that is,
"teamdctl ... port present" returns success) when we're done activating
the slave connection. teamd would pick the port up from netlink anyway,
but that can happen after the activation is done, resulting in a possible race.
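A hedged sketch of the idea (connection setup/teardown omitted; the
libteamdctl call and its argument order are assumed from its public header):

#include <teamdctl.h>

/* Push the port config to teamd even when it is effectively empty, so that
 * "teamdctl ... port present" succeeds as soon as activation finishes,
 * instead of waiting for teamd to pick the port up from netlink. */
static int
example_attach_port(struct teamdctl *tdc, const char *port_ifname, const char *config_json)
{
    if (!config_json || !config_json[0])
        config_json = "{}"; /* default/empty team-slave configuration */

    return teamdctl_port_config_update_raw(tdc, port_ifname, config_json);
}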
Fixes-test: @remove_active_team_profile
https://bugzilla.redhat.com/show_bug.cgi?id=2102375
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1421
Add a fire-and-forget function to wait for shutdown to be complete.
It's not entirely trivial to ensure all resources of NMClient are
cleaned up. That matters only if NMClient uses a temporary GMainContext
that the user wants to release while the application continues. For
example, to do some short-lived operations on a worker thread. It's
also not trivial because glib provides no convenient API to integrate
a GMainContext into another GMainContext. We have that code as
nm_utils_g_main_context_create_integrate_source(), so add a helper
function to allow the user to do this.
The function allows omitting the callback, in which case the caller
wouldn't know when shutdown is complete. That is still useful,
however, when integrating the client's context into the caller's
context, so that the client's context gets automatically iterated
until completion.
The following test script will run out of file descriptors
when wait_shutdown() is not used:
#!/bin/python
import gi
gi.require_version("NM", "1.0")
from gi.repository import NM, GLib

for i in range(1200):
    print(f">>>{i}")
    ctx = GLib.MainContext()
    ctx.push_thread_default()
    nmc = NM.Client.new()
    ctx.pop_thread_default()

    def cb(unused, result, i):
        try:
            NM.Client.wait_shutdown_finish(result)
        except Exception:
            # cannot happen
            assert False
        else:
            print(f">>>>> {i} complete")

    nmc.wait_shutdown(True, None, cb, i)

    while GLib.MainContext.default().iteration(False):
        pass
When using async initialization with GAsyncInitable, the user usually can
only know that initialization is complete by passing a callback.
In simple cases, that can be cumbersome.
Also expose a flag that allows polling that information.
Reuse the existing NM_CLIENT_INSTANCE_FLAGS for that. There is an
ugliness here: suddenly there are instance flags that cannot be
set but are still returned by the getter. But as this is a relatively
obscure feature, it seems more lightweight to implement it this way
(instead of adding a separate property and getter function).
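A sketch of what polling could look like from the user's side. The
NM_CLIENT_INSTANCE_FLAGS_INITIALIZED_GOOD name is an assumption about the
new read-only flag; check the libnm headers of the version you build against.

#include <NetworkManager.h>

static gboolean
example_client_is_initialized(NMClient *client)
{
    guint flags = 0;

    /* NM_CLIENT_INSTANCE_FLAGS is the existing "instance-flags" property;
     * the flag value below is assumed, not taken from a released header. */
    g_object_get(client, NM_CLIENT_INSTANCE_FLAGS, &flags, NULL);
    return (flags & NM_CLIENT_INSTANCE_FLAGS_INITIALIZED_GOOD) != 0;
}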