OVS system interfaces can start to connect even before the ovsdb is
ready. However, the connection attempt is doomed to fail and the
NMSettingsConnection gets blocked with reason FAILED.
Unblock them once the ovsdb is ready.
Ideally, NMPolicy should subscribe to the NMOvsdb::ready signal;
however, NMOvsdb is in a plugin, so it's easier if NMOvsdb directly
calls a function of the core.
Sometimes during boot the device starts an activation with reason
'assumed', and then the OVS interface gets removed from the ovsdb. At
this point the device has an active-connection associated; it has the
activate_stage1_device_prepare() activation source scheduled, but it
is still in disconnected state.
Currently the factory doesn't recognize that the device is activating
and allows the device to proceed even if the interface was removed
from the ovsdb. Fix this by checking the presence of an
active-connection instead.
During shutdown, NM always tries to remove from ovsdb all bridges,
ports, interfaces that it previously added. Currently NM doesn't run
the main loop during shutdown and so it's not possible to perform
asynchronous operations. In particular, the NMOvsdb singleton is
disposed in a destructor where it's not possible to send out all the
queued deletions.
The result is that NM deletes only one OVS interface, keeping the
others. This needs to be fixed, but requires a rework of the shutdown
procedure that involves many parts of NM.
Even when a better shutdown procedure is implemented, we should
support an unclean shutdown caused by e.g. a kernel panic or a NM
crash. In these cases, the interfaces added by NM would still linger
in the ovsdb.
Delete all those interfaces at NM startup. If there are connection
profiles for them, NM will create them again.
Also, NMOvsdb now emits a NM_OVSDB_READY signal and provides a
nm_ovsdb_is_ready() function to allow other parts of the daemon to order
actions after the initial OVS cleanup.
https://bugzilla.redhat.com/show_bug.cgi?id=1861296
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/700
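The ordering pattern described above ("run now if ready, otherwise wait for the ready notification") can be sketched as follows. All names here (OvsdbStub, run_or_defer, ovsdb_stub_set_ready) are invented for illustration and stand in for the actual NMOvsdb/NMPolicy interaction:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal stand-in for NMOvsdb's "ready" state and signal. */
typedef struct {
    bool ready;
    void (*on_ready)(void *user_data);
    void *user_data;
} OvsdbStub;

static int g_ready_calls = 0;

/* Demo callback counting how often the deferred action ran. */
static void
count_ready(void *user_data)
{
    (void) user_data;
    g_ready_calls++;
}

/* Run the action now if the ovsdb is ready, else defer it until
 * the ready notification fires. */
static void
run_or_defer(OvsdbStub *ovsdb, void (*action)(void *), void *user_data)
{
    if (ovsdb->ready) {
        action(user_data);
    } else {
        ovsdb->on_ready = action;
        ovsdb->user_data = user_data;
    }
}

/* Called once the initial OVS cleanup completed. */
static void
ovsdb_stub_set_ready(OvsdbStub *ovsdb)
{
    ovsdb->ready = true;
    if (ovsdb->on_ready)
        ovsdb->on_ready(ovsdb->user_data);
}
```

In the real code the deferred action would be the unblocking of the failed NMSettingsConnections.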
If an OVS system interface that is active in NM gets removed
externally from the ovsdb, NM doesn't notice it and keeps the
connection active.
Let the OVS factory also handle the removal message for system
interfaces. In that case, we need to match the removed interface name
to the actual device, which can be of any kind.
I don't think that the way we track RA data is correct. For example,
all data (like DNSSL) is tracked regardless of the router that announced it. That means,
if a router sends an update, we cannot prematurely expire a DNSSL list
which is no longer announced by that router. Anyway, that should be
reviewed and considered another time.
However, we also must rate limit the number of elements we track in
general. So far we would only do that for the addresses.
The limit for max_addresses is already read from sysctl. We only limit
it further to at most 300 addresses. The 300 is arbitrarily chosen.
The limits for DNSSL and RDNSS are taken from systemd-networkd
(NDISC_DNSSL_MAX, NDISC_RDNSS_MAX), both 64.
The limits for gateways and routes are not based on anything and are
just made up.
RFC4861 describes how to solicit routers, but this is later extended
by RFC7559 (Packet-Loss Resiliency for Router Solicitations).
Rework the scheduling of router solicitations to follow RFC7559.
Differences from RFC7559:
- the initial random delay before sending the first RS is only
up to 250 milliseconds (instead of 1 second).
- we never completely stop sending RS, even after we receive a valid
RA. We will still send RS every hour.
We no longer honor the sysctl variables
- /proc/sys/net/ipv6/conf/$IFNAME/router_solicitations
- /proc/sys/net/ipv6/conf/$IFNAME/router_solicitation_interval
I don't think having the autoconf algorithm configurable is useful.
At least not if the configuration happens via sysctl variables.
https://tools.ietf.org/html/rfc4861#section-6.3.7
https://tools.ietf.org/html/rfc7559#section-2.1
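The RFC 7559 retransmission schedule can be sketched as a pure backoff function: the interval starts at 4 seconds (RFC 4861's RTR_SOLICITATION_INTERVAL) and doubles up to a one-hour cap, which matches the "send RS every hour" behavior above. The function name is illustrative:

```c
#include <assert.h>
#include <stdint.h>

#define RTR_SOLICITATION_INTERVAL_MSEC  4000u    /* RFC 4861 */
#define MAX_RTR_SOLICITATION_DELAY_MSEC 3600000u /* 1 hour cap */

/* Interval to wait before the (n+1)-th solicitation, n starting at 0.
 * Exponential backoff with multiplier 2, capped at one hour. */
static inline uint32_t
solicit_backoff_msec(unsigned n)
{
    uint64_t v = RTR_SOLICITATION_INTERVAL_MSEC;

    while (n-- > 0) {
        v *= 2;
        if (v >= MAX_RTR_SOLICITATION_DELAY_MSEC)
            return MAX_RTR_SOLICITATION_DELAY_MSEC;
    }
    return (uint32_t) v;
}
```

The initial random delay (up to 250 ms per the commit message) is left out of the sketch.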
Elements of RAs have a lifetime. Previously we would track both
the timestamp (when we received the RA) and the lifetime.
However, we are mainly interested in the expiry time. So tracking the
expiry in form of timestamp and lifetime is redundant and cumbersome
to use.
Consider also the case of nm_ndisc_add_address(), where we mangle the expiry.
In that case, the timestamp becomes meaningless or it's not clear what
the timestamp should be.
Also, there are no real cases where we actually need the receive timestamp.
Note that when we convert the times to NMPlatformIP6Address, we again need
to synthesize a base timestamp. But here too, it's NMPlatformIP6Address's
fault for doing this pointless split of timestamp and lifetime.
While at it, increase the precision to milliseconds. As we receive
lifetimes with seconds precision, one might think that seconds precision
is enough for tracking the timeouts. However, it just leads to ugly
uncertainty about rounding, when we can track times with sufficient
precision without downside. For example, before configuring an
address in kernel, we also need to calculate a remaining lifetime
with a lower precision. By having the exact values, we can do so
more accurately. At least, in theory. Of course, NMPlatformIP6Address
itself only has a precision of seconds, so we already lose the
information before. However, NMNDisc no longer has that problem.
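The conversion described above, a lifetime in seconds plus the receive time collapsed into a single expiry in milliseconds, can be sketched like this. The helper name and the infinity marker are invented for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for however the real code marks a never-expiring element. */
#define EXPIRY_INFINITY INT64_MAX

/* Convert a lifetime in seconds (as received in the RA) plus the
 * receive time in milliseconds into an absolute expiry in msec.
 * 0xffffffff is the "infinite" lifetime per RFC 4861. */
static inline int64_t
expiry_from_lifetime(int64_t now_msec, uint32_t lifetime_sec)
{
    if (lifetime_sec == 0xFFFFFFFFu)
        return EXPIRY_INFINITY;
    return now_msec + ((int64_t) lifetime_sec) * 1000;
}
```

Tracking only this expiry value avoids the redundant (timestamp, lifetime) pair.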
This was done since NDisc code was added to NetworkManager in
commit c3a4656a68 ('rdisc: libndp implementation').
Note what it does: in clean_dns_*() we will call solicit_router()
if the half-life of any entity is expired. That doesn't seem right.
Why only for dns_servers and dns_domains, but not routes, addresses
and gateways?
Also, why would the timing of when we solicit depend on when
elements expire? It is "normal" that some of them will expire.
We should solicit based on other parameters, like keeping track
of when and how to solicit.
Note that there is a change in behavior here: if we stopped
soliciting (either because we received our first RA or because
we ran out of retries), then we will now never start again.
Previously this was a mechanism so that we would eventually
start soliciting again. This will be fixed in a follow-up
commit soon.
Otherwise, if there is a problem with the tests they will run
indefinitely. Sure, meson will kill them after a while, but I
don't think autotools does, does it? Anyway, give a maximum
time to wait.
This locks NM into dhcpcd-9.3.3 as that is the first version to support
the --noconfigure option. Older versions are no longer supported by NM
because they modify the host, which is undesirable.
Due to the way dhcpcd-9 uses privilege separation and that it re-parents
itself to PID 1, the main process cannot be reaped or waited for.
So we rely on dhcpcd correctly cleaning up after itself.
A new function nm_dhcp_client_stop_watch_child() has been added
so that dhcpcd can perform similar cleanup to the equivalent stop call.
As part of this change, the STOP and STOPPED reasons are mapped to
NM_DHCP_STATE_DONE and PREINIT is mapped to a new state NM_DHCP_STATE_NOOP
which means NM should just ignore this state.
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/668
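The reason-to-state mapping described above can be sketched as a simple lookup. The enum and function here are illustrative stand-ins, not the actual nm-dhcp-client API:

```c
#include <assert.h>
#include <string.h>

typedef enum {
    SKETCH_DHCP_STATE_UNKNOWN,
    SKETCH_DHCP_STATE_DONE,
    SKETCH_DHCP_STATE_NOOP, /* new state: NM should simply ignore it */
} SketchDhcpState;

/* Map a dhcpcd reason string to the internal DHCP state. */
static SketchDhcpState
reason_to_state(const char *reason)
{
    if (strcmp(reason, "STOP") == 0 || strcmp(reason, "STOPPED") == 0)
        return SKETCH_DHCP_STATE_DONE;
    if (strcmp(reason, "PREINIT") == 0)
        return SKETCH_DHCP_STATE_NOOP;
    return SKETCH_DHCP_STATE_UNKNOWN;
}
```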
nm_device_get_effective_ip_config_method() must be called only on a
device with an applied connection. Fix assertion failure [1]:
nm_device_get_effective_ip_config_method: assertion 'NM_IS_CONNECTION(connection)' failed
[1] http://faf.lab.eng.brq.redhat.com/faf/reports/20217/
Fixes: 09c8387114 ('policy: use the hostname setting')
Add _parse(), _parse_cons() and _parse_con() helper macros. These
already perform assertions that are common in those cases, and thus
reduce a lot of boilerplate code.
Also, _parse_cons() is exactly about parsing connections. The next
time we add an out parameter to nmi_cmdline_reader_parse() we won't
have to adjust all the call sites where this parameter doesn't matter.
Add support for `carrier-wait-timeout` setting from kernel cmdline.
This will create a new `15-carrier-timeout.conf` file in
/run/NetworkManager/conf.d with the parameter value as specified.
The setting also sets `match-device` to `*`, matching all devices.
NB: The parameter on kernel cmdline is specified in seconds. This is
done to be backwards compatible with the network-legacy module. However,
the generated setting will automatically multiply the specified value
by 1000 and store the timeout value in milliseconds.
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/issues/626
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/730
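The seconds-to-milliseconds conversion and the generated file content can be sketched as below. The function name and the exact section name in the generated conf file are illustrative assumptions; only the `match-device=*` key and the x1000 conversion come from the commit message:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Render the content of the generated 15-carrier-timeout.conf:
 * the cmdline value (seconds) becomes carrier-wait-timeout (msec).
 * Returns the number of characters written (snprintf semantics). */
static int
write_carrier_timeout_conf(char *buf, size_t len, unsigned seconds)
{
    return snprintf(buf, len,
                    "[device-15-carrier-timeout]\n"
                    "match-device=*\n"
                    "carrier-wait-timeout=%u\n",
                    seconds * 1000u);
}
```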
A device that is not IFF_UP does not have carrier. So we don't
know the real state before we bring it up.
On the other hand, during `nmcli connection up` we check whether the
device is available. So we are blocked. The solution is to optimistically
assume that the device has carrier if it is down. We may fail later.
$ nmcli connection add type veth con-name vv0 autoconnect no ifname vv0 peer vv1 ipv4.method shared ipv6.method shared
$ nmcli connection up vv0
$ nmcli device connect vv1
Error: Failed to add/activate new connection: Connection 'vv1' is not available on device vv1 because device has no carrier
Otherwise,
$ nmcli device connect veth0
fails with
Error: Failed to add/activate new connection: veth.peer: property is not specified
In complete_connection(), we should by default complete ethernet
connections, unless the caller already indicated to want a veth
profile.
Fixes: cd0cf9229d ('veth: add support to configure veth interfaces')
Currently, the hostname is retrieved by default only from the device
with the default route. This is done so that in presence of multiple
connections the choice is deterministic.
However, this limitation seems confusing for users, who expect to get
a hostname even for non-default devices. Change the default and allow
any device to obtain the hostname.
Note that when there is a default route, NM still prefers that device
and so the behavior doesn't change.
The only change in behavior is that when there is no default route and
the machine doesn't have a static hostname, NM will try to get the
hostname from DHCP or reverse DNS.
https://bugzilla.redhat.com/show_bug.cgi?id=1766944
In case two devices have the same hostname-priority, prefer the one
with the best default route. In this way, even if
hostname.only-from-default is set to FALSE globally, the behavior is
similar to the past when there is a device with the default route.
Previously, NMPolicy considered only the hostname-priority and the
activation order to build the DeviceHostnameInfo list. Now it has to
consider also the presence of the default route, which depends on the
address family. Therefore, now there is a DeviceHostnameInfo for each
[device,address_family] combination.
We want to use this by "shared/nm-platform", which should have
no dependency on "libnm-core".
Move "libnm-core/nm-ethtool-utils.h" to "libnm/nm-ethtool-utils.h" so
that it is only used by libnm. This file contains the defines for
the option names.
Also, symlink "libnm/nm-ethtool-utils.h" as "shared/nm-base/nm-ethtool-utils-base.h".
We want to use the same defines also internally. Since they are both
public API (must be in libnm) and should be in "shared/nm-base", this
is the way.
We want to use these defines for option names also in "shared/nm-base"
(and in turn in "shared/nm-platform"), which cannot include "libnm-core".
However, they are also public API of libnm.
To get this done, in a first step, move these defines to a new header
"libnm-core/nm-ethtool-utils.h".
Since now the name "nm-ethtool-utils.h" is taken, also rename
nm-libnm-core-intern files.
There should be a clear hierarchy of dependency. That is,
"nm-platform.h" may use "nm-platform-utils.h", but not the
other way around.
Move nm_platform_link_duplex_type_to_string().
Currently src/platform depends on libnm-core. libnm-core is large;
optimally we would have a better separation between our code. That
means libnm-core does not depend on platform and vice versa.
However, nm-platform re-uses some enums from libnm-core for internal code.
To avoid that dependency, add _NMSettingWiredWakeOnLan as a duplicate to
nm-base/nm-base.h. nm-base can be used by libnm-core, nm-platform
and src/platform alike.
The only problem is that NMSettingWiredWakeOnLan is also part of public
API of libnm. It means we must duplicate the enum. But with several
static assertions in unit tests, I think that is not a problem to do.
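The duplicated-enum pattern with static assertions can be sketched like this. The names mimic the libnm/nm-base split but are stand-ins with illustrative values; C11 `_Static_assert` stands in for GLib's `G_STATIC_ASSERT`:

```c
#include <assert.h>

/* Public libnm API (stand-in). */
typedef enum {
    SKETCH_SETTING_WIRED_WAKE_ON_LAN_PHY   = 0x2,
    SKETCH_SETTING_WIRED_WAKE_ON_LAN_MAGIC = 0x40,
} SketchSettingWiredWakeOnLan;

/* Internal duplicate in nm-base (stand-in). */
typedef enum {
    _SKETCH_SETTING_WIRED_WAKE_ON_LAN_PHY   = 0x2,
    _SKETCH_SETTING_WIRED_WAKE_ON_LAN_MAGIC = 0x40,
} _SketchSettingWiredWakeOnLan;

/* Compile-time guarantee that the duplicated values never drift. */
_Static_assert(SKETCH_SETTING_WIRED_WAKE_ON_LAN_PHY
               == _SKETCH_SETTING_WIRED_WAKE_ON_LAN_PHY, "enum drift");
_Static_assert(SKETCH_SETTING_WIRED_WAKE_ON_LAN_MAGIC
               == _SKETCH_SETTING_WIRED_WAKE_ON_LAN_MAGIC, "enum drift");
```

If the two definitions ever diverge, the build fails instead of silently misbehaving at runtime.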
Our dependencies are complicated.
Currently "src/platform" uses parts of libnm-core and is relatively
strongly entangled with core. It would be nice to have that part
clearly independent from "src" and from "libnm-core".
Also, "src/platform/nm-platform-utils.h" uses NMEthtoolID enum, which
previously was defined in "libnm-core/nm-libnm-core-intern/nm-ethtool-utils.h".
Move that to a new place "shared/nm-base/nm-base.h".
Note that we have "libnm-core/nm-libnm-core-intern", which is
libnm-core related code that uses and is used by libnm-core.
There is a need for a library which is used by libnm-core, but
does not depend on libnm-core itself. Here comes "shared/nm-base".
Yes, many libraries. But the goal is to disentangle the dependencies
and have a clear hierarchy of includes. And to have "shared/nm-platform"
independent of libnm-core.
We want to move platform code to "shared/nm-platform". However, platform
code uses the logging infrastructure from the daemon; there is thus
an odd circular dependency.
Solve that by moving the "src/nm-logging.[hc]" to a new helper library
in "shared/nm-log-core".
"src/nm-logging.c" should be independent of libnm-core. It almost
is, except the error domain and code.
Move NM_MANAGER_ERROR to "nm-glib-aux/nm-shared-utils.h" so that
"nm-logging.c" is independent of libnm-core.
NetworkManager core is huge. We should try to split out
parts that are independent.
Platform code is already mostly independent. But due to having it
under "src/", there is no strict separation/layering which determines
the parts that can work independently. So, while the code is mostly
independent (in practice), that is not obvious from looking at the
source tree. It thus still contributes to cognitive load.
Add a shared library "shared/nm-platform", which should have no
dependencies on libnm-core or NetworkManager core.
In a first step, move the netlink code there. More should follow.
This is the same as libnm's nm_utils_hwaddr_aton(), which however
is public API.
We want to use this function also without libnm(-core). Hence add
the helper to "shared/nm-glib-aux".
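An "aton"-style helper of this kind parses a colon-separated hardware address string into bytes. The sketch below illustrates that behavior only; it is not the actual nm_utils_hwaddr_aton() implementation, and the names are invented:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static int
hexval(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Parse "aa:bb:cc:..." into buf. Returns the number of bytes
 * parsed, or 0 on malformed input or overflow of buf. */
static size_t
sketch_hwaddr_aton(const char *asc, uint8_t *buf, size_t len)
{
    size_t i = 0;

    while (*asc) {
        int hi = hexval(asc[0]);
        int lo = hi >= 0 ? hexval(asc[1]) : -1;

        if (lo < 0 || i >= len)
            return 0;
        buf[i++] = (uint8_t) ((hi << 4) | lo);
        asc += 2;
        if (*asc == ':')
            asc++;
        else if (*asc)
            return 0;
    }
    return i;
}
```

Note the real libnm function additionally enforces that exactly `len` bytes are present.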