The ifindex of a virtual device is set when the kernel link
appears. For OVS interfaces, this happens after NM has added the
record to the ovsdb; since NM needs to know the related port and
bridge when it adds ovs-interface record, and since interfaces are
enslaved when they reach stage3, the ifindex is set during stage3.
This means that the first time
nm_device_activate_schedule_stage3_ip_config_start() is called, the
ifindex is unset. Previously we would just set the firewall state as
initialized and the zone would never be set again. Instead, allow the
zone to be set when re-entering stage3.
nm_device_activate_schedule_stage3_ip_config_start() now always check
for the ifindex. This guarantees that we don't try to change zone for
devices without a kernel link (for example, OVS bridges and ports).
Upon reaching state ip-check, the device now changes the zone also if
an ifindex is present and the zone was not set before. I'm not sure
this can actually happen, because if the device has an ifindex it
should be set during stage3. However I'm leaving this extra check for
completeness.
https://bugzilla.redhat.com/show_bug.cgi?id=1921107https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/737
Failing the system interface device is almost always the right thing
to do when the ovsdb entry is removed.
However, to avoid that a late device-removed signal tears down a
different, newly-activated connection, let's also check that we have a
master. Or in alternative, that the device is assumed/external: in
such case it's always fine to fail the device
Fixes: 8e55efeb9d ('ovs: fail OVS system interfaces when the db entry gets removed')
https://bugzilla.redhat.com/show_bug.cgi?id=1923248https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/741
Currently "src/" mostly contains the source code of the daemon.
I say mostly, because that is not true, there are also the device,
settings, wwan, ppp plugins, the initrd generator, the pppd and dhcp
helper, and probably more.
Also we have source code under libnm-core/, libnm/, clients/, and
shared/ directories. That is all confusing.
We should have one "src" directory, that contains subdirectories. Those
subdirectories should contain individual parts (libraries or
applications), that possibly have dependencies on other subdirectories.
There should be a flat hierarchy of directories under src/, which
contains individual modules.
As the name "src/" is already taken, that prevents any sensible
restructuring of the code.
As a first step, move "src/" to "src/core/". This gives space to
reorganize the code better by moving individual components into "src/".
For inspiration, look at systemd's "src/" directory.
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/743
NMUdevClient does not actually implement ref-counting, because it's not
used. Still, the destroy function was named nm_udev_client_unref(),
because theoretically then we could later, as the need arises, make
the type ref-counted. Then unref function already had the right name.
However, NMUdevClient also has a callback function that emits monitor
events. Again for simplicity, this callback function cannot be reset, it
can only be set once (in the constructor) and can also not be unset nor
disabled.
When the user of NMUdevClient is done with the instance and calls
"unref", then it must be sure that the callback is no longer invoked
afterwards. In practice that is already the case, but "unref" makes it
sound as if somebody else could also still hold a reference -- in which
case the user would have to first unset/disable the callback.
Rename the function to "destroy()", so that it's clear that the instance
is gone afterwards and that the callback will not be invoked anymore.
In some cases the enslavement event of a device gets
processed by nm-platform before the master NMDevice is
created, in such cases the enslavement information is simply
lost by NM, to prevent this let NMDevice connect to the
'device-ifindex-changed' signal to be notified of when its master
NMDevice appears so that eventually the NMDevices will be consistent
with the kernel network state.
https://bugzilla.redhat.com/show_bug.cgi?id=1870691
Signed-off-by: Antonio Cardace <acardace@redhat.com>
Bonds need to release all enslaved devices when
the bond mode is to be changed.
Also release slaves when they're external interfaces as
it means the master it's now managed by NetworkManager.
https://bugzilla.redhat.com/show_bug.cgi?id=1870691
Signed-off-by: Antonio Cardace <acardace@redhat.com>
This is the default state for new devices hence they should
return back to this state when unmanaged.
Signed-off-by: Antonio Cardace <acardace@redhat.com>
Sometimes the option can't be displayed as is because it contains NULL
or non-ascii characters. Rename the 'hidden' argument of
nm_supplicant_config_add_option_with_type() to 'display_value' so that
callers can pass a preferred string representation of the value.
We have many static helper libraries, like libnm-glib-aux or libnm-core.
These can be statically linked in any end-binary as internal API. However, they
must only be linked once.
Also, we have various plugins (device, settings, ppp, wwan) which are
dlopened by NetworkManager. They should use the symbols from
NetworkManager core. It is important that they do not link with the
static libraries already, because also NetworkManager core links with
it, so these symbols will be duplicate.
As the symbols are internal, you might think that it is not a real
problem to duplicate them. However, there are also global variables,
like the hash tables for NMRefStr or the seed for NMHash. These global
variables must be only be used once, and hence also these symbols must
no be duplicated.
Fix that by adding a new dependency that is for the core plugins. This
dependency only has "include_directories" but not "link_with".
Naming for us is hard, because everything is an "nm". However, let's
standardize on the term "core" for the daemon, and not "daemon".
Eventually I would like to move the daemon from "src/" to "src/core/",
rename the dep in anticipation of that.
OVS system interfaces can start to connect even before the ovsdb is
ready. However, the connection attempt is doomed to fail and the
NMSettingsConnection gets blocked with reason FAILED.
Unblock them once the ovsdb is ready.
Ideally, NMPolicy should subscribe to the NMOvsdb::ready signal,
however NMOvsdb is in a plugin, so it's easier if NMOvsdb directly
calls a function of the core.
Sometimes during boot the device starts an activation with reason
'assumed', then the ovs interfaces gets removed from the ovsdb. At
this point the device has an active-connection associated; it has the
activate_stage1_device_prepare() activation source scheduled, but it
is still in disconnected state.
Currently the factory doesn't recognizes that the device is activating
and allows the device to proceed even if the interface was removed
from the ovsdb. Fix this by checking the presence of an
active-connection instead.
During shutdown, NM always tries to remove from ovsdb all bridges,
ports, interfaces that it previously added. Currently NM doesn't run
the main loop during shutdown and so it's not possible to perform
asynchronous operations. In particular, the NMOvsdb singleton is
disposed in a destructor where it's not possible to send out all the
queued deletions.
The result is that NM deletes only one OVS interface, keeping the
others. This needs to be fixed, but requires a rework of the shutdown
procedure that involves many parts of NM.
Even when a better shutdown procedure will be implemented, we should
support an unclean shutdown caused by e.g. a kernel panic or a NM
crash. In these cases, the interfaces added by NM would still linger
in the ovsdb.
Delete all those interface at NM startup. If there are connections
profiles for them, NM will create them again.
Also, NMOvsdb now emits a NM_OVSDB_READY signal and provides a
nm_ovsdb_is_ready() to allow other parts of the daemon to order
actions after the initial OVS cleanup.
https://bugzilla.redhat.com/show_bug.cgi?id=1861296https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/700
If an OVS system interface that is active in NM gets removed
externally from the ovsdb, NM doesn't notice it and keeps the
connection active.
Let the OVS factory also handle the removal message for system
interfaces. In that case, we need to match the removed interface name
to the actual device, which can be of any kind.
I don't think that the way we track RA data is correct. For example,
all data (like DNSSL) is tracked regardless of their router. That means,
if a router sends an update, we cannot prematurely expire a DNSSL list
which is no longer announced by that router. Anyway, that should be
reviewed and considered another time.
However, we also must rate limit the number of elements we track in
general. So far we would only do that for the addresses.
The limit for max_addresses, is already read by sysctl. We only limit
it further to at most 300 addresses. The 300 is arbitrarily chosen.
The limit for DNSSL and RDNSS are taken from systemd-networkd
(NDISC_DNSSL_MAX, NDISC_RDNSS_MAX), which is 64.
The limit for gateways and routes are not based on anything and just
made up.
RFC4861 describes how to solicit routers, but this is later extended
by RFC7559 (Packet-Loss Resiliency for Router Solicitations).
Rework the scheduling of router solicitations to follow RFC7559.
Differences from RFC7559:
- the initial random delay before sending the first RS is only
up to 250 milliseconds (instead of 1 second).
- we never completely stop sending RS, even after we receive a valid
RA. We will still send RS every hour.
We no longer honor the sysctl variables
- /proc/sys/net/ipv6/conf/$IFNAME/router_solicitations
- /proc/sys/net/ipv6/conf/$IFNAME/router_solicitation_interval
I don't think having the autoconf algorithm configurable is useful.
At least not, if the configuration happens via sysctl variables.
https://tools.ietf.org/html/rfc4861#section-6.3.7https://tools.ietf.org/html/rfc7559#section-2.1
Elements of RAs have a lifetime. Previously we would track both
the timestamp (when we received the RA) and the lifetime.
However, we are mainly interested in the expiry time. So tracking the
expiry in form of timestamp and lifetime is redundant and cumbersome
to use.
Consider also the cases nm_ndisc_add_address() were we mangle the expiry.
In that case, the timestamp becomes meaningless or it's not clear what
the timestamp should be.
Also, there are no real cases where we actually need the receive timestamp.
Note that when we convert the times to NMPlatformIP6Address, we again need
to synthesize a base time stamp. But here too, it's NMPlatformIP6Address
fault of doing this pointless split of timestamp and lifetime.
While at it, increase the precision to milliseconds. As we receive
lifetimes with seconds precision, one might think that seconds precision
is enough for tracking the timeouts. However it just leads to ugly
uncertainty about rounding, when we can track times with sufficient
precision without downside. For example, before configuring an
address in kernel, we also need to calculate a remaining lifetime
with a lower precision. By having the exact values, we can do so
more accurately. At least, in theory. Of course NMPlatformIP6Address
itself has only precision of seconds, we already loose the information
before. However, NMNDisc no longer has that problem.
This was done since NDisc code was added to NetworkManager in
commit c3a4656a68 ('rdisc: libndp implementation').
Note what it does: in clean_dns_*() we will call solicit_router()
if the half-life of any entity is expired. That doesn't seem right.
Why only for dns_servers and dns_domains, but not routes, addresses
and gateways?
Also, why would the timings for when we solicit depend on when
elements expire. It is "normal" that some of them will expire.
We should solicit based on other parameters, like keeping track
of when and how to solicit.
Note that there is a change in behavior here: if we stopped
soliciting (either because we received our first RA or because
we run out of retries), then we now will never start again.
Previously this was a mechanism so that we would eventually
start soliciting again. This will be fixed in a follow-up
commit soon.
Otherwise, if there is a problem with the test they will run
indefinitely. Sure, meson will kill them after a while, but I
don't think autotools does, does it? Anyway, give a maximum
time to wait.
This locks NM into dhcpcd-9.3.3 as that is the first version to support
the --noconfigure option. Older versions are no longer supported by NM
because they do modify the host which is undesirable.
Due to the way dhcpcd-9 uses privilege separation and that it re-parents
itself to PID 1, the main process cannot be reaped or waited for.
So we rely on dhcpcd correctly cleaning up after itself.
A new function nm_dhcp_client_stop_watch_child() has been added
so that dhcpcd can perform similar cleanup to the equivalent stop call.
As part of this change, the STOP and STOPPED reasons are mapped to
NM_DHCP_STATE_DONE and PREINIT is mapped to a new state NM_DHCP_STATE_NOOP
which means NM should just ignore this state.
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/668
nm_device_get_effective_ip_config_method() must called only on a
device with an applied connection. Fix assertion failure [1]:
nm_device_get_effective_ip_config_method: assertion 'NM_IS_CONNECTION(connection)' failed
[1] http://faf.lab.eng.brq.redhat.com/faf/reports/20217/
Fixes: 09c8387114 ('policy: use the hostname setting'):
Add _parse(), _parse_cons() and _parse_con() helper macros. These
already perform assertions that are common in those cases, and thus
reduce a lot of boiler plate code.
Also, _parse_cons() is exactly about parsing connections. The next
time we add an out parameter to nmi_cmdline_reader_parse() we won't
have to adjust all the call sites where this parameter doesn't matter.
Add support for `carrier-wait-timeout` setting from kernel cmdline.
This will create a new `15-carrier-timeout.conf` file in
/run/NetworkManager/conf.d with the parameter value as specified.
The setting also inserts `match-device` to `*`, matching all devices.
NB: The parameter on kernel cmdline is specified in seconds. This is
done to be backwards compatible with with network-legacy module. However
the generated setting will automatically multiply specified value by
1000 and store timeout value in ms.
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/issues/626https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/730
A device that is not IFF_UP does not have carrier. So we don't
know the real state before we bring it up.
On the other hand, during `nmcli connection up` we check whether the
device is available. So we are blocked. The solution is to optimistically
assume that the device has carrier if it is down. We may fail later.
$ nmcli connection add type veth con-name vv0 autoconnect no ifname vv0 peer vv1 ipv4.method shared ipv6.method shared
$ nmcli connection up vv0
$ nmcli device connect vv1
Error: Failed to add/activate new connection: Connection 'vv1' is not available on device vv1 because device has no carrier
Otherwise,
$ nmcli device connect veth0
fails with
Error: Failed to add/activate new connection: veth.peer: property is not specified
In complete_connection(), we should by default complete ethernet
connections, unless the caller already indicated to want a veth
profile.
Fixes: cd0cf9229d ('veth: add support to configure veth interfaces')
Currently, is retrieved by default only from the device with the
default route. This is done so that in presence of multiple
connections the choice is deterministic.
However, this limitation seems confusing for users, that expect to get
an hostname even for non-default devices. Change the default and allow
any device to obtain the hostname.
Note that when there is a default route, NM still prefers that device
and so the behavior doesn't change.
The only change in behavior is that when there is no default route and
the machine doesn't have a static hostname, NM will try to get
hostname from DHCP or reverse DNS.
https://bugzilla.redhat.com/show_bug.cgi?id=1766944
In case two devices have the same hostname-priority, prefer the one
with the best default route. In this way, even if
hostname.only-from-default is set to FALSE globally, the behavior is
similar to the past when there is a device with the default route.
Previously, NMPolicy considered only the hostname-priority and the
activation order to build the DeviceHostnameInfo list. Now it has to
consider also the presence of the default route, which depends on the
address family. Therefore, now there is a DeviceHostnameInfo for each
[device,address_family] combination.