The DNS name can now also contain the DoT server name. It's not longer a
binary IP address only.
Extend NML3ConfigData to account for that. To track the additional
data, use the string representation. The alternative to have a separate
type that contains the parsed information would be cumbersome too.
The "device ... not available because device is strictly unmanaged" is
almost certainly the least interesting of the reasons why connection
can't be activated on a device.
Invent a new error level for it and demote it.
Before:
Error: Connection activation failed: No suitable device found
for this connection (device lo not available because
device is strictly unmanaged).
After
Error: Connection activation failed: No suitable device found
for this connection (device eth0 not available because
profile is not compatible with device (...)).
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1433
For connections with multi-connect property set to "multiple", the
autoconnect-retries should be tracked per device and not per connection.
That means, if autoconnect-retries is set to 2, each device using that
connection should retry to autoconnect 2 times.
The device autoconnect retries is -2 by default. This is a special
value, in NMPolicy context, if the connection used is multi-connect the
device value will be set to match the connection retries. Each time the
device picks a different connection, it will reset the device
autoconnect retries to -2 and if needed, sync. with the connection
retries.
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1387https://bugzilla.redhat.com/show_bug.cgi?id=2039734
This partly reverts 1fe8166fc9 ('device: only deactivate when the master
we've enslaved to goes away').
If the controller fails while the port is not yet fully attached,
before this patch the following happened:
<info> [1664299566.1065] device (bond0): state change: ip-config -> failed (reason 'config-failed', sys-iface-state: 'managed')
...
<warn> [1664299566.1073] device (bond0): Activation: failed for connection 'bond0'
<trace> [1664299566.1073] device[6b76ac7314eb0b53] (bond0): master: release one slave a9f10ea824bb1725/eth1 (not enslaved) (configure)
<debug> [1664299566.1073] device[a9f10ea824bb1725] (eth1): unmanaged: flags set to [!sleeping,!by-type,!platform-init,!user-explicit,!user-settings,!user-conf=0x0/0x179/managed], forget [is-slave=0x800], reason removed)
...
<info> [1664299566.1080] device (eth1): state change: config -> ip-config (reason 'none', sys-iface-state: 'managed')
Note that now eth1 has no controller, but it lingers in "ip-config" state indefinitely.
If we look at a case where the port is already attached we see:
<info> [1664299540.9661] device (bond0): state change: secondaries -> failed (reason 'config-failed', sys-iface-state: 'managed')
...
<warn> [1664299540.9667] device (bond0): Activation: failed for connection 'bond0'
<trace> [1664299540.9667] device[6b76ac7314eb0b53] (bond0): master: release one slave a9f10ea824bb1725/eth1 (enslaved) (configure)
<debug> [1664299540.9667] platform: (eth1) link: releasing 10 from master 'bond0' (80)
...
<info> [1664299540.9740] device (bond0): detached bond port eth1
...
<debug> [1664299540.9749] device[a9f10ea824bb1725] (eth1): Activation: connection 'eth1' master failed
...
<warn> [1664299540.9749] device (eth1): queue-state[secondaries, reason:none, id:520]: replace previously queued state change
...
<debug> [1664299540.9750] device[a9f10ea824bb1725] (eth1): queue-state[deactivating, reason:dependency-failed, id:533]: queue state change
<debug> [1664299540.9751] device[a9f10ea824bb1725] (eth1): unmanaged: flags set to [!sleeping,!by-type,!platform-init,!user-explicit,!user-settings,!user-conf=0x0/0x179/managed], forget [is-slave=0x800], reason removed)
...
<debug> [1664299541.0201] device[a9f10ea824bb1725] (eth1): enslaved to unknown device 0 (??)
...
<debug> [1664299541.0227] device[a9f10ea824bb1725] (eth1): queue-state[deactivating, reason:dependency-failed, id:533]: change state
<info> [1664299541.0228] device (eth1): state change: ip-check -> deactivating (reason 'dependency-failed', sys-iface-state: 'managed')
Fix that by not ignoring the nm_device_slave_notify_release() call. Now we get:
<info> [1664391684.9757] device (bond0): state change: ip-config -> failed (reason 'config-failed', sys-iface-state: 'managed')
...
<debug> [1664391684.9759] active-connection[69c2b12d61f5b171]: set state deactivated (was activating)
<debug> [1664391684.9760] active-connection[142bb8240f6a696d]: check-master-ready: already signalled (state activating, master 0x56116f1480a0 is in state deactivated)
...
<debug> [1664391684.9762] manager: ActivatingConnection now (none)
...
<warn> [1664391684.9763] device (bond0): Activation: failed for connection 'bond0'
<trace> [1664391684.9763] device[142828814dec6e26] (bond0): master: release one slave 720791275fe8a68c/eth1 (not enslaved) (configure)
<debug> [1664391684.9763] device[720791275fe8a68c] (eth1): Activation: connection 'eth1' master failed
...
<debug> [1664391684.9764] device[720791275fe8a68c] (eth1): queue-state[deactivating, reason:dependency-failed, id:3047]: queue state change
<debug> [1664391684.9765] device[720791275fe8a68c] (eth1): unmanaged: flags set to [!sleeping,!by-type,!platform-init,!user-explicit,!user-settings,!user-conf=0x0/0x179/managed], forget [is-slave=0x800], reason removed)
...
<debug> [1664391684.9797] device[720791275fe8a68c] (eth1): queue-state[deactivating, reason:dependency-failed, id:3047]: change state
<info> [1664391684.9797] device (eth1): state change: config -> deactivating (reason 'dependency-failed', sys-iface-state: 'managed')
Commit 1fe8166fc9 ('device: only deactivate when the master we've
enslaved to goes away') added the "return", but it seems to also add it
in cases where we need to handle this. Restrict the return to cases if
we do "no-config".
https://bugzilla.redhat.com/show_bug.cgi?id=2130287
Fixes: 1fe8166fc9 ('device: only deactivate when the master we've enslaved to goes away')
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1406
If we're notified of a master appearing, make sure there's actually an
ifindex we're waiting for.
Triger an assertion failure if that is not the case, cause that's pretty
messed up.
Since commit a1de6810df ('device: don't ignore external slave removals')
we don't leave device_recheck_slave_status() on un-eslaving (that is
plink->master = 0) early enough.
This results in hooking of NM_MANAGER_DEVICE_IFINDEX_CHANGED even
when we're not actually waiting for any master device to come up,
accompanied by a messed up log entry:
device[3fa7cfc200be4e84] (portXc): enslaved to unknown device 0 (??)
We also log nonsense when we see any device's link being removed:
device[a9a4b65bde851bcf] (br0): ifindex: set ifindex 0 (old-l3cfg: 05c6a4409f84d9d2)
device[45d34e95fb71cce0] (portXa): master br0 with ifindex 0 appeared
We don't do further damage afterwards, so this is purely a cosmetic
annoyance.
This is something that does happen.
Is that a bug? If so, this should not be a warning message but an
assertion failure. If it's not a bug, then this does not warrant warning
level, because the user wouldn't know what to do about this and it's
something that occasionally happens.
Granted, the state handling in NMDevice is complex, that it's unclear
whether this indicates a problem or not. In any case, having a warning
does only confuse the user.
We already have src/linux-headers, where we have complete copies of linux
user space headers. Of course that exists, because we want to use certain
features and don't depend on the installed kernel headers. Which works
well, because kernel user space API is stable, and we anyway want to
support compiling against a newer kernel and run against an older (e.g.
in a container). So having our copy of newer kernel headers is merely
as if we compiled against as newer kernel.
Add "src/nm-compat-headers" which has a similar purpose, but a different
approach. Instead of replacing the included header entirely, include
the system header and patch it with #define.
Use this for "linux/if_addr.h". Of course, the approach here is that we
no longer include <linux/if_addr.h> directly, but instead include
"nm-compat-headers/linux/if_addr.h".
Sync/blocking methods are ugly. Their name should highlight this.
Also, we may have an async variant, so we will need the "good" name
for apply() and apply_finish().
We've been outright ignoring master-slave checks if the link ended up
without a master since commit 2e22880894 ('device: don't remove the
device from master if its link has no master').
This was done to deal with OpenVSwitch port-interface relationship,
where the interface's platform link lacked an actual master in platform
(what matters there is the OVSDB entry), but the fix was too wide.
Let's limit the special case to devices whose were not enslaved to
masters that lack a platform link, which pretty much happens for
OpenVSwitch only.
Morale: Write better commit messages of future you is going to be upset
Fixes: 2e22880894 ('device: don't remove the device from master if its link has no master')
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1358
The @dracut_NM_vlan_over_team_no_boot sometimes fails, among other
things, because it fails to assume an indicated connection after a
restart.
That seems to happen because after the decision to activate the
indicated connection, the device does not move from DISCONNECTED state
quickly enough. Another assumption recheck runs in between and decides
to generate a connection, because the assume state was already reset
in between.
First start, creates and activates b3a61b68-f744-4a4c-a513-61399c154a67
on vlan0017:
NetworkManager (version 1.41.1-30921.55767cf5.el9) is starting...
(asserts:10000, boot:caf7301a-19cd-498b-b5ba-5d36ee939ffe)
...
settings: update[b3a61b68-f744-4a4c-a513-61399c154a67]: adding connection "vlan0017"
(45113870df0a4cfb/keyfile)
Second start:
NetworkManager (version 1.41.1-30921.55767cf5.el9) is starting...
(after a restart, asserts:10000, boot:caf7301a-19cd-498b-b5ba-5d36ee939ffe)
Assumption attempt successfully picks the right connection and thus
proceeds to reset the assume state:
manager: (vlan0017): assume: will attempt to assume matching connection 'vlan0017'
(b3a61b68-f744-4a4c-a513-61399c154a67) (indicated)
device[c7c5101cf0b73f5f] (vlan0017): assume-state: set guess-assume=0, connection=(null)
Everything great so far, activation of the right connection is enqueued
and the device moves away from unavailable state. However, the
activation can't proceed immediately:
device (vlan0017): state change: unmanaged -> unavailable
(reason 'connection-assumed', sys-iface-state: 'assume')
device (vlan0017): state change: unavailable -> disconnected
(reason 'connection-assumed', sys-iface-state: 'assume')
active-connection[0x55ba1162f1c0]: set device "vlan0017" [0x55ba1163c4f0]
device[c7c5101cf0b73f5f] (vlan0017): queue activation request waiting for carrier
Now another assumption attempt is done. The original assume state is
gone, so a connection is generated:
platform-linux: UDEV event: action 'add' subsys 'net' device 'vlan0017' (6); seqnum=1959
device[c7c5101cf0b73f5f] (vlan0017): queued link change for ifindex 6
manager: (vlan0017): assume: generated connection 'vlan0017' (57627119-8c20-4f9e-bf4d-4fc427b4a6a9)
keyfile: commit: 57627119-8c20-4f9e-bf4d-4fc427b4a6a9 (vlan0017) added as
"/run/NetworkManager/system-connections/vlan0017-57627119-8c20-4f9e-bf4d-4fc427b4a6a9.nmconnection"
(nm-generated,volatile,external)
I think this shouldn't have happened. We've picked the correct
connection already and it's enqueued for activation!
Change the check in nm_device_emit_recheck_assume() to also consider
any queued activation.
Fixes-test: @dracut_NM_vlan_over_team_no_boot
Co-authored-by: Lubomir Rintel <lkundrak@v3.sk>
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1351
If the MAC changes there is the possibility that the DHCP client will
not be able to renew the address because it uses the old MAC as
CHADDR. Depending on the implementation, the DHCP server might use
CHADDR (so, the old address) as the destination MAC for DHCP replies,
and those packets will be lost.
To avoid this problem, restart the DHCP client when the MAC changes.
https://bugzilla.redhat.com/show_bug.cgi?id=2110000
1) The "enabled-on-global-iface" flag was odd. Instead, have only
and "enabled" flag and skip (by default) endpoints on interface
that have no default route. With the new flag "also-without-default-route",
this can be overruled. So previous "enabled-on-global-default" now is
the same as "enabled", and "enabled" from before behaves now like
"enabled,also-without-default-route".
2) What was also odd, as that the fallback default value for the flags
depends on "/proc/sys/net/mptcp/enabled". There was not one fixed
fallback default, instead the used fallback value was either
"enabled-on-global-iface,subflow" or "disabled".
Usually that is not a problem (e.g. the default value for
"ipv6.ip6-privacy" also depends on use_tempaddr sysctl). In this case
it is a problem, because the mptcp-flags (for better or worse) encode
different things at the same time.
Consider that the mptcp-flags can also have their default configured in
"NetworkManager.conf", a user who wants to switch the address flags
could previously do:
[connection.mptcp]
connection.mptcp-flags=0x32 # enabled-on-global-iface,signal,subflow
but then the global toggle "/proc/sys/net/mptcp/enabled" was no longer
honored. That means, MPTCP handling was always on, even if the sysctl was
disabled. Now, "enabled" means that it's only enabled if the sysctl
is enabled too. Now the user could write to "NetworkManager.conf"
[connection.mptcp]
connection.mptcp-flags=0x32 # enabled,signal,subflow
and MPTCP handling would still be disabled unless the sysctl
is enabled.
There is now also a new flag "also-without-sysctl", so if you want
to really enable MPTCP handling regardless of the sysctl, you can.
The point of that might be, that we still can configure endpoints,
even if kernel won't do anything with them. Then you could just flip
the sysctl, and it would start working (as NetworkManager configured
the endpoints already).
Fixes: eb083eece5 ('all: add NMMptcpFlags and connection.mptcp-flags property')
- name things related to `in_addr_t`, `struct in6_addr`, `NMIPAddr` as
`nm_ip4_addr_*()`, `nm_ip6_addr_*()`, `nm_ip_addr_*()`, respectively.
- we have a wrapper `nm_inet_ntop()` for `inet_ntop()`. This name
of our wrapper is chosen to be familiar with the libc underlying
function. With this, also name functions that are about string
representations of addresses `nm_inet_*()`, `nm_inet4_*()`,
`nm_inet6_*()`. For example, `nm_inet_parse_str()`,
`nm_inet_is_normalized()`.
<<<<
R() {
git grep -l "$1" | xargs sed -i "s/\<$1\>/$2/g"
}
R NM_CMP_DIRECT_IN4ADDR_SAME_PREFIX NM_CMP_DIRECT_IP4_ADDR_SAME_PREFIX
R NM_CMP_DIRECT_IN6ADDR_SAME_PREFIX NM_CMP_DIRECT_IP6_ADDR_SAME_PREFIX
R NM_UTILS_INET_ADDRSTRLEN NM_INET_ADDRSTRLEN
R _nm_utils_inet4_ntop nm_inet4_ntop
R _nm_utils_inet6_ntop nm_inet6_ntop
R _nm_utils_ip4_get_default_prefix nm_ip4_addr_get_default_prefix
R _nm_utils_ip4_get_default_prefix0 nm_ip4_addr_get_default_prefix0
R _nm_utils_ip4_netmask_to_prefix nm_ip4_addr_netmask_to_prefix
R _nm_utils_ip4_prefix_to_netmask nm_ip4_addr_netmask_from_prefix
R nm_utils_inet4_ntop_dup nm_inet4_ntop_dup
R nm_utils_inet6_ntop_dup nm_inet6_ntop_dup
R nm_utils_inet_ntop nm_inet_ntop
R nm_utils_inet_ntop_dup nm_inet_ntop_dup
R nm_utils_ip4_address_clear_host_address nm_ip4_addr_clear_host_address
R nm_utils_ip4_address_is_link_local nm_ip4_addr_is_link_local
R nm_utils_ip4_address_is_loopback nm_ip4_addr_is_loopback
R nm_utils_ip4_address_is_zeronet nm_ip4_addr_is_zeronet
R nm_utils_ip4_address_same_prefix nm_ip4_addr_same_prefix
R nm_utils_ip4_address_same_prefix_cmp nm_ip4_addr_same_prefix_cmp
R nm_utils_ip6_address_clear_host_address nm_ip6_addr_clear_host_address
R nm_utils_ip6_address_same_prefix nm_ip6_addr_same_prefix
R nm_utils_ip6_address_same_prefix_cmp nm_ip6_addr_same_prefix_cmp
R nm_utils_ip6_is_ula nm_ip6_addr_is_ula
R nm_utils_ip_address_same_prefix nm_ip_addr_same_prefix
R nm_utils_ip_address_same_prefix_cmp nm_ip_addr_same_prefix_cmp
R nm_utils_ip_is_site_local nm_ip_addr_is_site_local
R nm_utils_ipaddr_is_normalized nm_inet_is_normalized
R nm_utils_ipaddr_is_valid nm_inet_is_valid
R nm_utils_ipx_address_clear_host_address nm_ip_addr_clear_host_address
R nm_utils_parse_inaddr nm_inet_parse_str
R nm_utils_parse_inaddr_bin nm_inet_parse_bin
R nm_utils_parse_inaddr_bin_full nm_inet_parse_bin_full
R nm_utils_parse_inaddr_prefix nm_inet_parse_with_prefix_str
R nm_utils_parse_inaddr_prefix_bin nm_inet_parse_with_prefix_bin
R test_nm_utils_ip6_address_same_prefix test_nm_ip_addr_same_prefix
./contrib/scripts/nm-code-format.sh -F
When only one of those connection.{lldp,mdns,llmnr,dns-over-tls}
settings changes, we still need to do a full restart of the IP
configuration to reapply the changes.
Fixes: 58287cbcc0 ('core: rework IP configuration in NetworkManager using layer 3 configuration')
Heavily inspired by systemd ([1]).
We now also have nm_random_get_bytes{,_full}() and
nm_random_get_crypto_bytes(), like systemd's random_bytes()
and crypto_random_bytes(), respectively.
Differences:
- instead of systemd's random_bytes(), our nm_random_get_bytes_full()
also estimates whether the output is of high quality. The caller
may find that interesting. Due to that, we will first try to call
getrandom(GRND_NONBLOCK) before getrandom(GRND_INSECURE). That is
reversed from systemd's random_bytes(), because we want to find
out whether we can get good random numbers. In most cases, kernel
should have entropy already, and it makes no difference.
Otherwise, heavily rework the code. It should be easy to understand
and correct.
There is also a major bugfix here. Previously, if getrandom() failed
with ENOSYS and we fell back to /dev/urandom, we would assume that we
have high quality random numbers. That assumption is not warranted.
Now instead poll on /dev/random to find out.
[1] a268e7f402/src/basic/random-util.c (L81)
The devices generally need to be IFF_UP and wait a little before the
carrier detection is reliable. Some devices, actually need to wait
more than a little -- r8169 needs up to 5 seconds.
For this reason, we delay startup complete while the carrier is down
after we bring the device up. We do this so that we don't reject
activations due to carrier down until we're sure it's really down.
This works well as long as it's us who brought the device up.
If we're restarting the daemon, the device is going to be already up
when we start up the daemon for the second time. There's, however, a
slim chance that the device was brought down and up very shortly before
the restart and therefore the carrier reporting is still not reliable.
As a matter of fact, we bring the devices down and back up on some
occassions, such as when enslaving to a team device.
Therefore, the following events in quick succession cause trouble:
# nmcli con up team-slave-eth0
[20099.205355] Generic FE-GE Realtek PHY r8169-0-300:00: attached PHY driver (mii_bus:phy_addr=r8169-0-300:00, irq=MAC)
[20099.365641] nm-team: Port device eth0 added
[20099.370728] r8169 0000:03:00.0 eth0: Link is Down
[20099.436631] nm-team: Port device eth0 removed
[20099.463422] Generic FE-GE Realtek PHY r8169-0-300:00: attached PHY driver (mii_bus:phy_addr=r8169-0-300:00, irq=MAC)
[20099.628505] r8169 0000:03:00.0 eth0: Link is Down
[20099.669425] Generic FE-GE Realtek PHY r8169-0-300:00: attached PHY driver (mii_bus:phy_addr=r8169-0-300:00, irq=MAC)
[20099.833457] r8169 0000:03:00.0 eth0: Link is Down
[20099.838471] nm-team: Port device eth0 added
The device has been brought down, enslaved and brought up.
"Link is Down" indicates carrier not being detected.
Connection successfully activated (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/7)
# systemctl restart NetworkManager
Now NM sees the device being up, but carrier down.
# nmcli con up testeth0
Error: Connection activation failed: No suitable device found for this connection (...).
Activation failed, because eth0 carrier still appears down.
# [20102.943464] r8169 0000:03:00.0 eth0: Link is Up - 1Gbps/Full - flow control rx/tx
Now it's up, but the party is already over. Shiet.
Let's wait whenever the device reaches unavailable state, whether we
bring it up at that point or not.
Fixes-test: @restart_L2_only_lacp
https://bugzilla.redhat.com/show_bug.cgi?id=2092361https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1316
NetworkManager primarily manages interfaces in an independent fashion.
That means, whenever possible, we want to have a interface specific
view. In many cases, the underlying kernel API also supports that view.
For example, when configuring IP addresses or unicast routes, we do so
per interfaces and don't need a holistic view.
However, that is not always sufficient. For routing rules and certain
route types (blackhole, unreachable, etc), we need a system wide view
of all the objects in the network namespace.
Originally, NMPRulesManager was added to track routing rules. Then, it
was extended to also track certain route types, and the API was renamed to
NMPRouteManager.
This will also be used to track MPTCP addresses.
So rename again, to give it a general name that is suitable for what it
does. Still, the name is not great (suggestion welcome), but it should
cover the purpose of the API well enough. And it's the best I came
up with.
Rename.
In the past, nmp_lookup_init_object() could both lookup all object for a
certain ifindex, and lookup all objects of a type. That fallback path
already leads to an assertion failure fora while now, so nobody should
be using this function to lookup all objects of a certain type (for
what, we have nmp_lookup_init_obj_type()).
Now, remove the fallback path, and rename the function to what it really
does.
It can be useful to choose a different "ipv6.addr-gen-mode". And it can be
useful to override the default for a set of profiles.
For example, in cloud or in a data center, stable-privacy might not be
the best choice. Add a mechanism to override the default via global defaults
in NetworkManager.conf:
# /etc/NetworkManager/conf.d/90-ipv6-addr-gen-mode-override.conf
[connection-90-ipv6-addr-gen-mode-override]
match-device=type:ethernet
ipv6.addr-gen-mode=0
"ipv6.addr-gen-mode" is a special property, because its default depends on
the component that configures the profile.
- when read from disk (keyfile and ifcfg-rh), a missing addr-gen-mode
key means to default to "eui64".
- when configured via D-Bus, a missing addr-gen-mode property means to
default to "stable-privacy".
- libnm's ip6-config::addr-gen-mode property defaults to
"stable-privacy".
- when some tool creates a profile, they either can explicitly
set the mode, or they get the default of the underlying mechanisms
above.
- nm-initrd-generator explicitly sets "eui64" for profiles it creates.
- nmcli doesn' explicitly set it, but inherits the default form
libnm's ip6-config::addr-gen-mode.
- when NM creates a auto-default-connection for ethernet ("Wired connection 1"),
it inherits the default from libnm's ip6-config::addr-gen-mode.
Global connection defaults only take effect when the per-profile
value is set to a special default/unset value. To account for the
different cases above, we add two such special values: "default" and
"default-or-eui64". That's something we didn't do before, but it seams
useful and easy to understand.
Also, this neatly expresses the current behaviors we already have. E.g.
if you don't specify the "addr-gen-mode" in a keyfile, "default-or-eui64"
is a pretty clear thing.
Note that usually we cannot change default values, in particular not for
libnm's properties. That is because we don't serialize the default
values to D-Bus/keyfile, so if we change the default, we change
behavior. Here we change from "stable-privacy" to "default" and
from "eui64" to "default-or-eui64". That means, the user only experiences
a change in behavior, if they have a ".conf" file that overrides the default.
https://bugzilla.redhat.com/show_bug.cgi?id=1743161https://bugzilla.redhat.com/show_bug.cgi?id=2082682
See-also: https://github.com/coreos/fedora-coreos-tracker/issues/907https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1213
The property wait-activation-delay will delay the activation of an
interface the specified amount of milliseconds. Please notice that it
could be delayed some milliseconds more due to other events in
NetworkManager.
This could be used in multiple scenarios where the user needs to define
an arbitrary delay e.g LACP bond configure where the LACP negotiation
takes a few seconds and traffic is not allowed, so they would like to
use nm-online and a setting configured with this new property to wait
some seconds. Therefore, when nm-online is finished, LACP bond should be
ready to receive traffic.
The delay will happen right before the device is ready to be activated.
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1248https://bugzilla.redhat.com/show_bug.cgi?id=2008337
When we're deactivating an externally created device that has a master
because we're activating a connection on it, actually remove the device
from the master. Otherwise unpleasant things happen:
active-connection[0x55ed7ba78400]: constructed (NMActRequest, version-id 4, type managed)
device[0a458361f9fed8f5] (dummy0): sys-iface-state: external -> managed
device[0a458361f9fed8f5] (dummy0): queue activation request waiting for currently active connection to disconnect
device (dummy0): disconnecting for new activation request.
device (dummy0): state change: activated -> deactivating (reason 'new-activation', sys-iface-state: 'managed')
device (br0): master: release one slave 0a458361f9fed8f5/dummy0 (enslaved)(no-config)
Note the "no-config" above. We'set priv->master = NULL, but didn't
communicate the change to the platform. I believe this is not good.
This patch changes that.
device (br0): bridge port dummy0 was detached
device (dummy0): released from master device br0
active-connection[0x55ed7ba782e0]: set state deactivating (was activated)
device (dummy0): ip4: set state none (was done, reason: ip-state-clear)
device (dummy0): ip6: set state none (was done, reason: ip-state-clear)
device (dummy0): state change: deactivating -> disconnected (reason 'new-activation', sys-iface-state: 'managed')
platform: (dummy0) emit signal link-changed changed: 102: dummy0
<NOARP,UP,LOWER_UP;broadcast,noarp,up,running,lowerup> mtu 1500 master 101 arp 1 dummy* init
addrgenmode none addr EA:8D:DD:DF:1F:B7 brd FF:FF:FF:FF:FF:FF driver dummy rx:0,0 tx:39,4746
Now the platform sent us a new link, the "master" property is still set.
device[0a458361f9fed8f5] (dummy0): queued link change for ifindex 102
device[0a458361f9fed8f5] (dummy0): deactivating device (reason 'new-activation') [60]
device (dummy0): ip: set (combined) state none (was done, reason: ip-state-clear)
config: device-state: write #102 (/run/NetworkManager/devices/102); managed=managed, perm-hw-addr-fake=EA:8D:DD:DF:1F:B7, route-metric-default=0-0
active-connection[0x55ed7ba782e0]: set state deactivated (was deactivating)
active-connection[0x55ed7ba782e0]: check-master-ready: already signalled (state deactivated, master 0x55ed7ba781c0 is in state activated)
device (dummy0): Activation: starting connection 'dummy1' (ec6fca51-84e6-4a5b-a297-f602252c9f69)
device[0a458361f9fed8f5] (dummy0): activation-stage: schedule activate_stage1_device_prepare
l3cfg[ae290b5c1f585d6c,ifindex=102]: emit signal (platform-change-on-idle, obj-type-flags=0x2a)
device (br0): master: add one slave 0a458361f9fed8f5/dummy0
Amidst the new activation we're processing the netlink message we got.
We set priv->master back, effectively nullifying the release above. Sad.
device (dummy0): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'managed')
device[0a458361f9fed8f5] (dummy0): add_pending_action (2): 'in-state-change'
active-connection[0x55ed7ba78400]: set state activating (was unknown)
manager: NetworkManager state is now CONNECTING
active-connection[0x55ed7ba78400]: check-master-ready: not signalling (state activating, no master)
device[8fff58d61c7686ce] (br0): slave dummy0 state change 30 (disconnected) -> 40 (prepare)
device[0a458361f9fed8f5] (dummy0): remove_pending_action (1): 'in-state-change'
device (br0): master: release one slave 0a458361f9fed8f5/dummy0 (not enslaved) (force-configure)
platform: (dummy0) link: releasing 102 from master 'br0' (101)
device (br0): detached bridge port dummy0
Now things go south. The stage1 cleans the device up, removing it from
the master and the device itself decides it should deactivate itself
because it lots its master regardless of the fact that it should not
have one and it's in fact an unwanted carryover from previous activation.
I believe this is also wrong.
device[0a458361f9fed8f5] (dummy0): Activation: connection 'dummy1' master deactivated
device (dummy0): ip4: set state none (was pending, reason: ip-state-clear)
device (dummy0): ip6: set state none (was pending, reason: ip-state-clear)
device[0a458361f9fed8f5] (dummy0): add_pending_action (2): 'queued-state-change-deactivating'
device[0a458361f9fed8f5] (dummy0): queue-state[deactivating, reason:connection-assumed, id:298]: queue state change
device[0a458361f9fed8f5] (dummy0): activation-stage: synchronously invoke activate_stage2_device_config
device (dummy0): state change: prepare -> config (reason 'none', sys-iface-state: 'managed')
Now things are really weird. We synchronously go to config, effectively
overriding the queued deactivation. We've really messed up.
Sometimes weird things happen.
Let dummy0 be an externally created device that has a master. We decide
to activate a connection that has no master on it:
active-connection[0x55ed7ba78400]: constructed (NMActRequest, version-id 4, type managed)
device[0a458361f9fed8f5] (dummy0): sys-iface-state: external -> managed
device[0a458361f9fed8f5] (dummy0): queue activation request waiting for currently active connection to disconnect
device (dummy0): disconnecting for new activation request.
device (dummy0): state change: activated -> deactivating (reason 'new-activation', sys-iface-state: 'managed')
device (br0): master: release one slave 0a458361f9fed8f5/dummy0 (enslaved)(no-config)
Note the "no-config" above. We'set priv->master = NULL, but didn't
communicate the change to the platform. I believe this is not good.
device (br0): bridge port dummy0 was detached
device (dummy0): released from master device br0
active-connection[0x55ed7ba782e0]: set state deactivating (was activated)
device (dummy0): ip4: set state none (was done, reason: ip-state-clear)
device (dummy0): ip6: set state none (was done, reason: ip-state-clear)
device (dummy0): state change: deactivating -> disconnected (reason 'new-activation', sys-iface-state: 'managed')
platform: (dummy0) emit signal link-changed changed: 102: dummy0
<NOARP,UP,LOWER_UP;broadcast,noarp,up,running,lowerup> mtu 1500 master 101 arp 1 dummy* init
addrgenmode none addr EA:8D:DD:DF:1F:B7 brd FF:FF:FF:FF:FF:FF driver dummy rx:0,0 tx:39,4746
Now the platform sent us a new link, the "master" property is still set.
device[0a458361f9fed8f5] (dummy0): queued link change for ifindex 102
device[0a458361f9fed8f5] (dummy0): deactivating device (reason 'new-activation') [60]
device (dummy0): ip: set (combined) state none (was done, reason: ip-state-clear)
config: device-state: write #102 (/run/NetworkManager/devices/102); managed=managed, perm-hw-addr-fake=EA:8D:DD:DF:1F:B7, route-metric-default=0-0
active-connection[0x55ed7ba782e0]: set state deactivated (was deactivating)
active-connection[0x55ed7ba782e0]: check-master-ready: already signalled (state deactivated, master 0x55ed7ba781c0 is in state activated)
device (dummy0): Activation: starting connection 'dummy1' (ec6fca51-84e6-4a5b-a297-f602252c9f69)
device[0a458361f9fed8f5] (dummy0): activation-stage: schedule activate_stage1_device_prepare
l3cfg[ae290b5c1f585d6c,ifindex=102]: emit signal (platform-change-on-idle, obj-type-flags=0x2a)
device (br0): master: add one slave 0a458361f9fed8f5/dummy0
Amidst the new activation we're processing the netlink message we got.
We set priv->master back, effectively nullifying the release above.
device (dummy0): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'managed')
device[0a458361f9fed8f5] (dummy0): add_pending_action (2): 'in-state-change'
active-connection[0x55ed7ba78400]: set state activating (was unknown)
manager: NetworkManager state is now CONNECTING
active-connection[0x55ed7ba78400]: check-master-ready: not signalling (state activating, no master)
device[8fff58d61c7686ce] (br0): slave dummy0 state change 30 (disconnected) -> 40 (prepare)
device[0a458361f9fed8f5] (dummy0): remove_pending_action (1): 'in-state-change'
device (br0): master: release one slave 0a458361f9fed8f5/dummy0 (not enslaved) (force-configure)
platform: (dummy0) link: releasing 102 from master 'br0' (101)
device (br0): detached bridge port dummy0
Now stage1 cleans the device up, removing it from the master.
device[0a458361f9fed8f5] (dummy0): Activation: connection 'dummy1' master deactivated
device (dummy0): ip4: set state none (was pending, reason: ip-state-clear)
device (dummy0): ip6: set state none (was pending, reason: ip-state-clear)
device[0a458361f9fed8f5] (dummy0): add_pending_action (2): 'queued-state-change-deactivating'
We decide to deal with this by enqueuing a deactivation. That is not
great -- we shouldn't even have had this master!
This patch takes the deactivation path only if we were willingly
enslaved to the master in question.
The @bond_mode_8023ad test has been seen failing, with a log like this:
<debug> [...3.0484] device[...] (eth1): Activation: connection 'bond0.0' master deactivated
<debug> [...3.0484] device[...] (eth1): add_pending_action (2): 'queued-state-change-deactivating'
<debug> [...3.0484] device[...] (eth1): queue-state[deactivating, reason:new-activation, id:709]: queue state change
What happened is that eth1 has been activating. It was already enslaved
to a bond and was in an ip-config state when the bond was removed.
A change to "deactivating" state has been enqueued. But then this
happened:
<trace> [...3.0942] device[...] (eth1): ip4: check-state: state done => done, is_failed=0, is_pending=0,
is_started=0 temp_na=0, may-fail-4=1, may-fail-6=1; disabled4; manualip4=done; ignore6 manualip6=done
<trace> [...3.0942] device[...] (eth1): ip: check-state: (combined) state pending => done
<debug> [...3.0943] device[...] (eth1): ip: set (combined) state done (was pending, reason: check-ip-state)
<info> [...3.0943] device (eth1): state change: ip-config -> ip-check (reason 'none', sys-iface-state: 'managed')
<debug> [...3.0943] device[...] (eth1): add_pending_action (3): 'in-state-change'
<debug> [...3.0943] device[...] (eth1): queue-state[deactivating, reason:new-activation, id:709]: clear queued state change
The IP config succeeded and the queued "deactivating" change was
overriden by the IP4 check result, prompting a change to "ip-check".
With the master still missing. Not good.
Let's terminate the appempts to check the IP state when we cancel the
activation, so that it doesn't override the enqueued state change.
Fixes-test: @bond_mode_8023ad
https://bugzilla.redhat.com/show_bug.cgi?id=2080928https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1245
Currently we call nm_device_update_dynamic_ip_setup() in
carrier_changed() every time the carrier goes up again and the device
is activating, to kick a restart of DHCP.
Since we process link events in a idle handler, it can happen that the
handler is called only once for different events; in particular
device_link_changed() might be called once for a link-down/link-up
sequence.
carrier_changed() is "level-triggered" - it cares only about the
current carrier state. nm_device_update_dynamic_ip_setup() should
instead be "edge-triggered" - invoked every time the link goes from
down to up. We have a mechanism for that in device_link_changed(), use
it.
Fixes-test: @ipv4_spurious_leftover_route
https://bugzilla.redhat.com/show_bug.cgi?id=2079406https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1250
l3cd instances must be removed from the old l3cfg before calling
_cleanup_ip_pre(). Otherwise, _cleanup_ip_pre() unregisters them from
the device, and later _dev_l3_register_l3cds(self, l3cfg_old, FALSE,
FALSE) does nothing because the device doesn't have any l3cd.
Previously the l3cds would linger in the l3cfg, keeping a reference to
it and causing a memory leak; the leak was not detected by valgrind
because the l3cfg was still referenced by the NMNetns.
Fixes: 58287cbcc0 ('core: rework IP configuration in NetworkManager using layer 3 configuration')
Fixes-test: @stable_mem_consumption2
https://bugzilla.redhat.com/show_bug.cgi?id=2083453https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1252
This was working for internal plugin in the past, but broken by l3cfg
rework with 1.36. Re-add it. Not it also works with dhclient. For other
plugins, it's not really working, because we can't decline.
Now NMDhcpClient does ACD (using NML3Cfg) and abstracts that from
the caller (NMDevice).
It is complicated. Because there is state involved, meaning, we need
to remember the current state for ACD and react on and handle a
multitude of events. Getting this right, is non-trivial.
What we want is that if ACD fails, we decline the lease (and don't use
it).
https://bugzilla.redhat.com/show_bug.cgi?id=1713380
I think the previous was technically correct in any case too.
Still change it, because I feel with union and struct initialization,
we should always explicitly pick one union member that we fully
initialize.