The "lease" mode is unusual, because it means to prefer the DUID
configuration from the DHCP plugin over the explicit configuration in
NetworkManager. It is only for the DHCPv6 DUID and not for the IPv4
client-id. It also is only special for the "dhclient" plugin, because
with the internal plugin, this always corresponds to a generated, stable
DUID.
Commit 58287cbcc0 ('core: rework IP configuration in NetworkManager
using layer 3 configuration') broke this. The commit refactored the code
to track the effective-client-id separately. Previously, the client-id which
was read from the dhclient lease, was overwriting NMDhcpClient.client_id. But
with the refactor, it broke because nm_dhcp_client_get_effective_client_id()
was never called.
Fix that.
Fixes: 58287cbcc0 ('core: rework IP configuration in NetworkManager using layer 3 configuration')
(cherry picked from commit bea72c3d6d)
Note that there are no callers of nm_dhcp_client_get_effective_client_id(),
hence calling the setter had no effect. This is a bug, that we will fix
later.
But before fixing the bug, change how this works. Drop the get_duid() hook.
It's only confusing and backward.
We will keep the nm_dhcp_client_[gs]et_effective_client_id() functions.
They will be used later.
(cherry picked from commit 28d7f9b7c4)
The "effective-client-id" is handled wrongly. Step 1 to clean this up.
Note that NMDhcpClientPrivate.effective_client_id is only ever get/set
via the nm_dhcp_client_[gs]et_effective_client_id() functions.
Note that only a NMDhcpDhclient instance ever calls
nm_dhcp_client_set_effective_client_id().
Hence, for NMDhcpSystemd the effective-client-id is really just the DUID
from the config. Clean this up by not calling nm_dhcp_client_get_effective_client_id()
but use the config directly. There is no change in behavior here.
(cherry picked from commit 05ae48d64e)
The current implementation only checks that a device with name equal
to veth.peer exists and it has a parent device; it doesn't check that
its parent is actually the device we want to create. So for example,
if the profile specifies interface-name A and peer B, while in
platform we have a veth pair {B,C}, we'll skip the interface creation
and the device will remain without a ifindex, leading to a crash
later. Fix this by adding the missing check.
While at it, don't implement the check by inspecting NMDevices but
look directly at the platform cache; that seems more robust because
devices are often updated from platform events via idle handlers and
so the information there could be outdated.
Fixes: 07e0ab48d1 ('veth: drop iface peer check during create_and_realize()')
https://bugzilla.redhat.com/show_bug.cgi?id=2129829
(cherry picked from commit 50f738bde5)
For MACsec interfaces, kernel announces the parent ifindex in the
generic IFLA_LINK netlink attribute, which we save in
NMPlatformLink.parent. There is no need to have a dedicate member in
NMPlatformLnkMacsec.
The dedicate member was never set and during a restart of
NetworkManager the parent of the MACsec device could be unset leading
to a failed assertion:
act_stage2_config: assertion 'parent' failed
Fixes: 85103656e9 ('platform: add support for macsec links')
https://bugzilla.redhat.com/show_bug.cgi?id=2122564https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1481
(cherry picked from commit cf11884a85)
nm_utils_get_ipv6_interface_identifier() has non-obvious requirements on
the hardware address. If the caller passes a wrong length, it will
trigger an assertion or even cause out of bound read. This would mean
that the caller needs to carefully check the length. Such requirements
on the caller are wrong.
Also, in practice the hardware length comes from platform/kernel. We
don't want to trust that what kernel tells us always has the required
address length, so the caller would always have to double check before
calling the function.
Instead, handle unexpected address lengths.
Fixes: e2270040c0 ('core: use Interface Identifiers for IPv6 SLAAC addresses')
Fixes: 1d396e9972 ('core-utils: use 64-bit WPAN address for a 6LoWPAN IID')
(cherry picked from commit 5d86db699b)
For link type NM_LINK_TYPE_6LOWPAN, nm_utils_get_ipv6_interface_identifier()
expects 8 bytes hardware address. It even just accesses the buffer
without checking (that needs to be fixed too).
For 6lowpan devices, the caller might construct a fake ethernet MAC
address, which is only 6 bytes long. So wrong.
Fixes: 49844ea55f ('device: generate pseudo 48-bit address from the WPAN short one')
(cherry picked from commit 53d1d8ba91)
The fields "l3cfg" and "l3cfg_" are union aliases. One of them is const,
the other is not. The idea is that all places that modify the field need
to use the special name "l3cfg_", and grepping for that will lead you to
all the relevant places.
This mistake happened, because g_clear_object() casts constness away.
Fixes: 58287cbcc0 ('core: rework IP configuration in NetworkManager using layer 3 configuration')
(cherry picked from commit 8cb739031d)
"connection" variable might be NULL, which fails an assertion in
g_dbus_connection_flush_sync(). Consequently, "error_flush" is also
NULL which leads to a crash of "nm-dhcp-helper".
Reported-by: Jules Maselbas <jmaselbas@zdiv.net>
Fixes: 240ec7f891 ('dhcp: implement ACD (address collision detection) for DHCPv4')
(cherry picked from commit 37e130232d)
See wpa_supplicant commit [1]:
macsec: Make pre-shared CKN variable length
IEEE Std 802.1X-2010, 9.3.1 defines following restrictions for
CKN:
"MKA places no restriction on the format of the CKN, save that it
comprise an integral number of octets, between 1 and 32
(inclusive), and that all potential members of the CA use the same
CKN. No further constraints are placed on the CKNs used with PSKs,
..."
Hence do not require a 32 octet long CKN but instead allow a
shorter CKN to be configured.
This fixes interoperability with some Aruba switches, that do not
accept a 32 octet long CKN (only support shorter ones).
[1] https://w1.fi/cgit/hostap/commit/?id=b678ed1efc50e8da4638d962f8eac13312a4048f
(cherry picked from commit df999d1fca)
When called with update_carrier=TRUE, nm_device_bring_up_full() checks
for carrier changes and it may queue a transition to DISCONNECTED
through the following call chain:
-> nm_device_bring_up_full()
-> nm_device_set_carrier_from_platform()
-> nm_device_set_carrier()
-> carrier_changed()
-> nm_device_queue_state()
In _set_state_full(state=UNAVAILABLE) after bringing the interface up
we also call nm_device_cleanup() which clears the enqueued state
change to DISCONNECTED. When this happens, the device remains in
UNAVAILABLE and never gets activated even if it was ready.
This was observed with macsec interfaces, but in theory can happen
with all those interfaces that get carrier immediately after being
brought up.
Avoid this issue by not checking the carrier synchronously from
_set_state_full(). The carrier change event will be processed in the
next asynchronous invocation of device_link_changed().
https://bugzilla.redhat.com/show_bug.cgi?id=2122564
(cherry picked from commit 07bc5121a7)
In some situations we need to avoid updating the carrier status
synchronously from nm_device_bring_up_full(). Add a flag for that.
(cherry picked from commit 9fd9eaf276)
In the next commit nm_device_bring_up() will be extended with a new
argument. Most callers just want to bring up the device synchronously
and don't care about the "no_firmware" argument. Introduce a
nm_device_bring_up_full() for callers that need special behavior.
(cherry picked from commit 861934a510)
Call teamdctl_port_config_update_raw() when we're attaching a port even
if all of team-slave setting properties are default.
This is done to ensure teamd "knows" about the port (that is,
"teamdctl ... port present" returns success) when we're done activating
the slave connection. It will pick it up anyway from netlink, but that
can happen after the activation is done, resulting in a possible race.
Fixes-test: @remove_active_team_profile
https://bugzilla.redhat.com/show_bug.cgi?id=2102375https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1421
(cherry picked from commit 6897f6e6df)
If teamd crashes, we restore it. That's very nice, but if it really
crashed then it left ports attached and the slave connections are not
going to fail and the port configuration (e.g. priority or link watcher) in
teamd's memory will be gone.
This will restore the port configuration when the teamd connection is
re-established. This probably also fixes a race where a slave connection
would be enslaved (only possible externally and manually?) while we
didn't establish a connection to teamd yet. We'll just send the port
configuration in once're connected.
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1361
(cherry picked from commit f3327835c1)
When creating NMPlatformRoutingRule from NMIPRouteRule object, the
protocol is being set to RTPROT_UNSPEC. According to linux kernel
documentation FRA_PROTOCOL indicates the originator of the rule.
In this case the route rule is coming from a connection and therefore
the originator of the rule is the user. The correct value is
RTPROT_STATIC which means the rule is installed by the administrator.
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1457
Fixes: 3f9347745b ('core: add handling of IP routing rules to NMDevice')
(cherry picked from commit 82009e21d2)
NMSettingWired does not reject invalid flags. Filter them out in wake_on_lan_enable().
In practice, it makes no difference, the unknown flags were ignored anyway.
(cherry picked from commit c593834842)
If there were any pause options and any non-pause options,
the created setting was invalid.
I don't think it's reasonably possible to parse the broken settings.
So there is no workaround trying to read the existing broken settings
from disk. Luckily, the broken setting was just silently ignored by
the parser, so you simply could not persist certain settings.
https://bugzilla.redhat.com/show_bug.cgi?id=2134569
Fixes: 652ddca04c ('ethtool: Introducing PAUSE support')
(cherry picked from commit 21661c6f71)
[thaller@redhat.com: modified original patch.]
Fixes: 58287cbcc0 ('core: rework IP configuration in NetworkManager using layer 3 configuration')
(cherry picked from commit 61e99ed715)
Currently, when performing DNS resolution with systemd-resolved,
NetworkManager tells systemd-resolved to consider only DNS configuration
for the network interface that the connectivity check request will be
routed through. But this is not correct because DNS and routing are
configured entirely separately. For example, say we have a VPN that
receives all DNS but only a subset of routing. NetworkManager will
configure systemd-resolved with no DNS servers on any interface except
for the VPN interface, but will still route traffic through other
interfaces. This is entirely legitimate and works fine in practice,
except for the connectivity check.
To fix this, we just drop the restriction and allow systemd-resolved to
consider its full configuration, which is what gets used normally
anyway. This allows our connectivity check to match the real
configuration instead of failing spuriously.
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/issues/1107https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1415
(cherry picked from commit e6dac4f0b6)
For connections with multi-connect property set to "multiple", the
autoconnect-retries should be tracked per device and not per connection.
That means, if autoconnect-retries is set to 2, each device using that
connection should retry to autoconnect 2 times.
The device autoconnect retries is -2 by default. This is a special
value, in NMPolicy context, if the connection used is multi-connect the
device value will be set to match the connection retries. Each time the
device picks a different connection, it will reset the device
autoconnect retries to -2 and if needed, sync. with the connection
retries.
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1387https://bugzilla.redhat.com/show_bug.cgi?id=2039734
(cherry picked from commit 1656d82045)
At startup, we remove from ovsdb any existing interface created by NM
and later an interface with the same name might be readded. This can
cause race conditions. Consider this series of events:
1. at startup NM removes the entry from ovsdb;
2. ovsdb reports success;
3. NM inserts an interface with the same name again;
4. ovs-vswitch monitors ovsdb changes, and gets events for removal and
insertion. Depending on how those events are split in different
batches, it might decide:
4a. to delete the link and add it back, or
4b. to keep the existing link because the delete and insertion
cancel out each other.
When NM sees the link staying in platform, it doesn't know if it's
because of 4b or because 4a will happen eventually.
To avoid this ambiguity, after ovsdb reports the successful deletion
NM should also wait that the link disappears from platform.
Unfortunately, this means that ovsdb gets a dependency to the platform
code.
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1386
(cherry picked from commit 4f60fe293c)
In nm_dns_manager_set_ip_config() we try to avoid calling update_dns()
unless something changes, because updating DNS is expensive and can
trigger other actions such as a new hostname resolution.
When we add a new ip_data, even if the new element is equivalent to
the old one that was removed, we need to sort the list again.
Fixes: ce0a36d20f ('dns: better track l3cd changes')
https://bugzilla.redhat.com/show_bug.cgi?id=2098574
(cherry picked from commit 3cc7801779)
The dhclient plugin already supports sending a decline when IPv4 ACD
fails. Also implement support for IPv6 DAD.
See-also: 156d84217c ("dhcp/dhclient: implement accept/decline (ACD) for dhclient plugin")
(cherry picked from commit e4aefbc556)
Currently we accept the DHCPv6 just after addresses are configured on
kernel, without waiting DAD result. Instead, wait that DAD completes
and decline the lease if all addresses are detected as duplicate.
Note that when an address has non-infinite lifetime and fails DAD,
kernel removes it automatically. With iproute2 we see something like:
602: testX6 inet6 2620:🔢5678/128 scope global tentative dynamic noprefixroute
valid_lft 7500sec preferred_lft 7200sec
Deleted 602: testX6 inet6 2620:🔢5678/128 scope global dadfailed tentative dynamic noprefixroute
valid_lft 7500sec preferred_lft 7200sec
Since the address gets removed from the platform cache, at the moment
we don't have a way to check the flags of the removal
message. Therefore, we assume that any address that goes away in
tentative state was detected as duplicate.
https://bugzilla.redhat.com/show_bug.cgi?id=2096386
(cherry picked from commit a7eb77260a)
This partly reverts 1fe8166fc9 ('device: only deactivate when the master
we've enslaved to goes away').
If the controller fails while the port is not yet fully attached,
before this patch the following happened:
<info> [1664299566.1065] device (bond0): state change: ip-config -> failed (reason 'config-failed', sys-iface-state: 'managed')
...
<warn> [1664299566.1073] device (bond0): Activation: failed for connection 'bond0'
<trace> [1664299566.1073] device[6b76ac7314eb0b53] (bond0): master: release one slave a9f10ea824bb1725/eth1 (not enslaved) (configure)
<debug> [1664299566.1073] device[a9f10ea824bb1725] (eth1): unmanaged: flags set to [!sleeping,!by-type,!platform-init,!user-explicit,!user-settings,!user-conf=0x0/0x179/managed], forget [is-slave=0x800], reason removed)
...
<info> [1664299566.1080] device (eth1): state change: config -> ip-config (reason 'none', sys-iface-state: 'managed')
Note that now eth1 has no controller, but it lingers in "ip-config" state indefinitely.
If we look at a case where the port is already attached we see:
<info> [1664299540.9661] device (bond0): state change: secondaries -> failed (reason 'config-failed', sys-iface-state: 'managed')
...
<warn> [1664299540.9667] device (bond0): Activation: failed for connection 'bond0'
<trace> [1664299540.9667] device[6b76ac7314eb0b53] (bond0): master: release one slave a9f10ea824bb1725/eth1 (enslaved) (configure)
<debug> [1664299540.9667] platform: (eth1) link: releasing 10 from master 'bond0' (80)
...
<info> [1664299540.9740] device (bond0): detached bond port eth1
...
<debug> [1664299540.9749] device[a9f10ea824bb1725] (eth1): Activation: connection 'eth1' master failed
...
<warn> [1664299540.9749] device (eth1): queue-state[secondaries, reason:none, id:520]: replace previously queued state change
...
<debug> [1664299540.9750] device[a9f10ea824bb1725] (eth1): queue-state[deactivating, reason:dependency-failed, id:533]: queue state change
<debug> [1664299540.9751] device[a9f10ea824bb1725] (eth1): unmanaged: flags set to [!sleeping,!by-type,!platform-init,!user-explicit,!user-settings,!user-conf=0x0/0x179/managed], forget [is-slave=0x800], reason removed)
...
<debug> [1664299541.0201] device[a9f10ea824bb1725] (eth1): enslaved to unknown device 0 (??)
...
<debug> [1664299541.0227] device[a9f10ea824bb1725] (eth1): queue-state[deactivating, reason:dependency-failed, id:533]: change state
<info> [1664299541.0228] device (eth1): state change: ip-check -> deactivating (reason 'dependency-failed', sys-iface-state: 'managed')
Fix that by not ignoring the nm_device_slave_notify_release() call. Now we get:
<info> [1664391684.9757] device (bond0): state change: ip-config -> failed (reason 'config-failed', sys-iface-state: 'managed')
...
<debug> [1664391684.9759] active-connection[69c2b12d61f5b171]: set state deactivated (was activating)
<debug> [1664391684.9760] active-connection[142bb8240f6a696d]: check-master-ready: already signalled (state activating, master 0x56116f1480a0 is in state deactivated)
...
<debug> [1664391684.9762] manager: ActivatingConnection now (none)
...
<warn> [1664391684.9763] device (bond0): Activation: failed for connection 'bond0'
<trace> [1664391684.9763] device[142828814dec6e26] (bond0): master: release one slave 720791275fe8a68c/eth1 (not enslaved) (configure)
<debug> [1664391684.9763] device[720791275fe8a68c] (eth1): Activation: connection 'eth1' master failed
...
<debug> [1664391684.9764] device[720791275fe8a68c] (eth1): queue-state[deactivating, reason:dependency-failed, id:3047]: queue state change
<debug> [1664391684.9765] device[720791275fe8a68c] (eth1): unmanaged: flags set to [!sleeping,!by-type,!platform-init,!user-explicit,!user-settings,!user-conf=0x0/0x179/managed], forget [is-slave=0x800], reason removed)
...
<debug> [1664391684.9797] device[720791275fe8a68c] (eth1): queue-state[deactivating, reason:dependency-failed, id:3047]: change state
<info> [1664391684.9797] device (eth1): state change: config -> deactivating (reason 'dependency-failed', sys-iface-state: 'managed')
Commit 1fe8166fc9 ('device: only deactivate when the master we've
enslaved to goes away') added the "return", but it seems to also add it
in cases where we need to handle this. Restrict the return to cases if
we do "no-config".
https://bugzilla.redhat.com/show_bug.cgi?id=2130287
Fixes: 1fe8166fc9 ('device: only deactivate when the master we've enslaved to goes away')
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1406
(cherry picked from commit 2be9c693d9)
Previously, we used _nm_utils_ascii_str_to_bool(). That can accept any
kind of input (like "true"), so one might think that this is better to
use on user-input. However, NMSettingBond already validates the these
options are integers (either "0" or "1"). So a value like "true"
could never be here.
Use _nm_setting_bond_opt_value_as_intbool() because that asserts that
the option if of the expected type (integer).
(cherry picked from commit b1a72d0f21)
Sync/blocking methods are ugly. Their name should highlight this.
Also, we may have an async variant, so we will need the "good" name
for apply() and apply_finish().
(cherry picked from commit dc66fb7d04)