This setting allows the user to remove the local route rule that is
autogenerated for both IPv4 and IPv6. By default, NetworkManager won't
touch the local route rule.
(cherry picked from commit d2ca44ffc6)
A tentative IPv6 address can still fail DAD, so don't use it to
resolve the hostname via DNS. Furthermore, tentative addresses can't
be used to contact the nameserver and so the resolution will fail if
there is no other valid IPv6 address. Wait that the address becomes
non-tentative.
(cherry picked from commit 4138be6a5a)
Improve logging:
- log only when something changes
- print the new resolver state, instead of the old one
- rename state "in-progress" to "started"
- log when the resolver state is reset due to DNS changes
(cherry picked from commit 7037aa66c6)
This is also the format that we will use to expose it in the lease
information. It's the format that dhclient uses.
(cherry picked from commit 2fe4313b92)
There should be one function for parsing the string. Use it everywhere.
Also, because we will accept specifying the IAID as hex string so the
same parsing code should be used everywhere.
(cherry picked from commit 69106d0aef)
When a software device is deactivated, normally we schedule a idle
task to unrealize the device (delete_on_deactivate). However, if a new
activation is enqueued on the same device (and that implies that the
new profile is compatible with the device), then the idle task is not
scheduled and the device will normally transition to the different
states (disconnected, prepare, config, etc.).
For ovs-interfaces, we remove the db entry on disconnect and that
makes the link go away; however, we don't clear the hw_addr* fields of
the device struct.
When the new link appears, we try to set the new cloned MAC but the
stale hw_addr field indicates that it's already set. Avoid this
problem by updating the address as soon as the link appears.
https://bugzilla.redhat.com/show_bug.cgi?id=2168477https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1532
(cherry picked from commit d403ac3d40)
Trigger a dispatcher event when a connection is reapplied on a NM device.
Some devices such as phones have already a DHCP client running for accepting
connections when they are plugged into USB to transfer data over SSH.
When NetworkManager switches the connection IP method to shared,
it spawns a dnsmasq process to handle DHCP and DNS for that connection.
However, a dispatcher event is needed to disable the external DHCP server
for these USB connections as NetworkManager's dnsmasq handles them now.
Moreover, when the connection method is switched to a different mode,
the external DHCP server needs to be spawned again to make sure that
SSH connections are still possible to the device.
To achieve this, add a new NetworkManager Dispatcher event
'reapply' which is triggered when a connection is reapplied on a NM
device. This way, a dispatcher script can handle the case above by
inspecting the IP method in the dispatcher script.
(cherry picked from commit cef880c66f)
There are two callers of available_connections_add(). One from
cp_connection_added_or_updated() (which is when a connection
gets added/modified) and one from nm_device_recheck_available_connections().
They both call first nm_device_check_connection_available() to see
whether the profile is available on the device. They certainly
need to pass the same check flags, otherwise a profile might
be available in some cases, and not in others.
I didn't actually test this, but I think this could result
in a profile wrongly not being listed as an available-connection.
Moreover, that might mean, that `nmcli connection up $PROFILE`
might work to find the device/profile, but `nmcli device up $DEVICE`
couldn't find the suitable profile (because the latter calls
nm_device_get_best_connection(), which iterates the
available-connections). I didn't test this, because regardless of
that, it seems obvious that the conditions for when we call
available_connections_add() must be the same from both places.
So the only question is what is the right condition, and it would
seem that _NM_DEVICE_CHECK_CON_AVAILABLE_FOR_USER_REQUEST is the right
flag.
Fixes: 02dbe670ca ('device: for available connections check whether they are available for user-request')
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1496
When the connection setting changes at the first place, then calling
the device reapply, the ip address got temporarily removed when DHCP
restarted. To avoid the ip address got temporarily removed, we should
preserve the previous lease and keep using it until the new lease comes
along.
With multi-connect enabled, this can cause infinite retries to autoconnect,
see [1].
That has bad consequences for example in initrd, where
nm-wait-online-initrd.service would wait up to one hour before failing
and blocking boot.
This reverts commit 1656d82045.
[1] https://bugzilla.redhat.com/show_bug.cgi?id=2039734#c5
Fixes: 1656d82045 ('policy: track the autoconnect retries in devices for multi-connect')
Reapply() is supposed to make sure that the system (the interface)
is configured as indicated by the applied-connection. That means,
it will remove/add configuration to make the system match the requested
configuration.
Add a flag "preserve-external-ip" which relaxes this. During reapply,
IP addresses/routes that exist on the interface and which are not known
(or added) by NetworkManager will be left alone.
This will be used by nm-cloud-setup, so that it can reconfigure the
interface in a less destructive way, which does not conflict with
external `ip addr/route` calls.
Note that the previous commit just adds "VersionInfo" and the
possibility to expose capabilities (patch-level). This is not used
for the new reapply flag, because, while we might backport the
reapply flag, we won't backport the "VersionInfo" property. Exposing
new capabilities via the "VersionInfo" property will only become useful
in the future, where we can backport a capability to older NM versions
(but those that have "VersionInfo" too).
Changing an error code is an API change. But, so far no flags existed,
so it's unlikely that somebody would send invalid flags or care about
the return code.
For link type NM_LINK_TYPE_6LOWPAN, nm_utils_get_ipv6_interface_identifier()
expects 8 bytes hardware address. It even just accesses the buffer
without checking (that needs to be fixed too).
For 6lowpan devices, the caller might construct a fake ethernet MAC
address, which is only 6 bytes long. So wrong.
Fixes: 49844ea55f ('device: generate pseudo 48-bit address from the WPAN short one')
The fields "l3cfg" and "l3cfg_" are union aliases. One of them is const,
the other is not. The idea is that all places that modify the field need
to use the special name "l3cfg_", and grepping for that will lead you to
all the relevant places.
This mistake happened, because g_clear_object() casts constness away.
Fixes: 58287cbcc0 ('core: rework IP configuration in NetworkManager using layer 3 configuration')
In some scenarios, autoconnect should not be blocked if the device is
activated on the external connection (e.g. autoconnect on the loopback
device).
Adding the `allow_autoconnect_on_external` flag to support such
behavior.
Support managing the loopback interface through NM as the users want to
set the proper mtu for loopback interface when forwarding the packets.
Additionally, the IP addresses, DNS, route and routing rules are also
allowed to configure for the loopback connection profiles.
https://bugzilla.redhat.com/show_bug.cgi?id=2060905
When called with update_carrier=TRUE, nm_device_bring_up_full() checks
for carrier changes and it may queue a transition to DISCONNECTED
through the following call chain:
-> nm_device_bring_up_full()
-> nm_device_set_carrier_from_platform()
-> nm_device_set_carrier()
-> carrier_changed()
-> nm_device_queue_state()
In _set_state_full(state=UNAVAILABLE) after bringing the interface up
we also call nm_device_cleanup() which clears the enqueued state
change to DISCONNECTED. When this happens, the device remains in
UNAVAILABLE and never gets activated even if it was ready.
This was observed with macsec interfaces, but in theory can happen
with all those interfaces that get carrier immediately after being
brought up.
Avoid this issue by not checking the carrier synchronously from
_set_state_full(). The carrier change event will be processed in the
next asynchronous invocation of device_link_changed().
https://bugzilla.redhat.com/show_bug.cgi?id=2122564
In the next commit nm_device_bring_up() will be extended with a new
argument. Most callers just want to bring up the device synchronously
and don't care about the "no_firmware" argument. Introduce a
nm_device_bring_up_full() for callers that need special behavior.
The DNS name can now also contain the DoT server name. It's not longer a
binary IP address only.
Extend NML3ConfigData to account for that. To track the additional
data, use the string representation. The alternative to have a separate
type that contains the parsed information would be cumbersome too.
The "device ... not available because device is strictly unmanaged" is
almost certainly the least interesting of the reasons why connection
can't be activated on a device.
Invent a new error level for it and demote it.
Before:
Error: Connection activation failed: No suitable device found
for this connection (device lo not available because
device is strictly unmanaged).
After
Error: Connection activation failed: No suitable device found
for this connection (device eth0 not available because
profile is not compatible with device (...)).
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1433
For connections with multi-connect property set to "multiple", the
autoconnect-retries should be tracked per device and not per connection.
That means, if autoconnect-retries is set to 2, each device using that
connection should retry to autoconnect 2 times.
The device autoconnect retries is -2 by default. This is a special
value, in NMPolicy context, if the connection used is multi-connect the
device value will be set to match the connection retries. Each time the
device picks a different connection, it will reset the device
autoconnect retries to -2 and if needed, sync. with the connection
retries.
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1387https://bugzilla.redhat.com/show_bug.cgi?id=2039734
This partly reverts 1fe8166fc9 ('device: only deactivate when the master
we've enslaved to goes away').
If the controller fails while the port is not yet fully attached,
before this patch the following happened:
<info> [1664299566.1065] device (bond0): state change: ip-config -> failed (reason 'config-failed', sys-iface-state: 'managed')
...
<warn> [1664299566.1073] device (bond0): Activation: failed for connection 'bond0'
<trace> [1664299566.1073] device[6b76ac7314eb0b53] (bond0): master: release one slave a9f10ea824bb1725/eth1 (not enslaved) (configure)
<debug> [1664299566.1073] device[a9f10ea824bb1725] (eth1): unmanaged: flags set to [!sleeping,!by-type,!platform-init,!user-explicit,!user-settings,!user-conf=0x0/0x179/managed], forget [is-slave=0x800], reason removed)
...
<info> [1664299566.1080] device (eth1): state change: config -> ip-config (reason 'none', sys-iface-state: 'managed')
Note that now eth1 has no controller, but it lingers in "ip-config" state indefinitely.
If we look at a case where the port is already attached we see:
<info> [1664299540.9661] device (bond0): state change: secondaries -> failed (reason 'config-failed', sys-iface-state: 'managed')
...
<warn> [1664299540.9667] device (bond0): Activation: failed for connection 'bond0'
<trace> [1664299540.9667] device[6b76ac7314eb0b53] (bond0): master: release one slave a9f10ea824bb1725/eth1 (enslaved) (configure)
<debug> [1664299540.9667] platform: (eth1) link: releasing 10 from master 'bond0' (80)
...
<info> [1664299540.9740] device (bond0): detached bond port eth1
...
<debug> [1664299540.9749] device[a9f10ea824bb1725] (eth1): Activation: connection 'eth1' master failed
...
<warn> [1664299540.9749] device (eth1): queue-state[secondaries, reason:none, id:520]: replace previously queued state change
...
<debug> [1664299540.9750] device[a9f10ea824bb1725] (eth1): queue-state[deactivating, reason:dependency-failed, id:533]: queue state change
<debug> [1664299540.9751] device[a9f10ea824bb1725] (eth1): unmanaged: flags set to [!sleeping,!by-type,!platform-init,!user-explicit,!user-settings,!user-conf=0x0/0x179/managed], forget [is-slave=0x800], reason removed)
...
<debug> [1664299541.0201] device[a9f10ea824bb1725] (eth1): enslaved to unknown device 0 (??)
...
<debug> [1664299541.0227] device[a9f10ea824bb1725] (eth1): queue-state[deactivating, reason:dependency-failed, id:533]: change state
<info> [1664299541.0228] device (eth1): state change: ip-check -> deactivating (reason 'dependency-failed', sys-iface-state: 'managed')
Fix that by not ignoring the nm_device_slave_notify_release() call. Now we get:
<info> [1664391684.9757] device (bond0): state change: ip-config -> failed (reason 'config-failed', sys-iface-state: 'managed')
...
<debug> [1664391684.9759] active-connection[69c2b12d61f5b171]: set state deactivated (was activating)
<debug> [1664391684.9760] active-connection[142bb8240f6a696d]: check-master-ready: already signalled (state activating, master 0x56116f1480a0 is in state deactivated)
...
<debug> [1664391684.9762] manager: ActivatingConnection now (none)
...
<warn> [1664391684.9763] device (bond0): Activation: failed for connection 'bond0'
<trace> [1664391684.9763] device[142828814dec6e26] (bond0): master: release one slave 720791275fe8a68c/eth1 (not enslaved) (configure)
<debug> [1664391684.9763] device[720791275fe8a68c] (eth1): Activation: connection 'eth1' master failed
...
<debug> [1664391684.9764] device[720791275fe8a68c] (eth1): queue-state[deactivating, reason:dependency-failed, id:3047]: queue state change
<debug> [1664391684.9765] device[720791275fe8a68c] (eth1): unmanaged: flags set to [!sleeping,!by-type,!platform-init,!user-explicit,!user-settings,!user-conf=0x0/0x179/managed], forget [is-slave=0x800], reason removed)
...
<debug> [1664391684.9797] device[720791275fe8a68c] (eth1): queue-state[deactivating, reason:dependency-failed, id:3047]: change state
<info> [1664391684.9797] device (eth1): state change: config -> deactivating (reason 'dependency-failed', sys-iface-state: 'managed')
Commit 1fe8166fc9 ('device: only deactivate when the master we've
enslaved to goes away') added the "return", but it seems to also add it
in cases where we need to handle this. Restrict the return to cases if
we do "no-config".
https://bugzilla.redhat.com/show_bug.cgi?id=2130287
Fixes: 1fe8166fc9 ('device: only deactivate when the master we've enslaved to goes away')
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1406
If we're notified of a master appearing, make sure there's actually an
ifindex we're waiting for.
Triger an assertion failure if that is not the case, cause that's pretty
messed up.
Since commit a1de6810df ('device: don't ignore external slave removals')
we don't leave device_recheck_slave_status() on un-eslaving (that is
plink->master = 0) early enough.
This results in hooking of NM_MANAGER_DEVICE_IFINDEX_CHANGED even
when we're not actually waiting for any master device to come up,
accompanied by a messed up log entry:
device[3fa7cfc200be4e84] (portXc): enslaved to unknown device 0 (??)
We also log nonsense when we see any device's link being removed:
device[a9a4b65bde851bcf] (br0): ifindex: set ifindex 0 (old-l3cfg: 05c6a4409f84d9d2)
device[45d34e95fb71cce0] (portXa): master br0 with ifindex 0 appeared
We don't do further damage afterwards, so this is purely a cosmetic
annoyance.
This is something that does happen.
Is that a bug? If so, this should not be a warning message but an
assertion failure. If it's not a bug, then this does not warrant warning
level, because the user wouldn't know what to do about this and it's
something that occasionally happens.
Granted, the state handling in NMDevice is complex, that it's unclear
whether this indicates a problem or not. In any case, having a warning
does only confuse the user.
We already have src/linux-headers, where we have complete copies of linux
user space headers. Of course that exists, because we want to use certain
features and don't depend on the installed kernel headers. Which works
well, because kernel user space API is stable, and we anyway want to
support compiling against a newer kernel and run against an older (e.g.
in a container). So having our copy of newer kernel headers is merely
as if we compiled against as newer kernel.
Add "src/nm-compat-headers" which has a similar purpose, but a different
approach. Instead of replacing the included header entirely, include
the system header and patch it with #define.
Use this for "linux/if_addr.h". Of course, the approach here is that we
no longer include <linux/if_addr.h> directly, but instead include
"nm-compat-headers/linux/if_addr.h".
Sync/blocking methods are ugly. Their name should highlight this.
Also, we may have an async variant, so we will need the "good" name
for apply() and apply_finish().
We've been outright ignoring master-slave checks if the link ended up
without a master since commit 2e22880894 ('device: don't remove the
device from master if its link has no master').
This was done to deal with OpenVSwitch port-interface relationship,
where the interface's platform link lacked an actual master in platform
(what matters there is the OVSDB entry), but the fix was too wide.
Let's limit the special case to devices whose were not enslaved to
masters that lack a platform link, which pretty much happens for
OpenVSwitch only.
Morale: Write better commit messages of future you is going to be upset
Fixes: 2e22880894 ('device: don't remove the device from master if its link has no master')
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1358
The @dracut_NM_vlan_over_team_no_boot sometimes fails, among other
things, because it fails to assume an indicated connection after a
restart.
That seems to happen because after the decision to activate the
indicated connection, the device does not move from DISCONNECTED state
quickly enough. Another assumption recheck runs in between and decides
to generate a connection, because the assume state was already reset
in between.
First start, creates and activates b3a61b68-f744-4a4c-a513-61399c154a67
on vlan0017:
NetworkManager (version 1.41.1-30921.55767cf5.el9) is starting...
(asserts:10000, boot:caf7301a-19cd-498b-b5ba-5d36ee939ffe)
...
settings: update[b3a61b68-f744-4a4c-a513-61399c154a67]: adding connection "vlan0017"
(45113870df0a4cfb/keyfile)
Second start:
NetworkManager (version 1.41.1-30921.55767cf5.el9) is starting...
(after a restart, asserts:10000, boot:caf7301a-19cd-498b-b5ba-5d36ee939ffe)
Assumption attempt successfully picks the right connection and thus
proceeds to reset the assume state:
manager: (vlan0017): assume: will attempt to assume matching connection 'vlan0017'
(b3a61b68-f744-4a4c-a513-61399c154a67) (indicated)
device[c7c5101cf0b73f5f] (vlan0017): assume-state: set guess-assume=0, connection=(null)
Everything great so far, activation of the right connection is enqueued
and the device moves away from unavailable state. However, the
activation can't proceed immediately:
device (vlan0017): state change: unmanaged -> unavailable
(reason 'connection-assumed', sys-iface-state: 'assume')
device (vlan0017): state change: unavailable -> disconnected
(reason 'connection-assumed', sys-iface-state: 'assume')
active-connection[0x55ba1162f1c0]: set device "vlan0017" [0x55ba1163c4f0]
device[c7c5101cf0b73f5f] (vlan0017): queue activation request waiting for carrier
Now another assumption attempt is done. The original assume state is
gone, so a connection is generated:
platform-linux: UDEV event: action 'add' subsys 'net' device 'vlan0017' (6); seqnum=1959
device[c7c5101cf0b73f5f] (vlan0017): queued link change for ifindex 6
manager: (vlan0017): assume: generated connection 'vlan0017' (57627119-8c20-4f9e-bf4d-4fc427b4a6a9)
keyfile: commit: 57627119-8c20-4f9e-bf4d-4fc427b4a6a9 (vlan0017) added as
"/run/NetworkManager/system-connections/vlan0017-57627119-8c20-4f9e-bf4d-4fc427b4a6a9.nmconnection"
(nm-generated,volatile,external)
I think this shouldn't have happened. We've picked the correct
connection already and it's enqueued for activation!
Change the check in nm_device_emit_recheck_assume() to also consider
any queued activation.
Fixes-test: @dracut_NM_vlan_over_team_no_boot
Co-authored-by: Lubomir Rintel <lkundrak@v3.sk>
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1351
If the MAC changes there is the possibility that the DHCP client will
not be able to renew the address because it uses the old MAC as
CHADDR. Depending on the implementation, the DHCP server might use
CHADDR (so, the old address) as the destination MAC for DHCP replies,
and those packets will be lost.
To avoid this problem, restart the DHCP client when the MAC changes.
https://bugzilla.redhat.com/show_bug.cgi?id=2110000