We've had a few rare instances where a modem stopped retrying
to autoconnect because it briefly didn't have an operator code.
This isn't a permanent failure, so we shouldn't abort completely
for it.
We must do different cleanups depending on the CleanupType. Document the
meaning of the different types as it was very confusing to work on new
code without having very clear what do they mean.
The flag is used for both sleeping and networking disabled conditions.
This is because internally they share logic, but it's not obvious for
users and it has caused confusion in the past when investigating why
devices didn't become managed. Make it explicit that it can be because
of either reason.
It would be better to create two separate flags, actually, and it
doesn't seem complex, but better not to risk introducing bugs for that
little benefit.
Logs before:
device (enp4s0): state change: disconnected -> unmanaged (reason 'unmanaged-sleeping' ...
Logs before:
device (enp4s0): state change: disconnected -> unmanaged (reason 'unmanaged-nm-disabled' ...
When we disable networking with `nmcli networking off` the reason that
is logged is "sleeping". Explain instead that networking is disabled.
Before:
device (lo): state change: activated -> deactivating (reason 'sleeping' ...
After:
device (lo): state change: activated -> deactivating (reason 'networking-off' ...
When we do `nmcli networking off` it's shown as state "sleeping". This
is confusing, and the only reason is that we share internally code to
handle both situations in a similar way.
Rename the state to the more generic name "disabled", situation that can
happen either because of sleeping or networking off.
Clients cannot differentiate the exact reason only with the NMState value,
but better that they show "network off" as this is the most common reason
that they will be able to display. If the system is suspending, there will
be only a short period of time that they can show the state, and showing
"network off" is not wrong because that's what NM has done as a response
to suspend.
In the logs, let's make explicit the exact reason why state is changing
to DISABLED: sleeping or networking off.
Logs before:
manager: disable requested (sleeping: no enabled: yes)
manager: NetworkManager state is now ASLEEP
Logs after:
manager: disable requested (sleeping: no enabled: yes)
manager: NetworkManager state is now DISABLED (NEWORKING OFF)
State before:
$ nmcli general
STATE ...
asleep ...
State after:
$ nmcli general
STATE ...
network off ...
When reading NetworkManager.conf and NetworkManager-intern.conf we might
need to know if a group is defined or not, even if it's empty. This is
the case, for example, for [global-dns]. If [global-dns] is defined in
NM.conf overwrites the config from NM-intern, and if it's defined in any
of them they overwrite the configs from connections.
Before this patch, defining it as an empty group was ignored:
```
[global-dns]
```
Instead, it was necessary to add at least one key-value to the group.
Otherwise the group was silently ignored.
```
[global-dns]
searches=
```
Keep empty groups so we can take better decissions about overwritting
configs from other sources.
Clients like nmstate needs to know if the [global-dns] section is
defined or not, so they know if DNS configs from connections are
relevant or not. Expose it in D-Bus by always exposing "searches"
and "options" if it's defined, maybe as empty lists.
According to the documentation, settings from [global-dns] (searches and
options) are always merged with those from connections. However this was
not happening if no [global-dns-domain-*] exists, in which case
connections were ignored. This happened because in the past both global
sections must de defined or undefined. When this was changed to allow
defining only [global-dns], allowing it in the function that generates
the resolv.conf file was forgotten. Fix that now.
Anyway, merging these configs doesn't make much sense. The searches and
options defined in connections probably make sense only for the nameservers
defined in that same connection.
Because of this, make the following change: if global nameservers are
defined, use searches and options from [global-dns] only, because those
defined in connections may not make sense for the global nameservers. If
[global-dns] is missing, assume an empty [global-dns] section.
Also, if no global nameservers are defined, but [global-dns] is, make
that it overwrites the searches and options defined in connections. This
is not ideal, but none of the alternatives is better and at least this
is easy to remember.
So, the resulting rules from above are:
- If [global-dns] is defined, it always overwrite searches and options
from connections.
- If [global-dns-domain-*] is defined, it always overwrite nameservers
from connections. It overwrites searches and options too.
Fixes: 1f0d1d78d2 ('dns-manager: always apply options from [global-dns]')
Fixes: f57a848da5 ('man: update documentation about global DNS configuration')
Since 1.44 we accept a global-dns section without any global-dns-domain
section, so users can define searches and options without defining any
global DNS servers.
When set from the D-Bus API it was still rejected. Fix it.
Fixes: 1f0d1d78d2 ('dns-manager: always apply options from [global-dns]')
NM_SETTING_BOND_OPTION_LACP_ACTIVE is flagged as BOND_OPTFLAG_IFDOWN in
the kernel and hence should not be in OPTIONS_REAPPLY_SUBSET.
Authored-by: Mohith Kumar Thummaluru <mohith.k.kumar.thummaluru@oracle.com>
Signed-off-by: Mohith Kumar Thummaluru <mohith.k.kumar.thummaluru@oracle.com>
Signed-off-by: Pradyumn Rahar <pradyumn.rahar@oracle.com>
During disposal we're calling to remove_all_aps that in turns schedules
an auto-activate recheck. As the device is removed, this triggers an
assertion when trying to do the recheck.
Fix that by not scheduling the recheck.
Example of backtrace that this commits fix:
0 __libc_do_syscall () at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:47
1 0xf746e270 in __pthread_kill_implementation (threadid=<optimized out>, signo=6, no_tid=<optimized out>) at pthread_kill.c:43
2 0xf743fbc6 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
3 0xf7431614 in __GI_abort () at abort.c:79
4 0xf775afea in g_assertion_message (domain=domain@entry=0x209a9f "nm", file=file@entry=0x1f7d59 "../NetworkManager-1.43.7/src/core/nm-policy.c", line=line@entry=1665,
func=func@entry=0x1f94d9 <__func__.6> "nm_policy_device_recheck_auto_activate_schedule",
message=message@entry=0x1d3e950 "assertion failed: (g_signal_handler_find(device, G_SIGNAL_MATCH_DATA, 0, 0, NULL, NULL, NM_POLICY_GET_PRIVATE(self)) != 0)")
at ../glib-2.72.3/glib/gtestutils.c:3253
5 0xf775b05e in g_assertion_message_expr (domain=0x209a9f "nm", file=0x1f7d59 "../NetworkManager-1.43.7/src/core/nm-policy.c", line=1665,
func=0x1f94d9 <__func__.6> "nm_policy_device_recheck_auto_activate_schedule",
expr=0x1f8afc "g_signal_handler_find(device, G_SIGNAL_MATCH_DATA, 0, 0, NULL, NULL, NM_POLICY_GET_PRIVATE(self)) != 0") at ../glib-2.72.3/glib/gtestutils.c:3279
6 0x0005f27a in nm_policy_device_recheck_auto_activate_schedule (self=0x1d3e950, device=0x209a9f) at ../NetworkManager-1.43.7/src/core/nm-policy.c:1679
7 0x000548ae in nm_manager_device_recheck_auto_activate_schedule (self=<optimized out>, device=<optimized out>) at ../NetworkManager-1.43.7/src/core/nm-manager.c:3113
8 0x00070622 in nm_device_recheck_auto_activate_schedule (self=<optimized out>) at ../NetworkManager-1.43.7/src/core/devices/nm-device.c:9249
9 0xf693aa8c in ap_add_remove (self=self@entry=0x1ceb0b0, is_adding=0, ap=<optimized out>, recheck_available_connections=0)
at ../NetworkManager-1.43.7/src/core/devices/wifi/nm-device-wifi.c:846
10 0xf693bcda in remove_all_aps (self=self@entry=0x1ceb0b0) at ../NetworkManager-1.43.7/src/core/devices/wifi/nm-device-wifi.c:863
11 0xf693f83c in dispose (object=0x1ceb0b0) at ../NetworkManager-1.43.7/src/core/devices/wifi/nm-device-wifi.c:3809
12 0xf7806e72 in g_object_unref (_object=<optimized out>) at ../glib-2.72.3/gobject/gobject.c:3636
13 g_object_unref (_object=0x1ceb0b0) at ../glib-2.72.3/gobject/gobject.c:3553
14 0x000f7fa4 in _nm_dbus_object_clear_and_unexport (location=location@entry=0xffa50644) at ../NetworkManager-1.43.7/src/core/nm-dbus-object.c:203
15 0x000576e4 in remove_device (self=self@entry=0x1c9c900, device=<optimized out>, quitting=quitting@entry=1) at ../NetworkManager-1.43.7/src/core/nm-manager.c:2289
16 0x0005a864 in nm_manager_stop (self=self@entry=0x1c9c900) at ../NetworkManager-1.43.7/src/core/nm-manager.c:7784
17 0x00023438 in main (argc=<optimized out>, argv=<optimized out>) at ../NetworkManager-1.43.7/src/core/main.c:530
Fixes: 96f40dcdcd ('wifi/ap: explicitly unexport AP and refactor add/remove AP')
Fixes: https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/issues/1791
When a software device becomes deactivated, we check whether it can
be unrealized (= deleted in kernel), by calling function
delete_on_deactivate_check_and_schedule().
The function returns without doing anything if there is a new
activation enqueued on the device (priv->queued_act_request), because
in that case the device will be reused for the next activation.
This commit fixes a problem seen in NMCI test
@ovs_delete_connecting_interface: sometimes the device is not
unrealized after deleting the connection. That happens because if the
queued activation fails, we never try again to unrealize the device.
Fix that by calling delete_on_deactivate_check_and_schedule() when
there is a failure starting the queued activation.
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/2258
Unrealized software devices are always available for activation,
hardware devices never.
In nm_manager_get_best_device_for_activation() we call
nm_device_is_available() on candidate devices. Without this fix, any
unrealized software device would be not considered ready for
activation, which is wrong.
A software device can override the default implementation of
is_available(). For example NMDeviceOvsInterface does that and only
checks the OVSDB is ready.
Fixes: ba86c208e0 ('Revert "core: prevent the activation of unavailable OVS interfaces only"')
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/2253
With this configuration:
[Interface]
...
Address = 172.16.110.116/28,172.16.111.21/28
[Peer]
...
AllowedIPs = 172.16.110.112/28
[Peer]
...
AllowedIPs = 172.16.111.16/28
NetworkManager currently creates the following routes
(1) 172.16.110.112/28 dev wg0 proto static scope link metric 50 <-- peer route
(2) 172.16.110.112/28 dev wg0 proto kernel scope link src 172.16.110.116 metric 50 <-- prefix route
(3) 172.16.111.16/28 dev wg0 proto static scope link metric 50 <-- peer route
(4) 172.16.111.16/28 dev wg0 proto kernel scope link src 172.16.111.21 metric 50 <-- prefix route
If we try to reach a host in the second peer subnet, route (4)
matches. Route (4) doesn't specify a source IP and so the kernel will
use the first IP set on the interface (172.16.110.116), which is the
wrong one.
# ip route get 172.16.111.17
172.16.111.17 dev wg0 src 172.16.110.116 uid 0
To fix this problem, if the AllowedIP subnet is already reachable on
the interface via the prefix route of a static IP address, we should
skip adding the peer route.
wg-quick does something similar here:
https://git.zx2c4.com/wireguard-tools/tree/src/wg-quick/linux.bash?h=v1.0.20250521#n177
The condition in wg-quick is a bit different because it checks that no
duplicate route exists on the interface. We can't do exactly the same
because in NMDeviceWireGuard we don't have visibility on all the
platform routes.
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/issues/1790https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/2254
Adds support for reapplying the `sriov.vfs` property. Note this
does not include `num_vfs`, as the configuration needs to be reset
and reconfigured from scratch in that case.
Previously, if an existing VF is modified (e.g. if we change the `trust`
flag), we reset all VF configurations, and started from scratch. But in
some cases, this is unnecessarily disruptive.
Resolves: https://issues.redhat.com/browse/RHEL-95844
After resuming from suspend, devices with wake-on-lan enabled are
temporarily set as unmanaged, and then managed again. At the beginning
of this process, an active device goes from state ACTIVATED to
UNMANAGED and is deconfigured via
"nm_device_cleanup(cleanup_type=CLEANUP_TYPE_DECONFIGURE)".
If the device is attached to a controller, the cleanup doesn't detach
it. Later when the device is managed again, NetworkManager tries to
create an assumed connection. Normally, this would fail because we
detect that the device is not configured. However, if there is a
controller-port relationship, the assumed connection generation
succeeds and the persistent connection doesn't go up.
As this is wrong, prevent the generation of the assumed connection by
detaching the port during a cleanup.
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/issues/1766https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/2242
The "notify::controller" signal must be emitted on the port, not on
the controller.
Fixes: 1f05526ed7 ('core: drop NMDevice master and introduce controller')
After ACD_WAIT_PROBING_EXTRA_TIME_MSEC has elapsed,
_l3_acd_data_timeout_schedule_probing_restart() keeps rescheduling the
timer with a zero interval, resulting in 100% CPU usage. This
continues until the probe is destroyed after
ACD_WAIT_PROBING_EXTRA_TIME2_MSEC.
When computing the interval, we need to use
(ACD_WAIT_PROBING_EXTRA_TIME_MSEC + ACD_WAIT_PROBING_EXTRA_TIME2_MSEC)
as the expiry time.
acd_data->probing_timestamp_msec indicates when the probing
started. It is used in different places to calculate the timeout for
certain operations. In particular, it is used to detect that the probe
creation took too long when handling the ACD_STATE_CHANGE_MODE_TIMEOUT
event.
If we reset this timestamp at every timer event, we'll never hit the
probe creation timeout. Therefore, the l3cfg will keep trying forever
to create the probe.
See: https://lists.freedesktop.org/archives/networkmanager/2025-July/000418.html
Fix this by not updating the timestamp during a timeout event.
Fixes: a09f9cc616 ('l3cfg: ensure the probing timeout is initialized on probe start')
When resolving the system hostname from DNS lookup, we use
nm_utils_validate_hostname() which checks that the result is a valid
hostname. A valid hostname is at most 64 characters on Linux. Anything
longer is discarded.
However, the reverse DNS lookup doesn't return a hostname, it returns
a DNS name. The DNS name can have multiple labels, each limited to 63
characters. The maximum length of the DNS name is 253 characters.
If the result is longer than 64 characters because it has multiple
labels, we should still accept it, provided that it is a valid DNS
name. Then when setting the hostname in the system, only the first
label will be kept.
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/2243
Resolves: https://issues.redhat.com/browse/RHEL-104357
Fix the following error seen when running the build_clean.sh script
with LTO disabled:
In file included from ../src/libnm-glib-aux/nm-default-glib.h:66,
from ../src/libnm-glib-aux/nm-default-glib-i18n-prog.h:13,
from ../src/core/nm-default-daemon.h:11,
from ../src/core/platform/tests/test-link.c:6:
In function ‘_nm_auto_freev’,
inlined from ‘test_link_get_bridge_fdb’ at ../src/core/platform/tests/test-link.c:2732:33:
../src/libnm-glib-aux/nm-macros-internal.h:166:8: error: ‘addrs’ may be used uninitialized [-Werror=maybe-uninitialized]
166 | if (*p) {
| ^
../src/core/platform/tests/test-link.c: In function ‘test_link_get_bridge_fdb’:
../src/core/platform/tests/test-link.c:2732:33: note: ‘addrs’ was declared here
2732 | nm_auto_freev NMEtherAddr **addrs;
| ^~~~~
cc1: all warnings being treated as errors
Fixes: 16ef33d380 ('bond-slb: fix memory leak')
Commit c5d1e35f99 ('device: support reapplying bridge-port VLANs')
didn't update can_reapply_change() to accept the "bridge-port.vlans"
property during a reapply. So, it was only possible to change the
bridge port VLANs by updating the "bridge.vlan-default-pvid" property
and doing a reapply. Fix that.
Fixes: c5d1e35f99 ('device: support reapplying bridge-port VLANs')
If the bridge default-pvid is zero, it means that the default PVID is
disabled. That is, the bridge PVID is not propagated to ports.
Currently NM tries to merge the existing bridge VLANs on the port with
the default PVID from the bridge, even when the PVID is zero. This
causes an error when setting the new VLAN list in the kernel, because
it rejects VLAN zero.
Skip the merge of the default PVID when zero.
Fixes: c5d1e35f99 ('device: support reapplying bridge-port VLANs')
If sendto() fails, the function returns and the remaining entries are
not deallocated. Use nm_auto_freev instead to free the array and the
pointer it contains.
Add a test to check that nm_auto_freev does the right thing on the
value returned by nm_linux_platform_get_bridge_fdb().
Fixes: 3f2f922dd9 ('bonding: send ARP announcement on bonding-slb link/carrier down')
Rename nm_linux_platform_get_link_fdb_table() to
nm_linux_platform_get_bridge_fdb(). The new name better indicates that
the function returns the bridge FDB entries.
The DHCP search list option (119) can use the "message compression"
algorithm specified in RFC 1035 section 4.1.4 to reduce the size of
the message in presence of subdomains that appear multiple times.
When using the compression a label starts with:
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| 1 1| OFFSET |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
where the offset points to a previous domain.
Previously, the parsing code was taking the lower 6 bits of the first
byte, shifting them left 16 bits, and adding the next byte. Instead,
the shift should be of 8 bits.
The effect of this bug was that when the offset was greater than 255,
it was incorrectly parsed as a number larger than the message size,
and the parsing failed.
Note that while a single DHCP option can be at most 255 bytes, a DHCP
message can contain multiple instances of the same option. The
receiver must concatenate all the occurrences according to RFC 3396
and parse the resulting buffer.
Fixes: 6adade6f21 ('dhcp: add nettools dhcp4 client')
The validation of embedded NUL character was skipped due to the wrong
order of arguments to memchr(). Fix it.
Fixes: 4043f82790 ('lldp: cleanup converting binary LLDP fields to string')
Currently the bug is hidden because the macro is only called with
NM_SETTING_BOND_OPTION_ARP_IP_TARGET.
Fixes: 45c95e9314 ('device/bond: rework setting of arp_ip_target bond options')