If we're notified of a master appearing, make sure there's actually an
ifindex we're waiting for.
Triger an assertion failure if that is not the case, cause that's pretty
messed up.
Since commit a1de6810df ('device: don't ignore external slave removals')
we don't leave device_recheck_slave_status() on un-eslaving (that is
plink->master = 0) early enough.
This results in hooking of NM_MANAGER_DEVICE_IFINDEX_CHANGED even
when we're not actually waiting for any master device to come up,
accompanied by a messed up log entry:
device[3fa7cfc200be4e84] (portXc): enslaved to unknown device 0 (??)
We also log nonsense when we see any device's link being removed:
device[a9a4b65bde851bcf] (br0): ifindex: set ifindex 0 (old-l3cfg: 05c6a4409f84d9d2)
device[45d34e95fb71cce0] (portXa): master br0 with ifindex 0 appeared
We don't do further damage afterwards, so this is purely a cosmetic
annoyance.
This is something that does happen.
Is that a bug? If so, this should not be a warning message but an
assertion failure. If it's not a bug, then this does not warrant warning
level, because the user wouldn't know what to do about this and it's
something that occasionally happens.
Granted, the state handling in NMDevice is complex, that it's unclear
whether this indicates a problem or not. In any case, having a warning
does only confuse the user.
We already have src/linux-headers, where we have complete copies of linux
user space headers. Of course that exists, because we want to use certain
features and don't depend on the installed kernel headers. Which works
well, because kernel user space API is stable, and we anyway want to
support compiling against a newer kernel and run against an older (e.g.
in a container). So having our copy of newer kernel headers is merely
as if we compiled against as newer kernel.
Add "src/nm-compat-headers" which has a similar purpose, but a different
approach. Instead of replacing the included header entirely, include
the system header and patch it with #define.
Use this for "linux/if_addr.h". Of course, the approach here is that we
no longer include <linux/if_addr.h> directly, but instead include
"nm-compat-headers/linux/if_addr.h".
Sync/blocking methods are ugly. Their name should highlight this.
Also, we may have an async variant, so we will need the "good" name
for apply() and apply_finish().
Previously, we used _nm_utils_ascii_str_to_bool(). That can accept any
kind of input (like "true"), so one might think that this is better to
use on user-input. However, NMSettingBond already validates the these
options are integers (either "0" or "1"). So a value like "true"
could never be here.
Use _nm_setting_bond_opt_value_as_intbool() because that asserts that
the option if of the expected type (integer).
These variants provide additional nm_assert() checks, and are thus
preferable.
Note that we cannot just blindly replace &g_array_index() with
&nm_g_array_index(), because the latter would not allow getting a
pointer at index [arr->len]. That might be a valid (though uncommon)
usecase. The correct replacement of &g_array_index() is thus
nm_g_array_index_p().
I checked the code manually and replaced uses of nm_g_array_index_p()
with &nm_g_array_index(), if that was a safe thing to do. The latter
seems preferable, because it is familar to &g_array_index().
Bond option netlink support requires primary property to be a ifindex
instead of the interface name. This is a workaround for supporting
specifying a primary that does not exist yet.
```
nmcli con add type bond ifname mybond0 bond.options "mode=active-backup,primary=veth1"
Connection 'bond-mybond0' (38100ef9-11e2-4003-aff9-cb2d152ce34f) successfully added.
nmcli con add type ethernet ifname veth1 master mybond0
cat /sys/class/net/mybond0/bonding/primary
veth1
```
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1362
Fixes: e064eb9d13 ('bond: use netlink to set bond options')
We've been outright ignoring master-slave checks if the link ended up
without a master since commit 2e22880894 ('device: don't remove the
device from master if its link has no master').
This was done to deal with OpenVSwitch port-interface relationship,
where the interface's platform link lacked an actual master in platform
(what matters there is the OVSDB entry), but the fix was too wide.
Let's limit the special case to devices whose were not enslaved to
masters that lack a platform link, which pretty much happens for
OpenVSwitch only.
Morale: Write better commit messages of future you is going to be upset
Fixes: 2e22880894 ('device: don't remove the device from master if its link has no master')
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1358
The @dracut_NM_vlan_over_team_no_boot sometimes fails, among other
things, because it fails to assume an indicated connection after a
restart.
That seems to happen because after the decision to activate the
indicated connection, the device does not move from DISCONNECTED state
quickly enough. Another assumption recheck runs in between and decides
to generate a connection, because the assume state was already reset
in between.
First start, creates and activates b3a61b68-f744-4a4c-a513-61399c154a67
on vlan0017:
NetworkManager (version 1.41.1-30921.55767cf5.el9) is starting...
(asserts:10000, boot:caf7301a-19cd-498b-b5ba-5d36ee939ffe)
...
settings: update[b3a61b68-f744-4a4c-a513-61399c154a67]: adding connection "vlan0017"
(45113870df0a4cfb/keyfile)
Second start:
NetworkManager (version 1.41.1-30921.55767cf5.el9) is starting...
(after a restart, asserts:10000, boot:caf7301a-19cd-498b-b5ba-5d36ee939ffe)
Assumption attempt successfully picks the right connection and thus
proceeds to reset the assume state:
manager: (vlan0017): assume: will attempt to assume matching connection 'vlan0017'
(b3a61b68-f744-4a4c-a513-61399c154a67) (indicated)
device[c7c5101cf0b73f5f] (vlan0017): assume-state: set guess-assume=0, connection=(null)
Everything great so far, activation of the right connection is enqueued
and the device moves away from unavailable state. However, the
activation can't proceed immediately:
device (vlan0017): state change: unmanaged -> unavailable
(reason 'connection-assumed', sys-iface-state: 'assume')
device (vlan0017): state change: unavailable -> disconnected
(reason 'connection-assumed', sys-iface-state: 'assume')
active-connection[0x55ba1162f1c0]: set device "vlan0017" [0x55ba1163c4f0]
device[c7c5101cf0b73f5f] (vlan0017): queue activation request waiting for carrier
Now another assumption attempt is done. The original assume state is
gone, so a connection is generated:
platform-linux: UDEV event: action 'add' subsys 'net' device 'vlan0017' (6); seqnum=1959
device[c7c5101cf0b73f5f] (vlan0017): queued link change for ifindex 6
manager: (vlan0017): assume: generated connection 'vlan0017' (57627119-8c20-4f9e-bf4d-4fc427b4a6a9)
keyfile: commit: 57627119-8c20-4f9e-bf4d-4fc427b4a6a9 (vlan0017) added as
"/run/NetworkManager/system-connections/vlan0017-57627119-8c20-4f9e-bf4d-4fc427b4a6a9.nmconnection"
(nm-generated,volatile,external)
I think this shouldn't have happened. We've picked the correct
connection already and it's enqueued for activation!
Change the check in nm_device_emit_recheck_assume() to also consider
any queued activation.
Fixes-test: @dracut_NM_vlan_over_team_no_boot
Co-authored-by: Lubomir Rintel <lkundrak@v3.sk>
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1351
If teamd crashes, we restore it. That's very nice, but if it really
crashed then it left ports attached and the slave connections are not
going to fail and the port configuration (e.g. priority or link watcher) in
teamd's memory will be gone.
This will restore the port configuration when the teamd connection is
re-established. This probably also fixes a race where a slave connection
would be enslaved (only possible externally and manually?) while we
didn't establish a connection to teamd yet. We'll just send the port
configuration in once're connected.
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1361
Add option to set ofport_request when configuring ovs interface. When
connection with ofport_request configured is activated ovsdb will first
try to activated on the port set by ofport_request.
The user might still want to see the scan list, to decide whether to
stop the hotspot/ADHOC connection and connect to something else.
Allow explicit scans.
If the MAC changes there is the possibility that the DHCP client will
not be able to renew the address because it uses the old MAC as
CHADDR. Depending on the implementation, the DHCP server might use
CHADDR (so, the old address) as the destination MAC for DHCP replies,
and those packets will be lost.
To avoid this problem, restart the DHCP client when the MAC changes.
https://bugzilla.redhat.com/show_bug.cgi?id=2110000
1) The "enabled-on-global-iface" flag was odd. Instead, have only
and "enabled" flag and skip (by default) endpoints on interface
that have no default route. With the new flag "also-without-default-route",
this can be overruled. So previous "enabled-on-global-default" now is
the same as "enabled", and "enabled" from before behaves now like
"enabled,also-without-default-route".
2) What was also odd, as that the fallback default value for the flags
depends on "/proc/sys/net/mptcp/enabled". There was not one fixed
fallback default, instead the used fallback value was either
"enabled-on-global-iface,subflow" or "disabled".
Usually that is not a problem (e.g. the default value for
"ipv6.ip6-privacy" also depends on use_tempaddr sysctl). In this case
it is a problem, because the mptcp-flags (for better or worse) encode
different things at the same time.
Consider that the mptcp-flags can also have their default configured in
"NetworkManager.conf", a user who wants to switch the address flags
could previously do:
[connection.mptcp]
connection.mptcp-flags=0x32 # enabled-on-global-iface,signal,subflow
but then the global toggle "/proc/sys/net/mptcp/enabled" was no longer
honored. That means, MPTCP handling was always on, even if the sysctl was
disabled. Now, "enabled" means that it's only enabled if the sysctl
is enabled too. Now the user could write to "NetworkManager.conf"
[connection.mptcp]
connection.mptcp-flags=0x32 # enabled,signal,subflow
and MPTCP handling would still be disabled unless the sysctl
is enabled.
There is now also a new flag "also-without-sysctl", so if you want
to really enable MPTCP handling regardless of the sysctl, you can.
The point of that might be, that we still can configure endpoints,
even if kernel won't do anything with them. Then you could just flip
the sysctl, and it would start working (as NetworkManager configured
the endpoints already).
Fixes: eb083eece5 ('all: add NMMptcpFlags and connection.mptcp-flags property')
- name things related to `in_addr_t`, `struct in6_addr`, `NMIPAddr` as
`nm_ip4_addr_*()`, `nm_ip6_addr_*()`, `nm_ip_addr_*()`, respectively.
- we have a wrapper `nm_inet_ntop()` for `inet_ntop()`. This name
of our wrapper is chosen to be familiar with the libc underlying
function. With this, also name functions that are about string
representations of addresses `nm_inet_*()`, `nm_inet4_*()`,
`nm_inet6_*()`. For example, `nm_inet_parse_str()`,
`nm_inet_is_normalized()`.
<<<<
R() {
git grep -l "$1" | xargs sed -i "s/\<$1\>/$2/g"
}
R NM_CMP_DIRECT_IN4ADDR_SAME_PREFIX NM_CMP_DIRECT_IP4_ADDR_SAME_PREFIX
R NM_CMP_DIRECT_IN6ADDR_SAME_PREFIX NM_CMP_DIRECT_IP6_ADDR_SAME_PREFIX
R NM_UTILS_INET_ADDRSTRLEN NM_INET_ADDRSTRLEN
R _nm_utils_inet4_ntop nm_inet4_ntop
R _nm_utils_inet6_ntop nm_inet6_ntop
R _nm_utils_ip4_get_default_prefix nm_ip4_addr_get_default_prefix
R _nm_utils_ip4_get_default_prefix0 nm_ip4_addr_get_default_prefix0
R _nm_utils_ip4_netmask_to_prefix nm_ip4_addr_netmask_to_prefix
R _nm_utils_ip4_prefix_to_netmask nm_ip4_addr_netmask_from_prefix
R nm_utils_inet4_ntop_dup nm_inet4_ntop_dup
R nm_utils_inet6_ntop_dup nm_inet6_ntop_dup
R nm_utils_inet_ntop nm_inet_ntop
R nm_utils_inet_ntop_dup nm_inet_ntop_dup
R nm_utils_ip4_address_clear_host_address nm_ip4_addr_clear_host_address
R nm_utils_ip4_address_is_link_local nm_ip4_addr_is_link_local
R nm_utils_ip4_address_is_loopback nm_ip4_addr_is_loopback
R nm_utils_ip4_address_is_zeronet nm_ip4_addr_is_zeronet
R nm_utils_ip4_address_same_prefix nm_ip4_addr_same_prefix
R nm_utils_ip4_address_same_prefix_cmp nm_ip4_addr_same_prefix_cmp
R nm_utils_ip6_address_clear_host_address nm_ip6_addr_clear_host_address
R nm_utils_ip6_address_same_prefix nm_ip6_addr_same_prefix
R nm_utils_ip6_address_same_prefix_cmp nm_ip6_addr_same_prefix_cmp
R nm_utils_ip6_is_ula nm_ip6_addr_is_ula
R nm_utils_ip_address_same_prefix nm_ip_addr_same_prefix
R nm_utils_ip_address_same_prefix_cmp nm_ip_addr_same_prefix_cmp
R nm_utils_ip_is_site_local nm_ip_addr_is_site_local
R nm_utils_ipaddr_is_normalized nm_inet_is_normalized
R nm_utils_ipaddr_is_valid nm_inet_is_valid
R nm_utils_ipx_address_clear_host_address nm_ip_addr_clear_host_address
R nm_utils_parse_inaddr nm_inet_parse_str
R nm_utils_parse_inaddr_bin nm_inet_parse_bin
R nm_utils_parse_inaddr_bin_full nm_inet_parse_bin_full
R nm_utils_parse_inaddr_prefix nm_inet_parse_with_prefix_str
R nm_utils_parse_inaddr_prefix_bin nm_inet_parse_with_prefix_bin
R test_nm_utils_ip6_address_same_prefix test_nm_ip_addr_same_prefix
./contrib/scripts/nm-code-format.sh -F
Try work around the issue documented by Emil Velikov in
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1264
When we mirror an 802.1x connection to an IWD config file and there's an
AP in range with matching SSID, that connection should become available
for activation. In IWD terms when an 802.1x network becomes a Known
Network, it can be connected to using the .Connect D-Bus method.
However there's a delay between writing the IWD config file and receiving
the InterfaceAdded event for the Known Network so we don't immediately
find out that the network can now be used. If an NM client creates a
new connection for an 802.1x AP and tries to activate it quickly enough,
NMDeviceIWD will not allow it to because it won't know the network is
known yet. To work around this, we save the SSIDs of 802.1x connections
we recently mirrored to IWD config files, for an arbitrary 2 seconds
period, and we treat them as Known Networks in that period since in
theory activations should succeed.
The alternative proposed in the !1264 is to drop NMDeviceIWD checks that
there's a Known Network for the 802.1x connection being activated since
IWD will eventually perform the same checks and IWD is the ultimate
authority on whether the profile is IWD-connectable.
The IWD backend would originally use .Disconnect() on IWD dbus "Station"
objects to make sure IWD is out of autoconnect or that it isn't
connecting to a network that NM didn't command. Later the default became
to let IWD run autoconnect so now most of the time the backend just
mirrors IWD's state to NMDevice's state.
Now sometimes when NMDevice still seems to have an active connection but
IWD has gone through one or more state changes (which we may see after a
delay due to D-Bus) and is now connected to or connecting to a different
network, NMDevice would first have to go through .deactivate to mirror
the fact the original connection is no longer active, and it'd use
.Disconnect() which could break the new connection, so check for this
situation.
When only one of those connection.{lldp,mdns,llmnr,dns-over-tls}
settings changes, we still need to do a full restart of the IP
configuration to reapply the changes.
Fixes: 58287cbcc0 ('core: rework IP configuration in NetworkManager using layer 3 configuration')
Heavily inspired by systemd ([1]).
We now also have nm_random_get_bytes{,_full}() and
nm_random_get_crypto_bytes(), like systemd's random_bytes()
and crypto_random_bytes(), respectively.
Differences:
- instead of systemd's random_bytes(), our nm_random_get_bytes_full()
also estimates whether the output is of high quality. The caller
may find that interesting. Due to that, we will first try to call
getrandom(GRND_NONBLOCK) before getrandom(GRND_INSECURE). That is
reversed from systemd's random_bytes(), because we want to find
out whether we can get good random numbers. In most cases, kernel
should have entropy already, and it makes no difference.
Otherwise, heavily rework the code. It should be easy to understand
and correct.
There is also a major bugfix here. Previously, if getrandom() failed
with ENOSYS and we fell back to /dev/urandom, we would assume that we
have high quality random numbers. That assumption is not warranted.
Now instead poll on /dev/random to find out.
[1] a268e7f402/src/basic/random-util.c (L81)
sysfs is deprecated and kernel people will not add new bond options to
sysfs. Netlink is a stable API and therefore is the right method to
communicate with kernel in order to set the link options.
The devices generally need to be IFF_UP and wait a little before the
carrier detection is reliable. Some devices, actually need to wait
more than a little -- r8169 needs up to 5 seconds.
For this reason, we delay startup complete while the carrier is down
after we bring the device up. We do this so that we don't reject
activations due to carrier down until we're sure it's really down.
This works well as long as it's us who brought the device up.
If we're restarting the daemon, the device is going to be already up
when we start up the daemon for the second time. There's, however, a
slim chance that the device was brought down and up very shortly before
the restart and therefore the carrier reporting is still not reliable.
As a matter of fact, we bring the devices down and back up on some
occassions, such as when enslaving to a team device.
Therefore, the following events in quick succession cause trouble:
# nmcli con up team-slave-eth0
[20099.205355] Generic FE-GE Realtek PHY r8169-0-300:00: attached PHY driver (mii_bus:phy_addr=r8169-0-300:00, irq=MAC)
[20099.365641] nm-team: Port device eth0 added
[20099.370728] r8169 0000:03:00.0 eth0: Link is Down
[20099.436631] nm-team: Port device eth0 removed
[20099.463422] Generic FE-GE Realtek PHY r8169-0-300:00: attached PHY driver (mii_bus:phy_addr=r8169-0-300:00, irq=MAC)
[20099.628505] r8169 0000:03:00.0 eth0: Link is Down
[20099.669425] Generic FE-GE Realtek PHY r8169-0-300:00: attached PHY driver (mii_bus:phy_addr=r8169-0-300:00, irq=MAC)
[20099.833457] r8169 0000:03:00.0 eth0: Link is Down
[20099.838471] nm-team: Port device eth0 added
The device has been brought down, enslaved and brought up.
"Link is Down" indicates carrier not being detected.
Connection successfully activated (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/7)
# systemctl restart NetworkManager
Now NM sees the device being up, but carrier down.
# nmcli con up testeth0
Error: Connection activation failed: No suitable device found for this connection (...).
Activation failed, because eth0 carrier still appears down.
# [20102.943464] r8169 0000:03:00.0 eth0: Link is Up - 1Gbps/Full - flow control rx/tx
Now it's up, but the party is already over. Shiet.
Let's wait whenever the device reaches unavailable state, whether we
bring it up at that point or not.
Fixes-test: @restart_L2_only_lacp
https://bugzilla.redhat.com/show_bug.cgi?id=2092361https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1316
Don't mix <net/ethernet.h> and <linux/if_ether.h>.
Fixes the following build error with musl libc:
In file included from /usr/include/net/ethernet.h:10,
from ../src/libnm-platform/nm-linux-platform.c:17:
/usr/include/netinet/if_ether.h:115:8: error: redefinition of 'struct ethhdr'
115 | struct ethhdr {
| ^~~~~~
In file included from ../src/linux-headers/ethtool.h:19,
from ../src/libnm-std-aux/nm-linux-compat.h:22,
from ../src/libnm-platform/nm-linux-platform.c:10:
/usr/include/linux/if_ether.h:169:8: note: originally defined here
169 | struct ethhdr {
| ^~~~~~
Fixes: dc98ab807c ('platform: include "linux-headers" via "libnm-std-aux/nm-linux-compat.h"')