When a software device becomes deactivated, we check whether it can
be unrealized (= deleted in kernel), by calling function
delete_on_deactivate_check_and_schedule().
The function returns without doing anything if there is a new
activation enqueued on the device (priv->queued_act_request), because
in that case the device will be reused for the next activation.
This commit fixes a problem seen in NMCI test
@ovs_delete_connecting_interface: sometimes the device is not
unrealized after deleting the connection. That happens because if the
queued activation fails, we never try again to unrealize the device.
Fix that by calling delete_on_deactivate_check_and_schedule() when
there is a failure starting the queued activation.
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/2258
(cherry picked from commit 0b03614b68)
(cherry picked from commit 1f23bb18ad)
Commit c5d1e35f99 ('device: support reapplying bridge-port VLANs')
didn't update can_reapply_change() to accept the "bridge-port.vlans"
property during a reapply. So, it was only possible to change the
bridge port VLANs by updating the "bridge.vlan-default-pvid" property
and doing a reapply. Fix that.
Fixes: c5d1e35f99 ('device: support reapplying bridge-port VLANs')
(cherry picked from commit 261fa8db33)
(cherry picked from commit c647c060d6)
If the bridge default-pvid is zero, it means that the default PVID is
disabled. That is, the bridge PVID is not propagated to ports.
Currently NM tries to merge the existing bridge VLANs on the port with
the default PVID from the bridge, even when the PVID is zero. This
causes an error when setting the new VLAN list in the kernel, because
it rejects VLAN zero.
Skip the merge of the default PVID when zero.
Fixes: c5d1e35f99 ('device: support reapplying bridge-port VLANs')
(cherry picked from commit bf79fbd678)
(cherry picked from commit 956f9ba365)
When using the netdev datapath, we wait that the tun link appears, we
call nm_device_set_ip_ifindex() (which also brings the link up) and
then we check that the link is ready, i.e. that udev has announced the
link and the MAC address is correct. After that, we schedule stage3
(ip-config).
In this, there is a race condition that occurs sometimes in NMCI test
ovs_datapath_type_netdev_with_cloned_mac. In rare conditions,
nm_device_set_ip_ifindex() bring the interface up but then ovs-vswitch
changes again the flags of the interface without IFF_UP. The result is
that the interface stays down, breaking communications.
To fix this, we need to always call nm_device_bring_up() after the tun
device is ready. The problem is that we can't do it in
_netdev_tun_link_cb() because that function is already invoked
synchronously from platform code.
Instead, simplify the handling of the netdev datapath. Every
"link-changed" event from platform is handled by
_netdev_tun_link_cb(), which always schedule a delayed function
_netdev_tun_link_cb_in_idle(). This function just assigns the
ip-ifindex to the device if missing, and starts stage3 if the link is
ready. While doing so, it also bring the interface up.
Fixes: 99a6c6eda6 ('ovs, dpdk: fix creating ovs-interface when the ovs-bridge is netdev')
https://issues.redhat.com/browse/RHEL-17358https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/2218
(cherry picked from commit 46e0d2b4e4)
(cherry picked from commit dd0ca122e3)
Fix the following:
../src/core/nm-connectivity.c:958:1: warning: ‘check_platform_config’ defined but not used [-Wunused-function]
958 | check_platform_config(NMConnectivity *self,
| ^~~~~~~~~~~~~~~~~~~~~
Fixes: 91d447df19 ('device: don't start connectivity check on unconfigured devices')
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/2224
(cherry picked from commit 1253cbad5a)
(cherry picked from commit c1d94d7081)
A device has the "external-down" unmanaged flag when:
!is-created-by-nm AND (!is-up OR (!has-address AND !is-controller))
When the "is-up" or the "has-address" conditions change, we properly update
the unmanaged flag by calling _dev_unmanaged_check_external_down() in
_dev_l3_cfg_notify_cb(PLATFORM_CHANGE_ON_IDLE).
The "is-controller" condition changes when another link indicates the
current device as controller. We currently don't update the unmanaged flag
when that happens and so it's possible that the device stays unmanaged even
if it has a port. This can be easily reproduced by running this commands:
ip link add veth0 type veth peer name veth1
ip link add vrf0 type vrf table 10
ip link set vrf0 up
ip link set veth0 master vrf0
Sometimes, the device shows as "unmanaged" instead of "connected
(externally)".
Fix this by re-evaluating the "external-down" unmanaged flags on the
controller when a port is attached or detached.
Fixes: c3586ce01a ('device: consider a device with slaves configured')
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/2209
(cherry picked from commit fd3eccfb16)
When calling to nm_device_is_available, the device types that requires a
parent like VLAN or MACVLAN checks that their parent exists.
nm_device_is_available is a function to check if the device is available
to activate a connection, so it makes sense that if the parent is not
present it can't be activated.
However, this is wrong for 2 reasons:
1. Most of they are virtual devices that might be unrealized when
checking its availability. If they're unrealized, their parent hasn't
been set yet.
2. Even if they're realized, their current parent might not be the one
that is defined in the connection that is being activated.
This is causing that unrealized devices are not being activated as ports
because nm_manager_get_best_device_for_connection thinks that they are
not available.
Get rid of these checks for the parent in the is_available callbacks.
Fixes: ba86c208e0 ('Revert "core: prevent the activation of unavailable OVS interfaces only"')
Fixes: 774badb151 ('core: prevent the activation of unavailable devices')
(cherry picked from commit 94595332c4)
Make sure nm_device_update_dynamic_ip_setup is called every time a carrier was down before and the link is now up again.
Previously the dhcp lease was not renewed if the carrier went down and then up again quickly enough.
This led to cases where an old IP was retained even though the device was connected to a different network with a different DHCP server.
This commit introduces device_link_carrier_changed_down
Fixes: d6429d3ddb ('device: ensure DHCP is restarted every time the link goes up')
(cherry picked from commit 163c2574d8)
Detected by coverity, the ping_op pointers are used after being freed in
cleanup_ping_operations. Although calling to g_list_remove is probably
safe because it only needs the value of the pointer, not to dereference
it, better to follow best practices. One of the use after free was
actually an error because we dereference ping_op->log_domain.
Fixes: 658aef0fa1 ('connection: Support connection.ip-ping-addresses')
(cherry picked from commit ae7de5b353)
When a device in IPv6 shared mode obtains a prefix, it adds a new l3cd
of type L3_CONFIG_DATA_TYPE_PD_6 for that prefix. However, that l3cd
is never removed later and so the address lingers on the interface
even after the connection goes down. Remove the l3cd on cleanup.
(cherry picked from commit 4a8bedcd89)
When a WG connection is connecting to an IPv6 endpoint, configures a
default route, and firewalld is active with IPv6_rpfilter=yes, it never
handshakes and doesn't pass traffic. This is because firewalld has a
IPv6 reverse path filter which is discarding these packets.
Thus, we add some firewall rules whenever a WG connection is brought up
that ensure the conntrack mark and packet mark are copied over.
These rules are largely inspired by wg-quick:
https://git.zx2c4.com/wireguard-tools/tree/src/wg-quick/linux.bash?id=17c78d31c27a3c311a2ff42a881057753c6ef2a4#n221
(cherry picked from commit db557908a2)
Break the loop as soon as we've found the value.
Fixes: 19bed3121f ('ethtool: support Forward Error Correction(fec)')
(cherry picked from commit 245f0e0b35)
If we cannot get current FEC value probably we won't be able to set it a
few lines later. Also, if it fails to set, we try to use the value of
the old one that we tried to retrieve without success. In that case, the
variable old_fec_mode would be uninitialized. Fix it by returning early
if we cannot get the current value.
Fixes: 19bed3121f ('ethtool: support Forward Error Correction(fec)')
(cherry picked from commit cbdd0d9cca)
DBus errors were not properly handled after DBus calls and
that caused SIGSEGV. Now they are checked.
Fixes#1738
Fixes: b8714e86e4 ('dns: introduce configuration_serial support to the dnsconfd plugin')
(cherry picked from commit 4ad20787bb)
This is done so that AddAndActivate() will return sensible errors in a
future patch that makes it support creating virtual devices.
In effect, all errors are logged in one place, therefore the log levels
are different. I don't think we're losing anything of value by being
a little less verbose here.
(cherry picked from commit 45d82f720c)
The variable 'change' may be used uninitialized.
Fixes: 7acc66699a ('policy: always reset retries when unblocking children or ports')
(cherry picked from commit af6aca3527)
An IPv6 endpoint is not usable until the address is non-tentative. Add
a mechanism to wait until the address is ready.
(cherry picked from commit 227cd6307b)
Skip the configuration of the MPTCP endpoint when the address is in
the l3cd but is not yet configured in the platform. This typically
happens when IPv4 DAD is enabled and the address is being probed.
If we configure the endpoint without the address set, the kernel will
try to use the endpoint immediately but it will fail. Then, the
endpoint will not be used ever again after the address is added.
(cherry picked from commit 6bf859af79)
The name suggests that the function always removes all the watchers
with the given tag; instead it removes only "dirty" ones when the
"all" parameter is FALSE. Split the function in two variants.
(cherry picked from commit b6e67c6abc)
When Dnsconfd service is enabled but not started, NetworkManager
should attempt to start it through DBus at least once.
Fixes: c6e1925dec ('dns: Add dnsconfd DNS plugin')
(cherry picked from commit 1463b1c0a3)
Preventing the activation of unavailable devices for all device types is
too aggresive and leads to race conditions, e.g when a non-virtual bond
port gets a carrier, preventing the device to be a good candidate for
the connection.
Instead, enforce this check only on OVS interfaces as NetworkManager
just makes sure that ovsdb->ready is set to TRUE.
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/2139
Fixes: 774badb151 ('core: prevent the activation of unavailable devices')
(cherry picked from commit a1c05d2ce6)
When calling activate_port_or_children_connections() we are unblocking
the ports and children but we are not resetting the number of retries if
it is an internal activation.
This is wrong as even if it's an internal activation the number of
retries should be reset. It won't interferfe with other blocking reasons
like USER_REQUESTED or MISSING_SECRETS.
(cherry picked from commit 7acc66699a)
Stop passing "connection-*" entries in the update method to
dnsconfd. The plugin tries to determine the connection from the
ifindex, but it's not possible to do it right at the moment because
the same ifindex can be used at the same time e.g. by a policy-based
VPN like ipsec and a normal device. Instead, it should be NM that
explicitly passes the information about the connection to the DNS
plugin. Anyway, these variables are not used at the moment by
dnsconfd.
Fixes: c6e1925dec ('dns: Add dnsconfd DNS plugin')
(cherry picked from commit 4d84e6cddf)
After every state change of the plugin there should be an invocation
of _nm_dns_plugin_update_pending_maybe_changed() to re-evaluate
whether we are waiting for an update. send_dnsconfd_update() doesn't
change the state and so there is need to check again afterwards.
(cherry picked from commit 8ff1cbf38b)
When autoconnecting ports of a controller, we look for all candidate
(device,connection) tuples through the following call trace:
-> autoconnect_ports()
-> find_ports()
-> nm_manager_get_best_device_for_connection()
-> nm_device_check_connection_available()
-> _nm_device_check_connection_available()
The last function checks that a specific device is available to be
activated with the given connection. For virtual devices, it only
checks that the device is compatible with the connection based on the
device type and characteristics, without considering any live network
information.
For OVS interfaces, this doesn't work as expected. During startup, NM
performs a cleanup of the ovsdb to remove entries that were previously
added by NM. When the cleanup is terminated, NMOvsdb sets the "ready"
flag and is ready to start the activation of new OVS interfaces. With
the current mechanism, it is possible that a OVS-interface connection
gets activated via the autoconnect-ports mechanism without checking
the "ready" flag.
Fix that by also checking that the device is available for activation.
Rename "unavailable_devices" to "exclude_devices", as the
"unavailable" term has a specific, different meaning in NetworkManager
(i.e. the device is in the UNAVAILABLE state). Also, use
nm_g_hash_table_contains() when needed.
The current mess of code seems like a hodgepodge of complex ideas,
partially copied from systemd, but then subtly different, and it's a
mess. Let's simplify this drastically.
First, assume that getrandom() is always available. If the kernel is too
old, we have an unoptimized slowpath for still supporting ancient
kernels, a path that should be removed at some point. If getrandom()
isn't available and the fallback path doesn't work, the system has much
larger problems, so just crash. This should basically never happen.
getrandom() and having randomness available in general is a critical
system API that should be expected to be available on any functioning
system.
Second, assume that the rng is initialized, so that asking for random
numbers should never block. This is virtually always true on modern
kernels. On ancient kernels, it usually becomes true. But, more
importantly, this is not the responsibility of various daemons, even
ones that run at boot. Instead, this is something for the kernel and/or
init to ensure.
Putting these together, we adopt new behavior:
- First, try getrandom(..., ..., 0). The 0 flags field means that this
call will only return good random bytes, not insecure ones.
- If this fails for some reason that isn't ENOSYS, crash.
- If this fails due to ENOSYS, poll on /dev/random until 1 byte is
available, suggesting that subsequent reads from the rng will almost
have good random bytes. If this fails, crash. Then, read from
/dev/urandom. If this fails, crash.
We don't bother caching when getrandom() returns ENOSYS. We don't apply
any other fancy optimizations to the slow fallback path. We keep that as
barebones and minimal as we can. It works. It's for ancient kernels. It
should be removed soon. It's not worth spending cycles over. Instead,
the goal is to eventually reduce all of this down to a simple boring
call to getrandom(..., ..., 0).
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/2127
Add the DNS routing rules explicitly instead of tracking them via the
NMGlobalTracker mechanism. Since we do not plan to ever remove them,
there is no reason to track the rules. Also, the current
implementation is buggy because in some situations the rules are
wrongly removed when they should not.
Fixes: bf3ecd9031 ('l3cfg: fix DNS routes')
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/2125
"configuration_serial" dbus property ensures that the plugin
can mark update 'not pending' when the update is trully finished.
This mechanism exists because of underlying problem of having
to restart, or perform similarly time consuming operation, to change
certain configuration parameters of resolver. If Dnsconfd would
block the update call until the update is finished, we could not
respond to any other requests until the call is finished.
Resolve-mode allows user to specify way how the global-dns domains
and DNS connection information should be merged and used.
Certification-authority allows user to specify certification
authority that should be used to verify certificates of encrypted
DNS servers.