When a bond in balance-slb is created, the ports are enabled or disabled
based on carrier and link state. If the link/carrier goes down, the port
becomes disabled and we must make sure the MAC tables of the switches
are updated properly so the traffic is redirected.
In order to solve this, we send a GARP or RARP broadcast packet on the
bond. This fix cover 3 different balance-slb scenarios.
Scenario 1: The bond in balance-slb mode has IPv4 address configured and
some ports connected. Here the bond is acting like active-backup as the
packets will always have as source MAC the address of the bond
interface. When a port goes down, NetworkManager will send a GARP
broadcast announcing the address configured on the bond with the MAC
address configured on the port.
Scenario 2: The bond in balance-slb mode is connected to a bridge and has
some ports connected. The bridge has IPv4 configured. When a port goes
down, NetworkManager will send a GARP broadcast announcing the address
configured on the bridge with the MAC address configured on the port.
Scenario 3: The bond in balance-slb mode is connected to a bridge and
has some ports connected. The bridge does not have IP configuration and
therefore everything is L2. When a port goes down, NetworkManager will
query the FDB table and filter the entries by the ones belonging to the
bridge and the bond ifindexes. Then, it will send a RARP broadcast
announcing every learned MAC address from FDB.
Fixes: e9268e3924 ('firewall: add mlag firewall utils for multi chassis link aggregation (MLAG) for bonding-slb')
(cherry picked from commit 3f2f922dd9)
(cherry picked from commit e9e1768c37)
The function introduced queries the FDB table via netlink socket. It
accepts a list of ifindexes to filter out the FDB content not related to
it. It returns an array of MAC addresses.
To cltarify this function is unusually exposed directly on
nm-linux-platform.h as we don't want this be part of the whole
NMPlatform object or cache. This, is an exception to the rule to
simplify the integration of this functionality on NetworkManager.
In addition, it also doesn't use the async mechanism that is widely used
on netlink communication across nm-linux-platform. Again, the reason is
to simplify its use, as async communication won't provide a benefit to
the use cases we have planned for this, i.e balance-slb RARP announcing.
(cherry picked from commit 00f47efcb2)
(cherry picked from commit 8af7493627)
Add a hash generation helper for NMEtherAddr struct. This can be used
for HashTables containing pointers to NMEtherAddr structs.
(cherry picked from commit a63eec924c)
(cherry picked from commit 6371802087)
This function would be useful when performing operations related to the
IPv4 addresses configured on the l3cfg. E.g this function will be used
for getting the IPv4 to announce on a GARP on bonding-slb when one of
the ports failover.
(cherry picked from commit 69f3493670)
(cherry picked from commit bfe2047acc)
Calling c_list_link_tail() on a list entry that already belongs to
another list corrupts the other list, in this case 'old_lst_head';
this is explained in the documentation of c_list_link_before():
* @what is not inspected prior to being linked. Hence, it better not
* be linked into another list, or the other list will be corrupted.
This can be reproduced by invoking "nmcli device wifi rescan ssid x"
multiple times; in this way, _scan_request_ssids_track() reuses the
previous SSID data, the list gets corrupted and this causes a crash.
Fixes: 7500e90b53 ('wifi: rework scanning of Wi-Fi device')
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/2076
(cherry picked from commit 3b75577871)
(cherry picked from commit 3917235a2d)
NetworkManager current code will refuse to activate a connection if its
interface has no SRIOV capacity but holding a empty SRIOV settings.
This patch only valid SRIOV capacity when it is enabled(total_vfs > 0).
Resolves: https://issues.redhat.com/browse/RHEL-58397
Signed-off-by: Gris Ge <fge@redhat.com>
(cherry picked from commit 421ccf8b4c)
(cherry picked from commit c9e31e70cb)
This was causing test_nml_dbus_meta() unit test to fail and also it was
affecting the notification when the object changed.
Fixes: 5426bdf4a1 ('HSR: add support to HSR/PRP interface')
(cherry picked from commit 1e70f24378)
(cherry picked from commit 622f188621)
The setting was missing from the script. The patch is adding it and also
regenerates the docs.
Fixes: 5426bdf4a1 ('HSR: add support to HSR/PRP interface')
(cherry picked from commit a0696e27b8)
(cherry picked from commit f38dcdf57b)
The HSR DBus metadata was defined properly but not exported on the libnm
library properly. This was causing that clients were not showing the HSR
devices.
Fixes: 5426bdf4a1 ('HSR: add support to HSR/PRP interface')
(cherry picked from commit 5e4696a693)
(cherry picked from commit 19929fdc9a)
When the attach_port()/detach_port() methods do not return immediately
(currently, only for OVS ports), the following situation can arise:
- nm_device_controller_attach_port() starts the attachment by sending
the command to ovsdb. Note that here we don't set
`PortInfo->port_is_attached` to TRUE yet; that happens only after
the asynchronous command returns;
- the activation of the port gets interrupted because the connection
is deleted;
- the port device enters the deactivating state, triggering function
port_state_changed()
- the function calls nm_device_controller_release_port() which checks
whether the port is already attached; since
`PortInfo->port_is_attached` is not set yet, it assumes the port
doesn't need to be detached;
- in the meantime, the ovsdb operation succeeds. As a consequence,
the kernel link is created even if the connection no longer exists.
Fix this by turning `port_is_attached` into a tri-state variable that
also tracks when the port is attaching. When it is, we need to perform
an explicit detach during deactivation.
Fixes: 9fcbc6b37d ('device: make attach_port() asynchronous')
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/2043
Resolves: https://issues.redhat.com/browse/RHEL-58026
(cherry picked from commit a8329587c8)
(cherry picked from commit d809ca6db2)
When handling event TIMEOUT, "acd_data->probing_timeout_msec" needs to
be always initialized before jumping to "handle_start_probing:";
otherwise, an assertion failure is triggered at:
static void
_l3_acd_data_timeout_schedule_probing_restart(AcdData *acd_data, gint64 now_msec)
{
...
nm_assert(acd_data->probing_timeout_msec > 0);
Even if the ACD data is already in state PROBE, that doesn't mean that
the timeout is already initialized because the PROBE state can also be
reached from a INSTANCE_RESET event; and depending on the previous
state "acd_data->probing_timeout_msec" could be uninitialized.
Fixes-test: @iptunnel_restart
Fixes: b8f9d7b5dd ('l3cfg: rework ACD handling in NML3Cfg to support handling conflicts')
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/2023
(cherry picked from commit a09f9cc616)
(cherry picked from commit 4dde5aa787)
Previously, the "edit" or "delete" buttons were clickable even
if there were no available connections, which was not expected
and caused an assertion to fail when clicked. This is because
the connections list could contain connections that were later
filtered out and not displayed in the final list, but the check
did not take this into account.
Make it so that the buttons are clickable only if we *actually*
have any available connections to edit or delete.
Fixes: 3bda3fb60c ('nmtui: initial import of nmtui')
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1991
(cherry picked from commit c9fefcd095)
(cherry picked from commit f6e4d1b2e0)
The "StartLimitIntervalSec" and "StartLimitBurst" directives should be
in the [Unit] section instead of the [Service] one.
Fixes: 927cff9f17 ('cloud-setup: allow bigger restart bursts')
(cherry picked from commit a531458456)
(cherry picked from commit e34c7cd5a2)
When using the netdev datapath, we wait for the link to appear in
different steps:
1. initially, in act_stage3_ip_config() connects to platform's
"link-changed" signal to detect when the TUN interface appears;
2. when the interface appears, _netdev_tun_link_cb() schedules
_set_ip_ifindex_tun() in a idle handler;
3. _set_ip_ifindex_tun() checks if the link is ready (e.g. if the MAC
address is correct) and in that case it reschedules stage3, which
will move forward with the activation;
4. if the link is not ready in _set_ip_ifindex_tun(), the function
connects again to platform's "link-changed" signal to react to link
changes;
5. after the link changes and it is ready, _netdev_tun_link_cb()
reschedules stage3, which moves forward with the activation;
With the current implementation it is possible that after step 2, if
act_stage3_ip_config() runs because it was already scheduled, it
registers again to the "link-changed" event; then when
_set_ip_ifindex_tun() is invoked it will hit assertion:
nm_assert(!priv->wait_link.tun_link_signal_id);
Fix this by preventing that the signal gets registered again after
step 2.
Fixes-test: @ovs_datapath_type_netdev_with_cloned_mac
Fixes: acf485196c ('ovs-interface: wait that the cloned MAC changes instead of setting it')
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/2024
(cherry picked from commit b6e69f3467)
(cherry picked from commit 50da988182)
When the lease expires, the DHCP client emits a LEASE_UPDATE event
with a NULL l3cd. After returning from the handler, it sends
immediately a DHCP DISCOVER message to try to get a new lease.
It is important that when the DISCOVER gets sent the address is no
longer configured on the interface. Otherwise, the server could see
that it is already in use and assign a different one. Therefore,
remove the address synchronously when handling the event.
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/issues/1532
(cherry picked from commit 514a3cb610)
Don't enforce IP cleanup when devices are in deactivating state, to
make sure that network connection is still available for pre-down
dispatcher phase.
Fixes ac4e63ddda ('ip: support dhcp-send-release in NMSettingIpConfig')
https://bugzilla.suse.com/show_bug.cgi?id=1228154
(cherry picked from commit c61c552077)
The primary address is that placed at position 0 of all the IP Addresses
of the interface. Sometimes we put it in a different position in the
ipv4s array because we insert them in the order we receive, but it might
happen that the HTTP responses comes back in wrong order.
In order to solve this, we pass the index of the IPv4 address to the
callback and the address is added in the right position directly.
Co-authored-by: Fernando Fernandez Mancera <ffmancera@riseup.net>
(cherry picked from commit 72014db629)
On daemon startup, we may end up enqueueing many nm-cloud-setup.service
restarts in very a short time. That is perfectly fine, just bump the
thresholds so that systemd doesn't get in the way too quickly.
100 requests in 1 seconds seem like a fair choice -- little bit on the
conservative side, yet still giving the service manager some room to
interfere on a chance things really go awry.
https://issues.redhat.com/browse/RHEL-49694
(cherry picked from commit 927cff9f17)
In case the user selects a setting/property with "goto" command, and
then attempts to tab-complete a setting/property pair, the original sett
and prop strings are overriden without freeing:
nmcli > goto 802-1x.pac-file
nmcli 802-1x.pac-file> set 802-1.lal<TAB>
Fixes: 79bc271685 ('cli: TAB-completion for enum-style property values (rh #1034126)')
(cherry picked from commit ca47fd882e)
Currently if the system hostname can't be determined, NetworkManager
only retries when something changes: a new address is added, the DHCP
lease changes, etc.
However, it might happen that the current failure in looking up the
hostname is caused by an external factor, like a temporary outage of
the DNS server.
Add a mechanism to retry the resolution with an increasing timeout.
https://issues.redhat.com/browse/RHEL-17972
(cherry picked from commit 04ad4c86d0)
If the socket's RX buffer is full it's probably because other
process is doing lot of changes very quickly, faster than we
can process them. Let's give the writer a small time to finish:
1. Avoid contending the kernel's RTNL lock, so we don't make
the whole situation even worse and it can finish earlier.
2. Avoid having to resync again and again due to trying to
resync while the writer is still doing quick changes, so
we are unable to catch up yet.
This won't help if this situation takes a long time or is
continuous, but that's unlikely to happen, and if it does,
it's the writer's fault for starving the whole system.
There is no need to progresively increase the backoff time
for the same reason: if this situation takes lot of time,
it's the writer's fault. It's neither a good idea because the whole NM
process will end being sleeping long times, not doing anything at all,
without being able to react when the Netlink messages burst stops.
(cherry picked from commit 830dd4ad9c)
Add a function to compare two arrays of NMPlatformBridgeVlan. It will
be used in the next commit to compare the VLANs from platform to the
ones we want to set.
To compare in a performant way, the vlans need to be normalized (no
duplicated VLANS, ranges into their minimal expression...). Add the
function nmp_utils_bridge_vlan_normalize.
Co-authored-by: Íñigo Huguet <ihuguet@redhat.com>
(cherry picked from commit 1c43fe5235)
For now, always reapply the VLANs unconditionally, even if they didn't
change in kernel.
To set again the VLANs on the port we need to clear all the existing
one before. However, this deletes also the VLAN for the default-pvid
on the bridge. Therefore, we need some additional logic to inject the
default-pvid in the list of VLANs.
Co-authored-by: Íñigo Huguet <ihuguet@redhat.com>
(cherry picked from commit c5d1e35f99)
Currently, nm_platform_link_set_bridge_vlans() accepts an array of
pointers to vlan objects; to avoid multiple allocations,
setting_vlans_to_platform() creates the array by piggybacking the
actual data after the pointers array.
In the next commits, the array will need to be manipulated and
extended, which is difficult with the current structure. Instead, pass
separately an array of objects and its size.
(cherry picked from commit e00c81b153)
During nm_lldp_neighbor_parse(), the NMLldpNeighbor is not yet added to
the NMLldpRX instance. Consequently, n->lldp_rx is NULL.
Note how we use lldp_x for logging, because we need it for the context
for which interface the logging statement is.
Thus, those debug logging statements will follow a NULL pointer and lead
to a crash.
Fixes: 630de288d2 ('lldp: add libnm-lldp as fork of systemd's sd_lldp_rx')
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/issues/1550
(cherry picked from commit c2cddd3241)
If we add multiple default routes with the same metric and different
preferences, kernel merges them into a single ECMP route, with overall
preference equal to the preference of the first route
added. Therefore, the preference of individual routes is not
respected.
To avoid that, add routes with different metrics if they have
different preferences, so that they are not merged together.
We could configure only the route(s) with highest preference ignoring
the others, and the effect would be the same. However, it is better to
add all routes so that users can easily see from "ip route" that there
are multiple routers available.
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/issues/1468https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1983
Fixes: 032b4e4371 ('core: use router preference for IPv6 routes')
(cherry picked from commit c437625a76)
As part of the conscious language effort we must provide an alternative
option to configure autoconnect-ports system-wide on NetworkManager
configuration file.
(cherry picked from commit ad68b28843)
As part of the conscious language efforts we are not writing offensive
terms into keyfiles anymore. This won't break users upgrading as we
still read such values if they are present into the keyfile.
For existing profiles, NetworkManager will remove the offensive terms
when editing the keyfile.
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/2009
(cherry picked from commit 9f6ecbae69)
It is possible that we learn the link is ready on stage3_ip_config
rather than in link_changed event due to a stage3_ip_config scheduled by
another component. In such cases, we proceed with IP configuration
without allocating the resources needed like initializing DHCP client.
In order to avoid that, if we learn during stage3_ip_config that the
link is now ready, we need to schedule another stage3_ip_config to
allocate the resources we might need.
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/2004
Fixes: 83bf7a8cdb ('ovs: wait for the link to be ready before activating')
(cherry picked from commit 40d51b9104)
We are currently asserting that the list of devices waiting for
auto-activation in NMPolicy is not empty. This condition is always
false because:
- NMDevice holds a reference to NMManager
- NMManager holds a reference to NMPolicy
- on dispose, NMDevice asserts that it's not in NMPolicy's
auto-activate list
Therefore if there is any NMDevice alive, NMPolicy must be alive as
well. Instead, if there is no NMDevice alive the list must be empty.
The assertion could fail only when the NMPolicy instance gets
disposed, which usually doesn't happen because it's still referenced
at shutdown.
Fixes: aede228974 ('core: assert that devices are not registered when disposing NMPolicy')
(cherry picked from commit 27b646cfa1)
When activating an ovs-interface we already wait for the cloned MAC
address to be set, ifindex is present and platform link also present but
in some cases this is not enough.
If an udev rule is in place it might modify the interface when it is in
a later stage of the activation causing some race conditions or
problems. In order to solve that, we must wait until the link is fully
initialized.
(cherry picked from commit 83bf7a8cdb)
When activating a port with its controller deactivating by new
activation, NM will register `state-change` signal waiting controller to
have new active connections. Once controller got new active connection,
the port will invoke `nm_active_connection_set_controller()` which lead
to assert error on
g_return_if_fail(!nm_dbus_object_is_exported(NM_DBUS_OBJECT(self)))
because this active connection is already exposed as DBUS object.
To fix the problem, we remove the restriction on controller been
write-only and notify DBUS object changes for controller property.
Signed-off-by: Gris Ge <fge@redhat.com>
(cherry picked from commit 83a2595970)
Currently, when the agent manager is sent a registration request
containing UTF-8 characters, it will form an invalid error message
using only one of the bytes from the UTF-8 sequence, which causes
an assertion in glib to fail, which replaces the returned error message
with "[Invalid UTF-8]". It will also print an assertion failure to the
console, or crash NetworkManager on non-release builds.
This commit makes it so that it instead prints out the character in
hexadecimal form if it isn't normally printable, so that it is once
again a valid UTF-8 string.
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1965
Fixes: a30cf19858 ('agent: add agent manager and minimal agent class')
(cherry picked from commit c9327b2e8b)
It might happen that write() returns -1, but the errno is not EINTR.
In that case, the length would be incremented by 1, and the data pointer
to the data being written would be moved back by 1 byte on every error.
Make it so that the function exits with an error if it indicates an error.
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1971
Fixes: 3bda3fb60c ('nmtui: initial import of nmtui')
(cherry picked from commit 13317bd536)
Before introducing the hostname lookup via nm-daemon-helper and
systemd-resolved, we used GLib's GResolver which internally relies on
the libc resolver and generally also returns results from /etc/hosts.
With the new mechanism we only ask to systemd-resolved (with
NO_SYNTHESIZE) or perform the lookup via the "dns" NSS module. In both
ways, /etc/hosts is not evaluated.
Since users relied on having the hostname resolved via /etc/hosts,
restore that behavior. Now, after trying the resolution via
systemd-resolved and the "dns" NSS module, we also try via the "files"
NSS module which reads /etc/hosts.
Fixes: 27eae4043b ('device: add a nm_device_resolve_address()')
(cherry picked from commit 410afccb32)
Introduce a new argument to specify a comma-separated list of NSS
services to use for the "resolve-address" command. For now only accept
"dns" and "files"; the latter can be used to do a lookup into
/etc/hosts.
Note that previously the command failed in presence of extra
arguments. Therefore, when downgrading NetworkManager without
restarting the service, the previously-installed version of the daemon
(newer) would spawn the helper with the extra argument, and the
newly-installed version of the helper (older) would fail. This issue
only impacts hostname resolution and can be fixed by just restarting
the daemon.
In the upgrade path everything works as before, with the only
difference that the helper will use by default both "dns" and "files"
services.
Don't strictly check for the absence of extra arguments, so that in
the future we can introduce more arguments without necessarily break
the downgrade path.
(cherry picked from commit 229bebfae9)
The OVS interface can be matched via MAC address; in that case, the
"connection.interface-name" property of the connection is empty.
When populating the ovsdb, we need to pass the actual interface name
from the device, not the one from the connection.
Fixes: 830a5a14cb ('device: add support for OpenVSwitch devices')
https://issues.redhat.com/browse/RHEL-34617
(cherry picked from commit be28a11735)