Currently if the system hostname can't be determined, NetworkManager
only retries when something changes: a new address is added, the DHCP
lease changes, etc.
However, it might happen that the current failure in looking up the
hostname is caused by an external factor, like a temporary outage of
the DNS server.
Add a mechanism to retry the resolution with an increasing timeout.
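
A minimal sketch of what "an increasing timeout" could look like, assuming a delay that doubles on every failed attempt and is capped. The function name and both constants are hypothetical; the commit message does not state the actual values or growth rule.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bounds, chosen only for illustration. */
#define RETRY_DELAY_MIN_SEC 30u
#define RETRY_DELAY_MAX_SEC 3600u

/* Delay to wait before retry number `attempt` (0-based).
 * The delay doubles with each attempt and is capped at the maximum. */
static uint32_t
hostname_retry_delay_sec(unsigned attempt)
{
    uint32_t delay = RETRY_DELAY_MIN_SEC;

    assert(RETRY_DELAY_MIN_SEC <= RETRY_DELAY_MAX_SEC);

    /* Stop doubling once the cap is reached, so the value cannot overflow. */
    while (attempt-- > 0 && delay < RETRY_DELAY_MAX_SEC)
        delay *= 2;

    return delay > RETRY_DELAY_MAX_SEC ? RETRY_DELAY_MAX_SEC : delay;
}
```

With these assumed constants, the first retries would be scheduled after 30, 60 and 120 seconds, and later ones once per hour.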
https://issues.redhat.com/browse/RHEL-17972
(cherry picked from commit 04ad4c86d0)
If the socket's RX buffer is full, it's probably because another
process is making a lot of changes very quickly, faster than we
can process them. Let's give the writer a short time to finish:
1. Avoid contending the kernel's RTNL lock, so we don't make
the whole situation even worse and it can finish earlier.
2. Avoid having to resync again and again due to trying to
resync while the writer is still doing quick changes, so
we are unable to catch up yet.
This won't help if this situation takes a long time or is
continuous, but that's unlikely to happen, and if it does,
it's the writer's fault for starving the whole system.
There is no need to progressively increase the backoff time
for the same reason: if this situation takes a lot of time,
it's the writer's fault. It's not a good idea either, because the whole NM
process would end up sleeping for long periods, doing nothing at all,
unable to react when the burst of Netlink messages stops.
(cherry picked from commit 830dd4ad9c)
Add a function to compare two arrays of NMPlatformBridgeVlan. It will
be used in the next commit to compare the VLANs from platform to the
ones we want to set.
To compare in a performant way, the VLANs need to be normalized (no
duplicated VLANs, ranges reduced to their minimal expression...). Add the
function nmp_utils_bridge_vlan_normalize.
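
The idea of normalizing before comparing can be sketched as follows. This is not the code of nmp_utils_bridge_vlan_normalize(): the `Vlan` struct is a simplified, hypothetical stand-in for NMPlatformBridgeVlan, and the sketch assumes that ranges with different flags never overlap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for NMPlatformBridgeVlan. */
typedef struct {
    uint16_t vid_start;
    uint16_t vid_end;
    bool     untagged;
    bool     pvid;
} Vlan;

static int
vlan_cmp(const void *a, const void *b)
{
    const Vlan *x = a, *y = b;

    if (x->vid_start != y->vid_start)
        return x->vid_start < y->vid_start ? -1 : 1;
    if (x->vid_end != y->vid_end)
        return x->vid_end < y->vid_end ? -1 : 1;
    return 0;
}

/* Sort by VID, then merge ranges that overlap or touch and carry the
 * same flags. Works in place and returns the new number of entries. */
static size_t
vlans_normalize(Vlan *v, size_t n)
{
    size_t out = 0;

    assert(v || n == 0);
    if (n == 0)
        return 0;

    qsort(v, n, sizeof(*v), vlan_cmp);
    for (size_t i = 1; i < n; i++) {
        Vlan *last = &v[out];

        if (v[i].untagged == last->untagged && v[i].pvid == last->pvid
            && v[i].vid_start <= (uint32_t) last->vid_end + 1) {
            if (v[i].vid_end > last->vid_end)
                last->vid_end = v[i].vid_end;
        } else {
            v[++out] = v[i];
        }
    }
    return out + 1;
}

/* Once both arrays are normalized, equality is an element-wise check. */
static bool
vlans_equal(const Vlan *a, size_t n_a, const Vlan *b, size_t n_b)
{
    if (n_a != n_b)
        return false;
    for (size_t i = 0; i < n_a; i++) {
        if (a[i].vid_start != b[i].vid_start || a[i].vid_end != b[i].vid_end
            || a[i].untagged != b[i].untagged || a[i].pvid != b[i].pvid)
            return false;
    }
    return true;
}
```

Normalizing first means two lists that describe the same VLANs in different ways (for example 10-20 plus 21-30 versus 10-30) compare as equal with a single linear pass.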
Co-authored-by: Íñigo Huguet <ihuguet@redhat.com>
(cherry picked from commit 1c43fe5235)
For now, always reapply the VLANs unconditionally, even if they didn't
change in kernel.
To set the VLANs on the port again, we first need to clear all the
existing ones. However, this also deletes the VLAN for the default-pvid
on the bridge. Therefore, we need some additional logic to inject the
default-pvid in the list of VLANs.
Co-authored-by: Íñigo Huguet <ihuguet@redhat.com>
(cherry picked from commit c5d1e35f99)
Currently, nm_platform_link_set_bridge_vlans() accepts an array of
pointers to vlan objects; to avoid multiple allocations,
setting_vlans_to_platform() creates the array by piggybacking the
actual data after the pointers array.
In the next commits, the array will need to be manipulated and
extended, which is difficult with the current structure. Instead, pass
separately an array of objects and its size.
(cherry picked from commit e00c81b153)
During nm_lldp_neighbor_parse(), the NMLldpNeighbor is not yet added to
the NMLldpRX instance. Consequently, n->lldp_rx is NULL.
Note how we use lldp_rx for logging, because it provides the context
of which interface the logging statement refers to.
Thus, those debug logging statements follow a NULL pointer and lead
to a crash.
Fixes: 630de288d2 ('lldp: add libnm-lldp as fork of systemd's sd_lldp_rx')
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/issues/1550
(cherry picked from commit c2cddd3241)
If we add multiple default routes with the same metric and different
preferences, kernel merges them into a single ECMP route, with overall
preference equal to the preference of the first route
added. Therefore, the preference of individual routes is not
respected.
To avoid that, add routes with different metrics if they have
different preferences, so that they are not merged together.
We could configure only the route(s) with highest preference ignoring
the others, and the effect would be the same. However, it is better to
add all routes so that users can easily see from "ip route" that there
are multiple routers available.
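
One way to keep routes with different preferences from sharing a metric is sketched below. The enum, the function name and the "+1 per preference level" scheme are assumptions made for illustration; the commit message only says that different preferences get different metrics.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical representation of the router preference. */
typedef enum { PREF_LOW, PREF_MEDIUM, PREF_HIGH } RouterPref;

/* Derive the metric of a default route from the configured base metric.
 * The highest preference keeps the base metric; each lower level gets a
 * larger (worse) metric, so the kernel cannot merge the routes into one
 * ECMP route. The result saturates at UINT32_MAX instead of wrapping. */
static uint32_t
metric_for_preference(uint32_t base_metric, RouterPref pref)
{
    uint32_t penalty;

    assert(pref <= PREF_HIGH);
    penalty = (uint32_t) (PREF_HIGH - pref); /* high: 0, medium: 1, low: 2 */

    return base_metric > UINT32_MAX - penalty ? UINT32_MAX : base_metric + penalty;
}
```

With a base metric of 100, routers with high, medium and low preference would get metrics 100, 101 and 102, and all three would still be visible in "ip route".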
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/issues/1468
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1983
Fixes: 032b4e4371 ('core: use router preference for IPv6 routes')
(cherry picked from commit c437625a76)
As part of the conscious language effort, we must provide an alternative
option to configure autoconnect-ports system-wide in the NetworkManager
configuration file.
(cherry picked from commit ad68b28843)
As part of the conscious language effort, we no longer write offensive
terms into keyfiles. This won't break users upgrading, as we
still read such values if they are present in the keyfile.
For existing profiles, NetworkManager will remove the offensive terms
when editing the keyfile.
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/2009
(cherry picked from commit 9f6ecbae69)
It is possible that we learn the link is ready on stage3_ip_config
rather than in link_changed event due to a stage3_ip_config scheduled by
another component. In such cases, we proceed with IP configuration
without allocating the needed resources, such as initializing the DHCP client.
In order to avoid that, if we learn during stage3_ip_config that the
link is now ready, we need to schedule another stage3_ip_config to
allocate the resources we might need.
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/2004
Fixes: 83bf7a8cdb ('ovs: wait for the link to be ready before activating')
(cherry picked from commit 40d51b9104)
We are currently asserting that the list of devices waiting for
auto-activation in NMPolicy is not empty. This condition is always
false because:
- NMDevice holds a reference to NMManager
- NMManager holds a reference to NMPolicy
- on dispose, NMDevice asserts that it's not in NMPolicy's
auto-activate list
Therefore if there is any NMDevice alive, NMPolicy must be alive as
well. Instead, if there is no NMDevice alive the list must be empty.
The assertion could fail only when the NMPolicy instance gets
disposed, which usually doesn't happen because it's still referenced
at shutdown.
Fixes: aede228974 ('core: assert that devices are not registered when disposing NMPolicy')
(cherry picked from commit 27b646cfa1)
When activating an ovs-interface we already wait for the cloned MAC
address to be set and for the ifindex and the platform link to be present,
but in some cases this is not enough.
If a udev rule is in place, it might modify the interface at a later
stage of the activation, causing race conditions or other
problems. To solve that, we must wait until the link is fully
initialized.
(cherry picked from commit 83bf7a8cdb)
When activating a port while its controller is being deactivated by a new
activation, NM registers for the `state-change` signal and waits for the
controller to have a new active connection. Once the controller gets its new
active connection, the port invokes `nm_active_connection_set_controller()`,
which leads to an assertion failure on
g_return_if_fail(!nm_dbus_object_is_exported(NM_DBUS_OBJECT(self)))
because this active connection is already exported as a D-Bus object.
To fix the problem, remove the restriction of the controller being
write-only and notify D-Bus object changes for the controller property.
Signed-off-by: Gris Ge <fge@redhat.com>
(cherry picked from commit 83a2595970)
Currently, when the agent manager is sent a registration request
containing UTF-8 characters, it will form an invalid error message
using only one of the bytes from the UTF-8 sequence, which causes
an assertion in glib to fail, which replaces the returned error message
with "[Invalid UTF-8]". It will also print an assertion failure to the
console, or crash NetworkManager on non-release builds.
This commit makes it so that it instead prints out the character in
hexadecimal form if it isn't normally printable, so that it is once
again a valid UTF-8 string.
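
The approach can be illustrated with a small helper that renders a single byte for an error message. The helper name and the `\xNN` output format are hypothetical; the commit message only says that a non-printable character is printed in hexadecimal form.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Render one byte for inclusion in an error message.
 * Printable ASCII is copied as is; anything else (including each byte of
 * a multi-byte UTF-8 sequence) is written as "\xNN", so the resulting
 * string is always valid UTF-8. `buf` needs room for at least 5 bytes. */
static const char *
describe_byte(unsigned char c, char *buf, size_t len)
{
    assert(buf && len >= 5);

    if (c >= 0x20 && c <= 0x7e)
        snprintf(buf, len, "%c", c);
    else
        snprintf(buf, len, "\\x%02x", (unsigned) c);
    return buf;
}
```

The explicit range check is used instead of isprint() so that the result does not depend on the current locale.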
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1965
Fixes: a30cf19858 ('agent: add agent manager and minimal agent class')
(cherry picked from commit c9327b2e8b)
It might happen that write() returns -1, but the errno is not EINTR.
In that case, the length would be incremented by 1, and the data pointer
to the data being written would be moved back by 1 byte on every error.
Make the function exit with an error if write() reports one.
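
For reference, a write loop that handles this case correctly could look like the sketch below. It is not the nmtui code; the function name `write_all` is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Write exactly `len` bytes to `fd`.
 * Returns 0 on success and -1 on error (errno is left as set by write()). */
static int
write_all(int fd, const char *data, size_t len)
{
    assert(data || len == 0);

    while (len > 0) {
        ssize_t n = write(fd, data, len);

        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted before writing anything: retry */
            return -1;      /* real error: do not touch data or len */
        }
        /* Only advance by what was actually written. */
        data += n;
        len -= (size_t) n;
    }
    return 0;
}
```

The key point is that the pointer and the remaining length are adjusted only when write() returns a non-negative count, so a return value of -1 can never move them.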
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1971
Fixes: 3bda3fb60c ('nmtui: initial import of nmtui')
(cherry picked from commit 13317bd536)
Before introducing the hostname lookup via nm-daemon-helper and
systemd-resolved, we used GLib's GResolver which internally relies on
the libc resolver and generally also returns results from /etc/hosts.
With the new mechanism we only ask systemd-resolved (with
NO_SYNTHESIZE) or perform the lookup via the "dns" NSS module. In both
cases, /etc/hosts is not evaluated.
Since users relied on having the hostname resolved via /etc/hosts,
restore that behavior. Now, after trying the resolution via
systemd-resolved and the "dns" NSS module, we also try via the "files"
NSS module which reads /etc/hosts.
Fixes: 27eae4043b ('device: add a nm_device_resolve_address()')
(cherry picked from commit 410afccb32)
Introduce a new argument to specify a comma-separated list of NSS
services to use for the "resolve-address" command. For now only accept
"dns" and "files"; the latter can be used to do a lookup into
/etc/hosts.
Note that previously the command failed in the presence of extra
arguments. Therefore, when downgrading NetworkManager without
restarting the service, the previously-installed version of the daemon
(newer) would spawn the helper with the extra argument, and the
newly-installed version of the helper (older) would fail. This issue
only impacts hostname resolution and can be fixed by just restarting
the daemon.
In the upgrade path everything works as before, with the only
difference that the helper will use by default both "dns" and "files"
services.
Don't strictly check for the absence of extra arguments, so that in
the future we can introduce more arguments without necessarily breaking
the downgrade path.
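
Parsing such a comma-separated list could be sketched as follows. The `Services` struct and the function name are hypothetical and this is not the helper's actual code; only the accepted names "dns" and "files" come from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result type: which NSS services were requested. */
typedef struct {
    bool dns;
    bool files;
} Services;

/* Parse a list such as "dns,files".
 * Returns false if an unknown or empty service name is found. */
static bool
parse_services(const char *arg, Services *out)
{
    assert(arg && out);

    out->dns = out->files = false;
    while (*arg) {
        size_t n = strcspn(arg, ",");   /* length of the current item */

        if (n == 3 && strncmp(arg, "dns", 3) == 0)
            out->dns = true;
        else if (n == 5 && strncmp(arg, "files", 5) == 0)
            out->files = true;
        else
            return false;

        arg += n;
        if (*arg == ',')
            arg++;
    }
    return true;
}
```

A caller that wants to stay tolerant on downgrades would parse only the arguments it knows about and ignore any that follow, instead of failing on them.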
(cherry picked from commit 229bebfae9)
The OVS interface can be matched via MAC address; in that case, the
"connection.interface-name" property of the connection is empty.
When populating the ovsdb, we need to pass the actual interface name
from the device, not the one from the connection.
Fixes: 830a5a14cb ('device: add support for OpenVSwitch devices')
https://issues.redhat.com/browse/RHEL-34617
(cherry picked from commit be28a11735)
The daemon is now capable of understanding and removing these prefix
tags by itself. It is better that this is not a responsibility of the
secret agent because it requires changes in all secret agents to work
properly (see https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/issues/1536).
If the secret agent knows what these prefix tags are, it can remove them
only in the text that is displayed in the UI, while keeping the
original string as the secret name that is returned to the daemon.
Secret agents that don't know what these prefix tags are won't do
anything with them, and they will also return the same string as secret
name, as expected. The only drawback is that they might display the full
string to the user, which is not a nice UX but it will at least work.
Also, allow translating the secret name for the UI in libnmc.
(cherry picked from commit 18240bb72d)
Commit 345bd1b187 ('libnmc: fix secrets request on 2nd stage of 2FA
authentication') and commit 27c701ebfb ('libnmc: allow user input in
ECHO mode for 2FA challenges') introduced 2 new tags that hints for the
secret agents can have as prefix.
These tags were processed (and removed) in the secret agents, not in the
daemon. This is wrong because a system with an updated VPN plugin but a
not yet updated secret agent (like nm-plasma) will fail: it won't remove
the prefix and the daemon will save the secret with the prefix, i.e.
"x-dynamic-challenge:challenge-response" instead of just
"challenge-response". Then, VPN plugins don't recognize it, and the
profile's activation fails. This is, in fact, an API break.
Also, if the VPN connection already existed before updating NM and the
VPN plugin, the secret flags are not added to the profile (they are only
added when the profile is created or modified). This causes the user's
first response to be saved to the profile, so the activation fails the
second and subsequent times.
See:
- https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/issues/1536
- https://gitlab.gnome.org/GNOME/NetworkManager-openvpn/-/issues/142
Anyway, in a good design the daemon should contain almost all the logic
and the clients should be kept as simple as possible. Fix the above problems
by letting the daemon receive the secret names with the prefix
already included. The daemon will strip it and will know what it means.
Note that this is done only in the functions that save the secrets from
the data received via D-Bus. For example, nm_setting_vpn_add_secret
doesn't need to do it because this value shouldn't come from the VPN
plugin's hints.
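
Stripping a known prefix tag from a secret name can be sketched like this. The function name is hypothetical and the table contains only the "x-dynamic-challenge:" tag quoted above; the spelling of the second tag is not given here, so it is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Prefix tags the daemon understands. Only the tag quoted in the text
 * above is listed; a real table would contain both tags. */
static const char *const known_tags[] = {
    "x-dynamic-challenge:",
};

/* Return the secret name without a leading known tag.
 * The result points into `name`; nothing is allocated. */
static const char *
secret_name_strip_tag(const char *name)
{
    assert(name);

    for (size_t i = 0; i < sizeof(known_tags) / sizeof(known_tags[0]); i++) {
        size_t len = strlen(known_tags[i]);

        if (strncmp(name, known_tags[i], len) == 0)
            return name + len;
    }
    return name;
}
```

With this, "x-dynamic-challenge:challenge-response" is stored as "challenge-response", which is the name the VPN plugin expects.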
(cherry picked from commit 0583e1f843)
Connection timestamps are updated (saved to disk) on connection up and
down. This way, the last used connection will take precedence for
autoconnect if they have the same priority.
But as we don't actually do connection down when NM stops, the last
connection timestamp of all active connections is the timestamp of when
they were brought up. Then, the activation order might be wrong on next
start.
One case where timestamps are wrong (although it is not clear how
important it is because the connections are activated on different
interfaces):
1. Activate con1 <- timestamp updated
2. Activate con2 <- timestamp updated
3. Deactivate con2 <- timestamp updated
4. Stop NM <- timestamp of con2 is higher than con1, but con1 was still
active when con2 was brought down.
Another case, which is reproducible (from
https://issues.redhat.com/browse/RHEL-35539):
1. Activate con1
2. Activate con2 on same interface:
- As a consequence con1 is deactivated and its timestamp updated
- The timestamp of con2 is also updated
3. Stop NM <- timestamp of con1 and con2 is the same, next activation
order will be undefined.
Fix by saving the timestamps on NM shutdown.
Resolves: https://issues.redhat.com/browse/RHEL-35539
(cherry picked from commit 4bf11b7d66)
Problem:
Given an OVS port with `autoconnect-ports` set to default or false,
when a reactivation is required for a checkpoint rollback,
the previously activated OVS interface will be in the deactivated state
after the checkpoint rollback.
The root cause:
`activate_stage1_device_prepare()` marks the device as
failed when the controller is deactivating or deactivated.
In `activate_stage1_device_prepare()`, the controller device is
retrieved from the NMActiveConnection; it will be NULL when the
NMActiveConnection is in the deactivated state. This causes the device to
be set to `NM_DEVICE_STATE_REASON_DEPENDENCY_FAILED`, which prevents all
follow-up `autoconnect` actions.
Fix:
When noticing that the controller is deactivating or deactivated with reason
`NM_DEVICE_STATE_REASON_NEW_ACTIVATION`, use the new function
`nm_active_connection_set_controller_dev()` to wait for the controller
device state to be between NM_DEVICE_STATE_PREPARE and
NM_DEVICE_STATE_ACTIVATED. After that, use the existing
`nm_active_connection_set_controller()` to move on with the new
NMActiveConnection of the controller.
Resolves: https://issues.redhat.com/browse/RHEL-31972
Signed-off-by: Gris Ge <fge@redhat.com>
(cherry picked from commit a68d2fd780)
Usually, when the method is "auto" we want to avoid configuring routes
until the automatic method completes. To achieve that, we clear the
"allow_routes_without_address" flag of l3cds when the method is "auto".
For VPNs, IP configurations with only routes are perfectly valid,
therefore set the flag.
(cherry picked from commit d1ffdb28eb)
The name "dhcp_enabled" is misleading because the flag is set for
method=auto, which doesn't necessarily imply DHCP. Also, it doesn't
convey what the flag is used for. Rename it to
"allow_routes_without_address".
(cherry picked from commit b31febea22)
An IPv4-over-IPv6 (or vice-versa) IPsec VPN can return IP
configurations with routes and without addresses. For example, in this
scenario:
    +---------------+         +---------------+
    |  fd01::10/64  <-- VPN -->  fd02::20/64  |
    |     host1     |         |     host2     |
    +-------^-------+         +-------^-------+
            |                         |
    +-------v-------+         +-------v-------+
    |    subnet1    |         |    subnet2    |
    | 172.16.1.0/24 |         | 172.16.2.0/24 |
    +---------------+         +---------------+
host1 and host2 establish an IPv6 tunnel which encapsulates packets
between the two IPv4 subnets. Therefore, in routed mode, host1 will
need to configure a route like "172.16.2.0/24 via ipsec1" even if the
host doesn't have any IPv4 address on the VPN interface.
Accept IP configurations without address from the VPN; only check that
the address and prefix are sane if they are provided.
(cherry picked from commit 97f185e1f8)
Commit 797f3cafee ('device: fall back to saved use_tempaddr value
instead of rereading /proc') changed the behaviour of how to get the
last resort default value for ip6-privacy property.
Previously we read it from /proc/sys/net/ipv6/conf/default, but after
this commit we started to read /proc/sys/net/ipv6/conf/<iface> instead,
because the user might have set a different value specific for that device.
As NetworkManager changes that value on connection activation, we used
the value read at the time that NetworkManager was started.
Commit 6cb14ae6a6 ('device: introduce ipv6.temp-valid-lifetime and
ipv6.temp-preferred-lifetime properties') introduced 2 new IPv6 privacy
related properties relying on the same mechanism.
However, this new behaviour is problematic because it's neither
predictable nor reliable:
- NetworkManager is normally started at boot time. That means that, if a
user wants to set a new value to /proc/sys/net/ipv6/conf/<iface>,
NetworkManager is likely already running, so the change won't take
effect.
- If NetworkManager is restarted it will read the value again, but this
value can be the one set by NetworkManager itself in the last
activation. This means that different values can be used as default in
the same system boot depending on the restarts of NetworkManager.
Moreover, this weird situation might happen:
- Connection A with ip6-privacy=2 is activated
- NetworkManager is stopped. The value in
/proc/sys/net/ipv6/conf/<iface>/use_tempaddr remains as 2.
- NetworkManager starts. It reads from /proc/sys/... and saves the value
'2' as the default.
- Connection B with no ip6-privacy setting is activated. The '2' saved
as default value is used. The connection didn't specify any value for
it, and the value '2' was set by another connection for that specific
connection only, not manually by a user that wanted '2' to be the
default.
A user shouldn't have to think about when NetworkManager starts or restarts
to know in an easy and predictable way what the default value for a
certain property is. It's totally counterintuitive.
Revert back to the old behaviour of reading from
/proc/sys/net/ipv6/conf/default. Although this value is used by the
kernel only for newly created interfaces, and not for already existing
ones, it is reasonable to think of these settings as "systemwide
defaults" that the user has chosen.
Note that setting a different default in NetworkManager.conf still takes
precedence.
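
Reading such a last-resort default from a sysctl file can be sketched as below. The helper name is hypothetical and this is not NetworkManager's implementation; it only shows reading one integer from a file with a fallback.

```c
#include <assert.h>
#include <stdio.h>

/* Read a single integer from `path`.
 * Returns `fallback` if the file cannot be opened or parsed. */
static int
read_int_or_default(const char *path, int fallback)
{
    FILE *f;
    int   value;

    assert(path);

    f = fopen(path, "r");
    if (!f)
        return fallback;
    if (fscanf(f, "%d", &value) != 1)
        value = fallback;
    fclose(f);
    return value;
}
```

For the ip6-privacy case the call would be something like `read_int_or_default("/proc/sys/net/ipv6/conf/default/use_tempaddr", -1)`, which does not depend on when NetworkManager was started or on what a previous activation wrote for a specific interface.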
(cherry picked from commit 7ec363a79a)
If a connection is in-memory (i.e. has flag "unsaved"), after a
checkpoint and rollback it can be wrongly persisted to disk:
- if the connection was modified and written to disk after the
checkpoint, during the rollback we update it again with persist mode
"keep", which keeps it on disk;
- if the connection was deleted after the checkpoint, during the
rollback we add it again with persist mode "to-disk".
Instead, remember whether the connection had the "unsaved" flag set
and try to restore the previous state.
However, this is not straightforward as there are 4 different possible
states for the settings connection: persistent; in-memory only;
in-memory shadowing a persistent file; in-memory shadowing a detached
persistent file (i.e. the deletion of the connection doesn't delete
the persistent file). Handle all those cases.
Fixes: 3e09aed2a0 ('checkpoint: add create, rollback and destroy D-Bus API')
When we receive a Netlink message with a "route change" event, normally
we just ignore it if it's a route that we don't track (i.e. because of
the route protocol).
However, it's not that easy if it has the NLM_F_REPLACE flag because
that means that it might be replacing another route. If the kernel has
similar routes which are candidates for the replacement, it's hard for
NM to guess which one of those is being replaced (as the kernel doesn't
have a "route ID" or similar field to indicate it). Moreover, the kernel
might choose to replace a route that we don't have on cache, so we know
nothing about it.
It is important to note that we cannot just discard Netlink messages of
routes that we don't track if they have the NLM_F_REPLACE flag. For example,
if we are tracking a route with proto=static, we might receive a replace
message, changing that route to proto=other_proto_that_we_dont_track. We
need to process that message and remove the route from our cache.
As NM doesn't know what route is being replaced, trying to guess will
lead to errors that will leave the cache in an inconsistent state.
Because of that, it just does a cache resync for the routes.
For IPv4 there was an optimization to this: if we don't have in the
cache any route candidate for the replacement there are only 2 possible
options: either add the new route to the cache or discard it if we are
not interested in it. We don't need a resync for that.
This commit extends that optimization to IPv6 routes. There is no
reason why it shouldn't work in the same way as with IPv4. This
optimization will only work well as long as we find potential candidate
routes in the same way as the kernel (comparing the same fields). NM
calls this "comparing by WEAK_ID". But this caveat applies to IPv4
routes as well.
It is worth enabling this optimization because there are routing
daemons using custom routing protocols that make tens or hundreds of
updates per second. If they use NLM_F_REPLACE, this causes NM to do a
resync hundreds of times per second, leading to 100% CPU usage:
https://issues.redhat.com/browse/RHEL-26195
An additional but smaller optimization is done in this commit: if we
receive a route message for a route that we don't track and that doesn't
have the NLM_F_REPLACE flag, we can ignore the entire message, thus avoiding
the memory allocation of the nmp_object. That nmp_object was going to be
ignored later anyway, so it is better to avoid these allocations which, with
the routing daemon of the example above, can happen hundreds of times
per second.
With these changes, the CPU usage when doing `ip route replace` 300 times/s
drops from 100% to 1%. Doing `ip route replace` as fast as possible,
without any rate limiting, still keeps NM at 3% CPU usage on the
system that I have used to test.
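
The decision described above can be summarized in a small function. This is a sketch of the logic as stated in this message, not the platform code itself; the enum, the function name and the boolean parameters are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    ROUTE_MSG_IGNORE,   /* drop the message without allocating an object */
    ROUTE_MSG_PROCESS,  /* handle the route normally (add/update in cache) */
    ROUTE_MSG_RESYNC,   /* cannot tell what was replaced: resync routes */
} RouteMsgAction;

/* tracked:            the route is one NM keeps in its cache (e.g. by protocol)
 * has_replace_flag:   the message carries NLM_F_REPLACE
 * candidate_in_cache: the cache holds a route with the same WEAK_ID, i.e.
 *                     one that the kernel might have replaced */
static RouteMsgAction
route_msg_action(bool tracked, bool has_replace_flag, bool candidate_in_cache)
{
    if (!has_replace_flag)
        return tracked ? ROUTE_MSG_PROCESS : ROUTE_MSG_IGNORE;

    /* A replace that may have hit a cached route: guessing which one
     * could leave the cache inconsistent, so resync instead. */
    if (candidate_in_cache)
        return ROUTE_MSG_RESYNC;

    /* A replace with no candidate in the cache: either add the new route
     * or discard it, no resync needed. */
    return tracked ? ROUTE_MSG_PROCESS : ROUTE_MSG_IGNORE;
}
```

For a routing daemon that replaces untracked routes hundreds of times per second, every message falls into the last branch and is discarded without a resync.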
The D-Bus and C APIs allow setting the 802.1X certificates as blobs, as
the documentation of the properties explains. However, this is not
possible from nmcli, which only accepts paths to the certificate files.
This difference in nmcli was explained in the description message
shown in nmcli's editor, but this is documentation that most users
won't ever see, and the main documentation in nm-settings-nmcli is still
misleading.
Add nmcli-specific documentation for the relevant properties and
remove the nmcli editor descriptions, as they are no longer needed.
This allows SLAAC for IPv6 to be performed, even when no IPv6
address was passed by the bearer. The link-local address will be
assigned, because of do_auto = TRUE.
The commit also allows the DNS assignment to be made statically when
no IPv6 address has been statically assigned yet. This is to be able
to receive IPv6 DNS servers via signalling, where host SLAAC still
needs to be performed for some modems (e.g. some Huawei modems).
This also changes the logging so that SLAAC usage is logged
on a separate line.
In the gtkdoc comments, the text below tags like `Since: 1.2` is
discarded. In the property `autoconnect-slaves` a line indicating its
deprecation was below one of these tags. As a result, it was missing in
the man page. Fix it.
Fixes: 194455660d ('connection: deprecate NMSettingConnection autoconnect-slaves property')