The DNS name can now also contain the DoT server name. It's not longer a
binary IP address only.
Extend NML3ConfigData to account for that. To track the additional
data, use the string representation. The alternative to have a separate
type that contains the parsed information would be cumbersome too.
It is almost always wrong, to split IPv4 and IPv6 behaviors at a high level.
Most of the code does something very similar. Combine the two functions.
and let them handle the difference closer to where it is.
Instead of assuming any address that disappeared was because of a DAD
failure, check explicitly that either:
- the address is still present with DADFAILED flag (in case it was a
permanent address), or
- the address was removed and platform recorded that it had the
DADFAILED flag.
A DAD failure is in most cases a symptom of a network
misconfiguration; as such it must be logged in the default
configuration (info level).
While at it, fix other log messages.
Since we evaluate platform changes in a idle handler, there can be
multiple DAD failure at the same time that must generate a single
ndisc.configuration-change signal.
The function is unused at the moment.
All the callers pass either AF_INET or AF_INET6, drop support for
AF_UNSPEC; this simplifies the function for the next commit that adds
a @conflicts argument.
This is useful for manual testing ("manual", in the sense that you can
write a script that tests the behavior of the platform cache, without
humanly reading the logfile).
Usage:
To write the content of the platform cache once:
./src/core/platform/tests/monitor -P -S './statefile'
To keep monitor running, and update the state file:
./src/core/platform/tests/monitor -S './statefile'
The variable is passed to nmtstp_run_command_check_external(), which accepts
-1 to mean choose randomly. Change the function signature to reflect that.
Handle IP Configuration requests from IWD so that, when IWD's main.conf
setting [General].NetworkConfigurationEnabled is true, we don't try to
run DHCP or static addressing in parallel with IWD's internal DHCP or
static addressing.
Since part of the IWD secret agent and the new NetConfig agent
registration code is common, the agent object's path is changed.
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1337
On the surface, writing a file seams simple enough. But there are many
pitfalls:
- we should retry on EINTR.
- we should check for incomplete writes and loop.
- we possibly should check errors from close.
- we possibly should write to a temporary file and do atomic rename.
Use nm_utils_file_set_contents() to get this right.
Cleanup the handling of close().
First of all, closing an invalid (non-negative) file descriptor (EBADF) is
always a serious bug. We want to catch that. Hence, we should use nm_close()
(or nm_close_with_error()) which asserts against such bugs. Don't ever use
close() directly, to get that additional assertion.
Also, our nm_close() handles EINTR internally and correctly. Recent
POSIX defines that on EINTR the close should be retried. On Linux,
that is never correct. After close() returns, the file descriptor is
always closed (or invalid). nm_close() gets this right, and pretends
that EINTR is a success (without retrying).
The majority of our file descriptors are sockets, etc. That means,
often an error from close isn't something that we want to handle. Adjust
nm_close() to return no error and preserve the caller's errno. That is
the appropriate reaction to error (ignoring it) in most of our cases.
And error from close may mean that there was an IO error (except EINTR
and EBADF). In a few cases, we may want to handle that. For those
cases we have nm_close_with_error().
TL;DR: use almost always nm_close(). Unless you want to handle the error
code, then use nm_close_with_error(). Never use close() directly.
There is much reading on the internet about handling errors of close and
in particular EINTR. See the following links:
https://lwn.net/Articles/576478/https://askcodes.net/coding/what-to-do-if-a-posix-close-call-fails-https://www.austingroupbugs.net/view.php?id=529https://sourceware.org/bugzilla/show_bug.cgi?id=14627https://news.ycombinator.com/item?id=3363819https://peps.python.org/pep-0475/
instead of always re-requesting secrets on authentication failure ask NMSetting
if this is really needed. Currently only for the case "802.1x with TLS" this
behaves differently, i.e. no re-request.
Currently, when performing DNS resolution with systemd-resolved,
NetworkManager tells systemd-resolved to consider only DNS configuration
for the network interface that the connectivity check request will be
routed through. But this is not correct because DNS and routing are
configured entirely separately. For example, say we have a VPN that
receives all DNS but only a subset of routing. NetworkManager will
configure systemd-resolved with no DNS servers on any interface except
for the VPN interface, but will still route traffic through other
interfaces. This is entirely legitimate and works fine in practice,
except for the connectivity check.
To fix this, we just drop the restriction and allow systemd-resolved to
consider its full configuration, which is what gets used normally
anyway. This allows our connectivity check to match the real
configuration instead of failing spuriously.
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/issues/1107https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1415
The "device ... not available because device is strictly unmanaged" is
almost certainly the least interesting of the reasons why connection
can't be activated on a device.
Invent a new error level for it and demote it.
Before:
Error: Connection activation failed: No suitable device found
for this connection (device lo not available because
device is strictly unmanaged).
After
Error: Connection activation failed: No suitable device found
for this connection (device eth0 not available because
profile is not compatible with device (...)).
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1433
It is not possible to configure a VLAN interface on unmanaged NIC.
This forces users who only want to create a VLAN interface to take
ownership over possibly shared underlying NIC.
In OpenShift, the SR-IOV operator is currently not using
NetworkManager to configure VFs. When it starts working with a NIC,
it explicitly makes it unmanaged. Then, users cannot create a VLAN
interface on PFs managed by the operator.
This commit eliminates this issue by allowing configuring VLAN on
an interface without requesting it to be managed by NetworkManager.
This commit is part of a broader change that eliminates inheriting
the unmanaged condition from the parent of a device, for all device
types:
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1418https://bugzilla.redhat.com/show_bug.cgi?id=2110307
ensure_teamd_connection() is called from multiple spots. Sometimes
we call opportunistically without having started teamd (e.g. when on
update_connection() when generating a connection for teaming device that
was created) and handle the failure to connect gracefully.
Let's not pollute the logs with things on ERROR level that are not
actually serious. Replace the logging statements with DEBUG or WARN
depending on whether we expect ensure_teamd_connection() to actually
succeed.
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1422
Call teamdctl_port_config_update_raw() when we're attaching a port even
if all of team-slave setting properties are default.
This is done to ensure teamd "knows" about the port (that is,
"teamdctl ... port present" returns success) when we're done activating
the slave connection. It will pick it up anyway from netlink, but that
can happen after the activation is done, resulting in a possible race.
Fixes-test: @remove_active_team_profile
https://bugzilla.redhat.com/show_bug.cgi?id=2102375https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1421
In nm_dns_manager_set_ip_config() we try to avoid calling update_dns()
unless something changes, because updating DNS is expensive and can
trigger other actions such as a new hostname resolution.
When we add a new ip_data, even if the new element is equivalent to
the old one that was removed, we need to sort the list again.
Fixes: ce0a36d20f ('dns: better track l3cd changes')
https://bugzilla.redhat.com/show_bug.cgi?id=2098574
Clang 15 ([1], [2]) added
Added the -Wunreachable-code-generic-assoc diagnostic flag (grouped
under the -Wunreachable-code flag) which is enabled by default and warns
the user about _Generic selection associations which are unreachable
because the type specified is an array type or a qualified type.
This causes compiler warnings with various uses of _Generic():
../src/libnm-glib-aux/nm-shared-utils.h:2489:12: error: due to lvalue conversion of the controlling expression, association of type 'const char *const *const' will never be selected becaus
e it is qualified [-Werror,-Wunreachable-code-generic-assoc]
return nm_strv_find_first((const char *const *) strv->pdata, strv->len, str);
^
../src/libnm-glib-aux/nm-shared-utils.h:475:25: note: expanded from macro 'nm_strv_find_first'
_nm_strv_find_first(NM_CAST_STRV_CC(list), (len), (needle))
^
../src/libnm-glib-aux/nm-macros-internal.h:397:22: note: expanded from macro 'NM_CAST_STRV_CC'
const char *const*const: (const char *const*) (value), \
^
Clang is correct.
[1] https://releases.llvm.org/15.0.0/tools/clang/docs/ReleaseNotes.html#improvements-to-clang-s-diagnostics
[2] https://reviews.llvm.org/D125259
Clang 15 now (correctly) warns about this:
../src/libnm-core-impl/nm-vpn-plugin-info.c:201:40: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes]
_nm_vpn_plugin_info_get_default_dir_etc()
^
void
../src/libnm-core-impl/nm-vpn-plugin-info.c:213:40: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes]
_nm_vpn_plugin_info_get_default_dir_lib()
^
void
../src/libnm-core-impl/nm-vpn-plugin-info.c:226:41: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes]
_nm_vpn_plugin_info_get_default_dir_user()
^
void
../src/libnm-core-impl/nm-vpn-plugin-info.c:315:29: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes]
nm_vpn_plugin_info_list_load()
^
void
For a long time we have a function like nm_uuid_generate_from_strings().
This was recently reworked and renamed, but it preserved behavior. Preserving
behavior is important for this function, because for the existing users,
we need to keep generating the same UUIDs.
Originally, this function was a variadic function with NULL sentinel.
That means, when you write
nm_uuid_generate_from_strings(uuid_type, type_arg, v1, v2, v3, NULL);
and v2 happens to be NULL, then v3 is ignored. That is most likely not
what the user intended. Maybe they had a bug and v2 should not be NULL.
But nm_uuid_generate_from_strings() should not require that all
arguments are non-NULL and it should not ignore arguments after the
first NULL.
For example, one user works around this via
uuid = nm_uuid_generate_from_strings_old("ibft",
s_hwaddr,
s_vlanid ? "V" : "v",
s_vlanid ? s_vlanid : "",
s_ipaddr ? "A" : "DHCP",
s_ipaddr ? s_ipaddr : "");
which is cumbersome and ugly.
That will be fixed next, by adding a function that doesn't suffer
from this problem. But "this problem" is part of the API of the
function, we cannot just change it. Instead, rename it and all
users, so they can keep doing the same.
New users of course should no longer use the "old" function.
At startup, we remove from ovsdb any existing interface created by NM
and later an interface with the same name might be readded. This can
cause race conditions. Consider this series of events:
1. at startup NM removes the entry from ovsdb;
2. ovsdb reports success;
3. NM inserts an interface with the same name again;
4. ovs-vswitch monitors ovsdb changes, and gets events for removal and
insertion. Depending on how those events are split in different
batches, it might decide:
4a. to delete the link and add it back, or
4b. to keep the existing link because the delete and insertion
cancel out each other.
When NM sees the link staying in platform, it doesn't know if it's
because of 4b or because 4a will happen eventually.
To avoid this ambiguity, after ovsdb reports the successful deletion
NM should also wait that the link disappears from platform.
Unfortunately, this means that ovsdb gets a dependency to the platform
code.
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1386
For connections with multi-connect property set to "multiple", the
autoconnect-retries should be tracked per device and not per connection.
That means, if autoconnect-retries is set to 2, each device using that
connection should retry to autoconnect 2 times.
The device autoconnect retries is -2 by default. This is a special
value, in NMPolicy context, if the connection used is multi-connect the
device value will be set to match the connection retries. Each time the
device picks a different connection, it will reset the device
autoconnect retries to -2 and if needed, sync. with the connection
retries.
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1387https://bugzilla.redhat.com/show_bug.cgi?id=2039734
This partly reverts 1fe8166fc9 ('device: only deactivate when the master
we've enslaved to goes away').
If the controller fails while the port is not yet fully attached,
before this patch the following happened:
<info> [1664299566.1065] device (bond0): state change: ip-config -> failed (reason 'config-failed', sys-iface-state: 'managed')
...
<warn> [1664299566.1073] device (bond0): Activation: failed for connection 'bond0'
<trace> [1664299566.1073] device[6b76ac7314eb0b53] (bond0): master: release one slave a9f10ea824bb1725/eth1 (not enslaved) (configure)
<debug> [1664299566.1073] device[a9f10ea824bb1725] (eth1): unmanaged: flags set to [!sleeping,!by-type,!platform-init,!user-explicit,!user-settings,!user-conf=0x0/0x179/managed], forget [is-slave=0x800], reason removed)
...
<info> [1664299566.1080] device (eth1): state change: config -> ip-config (reason 'none', sys-iface-state: 'managed')
Note that now eth1 has no controller, but it lingers in "ip-config" state indefinitely.
If we look at a case where the port is already attached we see:
<info> [1664299540.9661] device (bond0): state change: secondaries -> failed (reason 'config-failed', sys-iface-state: 'managed')
...
<warn> [1664299540.9667] device (bond0): Activation: failed for connection 'bond0'
<trace> [1664299540.9667] device[6b76ac7314eb0b53] (bond0): master: release one slave a9f10ea824bb1725/eth1 (enslaved) (configure)
<debug> [1664299540.9667] platform: (eth1) link: releasing 10 from master 'bond0' (80)
...
<info> [1664299540.9740] device (bond0): detached bond port eth1
...
<debug> [1664299540.9749] device[a9f10ea824bb1725] (eth1): Activation: connection 'eth1' master failed
...
<warn> [1664299540.9749] device (eth1): queue-state[secondaries, reason:none, id:520]: replace previously queued state change
...
<debug> [1664299540.9750] device[a9f10ea824bb1725] (eth1): queue-state[deactivating, reason:dependency-failed, id:533]: queue state change
<debug> [1664299540.9751] device[a9f10ea824bb1725] (eth1): unmanaged: flags set to [!sleeping,!by-type,!platform-init,!user-explicit,!user-settings,!user-conf=0x0/0x179/managed], forget [is-slave=0x800], reason removed)
...
<debug> [1664299541.0201] device[a9f10ea824bb1725] (eth1): enslaved to unknown device 0 (??)
...
<debug> [1664299541.0227] device[a9f10ea824bb1725] (eth1): queue-state[deactivating, reason:dependency-failed, id:533]: change state
<info> [1664299541.0228] device (eth1): state change: ip-check -> deactivating (reason 'dependency-failed', sys-iface-state: 'managed')
Fix that by not ignoring the nm_device_slave_notify_release() call. Now we get:
<info> [1664391684.9757] device (bond0): state change: ip-config -> failed (reason 'config-failed', sys-iface-state: 'managed')
...
<debug> [1664391684.9759] active-connection[69c2b12d61f5b171]: set state deactivated (was activating)
<debug> [1664391684.9760] active-connection[142bb8240f6a696d]: check-master-ready: already signalled (state activating, master 0x56116f1480a0 is in state deactivated)
...
<debug> [1664391684.9762] manager: ActivatingConnection now (none)
...
<warn> [1664391684.9763] device (bond0): Activation: failed for connection 'bond0'
<trace> [1664391684.9763] device[142828814dec6e26] (bond0): master: release one slave 720791275fe8a68c/eth1 (not enslaved) (configure)
<debug> [1664391684.9763] device[720791275fe8a68c] (eth1): Activation: connection 'eth1' master failed
...
<debug> [1664391684.9764] device[720791275fe8a68c] (eth1): queue-state[deactivating, reason:dependency-failed, id:3047]: queue state change
<debug> [1664391684.9765] device[720791275fe8a68c] (eth1): unmanaged: flags set to [!sleeping,!by-type,!platform-init,!user-explicit,!user-settings,!user-conf=0x0/0x179/managed], forget [is-slave=0x800], reason removed)
...
<debug> [1664391684.9797] device[720791275fe8a68c] (eth1): queue-state[deactivating, reason:dependency-failed, id:3047]: change state
<info> [1664391684.9797] device (eth1): state change: config -> deactivating (reason 'dependency-failed', sys-iface-state: 'managed')
Commit 1fe8166fc9 ('device: only deactivate when the master we've
enslaved to goes away') added the "return", but it seems to also add it
in cases where we need to handle this. Restrict the return to cases if
we do "no-config".
https://bugzilla.redhat.com/show_bug.cgi?id=2130287
Fixes: 1fe8166fc9 ('device: only deactivate when the master we've enslaved to goes away')
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1406