We don't cache certain routes, for example based on the protocol. This is
a performance optimization to ignore routes that we usually don't care
about.
Still, if the user does `ip route replace` with such a route, then we
need to pass it to nmp_cache_update_netlink_route(), so that we can
properly remove the replaced route.
Knowing which route was replaces might be impossible, as our cache does
not contain all routes. Likely all that nmp_cache_update_netlink_route()
can to is to set "resync_required" for NLM_F_REPLACE. But for that it
should see the object first.
This also means, if we ever write a BPF filter to filter out messages
that contain NLM_F_REPLACE, because that would lead to cache inconsistencies.
In kernel, the valid range for the weight is 1-256 (on netlink this is
expressed as u8 in rtnh_hops, ranging 0-255).
We need an additional value, to represent
- unset weight, for non-ECMP routes in kernel.
- in libnm API, to express routes that should not be merged as ECMP
routes (the default).
Extend the type in NMPlatformIP4Route.weight to u16, and fix the code
for the special handling of the numeric range.
Also the libnm API needs to change. Modify the type of the attribute on
D-Bus from "b" to "u", to use a 32 bit integer. We use 32 bit, because
we already have common code to handle 32 bit unsigned integers, despite
only requiring 257 values. It seems better to stick to a few data types
(u32) instead of introducing more, only because the range is limited.
Co-Authored-By: Fernando Fernandez Mancera <ffmancera@riseup.net>
Fixes: 1bbdecf5e1 ('platform: manage ECMP routes')
Sometimes the buffer space of the netlink socket runs out and we lose
the response to our link change:
<info> [1670321010.2952] platform-linux: netlink[rtnl]: read: too many netlink events. Need to resynchronize platform cache
<warn> [1670321010.3467] platform-linux: do-change-link[2]: failure changing link: internal failure 3
With 3 above being WAIT_FOR_NL_RESPONSE_RESULT_FAILED_RESYNC.
Let's try harder.
https://bugzilla.redhat.com/show_bug.cgi?id=2154350
This is not nice:
<warn> [1670321010.3467] platform-linux: do-change-link[2]: failure changing link: internal failure 3
Let's explain what "internal failure 3" is.
The warning "-Wcast-align=strict" seems useful and will be enabled
next. Fix places that currently cause the warning by using the
new macro NM_CAST_ALIGN(). This macro also nm_assert()s that the alignment
is correct.
This is the version shipped in Fedora 37. As Fedora 37 is now out, the
core developers switch to it. Our gitlab-ci will also use that as base
image for the check-{patch.tree} tests and to generate the pages. There
is a need that everybody agrees on which clang-format version to use,
and that version should be the one of the currently used Fedora release.
Also update the used Fedora image in "contrib/scripts/nm-code-format-container.sh"
script.
The gitlab-ci still needs update in the following commit. The change
in isolation will break the "check-tree" test.
When adding a new route we need to consider it contains extra nexthops
i.e it is a ECMP route. As we cannot modify the NMPObject once created,
we need to pass the extra nexthops as an argument.
We cannot use the original NMPObject because normalization is happening
during when adding the route.
When reading from netlink an ECMP IPv4 route, we need to parse the
multiple nexthops. In order to do that, we are introducing
NMPlatformIP4RtNextHop struct.
The first nexthop information will be kept at the original
NMPlatformIP4Route and the new property n_nexthops will indicate how
many nexthops we need to consider.
If an address is removed during pruning and it had the TENTATIVE flag
before, the most likely cause of the removal is that it failed DAD. It
could also be that the user removed it at the same time we needed to
resync the platform cache, but that seems more unlikely.
Cleanup the handling of close().
First of all, closing an invalid (non-negative) file descriptor (EBADF) is
always a serious bug. We want to catch that. Hence, we should use nm_close()
(or nm_close_with_error()) which asserts against such bugs. Don't ever use
close() directly, to get that additional assertion.
Also, our nm_close() handles EINTR internally and correctly. Recent
POSIX defines that on EINTR the close should be retried. On Linux,
that is never correct. After close() returns, the file descriptor is
always closed (or invalid). nm_close() gets this right, and pretends
that EINTR is a success (without retrying).
The majority of our file descriptors are sockets, etc. That means,
often an error from close isn't something that we want to handle. Adjust
nm_close() to return no error and preserve the caller's errno. That is
the appropriate reaction to error (ignoring it) in most of our cases.
And error from close may mean that there was an IO error (except EINTR
and EBADF). In a few cases, we may want to handle that. For those
cases we have nm_close_with_error().
TL;DR: use almost always nm_close(). Unless you want to handle the error
code, then use nm_close_with_error(). Never use close() directly.
There is much reading on the internet about handling errors of close and
in particular EINTR. See the following links:
https://lwn.net/Articles/576478/https://askcodes.net/coding/what-to-do-if-a-posix-close-call-fails-https://www.austingroupbugs.net/view.php?id=529https://sourceware.org/bugzilla/show_bug.cgi?id=14627https://news.ycombinator.com/item?id=3363819https://peps.python.org/pep-0475/
When there are many VFs the default buffer size of 1 memory page is
not enough. Each VF can take up to ~120 bytes and so when the page
size is 4KiB at most ~34 VFs can be added.
Specify the buffer size when allocating the message.
Add a len argument to nlmsg_alloc() and nlmsg_alloc_simple(). After
that, nlmsg_alloc_size() can be dropped. Also, rename
nlmsg_alloc_simple() to nlmsg_alloc_new().
Prefer it over strncmp(), because it seems easier to understand (to me).
Prefer it over g_str_has_prefix(), because it can directly expand
to a plain strncmp() -- instead of first humping to glib, then calling
strlen() before calling strncmp().
Turns out, modern rmnet_* devices doesn't use ARPHRD_ETHER arptype, but
ARPHRD_RAWIP. Also complicating the fact is that ARPHRD_RAWIP is
actually added in v4.14, but devices using kernel before that version
define this value as "530" in an out-of-tree patch [1].
Recognize this case and check explicitly for 3 values of arptype.
[1] 54948008c2https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1392
These variants provide additional nm_assert() checks, and are thus
preferable.
Note that we cannot just blindly replace &g_array_index() with
&nm_g_array_index(), because the latter would not allow getting a
pointer at index [arr->len]. That might be a valid (though uncommon)
usecase. The correct replacement of &g_array_index() is thus
nm_g_array_index_p().
I checked the code manually and replaced uses of nm_g_array_index_p()
with &nm_g_array_index(), if that was a safe thing to do. The latter
seems preferable, because it is familar to &g_array_index().
On netlink API, the attribute is indeed u32. However, this is an ifindex
which in most other kernel APIs and in NetworkManager code is a signed
integer. Note that of course kernel would only ever assign numbers that
are valid ifindexes, thus in the suitable range.
The IFLA_BOND_ACTIVE_SLAVE and IFLA_BOND_PRIMARY are not the same.
If the primary is not set, then that's it. Don't fallback.
Only NetworkManager API deprecated "active-slave" and uses it as
alias for "primary". That does not mean, kernel/netlink does that.
- name things related to `in_addr_t`, `struct in6_addr`, `NMIPAddr` as
`nm_ip4_addr_*()`, `nm_ip6_addr_*()`, `nm_ip_addr_*()`, respectively.
- we have a wrapper `nm_inet_ntop()` for `inet_ntop()`. This name
of our wrapper is chosen to be familiar with the libc underlying
function. With this, also name functions that are about string
representations of addresses `nm_inet_*()`, `nm_inet4_*()`,
`nm_inet6_*()`. For example, `nm_inet_parse_str()`,
`nm_inet_is_normalized()`.
<<<<
R() {
git grep -l "$1" | xargs sed -i "s/\<$1\>/$2/g"
}
R NM_CMP_DIRECT_IN4ADDR_SAME_PREFIX NM_CMP_DIRECT_IP4_ADDR_SAME_PREFIX
R NM_CMP_DIRECT_IN6ADDR_SAME_PREFIX NM_CMP_DIRECT_IP6_ADDR_SAME_PREFIX
R NM_UTILS_INET_ADDRSTRLEN NM_INET_ADDRSTRLEN
R _nm_utils_inet4_ntop nm_inet4_ntop
R _nm_utils_inet6_ntop nm_inet6_ntop
R _nm_utils_ip4_get_default_prefix nm_ip4_addr_get_default_prefix
R _nm_utils_ip4_get_default_prefix0 nm_ip4_addr_get_default_prefix0
R _nm_utils_ip4_netmask_to_prefix nm_ip4_addr_netmask_to_prefix
R _nm_utils_ip4_prefix_to_netmask nm_ip4_addr_netmask_from_prefix
R nm_utils_inet4_ntop_dup nm_inet4_ntop_dup
R nm_utils_inet6_ntop_dup nm_inet6_ntop_dup
R nm_utils_inet_ntop nm_inet_ntop
R nm_utils_inet_ntop_dup nm_inet_ntop_dup
R nm_utils_ip4_address_clear_host_address nm_ip4_addr_clear_host_address
R nm_utils_ip4_address_is_link_local nm_ip4_addr_is_link_local
R nm_utils_ip4_address_is_loopback nm_ip4_addr_is_loopback
R nm_utils_ip4_address_is_zeronet nm_ip4_addr_is_zeronet
R nm_utils_ip4_address_same_prefix nm_ip4_addr_same_prefix
R nm_utils_ip4_address_same_prefix_cmp nm_ip4_addr_same_prefix_cmp
R nm_utils_ip6_address_clear_host_address nm_ip6_addr_clear_host_address
R nm_utils_ip6_address_same_prefix nm_ip6_addr_same_prefix
R nm_utils_ip6_address_same_prefix_cmp nm_ip6_addr_same_prefix_cmp
R nm_utils_ip6_is_ula nm_ip6_addr_is_ula
R nm_utils_ip_address_same_prefix nm_ip_addr_same_prefix
R nm_utils_ip_address_same_prefix_cmp nm_ip_addr_same_prefix_cmp
R nm_utils_ip_is_site_local nm_ip_addr_is_site_local
R nm_utils_ipaddr_is_normalized nm_inet_is_normalized
R nm_utils_ipaddr_is_valid nm_inet_is_valid
R nm_utils_ipx_address_clear_host_address nm_ip_addr_clear_host_address
R nm_utils_parse_inaddr nm_inet_parse_str
R nm_utils_parse_inaddr_bin nm_inet_parse_bin
R nm_utils_parse_inaddr_bin_full nm_inet_parse_bin_full
R nm_utils_parse_inaddr_prefix nm_inet_parse_with_prefix_str
R nm_utils_parse_inaddr_prefix_bin nm_inet_parse_with_prefix_bin
R test_nm_utils_ip6_address_same_prefix test_nm_ip_addr_same_prefix
./contrib/scripts/nm-code-format.sh -F
Since the generic netlink API does (currently) not support notifications
about changes of the MPTCP addresses, we won't get notifications when
they change, and it seems wrong to put such things in the NMPlatform
cache.
We can just get the list of endpoints by polling, so add a function
nm_platform_mptcp_addrs_dump() for that.
Also, add nm_platform_mptcp_addr_update() which can add/remove/update
MPTCP addresses.
CentOS 7's headers don't yet contains IFLA_BOND_PEER_NOTIF_DELAY.
Define it ourselves.
Fixes: f900f7bc2c ('platform: add netlink support for bond link')
sysfs is deprecated and kernel people will not add new bond options to
sysfs. Netlink is a stable API and therefore is the right method to
communicate with kernel in order to set the link options.
Don't mix <net/ethernet.h> and <linux/if_ether.h>.
Fixes the following build error with musl libc:
In file included from /usr/include/net/ethernet.h:10,
from ../src/libnm-platform/nm-linux-platform.c:17:
/usr/include/netinet/if_ether.h:115:8: error: redefinition of 'struct ethhdr'
115 | struct ethhdr {
| ^~~~~~
In file included from ../src/linux-headers/ethtool.h:19,
from ../src/libnm-std-aux/nm-linux-compat.h:22,
from ../src/libnm-platform/nm-linux-platform.c:10:
/usr/include/linux/if_ether.h:169:8: note: originally defined here
169 | struct ethhdr {
| ^~~~~~
Fixes: dc98ab807c ('platform: include "linux-headers" via "libnm-std-aux/nm-linux-compat.h"')
For IPv6, kernel does not accept the ifa_scope parameter and always
determines the scope based on the address itself.
For IPv4, it honors whatever scope the user sets via netlink.
NetworkManager does not allow to directly configure the address
scope, but autodetects it.
Use nm_platform_ip4_address_get_scope() for detecting the scopt.
This also fixes the issue that to detect loopback addresses 127.0.0.0/8
and use scope "host".
Try:
$ nmcli device modify "$IFACE" +ipv4.addresses 127.0.0.5/8
We have our own copy of linux kernel headers, and we must never
directly include the corresponding versions from the system.
Avoid that, by only including the clones via "libnm-std-aux/nm-linux-compat.h"
and by including the compat wrapper header before other system headers.
Some compiler versions don't like this. Workaround.
src/libnm-platform/nm-linux-platform.c: In function event_seq_check:
src/libnm-platform/nm-linux-platform.c:7254:1: error: label at end of compound statement
out:
^~~
Fixes: 3d4906a3da ('platform: add genl socket support for events and genl family')
That seems common. It's also done by genl-ctrl-list and
iproute2's genl tool.
Also, use avoid the leading zeros (0x1c instead of 0x001c).
iproute2's genl tool does the former, libnl3's genl-ctrl-list
does the latter.
We now cache the family ID for generic netlink protocols. However,
when we for example create a wireguard interface, the kernel module
might just get autoloaded. At this point, we didn't know the family ID
yet.
We already made an effort, that if the family ID is unknown during
nm_platform_genl_get_family_id(), we would try to poll the genl socket
in the hope there is a relevant event there. However, polling the socket
also means to potentially emit all signals for any change that happen.
We don't want that, if we currently are already polling the socket.
Instead, fallback to synchronously get the family ID.
$ sudo rmmod wireguard \
./tools/run-nm-test.sh -m src/core/platform/tests/test-link-linux -p /link/software/detect/wireguard/1/external
Fixes: 3d4906a3da ('platform: add genl socket support for events and genl family')
An NMPObject is hashable, can be compared and printed. That is useful.
Make an NMPObject for MPTCP addresses. It will hold the content of
MPTCP_PM_ATTR_ADDR netlink attribute. But like other NMPObject types it
will also be used to represent the data as NetworkManager tracks it.
- use proper integer types. A netlink message cannot be as large as
size_t, because the length is tracked in an uint32_t. Use the
right types.
- fields like "nlmsg_type" or "nlmsg_flags" are uint16_t. Use the
right types.
- note that nlmsg_size() still returns and accepts "int". Maybe
the should be adjusted too, but we use macros from kernel headers,
which also use int. Even if that is not the type of the length on
the binary protocol. So some of these functions still use int, to
be closer and compatible with <linux/netlink.h>.
For generic netlink, the family-id is important. It changes when
loading/unloading a module, so we should not cache it indefinitely.
To get this right, takes some effort. For "nl80211", "nl802154"
and "wireguard", we only cache the family ID in relation to an
interface. If the module gets unloaded, the family ID also becomes
irrelevant and we need to re-fetch it the next time.
For generic families like "mptcp_pm" or "ethtool", they are commonly not
kernel modules and cannot be unloaded. So caching them would be
(probably) fine.
Still. Some generic netlink families emit notifications, and it will
be interesting to be able to handle them. Since that will be useful later,
start by doing something simple: let the generic netlink family also be
cached this way. Generic netlink will send notifications when a family gets
added/deleted, and we can use that to reliably cache the family ID.
We only care about a well-known set of generic families. Unlike libnl
(which has "struct genl_family" object to handle any family), we can hard
code the few we care about (NMPGenlFamilyType).
This adds the necessary infrastructure of NMLinuxPlatform to listen to
events on the generic netlink socket.