For IPv6, kernel doesn't care. If the gateway is ::, you may or may
not set the onlink attribute. But for IPv4 routes, that gets rejected:
# ip route add 1.2.3.4/32 dev v onlink
Error: Invalid flags for nexthop - PERVASIVE and ONLINK can not be set.
Silently suppress setting the flag in that case and ignore the user
request. After all, the effect is probably the same (that is, the route
is onlink anyway).
(cherry picked from commit 8b14849877)
The major point of NMDedupMultiIndex is that it can de-duplicate
the objects. It thus makes sense the everybody is using the same
instance. Make the multi-idx instance of NMPlatform configurable.
This is not used outside of unit tests, because the daemon currently
always creates one platform instance and everybody then re-uses the
instance of the platform.
While this is (currently) only used by tests, and that the performance
optimization of de-duplicating is irrelevant for tests, this is still
useful. The test can then check whether two separate NMPlatform objects
shared the same instance and whether it was de-duplicated.
There really is no way around this. As we don't cache all the routes
(e.g. ignored based on rtm_protocol or rtm_type), we cannot know which
route was replaced, when we get a NLM_F_REPLACE message.
We need to request a new dump in that case, which can be expensive, if
there are a lot of routes or if replace happens frequently.
The only possible solutions would be:
1) NetworkManager caches all routes, but it also needs to make sure to
get *everything* right. In particular, to understand every relevant
route attribute (including those added in the future, which is
impossible).
2) kernel provides a reasonable API (rhbz#1337855, rhbz#1337860) that
allows to sufficiently understand what is going on based on the
netlink notifications.
When you issue
ip route replace broadcast 1.2.3.4/32 dev eth0
then this route may well replace a (unicast) route that we have in
the cache.
Previously, we would right away ignore such messages in
_new_from_nl_route(), which means we miss the fact that a route gets
replaced.
Instead, we need to parse the message at least so far, that we can
detect and handle the replace.
We don't cache certain routes, for example based on the protocol. This is
a performance optimization to ignore routes that we usually don't care
about.
Still, if the user does `ip route replace` with such a route, then we
need to pass it to nmp_cache_update_netlink_route(), so that we can
properly remove the replaced route.
Knowing which route was replaces might be impossible, as our cache does
not contain all routes. Likely all that nmp_cache_update_netlink_route()
can to is to set "resync_required" for NLM_F_REPLACE. But for that it
should see the object first.
This also means, if we ever write a BPF filter to filter out messages
that contain NLM_F_REPLACE, because that would lead to cache inconsistencies.
The route table is part of the weak-id. You can see that with:
ip route replace unicast 1.2.3.4/32 dev eth0 table 57
ip route replace unicast 1.2.3.4/32 dev eth0 table 58
afterwards, `ip route show table all` will list both routes. The replace
operation is only per-table. Note that NMP_CACHE_ID_TYPE_ROUTES_BY_WEAK_ID
already got this right.
Fixes: 10ac675299 ('platform: add support for routing tables to platform cache')
In kernel, the valid range for the weight is 1-256 (on netlink this is
expressed as u8 in rtnh_hops, ranging 0-255).
We need an additional value, to represent
- unset weight, for non-ECMP routes in kernel.
- in libnm API, to express routes that should not be merged as ECMP
routes (the default).
Extend the type in NMPlatformIP4Route.weight to u16, and fix the code
for the special handling of the numeric range.
Also the libnm API needs to change. Modify the type of the attribute on
D-Bus from "b" to "u", to use a 32 bit integer. We use 32 bit, because
we already have common code to handle 32 bit unsigned integers, despite
only requiring 257 values. It seems better to stick to a few data types
(u32) instead of introducing more, only because the range is limited.
Co-Authored-By: Fernando Fernandez Mancera <ffmancera@riseup.net>
Fixes: 1bbdecf5e1 ('platform: manage ECMP routes')
Sometimes the buffer space of the netlink socket runs out and we lose
the response to our link change:
<info> [1670321010.2952] platform-linux: netlink[rtnl]: read: too many netlink events. Need to resynchronize platform cache
<warn> [1670321010.3467] platform-linux: do-change-link[2]: failure changing link: internal failure 3
With 3 above being WAIT_FOR_NL_RESPONSE_RESULT_FAILED_RESYNC.
Let's try harder.
https://bugzilla.redhat.com/show_bug.cgi?id=2154350
This is not nice:
<warn> [1670321010.3467] platform-linux: do-change-link[2]: failure changing link: internal failure 3
Let's explain what "internal failure 3" is.
G_TYPE_CHECK_INSTANCE_CAST() can trigger a "-Wcast-align":
src/core/devices/nm-device-macvlan.c: In function 'parent_changed_notify':
/usr/include/glib-2.0/gobject/gtype.h:2421:42: error: cast increases required alignment of target type [-Werror=cast-align]
2421 | # define _G_TYPE_CIC(ip, gt, ct) ((ct*) ip)
| ^
/usr/include/glib-2.0/gobject/gtype.h:501:66: note: in expansion of macro '_G_TYPE_CIC'
501 | #define G_TYPE_CHECK_INSTANCE_CAST(instance, g_type, c_type) (_G_TYPE_CIC ((instance), (g_type), c_type))
| ^~~~~~~~~~~
src/core/devices/nm-device-macvlan.h:13:6: note: in expansion of macro 'G_TYPE_CHECK_INSTANCE_CAST'
13 | (G_TYPE_CHECK_INSTANCE_CAST((obj), NM_TYPE_DEVICE_MACVLAN, NMDeviceMacvlan))
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
Avoid that by using _NM_G_TYPE_CHECK_INSTANCE_CAST().
This can only be done for our internal usages. The public headers
of libnm are not changed.
For MACsec interfaces, kernel announces the parent ifindex in the
generic IFLA_LINK netlink attribute, which we save in
NMPlatformLink.parent. There is no need to have a dedicate member in
NMPlatformLnkMacsec.
The dedicate member was never set and during a restart of
NetworkManager the parent of the MACsec device could be unset leading
to a failed assertion:
act_stage2_config: assertion 'parent' failed
Fixes: 85103656e9 ('platform: add support for macsec links')
https://bugzilla.redhat.com/show_bug.cgi?id=2122564https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1481
With gcc-12.2.1-4.fc37 on i686 we get:
./src/libnm-platform/nmp-object.h: In function 'nmp_object_ref':
./src/libnm-platform/nmp-object.h:626:12: error: cast increases required alignment of target type [-Werror=cast-align]
626 | return (const NMPObject *) nm_dedup_multi_obj_ref((const NMDedupMultiObj *) obj);
| ^
cc1: all warnings being treated as errors
Work around that be increasing the alignment of NMDedupMultiObj.
It has no downsides, because we usually put a NMDedupMultiObj in heap
allocated memory, which is already suitably aligned. Or we put it on
the stack, where wasting a few bytes for the alignment doesn't matter.
We basically never embed NMDedupMultiObj in an array where the increase
of alignment would waste additional space.
The warning "-Wcast-align=strict" seems useful and will be enabled
next. Fix places that currently cause the warning by using the
new macro NM_CAST_ALIGN(). This macro also nm_assert()s that the alignment
is correct.
We put all these structs inside the tagged union NMPObject.
Also, in a sense NMPlatformObject is the base "type" of all
these structs, meaning, it should be able to up and downcast.
Ensure the alignment matches.
This helps to avoid "-Wcast-align" warnings when trying to cast
a (NMPlatformObject*) to another (NMPlatformXXX *) type. Something
we commonly do.
Support managing the loopback interface through NM as the users want to
set the proper mtu for loopback interface when forwarding the packets.
Additionally, the IP addresses, DNS, route and routing rules are also
allowed to configure for the loopback connection profiles.
https://bugzilla.redhat.com/show_bug.cgi?id=2060905
clang-3.4.2-9.el7 does not like nesting NM_MAX() macro inside nm_hash_update_vals() macro.
Workaround by using MAX() instead. NM_MAX() uses an expression statement and NM_UNIQ()
to evaluate the arguments only once. We don't need that here and glib's MAX() suffices.
CC src/libnm-platform/src_libnm_platform_libnm_platform_la-nm-platform.lo
../src/libnm-platform/nm-platform.c:8247:53: error: in-class initializer for static data member is not a constant expression
(guint8) NM_MAX(obj->weight, 1u));
^
../src/libnm-std-aux/nm-std-aux.h:399:40: note: expanded from macro 'NM_MAX'
#define NM_MAX(a, b) __NM_MAX(NM_UNIQ, a, NM_UNIQ, b)
^
../src/libnm-std-aux/nm-std-aux.h:402:39: note: expanded from macro '__NM_MAX'
typeof(a) NM_UNIQ_T(A, aq) = (a); \
^
../src/libnm-glib-aux/nm-hash-utils.h:124:36: note: expanded from macro 'nm_hash_update_vals'
NM_HASH_COMBINE_VALS(_val, __VA_ARGS__); \
^
Fixes: 8cc41d41fe ('platform: add NM_PLATFORM_IP_ROUTE_CMP_TYPE_ECMP_ID for comparing ECMP base route')
This is the version shipped in Fedora 37. As Fedora 37 is now out, the
core developers switch to it. Our gitlab-ci will also use that as base
image for the check-{patch.tree} tests and to generate the pages. There
is a need that everybody agrees on which clang-format version to use,
and that version should be the one of the currently used Fedora release.
Also update the used Fedora image in "contrib/scripts/nm-code-format-container.sh"
script.
The gitlab-ci still needs update in the following commit. The change
in isolation will break the "check-tree" test.
We sometimes have functions foo() and foo_full(), in which case
foo() has fewer arguments and just calls foo_full(). The "full"
function here is the more powerful one, and foo() is implemented
in terms of the former.
nm_platform_ip4_route_cmp_full() and m_platform_ip4_route_cmp() inverted
that pattern. The "_full" there stands for the full comparison, to not
allowing to select the comparison type.
That inconsistency is ugly. Also, these wrappers were used at only few
places. Let's drop them.
While at it, also drop nm_platform_qdisc_cmp() and rename
nm_platform_qdisc_cmp_full(). Here cmp()/cmp_full() followed the common
pattern foo()/foo_full(), but it's still hardly used and unnecessary.
When adding a new route we need to consider it contains extra nexthops
i.e it is a ECMP route. As we cannot modify the NMPObject once created,
we need to pass the extra nexthops as an argument.
We cannot use the original NMPObject because normalization is happening
during when adding the route.
When reading from netlink an ECMP IPv4 route, we need to parse the
multiple nexthops. In order to do that, we are introducing
NMPlatformIP4RtNextHop struct.
The first nexthop information will be kept at the original
NMPlatformIP4Route and the new property n_nexthops will indicate how
many nexthops we need to consider.
The NMPObject is a tagged union. There is no need to initialize anything
after the size of the actually used union field. Change this, so maybe we
get a valgrind warning about uninitialized memory if we wrongly try to
access it.
On the other hand, the object really is supposed to be a full
NMPObject. Previously, we would get a valgrind warning, if we tried to
pass fewer data there.
It really doesn't matter much, but all other functions don't assume
that there is any important data after the size indicated by the class.
We will extend IPv4 routes with the list of next hops. This field will
be heap allocated and be part of the NMPObjectIP4Route object, while
also being part of the identity. To support the ID operator that checks
fields of the NMPObject, add a "for_id" argument to the hash/cmp hooks.
Also, a function that sets cmd_obj_{hash_update,cmp}() MUST not set
cmd_plobj_id_{hashupdate,cmp}(), as it would have overlapping
functionality. Therefore, the objects that define
cmd_obj_{hash_update,cmp}() need to fully implement the ID comparison.
A NMPClass that has data outside the plobj part, needs
to implement the cmd_obj_*() hooks, instead of cmd_plobj_*().
For those objects, reasoning only about the plobj part is not
sufficient. Implementing both hooks is also unnecessary and
confusing.
Ensure that if we have cmd_obj_*() hooks set, that the corresponding
cmd_plobj_*() hooks are unset.
The if-else-if was wrong. It meant that if an object did not implement
cmd_plobj_id_copy(), nothign was copied (for id-only).
I think this code path was not actually hit, because we never clone
an object only by ID.
Fixes: c91a4617a1 ('nmp-object: allow missing implementations for certain virtual functions')
If an address is removed during pruning and it had the TENTATIVE flag
before, the most likely cause of the removal is that it failed DAD. It
could also be that the user removed it at the same time we needed to
resync the platform cache, but that seems more unlikely.
Cleanup the handling of close().
First of all, closing an invalid (non-negative) file descriptor (EBADF) is
always a serious bug. We want to catch that. Hence, we should use nm_close()
(or nm_close_with_error()) which asserts against such bugs. Don't ever use
close() directly, to get that additional assertion.
Also, our nm_close() handles EINTR internally and correctly. Recent
POSIX defines that on EINTR the close should be retried. On Linux,
that is never correct. After close() returns, the file descriptor is
always closed (or invalid). nm_close() gets this right, and pretends
that EINTR is a success (without retrying).
The majority of our file descriptors are sockets, etc. That means,
often an error from close isn't something that we want to handle. Adjust
nm_close() to return no error and preserve the caller's errno. That is
the appropriate reaction to error (ignoring it) in most of our cases.
And error from close may mean that there was an IO error (except EINTR
and EBADF). In a few cases, we may want to handle that. For those
cases we have nm_close_with_error().
TL;DR: use almost always nm_close(). Unless you want to handle the error
code, then use nm_close_with_error(). Never use close() directly.
There is much reading on the internet about handling errors of close and
in particular EINTR. See the following links:
https://lwn.net/Articles/576478/https://askcodes.net/coding/what-to-do-if-a-posix-close-call-fails-https://www.austingroupbugs.net/view.php?id=529https://sourceware.org/bugzilla/show_bug.cgi?id=14627https://news.ycombinator.com/item?id=3363819https://peps.python.org/pep-0475/
When there are many VFs the default buffer size of 1 memory page is
not enough. Each VF can take up to ~120 bytes and so when the page
size is 4KiB at most ~34 VFs can be added.
Specify the buffer size when allocating the message.
Add a len argument to nlmsg_alloc() and nlmsg_alloc_simple(). After
that, nlmsg_alloc_size() can be dropped. Also, rename
nlmsg_alloc_simple() to nlmsg_alloc_new().