NetworkManager/src/libnm-platform
Íñigo Huguet 4d426f581d platform: avoid routes resync for routes that we don't track
When we receive a Netlink message with a "route change" event, normally
we just ignore it if it's a route that we don't track (i.e. because of
the route protocol).

However, it's not that easy if it has the NLM_F_REPLACE flag because
that means that it might be replacing another route. If the kernel has
similar routes which are candidates for the replacement, it's hard for
NM to guess which one of those is being replaced (as the kernel doesn't
have a "route ID" or similar field to indicate it). Moreover, the kernel
might choose to replace a route that we don't have in the cache, so we
know nothing about it.

It is important to note that we cannot just discard Netlink messages of
routes that we don't track if they have the NLM_F_REPLACE flag. For example,
if we are tracking a route with proto=static, we might receive a replace
message, changing that route to proto=other_proto_that_we_dont_track. We
need to process that message and remove the route from our cache.

As NM doesn't know what route is being replaced, trying to guess will
lead to errors that will leave the cache in an inconsistent state.
Because of that, it just does a cache resync for the routes.

For IPv4 there was already an optimization for this: if the cache holds
no candidate route for the replacement, there are only 2 possible
options: either add the new route to the cache or discard it if we are
not interested in it. We don't need a resync for that.

This commit extends that optimization to IPv6 routes. There is no
reason why it shouldn't work in the same way as with IPv4. This
optimization will only work well as long as we find potential candidate
routes in the same way as the kernel (comparing the same fields). NM
calls this "comparing by WEAK_ID". The same caveat already applies to
IPv4 routes.
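
The decision described above can be sketched in a few lines of C. This is
only an illustration, not NM's actual code: the `Route` struct, the fields
chosen for `weak_id_equal()`, the `is_tracked()` threshold and the
`on_replace()` function are all hypothetical stand-ins for what the real
cache does with NMPObject instances.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified route record (not NM's NMPObject). */
typedef struct {
    uint32_t dest;     /* destination network */
    uint8_t  plen;     /* prefix length */
    uint32_t metric;
    uint32_t table;
    uint8_t  protocol; /* route protocol, e.g. 4 for static */
} Route;

typedef enum {
    ACTION_IGNORE, /* untracked route, nothing to replace: drop it */
    ACTION_ADD,    /* tracked route, nothing to replace: add to cache */
    ACTION_RESYNC, /* a cached route might be the one replaced */
} ReplaceAction;

#define PROTO_STATIC 4 /* same value as RTPROT_STATIC */

/* Assumption for illustration: routes up to "static" are tracked. */
static bool is_tracked(const Route *r)
{
    return r->protocol <= PROTO_STATIC;
}

/* Assumption: the weak ID is modelled as (table, dest, plen, metric).
 * The fields the kernel really compares may differ. */
static bool weak_id_equal(const Route *a, const Route *b)
{
    return a->table == b->table && a->dest == b->dest
           && a->plen == b->plen && a->metric == b->metric;
}

/* Decide what to do with a route message carrying NLM_F_REPLACE. */
static ReplaceAction on_replace(const Route *cache, size_t n, const Route *msg)
{
    assert(msg != NULL);
    assert(cache != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        /* Any candidate makes the outcome ambiguous: resync. */
        if (weak_id_equal(&cache[i], msg))
            return ACTION_RESYNC;
    }
    /* No candidate: the message can only add a route or be ignored. */
    return is_tracked(msg) ? ACTION_ADD : ACTION_IGNORE;
}
```

Note that the protocol is deliberately not part of the comparison: a
replace message may change a tracked route into an untracked one, and
that case must still end in a resync.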

It is worth enabling this optimization because there are routing
daemons using custom routing protocols that make tens or hundreds of
updates per second. If they use NLM_F_REPLACE, this caused NM to do a
resync hundreds of times per second, leading to 100% CPU usage:
https://issues.redhat.com/browse/RHEL-26195

An additional but smaller optimization is done in this commit: if we
receive a route message for a route that we don't track AND that doesn't
have the NLM_F_REPLACE flag, we can ignore the entire message, thus
avoiding the memory allocation of the nmp_object. That nmp_object was
going to be ignored later anyway, so it is better to avoid these
allocations that, with the routing daemon from the example above, can
happen hundreds of times per second.
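
A minimal sketch of this early exit, under stated assumptions: `RouteMsg`,
`ParsedRoute`, `proto_is_tracked()` and `parse_route_msg()` are
hypothetical names for illustration, and the counter exists only to make
the saved allocation visible. The flag value matches NLM_F_REPLACE from
`<linux/netlink.h>`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Same value as NLM_F_REPLACE in <linux/netlink.h>. */
#define SKETCH_NLM_F_REPLACE 0x100u
/* Same value as RTPROT_STATIC. */
#define SKETCH_PROTO_STATIC 4

/* Hypothetical view of the few message fields needed for the check. */
typedef struct {
    uint16_t nlmsg_flags;
    uint8_t  protocol;
} RouteMsg;

/* Hypothetical stand-in for the heap-allocated nmp_object. */
typedef struct {
    uint8_t protocol;
} ParsedRoute;

/* Counts allocations so that the saving can be observed. */
static unsigned n_allocs;

/* Assumption for illustration: routes up to "static" are tracked. */
static bool proto_is_tracked(uint8_t protocol)
{
    return protocol <= SKETCH_PROTO_STATIC;
}

/* Returns NULL without allocating when the message can be dropped
 * (or when malloc fails); the caller frees any returned object. */
static ParsedRoute *parse_route_msg(const RouteMsg *msg)
{
    ParsedRoute *obj;

    assert(msg != NULL);

    /* Untracked and not a replace: nothing in the cache can change. */
    if (!proto_is_tracked(msg->protocol)
        && !(msg->nlmsg_flags & SKETCH_NLM_F_REPLACE))
        return NULL;

    obj = malloc(sizeof(*obj));
    if (!obj)
        return NULL;
    n_allocs++;
    obj->protocol = msg->protocol;
    return obj;
}
```

An untracked message that does carry the replace flag is still parsed,
because it may be replacing a route that is in the cache.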

With these changes, the CPU usage doing `ip route replace` 300 times/s
drops from 100% to 1%. Doing `ip route replace` as fast as possible,
without any rate limiting, still keeps NM at 3% CPU usage on the
system that I have used to test.
2024-04-30 13:13:46 +02:00
devlink sriov: allow reading empty eswitch paramaters via Devlink 2024-02-21 11:27:36 +01:00
tests platform: drop unused nl_socket_set_nonblocking() function 2023-04-04 08:43:20 +02:00
wifi wifi: fix enumeration of 6 GHz channels from wiphy 2024-04-02 16:12:57 +02:00
wpan all: use _NM_G_TYPE_CHECK_INSTANCE_CAST() for internal uses 2022-12-16 10:55:03 +01:00
meson.build platform: netlink: add devlink support 2024-02-21 11:27:28 +01:00
nm-linux-platform.c platform: avoid routes resync for routes that we don't track 2024-04-30 13:13:46 +02:00
nm-linux-platform.h platform: allow setting multi_idx instance for NMPlatform 2023-01-19 08:56:21 +01:00
nm-netlink.c platform/netlink: use nm_random_get_bytes() for initial seq value 2024-04-17 08:30:46 +00:00
nm-netlink.h clang-format: reformat code with clang-format 16.0.2-1.fc38 2023-05-19 10:53:13 +02:00
nm-platform-private.h format: reformat source tree with clang-format 13.0 2021-11-29 09:31:09 +00:00
nm-platform-utils.c all: use NM_MIN() instead of MIN() 2023-11-15 09:32:20 +01:00
nm-platform-utils.h ethtool: introduce EEE support 2023-11-03 15:41:21 +00:00
nm-platform.c Updated code format 2024-04-08 06:35:20 +00:00
nm-platform.h sriov: set the devlink's eswitch inline-mode and encap-mode 2024-02-21 11:27:32 +01:00
nmp-base.c platform: move NMPlatformIP[46]Address to "nmp-plobj.c" 2022-09-23 11:43:36 +02:00
nmp-base.h HSR: add support to HSR/PRP interface 2023-12-05 08:05:56 +01:00
nmp-global-tracker.c all: use c_list_is_empty_or_single() where appropriate 2023-03-08 15:34:47 +01:00
nmp-global-tracker.h platform: introduce function to globally track local route rule 2023-02-21 15:36:38 +01:00
nmp-netns.c platform: avoid printing raw pointer values in log 2023-01-19 08:56:21 +01:00
nmp-netns.h all: use _NM_G_TYPE_CHECK_INSTANCE_CAST() for internal uses 2022-12-16 10:55:03 +01:00
nmp-object.c platform: avoid routes resync for routes that we don't track 2024-04-30 13:13:46 +02:00
nmp-object.h HSR: add support to HSR/PRP interface 2023-12-05 08:05:56 +01:00
nmp-plobj.c all: use NM_MIN() instead of MIN() 2023-11-15 09:32:20 +01:00
nmp-plobj.h all: use NM_MIN_CONST()/NM_MAX_CONST() instead of MIN()/MAX() 2023-11-15 09:32:19 +01:00
README.md platform: support IPv6 mulitpath routes and fix cache inconsistency 2022-02-16 09:59:49 +01:00

libnm-platform

A static helper library that provides NMPlatform and other utils. This is NetworkManager's internal netlink library, but also contains helpers for sysfs, ethtool and other kernel APIs.

NMPlatform is also a cache of objects of the netlink API: NMPCache and NMPObject. These objects are also used throughout NetworkManager for generally tracking information about these types. For example, NMPlatformIP4Address (the public part of a certain type of NMPObject) is used not only to track platform addresses from netlink in the cache, but also to track information about IPv4 addresses in general.

This depends on the following helper libraries