- use proper integer types. A netlink message cannot be as large as
size_t, because the length is tracked in an uint32_t. Use the
right types.
- fields like "nlmsg_type" or "nlmsg_flags" are uint16_t. Use the
right types.
- note that nlmsg_size() still returns and accepts "int". Maybe
the should be adjusted too, but we use macros from kernel headers,
which also use int. Even if that is not the type of the length on
the binary protocol. So some of these functions still use int, to
be closer and compatible with <linux/netlink.h>.
For generic netlink, the family-id is important. It changes when
loading/unloading a module, so we should not cache it indefinitely.
To get this right, takes some effort. For "nl80211", "nl802154"
and "wireguard", we only cache the family ID in relation to an
interface. If the module gets unloaded, the family ID also becomes
irrelevant and we need to re-fetch it the next time.
For generic families like "mptcp_pm" or "ethtool", they are commonly not
kernel modules and cannot be unloaded. So caching them would be
(probably) fine.
Still. Some generic netlink families emit notifications, and it will
be interesting to be able to handle them. Since that will be useful later,
start by doing something simple: let the generic netlink family also be
cached this way. Generic netlink will send notifications when a family gets
added/deleted, and we can use that to reliably cache the family ID.
We only care about a well-known set of generic families. Unlike libnl
(which has "struct genl_family" object to handle any family), we can hard
code the few we care about (NMPGenlFamilyType).
This adds the necessary infrastructure of NMLinuxPlatform to listen to
events on the generic netlink socket.
Currently there is no problem. However, DelayedActionType is a packed
enum, and if we add a few more enum values, it might happen that
DELAYED_ACTION_TYPE_MAX is 0x8000 and DelayedActionType effectively
uint16_t.
When that happens, the code would become an infinite loop, because
0x8000 is not larger than DELAYED_ACTION_TYPE_MAX, but `<<= 1`
shifts out the bit, making it zero.
Avoid that.
If nm_platform_get_cache_tc() is disabled, there is no need to refresh
it. Filter those flags out.
Also, don't duplicate the code and add a helper function
delayed_action_schedule_refresh_all().
Reorder fields in DelayedActionWaitForNlResponseData, so that
the struct size is optimal due to the alignment constraints.
Also, when we remember enum values and embed them somewhere, it's nice
if they only take the space actually needed. _nm_packed solves that.
The term "addr_family" is used very frequently, and it usually is an
auto variable or a function parameter.
It is interesting to search where this field is used. So rename to give
it a unique (and better fitting) name.
While at it, use gint8 to encode the addr_family. It's always
sufficient, and this reduces the size of RefreshAllInfo from 8 bytes
to two.
In the past, nmp_lookup_init_object() could both lookup all object for a
certain ifindex, and lookup all objects of a type. That fallback path
already leads to an assertion failure fora while now, so nobody should
be using this function to lookup all objects of a certain type (for
what, we have nmp_lookup_init_obj_type()).
Now, remove the fallback path, and rename the function to what it really
does.
The real purpose is that we set the socket options before bind().
For that, we need to be able to specify the flag during nl_socket_new().
Another reason is that these are common questions to ponder while
creating a netlink socket. There shouldn't be several setter functions,
just specify the flag right away. These parameters are not going to
change afterwards (at least, we don't need/use that and we don't have
API for that either).
We will need this, for getting nl_pktinfo control messages
that contain the extended destination group number.
Also, drop NL_SOCK_PASSCRED. It was only used to not iterate over the
control messages, but doing that should be cheap.
There really is no need for two(!) heap allocations while parsing
the netlink message. We already have it in the buffer. Just use it.
Note that netlink attributes need to be aligned to 4 bytes. But
nlmsg_next() already ensures that, so not even for alignment purpose we
need to clone the message.
Create a new "struct nl_msg_lite" that can hold pointers to everything
we need.
Whether we use a socket blockingly or non-blocking is usually determined
upfront and does not change. Make it a parameter of nl_socket_new().
Also, it saves an additional syscall.
- "priv->nlh" to "priv->sk_rtnl": as we also have an genl socket,
"nlh" is not a good name. The point is that this is rtnetlink.
Also, "h" sounds like a handle, that is, a file descriptor.
Make this clearer with a "sk_" prefix.
- "priv->genl" to "priv->sk_genl_sync": This socket is only used for synchronous
operations, that is, it is passed to various independent components, that use
it to send a request and wait for the response (while consuming all messages).
We will have a use for a second socket, hence the "_sync" part.
The "sk_" prefix is for consistency with "sk_rtnl".
- "priv->event_source" to "priv->rtnl_event_source". Just make it
clearer, that this is for the rtnetlink socket. In any case,
this field is hardly used at all, it can have a sturdy name.
Sockets are really a fundamental thing we require to operate.
We cannot meaningfully operate, if we fail to create them.
That is also why a too low file descriptor limit is fatal
and unsupported. This is similar with out of memory situations.
Just require that we always are able to create the generic
netlink socket.
There are only two callers of nl_socket_new(). One for NETLINK_GENERIC
and one for NETLINK_ROUTE.
We already were enabling ext-ack for the rtnetlink socket. Also enable
it for the genl socket.
Do that, but just moving this inside nl_socket_new(). I cannot imagine a
case where we don't want this.
Create and use new nl_socket_new().
nl_socket_alloc() really does nothing but allocating the struct and
initializing the fd to -1. In all cases, we want to call nl_connect()
right after.
Combine the two. Then we also cannot have a "struct nl_sock" without a
valid fd. This means several error checks can be dropped.
Note that former nl_connect() did several things at once. Maybe, for
more flexibility one would need to tweak what should be done there.
For now that is not necessary. In any case, if we need more flexibility,
then we would control what nl_connect() (now nl_socket_new()) does, and not
the split between nl_socket_alloc() and nl_connect().
These string functions allow to omit the string buffer. This is for
convenience, to use a global (thread-local) buffer. I think that is
error prone and we should drop that "convenience" feature.
At various places, pass a stack allocated buffer.
(cherry picked from commit b87afac8e8)
We often create the source with default priority, no destroy function and
attach it to the default context (g_main_context_default()). For that
case, we have wrapper functions like nm_g_timeout_add_source()
and nm_g_idle_add_source(). Use those.
There should be no change in behavior.
This allows to fetch the information about the AP that CSME if connected
to. It'll allow us to connect to the exact same AP and shaving off the
scan from the connection, improving the connection time.
Add support for IPv6 multipath routes, by treating them as single-hop
routes. Otherwise, we can easily end up with an inconsistent platform
cache.
Background:
-----------
Routes are hard. We have NMPlatform which is a cache of netlink objects.
That means, we have a hash table and we cache objects based on some
identity (nmp_object_id_equal()). So those objects must have some immutable,
indistinguishable properties that determine whether an object is the
same or a different one.
For routes and routing rules, this identifying property is basically a subset
of the attributes (but not all!). That makes it very hard, because tomorrow
kernel could add an attribute that becomes part of the identity, and NetworkManager
wouldn't recognize it, resulting in cache inconsistency by wrongly
thinking two different routes are one and the same. Anyway.
The other point is that we rely on netlink events to maintain the cache.
So when we receive a RTM_NEWROUTE we add the object to the cache, and
delete it upon RTM_DELROUTE. When you do `ip route replace`, kernel
might replace a (different!) route, but only send one RTM_NEWROUTE message.
We handle that by somehow finding the route that was replaced/deleted. It's
ugly. Did I say, that routes are hard?
Also, for IPv4 routes, multipath attributes are just a part of the
routes identity. That is, you add two different routes that only differ
by their multipath list, and then kernel does as you would expect.
NetworkManager does not support IPv4 multihop routes and just ignores
them.
Also, a multipath route can have next hops on different interfaces,
which goes against our current assumption, that an NMPlatformIP4Route
has an interface (or no interface, in case of blackhole routes). That
makes it hard to meaningfully support IPv4 routes. But we probably don't
have to, because we can just pretend that such routes don't exist and
our cache stays consistent (at least, until somebody calls `ip route
replace` *sigh*).
Not so for IPv6. When you add (`ip route append`) an IPv6 route that is
identical to an existing route -- except their multipath attribute -- then it
behaves as if the existing route was modified and the result is the
merged route with more next-hops. Note that in this case kernel will
only send a RTM_NEWROUTE message with the full multipath list. If we
would treat the multipath list as part of the route's identity, this
would be as if kernel deleted one routes and created a different one (the
merged one), but only sending one notification. That's a bit similar to
what happens during `ip route replace`, but it would be nightmare to
find out which route was thereby replaced.
Likewise, when you delete a route, then kernel will "subtract" the
next-hop and sent a RTM_DELROUTE notification only about the next-hop that
was deleted. To handle that, you would have to find the full multihop
route, and replace it with the remainder after the subtraction.
NetworkManager so far ignored IPv6 routes with more than one next-hop, this
means you can start with one single-hop route (that NetworkManger sees
and has in the platform cache). Then you create a similar route (only
differing by the next-hop). Kernel will merge the routes, but not notify
NetworkManager that the single-hop route is not longer a single-hop
route. This can easily cause a cache inconsistency and subtle bugs. For
IPv6 we MUST handle multihop routes.
Kernels behavior makes little sense, if you expect that routes have an
immutable identity and want to get notifications about addition/removal.
We can however make sense by it by pretending that all IPv6 routes are
single-hop! With only the twist that a single RTM_NEWROUTE notification
might notify about multiple routes at the same time. This is what the
patch does.
The Patch
---------
Now one RTM_NEWROUTE message can contain multiple IPv6 routes
(NMPObject). That would mean that nmp_object_new_from_nl() needs to
return a list of objects. But it's not implemented that way. Instead,
we still call nmp_object_new_from_nl(), and the parsing code can
indicate that there is something more, indicating the caller to call
nmp_object_new_from_nl() again in a loop to fetch more objects.
In practice, I think all RTM_DELROUTE messages for IPv6 routes are
single-hop. Still, we implement it to handle also multi-hop messages the
same way.
Note that we just parse the netlink message again from scratch. The alternative
would be to parse the first object once, and then clone the object and
only update the next-hop. That would be more efficient, but probably
harder to understand/implement.
https://bugzilla.redhat.com/show_bug.cgi?id=1837254#c20
To parse the RTA_MULTIHOP message, "policy" is not right (which is used
to parse the overall message). Instead, we don't really have a special
policy that we should use.
This was not a severe issue, because the allocated buffer (with
G_N_ELEMENTS(policy) elements) was larger than need be. And apparently,
using the wrong policy also didn't cause us to reject important
messages.
The variable with this purpose is usually called "IS_IPv4".
It's upper case, because usually this is a const variable, and because
it reminds of the NM_IS_IPv4(addr_family) macro. That letter case
is unusual, but it makes sense to me for the special purpose that this
variable has.
Anyway. The naming of this variable is a different point. Let's
use the variable name that is consistent and widely used.
We always call nl_recv() in a loop. If it would be necessary to clear
the variable, then it would need to happen inside the loop. But it's
not necessary.
Instead of allocating a receive buffer for each nl_recv() call, re-use a
pre-allocated buffer.
The buffer is part of NMPlatform and kept around. As we would not have more
than one NMPlatform instance per netns, we waste a limited amount of
memory.
The buffer gets initialized with 32k, which should be large enough for
any rtnetlink message that we might receive. As before, if the buffer
would be too small, then the first large message would be lost (as we don't
peek). But then we would allocate a larger buffer, resync the platform cache
and recover.