Since commit 528a63d9cc ('platform: avoid unnecessary configuration of
IP address in nm_platform_ip_address_sync()'), we no longer configure the
IP address if it is already in the platform cache. But the cache might not be
up to date, so process any pending netlink events first.
https://bugzilla.redhat.com/show_bug.cgi?id=2073926
Fixes: 528a63d9cc ('platform: avoid unnecessary configuration of IP address in nm_platform_ip_address_sync()')
These string functions allow omitting the string buffer. This is for
convenience, so that a global (thread-local) buffer gets used instead. I think that is
error-prone and we should drop that "convenience" feature.
Instead, pass a stack-allocated buffer at the various call sites.
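The difference is easiest to see with a small example. This is a minimal sketch with a hypothetical `Ip4Addr` type and `ip4addr_to_string()` helper (not NetworkManager's API): the caller always supplies the buffer, so two conversions in the same statement cannot overwrite each other, as they could with a shared thread-local buffer.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical address type, for illustration only. */
typedef struct {
    uint8_t bytes[4];
    uint8_t plen;
} Ip4Addr;

/* Large enough for the longest possible output, including the NUL. */
#define IP4ADDR_STRBUF_SIZE sizeof("255.255.255.255/32")

/* The caller must provide @buf. There is no hidden global fallback, so
 * e.g. printf("%s %s", to_string(a, b1), to_string(b, b2)) is safe. */
static const char *
ip4addr_to_string(const Ip4Addr *a, char *buf, size_t buf_len)
{
    snprintf(buf, buf_len, "%u.%u.%u.%u/%u",
             (unsigned) a->bytes[0], (unsigned) a->bytes[1],
             (unsigned) a->bytes[2], (unsigned) a->bytes[3],
             (unsigned) a->plen);
    return buf;
}
```

Call sites then declare `char sbuf[IP4ADDR_STRBUF_SIZE];` on the stack, one per string that must stay alive at the same time.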
We call sync many times. Often there is nothing to update. Check the
cache first, before (re-)adding an address.
Note that many addresses have a limited lifetime, that is, a lifetime
that keeps counting down with seconds granularity. For those (common)
cases we will only avoid the call to the kernel if there are two syncs
within less than a second.
We already have a comparison of NMPlatformIPXAddress with the modes
"full" and "id". The former is needed to fully compare two addresses,
the latter as identity for tracking addresses in the cache.
In NetworkManager we also use the NMPlatformIP[46]Address structure to
track the addresses we want to configure. When we add them in kernel,
we will later see them in the platform cache. However, some fields
will be slightly different. For example, the "addr_source" field will
always be "kernel", because that is not a field we configure in
kernel. Also, the "n_ifa_flags" will probably differ (the kernel may set
the "permanent" and "secondary" flags).
Add a compare function that can ignore such differences.
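A minimal sketch of the three comparison modes follows. The struct, the enum names, and the exact set of ignored fields are simplified assumptions (the real identity also covers more fields, such as the ifindex); the two flag constants use the same values as the kernel's IFA_F_SECONDARY (0x01) and IFA_F_PERMANENT (0x80).

```c
#include <stdint.h>

typedef enum { ADDR_SRC_USER, ADDR_SRC_DHCP, ADDR_SRC_KERNEL } AddrSource;

/* Simplified stand-in for NMPlatformIP4Address. */
typedef struct {
    uint32_t   address; /* host byte order, for brevity */
    uint8_t    plen;
    uint32_t   ifa_flags;
    AddrSource addr_source;
} Ip4Address;

#define FLAG_SECONDARY 0x01u /* same value as IFA_F_SECONDARY */
#define FLAG_PERMANENT 0x80u /* same value as IFA_F_PERMANENT */

typedef enum {
    CMP_ID,       /* identity: what the cache is keyed by */
    CMP_SEMANTIC, /* all fields, minus what kernel sets on its own */
    CMP_FULL,     /* every field */
} CmpType;

static int cmp_u32(uint32_t a, uint32_t b)
{
    return (a > b) - (a < b);
}

static int ip4_address_cmp(const Ip4Address *a, const Ip4Address *b, CmpType t)
{
    int c;

    if ((c = cmp_u32(a->address, b->address)))
        return c;
    if ((c = cmp_u32(a->plen, b->plen)))
        return c;
    if (t == CMP_ID)
        return 0;

    /* In semantic mode, flags that kernel adds by itself don't count. */
    uint32_t mask = (t == CMP_SEMANTIC) ? ~(FLAG_SECONDARY | FLAG_PERMANENT) : ~0u;
    if ((c = cmp_u32(a->ifa_flags & mask, b->ifa_flags & mask)))
        return c;

    /* addr_source is never configured in kernel, so it's ignored too. */
    if (t == CMP_SEMANTIC)
        return 0;
    return cmp_u32(a->addr_source, b->addr_source);
}
```

With this, an address we want to configure and the one that later shows up in the platform cache compare equal in semantic mode, while full mode still reports the difference.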
Also add nm_platform_vtable_address for accessing the IPv4 and IPv6
methods generically (based on an "IS_IPv4" variable).
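The idea of selecting per-family methods with a boolean index can be sketched as below. The `AddressVTable` struct, its fields, and `vtable_address()` are hypothetical; the sketch only assumes the standard POSIX `inet_ntop()`. Putting IPv6 at index 0 and IPv4 at index 1 lets code write `vtable[IS_IPv4]` directly.

```c
#include <arpa/inet.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical per-family operations table. */
typedef struct {
    int    addr_family;
    size_t addr_len;
    const char *(*to_string)(const void *addr, char *buf, size_t buf_len);
} AddressVTable;

static const char *ip4_to_string(const void *addr, char *buf, size_t buf_len)
{
    return inet_ntop(AF_INET, addr, buf, (socklen_t) buf_len);
}

static const char *ip6_to_string(const void *addr, char *buf, size_t buf_len)
{
    return inet_ntop(AF_INET6, addr, buf, (socklen_t) buf_len);
}

/* Index 0 is IPv6, index 1 is IPv4, so a bool IS_IPv4 selects the entry. */
static const AddressVTable address_vtable[2] = {
    [0] = {AF_INET6, 16, ip6_to_string},
    [1] = {AF_INET, 4, ip4_to_string},
};

static const AddressVTable *vtable_address(bool IS_IPv4)
{
    return &address_vtable[IS_IPv4];
}
```

Generic code can then call `vtable_address(IS_IPv4)->to_string(...)` without branching on the address family itself.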
nmp_utils_lifetime_get() calculates the lifetime of addresses,
and it bases the result on a "now" timestamp.
If you have two addresses and calculate their expiry, then we want to
base both on the same "now" timestamp, meaning, we should
only call nm_utils_get_monotonic_timestamp_sec() once. This is also a
performance optimization. But much more importantly, when we make a
comparison at a certain moment, we need all sides to have the same
understanding of the current timestamp.
But nmp_utils_lifetime_get() does not always require the now timestamp.
And the caller doesn't know whether it will be needed (short of knowing
how nmp_utils_lifetime_get() is implemented). So, make the now parameter
an in/out argument. If we pass in an already valid now timestamp, use
that. Otherwise, fetch the current time and also return it.
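A minimal sketch of the in/out "now" argument, under these assumptions: the function and constant names are hypothetical, 0 means "not yet fetched", and a fake clock replaces the real monotonic clock so the behavior is deterministic. The clock is only read when the lifetime is finite, and every later call reuses the value written back.

```c
#include <stdint.h>

#define LIFETIME_PERMANENT UINT32_MAX

/* Fake clock for the sketch; real code would read a monotonic clock. */
static int32_t fake_clock_sec = 1000;
static int     n_clock_calls;

static int32_t get_monotonic_sec(void)
{
    n_clock_calls++;
    return fake_clock_sec;
}

/* Remaining lifetime of an address that had @lifetime seconds left at
 * @timestamp. @now is in/out: 0 means "unknown". It is filled in only if
 * needed, so all subsequent calls share the same notion of "now". */
static uint32_t lifetime_get(uint32_t timestamp, uint32_t lifetime, int32_t *now)
{
    if (lifetime == LIFETIME_PERMANENT)
        return LIFETIME_PERMANENT; /* no need to look at the clock */

    if (*now == 0)
        *now = get_monotonic_sec();

    int64_t expiry = (int64_t) timestamp + lifetime;
    return expiry <= *now ? 0 : (uint32_t) (expiry - *now);
}
```

When comparing two addresses, the caller declares `int32_t now = 0;` once and passes `&now` to both calls.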
It is rather unlikely that we call this function with no existing
routes/addresses. Hence, usually this does not save an allocation
of the GPtrArray.
However, it's slightly less code and makes more sense this way
(instead of checking afterwards whether the array is empty and
destroying it).
The entire point of the dance in nm_platform_ip_address_sync() is to ensure that
conflicting IPv4 addresses are in their right order, that is, they have
the right primary/secondary flag.
Kernel only sets secondary flags for addresses that are in the same
subnet, and we also only care about the relative order of addresses
that are in the same subnet, in particular because we rely on the kernel's
"secondary" flag to implement this.
But kernel only treats addresses as secondary if they share the exact
same subnet. For example, 192.168.0.5/24 and 192.168.0.6/25 would not
be treated as primary/secondary but just as unrelated addresses, even though
both have the same network part once their host bits are cleared.
This means we must hash not only the network part of the addresses, but
also the prefix length. Implement that by tracking the full NMPObject.
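To make the example concrete, here is a minimal sketch of a subnet key that includes the prefix length. `SubnetKey` and the helper functions are hypothetical, and the hash mixing constants are arbitrary; the point is that equality and hashing both use the network part *and* the prefix length, so 192.168.0.5/24 and 192.168.0.6/25 end up in different buckets.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical key for grouping IPv4 addresses by subnet. */
typedef struct {
    uint32_t network; /* address with host bits cleared, host byte order */
    uint8_t  plen;
} SubnetKey;

static uint32_t plen_to_mask(uint8_t plen)
{
    /* Shifting a 32-bit value by 32 is undefined, so special-case /0. */
    return plen == 0 ? 0 : ~UINT32_C(0) << (32 - plen);
}

static SubnetKey subnet_key(uint32_t addr, uint8_t plen)
{
    SubnetKey k = {addr & plen_to_mask(plen), plen};
    return k;
}

static bool subnet_key_equal(SubnetKey a, SubnetKey b)
{
    return a.network == b.network && a.plen == b.plen;
}

/* Both fields go into the hash, consistent with subnet_key_equal(). */
static uint32_t subnet_key_hash(SubnetKey k)
{
    uint32_t h = k.network * UINT32_C(2654435761);
    return h ^ (UINT32_C(0x9e3779b9) * (uint32_t) (k.plen + 1));
}
```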
None of the callers really handle the return value of nm_platform_ip_address_sync()
or whether the function encountered problems. What would they do about
it anyway?
For IPv4 we were already ignoring errors to add addresses, but for IPv6 we
aborted. That seems wrong. As the caller does not really handle errors,
I think we should follow through and try to add all addresses, even after an error.
Still, also collect an overall "success" of the function and return it.
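The "keep going, but remember the failure" pattern is short enough to show directly. This is a minimal sketch: `add_all()`, the `AddFunc` callback type, and the example callback are hypothetical stand-ins for the per-address kernel call.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical "add one address" operation; returns false on failure. */
typedef bool (*AddFunc)(int addr_id, void *user_data);

/* Try to add every address, even after a failure, and report whether
 * all of them succeeded. */
static bool add_all(const int *addrs, size_t n, AddFunc add, void *user_data)
{
    bool success = true;

    for (size_t i = 0; i < n; i++) {
        if (!add(addrs[i], user_data))
            success = false; /* remember the failure, but don't abort */
    }
    return success;
}

/* Example callback: counts attempts and fails for negative ids. */
static bool add_fail_negative(int addr_id, void *user_data)
{
    (*(int *) user_data)++;
    return addr_id >= 0;
}
```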
In the past, nm_platform_ip_address_sync() only had the @known_addresses
argument. We would figure out which addresses to delete and which to preserve,
based on what addresses were known. That means, @known_addresses must have contained
all the addresses we wanted to preserve, even the external ones. That approach
was inherently racy.
Instead, nowadays we have the addresses we want to configure (@known_addresses)
and the addresses we want to delete (@prune_addresses). This started to change in
commit dadfc3abd5 ('platform: allow injecting the list of addresses to prune'),
but only commit 58287cbcc0 ('core: rework IP configuration in NetworkManager using
layer 3 configuration') actually changed to pass separate @prune_addresses argument.
However, because the order of IP addresses matters and there is no sensible kernel API
to configure it (short of adding them in the right order), we still need
to look at all the addresses, check their order, and possibly delete some.
That is, we need to handle addresses we want to delete (@prune_addresses)
but still look at all addresses in platform (@plat_addresses) to check
their order.
Now, first handle @prune_addresses. That's simple. These are just the
addresses we want to delete. Second, get the list of all addresses in
platform (@plat_addresses) and check the order.
Note that if there is an external address that interferes with our
desired order, we will leave it untouched. Thus, such external addresses
might prevent us from getting the order as desired. But that's just
how it is. Don't add addresses outside of NetworkManager to avoid that.
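The two phases can be sketched for a single IPv4 subnet as follows. This is a deliberately simplified model, not NetworkManager's actual algorithm: addresses are strings, `PlatAddr` and `sync_addresses()` are hypothetical, and the order check simply deletes our own addresses from the first mismatch onward so that they can be re-added in the desired order. External addresses are skipped in both phases, as described above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* One platform address of a single subnet, in kernel order (primary first). */
typedef struct {
    const char *addr;
    bool        external; /* not added by NetworkManager */
    bool        deleted;
} PlatAddr;

static bool in_list(const char *addr, const char *const *list, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(addr, list[i]) == 0)
            return true;
    }
    return false;
}

/* Returns the number of addresses marked for deletion. */
static size_t sync_addresses(PlatAddr *plat, size_t n_plat,
                             const char *const *known, size_t n_known,
                             const char *const *prune, size_t n_prune)
{
    size_t n_deleted    = 0;
    size_t k            = 0;
    bool   out_of_order = false;

    /* Phase 1: delete what we were explicitly asked to prune. */
    for (size_t i = 0; i < n_plat; i++) {
        if (in_list(plat[i].addr, prune, n_prune)) {
            plat[i].deleted = true;
            n_deleted++;
        }
    }

    /* Phase 2: compare the order of our own addresses with @known. */
    for (size_t i = 0; i < n_plat; i++) {
        if (plat[i].deleted || plat[i].external)
            continue; /* external addresses are left untouched */
        if (!out_of_order && k < n_known && strcmp(plat[i].addr, known[k]) == 0) {
            k++;
            continue;
        }
        /* From the first mismatch on, delete so we can re-add in order. */
        out_of_order    = true;
        plat[i].deleted = true;
        n_deleted++;
    }
    return n_deleted;
}
```

Afterwards, the caller would (re-)add the missing addresses from @known in order, which the kernel appends at the end of the subnet's list.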
Fixes: 58287cbcc0 ('core: rework IP configuration in NetworkManager using layer 3 configuration')
Frequencies with the 'disabled' flag are supported by the driver but
disabled in the current regulatory domain. Don't add them to the list
of supported frequencies since they are not usable.
This is especially needed since commit f18bf17dea ('wifi: cleanup
ensure_hotspot_frequency()'), as now NetworkManager explicitly sets a
random, stable channel for Wi-Fi hotspots. If the chosen channel is
disabled, the hotspot fails to start.
Disabled channels are displayed in the 'iw phy' output as '(disabled)':
[...]
Frequencies:
* 2412 MHz [1] (30.0 dBm)
* 2417 MHz [2] (30.0 dBm)
* 2422 MHz [3] (30.0 dBm)
* 2427 MHz [4] (30.0 dBm)
* 2432 MHz [5] (30.0 dBm)
* 2437 MHz [6] (30.0 dBm)
* 2442 MHz [7] (30.0 dBm)
* 2447 MHz [8] (30.0 dBm)
* 2452 MHz [9] (30.0 dBm)
* 2457 MHz [10] (30.0 dBm)
* 2462 MHz [11] (30.0 dBm)
* 2467 MHz [12] (disabled)
* 2472 MHz [13] (disabled)
* 2484 MHz [14] (disabled)
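Applied to the list above, the filtering amounts to skipping flagged channels while collecting frequencies. A minimal sketch, assuming a hypothetical `ChannelInfo` struct that has already been filled from the driver's data (in nl80211 the flag corresponds to the NL80211_FREQUENCY_ATTR_DISABLED attribute); the netlink parsing itself is left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-channel info, already parsed from the driver. */
typedef struct {
    uint32_t mhz;
    bool     disabled; /* supported, but not allowed by the regulatory domain */
} ChannelInfo;

/* Copy only usable frequencies into @out (which must hold at least @n
 * entries) and return how many were copied. */
static size_t collect_usable_freqs(const ChannelInfo *chans, size_t n, uint32_t *out)
{
    size_t n_out = 0;

    for (size_t i = 0; i < n; i++) {
        if (chans[i].disabled)
            continue;
        out[n_out++] = chans[i].mhz;
    }
    return n_out;
}
```

For the 2.4 GHz list above this keeps channels 1 to 11 (2412 to 2462 MHz) and drops 12, 13 and 14.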
Note that currently NM loads the list only at startup and therefore,
in case of a change of regulatory domain, a restart of the daemon is
needed to have the list updated. This needs to be improved.
https://bugzilla.redhat.com/show_bug.cgi?id=2062785
Fixes: f18bf17dea ('wifi: cleanup ensure_hotspot_frequency()')
We often create the source with default priority, no destroy function and
attach it to the default context (g_main_context_default()). For that
case, we have wrapper functions like nm_g_timeout_add_source()
and nm_g_idle_add_source(). Use those.
There should be no change in behavior.
This allows fetching information about the AP that CSME is connected
to. It'll allow us to connect to the exact same AP and to skip the
scan, improving the connection time.
gcc-12.0.1-0.8.fc36 is annoying with false positives.
It's related to g_error() and its `for(;;) ;`.
For example:
../src/libnm-glib-aux/nm-shared-utils.c: In function 'nm_utils_parse_inaddr_bin_full':
../src/libnm-glib-aux/nm-shared-utils.c:1145:26: error: dangling pointer to 'error' may be used [-Werror=dangling-pointer=]
1145 | error->message);
| ^~
/usr/include/glib-2.0/glib/gmessages.h:343:32: note: in definition of macro 'g_error'
343 | __VA_ARGS__); \
| ^~~~~~~~~~~
../src/libnm-glib-aux/nm-shared-utils.c:1133:31: note: 'error' declared here
1133 | gs_free_error GError *error = NULL;
| ^~~~~
/usr/include/glib-2.0/glib/gmessages.h:341:25: error: dangling pointer to 'addrbin' may be used [-Werror=dangling-pointer=]
341 | g_log (G_LOG_DOMAIN, \
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
342 | G_LOG_LEVEL_ERROR, \
| ~~~~~~~~~~~~~~~~~~~~~~~
343 | __VA_ARGS__); \
| ~~~~~~~~~~~~
../src/libnm-glib-aux/nm-shared-utils.c:1141:13: note: in expansion of macro 'g_error'
1141 | g_error("unexpected assertion failure: could parse \"%s\" as %s, but not accepted by "
| ^~~~~~~
../src/libnm-glib-aux/nm-shared-utils.c:1112:14: note: 'addrbin' declared here
1112 | NMIPAddr addrbin;
| ^~~~~~~
I think the warning could potentially be useful and prevent real bugs.
So don't disable it altogether, but go through the effort to suppress it
at the places where it currently happens.
Note that the NM_PRAGMA_WARNING_DISABLE_DANGLING_POINTER macro only expands
to something that suppresses the warning when __GNUC__ equals 12. The purpose is to
only suppress the warning where we know we want to. Hopefully other gcc
versions don't have this problem.
I guess we could also write a NM_COMPILER_WARNING() check in
"m4/compiler_options.m4", to disable the warning if we detect it. But
that seems too cumbersome.
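Such a version-gated suppression could look roughly like this. The macro names are hypothetical and this is not NetworkManager's definition; it only relies on GCC's documented `#pragma GCC diagnostic push/ignored/pop` via `_Pragma`. Clang also defines `__GNUC__` (as 4), so it is excluded explicitly. The function merely shows where the macros go; it does not itself trigger the false positive.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical macros: emit the pragmas only for GCC 12, the version known
 * to report these false positives. All other compilers keep the warning. */
#if defined(__GNUC__) && !defined(__clang__) && __GNUC__ == 12
#define SKETCH_PRAGMA_DISABLE_DANGLING_POINTER \
    _Pragma("GCC diagnostic push")             \
    _Pragma("GCC diagnostic ignored \"-Wdangling-pointer\"")
#define SKETCH_PRAGMA_REENABLE _Pragma("GCC diagnostic pop")
#else
#define SKETCH_PRAGMA_DISABLE_DANGLING_POINTER
#define SKETCH_PRAGMA_REENABLE
#endif

/* Usage: wrap only the statements that produce the false positive. */
static int parse_int_or_die(const char *s)
{
    char *end;
    long  v = strtol(s, &end, 10);

    if (end == s || *end != '\0') {
        SKETCH_PRAGMA_DISABLE_DANGLING_POINTER
        fprintf(stderr, "unexpected failure parsing \"%s\"\n", s);
        abort();
        SKETCH_PRAGMA_REENABLE
    }
    return (int) v;
}
```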
As we iterate over "self->num_freqs", we must not modify "freqs";
otherwise, the second and subsequent frequencies in self->freqs[i]
cannot match.
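A minimal sketch of the fixed loop, with a hypothetical `Device` struct and `find_freq()` function. The inner loop walks its own cursor `f`, which restarts at the beginning of `freqs` for every `i`. Advancing `freqs` itself would exhaust the list during the first outer iteration, so `self->freqs[1]` and later could never match.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical device state. */
typedef struct {
    const uint32_t *freqs; /* frequencies supported by the device */
    size_t          num_freqs;
} Device;

/* Return a frequency from the 0-terminated list @freqs that the device
 * supports (checking in the device's order), or 0 if there is none. */
static uint32_t find_freq(const Device *self, const uint32_t *freqs)
{
    for (size_t i = 0; i < self->num_freqs; i++) {
        /* Fresh cursor per outer iteration; @freqs is never modified. */
        for (const uint32_t *f = freqs; *f; f++) {
            if (self->freqs[i] == *f)
                return *f;
        }
    }
    return 0;
}
```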
Fixes: dd8c546ff0 ('2007-12-27 Dan Williams <dcbw@redhat.com>')
Fixes: ba8527ca58 ('wifi: preliminary nl80211 patch')
Add support for IPv6 multipath routes, by treating them as single-hop
routes. Otherwise, we can easily end up with an inconsistent platform
cache.
Background:
-----------
Routes are hard. We have NMPlatform which is a cache of netlink objects.
That means, we have a hash table and we cache objects based on some
identity (nmp_object_id_equal()). So those objects must have some immutable,
distinguishing properties that determine whether an object is the
same or a different one.
For routes and routing rules, this identifying property is basically a subset
of the attributes (but not all!). That makes it very hard, because tomorrow
kernel could add an attribute that becomes part of the identity, and NetworkManager
wouldn't recognize it, resulting in cache inconsistency by wrongly
thinking two different routes are one and the same. Anyway.
The other point is that we rely on netlink events to maintain the cache.
So when we receive a RTM_NEWROUTE we add the object to the cache, and
delete it upon RTM_DELROUTE. When you do `ip route replace`, kernel
might replace a (different!) route, but only send one RTM_NEWROUTE message.
We handle that by somehow finding the route that was replaced/deleted. It's
ugly. Did I mention that routes are hard?
Also, for IPv4 routes, multipath attributes are just a part of the
route's identity. That is, you can add two different routes that only differ
by their multipath list, and kernel does as you would expect.
NetworkManager does not support IPv4 multihop routes and just ignores
them.
Also, a multipath route can have next hops on different interfaces,
which goes against our current assumption that an NMPlatformIP4Route
has an interface (or no interface, in case of blackhole routes). That
makes it hard to meaningfully support such IPv4 routes. But we probably don't
have to, because we can just pretend that such routes don't exist and
our cache stays consistent (at least, until somebody calls `ip route
replace` *sigh*).
Not so for IPv6. When you add (`ip route append`) an IPv6 route that is
identical to an existing route -- except their multipath attribute -- then it
behaves as if the existing route was modified and the result is the
merged route with more next-hops. Note that in this case kernel will
only send a RTM_NEWROUTE message with the full multipath list. If we
would treat the multipath list as part of the route's identity, this
would be as if kernel deleted one routes and created a different one (the
merged one), but only sending one notification. That's a bit similar to
what happens during `ip route replace`, but it would be nightmare to
find out which route was thereby replaced.
Likewise, when you delete a route, kernel will "subtract" the
next-hop and send a RTM_DELROUTE notification only about the next-hop that
was deleted. To handle that, you would have to find the full multihop
route and replace it with the remainder after the subtraction.
NetworkManager so far ignored IPv6 routes with more than one next-hop. This
means you can start with one single-hop route (that NetworkManager sees
and has in the platform cache). Then you create a similar route (only
differing by the next-hop). Kernel will merge the routes, but not notify
NetworkManager that the single-hop route is no longer a single-hop
route. This can easily cause a cache inconsistency and subtle bugs. For
IPv6 we MUST handle multihop routes.
The kernel's behavior makes little sense if you expect that routes have an
immutable identity and want to get notifications about addition/removal.
We can however make sense of it by pretending that all IPv6 routes are
single-hop! With only the twist that a single RTM_NEWROUTE notification
might notify about multiple routes at the same time. This is what the
patch does.
The Patch
---------
Now one RTM_NEWROUTE message can contain multiple IPv6 routes
(NMPObject). That would mean that nmp_object_new_from_nl() needs to
return a list of objects. But it's not implemented that way. Instead,
we still call nmp_object_new_from_nl(), and the parsing code can
indicate that there is more to parse, telling the caller to call
nmp_object_new_from_nl() again in a loop to fetch more objects.
In practice, I think all RTM_DELROUTE messages for IPv6 routes are
single-hop. Still, we implement it so that multi-hop messages are handled the
same way.
Note that we just parse the netlink message again from scratch. The alternative
would be to parse the first object once, and then clone the object and
only update the next-hop. That would be more efficient, but probably
harder to understand/implement.
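The caller-side loop can be sketched like this. Everything here is hypothetical: `RouteMsg` stands for an already-decoded RTM_NEWROUTE message (real code works on the raw netlink message), and `route_from_msg()` plays the role of the parser. It returns one single-hop route per call and sets `*has_more` when further next-hops follow. As in the approach described above, each call reads the message from scratch instead of cloning the previous object.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, already-decoded view of one RTM_NEWROUTE message. */
typedef struct {
    uint8_t  dst[16];
    uint8_t  plen;
    uint32_t metric;
    size_t   n_nexthops; /* 1 for a message without a multipath list */
    int      nh_ifindex[8];
} RouteMsg;

/* A cache object: always exactly one next-hop. */
typedef struct {
    uint8_t  dst[16];
    uint8_t  plen;
    uint32_t metric;
    int      ifindex;
} Ip6Route;

/* Parse the @idx-th next-hop of @msg into a single-hop route. */
static bool route_from_msg(const RouteMsg *msg, size_t idx, Ip6Route *out, bool *has_more)
{
    if (idx >= msg->n_nexthops)
        return false;
    memcpy(out->dst, msg->dst, sizeof out->dst);
    out->plen    = msg->plen;
    out->metric  = msg->metric;
    out->ifindex = msg->nh_ifindex[idx];
    *has_more    = idx + 1 < msg->n_nexthops;
    return true;
}

/* Caller side: keep parsing while the parser reports more next-hops. */
static size_t routes_from_msg(const RouteMsg *msg, Ip6Route *out, size_t max)
{
    size_t n        = 0;
    bool   has_more = true;

    while (has_more && n < max && route_from_msg(msg, n, &out[n], &has_more))
        n++;
    return n;
}
```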
https://bugzilla.redhat.com/show_bug.cgi?id=1837254#c20
To parse the RTA_MULTIPATH attribute, "policy" (which is used to parse
the overall message) is not right. And we don't really have a special
policy that we should use instead.
This was not a severe issue, because the allocated buffer (with
G_N_ELEMENTS(policy) elements) was larger than need be. And apparently,
using the wrong policy also didn't cause us to reject important
messages.
The general idea is that, when we have entries tracked by the
route-manager, we can mark them all as dirty. Then, calling the
"track" function will reset the dirty flag. Finally, there is a method
to delete all dirty entries.
As we can look up an entry in O(1) (using dictionaries), we can
sync the list of tracked objects with O(n). We just need to track
all the ones we care about, and then delete those that were not touched
(that is, are still dirty).
Previously, we had to explicitly mark all entries as dirty. We can do
better. Just let nmp_route_manager_untrack_all() mark the survivors as
dirty right away. This way, we can save iterating the list once.
It also makes sense because the only purpose of the dirty flag is to
aid this prune mechanism with track/untrack-all. So, untrack-all can
just help out, and leave the remaining entries dirty, so that the next
track does the right thing.
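The mark-and-sweep cycle can be sketched as below. `Tracker`, `track()` and `untrack_all()` are hypothetical names, and for brevity the sketch uses a small fixed array with a linear lookup instead of the O(1) dictionaries. `track()` clears the dirty flag; `untrack_all()` deletes everything still dirty and immediately marks the survivors dirty for the next round.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_ENTRIES 16

typedef struct {
    int  key;
    bool used;
    bool dirty;
} Entry;

typedef struct {
    Entry entries[MAX_ENTRIES];
} Tracker;

/* Linear lookup for brevity; the real thing would use a hash table. */
static Entry *lookup(Tracker *t, int key)
{
    for (size_t i = 0; i < MAX_ENTRIES; i++) {
        if (t->entries[i].used && t->entries[i].key == key)
            return &t->entries[i];
    }
    return NULL;
}

/* Track @key: add it if new, and clear its dirty flag either way. */
static bool track(Tracker *t, int key)
{
    Entry *e = lookup(t, key);

    for (size_t i = 0; i < MAX_ENTRIES && !e; i++) {
        if (!t->entries[i].used) {
            e       = &t->entries[i];
            e->key  = key;
            e->used = true;
        }
    }
    if (!e)
        return false; /* tracker full */
    e->dirty = false;
    return true;
}

/* Delete all dirty entries and mark the survivors dirty right away, so
 * the next round of track() calls only un-dirties what it still wants. */
static size_t untrack_all(Tracker *t)
{
    size_t n_deleted = 0;

    for (size_t i = 0; i < MAX_ENTRIES; i++) {
        Entry *e = &t->entries[i];

        if (!e->used)
            continue;
        if (e->dirty) {
            e->used = false;
            n_deleted++;
        } else
            e->dirty = true;
    }
    return n_deleted;
}
```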
We now track up to three kinds of object types in NMPRouteManager.
There is only one place, where we need to iterate over all objects of
the same type (e.g. all ipv4-routes), and that is nmp_route_manager_sync().
Previously, we only had one GHashTable with all the objects, and when
iterating we had to check the type and skip over those we didn't want. That
has some overhead, but OK.
The ugliness with iterating over a GHashTable is that the order is
non-deterministic. We should have a defined order in which things happen. To
achieve that, track three different CLists, one for each object type.
Also, I expect that to be slightly faster, as you only have to iterate
over the list you care about.
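A minimal sketch of per-type lists with a defined (insertion) order. The three object types named here are an assumption for illustration, and the small intrusive circular list only imitates the style of CList; `Manager`, `TrackedObj`, and the helper functions are hypothetical. Iterating one type visits only that type's list, in the order the objects were tracked.

```c
#include <stddef.h>

/* Assumed object types, for illustration only. */
typedef enum { OBJ_IP4_ROUTE, OBJ_IP6_ROUTE, OBJ_ROUTING_RULE, N_OBJ_TYPES } ObjType;

/* Minimal intrusive circular doubly-linked list, in the spirit of CList. */
typedef struct Link {
    struct Link *prev, *next;
} Link;

static void list_init(Link *head)
{
    head->prev = head->next = head;
}

static void list_append(Link *head, Link *elem)
{
    elem->prev       = head->prev;
    elem->next       = head;
    head->prev->next = elem;
    head->prev       = elem;
}

typedef struct {
    ObjType type;
    int     id;
    Link    lst; /* position in the per-type list */
} TrackedObj;

#define OBJ_FROM_LINK(l) ((const TrackedObj *) ((const char *) (l) - offsetof(TrackedObj, lst)))

typedef struct {
    Link by_type[N_OBJ_TYPES]; /* one list head per object type */
} Manager;

static void manager_init(Manager *m)
{
    for (int i = 0; i < N_OBJ_TYPES; i++)
        list_init(&m->by_type[i]);
}

static void manager_track(Manager *m, TrackedObj *o)
{
    list_append(&m->by_type[o->type], &o->lst);
}

/* Visit only objects of @type, in insertion order; store their ids. */
static size_t manager_ids_of_type(const Manager *m, ObjType type, int *out, size_t max)
{
    size_t      n    = 0;
    const Link *head = &m->by_type[type];

    for (const Link *l = head->next; l != head && n < max; l = l->next)
        out[n++] = OBJ_FROM_LINK(l)->id;
    return n;
}
```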
NM_HASH_OBFUSCATE_PTR() is some snake-oil to not log raw pointer values.
It obviously makes debugging harder.
But we don't need to generate differently obfuscated pointer values.
At least, let most users use the same obfuscation, so that the values
are comparable.