Frequencies with the 'disabled' flag are supported by the driver but
disabled in the current regulatory domain. Don't add them to the list
of supported frequencies since they are not usable.
This is especially needed since commit f18bf17dea ('wifi: cleanup
ensure_hotspot_frequency()'), as now NetworkManager explicitly sets a
random, stable channel for Wi-Fi hotspots. If the choosen channel is
disabled, the hotspot fails to start.
Disabled channels are displayed in the 'iw phy' output as '(disabled)':
[...]
Frequencies:
* 2412 MHz [1] (30.0 dBm)
* 2417 MHz [2] (30.0 dBm)
* 2422 MHz [3] (30.0 dBm)
* 2427 MHz [4] (30.0 dBm)
* 2432 MHz [5] (30.0 dBm)
* 2437 MHz [6] (30.0 dBm)
* 2442 MHz [7] (30.0 dBm)
* 2447 MHz [8] (30.0 dBm)
* 2452 MHz [9] (30.0 dBm)
* 2457 MHz [10] (30.0 dBm)
* 2462 MHz [11] (30.0 dBm)
* 2467 MHz [12] (disabled)
* 2472 MHz [13] (disabled)
* 2484 MHz [14] (disabled)
Note that currently NM loads the list only at startup and therefore,
in case of a change of regulatory domain, a restart of the daemon is
needed to have the list updated. This needs to be improved.
https://bugzilla.redhat.com/show_bug.cgi?id=2062785
Fixes: f18bf17dea ('wifi: cleanup ensure_hotspot_frequency()')
We often create the source with default priority, no destroy function and
attach it to the default context (g_main_context_default()). For that
case, we have wrapper functions like nm_g_timeout_add_source()
and nm_g_idle_add_source(). Use those.
There should be no change in behavior.
This allows to fetch the information about the AP that CSME if connected
to. It'll allow us to connect to the exact same AP and shaving off the
scan from the connection, improving the connection time.
gcc-12.0.1-0.8.fc36 is annoying with false positives.
It's related to g_error() and its `for(;;) ;`.
For example:
../src/libnm-glib-aux/nm-shared-utils.c: In function 'nm_utils_parse_inaddr_bin_full':
../src/libnm-glib-aux/nm-shared-utils.c:1145:26: error: dangling pointer to 'error' may be used [-Werror=dangling-pointer=]
1145 | error->message);
| ^~
/usr/include/glib-2.0/glib/gmessages.h:343:32: note: in definition of macro 'g_error'
343 | __VA_ARGS__); \
| ^~~~~~~~~~~
../src/libnm-glib-aux/nm-shared-utils.c:1133:31: note: 'error' declared here
1133 | gs_free_error GError *error = NULL;
| ^~~~~
/usr/include/glib-2.0/glib/gmessages.h:341:25: error: dangling pointer to 'addrbin' may be used [-Werror=dangling-pointer=]
341 | g_log (G_LOG_DOMAIN, \
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
342 | G_LOG_LEVEL_ERROR, \
| ~~~~~~~~~~~~~~~~~~~~~~~
343 | __VA_ARGS__); \
| ~~~~~~~~~~~~
../src/libnm-glib-aux/nm-shared-utils.c:1141:13: note: in expansion of macro 'g_error'
1141 | g_error("unexpected assertion failure: could parse \"%s\" as %s, but not accepted by "
| ^~~~~~~
../src/libnm-glib-aux/nm-shared-utils.c:1112:14: note: 'addrbin' declared here
1112 | NMIPAddr addrbin;
| ^~~~~~~
I think the warning could potentially be useful and prevent real bugs.
So don't disable it altogether, but go through the effort to suppress it
at the places where it currently happens.
Note that NM_PRAGMA_WARNING_DISABLE_DANGLING_POINTER macro only expands
to suppressing the warning with __GNUC__ equal to 12. The purpose is to
only suppress the warning where we know we want to. Hopefully other gcc
versions don't have this problem.
I guess, we could also write a NM_COMPILER_WARNING() check in
"m4/compiler_options.m4", to disable the warning if we detect it. But
that seems too cumbersome.
As we iterate over "self->num_freqs", we must not modify "freqs",
otherwise, the second and subsequenty frequencies in self->freqs[i]
cannot match.
Fixes: dd8c546ff0 ('2007-12-27 Dan Williams <dcbw@redhat.com>')
Fixes: ba8527ca58 ('wifi: preliminary nl80211 patch')
Add support for IPv6 multipath routes, by treating them as single-hop
routes. Otherwise, we can easily end up with an inconsistent platform
cache.
Background:
-----------
Routes are hard. We have NMPlatform which is a cache of netlink objects.
That means, we have a hash table and we cache objects based on some
identity (nmp_object_id_equal()). So those objects must have some immutable,
indistinguishable properties that determine whether an object is the
same or a different one.
For routes and routing rules, this identifying property is basically a subset
of the attributes (but not all!). That makes it very hard, because tomorrow
kernel could add an attribute that becomes part of the identity, and NetworkManager
wouldn't recognize it, resulting in cache inconsistency by wrongly
thinking two different routes are one and the same. Anyway.
The other point is that we rely on netlink events to maintain the cache.
So when we receive a RTM_NEWROUTE we add the object to the cache, and
delete it upon RTM_DELROUTE. When you do `ip route replace`, kernel
might replace a (different!) route, but only send one RTM_NEWROUTE message.
We handle that by somehow finding the route that was replaced/deleted. It's
ugly. Did I say, that routes are hard?
Also, for IPv4 routes, multipath attributes are just a part of the
routes identity. That is, you add two different routes that only differ
by their multipath list, and then kernel does as you would expect.
NetworkManager does not support IPv4 multihop routes and just ignores
them.
Also, a multipath route can have next hops on different interfaces,
which goes against our current assumption, that an NMPlatformIP4Route
has an interface (or no interface, in case of blackhole routes). That
makes it hard to meaningfully support IPv4 routes. But we probably don't
have to, because we can just pretend that such routes don't exist and
our cache stays consistent (at least, until somebody calls `ip route
replace` *sigh*).
Not so for IPv6. When you add (`ip route append`) an IPv6 route that is
identical to an existing route -- except their multipath attribute -- then it
behaves as if the existing route was modified and the result is the
merged route with more next-hops. Note that in this case kernel will
only send a RTM_NEWROUTE message with the full multipath list. If we
would treat the multipath list as part of the route's identity, this
would be as if kernel deleted one routes and created a different one (the
merged one), but only sending one notification. That's a bit similar to
what happens during `ip route replace`, but it would be nightmare to
find out which route was thereby replaced.
Likewise, when you delete a route, then kernel will "subtract" the
next-hop and sent a RTM_DELROUTE notification only about the next-hop that
was deleted. To handle that, you would have to find the full multihop
route, and replace it with the remainder after the subtraction.
NetworkManager so far ignored IPv6 routes with more than one next-hop, this
means you can start with one single-hop route (that NetworkManger sees
and has in the platform cache). Then you create a similar route (only
differing by the next-hop). Kernel will merge the routes, but not notify
NetworkManager that the single-hop route is not longer a single-hop
route. This can easily cause a cache inconsistency and subtle bugs. For
IPv6 we MUST handle multihop routes.
Kernels behavior makes little sense, if you expect that routes have an
immutable identity and want to get notifications about addition/removal.
We can however make sense by it by pretending that all IPv6 routes are
single-hop! With only the twist that a single RTM_NEWROUTE notification
might notify about multiple routes at the same time. This is what the
patch does.
The Patch
---------
Now one RTM_NEWROUTE message can contain multiple IPv6 routes
(NMPObject). That would mean that nmp_object_new_from_nl() needs to
return a list of objects. But it's not implemented that way. Instead,
we still call nmp_object_new_from_nl(), and the parsing code can
indicate that there is something more, indicating the caller to call
nmp_object_new_from_nl() again in a loop to fetch more objects.
In practice, I think all RTM_DELROUTE messages for IPv6 routes are
single-hop. Still, we implement it to handle also multi-hop messages the
same way.
Note that we just parse the netlink message again from scratch. The alternative
would be to parse the first object once, and then clone the object and
only update the next-hop. That would be more efficient, but probably
harder to understand/implement.
https://bugzilla.redhat.com/show_bug.cgi?id=1837254#c20
To parse the RTA_MULTIHOP message, "policy" is not right (which is used
to parse the overall message). Instead, we don't really have a special
policy that we should use.
This was not a severe issue, because the allocated buffer (with
G_N_ELEMENTS(policy) elements) was larger than need be. And apparently,
using the wrong policy also didn't cause us to reject important
messages.
The general idea is that when we have entries tracked by the
route-manager, that we can mark them all as dirty. Then, calling the
"track" function will reset the dirty flag. Finally, there is a method
to delete all dirty entries.
As we can lookup an entry with O(1) (using dictionaries), we can
sync the list of tracked objects with O(n). We just need to track
all the ones we care about, and then delete those that were not touched
(that is, are still dirty).
Previously, we had to explicitly mark all entries as dirty. We can do
better. Just let nmp_route_manager_untrack_all() mark the survivors as
dirty right away. This way, we can save iterating the list once.
It also makes sense because the only purpose of the dirty flag is to
aid this prune mechanism with track/untrack-all. So, untrack-all can
just help out, and leave the remaining entries dirty, so that the next
track does the right thing.
We now track up to three kinds of object types in NMPRouteManager.
There is only one place, where we need to iterate over all objects of
the same type (e.g. all ipv4-routes), and that is nmp_route_manager_sync().
Previously, we only had one GHashTable with all the object, and when
iterating we had to skip over them after checking the type. That has some
overhead, but OK.
The ugliness with iterating over a GHashTable is that the order is non
deterministic. We should have a defined order in which things happen. To
achieve that, track three different CList, one for each object type.
Also, I expect that to be slightly faster, as you only have to iterate
over the list you care about.
NM_HASH_OBFUSCATE_PTR() is some snake-oil to not log raw pointer values.
It obviously makes debugging harder.
But we don't need to generate differently obfuscated pointer values.
At least, let most users use the same obfuscation, so that the values
are comparable.
Routes of type blackhole, unreachable, prohibit don't have an
ifindex/device. They are thus in many ways similar to routing rules,
as they are global. We need a mediator to keep track which routes
to configure.
This will be very similar to what NMPRulesManager already does for
routing rules. Rename the API, so that it also can be used for routes.
Renaming the file will be done next, so that git's rename detection
doesn't get too confused.
So far, certain NMObject types could not have an ifindex of zero. Hence,
nmp_lookup_init_object() took such an ifindex to mean lookup all objects
of that type.
Soon, we will support blackhole/unreachable/prohibit route types, which
have their ifindex set to zero. It is still useful to lookup those routes
types via nmp_lookup_init_object().
Change behaviour how to interpret the ifindex. Note that this also
affects various callers of nmp_lookup_init_object(). If somebody was
relying on the previous behavior, it would need fixing.
_vt_cmd_obj_is_alive_ipx_route() is called by nmp_object_is_alive().
Non-alive objects are not put into the cache.
That certainly makes sense for RTM_F_CLONED routes, because they are
generated ad-hoc during the `ip route get` request.
Checking for the ifindex is not necessary. For one, some route types
(blackhole, unreachable, prohibit) don't have an ifindex. Also, the
purpose of _vt_cmd_obj_is_alive_ipx_route() is not to validate the
object. Just don't create objects without an ifindex, if you think the
route needs an ifindex. Checking here is not useful.
We also don't check that other fields like rt_source are valid, so there
is no need to do it for the ifindex either.
Currently, for NMPlatformIP[46]Route always has a gateway, even if it's
possibly set to 0.0.0.0/::. Not sure whether kernel has a further
distinction between no-gateway and all-zero gateway.
Anyway. For us, a gateway of 0.0.0.0/:: means the same as having no
gateway. We cannot differentiate the two (nor do we need to).
Don't print that in nm_platform_ip[46]_route_to_string().
Also, because we are going to add blackhole route types, which cannot
have a next-hop. But we do this change for all routes types, because
it makes sense in general (and also what `ip route show` prints).
The variable with this purpose is usually called "IS_IPv4".
It's upper case, because usually this is a const variable, and because
it reminds of the NM_IS_IPv4(addr_family) macro. That letter case
is unusual, but it makes sense to me for the special purpose that this
variable has.
Anyway. The naming of this variable is a different point. Let's
use the variable name that is consistent and widely used.
We don't run glib-mkenums for certain sources like "core" and
"libnm-glib-aux".
These annotations have no effect. Drop them.
They also mess with the automated formatting.
We always call nl_recv() in a loop. If it would be necessary to clear
the variable, then it would need to happen inside the loop. But it's
not necessary.
Instead of allocating a receive buffer for each nl_recv() call, re-use a
pre-allocated buffer.
The buffer is part of NMPlatform and kept around. As we would not have more
than one NMPlatform instance per netns, we waste a limited amount of
memory.
The buffer gets initialized with 32k, which should be large enough for
any rtnetlink message that we might receive. As before, if the buffer
would be too small, then the first large message would be lost (as we don't
peek). But then we would allocate a larger buffer, resync the platform cache
and recover.
Add parameter to accept a pre-allocated buffer for nl_recv(). In
practice, rtnetlink messages are not larger than 32k, so we can always
pre-allocate it and avoid the need to heap allocate it.
In the past, nl_recv() was libnl3 code and thus used malloc()/realloc() to
allocate the buffer. That meant, we should free the buffer with libc's free()
function instead of glib's g_free(). That is what nm_auto_free is for.
Nowadays, nl_recv() is forked and glib-ified, and uses the glib wrappers to
allocate the buffer. Thus the buffer should also be freed with g_free()
(and gs_free).
In practice there is no difference, because glib's allocation directly
uses libc's malloc/free. This is purely a matter of style.
We need to support systems where there are hundereds of thousand routes
(e.g. BGP software).
For that, we will ignore routes that have a rtm_protocol value which was
certainly not created by NetworkManager. Before we had a BPF filer to
filter them out, but that had issues and is removed for now. Still,
don't put those routes in the platform cache and ignore them early.
Note that we still deserialize the RTM_NEWROUTE message to a NMPObject,
and don't shortcut earlier. The reason is that we should still call
delayed_action_refresh_all_in_progress() and handle the RTM_GETROUTE
response, even if the route has a different protocol. So we error out
later, shortly before putting the object in the cache. This means we
will malloc() a NMPObject and initialize it, but that is probably cheap,
compared to the first problem that the process already had to wake up
and read the netlink socket.
This restores the effective behavior of the BPF filter, albeit with a
higher overhead, as the route is rejected later. The important part for
now is that we stick to the behavior of not caching certain routes of a
certain protocol. If that can in the future be optimized (e.g. by a new
BPF filter), then we should do that. But for now that would be only a
future performance improvement, which requires new profiling first and
that has not the highest priority. Note that not caching certain routes
already should reduce the largest part of the overhead that those routes
brings. Whether this form is sufficient to reach the expected
performance goals, needs to be measured in the future.
The socket BPF filter operates on the entire message (skb). One netlink
message (for RTM_NEWROUTE) usually contains a list of routes (struct
nlmsghdr), but the BPF filter was only looking at the first route to decide
whether to throw away the entire message. That is wrong.
This causes that we miss routes. It also means that the response to our
RTM_GETROUTE dump request could be filtered and we poll/wait (unsuccessfully)
for it:
<trace> [1641829930.4111] platform-linux: netlink: read: wait for ACK for sequence number 26...
To get this right, the BPF filter would have to look at all routes in the
list to find out whether there are any we want to receive (and if
there are any, to pass the entire message, regardless that is might also
contain routes we want to ignore). As BPF does not support loops, that
is not trivial. Arguably, we still could do something where we look at a
bounded, unrolled number of routes, in the hope that there are not more
routes in one message that we support. The problem is that this BPF
filter is supposed to filter out massive amounts of routes, so most
messages will be filled to the brim with routes and we would have to
expect a high number of routes in the RTM_NEWROUTE that need checking.
We could still ignore routes in _new_from_nl_route(), to enter the
cache. That means, we would catch them later than with the BPF filter,
but still before a lot of the overhead of handling them. That probably
will be done next, but for now revert the commit.
This reverts commit e9ca5583e5.
https://bugzilla.redhat.com/show_bug.cgi?id=2037411
The force-commit flag is used to force the commit of an address or a
route from DHCP/RA even when it was removed from platform externally
(for example because it expired). Routes generated from the l3cd
should also have the flag set.
Without this, NM properly re-adds the DHCP address after the lease is
lost and obtained again, but fails to add the prefix-route.
Fixes: 2838b1c5e8 ('core: track force-commit flag for l3cd and platform objects')
https://bugzilla.redhat.com/show_bug.cgi?id=2033991https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1049
Try to debug a hang in platform code, presumably during poll().
This logging seems useful for debugging this particular issue,
but it might be useful in general.
Routing daemons can add a large amount of routes to the
system. Currently NM receives netlink notifications for all those
routes and exposes them on D-Bus. With many routes, the daemon becomes
increasingly slow and uses a lot of memory.
The rtm_protocol field of the route indicates the source of the
route. From /usr/include/linux/rtnetlink.h, the allowed values are:
#define RTPROT_UNSPEC 0
#define RTPROT_REDIRECT 1 /* Route installed by ICMP redirects;
not used by current IPv4 */
#define RTPROT_KERNEL 2 /* Route installed by kernel */
#define RTPROT_BOOT 3 /* Route installed during boot */
#define RTPROT_STATIC 4 /* Route installed by administrator */
/* Values of protocol >= RTPROT_STATIC are not interpreted by kernel;
they are just passed from user and back as is.
It will be used by hypothetical multiple routing daemons.
Note that protocol values should be standardized in order to
avoid conflicts.
*/
#define RTPROT_GATED 8 /* Apparently, GateD */
#define RTPROT_RA 9 /* RDISC/ND router advertisements */
#define RTPROT_MRT 10 /* Merit MRT */
#define RTPROT_ZEBRA 11 /* Zebra */
#define RTPROT_BIRD 12 /* BIRD */
#define RTPROT_DNROUTED 13 /* DECnet routing daemon */
#define RTPROT_XORP 14 /* XORP */
#define RTPROT_NTK 15 /* Netsukuku */
#define RTPROT_DHCP 16 /* DHCP client */
#define RTPROT_MROUTED 17 /* Multicast daemon */
#define RTPROT_KEEPALIVED 18 /* Keepalived daemon */
#define RTPROT_BABEL 42 /* Babel daemon */
#define RTPROT_OPENR 99 /* Open Routing (Open/R) Routes */
#define RTPROT_BGP 186 /* BGP Routes */
#define RTPROT_ISIS 187 /* ISIS Routes */
#define RTPROT_OSPF 188 /* OSPF Routes */
#define RTPROT_RIP 189 /* RIP Routes */
#define RTPROT_EIGRP 192 /* EIGRP Routes */
Since NM uses only values <= RTPROT_STATIC, plus RTPROT_RA and
RTPROT_DHCP, add a BPF filter to the netlink socket to discard
notifications for other route types.
https://bugzilla.redhat.com/show_bug.cgi?id=1861527https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1038