Add nm_platform_kernel_support_get_full() to allow fetching the support
state without setting it to the compile time default.
Also, use g_atomic_int_get() to access _nm_platform_kernel_support_state
values. We should not access static variables without synchronization.
Better get it correct in any case than fast.
This parameter really affects whether a candidate in @addresses_prune will be
considered or not. Since we already construct the prune list separately, this
parameter is at the wrong place.
This requires us to re-implement nm_platform_lookup_clone(). While the
function has a predicate callback that we could use for this purpose,
I will later add a separate predicate argument to
nm_platform_ip_address_get_prune_list(). When that happens, it would
be cumbersome to chain the two function pointers. Instead, reimplement
nm_platform_lookup_clone().
Follow the pattern of nm_platform_ip_route_sync(), which also accepts
the list of addresses that are potential candidates for removal.
This allows the caller to carefully construct the list of addresses
which are possibly removed, so that sync (possibly) only adds new
addresses.
It is beneficial to have both address families side by side.
A lot of operations are exactly the same, so it's preferable to see
that. Especially in the cases where they differ, it's preferable to see
how they differ (and why).
Userspace cannot add IPv6 routes with metric 0. Trying to do that, will
be coerced by kernel to route metric 1024. For IPv4 this is different,
and metric zero is commonly allowed.
However, kernel itself can add IPv6 routes with metric zero:
# ip -6 route show table local
local fe80::2029:c7ff:fec9:698a dev v proto kernel metric 0 pref medium
That means, we must not treat route metric zero special for most cases.
Only, when we want to add routes (based on user configuration), we must
coerce a route metric of zero to 1024.
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/563
nm_platform_ethtool_init_ring() only has one caller. It's simpler to
drop the function and implement it at the only place where it is needed.
Maybe there could be a place for a function to initialize NMEthtoolRingState,
one option after the other. However, at the moment there is only one
user, so don't implement it.
This fixes various minor issues:
- the function had a NMPlatform argument, although the argument
is not used. Thus function merely operates on a NMEthtoolRingState
instance and shouldn't have a nm_platform_*() name.
- nm_platform_ethtool_init_ring() returned a boolean, but all
code paths (except assertion failures) returned success.
- as the function returned an error status, the caller was compelled
to handle an error that could never happen.
- the option was specified by name, although we already have a more
efficient way to express the option: the NMEthtoolID. Also, the
caller already needed to resolve the name to the NMEthtoolID, so
there was no need to again lookup the ID by name.
Rework qdisc synchronization. The previous implementation added all
known qdiscs and removed unneeded ones from platform; this had some
problems:
- kernel doesn't allow to add (with exclusive flag) a qdisc if one
with the same parent already exists;
- if we use the replace flag instead of add, then it becomes possible
to add a new qdisc with the same parent of an existing one. However
if the existing qdisc is of the same kind, kernel will try to to
change() it, which fails for some qdiscs (e.g. sfq).
- kernel doesn't allow to delete a qdisc with handle of zero because
that is the default qdisc and can only be replaced;
Fix that.
We already have NMEthtoolID to handle coalesce options in a way that is
convenient programmatically. That is, we can iterate over all valid
coalesce options (it's just an integer) and use that in a more generic
way.
If NMEthtoolCoalesceState names all fields explicitly, we need explicit
code that names each coalesce option. Especially since NMEthtoolCoalesceState
is an internal and intermediate data structure, this is cumbersome
and unnecessary.
Thereby it also fixes the issue that nm_platform_ethtool_init_coalesce() has a
NMPlatform argument without actually needing it.
nm_platform_ethtool_init_coalesce() does not operate on a NMPlatform
instance, and should not have the appearance of being a method of
NMPlatform.
Only in one moment we need the old and requested settings together:
during _ethtool_coalesce_set(). But for that we shouldn't track both
states in "NMEthtoolCoalesceState".
Simplify "NMEthtoolCoalesceState" to only contain one set of options.
By tracking less state, the code becomes simpler, because you don't
need to wonder where the old and requested state is used.
This is also what iproute2 does ([1]) when creating a default broadcast address
with `ip addr add 192.168.1.5/24 brd + dev eth0`.
Also, kernel does in fib_add_ifaddr() ([2]):
```
__be32 addr = ifa->ifa_local;
__be32 prefix = ifa->ifa_address & mask;
...
/* Add broadcast address, if it is explicitly assigned. */
if (ifa->ifa_broadcast && ifa->ifa_broadcast != htonl(0xFFFFFFFF))
fib_magic(RTM_NEWROUTE, RTN_BROADCAST, ifa->ifa_broadcast, 32,
prim, 0);
if (!ipv4_is_zeronet(prefix) && !(ifa->ifa_flags & IFA_F_SECONDARY) &&
(prefix != addr || ifa->ifa_prefixlen < 32)) {
if (!(ifa->ifa_flags & IFA_F_NOPREFIXROUTE))
fib_magic(RTM_NEWROUTE,
dev->flags & IFF_LOOPBACK ? RTN_LOCAL : RTN_UNICAST,
prefix, ifa->ifa_prefixlen, prim,
ifa->ifa_rt_priority);
/* Add network specific broadcasts, when it takes a sense */
if (ifa->ifa_prefixlen < 31) {
fib_magic(RTM_NEWROUTE, RTN_BROADCAST, prefix, 32,
prim, 0);
fib_magic(RTM_NEWROUTE, RTN_BROADCAST, prefix | ~mask,
32, prim, 0);
}
}
```
Which means by default kernel already adds those special broadcast routes which
are identical to what we configure with IFA_BROADCAST. However, kernel too bases
them on the peer (IFA_ADDRESS).
[1] https://git.kernel.org/pub/scm/network/iproute2/iproute2.git/tree/ip/ipaddress.c?id=d5391e186f04214315a5a80797c78e50ad9f5271#n2380
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/ipv4/fib_frontend.c?id=bef1d88263ff769f15aa0e1515cdcede84e61d15#n1109
- track the broadcast address in NMPlatformIP4Address. For addresses
that we receive from kernel and that we cache in NMPlatform, this
allows us to show the additional information. For example, we
can see it in debug logging.
- when setting the address, we still mostly generate our default
broadcast address. This is done in the only relevant caller
nm_platform_ip4_address_sync(). Basically, we merely moved setting
the broadcast address to the caller.
That is, because no callers explicitly set the "use_ip4_broadcast_address"
flag (yet). However, in the future some caller might want to set an explicit
broadcast address.
In practice, we currently don't support configuring special broadcast
addresses in NetworkManager. Instead, we always add the default one with
"address|~netmask" (for plen < 31).
Note that a main point of IFA_BROADCAST is to add a broadcast route to
the local table. Also note that kernel anyway will add such a
"address|~netmask" route, that is regardless whether IFA_BROADCAST is
set or not. Hence, setting it or not makes very little difference for
normal broadcast addresses -- because kernel tends to add this route either
way. It would make a difference if NetworkManager configured an unusual
IFA_BROADCAST address or an address for prefixes >= 31 (in which cases
kernel wouldn't add them automatically). But we don't do that at the
moment.
So, while what NM does has little effect in practice, it still seems
more correct to add the broadcast address, only so that you see it in
`ip addr show`.
In several cases, the layer 2 and layer 3 type are very similar, also from
kernel's point of view. For example, "gre"/"gretap" and "ip6tnl"/"ip6gre"/"ip6gretap"
and "macvlan"/"macvtap".
While it makes sense that these have different NMLinkType types
(NM_LINK_TYPE_MACV{LAN,TAP}) and different NMPObject types
(NMPObjectLnkMacv{lan,tap}), it makes less sense that they have
different NMPlatformLnk* structs.
Remove the NMPlatformLnkMacvtap typedef. A typedef does not make things simpler,
but is rather confusing. Because several API that we would usually have, does
not exist for the typedef (e.g. there is no nm_platform_lnk_macvtap_to_string()).
Note that we also don't have such a typedef for NMPlatformLnkIp6Tnl
and NMPlatformLnkGre, which has the same ambiguity between the link type
and the struct with the data.
The abbreviations "ns" and "ms" seem not very clear to me. Spell them
out to nsec/msec. Also, in parts we already used the longer abbreviations,
so it wasn't consistent.
- systemd-networkd and initscripts both support it.
- it seems suggested to configure routes with scope "link" on AWS.
- the scope is only supported for IPv4 routes. Kernel ignores the
attribute for IPv6 routes.
- we don't support the aliases like "link" or "global". Instead
only the numeric value is supported. This is different from
systemd-networkd, which accepts names like "global" and "link",
but no numerical values. I think restricting ourself only to
the aliases unnecessarily limits what is possible on netlink.
The alternative would be to allow aliases and numbers both,
but that causes multiple ways to define something and has
thus downsides. So, only numeric values.
- when setting rtm_scope to RT_SCOPE_NOWHERE (0, the default), kernel
will coerce that to RT_SCOPE_LINK. This ambiguity of nowhere vs. link
is a problem, but we don't do anything about it.
- The other problem is, that when deleting a route with scope RT_SCOPE_NOWHERE,
this acts as a wild care and removes the first route that matches (given the
other route attributes). That means, NetworkManager has no meaningful
way to delete a route with scope zero, there is always the danger that
we might delete the wrong route. But this is nothing new to this
patch. The problem existed already previously, except that
NetworkManager could only add routes with scope nowhere (i.e. link).
Track whether IP addresses were added by NM or externally. In this way
it becomes possible in a later commit to add prefix route only for
addresses added by NM.
IPv6 router advertisement messages contain the following parameters
(RFC 4861):
- Reachable time: 32-bit unsigned integer. The time, in
milliseconds, that a node assumes a neighbor is reachable after
having received a reachability confirmation. Used by the Neighbor
Unreachability Detection algorithm. A value of zero means
unspecified (by this router).
- Retrans Timer: 32-bit unsigned integer. The time, in milliseconds,
between retransmitted Neighbor Solicitation messages. Used by
address resolution and the Neighbor Unreachability Detection
algorithm. A value of zero means unspecified (by this router).
Currently NM ignores them; however, since it leaves accept_ra=1, the
kernel parses RAs and applies those parameters for us [1].
In the next commit kernel handling of RAs will be disabled, so let NM
set those neighbor-related parameters.
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/ipv6/ndisc.c?h=v5.2#n1353