For routes and routing rules, kernel uses a certain (not stictly defined) set
of attributes to decide whether to routes/rules are identical.
That is a problem, as different kernel versions disagree on whether
two routes/rules are the same (EEXIST) or not.
Note that when NetworkManager tries to add a rule with protocol set to
anything but RTPROT_UNSPEC, then kernel will ignore the attribute if it
doesn't have support for it. Meaning: the added rule will have a
different protocol setting then intended.
Note that NMPRulesManager will add a rule if it doesn't find it in the
platform cache so far. That means, when looking into the platform cache
we must ignore or honor the protocol like kernel does.
This does not only affect FRA_PROTOCOL, but all attributes where kernel
and NetworkManager disagrees. But the protocol is the most prominent
one, because the rules tracked by nmp_rules_manager_track_default()
specify the protocol.
(cherry picked from commit ef4f8ccf6d)
Next we will need to detect more kernel features. First refactor the
handling of these to require less code changes and be more efficient.
A plain nm_platform_kernel_support_get() only reqiures to access an
array in the common case.
The other important change is that the function no longer requires a
NMPlatform instance. This allows us to check kernel support from
anywhere. The only thing is that we require kernel support to be
initialized before calling this function. That means, an NMPlatform
instance must have detected support before.
(cherry picked from commit ee269b318e)
In some cases it is convenient to specify ranges of bridge vlans, as
already supported by iproute2 and natively by kernel. With this commit
it becomes possible to add a range in this way:
nmcli connection modify eth0-slave +bridge-port.vlans "100-200 untagged"
vlan ranges can't be PVIDs because only one PVID vlan can exist.
https://bugzilla.redhat.com/show_bug.cgi?id=1652910
(cherry picked from commit 7093515777)
Routing rules are unlike addresses or routes not tied to an interface.
NetworkManager thinks in terms of connection profiles. That works well
for addresses and routes, as one profile configures addresses and routes
for one device. For example, when activating a profile on a device, the
configuration does not interfere with the addresses/routes of other
devices. That is not the case for routing rules, which are global, netns-wide
entities.
When one connection profile specifies rules, then this per-device configuration
must be merged with the global configuration. And when a device disconnects later,
the rules must be removed.
Add a new NMPRulesManager API to track/untrack routing rules. Devices can
register/add there the routing rules they require. And the sync method will
apply the configuration. This is be implemented on top of NMPlatform's
caching API.
Add and implement NMPlatformRoutingRule types and let the platform cache
handle rules.
Rules are special in two ways:
- they don't have an ifindex. That makes them different from all other
currently existing NMPlatform* types, which have an "ifindex" field and
"implement" NMPlatformObjWithIfindex.
- they have an address family, but contrary to addresses and routes, there
is only one NMPlatformRoutingRule object to handle both address
families.
Both of these points require some special considerations.
Kernel treats routing-rules quite similar to routes. That is, kernel
allows to add different rules/routes, as long as they differ in certain
fields. These "fields" make up the identity of the rules/routes. But
in practice, it's not defined which fields contribute to the identity
of these objects. That makes using the netlink API very hard. For
example, when kernel gains support for a new attribute which
NetworkManager does not know yet, then users can add two rules/routes
that look the same to NetworkManager. That can easily result in cache
inconsistencies.
Another problem is, that older kernel versions may not yet support all
fields, which NetworkManager (and newer kernels) considers for identity.
The older kernel will not simply reject netlink messages with these unknown
keys, instead it will proceed adding the route/rule without it. That means,
the added route/rule will have a different identity than what NetworkManager
intended to add.
The function is unused. It would require redesign to work with
future changes, and since it's unused, just drop it.
The long reasoning is:
Currently, a refresh-all is tied to an NMPObjectType. However, with
NMPObjectRoutingRule (for policy-routing-rules) that will no longer
be the case.
That is because NMPObjectRoutingRule will be one object type for
AF_INET and AF_INET6. Contrary to IPv4 addresses and routes, where
there are two sets of NMPObject types.
The reason is, that it's preferable to treat IPv4 and IPv6 objects
similarly, that is: as the same type with an address family property.
That also follows netlink, which uses RTM_GET* messages for both
address families, and the address family is expressed inside the
message.
But then an API like nm_platform_refresh_all() makes little sense,
it would require at least an addr_family argument. But since the
API is unused, just drop it.
Until now, all implemented NMPObject types have an ifindex field (from
links, addresses, routes, qdisc to tfilter).
The NMPObject structure contains a union of all available types, that
makes it easier to down-case from an NMPObject pointer to the actual
content.
The "object" field of NMPObject of type NMPlatformObject is the lowest
common denominator.
We will add NMPlatformRoutingRules (for policy routing rules). That type
won't have an ifindex field.
Hence, drop the "ifindex" field from NMPlatformObject type. But also add
a new type NMPlatformObjWithIfindex, that can represent all types that
have an ifindex.
We will need more flags.
WireGuard internal tools solve this by embedding the change flags inside
the structure that corresponds to NMPlatformLnkWireGuard. We don't do
that, NMPlatformLnkWireGuard is only for containing the information about
the link.
The caller may not wish to replace existing peers, but only update/add
the peers explicitly passed to nm_platform_link_wireguard_change().
I think that is in particular interesting, because for the most part
NetworkManager will configure the same set of peers over and over again
(whenever we resolve the DNS name of an IP endpoint of the WireGuard
peer).
At that point, it seems disruptive to drop all peers and re-add them
again. Setting @replace_peers to %FALSE allows to only update/add.
We still don't use getnameinfo(). This is used for logging,
where we want to see a string representation that is as close
as possible to the actual bytes (to spot differences). It should
not be obfuscated by a libc function out of our control.
Also fix the notation for the IPv6 scope ID to use the common '%'
character.
We need more information what failed. Don't only return success/failure,
but an error number.
Note that we still don't actually return an error number. Only
the link_add() function is changed to return an nm-error integer.
Platform had it's own scheme for reporting errors: NMPlatformError.
Before, NMPlatformError indicated success via zero, negative integer
values are numbers from <errno.h>, and positive integer values are
platform specific codes. This changes now according to nm-error:
success is still zero. Negative values indicate a failure, where the
numeric value is either from <errno.h> or one of our error codes.
The meaning of positive values depends on the functions. Most functions
can only report an error reason (negative) and success (zero). For such
functions, positive values should never be returned (but the caller
should anticipate them).
For some functions, positive values could mean additional information
(but still success). That depends.
This is also what systemd does, except that systemd only returns
(negative) integers from <errno.h>, while we merge our own error codes
into the range of <errno.h>.
The advantage is to get rid of one way how to signal errors. The other
advantage is, that these error codes are compatible with all other
nm-errno values. For example, previously negative values indicated error
codes from <errno.h>, but it did not entail error codes from netlink.
While nm_utils_inet*_ntop() accepts a %NULL buffer to fallback
to a static buffer, don't do that.
I find the possibility of using a static buffer here error prone
and something that should be avoided. There is of course the downside,
that in some cases it requires an additional line of code to allocate
the buffer on the stack as auto-variable.
Now that we have other helper function on platfrom for setting
IP configuration sysctls, rename the function to set the hop-limit
to match the pattern.
Checking whether the link exists in the cache, before talking to kernel
serves no purpose.
- in all cases, the caller already has a good indication that the link
in fact exists. That is, because the caller makes decisions on what to
do, based on what platform told it earlier. Thus, the check usually succeeds
anyway.
- in the unexpected case it doesn't succeed, we
- should not silently return without logging at least a message
- we possibly still want to send the netlink message to kernel,
just to have it fail. Note that the ifindex is indeed the identifier
for the link, so there is no danger of accidentally killing the
wrong link.
Well, theoretically there is, because the kernel's ifindex counter can
wrap or get reused when moving links between namespaces. But checking
the cache would not protect against that anyway! Worst case, the cache
would already have the impostor link and would not prevent from doing
the wrong thing. After all, they do have the same identifier, so how
would we know that this is in fact a different link?
We want that all code paths assert strictly and gracefully.
That means, if we have function nm_platform_link_get() which calls
nm_platform_link_get_obj(), then we don't need to assert the same things
twice. Don't have the calling function assert itself, if it is obvious
that the first thing that it does, is calling a function that itself
asserts the same conditions.
On the other hand, it simply indicates a bug passing a non-positive
ifindex to any of these platform functions. No longer let
nm_platform_link_get_obj() handle negative ifindex gracefully. Instead,
let it directly pass it to nmp_cache_lookup_link(), which eventually
does a g_return_val_if_fail() check. This quite possible enables
assertions on a lot of code paths. But note that g_return_val_if_fail()
is graceful and does not lead to a crash (unless G_DEBUG=fatal-criticals
is set for debugging).
We want to assert for valid input arguments, but we don't want
multiple assertions for the same.
Move the assertion from nm_platform_link_get() to
nm_platform_link_get_obj().
That way, nm_platform_link_get_obj() also checks the input arguments.
At the same time, nm_platform_link_get() gets simpler and still does
the same amount of assertions.
Add helper nm_platform_link_get_ifi_flags() to access the
ifi-flags.
This replaces the internal API _link_get_flags() and makes it public.
However, the return value also allows to distinguish between errors
and valid flags.
Also, consider non-visible links. These are links that are in netlink,
but not visible in udev. The ifi-flags are inherrently netlink specific,
so it seems wrong to pretend that the link doesn't exist.
In the past, the headers "linux/if.h" and "net/if.h" were incompatible.
That means, we can either include one or the other, but not both.
This is fixed in the meantime, however the issue still exists when
building against older kernel/glibc.
That means, including one of these headers from a header file
is problematic. In particular if it's a header like "nm-platform.h",
which itself is dragged in by many other headers.
Avoid that by not including these headers from "platform.h", but instead
from the source files where needed (or possibly from less popular header
files).
Currently there is no problem. However, this allows an unknowing user to
include <net/if.h> at the same time with "nm-platform.h", which is easy
to get wrong.
- previously, parsing wireguard genl data resulted in memory corruption:
- _wireguard_update_from_allowedips_nla() takes pointers to
allowedip = &g_array_index (buf->allowedips, NMWireGuardAllowedIP, buf->allowedips->len - 1);
but resizing the GArray will invalidate this pointer. This happens
when there are multiple allowed-ips to parse.
- there was some confusion who owned the allowedips pointers.
_wireguard_peers_cpy() and _vt_cmd_obj_dispose_lnk_wireguard()
assumed each peer owned their own chunk, but _wireguard_get_link_properties()
would not duplicate the memory properly.
- rework memory handling for allowed_ips. Now, the NMPObjectLnkWireGuard
keeps a pointer _allowed_ips_buf. This buffer contains the instances for
all peers.
The parsing of the netlink message is the complicated part, because
we don't know upfront how many peers/allowed-ips we receive. During
construction, the tracking of peers/allowed-ips is complicated,
via a CList/GArray. At the end of that, we prettify the data
representation and put everything into two buffers. That is more
efficient and simpler for user afterwards. This moves complexity
to the way how the object is created, vs. how it is used later.
- ensure that we nm_explicit_bzero() private-key and preshared-key. However,
that only works to a certain point, because our netlink library does not
ensure that no data is leaked.
- don't use a "struct sockaddr" union for the peer's endpoint. Instead,
use a combintation of endpoint_family, endpoint_port, and
endpoint_addr.
- a lot of refactoring.
Also, add two more features "tx-tcp-segmentation" and
"tx-tcp6-segmentation". There are two reasons for that:
- systemd-networkd supports setting these two features,
so lets support them too (apparently they are important
enough for networkd).
- these two features are already implicitly covered by "tso".
Like for the "ethtool" program, "tso" is an alias for several
actual features. By adding two features that are already
also covered by an alias (which sets multiple kernel names
at once), we showcase how aliases for the same feature can
coexist. In particular, note how setting
"tso on tx-tcp6-segmentation off" will behave as one would
expect: all 4 tso features covered by the alias are enabled,
except that particular one.