We need more information what failed. Don't only return success/failure,
but an error number.
Note that we still don't actually return an error number. Only
the link_add() function is changed to return an nm-error integer.
Platform had it's own scheme for reporting errors: NMPlatformError.
Before, NMPlatformError indicated success via zero, negative integer
values are numbers from <errno.h>, and positive integer values are
platform specific codes. This changes now according to nm-error:
success is still zero. Negative values indicate a failure, where the
numeric value is either from <errno.h> or one of our error codes.
The meaning of positive values depends on the functions. Most functions
can only report an error reason (negative) and success (zero). For such
functions, positive values should never be returned (but the caller
should anticipate them).
For some functions, positive values could mean additional information
(but still success). That depends.
This is also what systemd does, except that systemd only returns
(negative) integers from <errno.h>, while we merge our own error codes
into the range of <errno.h>.
The advantage is to get rid of one way how to signal errors. The other
advantage is, that these error codes are compatible with all other
nm-errno values. For example, previously negative values indicated error
codes from <errno.h>, but it did not entail error codes from netlink.
While nm_utils_inet*_ntop() accepts a %NULL buffer to fallback
to a static buffer, don't do that.
I find the possibility of using a static buffer here error prone
and something that should be avoided. There is of course the downside,
that in some cases it requires an additional line of code to allocate
the buffer on the stack as auto-variable.
Now that we have other helper function on platfrom for setting
IP configuration sysctls, rename the function to set the hop-limit
to match the pattern.
Checking whether the link exists in the cache, before talking to kernel
serves no purpose.
- in all cases, the caller already has a good indication that the link
in fact exists. That is, because the caller makes decisions on what to
do, based on what platform told it earlier. Thus, the check usually succeeds
anyway.
- in the unexpected case it doesn't succeed, we
- should not silently return without logging at least a message
- we possibly still want to send the netlink message to kernel,
just to have it fail. Note that the ifindex is indeed the identifier
for the link, so there is no danger of accidentally killing the
wrong link.
Well, theoretically there is, because the kernel's ifindex counter can
wrap or get reused when moving links between namespaces. But checking
the cache would not protect against that anyway! Worst case, the cache
would already have the impostor link and would not prevent from doing
the wrong thing. After all, they do have the same identifier, so how
would we know that this is in fact a different link?
We want that all code paths assert strictly and gracefully.
That means, if we have function nm_platform_link_get() which calls
nm_platform_link_get_obj(), then we don't need to assert the same things
twice. Don't have the calling function assert itself, if it is obvious
that the first thing that it does, is calling a function that itself
asserts the same conditions.
On the other hand, it simply indicates a bug passing a non-positive
ifindex to any of these platform functions. No longer let
nm_platform_link_get_obj() handle negative ifindex gracefully. Instead,
let it directly pass it to nmp_cache_lookup_link(), which eventually
does a g_return_val_if_fail() check. This quite possible enables
assertions on a lot of code paths. But note that g_return_val_if_fail()
is graceful and does not lead to a crash (unless G_DEBUG=fatal-criticals
is set for debugging).
We want to assert for valid input arguments, but we don't want
multiple assertions for the same.
Move the assertion from nm_platform_link_get() to
nm_platform_link_get_obj().
That way, nm_platform_link_get_obj() also checks the input arguments.
At the same time, nm_platform_link_get() gets simpler and still does
the same amount of assertions.
Add helper nm_platform_link_get_ifi_flags() to access the
ifi-flags.
This replaces the internal API _link_get_flags() and makes it public.
However, the return value also allows to distinguish between errors
and valid flags.
Also, consider non-visible links. These are links that are in netlink,
but not visible in udev. The ifi-flags are inherrently netlink specific,
so it seems wrong to pretend that the link doesn't exist.
In the past, the headers "linux/if.h" and "net/if.h" were incompatible.
That means, we can either include one or the other, but not both.
This is fixed in the meantime, however the issue still exists when
building against older kernel/glibc.
That means, including one of these headers from a header file
is problematic. In particular if it's a header like "nm-platform.h",
which itself is dragged in by many other headers.
Avoid that by not including these headers from "platform.h", but instead
from the source files where needed (or possibly from less popular header
files).
Currently there is no problem. However, this allows an unknowing user to
include <net/if.h> at the same time with "nm-platform.h", which is easy
to get wrong.
- previously, parsing wireguard genl data resulted in memory corruption:
- _wireguard_update_from_allowedips_nla() takes pointers to
allowedip = &g_array_index (buf->allowedips, NMWireGuardAllowedIP, buf->allowedips->len - 1);
but resizing the GArray will invalidate this pointer. This happens
when there are multiple allowed-ips to parse.
- there was some confusion who owned the allowedips pointers.
_wireguard_peers_cpy() and _vt_cmd_obj_dispose_lnk_wireguard()
assumed each peer owned their own chunk, but _wireguard_get_link_properties()
would not duplicate the memory properly.
- rework memory handling for allowed_ips. Now, the NMPObjectLnkWireGuard
keeps a pointer _allowed_ips_buf. This buffer contains the instances for
all peers.
The parsing of the netlink message is the complicated part, because
we don't know upfront how many peers/allowed-ips we receive. During
construction, the tracking of peers/allowed-ips is complicated,
via a CList/GArray. At the end of that, we prettify the data
representation and put everything into two buffers. That is more
efficient and simpler for user afterwards. This moves complexity
to the way how the object is created, vs. how it is used later.
- ensure that we nm_explicit_bzero() private-key and preshared-key. However,
that only works to a certain point, because our netlink library does not
ensure that no data is leaked.
- don't use a "struct sockaddr" union for the peer's endpoint. Instead,
use a combintation of endpoint_family, endpoint_port, and
endpoint_addr.
- a lot of refactoring.
Also, add two more features "tx-tcp-segmentation" and
"tx-tcp6-segmentation". There are two reasons for that:
- systemd-networkd supports setting these two features,
so lets support them too (apparently they are important
enough for networkd).
- these two features are already implicitly covered by "tso".
Like for the "ethtool" program, "tso" is an alias for several
actual features. By adding two features that are already
also covered by an alias (which sets multiple kernel names
at once), we showcase how aliases for the same feature can
coexist. In particular, note how setting
"tso on tx-tcp6-segmentation off" will behave as one would
expect: all 4 tso features covered by the alias are enabled,
except that particular one.
We commonly don't use the glib typedefs for char/short/int/long,
but their C types directly.
$ git grep '\<g\(char\|short\|int\|long\|float\|double\)\>' | wc -l
587
$ git grep '\<\(char\|short\|int\|long\|float\|double\)\>' | wc -l
21114
One could argue that using the glib typedefs is preferable in
public API (of our glib based libnm library) or where it clearly
is related to glib, like during
g_object_set (obj, PROPERTY, (gint) value, NULL);
However, that argument does not seem strong, because in practice we don't
follow that argument today, and seldomly use the glib typedefs.
Also, the style guide for this would be hard to formalize, because
"using them where clearly related to a glib" is a very loose suggestion.
Also note that glib typedefs will always just be typedefs of the
underlying C types. There is no danger of glib changing the meaning
of these typedefs (because that would be a major API break of glib).
A simple style guide is instead: don't use these typedefs.
No manual actions, I only ran the bash script:
FILES=($(git ls-files '*.[hc]'))
sed -i \
-e 's/\<g\(char\|short\|int\|long\|float\|double\)\>\( [^ ]\)/\1\2/g' \
-e 's/\<g\(char\|short\|int\|long\|float\|double\)\> /\1 /g' \
-e 's/\<g\(char\|short\|int\|long\|float\|double\)\>/\1/g' \
"${FILES[@]}"
Add platform support for IP6GRE and IP6GRETAP tunnels. The former is a
virtual tunnel interface for GRE over IPv6 and the latter is the L2
variant.
The platform code internally reuses and extends the same structure
used by IPv6 tunnels.
Certain platform operations are logged both in nm-platform.c and
nm-linux-platform.c, resulting in duplicate messages. Drop log prints
from the latter.
Add support for a new wireguard link type to the platform code. For now
this only covers querying existing links via genetlink and parsing them
into platform objects.
Devices of different link types can actually have the same MAC address.
We'll want to use this to find a device of a particular type by its
hardware address.
The order we want to enforce is only among addresses with the same
scope, as the kernel always keeps addresses sorted by
scope. Therefore, apply the same sorting to known addresses, so that
we don't try to unnecessary change the order of addresses with
different scopes.
https://bugzilla.redhat.com/show_bug.cgi?id=1578668
Coccinelle:
@@
expression a, b;
@@
-a ? a : b
+a ?: b
Applied with:
spatch --sp-file ternary.cocci --in-place --smpl-spacing --dir .
With some manual adjustments on spots that Cocci didn't catch for
reasons unknown.
Thanks to the marvelous effort of the GNU compiler developer we can now
spare a couple of bits that could be used for more important things,
like this commit message. Standards commitees yet have to catch up.
Without ifindex, adding the direct route to gateway fails:
platform: route-sync: failure to add IPv6 route: fd02::/64 via fd01::1 dev 1635 metric 101 mss 0 rt-src user: No route to host (113); try adding direct route to gateway fd01::1/128 via :: metric 101 mss 0 rt-src user
platform: route: append IPv6 route: fd01::1/128 via :: metric 101 mss 0 rt-src user
platform-linux: delayed-action: schedule wait-for-nl-response (seq 269, timeout in 0.199999195, response-type 0)
platform-linux: delayed-action: handle wait-for-nl-response (any)
platform-linux: netlink: recvmsg: new message NLMSG_ERROR, flags 0, seq 269
platform-linux: netlink: recvmsg: error message from kernel: No such device (19) for request 269
Fixes: c9f89cafdf
Since commit 78ed0a4a23 (device: add
IPv6 link local address via merge-and-apply) we handle also IPv6 link
local addresses like regular addresses. That is, we also add them during
merge-and-apply and sync them via nm_platform_ip6_address_sync().
ip6-address-sync loops over the platform addresses, to find which
addresses shall be deleted, and which shall be deleted in order to
fix the address order/priority. At that point, we must not ignore
link-local addresses anymore, but handle them too.
Otherwise, during each resync we have link local addresses, and
platform-sync thinks that the address order is wrong. That wrongly
leads to remove most addresses and re-adding them.
Fixes: 78ed0a4a23
For completeness, extend the API to support non-persistant
device. That requires that nm_platform_link_tun_add()
returns the file descriptor.
While NetworkManager doesn't create such devices itself,
it recognizes the IFLA_TUN_PERSIST / IFF_PERSIST flag.
Since ip-tuntap (obviously) cannot create such devices,
we cannot add a test for how non-persistent devices look
in the platform cache. Well, we could instead add them
with ioctl directly, but instead, just extend the platform
API to allow for that.
Also, use the function from test-lldp.c to (optionally) use
nm_platform_link_tun_add() to create the tap device.
Switch from "pi on|off" to optinally printing "pi" to indicate
whether the flag is set. That follows ip-tuntap syntax and is
more familiar:
$ ip tuntap help
Usage: ip tuntap { add | del | show | list | lst | help } [ dev PHYS_DEV ]
[ mode { tun | tap } ] [ user USER ] [ group GROUP ]
[ one_queue ] [ pi ] [ vnet_hdr ] [ multi_queue ] [ name NAME ]
Where: USER := { STRING | NUMBER }
GROUP := { STRING | NUMBER }
Also, print the "persist" flag.
Kernel does not all allow to configure a route via a gateway, if the
gateway is not directly reachable.
For non-manually added routes (e.g. from DHCP), we ignore them as a
server configuration errors. For manually added routes, we try to work
around them.
Note that if the user adds a manual route that references a gateway,
maybe he should be required to also add a matching onlink route for
the gateway (or an address that results in a device-route), otherwise
the configuration could be considered invalid. That was however not
done historically, and also, it seems a rather unhelpful behavior.
NetworkManage should just make it work, not not assume anything is
wrong with the configuration. Similarly, for IPv4, the user could
configure the route as onlink, however, that still requires extra
configuration of which the user might not be aware.
This would apply for example, when a connection has method=auto,
and would obtain the routes automatically. It seems sensible to
allow the user to add a route via the gateway, if he ~knows~ that
this particular network will provide such a configuration via DHCP.
In the past however, we tried not to automatically add a device route,
but instead see whether we will get a suitable route via DHCP. If we
wouldn't get such a route, we would however fail the connection.
However, this is really very hard to get right.
We call ip_config_merge_and_apply() possibly before receiving automatic
IP configuration (commit 7070d17ced, "device: reset
@con_ip6_config on failure before RA"). In this case, we could not yet
configure the route. Instead, we also cannot fail (yet), because we should
wait whether we will receive a route that makes this configuration
feasable.
That is hard to get right. How long should we wait? If we get a DHCP lease
and still cannot add the route, should we fail the IP configuration or wait
longer for another lease? Worse, if we decide to fail the IP configuration,
it might not fail the entire activation. Instead, we would only mark the
current address family as failed. If we later get a DHCP lease, should we
retry to add the route again? -- probably yes. If we still fail, we would
need to keep the IP configuration in failed state, regardless that DHCP
succeeded. Part of the problem is, that we are bad at tracking the
failed state per IP method. So, if manual configuration fails but DHCP
succeeds, we get the state wrong. That should be fixed separately, but it
just shows how hard it is to have this route that we currently cannot
add, and wanting to wait for something that might never come, but still
fail at some point.
Instead, if we cannot add a route due to a missing onlink gateway,
just retry and add the /32 or /128 direct route ourself.
Note that for IPv6 routes that have a "src" address which is still
TENTATIVE, we also cannot currently add the route and retry later.
However, that is fundamentally different, because:
- the configuration here is correct, it's only that the address
didn't yet pass IPv6 DAD and kernel is being unhelpful (rh#1457196).
- we only have to wait a few seconds for DAD to complete or fail.
So, it's easy to implement this sensibly.
Move handling non-NM_IP_CONFIG_SOURCE_USER routes first. These are
routes that were added manually by the user in the connection.
Note that there is no change in behavior, because of how
_err_inval_due_to_ipv6_tentative_pref_src() would only accept
user routes already.
Kernel recently got support for exposing TUN/TAP information on netlink
[1], [2], [3]. Add support for it to the platform cache.
The advantage of using netlink is that querying sysctl bypasses the
order of events of the netlink socket. It is out of sync and racy. For
example, platform cache might still think that a tun device exists, but
a subsequent lookup at sysfs might fail because the device was deleted
in the meantime. Another point is, that we don't get change
notifications via sysctl and that it requires various extra syscalls
to read the device information. If the tun information is present on
netlink, put it into the cache. This bypasses checking sysctl while
we keep looking at sysctl for backward compatibility until we require
support from kernel.
Notes:
- we had two link types NM_LINK_TYPE_TAP and NM_LINK_TYPE_TUN. This
deviates from the model of how kernel treats TUN/TAP devices, which
makes it more complicated. The link type of a NMPlatformLink instance
should match what kernel thinks about the device. Point in case,
when parsing RTM_NETLINK messages, we very early need to determine
the link type (_linktype_get_type()). However, to determine the
type of a TUN/TAP at that point, we need to look into nested
netlink attributes which in turn depend on the type (IFLA_INFO_KIND
and IFLA_INFO_DATA), or even worse, we would need to look into
sysctl for older kernel vesions. Now, the TUN/TAP type is a property
of the link type NM_LINK_TYPE_TUN, instead of determining two
different link types.
- various parts of the API (both kernel's sysctl vs. netlink) and
NMDeviceTun vs. NMSettingTun disagree whether the PI is positive
(NM_SETTING_TUN_PI, IFLA_TUN_PI, NMPlatformLnkTun.pi) or inverted
(NM_DEVICE_TUN_NO_PI, IFF_NO_PI). There is no consistent way,
but prefer the positive form for internal API at NMPlatformLnkTun.pi.
- previously NMDeviceTun.mode could not change after initializing
the object. Allow for that to happen, because forcing some properties
that are reported by kernel to not change is wrong, in case they
might change. Of course, in practice kernel doesn't allow the device
to ever change its type, but the type property of the NMDeviceTun
should not make that assumption, because, if it actually changes, what
would it mean?
- note that as of now, new netlink API is not yet merged to mainline Linus
tree. Shortcut _parse_lnk_tun() to not accidentally use unstable API
for now.
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1277457
[2] https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=1ec010e705934c8acbe7dbf31afc81e60e3d828b
[3] https://git.kernel.org/pub/scm/network/iproute2/iproute2-next.git/commit/?id=118eda77d6602616bc523a17ee45171e879d1818https://bugzilla.redhat.com/show_bug.cgi?id=1547213https://github.com/NetworkManager/NetworkManager/pull/77
NMTST_ASSERT_PLATFORM_NETNS_CURRENT() already checks that the current namespace
is correct. Remove the duplicate assertion.
Also, NMP_CACHE_OPS_UNCHANGED is numerically identical to NM_PLATFORM_SIGNAL_NONE.
Use it in the assertion.