I think the code before was correct. At the very least because
we only run the while-loop at most once because multipath routes
are not supported.
However, it seems odd that the while loop checks for
"tlen >= rtnh->rtnh_len"
but later we do
"tlen -= RTNH_ALIGN (rtnh->rtnh_len)"
Well, arguably, tlen itself is aligned to 4 bytes (as kernel sends
the netlink message that way). So, it was indeed fine.
Still, confusing. Try to check more explicitly for the buffer sizes.
nla_data_as() has a static assertion that casting to the pointer is
valid (regarding the alignment of the structure). It also contains
an nm_assert() that the data is in fact large enough.
It's safer, hence prefer it a some places where it makes sense.
- nlmsg_append_struct() determines the size based on the argument.
It avoids typing, but more importantly: it avoids typing redundant
information (which we might get wrong).
- also, declare the structs as const, where possible.
- drop explicit MAX sizes like
static const struct nla_policy policy[IFLA_INET6_MAX+1] = {
The compiler will deduce that.
It saves redundant information (which is possibly wrong). Also,
the max define might be larger than we actually need it, so we
just waste a few bytes of static memory and do unnecesary steps
during validation.
Also, the compiler will catch bugs, if the array size of policy/tb
is too short for what we access later (-Warray-bounds).
- avoid redundant size specifiers like:
static const struct nla_policy policy[IFLA_INET6_MAX+1] = {
...
struct nlattr *tb[IFLA_INET6_MAX+1];
...
err = nla_parse_nested (tb, IFLA_INET6_MAX, attr, policy);
- use the nla_parse*_arr() macros that determine the maxtype
based on the argument types.
- move declaration of "static const struct nla_policy policy" variable
to the beginning, before auto variables.
- drop unneeded temporay error variables.
In practice, we don't fail to create the nlmsg, because in glib
malloc() cannot fail and we always create large enough buffers.
Anyway, handle the error correctly, and reduce the in-progress
counter again.
We will need more flags.
WireGuard internal tools solve this by embedding the change flags inside
the structure that corresponds to NMPlatformLnkWireGuard. We don't do
that, NMPlatformLnkWireGuard is only for containing the information about
the link.
Use the NM_ERRNO_NATIVE() macro that asserts that these errno numbers are
indeed positive. Using the macro also serves as a documentation of what
the meaning of these numbers is.
That is often not obvious, whether we have an nm_errno(), an nm_errno_native()
(from <errno.h>), or another error number (e.g. WaitForNlResponseResult). This
situation already improved by merging netlink error codes (nle),
NMPlatformError enum and <errno.h> as nm_errno(). But we still must
always be careful about not to mix error codes from different
domains or transform them appropriately (like nm_errno_from_native()).
When the logging level is DEBUG or TRACE, we keep all the sysctl
values we read in a cache to log how they change. Currently there is
no limit on the size of this cache and it can take a large amount of
memory.
Implement a LRU cache where the oldest entries are deleted to make
space for new ones.
https://github.com/NetworkManager/NetworkManager/pull/294
Previously, _nm_logging_clear_platform_logging_cache was an extern variable,
and NMLinuxPlatform would set it to a function pointer at certain points.
That's unnecessary complex, also when trying to make nm-logging thread-safe,
it's just more global variables that need to be considered. Don't do it
that way, but just link in a regular function.
Since commit 9ecdba316 ('platform: create netlink messages directly
without libnl-route-3') we're unconditionally setting IFA_ADDRESS to
the peer address, even if there's no peer and it's all zeroes.
The kernel actually stopped caring somewhere around commit caeaba790
('ipv6: add support of peer address') in v3.10, but Ubuntu Touch likes
to run Android's v3.4 on some poorly supported hardware.
Fixes: 9ecdba316chttps://gitlab.freedesktop.org/NetworkManager/NetworkManager/merge_requests/77
The caller may not wish to replace existing peers, but only update/add
the peers explicitly passed to nm_platform_link_wireguard_change().
I think that is in particular interesting, because for the most part
NetworkManager will configure the same set of peers over and over again
(whenever we resolve the DNS name of an IP endpoint of the WireGuard
peer).
At that point, it seems disruptive to drop all peers and re-add them
again. Setting @replace_peers to %FALSE allows to only update/add.
Add cmp/hash functions that correctly honor the well known fields, instead
of doing memcmp/memcpy of the entire sockaddr structure.
Also, move the set function to nm_sock_addr_union_cpy() and
nm_sock_addr_union_cpy_untrusted(). This also gets it right
to ensure all bytes of the union are initialized (to zero).
We need more information what failed. Don't only return success/failure,
but an error number.
Note that we still don't actually return an error number. Only
the link_add() function is changed to return an nm-error integer.
Platform had it's own scheme for reporting errors: NMPlatformError.
Before, NMPlatformError indicated success via zero, negative integer
values are numbers from <errno.h>, and positive integer values are
platform specific codes. This changes now according to nm-error:
success is still zero. Negative values indicate a failure, where the
numeric value is either from <errno.h> or one of our error codes.
The meaning of positive values depends on the functions. Most functions
can only report an error reason (negative) and success (zero). For such
functions, positive values should never be returned (but the caller
should anticipate them).
For some functions, positive values could mean additional information
(but still success). That depends.
This is also what systemd does, except that systemd only returns
(negative) integers from <errno.h>, while we merge our own error codes
into the range of <errno.h>.
The advantage is to get rid of one way how to signal errors. The other
advantage is, that these error codes are compatible with all other
nm-errno values. For example, previously negative values indicated error
codes from <errno.h>, but it did not entail error codes from netlink.
While nm_utils_inet*_ntop() accepts a %NULL buffer to fallback
to a static buffer, don't do that.
I find the possibility of using a static buffer here error prone
and something that should be avoided. There is of course the downside,
that in some cases it requires an additional line of code to allocate
the buffer on the stack as auto-variable.
I think this is preferred over memset(), because it allows the
compiler to better unstand what is happening.
Also, strictly speaking in the C language, %NULL pointers are not
guaranteed to have an all zero bit pattern. Of course, that is already
required on any architecture where NetworkManager is running.
- previously, parsing wireguard genl data resulted in memory corruption:
- _wireguard_update_from_allowedips_nla() takes pointers to
allowedip = &g_array_index (buf->allowedips, NMWireGuardAllowedIP, buf->allowedips->len - 1);
but resizing the GArray will invalidate this pointer. This happens
when there are multiple allowed-ips to parse.
- there was some confusion who owned the allowedips pointers.
_wireguard_peers_cpy() and _vt_cmd_obj_dispose_lnk_wireguard()
assumed each peer owned their own chunk, but _wireguard_get_link_properties()
would not duplicate the memory properly.
- rework memory handling for allowed_ips. Now, the NMPObjectLnkWireGuard
keeps a pointer _allowed_ips_buf. This buffer contains the instances for
all peers.
The parsing of the netlink message is the complicated part, because
we don't know upfront how many peers/allowed-ips we receive. During
construction, the tracking of peers/allowed-ips is complicated,
via a CList/GArray. At the end of that, we prettify the data
representation and put everything into two buffers. That is more
efficient and simpler for user afterwards. This moves complexity
to the way how the object is created, vs. how it is used later.
- ensure that we nm_explicit_bzero() private-key and preshared-key. However,
that only works to a certain point, because our netlink library does not
ensure that no data is leaked.
- don't use a "struct sockaddr" union for the peer's endpoint. Instead,
use a combintation of endpoint_family, endpoint_port, and
endpoint_addr.
- a lot of refactoring.
Move NMLinuxPlatformPrivate earlier.
In the past, I moved the declaration of NMLinuxPlatformPrivate
after utility functions which are independent from platform
instance.
However, parsing netlink messages actually requires
NMLinuxPlatformPrivate, because we want to access the "genl"
socket.
So, move the types to the beginning of the file, like we do
for most other source files.
The _lookup_cached_link() function, should not skip over links which are
currently in the cache, but not in netlink. Instead, let the callers
skip them, as they see fit.
No change in behavior, because the few callers now explicitly check
for this.
- drop "goto error_label" in favor of returning right away.
At most places, there was no need to do any cleanup or
the cleanup is handled via nm_auto().
- adjust the return types of wireguard functions to return
a boolean success/failure, instead of some error code which
we didn't use.
- the change to _wireguard_get_link_properties() is intentional.
This was wrong previously, because in _wireguard_get_link_properties()
obj is always a newly created instance, and never has a genl
family ID set. This will be improved later.