When the logging level is DEBUG or TRACE, we keep all the sysctl
values we read in a cache to log how they change. Currently there is
no limit on the size of this cache and it can take a large amount of
memory.
Implement a LRU cache where the oldest entries are deleted to make
space for new ones.
https://github.com/NetworkManager/NetworkManager/pull/294
For static functions inside a module, the compiler determines on its own
whether to inline the function.
Also, "inline" was used at some places that don't immediatly look like
candidates for inlining. It was most likely a copy&paste error.
NMPNetns instances are immutable, hence they can be easily shared
between threads. All we need, is that the stack of namespaces is
thread-local.
Also note that NMPNetns uses almost no other API, except some bits from
"shared/nm-utils/" and nm-logging. These parts are already supposed to
be thread-safe.
The only complications is that when the thread exits, we need to
destroy the NMPNetns instances. That is especially important because
they hold file descriptors. This is accomplished using pthread's
thread-specific data. An alternative would be C11 threads' tss_create(),
but not all systems that we run against support that yet. This means,
we need to link with pthreads, but we already do that anyway.
Note that glib also requires pthreads. So, we don't get an additional
dependency here.
Previously, _nm_logging_clear_platform_logging_cache was an extern variable,
and NMLinuxPlatform would set it to a function pointer at certain points.
That's unnecessary complex, also when trying to make nm-logging thread-safe,
it's just more global variables that need to be considered. Don't do it
that way, but just link in a regular function.
Since commit 9ecdba316 ('platform: create netlink messages directly
without libnl-route-3') we're unconditionally setting IFA_ADDRESS to
the peer address, even if there's no peer and it's all zeroes.
The kernel actually stopped caring somewhere around commit caeaba790
('ipv6: add support of peer address') in v3.10, but Ubuntu Touch likes
to run Android's v3.4 on some poorly supported hardware.
Fixes: 9ecdba316chttps://gitlab.freedesktop.org/NetworkManager/NetworkManager/merge_requests/77
The caller may not wish to replace existing peers, but only update/add
the peers explicitly passed to nm_platform_link_wireguard_change().
I think that is in particular interesting, because for the most part
NetworkManager will configure the same set of peers over and over again
(whenever we resolve the DNS name of an IP endpoint of the WireGuard
peer).
At that point, it seems disruptive to drop all peers and re-add them
again. Setting @replace_peers to %FALSE allows to only update/add.
We still don't use getnameinfo(). This is used for logging,
where we want to see a string representation that is as close
as possible to the actual bytes (to spot differences). It should
not be obfuscated by a libc function out of our control.
Also fix the notation for the IPv6 scope ID to use the common '%'
character.
Add cmp/hash functions that correctly honor the well known fields, instead
of doing memcmp/memcpy of the entire sockaddr structure.
Also, move the set function to nm_sock_addr_union_cpy() and
nm_sock_addr_union_cpy_untrusted(). This also gets it right
to ensure all bytes of the union are initialized (to zero).
Since we already cached the result of getpagesize() in a static variable (at
two places), move the code to nm-shared-utils, so it is reusable.
Also, use sysconf() instead of getpagesize(), like suggested by `man
getpagesize`.
We need more information what failed. Don't only return success/failure,
but an error number.
Note that we still don't actually return an error number. Only
the link_add() function is changed to return an nm-error integer.
Platform had it's own scheme for reporting errors: NMPlatformError.
Before, NMPlatformError indicated success via zero, negative integer
values are numbers from <errno.h>, and positive integer values are
platform specific codes. This changes now according to nm-error:
success is still zero. Negative values indicate a failure, where the
numeric value is either from <errno.h> or one of our error codes.
The meaning of positive values depends on the functions. Most functions
can only report an error reason (negative) and success (zero). For such
functions, positive values should never be returned (but the caller
should anticipate them).
For some functions, positive values could mean additional information
(but still success). That depends.
This is also what systemd does, except that systemd only returns
(negative) integers from <errno.h>, while we merge our own error codes
into the range of <errno.h>.
The advantage is to get rid of one way how to signal errors. The other
advantage is, that these error codes are compatible with all other
nm-errno values. For example, previously negative values indicated error
codes from <errno.h>, but it did not entail error codes from netlink.
While nm_utils_inet*_ntop() accepts a %NULL buffer to fallback
to a static buffer, don't do that.
I find the possibility of using a static buffer here error prone
and something that should be avoided. There is of course the downside,
that in some cases it requires an additional line of code to allocate
the buffer on the stack as auto-variable.
Now that we have other helper function on platfrom for setting
IP configuration sysctls, rename the function to set the hop-limit
to match the pattern.
I think this is preferred over memset(), because it allows the
compiler to better unstand what is happening.
Also, strictly speaking in the C language, %NULL pointers are not
guaranteed to have an all zero bit pattern. Of course, that is already
required on any architecture where NetworkManager is running.
NMP_SYSCTL_PATHID_NETDIR_unsafe() uses alloca() to allocate the string.
Assert that the "path" argument is reasonably short.
In practice, that is of course the case, because there are only 2 callers
which take care not to pass an untrusted, unbounded path argument.
Checking whether the link exists in the cache, before talking to kernel
serves no purpose.
- in all cases, the caller already has a good indication that the link
in fact exists. That is, because the caller makes decisions on what to
do, based on what platform told it earlier. Thus, the check usually succeeds
anyway.
- in the unexpected case it doesn't succeed, we
- should not silently return without logging at least a message
- we possibly still want to send the netlink message to kernel,
just to have it fail. Note that the ifindex is indeed the identifier
for the link, so there is no danger of accidentally killing the
wrong link.
Well, theoretically there is, because the kernel's ifindex counter can
wrap or get reused when moving links between namespaces. But checking
the cache would not protect against that anyway! Worst case, the cache
would already have the impostor link and would not prevent from doing
the wrong thing. After all, they do have the same identifier, so how
would we know that this is in fact a different link?
We want that all code paths assert strictly and gracefully.
That means, if we have function nm_platform_link_get() which calls
nm_platform_link_get_obj(), then we don't need to assert the same things
twice. Don't have the calling function assert itself, if it is obvious
that the first thing that it does, is calling a function that itself
asserts the same conditions.
On the other hand, it simply indicates a bug passing a non-positive
ifindex to any of these platform functions. No longer let
nm_platform_link_get_obj() handle negative ifindex gracefully. Instead,
let it directly pass it to nmp_cache_lookup_link(), which eventually
does a g_return_val_if_fail() check. This quite possible enables
assertions on a lot of code paths. But note that g_return_val_if_fail()
is graceful and does not lead to a crash (unless G_DEBUG=fatal-criticals
is set for debugging).
nm_platform_link_delete() will soon assert against positive ifindex
argument.
nm_platform_link_delete (NM_PLATFORM_GET, nm_platform_link_get_ifindex (NM_PLATFORM_GET, DEVICE_NAME));
will result in an assertion, if the link does not exist.
Extend nmtstp_link_delete() to gracefully skip deleting the link
so that it can be used in such situations.
Also, rename nmtstp_link_del() to nmtstp_link_delete(), because it's
closer to nm_platform_link_delete().
We want to assert for valid input arguments, but we don't want
multiple assertions for the same.
Move the assertion from nm_platform_link_get() to
nm_platform_link_get_obj().
That way, nm_platform_link_get_obj() also checks the input arguments.
At the same time, nm_platform_link_get() gets simpler and still does
the same amount of assertions.
In nmp_cache_lookup_link_full(), we may have multiple candidates that match.
Continue searching, until we find a visible one. That way, visible results
are preferred.
Note that for links, nmp_object_is_visible() checks whether the link is
visible in netlink (instead of only udev).
Correct the spelling across the *entire* tree, including translations,
comments, etc. It's easier that way.
Even the places where it's not exposed to the user, such as tests, so
that we learn how is it spelled correctly.
Add helper nm_platform_link_get_ifi_flags() to access the
ifi-flags.
This replaces the internal API _link_get_flags() and makes it public.
However, the return value also allows to distinguish between errors
and valid flags.
Also, consider non-visible links. These are links that are in netlink,
but not visible in udev. The ifi-flags are inherrently netlink specific,
so it seems wrong to pretend that the link doesn't exist.
Sometimes the test fail:
$ make -j 10 src/platform/tests/test-address-linux
$ while true; do
NMTST_DEBUG=d ./tools/run-nm-test.sh src/platform/tests/test-address-linux 2>&1 > log.txt || break;
done
fails with:
ERROR: src/platform/tests/test-address-linux - Bail out! test:ERROR:src/platform/tests/test-common.c:790:nmtstp_ip_address_assert_lifetime: assertion failed (adr <= lft): (1001 <= 1000)
That is, because of a wrong check. Fix it.
In the past, the headers "linux/if.h" and "net/if.h" were incompatible.
That means, we can either include one or the other, but not both.
This is fixed in the meantime, however the issue still exists when
building against older kernel/glibc.
That means, including one of these headers from a header file
is problematic. In particular if it's a header like "nm-platform.h",
which itself is dragged in by many other headers.
Avoid that by not including these headers from "platform.h", but instead
from the source files where needed (or possibly from less popular header
files).
Currently there is no problem. However, this allows an unknowing user to
include <net/if.h> at the same time with "nm-platform.h", which is easy
to get wrong.