The idea of nm_free_secret() is to clear the secrets from memory. That
surely is some layer of extra snake oil, because we tend to pass secrets
via D-Bus, where the memory gets passed down to (D-Bus) libraries which
have no idea to keep it private. Still...
But turns out, malloc_usable_size() might not actually be usable for
this. Read the discussion at [1].
Stop using malloc_usable_size(), which seems unfortunate.
There is probably no secret relevant data after the NUL byte anyway,
because we tend to create such strings once, and don't rewrite/truncate
them afterwards (which would leave secrets behind as garbage).
Note that systemd's erase_and_free() still uses malloc_usable_size()
([2]) but the macro foo to get that right is terrifying ([3]).
[1] https://github.com/systemd/systemd/issues/22801#issuecomment-1343041481
[2] 11c0f0659e/src/basic/memory-util.h (L101)
[3] 7929e180aa
Fixes: d63cd26e60 ('shared: improve nm_free_secret() to clear entire memory buffer')
(cherry picked from commit 8b66865a88)
Trigger a dispatcher event when a connection is reapplied on a NM device.
Some devices such as phones have already a DHCP client running for accepting
connections when they are plugged into USB to transfer data over SSH.
When NetworkManager switches the connection IP method to shared,
it spawns a dnsmasq process to handle DHCP and DNS for that connection.
However, a dispatcher event is needed to disable the external DHCP server
for these USB connections as NetworkManager's dnsmasq handles them now.
Moreover, when the connection method is switched to a different mode,
the external DHCP server needs to be spawned again to make sure that
SSH connections are still possible to the device.
To achieve this, add a new NetworkManager Dispatcher event
'reapply' which is triggered when a connection is reapplied on a NM
device. This way, a dispatcher script can handle the case above by
inspecting the IP method in the dispatcher script.
(cherry picked from commit cef880c66f)
During srv_shutdown() we do
p.stdin.close()
p.kill()
Usually, the kill wins and the service just drops off the bus:
libnm-dbus[3201919]: <debug> [438617.45324] nmclient[40f7938626f3f5f0]: name owner changed: ":1.1" -> (null)
libnm-dbus[3201919]: <debug> [438617.45332] nmclient[40f7938626f3f5f0]: release all
at which point all objects in NMClient get destroyed and the signals get
emitted in the order:
libnm-dbus[3201919]: <trace> [438617.45574] nmclient[40f7938626f3f5f0]: [nmclient] emit "device-removed" signal for /org/freedesktop/NetworkManager/Devices/1
nmcli[out]: eth0: device removed
libnm-dbus[3201919]: <trace> [438617.45590] nmclient[40f7938626f3f5f0]: [nmclient] emit "any-device-removed" signal for /org/freedesktop/NetworkManager/Devices/1
libnm-dbus[3201919]: <trace> [438617.45593] nmclient[40f7938626f3f5f0]: [nmclient] emit "connection-removed" signal for /org/freedesktop/NetworkManager/Settings/Connectio>
nmcli[out]: con-1: connection profile removed
However, sometimes the stub service notices that stdin was closed and it
sends signals about shutting down:
libnm-dbus[3201061]: <trace> [438226.44965] nmclient[401639659459c316]: interfaces-removed: [/org/freedesktop/NetworkManager/Settings] receive interface remove event for >
libnm-dbus[3201061]: <trace> [438226.44966] nmclient[401639659459c316]: [/org/freedesktop/NetworkManager/Settings]: changed-type 0x01 linked
libnm-dbus[3201061]: <trace> [438226.44967] nmclient[401639659459c316]: [/org/freedesktop/NetworkManager/Settings]: changed-type 0x01 consumed
libnm-dbus[3201061]: <trace> [438226.44968] nmclient[401639659459c316]: [/org/freedesktop/NetworkManager/Settings]: changed-type 0x02 linked
libnm-dbus[3201061]: <trace> [438226.44969] nmclient[401639659459c316]: [/org/freedesktop/NetworkManager/Settings]: unregister NMClient from D-Bus object
libnm-dbus[3201061]: <trace> [438226.44971] nmclient[401639659459c316]: [/org/freedesktop/NetworkManager/Settings]: drop D-Bus instance
libnm-dbus[3201061]: <trace> [438226.44971] nmclient[401639659459c316]: [/org/freedesktop/NetworkManager/Settings]: set D-Bus object state unlinked
libnm-dbus[3201061]: <trace> [438226.44972] nmclient[401639659459c316]: [nmclient] emit "connection-removed" signal for /org/freedesktop/NetworkManager/Settings/Connectio>
nmcli[out]: con-1: connection profile removed
libnm-dbus[3201061]: <trace> [438226.44992] nmclient[401639659459c316]: interfaces-removed: [/org/freedesktop/NetworkManager] receive interface remove event for interface>
libnm-dbus[3201061]: <trace> [438226.44994] nmclient[401639659459c316]: [/org/freedesktop/NetworkManager]: changed-type 0x01 linked
libnm-dbus[3201061]: <trace> [438226.44995] nmclient[401639659459c316]: [/org/freedesktop/NetworkManager]: changed-type 0x01 consumed
libnm-dbus[3201061]: <trace> [438226.44996] nmclient[401639659459c316]: [/org/freedesktop/NetworkManager]: changed-type 0x02 linked
libnm-dbus[3201061]: <trace> [438226.44998] nmclient[401639659459c316]: [/org/freedesktop/NetworkManager]: unregister NMClient from D-Bus object
libnm-dbus[3201061]: <trace> [438226.44999] nmclient[401639659459c316]: [/org/freedesktop/NetworkManager]: drop D-Bus instance
libnm-dbus[3201061]: <trace> [438226.45000] nmclient[401639659459c316]: [/org/freedesktop/NetworkManager]: set D-Bus object state unlinked
libnm-dbus[3201061]: <trace> [438226.45001] nmclient[401639659459c316]: [nmclient] emit "device-removed" signal for /org/freedesktop/NetworkManager/Devices/1
nmcli[out]: eth0: device removed
libnm-dbus[3201061]: <trace> [438226.45005] nmclient[401639659459c316]: [nmclient] emit "any-device-removed" signal for /org/freedesktop/NetworkManager/Devices/1
nmcli[out]: NetworkManager is stopped
libnm-dbus[3201061]: <debug> [438226.45545] nmclient[401639659459c316]: name owner changed: ":1.1" -> (null)
libnm-dbus[3201061]: <debug> [438226.45550] nmclient[401639659459c316]: release all
The fix is to accept the events in any order.
(cherry picked from commit 8548ab29ee)
The test stub service watches stdin, and if it gets closed the service
will shut down. Note that the service does not catch any signals, so
sending a signal will kill the service right away.
The previous code first closed stdin, and then killed the process.
That can result in different outcomes on D-Bus. Usually the signal
gets received first, and the test service just drops off the bus.
Sometimes it notices the closing of stdin and shuts actively down.
That can make a difference, especially for the test_monitor() test which
runs the monitor while stopping the service.
We could just always kill the stub service to get consistent behavior.
However, that doesn't seem very useful. Instead, randomize the behavior
to easier see how the behavior differs.
(cherry picked from commit fc282d5e05)
The main purpose is to simplify printf debugging and manual testing. We
can now trivially patch the code so that all output from nmcli gets
(additionally) written to a file. That is useful when debugging a unit
test in "test-client.py". Thereby we can duplicate all messages via
nm_utils_print(), which is in sync with the debug messages from libnm
and which honors LIBNM_CLIENT_DEBUG_FILE.
(cherry picked from commit b32e4c941a)
These will replace the direct calls to g_print()/g_printerr() in nmcli.
There are two purposes.
1) the new macros embody the concept of "printing something from nmcli".
It means, we can `git grep` for those functions, and find all the
relevant places, without hitting the irrelevant ones (e.g. tests that
also use g_print()).
2) by having one place, we can trivially change it. That is useful for
printf debugging. For example, "test-client.py" runs nmcli and
captures and compares the output. With libnm we can set
LIBNM_CLIENT_DEBUG and LIBNM_CLIENT_DEBUG_FILE to print libnm debug
messages to a file. But we cannot trivially synchronize the messages
from nmcli with that output (also because they are consumed by the test
and not immediately accessible). This would be easy, if we temporarily
could patch nmc_print*() to also log to nm_utils_print(). The new macros
will allow doing that at one place.
For example, patch the "#if 0" and run:
$ LIBNM_CLIENT_DEBUG=trace \
LIBNM_CLIENT_DEBUG_FILE='xxx.%p' \
NMTST_USE_VALGRIND=1 \
LIBTOOL="/bin/sh ./libtool"
./src/tests/client/test-client.sh -- -k monitor
(cherry picked from commit 4cf94f30c7)
nmc_print() will be used for something else. Rename. Also,
nmc_print_table() is the better name anyway because the function does a
lot of formatting and not simple printf().
(cherry picked from commit 4b2ded7a4a)
For debugging libnm, LIBNM_CLIENT_DEBUG can be very useful.
As the tests compare stdout/stderr from nmcli with expected output, just
enabling it will break the tests. However, in combination with
LIBNM_CLIENT_DEBUG_FILE this can be very useful.
Preserve and pass on the environment variables, if set.
(cherry picked from commit 1630009234)
With LIBNM_CLIENT_DEBUG we get debug logging for libnm, either to stdout
or to stderr.
"test-client.py" runs nmcli as a unit test. It thereby catches stdout
and stderr. That means, LIBNM_CLIENT_DEBUG would break the tests.
Now honor the LIBNM_CLIENT_DEBUG_FILE environment variable to specify a
file to which the debug logs get written. The pattern "%p" is replaced
by the process id.
As before, nm_utils_print(0, ...) also honors this environment variable
and uses the same logging destination.
(cherry picked from commit 6dbb215793)
Sort imports by name. Also avoid "from signal import SIGINT". I find
it ugly to import names in the current namespace. "SIGINT" should be
refered to by its full name, including the package/namespace.
(cherry picked from commit ee17346cee)
This will allow to find some memory leaks and memory corruptions.
The bulk of the nmcli calls are still not hooked up with valgrind.
Since we call nmcli a thousand time, we could not just run valgrind with
all of them. We would have instead to enable it randomly. This is
more work.
(cherry picked from commit debf78dbed)
The base class is not used, and it's not clear that it would be useful.
Sure, we could extend "test-client.py" will various non-nmcli tests. That
might make sense. And then it might make sense to have more unit test classes.
So far, we don't need that. Drop the unused base class NmTestBase.
(cherry picked from commit d1e6d53013)
Since glib 2.45, we are guaranteed that g_free() just calls free(), so
both can be used interchangeably. However, we still only depend on glib
2.40.
In any case, it's ugly to mix the two. Memory allocated by plain
malloc(), should be only freed with free(). The buffer in question comes
from readline, which allocates it using the system allocator.
Fixes: 995229181c ('cli: remove editor thread')
(cherry picked from commit 5dc07174d3)
Such leaks show up in valgrind, and are simply bugs.
Also, various callers (not all of them, which is another bug!) like to
take ownership of the returned string and free it. That means, we leave
a dangling pointer in the global variable, which is very ugly and error
prone.
Also, the callers like to free the string with g_free(), which is not
appropriate for the "rl_string" memory which was allocated by readline.
It must be freed with free(). Avoid that, by cloning the string using
the glib allocator.
Fixes: 995229181c ('cli: remove editor thread')
(cherry picked from commit 89734c7553)
The onlink flag is part of each next hop.
When NetworkManager configures ECMP routes, we won't support that. All
next hops of an ECMP route must share the same onlink flag. That is fine
and fixed by this commit.
What is not fine, is that we don't track the rtnh_flags flags in
NMPlatformIP4RtNextHop, and consequently our nmp_object_id_cmp() is
wrong.
Fixes: 5b5ce42682 ('nm-netns: track ECMP routes')
(cherry picked from commit 6ed966258c)
If the route with a next hop is already onlink, we don't need to add a
direct route to the gateway.
It also wouldn't work previously, because the onlink route to the
gateway that we would add, would have no gateway and the RTNH_F_ONLINK
set. Kernel would reject that with an error. We would have to clear the
RTNH_F_ONLINK flag, if there is no gateway.
(cherry picked from commit 93b46c8906)
For IPv6, kernel doesn't care. If the gateway is ::, you may or may
not set the onlink attribute. But for IPv4 routes, that gets rejected:
# ip route add 1.2.3.4/32 dev v onlink
Error: Invalid flags for nexthop - PERVASIVE and ONLINK can not be set.
Silently suppress setting the flag in that case and ignore the user
request. After all, the effect is probably the same (that is, the route
is onlink anyway).
(cherry picked from commit 8b14849877)
The dns-type must be included in the hash because it contributes to
the generated composite configuration. Without this, when the type of
a configuration changes (e.g. from DEFAULT to BEST), the DNS manager
would determine that there was no change and it wouldn't call
update_dns().
https://bugzilla.redhat.com/show_bug.cgi?id=2161957
Fixes: 8995d44a0b ('core: compare the DNS configurations before updating DNS')
(cherry picked from commit 46ccc82a81)
The tests failed in certain cases on gitlab-ci and were temporarily
disabled.
These issues should be fixed now and the test pass. Reenable.
(cherry picked from commit 5c324adc7c)
Kernel enforces that all nexthops must be reachable through a route.
L3Cfg is generating dependent onlink routes to solve this problem but
the IPv4 ECMP commit is happening before that.
To solve this we introduce two boolean fields "is_new" and "is_ready" to
know in which state is the L3Cfg affected. Initially, "is_new" is TRUE
and "is_ready" is FALSE. Here we schedule a commit on idle and we set
"is_new" to FALSE. When revisiting, we set "is_ready" to TRUE and then
we set the ECMP IPv4 routes.
When a reapply kicks in we reset the L3Cfg state by setting "is_new" to
TRUE.
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1520
(cherry picked from commit 7a844ecba9)
Kernel enforces that all route nexthop are reachable but it doesn't care
if the drect route to the nexthop is in a different route table.
(cherry picked from commit f187e63fa8)
ECMP IPv4 route nexthops requires an onlink route but we should trust
l3cfg when generating and managing such routes.
This reverts commit 737cb5d424.
(cherry picked from commit cbf70b4dca)
We must trust l3cfg when generating dependent onlink routes for all kind
of routes not default routes only. This was done by
"nm_platform_ip_route_sync()" so there is not change in behaviour at
all.
"nm_platform_ip_route_sync()" could be needed for other situation where
l3cfg cannot add the dependent onlink routes, so we are not removing
that logic.
This reverts commit 6b4123db1c.
(cherry picked from commit 9c492c6fc4)
Certain ip-tunnel modules automatically create network interfaces (for
example, "ip_gre" module creates "gre0" and others).
Btw, that's not the same as `modprobe bonding max_bonds=1`, where
loading the module merely automatically creates a "bond0" interface. In
case of ip tunnel modules, these generated interfaces seem essential to
how the tunnel works, for example they cannot be deleted. I don't
understand the purpose of those interfaces, but they seem not just
regular tunnel interfaces (unlike, "bond0" which is a regular bond
interface, albeit automatically created).
Btw, if at the time when loading the module, an interface with such name
already exists, it will bump the name (for example, adding a "gre1"
interfaces, and so on). That adds to the ugliness of the whole thing,
but for our unit tests, that is no problem. Our unit tests run in a
separate netns, and we don't create conflicting interfaces. That is, an
interface named "gre0" is always the special tunnel interface and we
can/do rely on that.
Note that when the kernel module gets loaded, it adds those interfaces
to all netns. Thus, even if "test-route-linux" does not do anything with
ip tunnels, such an interface can always appear in a netns, simply by
running "test-link-linux" (or any other tool that creates a tunnel) in
parallel or even in another container.
Theoretically, we could just ensure that we load all the conflicting
ip-tunnel modules (with nmtstp_ensure_module()). There there are two
problems. First, there might be other tunnel modules that interfere but
are not covered by nmtstp_ensure_module(). Second, when kernel creates
those interfaces, it does not send correct RTM_NEWLINK notifications (a
bug), so our platform cache will not be correct, and
nmtstp_assert_platform() will fail.
The only solution is to detect and ignore those interfaces. Also,
ignore all interfaces of link-type "unknown". Those might be from other
modules that we don't know about and that exhibit the same problem.
(cherry picked from commit e99433866d)
Ubuntu 18.04 comes with iproute2-4.15.0-2ubuntu1.3. The
"/etc/iproute2/rt_protos" file from that version does not yet support
the "bgp" entry. Also the "babel" entry is only from 2014. Just choose
other entries. The point is that NetworkManager would ignore those, and
that applies to "zebra" and "bird" alike.
(cherry picked from commit 26592ebfe5)
This will make sure that the IP tunnel module is loaded. It does so by
creating (and deleting) a tunnel interface.
That is important, because those modules will create additional interfaces
that show up in `ip link` (like "gre0"), and those interfaces can interfere
with the tests.
Also add nmtstp_link_is_iptunnel_special() to detect whether an
interface is one of those special interfaces.
(cherry picked from commit 451cedf2bf)
"debug" is implied when setting NMTST_DEBUG, but not specifying
"no-debug". This change has thus no effect, but it seems clearer to be
explicit.
The "debug" flag affects nmtst_is_debug(). Note that tests *must* not
result in different code paths based on debug, they may only
1) print more debug logging
2) do more assertion checks.
Having more assertion checks can result in different outcome of the
test, that is, that the additional assertion fails first. That is
acceptable, because failing earlier is possibly closer to the issue and
helps debugging. Also, when the additional failure is fixed and passes,
we still will fail at the assertion we are trying to debug.
In particular, an access to nmtst_get_rand*()/nmtst_rand*() must not
depend on nmtst_is_debug(), because then different randomized paths
are taken based on whether debugging is enabled.
(cherry picked from commit 3f2ad76363)
Seems this test fails easily under gitlab-ci, if we set NMTST_SEED_RAND
to something else than "0". There is nothing particular special about
"0", except that a randomly different code paths are chosen.
A randomized test that doesn't pass on all systems with all random
paths, is broken. Disable for now. Needs to be fixed.
See-also: https://bugzilla.redhat.com/show_bug.cgi?id=2165141
(cherry picked from commit 14b1a7ba30)
Obviously, it would be nice if our unit tests are fast. However, with
valgrind and a busy machine, some of the tests can take a relatively
long time. In particular those, that are marked as "slow" (if you want
to skip them during development, do so via "NMTST_DEBUG=quick"
environment, or "CFLAGS=-DNMTST_TEST_QUICK=TRUE", see
"nm-test-utils.h").
Anyway. Our tests almost never hit the timeout, and if they do, the most
likely reason is that something was just slower then expected, and the
timeout is a bogus error.
Timeouts only act as last fail safe. It more important to avoid a false
(premature) timeout failure, than to minimize the wait time when the
test really hangs. Because a real hang is a bug anyway, that we will
discover and need to fix.
Increase the default test timeout for meson tests to 3 minutes.
Also, "test-route-linux" is known to take a long time. Increase that
timeout even further.
(cherry picked from commit 9ee42c0979)