Obviously, it would be nice if our unit tests are fast. However, with
valgrind and a busy machine, some of the tests can take a relatively
long time. In particular those, that are marked as "slow" (if you want
to skip them during development, do so via "NMTST_DEBUG=quick"
environment, or "CFLAGS=-DNMTST_TEST_QUICK=TRUE", see
"nm-test-utils.h").
Anyway. Our tests almost never hit the timeout, and if they do, the most
likely reason is that something was just slower then expected, and the
timeout is a bogus error.
Timeouts only act as last fail safe. It more important to avoid a false
(premature) timeout failure, than to minimize the wait time when the
test really hangs. Because a real hang is a bug anyway, that we will
discover and need to fix.
Increase the default test timeout for meson tests to 3 minutes.
Also, "test-route-linux" is known to take a long time. Increase that
timeout even further.
When we register/unregister a commit-type or when we add/remove a
config-data to NML3Cfg, that act only does the registration/addition.
Only on the next commit, are the changes actually done. The purpose
of that is to add/register multiple configurations and commit them later
when ready.
However, it would be wrong to not do the commit a short time after. The
configuration state is dirty with need to be committed, and that should
happen soon.
Worse, when a interface disappears, NMDevice will clear the ifindex and
the NML3Cfg instance, thereby unregistering all config data and commit
type. If we previously commited something, we need to do another follow-up
commit to cleanup that state.
That is for example important with ECMP routes, which are registered in
NMNetns. When NML3Cfg goes down, it always must unregister to properly
cleanup. Failure to do so, causes an assertion failure and crash. This
change fixes that.
Fix that by automatically schedule and idle commit on
register/unregister/add/remove of commit-type/config-data.
It should *always* be permissible to call a AUTO commit from
an idle handler, because various parties cannot use NML3Cfg
independently, and they cannot know when somebody else does a
commit.
Note that NML3Cfg remembers if it presiouvly did a commit
("commit_type_update_sticky"), so even if the last commit-type gets
unregistered, the next commit will still do a sticky update (one more
time).
The only remaining question is what happens during quitting. When
quitting, NetworkManager we may want to leave some interfaces up and
configured. If we were to properly cleanup the NML3Cfg we might need a
mechanism to handle that. However, currently we just leak everything
during quit, so that is not a concern now. It is something that needs
to be addressed in the future.
https://bugzilla.redhat.com/show_bug.cgi?id=2158394https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1505
Routes can be added with `ip route add|change|replace|append|prepend`.
Add a test that randomly tries to add such routes, and checks that
the cache stays consistent.
https://bugzilla.redhat.com/show_bug.cgi?id=2060684
nmtstp_env1_add_test_func() prepares a certain environment (with dummy
interface) that is used by some tests. Extend it, to allow creating more
than one interface (currently up to two).
The major point of NMDedupMultiIndex is that it can de-duplicate
the objects. It thus makes sense the everybody is using the same
instance. Make the multi-idx instance of NMPlatform configurable.
This is not used outside of unit tests, because the daemon currently
always creates one platform instance and everybody then re-uses the
instance of the platform.
While this is (currently) only used by tests, and that the performance
optimization of de-duplicating is irrelevant for tests, this is still
useful. The test can then check whether two separate NMPlatform objects
shared the same instance and whether it was de-duplicated.
We don't cache certain routes, for example based on the protocol. This is
a performance optimization to ignore routes that we usually don't care
about.
Still, if the user does `ip route replace` with such a route, then we
need to pass it to nmp_cache_update_netlink_route(), so that we can
properly remove the replaced route.
Knowing which route was replaces might be impossible, as our cache does
not contain all routes. Likely all that nmp_cache_update_netlink_route()
can to is to set "resync_required" for NLM_F_REPLACE. But for that it
should see the object first.
This also means, if we ever write a BPF filter to filter out messages
that contain NLM_F_REPLACE, because that would lead to cache inconsistencies.
The route table is part of the weak-id. You can see that with:
ip route replace unicast 1.2.3.4/32 dev eth0 table 57
ip route replace unicast 1.2.3.4/32 dev eth0 table 58
afterwards, `ip route show table all` will list both routes. The replace
operation is only per-table. Note that NMP_CACHE_ID_TYPE_ROUTES_BY_WEAK_ID
already got this right.
Fixes: 10ac675299 ('platform: add support for routing tables to platform cache')
CURLOPT_PROTOCOLS [0] was deprecated in libcurl 7.85.0 with
CURLOPT_PROTOCOLS_STR [1] as a replacement.
Well, technically it was only deprecated in 7.87.0, and retroactively
marked as deprecated since 7.85.0 [2]. But CURLOPT_PROTOCOLS_STR exists
since 7.85.0, so that's what we want to use.
This causes compiler warnings and build errors:
../src/core/nm-connectivity.c: In function 'do_curl_request':
../src/core/nm-connectivity.c:770:5: error: 'CURLOPT_PROTOCOLS' is deprecated: since 7.85.0. Use CURLOPT_PROTOCOLS_STR [-Werror=deprecated-declarations]
770 | curl_easy_setopt(ehandle, CURLOPT_PROTOCOLS, CURLPROTO_HTTP | CURLPROTO_HTTPS);
| ^~~~~~~~~~~~~~~~
In file included from ../src/core/nm-connectivity.c:13:
/usr/include/curl/curl.h:1749:3: note: declared here
1749 | CURLOPTDEPRECATED(CURLOPT_PROTOCOLS, CURLOPTTYPE_LONG, 181,
| ^~~~~~~~~~~~~~~~~
This patch is largely taken from systemd patch [2].
Based-on-patch-by: Frantisek Sumsal <frantisek@sumsal.cz>
[0] https://curl.se/libcurl/c/CURLOPT_PROTOCOLS.html
[1] https://curl.se/libcurl/c/CURLOPT_PROTOCOLS_STR.html
[2] 6967571bf2
[3] e61a4c0b7c
Fixes: 7a1734926a ('connectivity,cloud-setup: restrict curl protocols to HTTP and HTTPS')
In kernel, the valid range for the weight is 1-256 (on netlink this is
expressed as u8 in rtnh_hops, ranging 0-255).
We need an additional value, to represent
- unset weight, for non-ECMP routes in kernel.
- in libnm API, to express routes that should not be merged as ECMP
routes (the default).
Extend the type in NMPlatformIP4Route.weight to u16, and fix the code
for the special handling of the numeric range.
Also the libnm API needs to change. Modify the type of the attribute on
D-Bus from "b" to "u", to use a 32 bit integer. We use 32 bit, because
we already have common code to handle 32 bit unsigned integers, despite
only requiring 257 values. It seems better to stick to a few data types
(u32) instead of introducing more, only because the range is limited.
Co-Authored-By: Fernando Fernandez Mancera <ffmancera@riseup.net>
Fixes: 1bbdecf5e1 ('platform: manage ECMP routes')
There are two callers of available_connections_add(). One from
cp_connection_added_or_updated() (which is when a connection
gets added/modified) and one from nm_device_recheck_available_connections().
They both call first nm_device_check_connection_available() to see
whether the profile is available on the device. They certainly
need to pass the same check flags, otherwise a profile might
be available in some cases, and not in others.
I didn't actually test this, but I think this could result
in a profile wrongly not being listed as an available-connection.
Moreover, that might mean, that `nmcli connection up $PROFILE`
might work to find the device/profile, but `nmcli device up $DEVICE`
couldn't find the suitable profile (because the latter calls
nm_device_get_best_connection(), which iterates the
available-connections). I didn't test this, because regardless of
that, it seems obvious that the conditions for when we call
available_connections_add() must be the same from both places.
So the only question is what is the right condition, and it would
seem that _NM_DEVICE_CHECK_CON_AVAILABLE_FOR_USER_REQUEST is the right
flag.
Fixes: 02dbe670ca ('device: for available connections check whether they are available for user-request')
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1496
iptables takes a file lock at /run/xtables.lock. By default, if
the file is locked, iptables will fail with error. When that happens,
the iptables rules won't be configured, and the shared mode
(for which we use iptables) will not be setup properly.
Instead, pass "--wait 2", to block. Yes, it's ugly that we use
blocking program invocations, but that's how it is. Also, iptables
should be fast to not be a problem in practice.
Next, support for "other_config" will be added. That is very similar
to "external_ids". Extend the existing code, to make that next update
simpler. The only purpose of this patch, is to reduce the diff of
when actually adding "other_config". Only in light of that, do some
of the changes here make sense.
We will add support for "other_config". This is in many aspects similar
to "external-ids". So first do a renaming, so that the code can be
sensibly reused. This is a separate patch, so that the followup commit
has less noise in the diff.
This function *only* renames (and reformats). No other changes.
"mutate" with operation "insert" does not update existing entries.
Delete them first.
Otherwise, a reapply that only change the value of an external-ids
entry does not work.
Note that https://www.rfc-editor.org/rfc/rfc7047 says about
"<mutations>":
If <mutator> is "insert", then each of the key-value pairs in
the map in <value> is added to <column> only if its key is not
already present. The required type of <value> is slightly
relaxed, in that it may have fewer than the minimum number of
elements specified by the column's type.
Fixes: 7055539c9f ('core/ovs: support setting OVS external-ids')
Doing an "update" is wrong, because that will replace all "other_config"
entries. We only want to reset the "hwaddr".
Note that https://www.rfc-editor.org/rfc/rfc7047 says about
"<mutations>":
If <mutator> is "insert", then each of the key-value pairs in
the map in <value> is added to <column> only if its key is not
already present. The required type of <value> is slightly
relaxed, in that it may have fewer than the minimum number of
elements specified by the column's type.
That means, we need to first delete, and then insert the key.
Fixes: 5d4c8521a3 ('ovs: set MAC address on the bridge for local interfaces')
New files must be written to the build directory, not to the source
one.
Fixes: 5ee2f3d1dc ('dhcp/tests: refactor tests for nm_dhcp_dhclient_save_duid()')
When the connection setting changes at the first place, then calling
the device reapply, the ip address got temporarily removed when DHCP
restarted. To avoid the ip address got temporarily removed, we should
preserve the previous lease and keep using it until the new lease comes
along.
This adjusts the change from commit ffbcf01589 ('test-ndisc-fake:
free l3cfg after creating fake-ndisc').
ndisc_new() already correctly handles the reference count of l3cfg via
"gs_unref_object". The party that took the wrong reference was
nm_fake_ndisc_new().
Fixes: 58287cbcc0 ('core: rework IP configuration in NetworkManager using layer 3 configuration')
When committing ECMP routes we are cleaning up dirty routes and freeing
the EcmpTrackObj. We need to free EcmpTrackObj only when it is not
needed anymore so it is the last thing we do when cleaning up the
routes.
Considering nm_netns_l3cfg_acquire returns a l3cfg reference and only
keeps a weak reference we need to free l3cfg in fake-ndisc after
creating the object.
If an L3Cfg try to register an ECMP route that we are tracking we must
mark it as needs_update because it means it could be dropped from
kernel. When merging them if the merged route didn't change we should
commit the route anyway because we know it needed update.
When adding a static route, kernel enforce that the gateway is
reachable. To solve this, NetworkManager generates onlink routes for
each static route. As ECMP routes does not follow the same logic than
singlehop routes, we need to add the onlink route for each hop of ECMP
routes once merged.
NML3Cfg will take ownership of the onlink route added and will purge it
when it is not needed anymore.
NML3Cfg and NMNetns are already strongly related and cooperate.
An NML3Cfg instance is created via NMNetns, which is necessary
because NMNetns ensures that there is only one NML3Cfg instance per
ifindex and it won't ever make sense to have multiple NML3Cfg instances
per namespace.
Note that NMNetns tracks additional information for each NML3Cfg.
Previously, in a pointless attempt to separate code, it did so
by putting that information in another struct (L3CfgData).
But as the classes are strongly related, there really is no
reason why we cannot just attach this information to NML3Cfg
directly. Sure, we want that code has low coupling, high cohesion
but that doesn't mean we gain anything by putting data that is
strongly related to the NML3Cfg to another struct L3CfgData.
The advantage is we save some redundant data and an additional
L3CfgData. But the bigger reason is that with this change, it
will be possible to access the NMNetns specific data directly from
an NML3Cfg instance, without another dictionary lookup. Currently
such a lookup is never used, but it will be.
Basically, NML3Cfg and NMNetns shares some state. It is now in the
"internal_netns" field of the NML3Cfg instead of L3CfgData.
src/core/devices/nm-lldp-listener.c:911: check_after_deref:
Null-checking "self" suggests that it may be null, but it has already been dereferenced on all paths leading to the check.
Fixes: 04e72b6b4d ('lldp: use new libnm-lldp instead of systemd's sd_lldp_rx')
Older versions of iproute2 don't support the fwmark option. Remove it.
Fixes: 1cf8df2f35 ('platform: support VTI tunnels')
Fixes: b669a3ae46 ('platform: support VTI6 tunnels')
With multi-connect enabled, this can cause infinite retries to autoconnect,
see [1].
That has bad consequences for example in initrd, where
nm-wait-online-initrd.service would wait up to one hour before failing
and blocking boot.
This reverts commit 1656d82045.
[1] https://bugzilla.redhat.com/show_bug.cgi?id=2039734#c5
Fixes: 1656d82045 ('policy: track the autoconnect retries in devices for multi-connect')