Commit graph

31666 commits

Author SHA1 Message Date
Fernando Fernandez Mancera
79a9fcb166 l3-config-data: do not check route table to get direct routes
Kernel enforces that all route nexthop are reachable but it doesn't care
if the drect route to the nexthop is in a different route table.

(cherry picked from commit f187e63fa8)
2023-02-01 11:04:09 +01:00
Fernando Fernandez Mancera
d27a164d09 Revert "nm-netns: add onlink routes for ECMP routes"
ECMP IPv4 route nexthops requires an onlink route but we should trust
l3cfg when generating and managing such routes.

This reverts commit 737cb5d424.

(cherry picked from commit cbf70b4dca)
2023-02-01 11:04:09 +01:00
Fernando Fernandez Mancera
4073211595 Revert "l3cfg: do not add dependent routes for non-default routes"
We must trust l3cfg when generating dependent onlink routes for all kind
of routes not default routes only. This was done by
"nm_platform_ip_route_sync()" so there is not change in behaviour at
all.

"nm_platform_ip_route_sync()" could be needed for other situation where
l3cfg cannot add the dependent onlink routes, so we are not removing
that logic.

This reverts commit 6b4123db1c.

(cherry picked from commit 9c492c6fc4)
2023-02-01 11:04:09 +01:00
Thomas Haller
79c54e645c
platform/tests: merge branch 'th/platform-cache-test'
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1513

(cherry picked from commit 90df190a1a)
2023-02-01 10:50:19 +01:00
Thomas Haller
a8f2625259
platform/tests: ignore ip-tunnel interfaces in nmtstp_assert_platform()
Certain ip-tunnel modules automatically create network interfaces (for
example, "ip_gre" module creates "gre0" and others).

Btw, that's not the same as `modprobe bonding max_bonds=1`, where
loading the module merely automatically creates a "bond0" interface. In
case of ip tunnel modules, these generated interfaces seem essential to
how the tunnel works, for example they cannot be deleted. I don't
understand the purpose of those interfaces, but they seem not just
regular tunnel interfaces (unlike, "bond0" which is a regular bond
interface, albeit automatically created).

Btw, if at the time when loading the module, an interface with such name
already exists, it will bump the name (for example, adding a "gre1"
interfaces, and so on). That adds to the ugliness of the whole thing,
but for our unit tests, that is no problem. Our unit tests run in a
separate netns, and we don't create conflicting interfaces. That is, an
interface named "gre0" is always the special tunnel interface and we
can/do rely on that.

Note that when the kernel module gets loaded, it adds those interfaces
to all netns. Thus, even if "test-route-linux" does not do anything with
ip tunnels, such an interface can always appear in a netns, simply by
running "test-link-linux" (or any other tool that creates a tunnel) in
parallel or even in another container.

Theoretically, we could just ensure that we load all the conflicting
ip-tunnel modules (with nmtstp_ensure_module()). There there are two
problems. First, there might be other tunnel modules that interfere but
are not covered by nmtstp_ensure_module(). Second, when kernel creates
those interfaces, it does not send correct RTM_NEWLINK notifications (a
bug), so our platform cache will not be correct, and
nmtstp_assert_platform() will fail.

The only solution is to detect and ignore those interfaces.  Also,
ignore all interfaces of link-type "unknown". Those might be from other
modules that we don't know about and that exhibit the same problem.

(cherry picked from commit e99433866d)
2023-02-01 10:50:17 +01:00
Thomas Haller
7268da807f
platform/tests: avoid recent route protocols in "/route/test_cache_consistency_routes" tests
Ubuntu 18.04 comes with iproute2-4.15.0-2ubuntu1.3. The
"/etc/iproute2/rt_protos" file from that version does not yet support
the "bgp" entry. Also the "babel" entry is only from 2014. Just choose
other entries. The point is that NetworkManager would ignore those, and
that applies to "zebra" and "bird" alike.

(cherry picked from commit 26592ebfe5)
2023-02-01 10:50:17 +01:00
Thomas Haller
487ebbe3c8
platform/tests: use nmtstp_ensure_module() in test_software_detect()
This helper function already loads the module and performs
additional checks. Use it.

(cherry picked from commit acc0cee28e)
2023-02-01 10:50:16 +01:00
Thomas Haller
30a0ea310b
platform/tests: add nmtstp_ensure_module() helper
This will make sure that the IP tunnel module is loaded. It does so by
creating (and deleting) a tunnel interface.

That is important, because those modules will create additional interfaces
that show up in `ip link` (like "gre0"), and those interfaces can interfere
with the tests.

Also add nmtstp_link_is_iptunnel_special() to detect whether an
interface is one of those special interfaces.

(cherry picked from commit 451cedf2bf)
2023-02-01 10:50:16 +01:00
Thomas Haller
19a192dbdb
platform/tests: fix nmtstp_link_{gre,ip6gre,ip6tnl,ipip}_add() to support missing parent
(cherry picked from commit 4966f9d784)
2023-02-01 10:50:15 +01:00
Thomas Haller
208218e141
gitlab-ci: use "meson test" for running unit tests
It seems that `meson test` is preferred over `ninja test`.  Also, pass
"--print-errorlogs" to meson, and pass "-v" to the build steps.

Note that `ninja test` already ends up calling `meson test
--print-errorlogs`, but it doesn't use "-v", so the logs are truncated.

(cherry picked from commit dba2fb5fff)
2023-02-01 10:50:15 +01:00
Thomas Haller
44f080879b
gitlab-ci: rerun meson test on failure with debugging
Like done for autotools. First we run the test without debugging option.
If it fails, we run it again to possibly trigger the failure again and
get better logs.

(cherry picked from commit 13d9cf75ed)
2023-02-01 10:50:15 +01:00
Thomas Haller
ea4b4b8775
gitlab-ci: explicitly set "NMTST_DEBUG=debug,..." for second debug run
"debug" is implied when setting NMTST_DEBUG, but not specifying
"no-debug". This change has thus no effect, but it seems clearer to be
explicit.

The "debug" flag affects nmtst_is_debug(). Note that tests *must* not
result in different code paths based on debug, they may only

 1) print more debug logging
 2) do more assertion checks.

Having more assertion checks can result in different outcome of the
test, that is, that the additional assertion fails first. That is
acceptable, because failing earlier is possibly closer to the issue and
helps debugging. Also, when the additional failure is fixed and passes,
we still will fail at the assertion we are trying to debug.

In particular, an access to nmtst_get_rand*()/nmtst_rand*() must not
depend on nmtst_is_debug(), because then different randomized paths
are taken based on whether debugging is enabled.

(cherry picked from commit 3f2ad76363)
2023-02-01 10:50:14 +01:00
Thomas Haller
a15ff56ec7
gitlab-ci: fix randomizing tests in "nm-ci-run.sh"
The code was just wrong. Usually in gitlab-ci, NMTST_SEED_RANDOM is
unset, so the previous code  would not have set it. Which means that our
tests run with NMTST_SEED_RANDOM="0".

Fuzzing (or randomizing tests) is very useful, we should do that for the
unit tests that run in gitlab-ci. Fix this.

But don't let the test choose a random number. Instead, let the calling
script choose it. That is, because we might run the tests more than once
(without debugging and no valgrind; in case of failure return with
debugging; with valgrind). Those runs should use the same seed.

This fixes commit 70487d9ff8 ('ci: randomize tests during our CI'),
but as fixing randomization can break previously running tests, we may
only want to backport this commit after careful evaluation.

(cherry picked from commit 3bad3f8b24)
2023-02-01 10:50:14 +01:00
Thomas Haller
dc588c951c
gitlab-ci: fix test script to abort on failing first test
Fixes: 89cfd34ae0 ('gitlab-ci: extend run-test.sh script to manually select certain build steps to run')
(cherry picked from commit 67da2b8e42)
2023-02-01 10:50:14 +01:00
Thomas Haller
64e818da80
l3cfg/tests: temporarily disable failing tests "/l3cfg/$N"
Seems this test fails easily under gitlab-ci, if we set NMTST_SEED_RAND
to something else than "0". There is nothing particular special about
"0", except that a randomly different code paths are chosen.

A randomized test that doesn't pass on all systems with all random
paths, is broken. Disable for now. Needs to be fixed.

See-also: https://bugzilla.redhat.com/show_bug.cgi?id=2165141
(cherry picked from commit 14b1a7ba30)
2023-02-01 10:50:13 +01:00
Thomas Haller
ae4f5a1861
client/tests: temporarily disable failing test test_monitor()
(cherry picked from commit 65ea47580f)
2023-02-01 10:50:13 +01:00
Thomas Haller
108885721e
meson: increase "default_test_timeout" to 3 minutes
Obviously, it would be nice if our unit tests are fast. However, with
valgrind and a busy machine, some of the tests can take a relatively
long time. In particular those, that are marked as "slow" (if you want
to skip them during development, do so via "NMTST_DEBUG=quick"
environment, or "CFLAGS=-DNMTST_TEST_QUICK=TRUE", see
"nm-test-utils.h").

Anyway. Our tests almost never hit the timeout, and if they do, the most
likely reason is that something was just slower then expected, and the
timeout is a bogus error.

Timeouts only act as last fail safe. It more important to avoid a false
(premature) timeout failure, than to minimize the wait time when the
test really hangs. Because a real hang is a bug anyway, that we will
discover and need to fix.

Increase the default test timeout for meson tests to 3 minutes.

Also, "test-route-linux" is known to take a long time. Increase that
timeout even further.

(cherry picked from commit 9ee42c0979)
2023-02-01 10:50:11 +01:00
Thomas Haller
ae906e42da
platform: detect EINVAL as failure to set the MTU
Some drivers will reject an invalid MTU size with EINVAL.

Quote from [1]:

  While investigating, I did notice that do_change_link in
  nm-linux-platform.c really ought to count -EINVAL as an MTU out-of-range
  error and not just -ERANGE. Even if the hardware supports a large MTU,
  if the transmit FIFO is set too small, stmmac_change_mtu [2] will return
  -EINVAL. For example, on my device, the maxmtu is 9000 but in practice I
  can't set an MTU larger than 4096 unless I first run ethtool
  --set-channels eno1 tx 3.

[1] https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/issues/1198#note_1738311
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c?h=v6.1#n5577

(cherry picked from commit 621b41ebfa)
2023-02-01 10:50:11 +01:00
Thomas Haller
4463a6c903
nmcli/style: fix clang-format style
Fixes: 8132045d5f ('nmcli: fix typos in nmcli output')
2023-01-27 08:32:54 +01:00
Michael Biebl
ae971c0019
nmcli: fix typos in nmcli output
Spotted by lintian:
  I: network-manager: spelling-error-in-binary writting writing [usr/bin/nmcli]
  I: network-manager: spelling-error-in-binary wihout without [usr/lib/x86_64-linux-gnu/NetworkManager/1.40.12/libnm-device-plugin-wifi.so]

https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1515
(cherry picked from commit 8132045d5f)
2023-01-27 08:00:51 +01:00
Lubomir Rintel
4d75c43f1a dns-manager: always apply options from [global-dns]
Currently, the use of [global-dns] section for setting DNS options is
conditioned on presence of a nameserver in a [global-dns-domain-*] section.
Attempt to use the section for options alone results in an error:

  [global-dns]
  options=timeout:1

Or via D-Bus API:

  # busctl set-property org.freedesktop.NetworkManager \
      /org/freedesktop/NetworkManager org.freedesktop.NetworkManager \
      GlobalDnsConfiguration 'a{sv}' 2 \
          "options" as 1 "timeout:1" \
          "domains" a{sv} 0
  ...
  Nov 24 13:15:21 zmok.local NetworkManager[501184]: <debug> [1669292121.3904]
      manager: set global DNS failed with error: Global
      DNS configuration is missing the default domain

The insistence on existence of [global-dns-domain-*] would make sense if
other [global-dns-domain-...] sections were present.

However, the user might only want to set the options in resolv.conf and
still use connection-provide nameservers for the actual resolving.

Lift the limitation by allowing the [global-dns] to be used alone, while
still insist on [global-dns-domain-*] being there in presence of other
domain-specific options.

https://bugzilla.redhat.com/show_bug.cgi?id=2019306
(cherry picked from commit 1f0d1d78d2)
2023-01-26 10:11:45 +01:00
Lubomir Rintel
164cc81569 config-data: style fix
(cherry picked from commit 051819a78e)
2023-01-26 10:11:43 +01:00
Lubomir Rintel
e2246cb54b dns-manager: style fix
(cherry picked from commit f2f806f77d)
2023-01-26 10:11:42 +01:00
Lubomir Rintel
d976683417 release: bump version to 1.41.91 (1.42-rc2) (development) 2023-01-26 09:55:15 +01:00
Thomas Haller
9297f319f0 glib-aux/trivial: fix code comment about nmtst_is_debug()
Fixes: c1b57a2c72 ('glib-aux/tests: enable TRACE logging level when debugging nmtst tests')
(cherry picked from commit 93424dc2c2)
2023-01-26 09:20:18 +01:00
Thomas Haller
be848b3384 libnm: valide IPv4 ECMP routes in NMIPRoute as unicast routes
Kernel does not allow ECMP routes for route types other than unicast.
Reject that in NetworkManager settings too.

Fixes: 3cd02b6ed6 ('libnm,platform: fix range for "weight" property of next hops for routes')

https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1507
(cherry picked from commit 4f719da32d)
2023-01-26 09:20:18 +01:00
Lubomir Rintel
bd4f5333e8 config: fix a reversed conditional
This effectively makes [*global-dns-domain-*] sections in configuration be
ignored unless [*global-dns] is also present. This happens because
nm_config_keyfile_has_global_dns_config() mixes the group names up and
attempts to loop up [.intern.global-dns-domain-*] in user configuration and
[global-dns-domain-*] in the internal one.

Fixes: da0ded4927 ('config: drop global-dns.enable option in favor of .config.enable')
(cherry picked from commit de1c06daab)
2023-01-26 09:20:18 +01:00
Beniamino Galvani
5b19818fdc merge: branch 'pr/1479'
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1479

(cherry picked from commit 46f6e0bc29)
2023-01-23 18:02:27 +01:00
Alexander Lochmann
02c3f7c946 ndisc: Accept routes from on-link prefixes form ra
It is possible that an ra leads to two routes having
the same prefix as well as the same prefix length.
One of them, however, refers to the on-link prefix,
and the other one to a route from the route information field.
(Moreover, they might have different route preferences.)
Hence, if both routes differ in the on-link property,
both are added, and the route from the route information
option receives a metric penalty.
Fixed #1163.

(cherry picked from commit 11832e2ba3)
2023-01-23 18:02:27 +01:00
Thomas Haller
093bbd0fcf
release: bump version to 1.41.90 (1.42-rc1) 2023-01-19 20:15:14 +01:00
Thomas Haller
01730f5943
gitlab-ci: set OMP_NUM_THREADS=1 to avoid libgomp crash for msgmerge
It's not clear why this happens. But since recently in our gitlab-ci,
all the Fedora machines will fail. It happens in the step

  check_run_clean 6 && test $IS_FEDORA = 1 -o $IS_CENTOS = 1 && ./contrib/fedora/rpm/build_clean.sh -g -w crypto_gnutls -w debug -w iwd -w test -W meson

which explains why it only affects Fedora configurations.

It does not always fail, but the probability of failure is high.
The failure is:

  ...
  rm -f et.gmo && /usr/bin/msgmerge --for-msgfmt -o et.1po et.po NetworkManager.pot && /usr/bin/msgfmt -c --statistics --verbose -o et.gmo et.1po && rm -f et.1po
  libgomp: Thread creation failed: Resource temporarily unavailable
  make[3]: *** [Makefile:383: et.gmo] Error 1

Maybe some new resource restricting in gitlab. Let's add this workaround.
I don't really understand the cause, but this seems to avoid it, which is
good enough for me.
2023-01-19 19:31:28 +01:00
Thomas Haller
f2db9be627
platform/tests: temporarily disable failing check in "/route/test_cache_consistency_routes" (3)
This test is not (yet) stable. Disable for now.
2023-01-19 17:10:43 +01:00
Thomas Haller
20f71b392c
platform/tests: temporarily disable failing check in "/route/test_cache_consistency_routes" (2)
This test is not (yet) stable. Disable for now.
2023-01-19 16:27:28 +01:00
Thomas Haller
5be942eac5 libnm/doc: document "weight" attribute for IPv4 routes 2023-01-19 16:11:19 +01:00
Fernando Fernandez Mancera
d02be1c123 NEWS: update 2023-01-19 16:07:57 +01:00
Thomas Haller
1e883ab6e6
gitlab-ci: avoid clean step in "run-test.sh" for manual invocation
When we run `NM_TEST_SELECT_RUN=x ./.gitlab-ci/run-test.sh` to run one
step only, we should not do the final clean, so that the build artifacts
are preserved.
2023-01-19 15:04:46 +01:00
Thomas Haller
8ec69afd8b
platform/tests: temporarily disable failing check in "/route/test_cache_consistency_routes"
Hm, it seems to fail on some systems. Disable the test for now.
2023-01-19 13:25:52 +01:00
Fernando Fernandez Mancera
e266c420b2 l3cfg: schedule an update after every commit-type/config-data register/unregister
When we register/unregister a commit-type or when we add/remove a
config-data to NML3Cfg, that act only does the registration/addition.
Only on the next commit, are the changes actually done. The purpose
of that is to add/register multiple configurations and commit them later
when ready.

However, it would be wrong to not do the commit a short time after. The
configuration state is dirty with need to be committed, and that should
happen soon.

Worse, when a interface disappears, NMDevice will clear the ifindex and
the NML3Cfg instance, thereby unregistering all config data and commit
type. If we previously commited something, we need to do another follow-up
commit to cleanup that state.

That is for example important with ECMP routes, which are registered in
NMNetns. When NML3Cfg goes down, it always must unregister to properly
cleanup. Failure to do so, causes an assertion failure and crash. This
change fixes that.

Fix that by automatically schedule and idle commit on
register/unregister/add/remove of commit-type/config-data.
It should *always* be permissible to call a AUTO commit from
an idle handler, because various parties cannot use NML3Cfg
independently, and they cannot know when somebody else does a
commit.

Note that NML3Cfg remembers if it presiouvly did a commit
("commit_type_update_sticky"), so even if the last commit-type gets
unregistered, the next commit will still do a sticky update (one more
time).

The only remaining question is what happens during quitting. When
quitting, NetworkManager we may want to leave some interfaces up and
configured. If we were to properly cleanup the NML3Cfg we might need a
mechanism to handle that. However, currently we just leak everything
during quit, so that is not a concern now. It is something that needs
to be addressed in the future.

https://bugzilla.redhat.com/show_bug.cgi?id=2158394
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1505
2023-01-19 12:40:33 +01:00
Thomas Haller
adad1d4358
platform: merge branch 'th/platform-cache-consistency-routes'
https://bugzilla.redhat.com/show_bug.cgi?id=2060684

https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1494
2023-01-19 11:54:10 +01:00
Thomas Haller
87522ad316
platform/tests: add test for cache consistency for IP routes
Routes can be added with `ip route add|change|replace|append|prepend`.
Add a test that randomly tries to add such routes, and checks that
the cache stays consistent.

https://bugzilla.redhat.com/show_bug.cgi?id=2060684
2023-01-19 11:53:08 +01:00
Thomas Haller
461279a9ce
platform/tests: add nmtstp_assert_platform() helper 2023-01-19 09:15:55 +01:00
Thomas Haller
d5e33bf660
platform/tests: extend nmtstp_env1_add_test_func() to create more interfaces
nmtstp_env1_add_test_func() prepares a certain environment (with dummy
interface) that is used by some tests. Extend it, to allow creating more
than one interface (currently up to two).
2023-01-19 08:56:22 +01:00
Thomas Haller
5579fca916
platform: allow setting multi_idx instance for NMPlatform
The major point of NMDedupMultiIndex is that it can de-duplicate
the objects. It thus makes sense the everybody is using the same
instance. Make the multi-idx instance of NMPlatform configurable.

This is not used outside of unit tests, because the daemon currently
always creates one platform instance and everybody then re-uses the
instance of the platform.

While this is (currently) only used by tests, and that the performance
optimization of de-duplicating is irrelevant for tests, this is still
useful. The test can then check whether two separate NMPlatform objects
shared the same instance and whether it was de-duplicated.
2023-01-19 08:56:21 +01:00
Thomas Haller
2c22c96235
platform: add NMP_OBJECT_TYPE_NAME() macro 2023-01-19 08:56:21 +01:00
Thomas Haller
7752b2e059
platform: abort handling routes in _rtnl_handle_msg() when resync is required
There really is nothing left to do. Skip the rest and do a resync.
2023-01-19 08:56:21 +01:00
Thomas Haller
6fc0dc3fcb
platform: resync route cache upon NLM_F_REPLACE flag
There really is no way around this. As we don't cache all the routes
(e.g. ignored based on rtm_protocol or rtm_type), we cannot know which
route was replaced, when we get a NLM_F_REPLACE message.

We need to request a new dump in that case, which can be expensive, if
there are a lot of routes or if replace happens frequently.

The only possible solutions would be:

1) NetworkManager caches all routes, but it also needs to make sure to
  get *everything* right. In particular, to understand every relevant
  route attribute (including those added in the future, which is
  impossible).

2) kernel provides a reasonable API (rhbz#1337855, rhbz#1337860) that
  allows to sufficiently understand what is going on based on the
  netlink notifications.
2023-01-19 08:56:21 +01:00
Thomas Haller
4ec2123aa2
platform: parse routes of any type to handle replace
When you issue

  ip route replace broadcast 1.2.3.4/32 dev eth0

then this route may well replace a (unicast) route that we have in
the cache.

Previously, we would right away ignore such messages in
_new_from_nl_route(), which means we miss the fact that a route gets
replaced.

Instead, we need to parse the message at least so far, that we can
detect and handle the replace.
2023-01-19 08:56:21 +01:00
Thomas Haller
854f2cc1fc
platform: better handle ip route replace for ignored routes
We don't cache certain routes, for example based on the protocol. This is
a performance optimization to ignore routes that we usually don't care
about.

Still, if the user does `ip route replace` with such a route, then we
need to pass it to nmp_cache_update_netlink_route(), so that we can
properly remove the replaced route.

Knowing which route was replaces might be impossible, as our cache does
not contain all routes. Likely all that nmp_cache_update_netlink_route()
can to is to set "resync_required" for NLM_F_REPLACE. But for that it
should see the object first.

This also means, if we ever write a BPF filter to filter out messages
that contain NLM_F_REPLACE, because that would lead to cache inconsistencies.
2023-01-19 08:56:21 +01:00
Thomas Haller
c64053e6e6
platform: minor cleanup in nmp_cache_update_netlink_route()
It reads nicer. It will also work better with the change that follows.
2023-01-19 08:56:21 +01:00
Thomas Haller
a3cea7f6fb
platform: fix nmp_lookup_init_route_by_weak_id() to honor the route-table
The route table is part of the weak-id. You can see that with:

  ip route replace unicast 1.2.3.4/32 dev eth0 table 57
  ip route replace unicast 1.2.3.4/32 dev eth0 table 58

afterwards, `ip route show table all` will list both routes. The replace
operation is only per-table. Note that NMP_CACHE_ID_TYPE_ROUTES_BY_WEAK_ID
already got this right.

Fixes: 10ac675299 ('platform: add support for routing tables to platform cache')
2023-01-19 08:56:21 +01:00