Commit graph

32019 commits

Author SHA1 Message Date
Thomas Haller
1feaf427d2
platform: rework handling of failed routes during nm_platform_ip_route_sync()
Previously, there was "temporary-not-available" mechanism in NML3Cfg,
which aimed to handle IPv6 routes with prefsrc. Theoretically, that
mechanism may have been extended to other use-cases, like IPv4 routes
with prefsrc. What it attempted to handle, is the inability to configure
such routes, unless the respective prefsrc address is configured and
non-tentative.  However, the address that we are waiting for, could also
be on another interface, so that mechanism wasn't applicable. This is
now replaced by _routes_watch_ip_addrs(). It seems there isn't anything
useful left for the "temporary-not-available" mechanism and it can go,
except...

We want to log a warning when we are unable to configure a route. Also,
in the future we might want to know when the IP configuration is
degradated due to inability to configure the desired routes (a condition
that  we might want to expose to the user, not only via logging; or we
may want to react on that).

However, with prefsrc routes we don't know right away whether the
inability to configure the route right away indicates an actual problem,
or whether that will resolve itself (e.g. after the address passes
DAD/ACD, after we received an DHCP lease or after the address was
configured on another interface).  Consequently, to know whether the
current inability to configure such a route is a problem, we need to
know the larger context.  nm_platform_ip_route_sync() does not have that
context.

Instead, nm_platform_ip_route_sync() needs only do debug log about
failure to configure routes. It  will now also  return all the failed
routes to NML3Cfg, which can decide whether that is a problem.

This reworks the previous "temporary-not-available" mechanism to track
the state of the failed routes, to eventually decide whether there is an
actual problem (and log about it).

Another problem this solves is that since commit ('platform: always
reconfigure IP routes even if removed externally'), we will eagerly
re-try to configure the same route over and over. We cannot just spam
the log with warnings about the same failure on every commit. We need to
remember that we already logged about the problem and rate limit
warnings otherwise. This is what the new mechanism also achieves.

Indeed, all this is mostly for the sole benefit of logging better
warnings (and not duplicated).
2023-03-21 15:58:55 +01:00
Thomas Haller
b8dba58892
l3cfg: don't return success/failure from _l3_commit_one()
It was unused anyway.

But also, what would we do with this? We are in the middle of a commit,
if something goes wrong, we cannot just abort but need to continue on
and make the best of it.

Maybe there are very specific error cases that we need to handle, but
those are not covered by a boolean return value. Instead, we might need
to take specific action.

The boolean success variable was meaningless. Drop it.
2023-03-21 15:58:53 +01:00
Thomas Haller
9564fc34fd
platform: log extack_msg for failures in nm_platform_ip_route_sync() 2023-03-21 15:58:52 +01:00
Thomas Haller
89efd94a08
platform: cleanup error handling in nm_platform_ip_route_sync()
Unindent the if-else blocks.
2023-03-21 15:58:50 +01:00
Thomas Haller
85816d1f19
core: skip watching prefsrc addresses if the address is ready 2023-03-21 15:58:49 +01:00
Thomas Haller
e4ac0c407d
core: watch IP addresses appearing/disappearing and recommit pref_src routes
Routes with pref_src (RTA_PREFSRC) can only be added when the
corresponding IP address is configured (and non-tentative, in case of
IPv6). Additionally, that address may be on any interface, not only on
the one we want to configure the route on. This means, when we first
activate a profile with a route that has a src attrbute, then that src
address might only be configured later. For example, with IPv6, it takes
a while for the address to become non-tentative. Or the address might
come from DHCP, and not be present initially. Or the address might even
be configured on another interface/profile. That means, while we might
be unable to configure the route now, we may become able any time later.

Solve that by subscribing to NMNetns to get notifications whenever such
an address gets added. In that case, schedule an idle commit, which may
then succeed.
2023-03-21 15:58:47 +01:00
Thomas Haller
71b2d4c33a
core: remove unused tag-less API from nm_netns_watcher*()
The implementation came with two flavors, where watcher could either
specify a tag or no tag. That resulted in different usage patterns and
behavior.

Handles with tag are indexed by a dictionary and de-duplicated. Also the intended
pattern is to delete them with nm_netns_watcher_remove_all(),
Currently, nm_netns_watcher_remove_handle() was not permissible to tag-full handles,
because of the de-duplication and because handles had no ref-counting
implemented (the latter would be fixable, so
nm_netns_watcher_remove_handle() would be made to work).

On the other hand, handles without tag are never de-duplicated. They are
also not indexed, so nm_netns_watcher_remove_all() doesn't work for
them. They could only be removed via nm_netns_watcher_remove_handle().

Currently, the only user of the API will use tag-full handles. Drop the
unused API. This is done as a separate commit, to potentially revert and
restore tag-less handles (after they were already implemented).
2023-03-21 15:58:46 +01:00
Thomas Haller
6d804b149c
core: add "watch" infrastructure to NMNetns
NML3Cfg will want to know when an address changes -- on any interface.

We want to support gazillion of interfaces, a naive approach is not
going to scale. Instead, NMNetns already subscribes to all platform
signals, it should dispatch events for address changes.

Add a mechanism how users (NML3Cfg) can register watches, and get called
back when the event happens.
2023-03-21 15:58:45 +01:00
Thomas Haller
ffa394832d
platform: move NMPlatformSignalChangeType to "nmp-base.h" header 2023-03-21 15:58:44 +01:00
Thomas Haller
7fa63c23b4
platform,l3cfg: remove force-commit flag for addresses/routes
We no longer need this. We now always force-commit routes and addresses.
See the previous commit.
2023-03-21 15:58:43 +01:00
Thomas Haller
6fadba5560
platform: don't add onlink route to gateway in nm_platform_ip_route_sync()
Kernel rejects adding routes that have a gateway, if there is no direct
(onlink) route to that gateway. The exact conditions are non-trivial due
to the complexities of routing, but that's it basically.

Anyway. In NetworkManager we don't want to have such non-obvious
interdependencies. If the user configures a route with a gateway, but
"forgets" to configure a direct route to the gateway, we don't assume
that the user configured the wrong route. Instead, we assume the user
forgot to configure the additional route and add it automatically.  That
is for convenience, but also because (as said) the rules for this are
non-trivial. Moreover, it's problematic to report an error in routing
during activation.  Should we fail activation altogether? Should we just
log an error and otherwise silently proceed? Logging is not a sensible
behavior that the (possibly non-human) user can meaningfully handle. So
we instead try to make it work.

Previously, nm_platform_ip_route_sync() had the workaround of when we
failed to configure a route and it looked like it might be due to the
missing onlink route, we would add a suitable /32 / /128 route. The
problem is that we want that NML3Cfg is aware of what routes we want to
configure. The lower layer nm_platform_ip_route_sync() adding additional
routes makes that difficult (maybe nm_platform_ip_route_sync() could
return the additional routes that it added, but it doesn't).

The better solution seems to be that
nm_l3_config_data_add_dependent_onlink_routes() adds the required routes
in NML3Cfg during commit. This is done since commit 4073211595
('Revert "l3cfg: do not add dependent routes for non-default routes"').

Further, since commit ('platform: always reconfigure IP routes even if
removed externally') we also always try to re-add the routes we want,
regardless of whether they appear to be deleted by the user.

So a suitable onlink route really should be always there, and there is
no more need for this workaround.
2023-03-21 15:58:42 +01:00
Thomas Haller
7ca95cee15
platform: always reconfigure IP routes even if removed externally
NML3Cfg is stateful, that means it remembers which address/route it
configured earlier. That is important because the API users of NML3Cfg
only say what the want to configure now, and NML3Cfg needs to remove
addresses/routes that it configured earlier but are no longer to be
present. Also, NetworkManager wants to allow the user to add
addresses/routes externally with `ip addr|route add` and NetworkManager
not removing it. This is a common use case for dispatcher scripts, but
in general, we want to allow other components to add addresses/routes.

We try something similar with the removal of routes/addresses managed by
NetworkManager. When NetworkManager adds a route/address, which later
disappears, then we assume that the user intentionally removed the
address/route and take the hint to not re-add it.

However, it doesn't work. It is problematic for two reasons:

- kernel can automatically remove routes. For example, deleting an IPv4
  address that is the prefsrc of a route, will cause kernel to delete
  that route. Sure, we may be unable to re-configure the route at this
  moment, but we shouldn't remember indefinitely that the route is
  supposed to be absent. Rather, we should re-add it when possible.

- kernel is a pain with validating consistencies of routes. For example,
  when a route has a nexthop gateway, then the gateway must be onlink
  (directly reachable), or kernel refuses to add it with "Nexthop has
  invalid gateway". Of course, when removing the onlink route kernel is
  fine leaving the gateway route behind, which it would otherwise refuse
  to add.
  Anyway. Such interdependencies for when kernel rejects adding a route
  with "Nexthop has invalid gateway" are non-trivial. We try to work
  around that by always adding the necessary onlink routes. See
  nm_l3_config_data_add_dependent_onlink_routes(). But if the user
  externally removed the dependent onlink route, and NetworkManager
  remembers to not re-adding it, then the efforts from
  nm_l3_config_data_add_dependent_onlink_routes() are ignored. This
  causes ripple effects and NetworkManager will also be unable to add the
  nexthop route.

Trying to preserve absence of routes that NetworkManager would like to
configure is not tenable. Don't do it anymore. There was anyway no
guarantee that on the next update NetworkManager wouldn't try to re-add
the route in question. For example, if the route came from DHCP, and the
lease temporarily went away and came back, then NetworkManager probably
would have (correctly) forgotten that the user wished that the route be
absent. This did not work reliably and it just causes problems.
2023-03-21 15:58:41 +01:00
Thomas Haller
b386381ffc
platform: add nm_platform_ip_route_get_pref_src() helper 2023-03-21 15:58:40 +01:00
Thomas Haller
7d6c1a0f43
platform: fix assertion in _ip_route_add() to return correct error code 2023-03-21 15:58:40 +01:00
Thomas Haller
cf21da4ed1
glib-aux/prioq: rename NM_PRIOQ_FOREACH_ITEM() to nm_prioq_for_each()
Our for-each macros are usually lower-case and are spelled differently. Rename.
Also add to clang-format as this is a for-each macro.
2023-03-21 15:58:39 +01:00
Thomas Haller
d840ddd959
glib-aux/prioq: assert for valid index in find_item() of NMPrioq
NMPrioq is taken from systemd's "prioq.c". It is a nice data structure,
that accepts and an index pointer, to directly access elements inside
the heap.

Previously, the API didn't require a consistent index, while the data is
not inside the heap. nm_prioq_{update,shuffle,remove}()) call find_item(),
which silently accepts wrong indexes and assumes the element is not in
the heap.

Keeping the index in sync with the data seems error prone. Accepting any
index without asserting may be convenient for the user (as the user is
not required to pre-initialize the index with NM_PRIOQ_IDX_NULL).
However, it also misses to catch potential bugs.

Now the index must be kept consistent, in particular also if the element
is not enqueued. This means, you must initialize them with
NM_PRIOQ_IDX_NULL.
2023-03-21 15:58:39 +01:00
Thomas Haller
78489e7cbb
glib-aux/prioq: refactor find_item() in NMPrioq 2023-03-21 15:58:38 +01:00
Thomas Haller
46703557fb
glib-aux/prioq: ensure to clear "idx" when removing item from NMPrioq
NMPrioq is copied from systemd's "prioq.c". It uses the index pointer to
find elements in the heap directly. However, it's not very careful about
clearing or maintaining the index pointer if the element is not in the
queue. Like, find_item() accepts a bogus index and only finds the data
if the data and the index matches. That seems error prone.

In the first step, when we remove an item from the queue, ensure to
reset the index to NM_PRIOQ_IDX_NULL.
2023-03-21 15:58:38 +01:00
Thomas Haller
037b0a2b2a
glib-aux/prioq: minor cleanups for NMPrioq 2023-03-21 15:58:38 +01:00
Thomas Haller
cf1963a379
glib-aux/prioq: rework asserts in NMPrioq 2023-03-21 15:58:37 +01:00
Thomas Haller
f701ff28e8
glib-aux/prioq: add and use internal PrioqItem typedef
Seems nicer to read.
2023-03-21 15:58:37 +01:00
Thomas Haller
68ccc46b04
glib-aux/prioq: add nm_prioq_update()
This combines remove/reshuffle/put in one.
2023-03-21 15:58:36 +01:00
Thomas Haller
684760a671
glib-aux: add nm_g_timeout_reschedule() helper 2023-03-21 15:58:36 +01:00
Thomas Haller
9b48c7f373
glib-aux: optimize nm_ip_addr_is_null()
The nm_ip_addr_*() APIs are supposed to work with unaligned input,
so we could use them while parsing binary data (where the field
may not be properly aligned). For that reason, it used to first
copy the argument to a local (properly aligned) variable.

Rework that a bit, and use unaligned_read_ne32() for IPv4.
2023-03-21 15:58:36 +01:00
Thomas Haller
9530553311
glib-aux: add NMIPAddrTyped struct for tagging NMIPAddr union
NMIPAddr is a union, so you always need to know the right addr_family.
Often we track the addr_family separately or know it implicitly. Then we
don't need to glue them together.

But also often we need to associate the address_family with the address
union. Add a struct that bundles the NMIPAddr with the addr_family,
making it a tagged union. The benefit is that we now also have one
implementation for equal, cmp and hash-update, instead of reimplementing
them.
2023-03-21 15:58:35 +01:00
Thomas Haller
c4a8fce1a7
platform: fix returning extrack_msg from platform add addr/route functions
Fixes: d755b50808 ('platform: return extack message from add address/route operations')
2023-03-21 15:58:35 +01:00
Thomas Haller
575e35d1ca
log,dhcp: avoid deprecated GTimeVal API and use g_get_real_time()
GTimeVal is deprecated because it's not year 2038 safe (on architectures
where gulong is 32 bit). Don't use it. It's easy to replace.

See-also: e3f88f311f

https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1572
2023-03-21 10:21:28 +01:00
Thomas Haller
2c9faea63c
contrib,tools: move "nm-in-container.sh" script to "tools"
This script seems very useful to me. Give it a more prominent place and
move it out from "contrib/scripts". Also do some further renaming.
2023-03-20 08:59:59 +01:00
Beniamino Galvani
b3583ec69c merge: branch 'bg/device-check-compatible'
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1522
2023-03-16 12:01:13 +01:00
Beniamino Galvani
7b9a9352c2 manager: relax check when creating virtual devices
For each connection that corresponds to a software device, we create a
"unrealized" device that then becomes realized just before the
connection starts activating. Currently, in certain conditions NM
creates two devices with the same name and type, one realized and one
not; this is not expected and can lead to other issues especially when
a software device is reactivated.

Avoid that by relaxing the check in system_create_virtual_device(): if
a device exists with the same name and type, we don't want to create
another even if the type-specific parameters differ.
2023-03-16 12:00:41 +01:00
Beniamino Galvani
99756757c8 device: honor the @check_properties flag 2023-03-16 12:00:41 +01:00
Beniamino Galvani
71ee9d9033 device: add @check_properties argument to check_connection_compatible()
No change in behavior for now.
2023-03-16 12:00:40 +01:00
Thomas Haller
a6802cfa74
nm-in-container: disable handling of "/etc/resolv.conf" in container and use 8.8.8.8.
By default, podman bind mounts a "/etc/resolv.conf" file. That prevents
NetworkManager (inside the container) to update the file, which leads to
warnings in the log and certain NM-ci tests won't pass due to that.

Disable handling of "/etc/resolv.conf" in podman. But also pre-deploy a
default resolv.conf, with the google name server 8.8.8.8.  I don't
understand why, but even with "--dns=none", writing "/etc/resolv.conf"
while building the container doesn't take effect.  Instead, write a
usable "/etc/resolv.conf" from "/etc/rc.d/rc.local".
2023-03-16 08:33:00 +01:00
Thomas Haller
c7675c73ee
ifcfg-rh: merge branch 'th/ifcfg-hostname'
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1568
2023-03-15 17:06:30 +01:00
Thomas Haller
8a12713568
ifcfg-rh: fix preserving hostname setting in reader
The writer already got this right, to always ensure that at least one
hostname key is set iff the hostname setting is present.
2023-03-15 17:06:10 +01:00
Thomas Haller
209d48513c
ifcfg-rh: set errno from svParseBoolean()/svGetValueBoolean()/svGetValueTernary() 2023-03-15 17:06:10 +01:00
Thomas Haller
b4856e825c
ifcfg-rh: return ENOKEY errno from svGetValueEnum() for missing key
ENOENT is about files. ENOKEY seems a better code.
2023-03-15 17:04:37 +01:00
Thomas Haller
d453188ed2
nm-in-container: add nm-deploy.sh script for reinstalling NetworkManager 2023-03-15 16:50:57 +01:00
Thomas Haller
0c233fb216
nm-in-container: symlink NM/NM-ci directories in nm-in-container 2023-03-15 16:50:57 +01:00
Thomas Haller
44a3ccccf4
glib-aux,cloud-setup: merge branch 'th/cloud-setup-utils-poll'
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1566
2023-03-13 17:13:19 +01:00
Thomas Haller
5a564ca17c
glib-aux,cloud-setup: move nm{cs,}_utils_poll() to libnm-glib-aux
The idea is to have useful and correct helper functions, that are
generic and reusable.

nmcs_utils_poll() was done with that intent, and it is a generally
useful function. As the implementation shows, it's not entirely trivial
to get all the parameters right, when it comes to glib-integration
(GMainContext and GTask) and polling.

Whether something like a generic poll helper is a useful thing at all,
may be a valid question. E.g. you need several hooks, and the usage is
still not trivial. Regardless of that, "nm-cloud-setup-utils.c" already
had such a helper. This is only about moving it to a place where it is
actually accessible to others.

And, if it turns out to be a good idea after all, then somebody else
could use it.
2023-03-13 17:13:01 +01:00
Thomas Haller
90f7d7d77b
cloud-setup: add register-object callback to nmcs_utils_poll()
nmcs_utils_poll() calls nmcs_wait_for_objects_register(), which is
specific to nm-cloud-setup.

nmcs_utils_poll() should move to libnm-glib-aux, so it should not have a
direct dependency on nm-cloud-setup code. Add instead a hook that the
caller can use for registering the object.
2023-03-13 17:13:01 +01:00
Thomas Haller
1ee35cb638
cloud-setup/trivial: fix gtkdoc comment for nmcs_utils_poll()
@probe_start_fcn is not called the first time synchronously. Fix the comment.
While at it, reword a bit.
2023-03-13 17:13:01 +01:00
Thomas Haller
841e06be5e
cloud-setup: don't pass separate user-data when polling in nm_http_client_poll_req()
nmcs_utils_poll() accepts two different user-data. One is passed to the
probe callbacks (and returned by nmcs_utils_poll_finish()). The other
one is passed to the callback.

Having separate user data might be useful. It's not useful for
nm_http_client_poll_req(), which both passes the same.

It's confusing to pass the same data as different user-data. Don't do
that. Use only one way.
2023-03-13 17:12:54 +01:00
Beniamino Galvani
82a9a493af ip-tunnel: merge branch 'pr/1565'
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1565
2023-03-13 09:45:03 +01:00
Joao Machado
1505ca3626 platform/tests: ip6gre & ip6gretap test cases (ip6 tunnel flags) 2023-03-13 08:41:45 +00:00
Joao Machado
7c54d26ad6 libnmc-setting/docs: how to disable ip-tunnel.encapsulation-limit (ip6) 2023-03-13 08:41:45 +00:00
Joao Machado
a239317a6b libnm-core-impl: allow ip6 tunnel flags for ip6gre & ip6gretap 2023-03-13 08:41:45 +00:00
Thomas Haller
9bf193f1a8
contrib: escape shell arguments in "nm-setup-git.sh" output 2023-03-09 16:56:18 +01:00
Thomas Haller
3c548dd081
release: improve hint about documentation in "release.sh"
A "minor" release can still be the latest release. It depends
on which minor release you do. The script isn't smart enough
to understand the difference, so make the hint a bit clearer.
2023-03-09 09:50:03 +01:00