This new endpoint type has been recently added to the kernel in v6.18
[1]. It will be used to create new subflows from the associated address
to additional addresses announced by the other peer. This will be done
if allowed by the MPTCP limits, and if the associated address is not
already being used by another subflow from the same MPTCP connection.
Note that the fullmesh flag takes precedence over the laminar one.
Without either of these two flags, the path-manager will create new
subflows to additional addresses announced by the other peer by
selecting the source address from the routing tables, which is harder to
configure if the announced address is not known in advance.
Supporting the new flag is easy: declare a new flag for NM, and add it
to the related helpers and to the existing checks that look at the
different MPTCP endpoint types. The documentation now references the
new endpoint type.
Note that only the new 'define' has been added in the Linux header file:
this file has changed a bit since the last sync, now split in two files.
Only this new line is needed, so the minimum has been modified here.
Link: https://git.kernel.org/torvalds/c/539f6b9de39e [1]
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
(cherry picked from commit 2b03057de0)
After ACD_WAIT_PROBING_EXTRA_TIME_MSEC has elapsed,
_l3_acd_data_timeout_schedule_probing_restart() keeps rescheduling the
timer with a zero interval, resulting in 100% CPU usage. This
continues until the probe is destroyed after
ACD_WAIT_PROBING_EXTRA_TIME2_MSEC.
When computing the interval, we need to use
(ACD_WAIT_PROBING_EXTRA_TIME_MSEC + ACD_WAIT_PROBING_EXTRA_TIME2_MSEC)
as the expiry time.
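A minimal sketch of the corrected interval computation, as a simplified
model and not the actual NML3Cfg code. The constant values and the
function name probing_restart_interval_msec() are assumptions made for
illustration; only the two constant names come from the text above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical values, chosen only for illustration. */
#define ACD_WAIT_PROBING_EXTRA_TIME_MSEC  1000
#define ACD_WAIT_PROBING_EXTRA_TIME2_MSEC 1000

/* Milliseconds to wait before the next timer event (hypothetical helper).
 * The expiry covers both extra-time windows, so the interval stays
 * positive during the second window instead of collapsing to zero. */
int64_t
probing_restart_interval_msec(int64_t probing_timestamp_msec, int64_t now_msec)
{
    int64_t expiry_msec;

    assert(probing_timestamp_msec > 0);

    expiry_msec = probing_timestamp_msec + ACD_WAIT_PROBING_EXTRA_TIME_MSEC
                  + ACD_WAIT_PROBING_EXTRA_TIME2_MSEC;

    return expiry_msec > now_msec ? expiry_msec - now_msec : 0;
}
```

With only the first constant as expiry, the result would already be 0
at the start of the second window, which is the busy loop described
above.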
acd_data->probing_timestamp_msec indicates when the probing
started. It is used in different places to calculate the timeout for
certain operations. In particular, it is used to detect that the probe
creation took too long when handling the ACD_STATE_CHANGE_MODE_TIMEOUT
event.
If we reset this timestamp at every timer event, we'll never hit the
probe creation timeout. Therefore, the l3cfg will keep trying forever
to create the probe.
See: https://lists.freedesktop.org/archives/networkmanager/2025-July/000418.html
Fix this by not updating the timestamp during a timeout event.
Fixes: a09f9cc616 ('l3cfg: ensure the probing timeout is initialized on probe start')
An IPv6 endpoint is not usable until the address is non-tentative. Add
a mechanism to wait until the address is ready.
(cherry picked from commit 227cd6307b)
Skip the configuration of the MPTCP endpoint when the address is in
the l3cd but is not yet configured in the platform. This typically
happens when IPv4 DAD is enabled and the address is being probed.
If we configure the endpoint without the address set, the kernel will
try to use the endpoint immediately but it will fail. Then, the
endpoint will not be used ever again after the address is added.
(cherry picked from commit 6bf859af79)
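The readiness conditions from the two MPTCP changes above can be
sketched as a single predicate. This is a simplified model: the
AddrState structure and mptcp_endpoint_can_configure() are hypothetical
names, not NetworkManager API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified view of one address. */
typedef struct {
    bool in_l3cd;     /* requested by the configuration */
    bool in_platform; /* actually configured in the kernel */
    bool is_ipv6;
    bool tentative;   /* IPv6 DAD still running */
} AddrState;

/* Returns true when the MPTCP endpoint for this address may be set up. */
bool
mptcp_endpoint_can_configure(const AddrState *addr)
{
    assert(addr);

    /* Address is still being probed (e.g. IPv4 DAD): skip for now. */
    if (!addr->in_l3cd || !addr->in_platform)
        return false;

    /* An IPv6 endpoint is not usable while the address is tentative. */
    if (addr->is_ipv6 && addr->tentative)
        return false;

    return true;
}
```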
The name suggests that the function always removes all the watchers
with the given tag; instead it removes only "dirty" ones when the
"all" parameter is FALSE. Split the function in two variants.
(cherry picked from commit b6e67c6abc)
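A sketch of what such a split can look like, using a toy watcher array.
The Watcher type and all function names here are hypothetical and only
illustrate the idea of replacing a boolean "all" parameter with two
explicitly named variants.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical watcher entry. */
typedef struct {
    int  tag;
    bool dirty;
    bool active;
} Watcher;

/* Shared worker; callers use one of the two named variants below. */
static size_t
watchers_remove(Watcher *w, size_t n, int tag, bool only_dirty)
{
    size_t removed = 0;

    for (size_t i = 0; i < n; i++) {
        if (!w[i].active || w[i].tag != tag)
            continue;
        if (only_dirty && !w[i].dirty)
            continue;
        w[i].active = false;
        removed++;
    }
    return removed;
}

/* Removes every watcher with the given tag. */
size_t
watchers_remove_all(Watcher *w, size_t n, int tag)
{
    return watchers_remove(w, n, tag, false);
}

/* Removes only the watchers with the given tag that are marked dirty. */
size_t
watchers_remove_dirty(Watcher *w, size_t n, int tag)
{
    return watchers_remove(w, n, tag, true);
}
```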
Add the DNS routing rules explicitly instead of tracking them via the
NMGlobalTracker mechanism. Since we do not plan to ever remove them,
there is no reason to track the rules. Also, the current
implementation is buggy because in some situations the rules are
removed when they should not be.
Fixes: bf3ecd9031 ('l3cfg: fix DNS routes')
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/2125
The current approach is flawed. During a commit of the L3
configuration we do a RTM_GETROUTE to find the next-hop to the DNS
server on the current interface, in order to create the DNS route to
inject into the l3cd. However, we haven't added routes to kernel yet
and so the result of the RTM_GETROUTE is going to be wrong.
In some cases, for example when IPv4 DAD is enabled, the bug can't be
easily noticed because we perform multiple commits for the interface,
and the regular routes are already set in kernel from the 2nd commit
on.
To fix the problem, do the following: during a commit we first add
addresses and routes to platform. Then, we create a list of DNS routes
to configure, we collect the old DNS routes, and do a comparison. If
they changed, we need to add the DNS routes to platform in a 2nd step.
Note that in the previous approach we tracked the routes in the
committed-l3cd object of the l3cfg, and so they were applied to kernel
automatically. Because of the 2-step requirement, that no longer works
and we must apply the DNS routes manually.
Fixes: 5449b18a94 ('core: support automatically adding DNS routes')
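The comparison step of the 2-step approach can be sketched as follows.
This is a simplified model: the DnsRoute type and dns_routes_changed()
are hypothetical, addresses are plain IPv4 integers, and both lists are
assumed to be in the same order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical DNS route: destination name server and its next hop. */
typedef struct {
    uint32_t dns_server;
    uint32_t next_hop;
} DnsRoute;

/* Returns true if the DNS routes computed after addresses and regular
 * routes were added differ from the previously configured ones, which
 * means a second step is needed to apply them to platform. */
bool
dns_routes_changed(const DnsRoute *old_routes, size_t n_old,
                   const DnsRoute *new_routes, size_t n_new)
{
    if (n_old != n_new)
        return true;

    for (size_t i = 0; i < n_new; i++) {
        if (old_routes[i].dns_server != new_routes[i].dns_server
            || old_routes[i].next_hop != new_routes[i].next_hop)
            return true;
    }
    return false;
}
```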
Don't try to add the routing rule that points to the table containing
DNS routes at every commit.
Instead, look into the platform cache to see if the rule already
exists and add it only when needed.
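A sketch of the "look up first, add only when missing" pattern. In this
toy model a plain array stands in for the platform cache and appending
to it stands in for the netlink request; the Rule type, both function
names and all values are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical routing rule key. */
typedef struct {
    uint32_t priority;
    uint32_t fwmark;
    uint32_t table;
} Rule;

bool
rule_cache_contains(const Rule *cache, size_t n, const Rule *rule)
{
    for (size_t i = 0; i < n; i++) {
        if (cache[i].priority == rule->priority && cache[i].fwmark == rule->fwmark
            && cache[i].table == rule->table)
            return true;
    }
    return false;
}

/* Adds the rule only if the cache does not have it yet.
 * Returns true if an add was actually performed. */
bool
rule_ensure(Rule *cache, size_t *n, size_t capacity, const Rule *rule)
{
    if (rule_cache_contains(cache, *n, rule))
        return false;

    assert(*n < capacity);
    cache[(*n)++] = *rule;
    return true;
}
```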
This function would be useful when performing operations related to the
IPv4 addresses configured on the l3cfg. E.g. this function will be used
for getting the IPv4 address to announce in a GARP on bonding-slb when
one of the ports fails over.
We always sync routes in the main table, but routes in tables other
than main are only pruned if they were added by NM, by default. Get the list
of routes to prune from other tables using obj_state->os_nm_configured,
as this tracks what routes were effectively added by NM.
The list should be the same as the one obtained from l3cfg_old. It
could be different if we committed the l3cfg with an NMIPRouteTableSyncMode
of NM_IP_ROUTE_TABLE_SYNC_MODE_MAIN, thus not deleting some routes at
commit time. However, since the previous commit, we never do it.
What all this shows is that starting to use different NMIPRouteTableSyncModes
is probably a bad idea: it will be a source of bugs where routes are not
always synced as users expect, and the use case for them is not yet
known.
By default, on reapply we were only syncing the main routes table. As a
result, routes added by NM to other tables are not removed on
reapply. This was done to preserve routes added externally, but routes
added by NM itself should be removed.
Add a new route table syncing mode "main + NM routes". This mode
maintains the normal behaviour of completely syncing the main table,
and for other tables removes only routes that were added by us, leaving
the rest untouched. Use this mode by default, as this is what a user
would expect on reapply.
Note: this might not work if NM is restarted between the profile being
modified and the reapply, because NM forgets what routes were added by
itself because of the restart. This is a rare corner case, though.
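The pruning decision for a route that exists in kernel but is no longer
wanted can be sketched like this. The enum names and
route_should_prune() are hypothetical and do not match the real
NMIPRouteTableSyncMode values; 254 is the kernel's id of the main table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TABLE_MAIN 254u /* same id as the kernel's RT_TABLE_MAIN */

/* Hypothetical names for the sync modes discussed above. */
typedef enum {
    SYNC_MODE_MAIN,               /* only the main table is synced */
    SYNC_MODE_MAIN_AND_NM_ROUTES, /* main table, plus routes added by NM */
    SYNC_MODE_ALL,                /* every table is synced completely */
} SyncMode;

/* Decides whether a stale route in the given table gets removed. */
bool
route_should_prune(SyncMode mode, uint32_t table, bool added_by_nm)
{
    if (table == TABLE_MAIN)
        return true;

    switch (mode) {
    case SYNC_MODE_MAIN:
        return false;
    case SYNC_MODE_MAIN_AND_NM_ROUTES:
        return added_by_nm;
    case SYNC_MODE_ALL:
        return true;
    }
    return false;
}
```

The added_by_nm input is exactly what is lost when NM restarts, which
is why the corner case in the note above exists.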
Use the D-Bus property "VersionInfo" to expose a capability flag
indicating that this bug is fixed. It is the first capability that we
expose in this way. However, it is convenient to do it this way as it's
something that clients like nmstate need to know, so they can decide
whether a conn down is needed or not. It is not enough to decide that by
version number because it might be fixed via a downstream patch in distros
like RHEL.
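A client-side check could look like the sketch below. It assumes that
"VersionInfo" is an array of 32-bit integers where element 0 holds the
encoded version and the following elements form a capability bitfield;
this layout, the helper name and the capability number 0 used in the
usage note are assumptions to verify against the D-Bus API
documentation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true if capability number "cap" is set in the array read
 * from the "VersionInfo" property (assumed layout: element 0 is the
 * version, elements 1..n are capability bits, 32 per element). */
bool
version_info_has_capability(const uint32_t *info, size_t len, unsigned cap)
{
    size_t idx = 1 + cap / 32;

    if (!info || idx >= len)
        return false;

    return ((info[idx] >> (cap % 32)) & 1u) != 0;
}
```

A client such as nmstate would call this with the capability number of
the fix (0 in this sketch) and fall back to a connection down when it
returns false, regardless of the version in element 0.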
https://issues.redhat.com/browse/RHEL-67324
https://issues.redhat.com/browse/RHEL-66262
Fixes: e9c17fcc9b ('l3cfg: default to 'main' route table sync mode')
The difference between FULL and ALL was not obvious without reading the
documentation. Moreover, a new mode is going to be introduced so the
confusion could grow. Rename to a more explicit name.
After upgrading to RHEL-9.4, customers have reported that `ip monitor`
repeatedly logs the same route additions every 30 seconds. This issue
appears to stem from NetworkManager continually retrying to add the same
routes because it keeps retrying Address Conflict Detection (ACD) on NOARP
interfaces.
To prevent unnecessary route additions and reduce log noise, this change
modifies NetworkManager's behavior to stop retrying ACD on interfaces
with the NOARP flag.
This fix addresses route instability and excessive logging for affected
NOARP configurations.
https://issues.redhat.com/browse/RHEL-59125
When the "ipvX.routed-dns" property is set to true, add a route for
each DNS server via the current interface. The feature works in the
following way.
A new routing rule is created ("priority $PRIO not fwmark $MARK lookup
$TABLE") where $PRIO, $MARK and $TABLE are fixed values and are the
same for all interfaces. This rule is evaluated before standard rules
and tries to look up routes in table $TABLE, where NM adds the routes
to DNS servers.
To determine the next-hop to the name server, NM issues a RTM_GETROUTE
netlink request to kernel, specifying to return the route via the
current interface. In order to avoid results from $TABLE, NM also sets
the fwmark as $MARK in the request.
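Why setting the fwmark in the request avoids $TABLE can be shown with a
toy model of the rule. The numeric values and the function name are
hypothetical, and the model ignores that a lookup falls through to the
later rules when $TABLE has no matching route.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical fixed values for $MARK and $TABLE. */
#define DNS_RULE_FWMARK 0x4f41u
#define DNS_TABLE       20053u
#define MAIN_TABLE      254u

/* Model of "not fwmark $MARK lookup $TABLE": the rule matches every
 * lookup whose fwmark differs from $MARK and sends it to $TABLE.
 * Lookups carrying $MARK skip the rule and reach the standard rules. */
uint32_t
first_table_for_lookup(uint32_t fwmark)
{
    return fwmark != DNS_RULE_FWMARK ? DNS_TABLE : MAIN_TABLE;
}
```

Regular traffic has no such mark and consults $TABLE first, while NM's
own RTM_GETROUTE request carries $MARK and therefore sees the routes
that would apply without the DNS routes.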
During a commit of layer-3 configuration, multiple signals are
emitted:
- if the combined l3cd configuration changes, we first emit a
L3CD_CHANGED signal, with flag `commited` FALSE;
- if the previously committed configuration is different from the one
we want to commit, we emit again the same signal with `commited`
TRUE;
- a PRE_COMMIT signal
- a POST_COMMIT signal
The usefulness of the first and third signals is questionable: there
is no need to signal that the configuration changes if we are not
going to commit it. Also, PRE_COMMIT is redundant as we just emitted
L3CD_CHANGED. Nobody is using those 2 signals.
Simplify this by leaving only PRE_COMMIT and POST_COMMIT, which are
always emitted during a commit and provide information on the l3cd
changes.
This commit doesn't change behavior.
When handling event TIMEOUT, "acd_data->probing_timeout_msec" always
needs to be initialized before jumping to "handle_start_probing:";
otherwise, an assertion failure is triggered at:
    static void
    _l3_acd_data_timeout_schedule_probing_restart(AcdData *acd_data, gint64 now_msec)
    {
        ...
        nm_assert(acd_data->probing_timeout_msec > 0);
Even if the ACD data is already in state PROBE, that doesn't mean that
the timeout is already initialized because the PROBE state can also be
reached from an INSTANCE_RESET event; and depending on the previous
state "acd_data->probing_timeout_msec" could be uninitialized.
Fixes-test: @iptunnel_restart
Fixes: b8f9d7b5dd ('l3cfg: rework ACD handling in NML3Cfg to support handling conflicts')
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/2023
The name "dhcp_enabled" is misleading because the flag is set for
method=auto, which doesn't necessarily imply DHCP. Also, it doesn't
convey what the flag is used for. Rename it to
"allow_routes_without_address".
(cherry picked from commit b31febea22)
The decision whether to configure routes without addresses is only
related to the configured method: the DHCP and the non-DHCP case. In
the DHCP case, the daemon waits until addresses appear before
configuring the static routes, to preserve the behavior mentioned in
https://bugzilla.redhat.com/show_bug.cgi?id=2102212. In the non-DHCP
case, the daemon can configure the routes immediately.
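The rule described above fits in a one-line predicate. The function and
parameter names are hypothetical, chosen for this sketch only.

```c
#include <assert.h>
#include <stdbool.h>

/* Returns true if the static routes may be configured now.
 * With DHCP, wait until at least one address is present; without
 * DHCP, the routes can be configured immediately. */
bool
static_routes_can_be_configured(bool method_uses_dhcp, unsigned n_addresses)
{
    return !method_uses_dhcp || n_addresses > 0;
}
```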
Dynamically added routes, i.e. ECMP single-hop routes, are not managed
by l3cd like the other ones. Therefore, they need to be tracked properly
and marked as dirty when they are.
Without this, the state of those ECMP single-hop routes is not properly
tracked and, for example, they are not removed by NML3Cfg when they
should be.
Usually, routes to be configured originate from the combined
NML3ConfigData and are resolved early during a commit. For example,
_obj_states_update_all() creates for each such route an obj_state_hash
entry. Let's call those static, or "non-dynamic".
Later, we can receive additional routes. We get them back from NMNetns
during nm_netns_ip_route_ecmp_commit() (_commit_collect_routes()).
Let's call them "dynamic".
For those routes, we also must track an obj-state.
Now we have two reasons why an obj-state is tracked. The "non-dynamic"
and the "dynamic" one. Add two flags "os_dynamic" and "os_non_dynamic"
to the ObjStateData and consider the flags at the necessary places.
Co-authored-by: Fernando Fernandez Mancera <ffmancera@riseup.net>
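How two independent tracking reasons interact can be sketched as below.
Only the flag names "os_dynamic" and "os_non_dynamic" come from the
text above; the reduced structure and the two helpers are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Reduced stand-in for ObjStateData: only the two reason flags. */
typedef struct {
    bool os_dynamic;     /* route came back from the ECMP commit */
    bool os_non_dynamic; /* route comes from the combined l3cd */
} ObjStateFlags;

bool
obj_state_is_tracked(const ObjStateFlags *os)
{
    assert(os);
    return os->os_dynamic || os->os_non_dynamic;
}

/* Drops one tracking reason. Returns true if no reason is left and
 * the obj-state can be released. */
bool
obj_state_untrack(ObjStateFlags *os, bool dynamic)
{
    assert(os);
    if (dynamic)
        os->os_dynamic = false;
    else
        os->os_non_dynamic = false;

    return !obj_state_is_tracked(os);
}
```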
It wouldn't work otherwise. The object state is used to track routes
and compare them to what we find in platform.
A "metric_any" is useful at higher layers, to track a route where the
metric is decided by somebody else. But at the point when we add such an
object to the object-state, a fixed metric must be chosen.
Assert for that.
If a commit is invoked without any change to the l3cd or to the ACD
data, in _l3cfg_update_combined_config() we skip calling
_l3_acd_data_add_all(), which should clear the dirty flag from ACDs.
Therefore, in case of such no-op commits the ACDs still marked as
dirty - but valid - are removed via:
    _l3_commit()
    _l3_acd_data_process_changes()
    _l3_acd_data_prune()
    _l3_acd_data_prune_one()
Invoking a l3cfg commit without any actual changes is allowed, see the
explanation in commit e773559d9d ('device: schedule an idle commit
when setting device's sys-iface-state').
The bug is visible by running test 'bond_addreses_restart_persistence'
with IPv4 ACD/DAD enabled by default: after restart IPv6 completes
immediately, the device becomes ACTIVATED, the sys-iface-state
transitions from ASSUME to MANAGED, a commit is done, and it
incorrectly prunes the ACD data. The result is that the IPv4 address
is never added again.
Fix this by doing the pruning only when we update the dirty flags.
This is a respin of commit ed565f9146 ('l3cfg: fix pruning of ACD
data') that was reverted because it was causing a crash. The crash was
caused by unconditionally clearing `acd_data_pruning_needed` in
_l3cfg_update_combined_config(), while we need to do it only when
actually committing the configuration.
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1749
The commit causes the following assertion failure:
0 0x00007f4187e22884 in __pthread_kill_implementation () from target:/lib64/libc.so.6
1 0x00007f4187dd1afe in raise () from target:/lib64/libc.so.6
2 0x00007f4187dba87f in abort () from target:/lib64/libc.so.6
3 0x00007f4188386f4e in g_assertion_message (domain=domain@entry=0x6fc1bc "nm", file=file@entry=0x722e94 "../src/core/nm-l3cfg.c", line=line@entry=2134,
func=func@entry=0x727730 <__func__.49> "_l3_acd_data_add_all", message=message@entry=0x23b3bb0 "assertion failed: (acd_data->info.track_infos[i]._priv.acd_dirty_track)")
at ../glib/gtestutils.c:3450
4 0x00007f41883f1597 in g_assertion_message_expr (domain=domain@entry=0x6fc1bc "nm", file=file@entry=0x722e94 "../src/core/nm-l3cfg.c", line=line@entry=2134,
func=func@entry=0x727730 <__func__.49> "_l3_acd_data_add_all", expr=expr@entry=0x726450 "acd_data->info.track_infos[i]._priv.acd_dirty_track") at ../glib/gtestutils.c:3476
5 0x0000000000587209 in _l3_acd_data_add_all (self=self@entry=0x23a7020, infos=infos@entry=0x0, infos_len=infos_len@entry=0, reapply=reapply@entry=1)
at ../src/core/nm-l3cfg.c:2134
6 0x0000000000587702 in _l3cfg_update_combined_config (self=self@entry=0x23a7020, to_commit=to_commit@entry=1, reapply=reapply@entry=1, out_old=out_old@entry=0x7ffd09ea4ca8,
out_changed_combined_l3cd=out_changed_combined_l3cd@entry=0x7ffd09ea4c7c) at ../src/core/nm-l3cfg.c:3858
7 0x000000000058a202 in _l3_commit (self=0x23a7020, commit_type=commit_type@entry=NM_L3_CFG_COMMIT_TYPE_REAPPLY, is_idle=is_idle@entry=0) at ../src/core/nm-l3cfg.c:5046
8 0x000000000058a49f in nm_l3cfg_commit (self=<optimized out>, commit_type=commit_type@entry=NM_L3_CFG_COMMIT_TYPE_REAPPLY) at ../src/core/nm-l3cfg.c:5115
9 0x00000000004856cd in nm_device_l3cfg_commit (self=self@entry=0x23ab870, commit_type=commit_type@entry=NM_L3_CFG_COMMIT_TYPE_REAPPLY, commit_sync=commit_sync@entry=1)
at ../src/core/devices/nm-device.c:4155
10 0x00000000004b1814 in nm_device_cleanup (self=self@entry=0x23ab870, reason=reason@entry=NM_DEVICE_STATE_REASON_NEW_ACTIVATION,
cleanup_type=cleanup_type@entry=CLEANUP_TYPE_DECONFIGURE) at ../src/core/devices/nm-device.c:15884
11 0x00000000004b26c9 in _set_state_full (self=self@entry=0x23ab870, state=state@entry=NM_DEVICE_STATE_DISCONNECTED, reason=NM_DEVICE_STATE_REASON_NEW_ACTIVATION,
quitting=quitting@entry=0) at ../src/core/devices/nm-device.c:16291
12 0x00000000004b2fe4 in nm_device_state_changed (self=self@entry=0x23ab870, state=state@entry=NM_DEVICE_STATE_DISCONNECTED, reason=<optimized out>)
at ../src/core/devices/nm-device.c:16505
13 0x00000000004b69de in queued_state_set (user_data=user_data@entry=0x23ab870) at ../src/core/devices/nm-device.c:16532
14 0x00007f41883bf4fd in g_idle_dispatch (source=0x23a88e0, callback=0x4b6956 <queued_state_set>, user_data=0x23ab870) at ../glib/gmain.c:6163
15 0x00007f41883c34fc in g_main_dispatch (context=0x22c4d10) at ../glib/gmain.c:3460
16 g_main_context_dispatch (context=0x22c4d10) at ../glib/gmain.c:4200
17 0x00007f41884216b8 in g_main_context_iterate.isra.0 (context=0x22c4d10, block=1, dispatch=1, self=<optimized out>) at ../glib/gmain.c:4276
18 0x00007f41883c2aff in g_main_loop_run (loop=0x22c3b50) at ../glib/gmain.c:4479
19 0x0000000000423a37 in main (argc=<optimized out>, argv=<optimized out>) at ../src/core/main.c:519
This reverts commit ed565f9146.
If a commit is invoked without any change to the l3cd or to the ACD
data, in _l3cfg_update_combined_config() we skip calling
_l3_acd_data_add_all(), which should clear the dirty flag from ACDs.
Therefore, in case of such no-op commits the ACDs still marked as
dirty - but valid - are removed via:
    _l3_commit()
    _l3_acd_data_process_changes()
    _l3_acd_data_prune()
    _l3_acd_data_prune_one()
Invoking a l3cfg commit without any actual changes is allowed, see the
explanation in commit e773559d9d ('device: schedule an idle commit
when setting device's sys-iface-state').
The bug is visible by running test 'bond_addreses_restart_persistence'
with IPv4 ACD/DAD enabled by default: after restart IPv6 completes
immediately, the device becomes ACTIVATED, the sys-iface-state
transitions from ASSUME to MANAGED, a commit is done, and it
incorrectly prunes the ACD data. The result is that the IPv4 address
is never added again.
Fix this by doing the pruning only when we update the dirty flags.
Interfaces with IFF_NOARP don't support Address Conflict Detection,
which is based on ARP. Trying to start ACD on them would result in
ENOBUFS always being returned by send(), and n-acd handles such error
by retrying indefinitely.
Fixes: 58287cbcc0 ('core: rework IP configuration in NetworkManager using layer 3 configuration')
On interfaces not supporting ACD (for example, layer3 interfaces), the
probe fails to be created with message:
l3cfg[...,ifindex=2]: acd[172.25.17.1, init]: probe-good (interface does not support acd, initial post-commit)
l3cfg[...,ifindex=2]: acd[172.25.17.1, ready]: set state to ready (probe is ready, waiting for address to be configured)
During the post-commit event, if the address is not yet configured, we
need to schedule a new commit to actually add it.
Fixes: 58287cbcc0 ('core: rework IP configuration in NetworkManager using layer 3 configuration')
When a collision is detected by the Address Conflict Detection
mechanism, store the conflicting MAC address in NML3AcdAddrInfo, so
that it is available to listeners of NML3Cfg for events of type
NM_L3_CONFIG_NOTIFY_TYPE_ACD_EVENT.
Since the l3cfg rework, NetworkManager tracks IP routes early, not only
when IP configuration is ready. That means that with `ipv4.method=auto`
and static `ipv4.routes`, the routes are most likely already configured
before the IP address is obtained via DHCP.
That may be desirable in some cases, but for many cases it's probably
wrong.
Instead, only configure the routes (with an ifindex) when we also have
an IP address.
https://bugzilla.redhat.com/show_bug.cgi?id=2102212
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1574
Previously, there was a "temporary-not-available" mechanism in NML3Cfg,
which aimed to handle IPv6 routes with prefsrc. Theoretically, that
mechanism may have been extended to other use-cases, like IPv4 routes
with prefsrc. What it attempted to handle, is the inability to configure
such routes, unless the respective prefsrc address is configured and
non-tentative. However, the address that we are waiting for, could also
be on another interface, so that mechanism wasn't applicable. This is
now replaced by _routes_watch_ip_addrs(). It seems there isn't anything
useful left for the "temporary-not-available" mechanism and it can go,
except...
We want to log a warning when we are unable to configure a route. Also,
in the future we might want to know when the IP configuration is
degraded due to inability to configure the desired routes (a condition
that we might want to expose to the user, not only via logging; or we
may want to react to that).
However, with prefsrc routes we don't know right away whether the
inability to configure the route indicates an actual problem,
or whether that will resolve itself (e.g. after the address passes
DAD/ACD, after we receive a DHCP lease or after the address is
configured on another interface). Consequently, to know whether the
current inability to configure such a route is a problem, we need to
know the larger context. nm_platform_ip_route_sync() does not have that
context.
Instead, nm_platform_ip_route_sync() only needs to debug-log
failures to configure routes. It will now also return all the failed
routes to NML3Cfg, which can decide whether that is a problem.
This reworks the previous "temporary-not-available" mechanism to track
the state of the failed routes, to eventually decide whether there is an
actual problem (and log about it).
Another problem this solves is that since commit ('platform: always
reconfigure IP routes even if removed externally'), we will eagerly
re-try to configure the same route over and over. We cannot just spam
the log with warnings about the same failure on every commit. We need to
remember that we already logged about the problem and rate limit
warnings otherwise. This is what the new mechanism also achieves.
Indeed, all this is mostly for the sole benefit of logging better
warnings (without duplicates).
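The "warn once, and only after the failure persisted" idea can be
sketched as below. This is not the NML3Cfg implementation: the
structure, the function names and the grace period are assumptions
made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical grace period before a failure counts as a problem. */
#define ROUTE_FAILURE_GRACE_MSEC 5000

/* Hypothetical per-route failure state kept across commits. */
typedef struct {
    int64_t first_failure_msec; /* 0 means "not failing" */
    bool    warned;             /* a warning was already logged */
} FailedRouteState;

/* Called on every commit in which the route could not be configured.
 * Returns true if a warning should be logged now. */
bool
failed_route_should_warn(FailedRouteState *st, int64_t now_msec)
{
    assert(st && now_msec > 0);

    if (st->first_failure_msec == 0)
        st->first_failure_msec = now_msec;
    if (st->warned)
        return false;
    if (now_msec - st->first_failure_msec < ROUTE_FAILURE_GRACE_MSEC)
        return false;

    st->warned = true;
    return true;
}

/* Called once the route was configured successfully. */
void
failed_route_resolved(FailedRouteState *st)
{
    st->first_failure_msec = 0;
    st->warned = false;
}
```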
It was unused anyway.
But also, what would we do with this? We are in the middle of a commit,
if something goes wrong, we cannot just abort but need to continue on
and make the best of it.
Maybe there are very specific error cases that we need to handle, but
those are not covered by a boolean return value. Instead, we might need
to take specific action.
The boolean success variable was meaningless. Drop it.
Routes with pref_src (RTA_PREFSRC) can only be added when the
corresponding IP address is configured (and non-tentative, in case of
IPv6). Additionally, that address may be on any interface, not only on
the one we want to configure the route on. This means, when we first
activate a profile with a route that has a src attribute, then that src
address might only be configured later. For example, with IPv6, it takes
a while for the address to become non-tentative. Or the address might
come from DHCP, and not be present initially. Or the address might even
be configured on another interface/profile. That means, while we might
be unable to configure the route now, we may become able any time later.
Solve that by subscribing to NMNetns to get notifications whenever such
an address gets added. In that case, schedule an idle commit, which may
then succeed.
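A toy model of that notification path: when any address is added in
the namespace, check whether a pending route waits for it and, if so,
request an idle commit. The L3CfgModel type and the callback name are
hypothetical, and addresses are plain IPv4 integers here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced view of one l3cfg instance. */
typedef struct {
    const uint32_t *pending_prefsrc; /* prefsrc of routes not yet addable */
    size_t          n_pending;
    bool            idle_commit_scheduled;
} L3CfgModel;

/* Invoked for every address added in the namespace, on any interface. */
void
on_netns_address_added(L3CfgModel *cfg, uint32_t addr)
{
    assert(cfg);

    for (size_t i = 0; i < cfg->n_pending; i++) {
        if (cfg->pending_prefsrc[i] == addr) {
            /* The next commit gets a chance to add the route. */
            cfg->idle_commit_scheduled = true;
            return;
        }
    }
}
```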
NML3Cfg is stateful, that means it remembers which address/route it
configured earlier. That is important because the API users of NML3Cfg
only say what they want to configure now, and NML3Cfg needs to remove
addresses/routes that it configured earlier but are no longer to be
present. Also, NetworkManager wants to allow the user to add
addresses/routes externally with `ip addr|route add`, without
NetworkManager removing them. This is a common use case for dispatcher scripts, but
in general, we want to allow other components to add addresses/routes.
We try something similar with the removal of routes/addresses managed by
NetworkManager. When NetworkManager adds a route/address, which later
disappears, then we assume that the user intentionally removed the
address/route and take the hint to not re-add it.
However, it doesn't work. It is problematic for two reasons:
- kernel can automatically remove routes. For example, deleting an IPv4
address that is the prefsrc of a route, will cause kernel to delete
that route. Sure, we may be unable to re-configure the route at this
moment, but we shouldn't remember indefinitely that the route is
supposed to be absent. Rather, we should re-add it when possible.
- kernel is a pain with validating consistencies of routes. For example,
when a route has a nexthop gateway, then the gateway must be onlink
(directly reachable), or kernel refuses to add it with "Nexthop has
invalid gateway". Of course, when removing the onlink route kernel is
fine leaving the gateway route behind, which it would otherwise refuse
to add.
Anyway. Such interdependencies for when kernel rejects adding a route
with "Nexthop has invalid gateway" are non-trivial. We try to work
around that by always adding the necessary onlink routes. See
nm_l3_config_data_add_dependent_onlink_routes(). But if the user
externally removed the dependent onlink route, and NetworkManager
remembers not to re-add it, then the efforts from
nm_l3_config_data_add_dependent_onlink_routes() are ignored. This
causes ripple effects and NetworkManager will also be unable to add the
nexthop route.
Trying to preserve absence of routes that NetworkManager would like to
configure is not tenable. Don't do it anymore. There was anyway no
guarantee that on the next update NetworkManager wouldn't try to re-add
the route in question. For example, if the route came from DHCP, and the
lease temporarily went away and came back, then NetworkManager probably
would have (correctly) forgotten that the user wished that the route be
absent. This did not work reliably and it just causes problems.