The hash/cmp functions were wrong with respect to IPv4 routes. Fix that.
- the weight was cast to a guint8, which is too small to hold all
values.
- NM_PLATFORM_IP_ROUTE_CMP_TYPE_ID comparison would normalize a zero
weight to one
NM_CMP_DIRECT(NM_MAX(a->weight, 1u), NM_MAX(b->weight, 1u));
That was very wrong.
- the handling of the weight depends on the n_nexthops parameter.
See _ip4_route_weight_normalize().
The remarkable thing is that upper layers find it useful to track IPv4
single-hop routes with a non-zero weight. Consequently, this is honored
by NM_PLATFORM_IP_ROUTE_CMP_TYPE_ID to treat single-hop routes
different, when their weight differs. However, adding such a route in
kernel will not work. To kernel, single-hop routes have no weight. While
the route exists as distinct in our hash tables, according to the
implemented identity, it never exists in kernel (or NMPlatform cache).
Well, you can call nm_platform_ip_route_add() with such a route, but the
result will have a weight of zero (making it a different route). See also
nm_platform_ip_route_normalize().
This works all mostly fine. The only thing to take care is that you
cannot look into the NMPlatform cache and ever find a IPv4 single-hop
route with a non-zero weight. If you preform such a lookup, realize that
such routes don't exist in platform. You can however normalize the
weight to zero first (nm_platform_ip_route_normalize()) and look for a
similar route with a zero weight.
Fixes: 1bbdecf5e1 ('platform: manage ECMP routes')
(cherry picked from commit bc53ad4976)
This was forgotten to implement. But we cannot just forget about it.
Libnm emits a warning about unknown properties, exactly to catch such
bugs. Properties that are not implemented, must be marked to be ignored.
Next, support for this property will be added. But that introduces new
API, which cannot be backported. Hence, first fix the problem by marking
the property as ignored. This is a backportable change.
$ LIBNM_CLIENT_DEBUG="warning" G_DEBUG=fatal-warnings nmcli
(process:270215): nm-WARNING **: 15:22:56.125: libnm-dbus: <warn > nmclient[8094a8c217aae461]: get-managed-objects: [/org/freedesktop/NetworkManager/Devices/5]: ignore unknown property org.freedesktop.NetworkManager.Device.IPTunnel.FwMark
Trace/breakpoint trap (core dumped)
Fixes: 351c562491 ('devices: support VTI tunnels')
(cherry picked from commit b47c94666c)
This commit fixes the build process for the documentation that was previously
unable to build separately via meson due to a dependency issue.
Previously, trying to build the API documentation via `ninja NetworkManager-doc`
failed due to missing dependencies (for example, `nm-dbus-types.xml` was not built).
I believe this happens due to some different handling of static paths vs. custom_target
by meson in this case.
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1801
Fixes: 03637ad8b5 ('build: add initial support for meson build system')
(cherry picked from commit ffb34d2485)
When connection down is explicitly called on the controller, the port
connection should also be deactivated with the reason user-requested,
otherwise any following connection update on the controller profile
will unblock the port connection and unnessarily make the port to
autoconnet again.
Fixes: 645a1bb0ef ('core: unblock autoconnect when master profile changes')
https://gitlab.freedesktop.org/NetworkManager/NetworkManager-ci/-/merge_requests/1568
(cherry picked from commit 21a6d7a0b6)
When doing reapply on linux bridge interface, NetworkManager will reset
the VLAN filtering and default PVID which cause PVID been readded to all
bridge ports regardless they are managed by NetworkManager.
This is because Linux kernel will re-add PVID to bridge port upon the
changes of bridge default-pvid value.
To fix the issue, this patch introduce netlink parsing code for
`vlan_filtering` and `default_pvid` of NMPlatformLnkBridge, and use that
to compare desired VLAN filtering settings, skip the reset of VLAN
filter if `default_pvid` and `vlan_filtering` are unchanged.
Signed-off-by: Gris Ge <fge@redhat.com>
(cherry picked from commit 02c34d538c)
(cherry picked from commit f990f9b4e4)
When IPv6 is disabled in kernel but ipv6.method is set to auto, NetworkManager repeatedly attempts
IPv6 configuration internally, resulting in unnecessary warning messages being output infinitely.
platform-linux: do-add-ip6-address[2: fe80::5054:ff:fe7c:4293]: failure 95 (Operation not supported)
ipv6ll[e898db403d9b5099,ifindex=2]: changed: no IPv6 link local address to retry after Duplicate Address Detection failures (back off)
platform-linux: do-add-ip6-address[2: fe80::5054:ff:fe7c:4293]: failure 95 (Operation not supported)
ipv6ll[e898db403d9b5099,ifindex=2]: changed: no IPv6 link local address to retry after Duplicate Address Detection failures (back off)
platform-linux: do-add-ip6-address[2: fe80::5054:ff:fe7c:4293]: failure 95 (Operation not supported)
ipv6ll[e898db403d9b5099,ifindex=2]: changed: no IPv6 link local address to retry after Duplicate Address Detection failures (back off)
To prevent this issue, let's disable IPv6 in NetworkManager when it is disabled in the kernel.
In order to do it in activate_stage3_ip_config() only once during activation,
the firewall initialization needed to be moved earlier. Otherwise, the IPv6 disablement could occur
twice during activation because activate_stage3_ip_config() is also executed from subsequent of fw_change_zone().
(cherry picked from commit 50a6386c3b)
Instead of doing the broken `podman run` and `podman start` approach,
build an image ("nm-code-format:f38"), cache it, and use it to run
"nm-code-format.sh" via `podman run`. We should build and keep a
container image, not a container.
The benefit is that this allows to hand over the command line arguments
to "nm-code-format.sh". In particular the "-u" and "-F" options, which
are life savers.
This means,
$ contrib/scripts/nm-code-format-container.sh -u
works.
Try also
$ contrib/scripts/nm-code-format-container.sh -h
which tells you that you are running inside the container, and how to
delete/renew the container image.
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1798
(cherry picked from commit a6e085b3e8)
This valgrind error is raised by Ubuntu 23:10, with valgrind 1:3.21.0-0ubuntu1 and
glibc 2.38-1ubuntu6:
==141967== Source and destination overlap in memcpy_chk(0x1ffefffe98, 0x1ffefffe92, 8)
==141967== at 0x4851042: __memcpy_chk (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==141967== by 0x502A9CC: memmove (string_fortified.h:36)
==141967== by 0x502A9CC: inet_pton6 (inet_pton.c:226)
==141967== by 0x502A9CC: __inet_pton_length (inet_pton.c:56)
==141967== by 0x502A9CC: inet_pton (inet_pton.c:69)
==141967== by 0x114FA3: nmtst_inet6_from_string_p (nm-test-utils.h:1764)
==141967== by 0x114FA3: _nmtst_assert_ip6_address (nm-test-utils.h:1856)
==141967== by 0x114FA3: test_stable_privacy (test-utils.c:26)
==141967== by 0x4DEC85D: ??? (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.7800.0)
==141967== by 0x4DEC78A: ??? (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.7800.0)
==141967== by 0x4DECD2A: g_test_run_suite (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.7800.0)
==141967== by 0x4DECE07: g_test_run (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.7800.0)
==141967== by 0x4F070CF: (below main) (libc_start_call_main.h:58)
==141967==
{
<insert_a_suppression_name_here>
Memcheck:Overlap
fun:__memcpy_chk
fun:memmove
fun:inet_pton6
fun:__inet_pton_length
fun:inet_pton
fun:nmtst_inet6_from_string_p
fun:_nmtst_assert_ip6_address
fun:test_stable_privacy
obj:/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.7800.0
obj:/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.7800.0
fun:g_test_run_suite
fun:g_test_run
fun:(below main)
}
This is because memmove() can call __memcpy_chk(), which triggers the
false positive in valgrind.
It probably affects other places too, but for us it's sufficient to
restrict the valgrind suppression to calls from inet_pton(). This
suppression will probably break with LTO enabled.
See-also: https://bugs.kde.org/show_bug.cgi?id=402833https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1802
(cherry picked from commit 9f1d4b650c)
If a ovs interface has the cloned-mac-address property set, we pass
the desired MAC to ovsdb when creating the db entry, and openvswitch
will eventually assign it to the interface.
Note that usually the link will not have the desired MAC when it's
created. Therefore, currently we also change the MAC via netlink
before proceeding with IP configuration. This is important to make
sure that ARP announcements, DHCP client-id, etc. will use the correct
MAC address.
This doesn't work when using the "netdev" (userspace) datapath, as the
attempts to change the MAC of the tun interface via netlink fail,
leading to an activation failure.
To properly handle both cases in the same way, adopt a different
strategy: now we don't set the MAC address explicitly via netlink but
we only wait until ovs does that.
(cherry picked from commit acf485196c)
Add a helper function to check whether the ovs link is ready. In the
next commit, a new condition will be added to the helper.
(cherry picked from commit 3ad82e2726)
The function checks that priv->wait_link.waiting is set. Since the
flag is only set in stage3, it is wrong to schedule stage2 again.
(cherry picked from commit 01a6a2dc15)
Also, add prefix "ovs-wait-link" to all messages related to waiting
for the ovs link, so that they can be easily spotted in logs.
(cherry picked from commit 49a7bd110d)
The code to determine if we are using the netdev datapath is logically
separated from the code to start IP configuration; move it to its own
function to make the code easier to follow.
(cherry picked from commit a7a06163be)
The deactivation can happen while we are waiting for the ifindex, and
it can happen via two code paths, depending on the state. For a
regular deactivation, method deactivate_async() is called. Otherwise,
if the device goes directly to UNMANAGED or UNAVAILABLE, deactivate()
is called. We need to make sure that signal and source handlers are
disconnected, so that they are not called at the wrong time.
Fixes: 99a6c6eda6 ('ovs, dpdk: fix creating ovs-interface when the ovs-bridge is netdev')
(cherry picked from commit 164a343574)
This reverts commit 05c31da4d9.
In the linked commit the callback was made repeating on the assumption
that forward progress would result in the callback getting canceled in
cb_data_complete. However, this assumption does not hold since a timeout
callback does not guarantee completion (or error out) of a request.
curl tweaked some internals in v8.4.0 and started giving 0 timeouts, and
a repeating callback is firing back-to-back without making any progress
in doing so.
Revert the change and make the callback non-repeating again.
Fixes: 05c31da4d9 ('connectivity: don't cancel curl timerfunction from timeout')
(cherry picked from commit abc6e1cf25)
Previously, we first sort by the device's state, then by the active
connection's state. Contrast to `nmcli connection`, which first sorts
by the active connection's state.
It means, the sort order is somewhat different. Fix that.
In most cases, that shouldn't make a difference, because the
device's state and the active-connection's state should
correspond. However, it matters as we now treat external activations
different, and that is tied to the active connection.
(cherry picked from commit 38ad9e5211)
EXTERNAL connections are special. Sort them later. This affects output
of `nmcli connection` and `nmcli device`.
(cherry picked from commit a5f9f2fbfc)
Additional logic will be added, that makes the switch() approach
more cumbersome. Use a sorted array instead to find the priority.
(cherry picked from commit 8ccd1f7bfe)
CMP() is a confusing pattern. Sure enough, the sort order was wrong, for
example, `nmcli connection` would show
$ nmcli -f STATE,UUID,DEVICE c
STATE UUID DEVICE
activating 3098c902-c59c-45f4-9e5a-e4cdb79cfe1b nm-bond
activated e4fc23ac-54ab-4b1a-932a-ebed12c96d9b eth1
("activating" shown before "activated").
With `nmcli device`, we sort with compare_devices(). This first sorts by
device state (with "connected" being sorted first). Only when the device
state is equal, we fallback to nmc_active_connection_cmp(). So with
`nmcli device` we usually get "connected" devices first, and we don't
really notice that there is a problem with nmc_active_connection_cmp().
On the other hand, `nmcli connection` likes to sort first via
nmc_active_connection_cmp(), which gets it wrong. Profiles in
"activating" state are sorted first. That's inconsistent with `nmcli
device`, but it's also not what is intended.
Fix that.
Note the change in the test output. Both eth1 and eth0 are connected to
to the same profile, but one "eth0" the active-connection's state is
DEACTIVATING, while on "eth1" it's ACTIVATED (but both device's states
are "CONNECTED"). That's why "eth1" is now sorted first (as desired).
Fixes: a1b25a47b0 ('cli: rework printing of `nmcli connection` for multiple active connections')
(cherry picked from commit 8e1330964d)
- test for "-order" option with `nmcli connection show`.
- test for order of activated devices. Optimally, the devices
should be in activating vs. activated state. I fail to do that,
the mock implementation is cumbersome to use. It still seems useful
to have this (maybe it could be improved).
(cherry picked from commit ca5fb29b7e)
If a commit is invoked without any change to the l3cd or to the ACD
data, in _l3cfg_update_combined_config() we skip calling
_l3_acd_data_add_all(), which should clear the dirty flag from ACDs.
Therefore, in case of such no-op commits the ACDs still marked as
dirty - but valid - are removed via:
_l3_commit()
_l3_acd_data_process_changes()
_l3_acd_data_prune()
_l3_acd_data_prune_one()
Invoking a l3cfg commit without any actual changes is allowed, see the
explanation in commit e773559d9d ('device: schedule an idle commit
when setting device's sys-iface-state').
The bug is visible by running test 'bond_addreses_restart_persistence'
with IPv4 ACD/DAD is enabled by default: after restart IPv6 completes
immediately, the devices becomes ACTIVATED, the sys-iface-state
transitions from ASSUME to MANAGED, a commit is done, and it
incorrectly prunes the ACD data. The result is that the IPv4 address
is never added again.
Fix this by doing the pruning only when we update the dirty flags.
This is a respin of commit ed565f9146 ('l3cfg: fix pruning of ACD
data') that was reverted because it was causing a crash. The crash was
caused by unconditionally clearing `acd_data_pruning_needed` in
_l3cfg_update_combined_config(), while we need to do it only when
actually committing the configuration.
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1749
(cherry picked from commit 5b16c128bb)
The commit causes the following assertion failure:
0 0x00007f4187e22884 in __pthread_kill_implementation () from target:/lib64/libc.so.6
1 0x00007f4187dd1afe in raise () from target:/lib64/libc.so.6
2 0x00007f4187dba87f in abort () from target:/lib64/libc.so.6
3 0x00007f4188386f4e in g_assertion_message (domain=domain@entry=0x6fc1bc "nm", file=file@entry=0x722e94 "../src/core/nm-l3cfg.c", line=line@entry=2134,
func=func@entry=0x727730 <__func__.49> "_l3_acd_data_add_all", message=message@entry=0x23b3bb0 "assertion failed: (acd_data->info.track_infos[i]._priv.acd_dirty_track)")
at ../glib/gtestutils.c:3450
4 0x00007f41883f1597 in g_assertion_message_expr (domain=domain@entry=0x6fc1bc "nm", file=file@entry=0x722e94 "../src/core/nm-l3cfg.c", line=line@entry=2134,
func=func@entry=0x727730 <__func__.49> "_l3_acd_data_add_all", expr=expr@entry=0x726450 "acd_data->info.track_infos[i]._priv.acd_dirty_track") at ../glib/gtestutils.c:3476
5 0x0000000000587209 in _l3_acd_data_add_all (self=self@entry=0x23a7020, infos=infos@entry=0x0, infos_len=infos_len@entry=0, reapply=reapply@entry=1)
at ../src/core/nm-l3cfg.c:2134
6 0x0000000000587702 in _l3cfg_update_combined_config (self=self@entry=0x23a7020, to_commit=to_commit@entry=1, reapply=reapply@entry=1, out_old=out_old@entry=0x7ffd09ea4ca8,
out_changed_combined_l3cd=out_changed_combined_l3cd@entry=0x7ffd09ea4c7c) at ../src/core/nm-l3cfg.c:3858
7 0x000000000058a202 in _l3_commit (self=0x23a7020, commit_type=commit_type@entry=NM_L3_CFG_COMMIT_TYPE_REAPPLY, is_idle=is_idle@entry=0) at ../src/core/nm-l3cfg.c:5046
8 0x000000000058a49f in nm_l3cfg_commit (self=<optimized out>, commit_type=commit_type@entry=NM_L3_CFG_COMMIT_TYPE_REAPPLY) at ../src/core/nm-l3cfg.c:5115
9 0x00000000004856cd in nm_device_l3cfg_commit (self=self@entry=0x23ab870, commit_type=commit_type@entry=NM_L3_CFG_COMMIT_TYPE_REAPPLY, commit_sync=commit_sync@entry=1)
at ../src/core/devices/nm-device.c:4155
10 0x00000000004b1814 in nm_device_cleanup (self=self@entry=0x23ab870, reason=reason@entry=NM_DEVICE_STATE_REASON_NEW_ACTIVATION,
cleanup_type=cleanup_type@entry=CLEANUP_TYPE_DECONFIGURE) at ../src/core/devices/nm-device.c:15884
11 0x00000000004b26c9 in _set_state_full (self=self@entry=0x23ab870, state=state@entry=NM_DEVICE_STATE_DISCONNECTED, reason=NM_DEVICE_STATE_REASON_NEW_ACTIVATION,
quitting=quitting@entry=0) at ../src/core/devices/nm-device.c:16291
12 0x00000000004b2fe4 in nm_device_state_changed (self=self@entry=0x23ab870, state=state@entry=NM_DEVICE_STATE_DISCONNECTED, reason=<optimized out>)
at ../src/core/devices/nm-device.c:16505
13 0x00000000004b69de in queued_state_set (user_data=user_data@entry=0x23ab870) at ../src/core/devices/nm-device.c:16532
14 0x00007f41883bf4fd in g_idle_dispatch (source=0x23a88e0, callback=0x4b6956 <queued_state_set>, user_data=0x23ab870) at ../glib/gmain.c:6163
15 0x00007f41883c34fc in g_main_dispatch (context=0x22c4d10) at ../glib/gmain.c:3460
16 g_main_context_dispatch (context=0x22c4d10) at ../glib/gmain.c:4200
17 0x00007f41884216b8 in g_main_context_iterate.isra.0 (context=0x22c4d10, block=1, dispatch=1, self=<optimized out>) at ../glib/gmain.c:4276
18 0x00007f41883c2aff in g_main_loop_run (loop=0x22c3b50) at ../glib/gmain.c:4479
19 0x0000000000423a37 in main (argc=<optimized out>, argv=<optimized out>) at ../src/core/main.c:519
This reverts commit ed565f9146.
(cherry picked from commit 6a63d79fe6)
It seems more useful to have a best effort approach and configure
everything we can; in that way we achieve at least some connectivity,
and then sysadmin can check the logs in case something is
missing. Currently instead, the whole activation fails (so, no address
is configured) if just one of the addresses fails DAD.
Ideally, we should have a way to make this configurable; but for now,
implement the more useful behavior as default.
(cherry picked from commit a45024714f)
Clarify that the value is the *maximum* interval; the actual value is
randomized and can be as low as half the specified one.
(cherry picked from commit 536805231a)
IPv4 and IPv6 DAD work slightly differently: for IPv4 the presence or
absence of carrier doesn't have any effect on the duration of the
probe; for IPv6, DAD never completes without carrier because kernel
never removes the tentative flag.
In both cases, we shouldn't ignore the DAD result because that would
mean that we complete the ipmanual method without addresses actually
configured.
(cherry picked from commit 1f73034719)
We don't know the reason why the DHCP client is being stopped. It is
wrong to schedule a commit of type "update" because the device could
be now unmanaged. Schedule instead a commit of type "auto", which
automatically determines the type of commit based on registered
handles.
(cherry picked from commit a49913504d)
If a commit is invoked without any change to the l3cd or to the ACD
data, in _l3cfg_update_combined_config() we skip calling
_l3_acd_data_add_all(), which should clear the dirty flag from ACDs.
Therefore, in case of such no-op commits the ACDs still marked as
dirty - but valid - are removed via:
_l3_commit()
_l3_acd_data_process_changes()
_l3_acd_data_prune()
_l3_acd_data_prune_one()
Invoking a l3cfg commit without any actual changes is allowed, see the
explanation in commit e773559d9d ('device: schedule an idle commit
when setting device's sys-iface-state').
The bug is visible by running test 'bond_addreses_restart_persistence'
with IPv4 ACD/DAD is enabled by default: after restart IPv6 completes
immediately, the devices becomes ACTIVATED, the sys-iface-state
transitions from ASSUME to MANAGED, a commit is done, and it
incorrectly prunes the ACD data. The result is that the IPv4 address
is never added again.
Fix this by doing the pruning only when we update the dirty flags.
(cherry picked from commit ed565f9146)
Interfaces with IFF_NOARP don't support Address Conflict Detection,
which is based on ARP. Trying to start ACD on them would result in
ENOBUFS always being returned by send(), and n-acd handles such error
by retrying indefinitely.
Fixes: 58287cbcc0 ('core: rework IP configuration in NetworkManager using layer 3 configuration')
(cherry picked from commit 7548ff57d3)
On interfaces not supporting ACD (for example, layer3 interfaces), the
probe fails to be created with message:
l3cfg[...,ifindex=2]: acd[172.25.17.1, init]: probe-good (interface does not support acd, initial post-commit)
l3cfg[...,ifindex=2]: acd[172.25.17.1, ready]: set state to ready (probe is ready, waiting for address to be configured)
During the post-commit event, if the address is not yet configured, we
need to schedule a new commit to actually add it.
Fixes: 58287cbcc0 ('core: rework IP configuration in NetworkManager using layer 3 configuration')
(cherry picked from commit 687051368f)
Currently, all the probes of an acd instance share the same seed
state. This means that the state is updated by all the probes, and as
a consequence they get different jitters for the wait timeouts;
therefore the order in which addresses become available (and are
configured on the interface) is not deterministic.
Keep a separate seed state for each probe, initialized from the acd
seed. This ensures that all the probes use the same timeouts when
sending probe requests, and that in case of no collision, addresses
are available in the order of probe start.
n-acd pull request: https://github.com/nettools/n-acd/pull/10
(cherry picked from commit 23727917b2)
Currently, IPv4 shared mode fails to start when DAD is enabled because
dnsmasq tries to bind to an address that is not yet configured on the
interface. Delay the start of dnsmasq until the shared4 l3cd is ready.
Fixes: 58287cbcc0 ('core: rework IP configuration in NetworkManager using layer 3 configuration')
(cherry picked from commit e97ebb2441)