NMClient is strongly tied to the GMainContext with which it was created.
Several operations must only be called from within the context. There
was an assertion for that.
However, creating (and init_async()) should be allowed to call not
from within the GMainContext. So if the current context has no owner
(is not acquired), then it's also OK.
Fix the assertion for that.
Fixes: ce0e898fb4 ('libnm: refactor caching of D-Bus objects in NMClient')
(cherry picked from commit ae0cc9618c)
Found with `git grep 'GError.*[^,)];'| grep ' *= *NULL;' -v`
Fixes: ce0e898fb4 ('libnm: refactor caching of D-Bus objects in NMClient')
(cherry picked from commit a9d521bf8c)
When cloud-init job (metadata service crawler) starts, it sends the
SIGTERM signal to nm-cloud-setup and force the nm-cloud-setup to
restart, however, because the error is not initialized as NULL in
`_init_start_cancelled_cb()` before it is set, nm-cloud-setup will hit
a dumped core.
TO fix it, initialize the error as NULL in `_init_start_cancelled_cb()`.
https://bugzilla.redhat.com/show_bug.cgi?id=2027674
Fixes: ce0e898fb4 ('libnm: refactor caching of D-Bus objects in NMClient')
Backtrace:
#0 g_logv (log_domain=0x7f833a872071 "GLib", log_level=G_LOG_LEVEL_WARNING, format=<optimized out>, args=<optimized out>) at ../glib/gmessages.c:1413
#1 0x00007f833a81f043 in g_log (log_domain=<optimized out>, log_level=<optimized out>, format=<optimized out>) at ../glib/gmessages.c:1451
#2 0x00007f833ab97230 in nm_utils_error_set_cancelled (is_disposing=<optimized out>, instance_name=<optimized out>, error=0x7ffff79cb980) at src/libnm-glib-aux/nm-shared-utils.c:2599
#3 nm_utils_error_set_cancelled (is_disposing=0, instance_name=0x0, error=0x7ffff79cb980) at src/libnm-glib-aux/nm-shared-utils.c:2590
#4 _init_start_cancelled_cb (cancellable=<optimized out>, user_data=0x5640ca292150) at src/libnm-client-impl/nm-client.c:7324
#5 _init_start_cancelled_cb (cancellable=<optimized out>, user_data=0x5640ca292150) at src/libnm-client-impl/nm-client.c:7307
#6 0x00007f833a93094a in _g_closure_invoke_va (param_types=0x0, n_params=<optimized out>, args=0x7ffff79cbb40, instance=0x5640ca267020, return_value=0x0, closure=0x5640ca29d430)
at ../gobject/gclosure.c:873
#7 g_signal_emit_valist (instance=0x5640ca267020, signal_id=<optimized out>, detail=0, var_args=var_args@entry=0x7ffff79cbb40) at ../gobject/gsignal.c:3406
#8 0x00007f833a930a93 in g_signal_emit (instance=instance@entry=0x5640ca267020, signal_id=<optimized out>, detail=detail@entry=0) at ../gobject/gsignal.c:3553
#9 0x00007f833a9a6475 in g_cancellable_cancel (cancellable=0x5640ca267020) at ../gio/gcancellable.c:513
#10 g_cancellable_cancel (cancellable=0x5640ca267020) at ../gio/gcancellable.c:487
#11 0x00005640ca1a8bd4 in sigterm_handler (user_data=0x5640ca267020) at src/nm-cloud-setup/main.c:599
#12 0x00007f833a819d4f in g_main_dispatch (context=0x5640ca268ef0) at ../glib/gmain.c:3337
#13 g_main_context_dispatch (context=0x5640ca268ef0) at ../glib/gmain.c:4055
#14 0x00007f833a86e608 in g_main_context_iterate.constprop.0 (context=0x5640ca268ef0, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>) at ../glib/gmain.c:4131
#15 0x00007f833a819463 in g_main_loop_run (loop=0x5640ca24fdb0) at ../glib/gmain.c:4329
#16 0x00005640ca1a6d04 in nmc_client_new_waitsync (cancellable=0x5640ca267020, out_nmc=0x7ffff79cbfa0, error=0x7ffff79cbf98, first_property_name=0x5640ca1b11db "instance-flags",
first_property_name=0x5640ca1b11db "instance-flags") at src/libnm-client-aux-extern/nm-libnm-aux.c:129
#17 0x00005640ca1a3863 in main (argc=1, argv=<optimized out>) at src/nm-cloud-setup/main.c:639
(cherry picked from commit 549761b0ad)
Before, we would just ignore the errors when we passed an invalid value
to a property alias:
$ nmcli c add type ethernet mac Hello
Connection 'ethernet-1' (242eec76-7147-411a-a50b-336cf5bc8137) successfully added.
$ nmcli c show 242eec76-7147-411a-a50b-336cf5bc8137 |grep 802-3-ethernet.mac-address:
802-3-ethernet.mac-address: --
...or crash, because the GError would still be around:
$ nmcli c add type ethernet mac Hello ethernet.mac-address World
(process:734670): GLib-WARNING **: 14:52:51.436: GError set over the top of a previous GError or uninitialized memory.
This indicates a bug in someone's code. You must ensure an error is NULL before it's set.
The overwriting error message was: Error: failed to modify 802-3-ethernet.mac-address: 'World' is not a valid Ethernet MAC.
Error: failed to modify 802-3-ethernet.mac-address: 'Hello' is not a valid Ethernet MAC.
Now we catch it early enough:
$ nmcli c add type ethernet mac Hello
Error: failed to modify 802-3-ethernet.mac-address: 'Hello' is not a valid Ethernet MAC.
Fixes: 40032f4614 ('cli: fix resetting values via property alias')
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1134
(cherry picked from commit a7ef068186)
nm_l3_config_data_new_clone() takes non-positive ifindex to use
the ifindex of the l3cd. But it also asserts that the ifindex
is not negative. Fix that assertion failure, by setting the ifindex
to zero.
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/issues/907
Fixes: 58287cbcc0 ('core: rework IP configuration in NetworkManager using layer 3 configuration')
(cherry picked from commit 5402a72179)
Hope third time is the charm.
The idea here is to remove the OVSDB entry if the device actually went away
violently (like, the it was actually removed from the platform), but keep it if
we're shutting down.
Fixes-test: @ovs_nmstate
Fixes: 966413e78f ('ovs-port: avoid removing the OVSDB entry if we're shutting down')
Fixes: ecc73eb239 ('ovs-port: always remove the OVSDB entry on slave release')
https://bugzilla.redhat.com/show_bug.cgi?id=2055665
(cherry picked from commit 65fdfb2500)
Currently it is possible to specify a list of default settings plugins
to be used when configuration doesn't contain the main.plugins key.
We want to add a mechanism that allows to automatically load any
plugin found in the plugins directory without needing
configuration. This mechanism is useful when plugins are shipped in a
different, optional subpackage, to automatically use them.
With such mechanism, the actual list of plugins will be determined
(in order of evaluation):
1. via explicit user configuration in /etc, if any
2. via distro configuration in /usr, if any
3. using the build-time default, if any
4. looking for known plugins in /usr/lib
(cherry picked from commit 392daa5dab)
When we have a bridge interface with ports attached externally (that is,
not by NetworkManager itself), then it can make sense that during
checkpoint rollback we want to keep those ports attached.
During rollback, we may need to deactivate the bridge device and
re-activate it. Implement this, by setting a flag before deactivating,
which prevents external ports to be detached. The flag gets cleared,
when the device state changes to activated (the following activation)
or unmanaged.
This is an ugly solution, for several reasons.
For one, NMDevice tracks its ports in the "slaves" list. But what
it does is ugly. There is no clear concept to understand what it
actually tacks. For example, it tracks externally added interfaces
(nm_device_sys_iface_state_is_external()) that are attached while
not being connected. But it also tracks interfaces that we want to attach
during activation (but which are not yet actually enslaved). It also tracks
slaves that have no actual netdev device (OVS). So it's not clear what this
list contains and what it should contain at any point in time. When we skip
the change of the slaves states during nm_device_master_release_slaves_all(),
it's not really clear what the effects are. It's ugly, but probably correct
enough. What would be better, if we had a clear purpose of what the
lists (or several lists) mean. E.g. a list of all ports that are
currently, physically attached vs. a list of ports we want to attach vs.
a list of OVS slaves that have no actual netdev device.
Another problem is that we attach state on the device
("activation_state_preserve_external_ports"), which should linger there
during the deactivation and reactivation. How can we be sure that we don't
leave that flag dangling there, and that the desired following activation
is the one we cared about? If the follow-up activation fails short (e.g. an
unmanaged command comes first), will we properly disconnect the slaves?
Should we even? In practice, it might be correct enough.
Also, we only implement this for bridges. I think this is where it makes
the most sense. And after all, it's an odd thing to preserve unknown,
external things during a rollback -- unknown, because we have no knowledge
about why these ports are attached and what to do with them.
Also, the change doesn't remember the ports that were attached when the
checkpoint was created. Instead, we preserve all ports that are attached
during rollback. That seems more useful and easier to implement. So we
don't actually rollback to the configuration when the checkpoint was
created. Instead, we rollback, but keep external devices.
Also, we do this now by default and introduce a flag to get the previous
behavior.
https://bugzilla.redhat.com/show_bug.cgi?id=2035519https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/issues/ # 909
(cherry picked from commit 98b3056604)
For devices that configure IP by themselves (by returning
"->ready_for_ip_config() = TRUE" and implementing
->act_stage3_ip_config()), we skip manual configuration. Currently,
manual configuration is the only one that sets flag HAS_DNS_PRIORITY
into the resulting l3cd.
So, the merged l3cd for such devices misses a dns-priority and is
ignored by the DNS manager.
Explicitly initialize the priority to 0; in this way, the default
value for the device will be set in the final l3cd during the merge.
Fixes: 58287cbcc0 ('core: rework IP configuration in NetworkManager using layer 3 configuration')
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/issues/931
(cherry picked from commit b2e559fab2)
This also causes leaks with recent glib, which can be found via valgrind.
Fixes: c7b3862503 ('platform: add network namespace support to platform')
(cherry picked from commit 1a1c22e38c)
We can get a platform signal for any number of reasons. In particular,
we can get a signal that the object is present in platform, while the object
is tracked as zombie.
"Zombies" are objects that were actively configured by NetworkManager, but
now no longer and thus will need to be removed. We remember them as objects
that we need to delete.
The assertion was wrong. We don't need to handle the case "in_platform"
and linked in "os_zombie_lst" specially. If we get a signal that the
object exists while being a zombie, that is fine and not something to
handle specially.
Backtrace:
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1 0x00007f6a208f1db5 in __GI_abort () at abort.c:79
#2 0x00007f6a212ed123 in g_assertion_message (domain=<optimized out>, file=<optimized out>, line=<optimized out>,
func=0x560e23ada2c0 <__func__.39909> "_obj_states_externally_removed_track", message=<optimized out>) at gtestutils.c:2533
#3 0x00007f6a2134620e in g_assertion_message_expr (domain=domain@entry=0x560e23b781a0 "nm", file=file@entry=0x560e23acec60 "src/core/nm-l3cfg.c", line=line@entry=920,
func=func@entry=0x560e23ada2c0 <__func__.39909> "_obj_states_externally_removed_track", expr=expr@entry=0x560e23ad1980 "c_list_is_empty(&obj_state->os_zombie_lst)") at gtestutils.c:2556
#4 0x0000560e23853f38 in _obj_states_externally_removed_track (self=self@entry=0x560e25f168e0, obj=<optimized out>, obj@entry=0x560e25e466a0, in_platform=in_platform@entry=1)
at src/core/nm-l3cfg.c:920
#5 0x0000560e2385b8ea in _nm_l3cfg_notify_platform_change (self=0x560e25f168e0, change_type=change_type@entry=NM_PLATFORM_SIGNAL_CHANGED, obj=0x560e25e466a0) at src/core/nm-l3cfg.c:1364
#6 0x0000560e23861251 in _platform_signal_cb (platform=<optimized out>, obj_type_i=<optimized out>, ifindex=<optimized out>, platform_object=0x560e25e466b8, change_type_i=2,
p_self=<optimized out>) at ./src/libnm-platform/nmp-object.h:443
#7 0x00007f6a1c4a914e in ffi_call_unix64 () at ../src/x86/unix64.S:76
#8 0x00007f6a1c4a8aff in ffi_call (cif=cif@entry=0x7fffac40e570, fn=fn@entry=0x560e23861100 <_platform_signal_cb>, rvalue=<optimized out>, avalue=avalue@entry=0x7fffac40e480)
at ../src/x86/ffi64.c:525
#9 0x00007f6a217fee85 in g_cclosure_marshal_generic (closure=<optimized out>, return_gvalue=<optimized out>, n_param_values=<optimized out>, param_values=<optimized out>,
invocation_hint=<optimized out>, marshal_data=<optimized out>) at gclosure.c:1490
#10 0x00007f6a217fe3bd in g_closure_invoke (closure=0x560e25df53c0, return_value=0x0, n_param_values=5, param_values=0x7fffac40e7a0, invocation_hint=0x7fffac40e720) at gclosure.c:804
#11 0x00007f6a21811945 in signal_emit_unlocked_R (node=node@entry=0x7f6a00008870, detail=detail@entry=0, instance=instance@entry=0x560e25ddd080, emission_return=emission_return@entry=0x0,
instance_and_params=instance_and_params@entry=0x7fffac40e7a0) at gsignal.c:3636
#12 0x00007f6a2181aa56 in g_signal_emit_valist (instance=<optimized out>, signal_id=<optimized out>, detail=<optimized out>, var_args=var_args@entry=0x7fffac40e9c0) at gsignal.c:3392
#13 0x00007f6a2181b093 in g_signal_emit (instance=instance@entry=0x560e25ddd080, signal_id=<optimized out>, detail=detail@entry=0) at gsignal.c:3448
#14 0x0000560e2392deea in nm_platform_cache_update_emit_signal (self=0x560e25ddd080, cache_op=NMP_CACHE_OPS_UPDATED, obj_old=<optimized out>, obj_new=<optimized out>)
at src/libnm-platform/nm-platform.c:8824
#15 0x0000560e238fd520 in event_handler_recvmsgs () at src/libnm-platform/nm-linux-platform.c:7183
#16 0x0000560e238fdcbf in event_handler_read_netlink () at src/libnm-platform/nm-linux-platform.c:9403
#17 0x0000560e238ffab3 in delayed_action_handle_one () at src/libnm-platform/nm-linux-platform.c:6238
#18 0x0000560e238ffcae in delayed_action_handle_all () at src/libnm-platform/nm-linux-platform.c:6256
#19 0x0000560e23901acc in do_delete_object () at src/libnm-platform/nm-linux-platform.c:7392
#20 0x0000560e2390227c in ip4_address_delete () at src/libnm-platform/nm-linux-platform.c:8782
#21 0x0000560e23922709 in nm_platform_ip4_address_delete (self=self@entry=0x560e25ddd080, ifindex=ifindex@entry=150, address=16843009, plen=<optimized out>, peer_address=16843009)
at src/libnm-platform/nm-platform.c:3574
#22 0x0000560e239275ab in nm_platform_ip_address_sync (self=0x560e25ddd080, addr_family=addr_family@entry=2, ifindex=150, known_addresses=<optimized out>, known_addresses@entry=0x0,
addresses_prune=0x560e25e81aa0) at src/libnm-platform/nm-platform.c:3984
#23 0x0000560e23855e17 in _l3_commit_one (self=0x560e25f168e0, addr_family=2, commit_type=<optimized out>, l3cd_old=<optimized out>, changed_combined_l3cd=<optimized out>)
at src/core/nm-l3cfg.c:4256
#24 0x0000560e2385fc5c in _l3_commit (self=0x560e25f168e0, commit_type=NM_L3_CFG_COMMIT_TYPE_REAPPLY, is_idle=<optimized out>) at src/core/nm-l3cfg.c:4353
#25 0x0000560e239c6a6d in nm_device_cleanup (self=0x560e25e985e0, reason=<optimized out>, cleanup_type=CLEANUP_TYPE_DECONFIGURE) at src/core/devices/nm-device.c:15082
#26 0x0000560e239c7522 in _set_state_full (self=0x560e25e985e0, state=<optimized out>, reason=<optimized out>, quitting=0) at src/core/devices/nm-device.c:15467
#27 0x0000560e239cd482 in queued_state_set (user_data=user_data@entry=0x560e25e985e0) at src/core/devices/nm-device.c:15706
#28 0x00007f6a2131b27b in g_idle_dispatch (source=0x560e25ebab60, callback=0x560e239cd3d0 <queued_state_set>, user_data=0x560e25e985e0) at gmain.c:5579
#29 0x00007f6a2131e95d in g_main_dispatch (context=0x560e25d97bc0) at gmain.c:3193
#30 g_main_context_dispatch (context=context@entry=0x560e25d97bc0) at gmain.c:3873
#31 0x00007f6a2131ed18 in g_main_context_iterate (context=0x560e25d97bc0, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>) at gmain.c:3946
#32 0x00007f6a2131f042 in g_main_loop_run (loop=0x560e25d730f0) at gmain.c:4142
#33 0x0000560e237c06ec in main (argc=<optimized out>, argv=<optimized out>) at src/core/main.c:509
Fixes: 929eae245d ('l3cfg: implement NM_L3CFG_CONFIG_FLAGS_ASSUME_CONFIG_ONCE and rework object state')
(cherry picked from commit 849a4eee5c)
After the first time committing, the routes and addresses are removed
directly by bypassing the l3cfg in `nm_device_cleanup()`, then when
committing the second time, the l3cfg think that some addresses are
still configured but they are actually already disappeared from the
kernel already.
To fix it, commit the l3cd changes through l3cfg instead of removing
the addresses/routes directly.
https://bugzilla.redhat.com/show_bug.cgi?id=2043514
Fixes-test: @nmcli_general_activate_static_connection_carrier_not_ignored
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1115
(cherry picked from commit 9f6114afe8)
The DPDK port will not have a link after the devbind which is needed for
configuring an interface to be a DPDK port. The MTU is being committed
during the link change but for DPDK ports there is no link.
The DPDK port MTU should be set on ovsdb right after the interface is
added to ovsdb. This way the users will be able to set MTU for DPDK
ports and modify it.
Please see the following results:
```
port 2: iface0 (dpdk: configured_rx_queues=1, configured_rxq_descriptors=2048, configured_tx_queues=3,
configured_txq_descriptors=2048, lsc_interrupt_mode=false, mtu=2000, requested_rx_queues=1,
requested_rxq_descriptors=2048, requested_tx_queues=3, requested_txq_descriptors=2048, rx_csum_offload=true, tx_tso_offload=false)
```
(cherry picked from commit 59c60cccf5)
Since commit ecc73eb239 ('ovs-port: always remove the OVSDB entry on
slave release'), ovs port were removing the ovsdb entry upon being
un-enslaved, no matter what the reason for un-enslavement was. The idea
was to remove the stale ovsdb entry upon forcible device removal.
This cleanup is specific to OpenVSwitch, since for other device types,
the device master is the property of the slave and thus goes away along
with the device.
Turns out we're now removing the ovsdb entry even when the device
actually doesn't go away, but we're pretending it does because the
daemon is shutting down.
To add insult to injury, we generally end up removing one entry,
because the other ovsdb calls end up in a queue and don't get serviced
before the daemon shuts down. The result is a mess. (This patch
doesn't solve that -- if someone terminates the daemon with in-flight
ovsdb calls they're still out of luck).
Let's do the cleanup now only if the device was actually physically
removed.
Fixes-test: @NM_reboot_openvswitch_vlan_configuration
Fixes: ecc73eb239 ('ovs-port: always remove the OVSDB entry on slave release')
https://bugzilla.redhat.com/show_bug.cgi?id=2055665https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1117
(cherry picked from commit 897977e960)
As we iterate over "self->num_freqs", we must not modify "freqs",
otherwise, the second and subsequenty frequencies in self->freqs[i]
cannot match.
Fixes: dd8c546ff0 ('2007-12-27 Dan Williams <dcbw@redhat.com>')
Fixes: ba8527ca58 ('wifi: preliminary nl80211 patch')
(cherry picked from commit 4f9f0587d5)
pppd is a delicate flower. On orderly shutdown, it likes to tell the
other side. This seems to take at least a second even when no real
network latency is at play, on busy systems 1.5 seconds easily ends up
being inadequate.
A violent shutdown is generally okay apart from that it can leave
garbage (port lock) behind and the other side potentially confused for a
while.
As it happens, this interacts badly with modemu.pl which is used for
testing: the pseudo terminal in PPP line discipline mode has no idea
that the remote disconnected and while ModemManager is learning that
something wrong the hard way (AT command timing out, because the remote
still expects to talk PPP), the test times out.
Let's increase the timeout to something more reasonable.
https://bugzilla.redhat.com/show_bug.cgi?id=2049596https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1103
(cherry picked from commit 47ff99515f)
Don't progress to the IP ready state until all objects are committed
to platform. Note that l3cfg has a 20 seconds timeout after which
unavailable objects are considered "definitely unavailable" and are
removed from the list.
Fixes-test: @ipv6_routes_with_src
https://bugzilla.redhat.com/show_bug.cgi?id=2043133
(cherry picked from commit f15b3f15a7)
l3cfg has a "temp_not_available" list of objects that couldn't be
added to platform, but can be added once some preconditions become
true (for example, a IPv6 route with a "src" attribute requires a
non-tentative src address to be present).
Retry to commit those objects once all addresses have completed
ACD/DAD.
(cherry picked from commit 9a090fdf7b)
nm_l3_config_data_get_nameservers() returns a pointer to "struct in6_addr". Not
a pointer to pointers.
#0 __memmove_avx_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:389
#1 0x00007f8060dd9109 in memcpy (__len=<optimized out>, __src=0xfd, __dest=<optimized out>) at /usr/include/bits/string_fortified.h:29
#2 g_array_append_vals (len=1, data=0xfd, farray=0x55dd69332130) at ../glib/garray.c:522
#3 g_array_append_vals (farray=0x55dd69332130, data=0xfd, len=1) at ../glib/garray.c:509
#4 0x000055dd68d2a27d in _garray_inaddr_add (p_arr=<optimized out>, addr_family=<optimized out>, addr=0xfd) at src/core/nm-l3-config-data.c:295
#5 0x000055dd68ef6510 in nm_l3_config_data_add_nameserver (nameserver=<optimized out>, addr_family=10, self=0x55dd6949f900) at src/core/nm-l3-config-data.c:1442
#6 nm_device_copy_ip6_dns_config (self=0x55dd693c4420, from_device=<optimized out>) at src/core/devices/nm-device.c:10468
#7 0x00007f8060f28aba in _g_closure_invoke_va (param_types=0x0, n_params=<optimized out>, args=0x7fffed43d610, instance=0x55dd693c4420, return_value=0x0, closure=0x55dd693cdb10)
at ../gobject/gclosure.c:893
#8 g_signal_emit_valist (instance=0x55dd693c4420, signal_id=<optimized out>, detail=0, var_args=var_args@entry=0x7fffed43d610) at ../gobject/gsignal.c:3406
#9 0x00007f8060f28c03 in g_signal_emit (instance=<optimized out>, signal_id=<optimized out>, detail=<optimized out>) at ../gobject/gsignal.c:3553
#10 0x000055dd68efd1fb in _dev_ipac6_start (self=0x55dd693c4420) at src/core/devices/nm-device.c:11348
#11 0x000055dd68efd698 in _dev_ipac6_start_continue (self=0x55dd693c4420) at src/core/devices/nm-device.c:11373
#12 _dev_ipll6_set_llstate (self=0x55dd693c4420, llstate=<optimized out>, lladdr=<optimized out>) at src/core/devices/nm-device.c:10576
#13 0x000055dd68e7915e in _emit_changed_on_idle_cb (user_data=user_data@entry=0x55dd6941ca50) at src/core/nm-l3-ipv6ll.c:221
#14 0x00007f8060e0639b in g_idle_dispatch (source=0x55dd693eea30, callback=0x55dd68e78fd0 <_emit_changed_on_idle_cb>, user_data=0x55dd6941ca50) at ../glib/gmain.c:5897
#15 0x00007f8060e0a05f in g_main_dispatch (context=0x55dd6922c800) at ../glib/gmain.c:3381
#16 g_main_context_dispatch (context=0x55dd6922c800) at ../glib/gmain.c:4099
#17 0x00007f8060e5f2a8 in g_main_context_iterate.constprop.0 (context=0x55dd6922c800, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>) at ../glib/gmain.c:4175
#18 0x00007f8060e09773 in g_main_loop_run (loop=0x55dd69211010) at ../glib/gmain.c:4373
#19 0x000055dd68d09c7b in main (argc=<optimized out>, argv=<optimized out>) at src/core/main.c:509
Fixes: 58287cbcc0 ('core: rework IP configuration in NetworkManager using layer 3 configuration')
(cherry picked from commit a2c8a3228b)
Add support for IPv6 multipath routes, by treating them as single-hop
routes. Otherwise, we can easily end up with an inconsistent platform
cache.
Background:
-----------
Routes are hard. We have NMPlatform which is a cache of netlink objects.
That means, we have a hash table and we cache objects based on some
identity (nmp_object_id_equal()). So those objects must have some immutable,
indistinguishable properties that determine whether an object is the
same or a different one.
For routes and routing rules, this identifying property is basically a subset
of the attributes (but not all!). That makes it very hard, because tomorrow
kernel could add an attribute that becomes part of the identity, and NetworkManager
wouldn't recognize it, resulting in cache inconsistency by wrongly
thinking two different routes are one and the same. Anyway.
The other point is that we rely on netlink events to maintain the cache.
So when we receive a RTM_NEWROUTE we add the object to the cache, and
delete it upon RTM_DELROUTE. When you do `ip route replace`, kernel
might replace a (different!) route, but only send one RTM_NEWROUTE message.
We handle that by somehow finding the route that was replaced/deleted. It's
ugly. Did I say, that routes are hard?
Also, for IPv4 routes, multipath attributes are just a part of the
routes identity. That is, you add two different routes that only differ
by their multipath list, and then kernel does as you would expect.
NetworkManager does not support IPv4 multihop routes and just ignores
them.
Also, a multipath route can have next hops on different interfaces,
which goes against our current assumption, that an NMPlatformIP4Route
has an interface (or no interface, in case of blackhole routes). That
makes it hard to meaningfully support IPv4 routes. But we probably don't
have to, because we can just pretend that such routes don't exist and
our cache stays consistent (at least, until somebody calls `ip route
replace` *sigh*).
Not so for IPv6. When you add (`ip route append`) an IPv6 route that is
identical to an existing route -- except their multipath attribute -- then it
behaves as if the existing route was modified and the result is the
merged route with more next-hops. Note that in this case kernel will
only send a RTM_NEWROUTE message with the full multipath list. If we
would treat the multipath list as part of the route's identity, this
would be as if kernel deleted one routes and created a different one (the
merged one), but only sending one notification. That's a bit similar to
what happens during `ip route replace`, but it would be nightmare to
find out which route was thereby replaced.
Likewise, when you delete a route, then kernel will "subtract" the
next-hop and sent a RTM_DELROUTE notification only about the next-hop that
was deleted. To handle that, you would have to find the full multihop
route, and replace it with the remainder after the subtraction.
NetworkManager so far ignored IPv6 routes with more than one next-hop, this
means you can start with one single-hop route (that NetworkManger sees
and has in the platform cache). Then you create a similar route (only
differing by the next-hop). Kernel will merge the routes, but not notify
NetworkManager that the single-hop route is not longer a single-hop
route. This can easily cause a cache inconsistency and subtle bugs. For
IPv6 we MUST handle multihop routes.
Kernels behavior makes little sense, if you expect that routes have an
immutable identity and want to get notifications about addition/removal.
We can however make sense by it by pretending that all IPv6 routes are
single-hop! With only the twist that a single RTM_NEWROUTE notification
might notify about multiple routes at the same time. This is what the
patch does.
The Patch
---------
Now one RTM_NEWROUTE message can contain multiple IPv6 routes
(NMPObject). That would mean that nmp_object_new_from_nl() needs to
return a list of objects. But it's not implemented that way. Instead,
we still call nmp_object_new_from_nl(), and the parsing code can
indicate that there is something more, indicating the caller to call
nmp_object_new_from_nl() again in a loop to fetch more objects.
In practice, I think all RTM_DELROUTE messages for IPv6 routes are
single-hop. Still, we implement it to handle also multi-hop messages the
same way.
Note that we just parse the netlink message again from scratch. The alternative
would be to parse the first object once, and then clone the object and
only update the next-hop. That would be more efficient, but probably
harder to understand/implement.
https://bugzilla.redhat.com/show_bug.cgi?id=1837254#c20
(cherry picked from commit dac12a8d61)
To parse the RTA_MULTIHOP message, "policy" is not right (which is used
to parse the overall message). Instead, we don't really have a special
policy that we should use.
This was not a severe issue, because the allocated buffer (with
G_N_ELEMENTS(policy) elements) was larger than need be. And apparently,
using the wrong policy also didn't cause us to reject important
messages.
(cherry picked from commit 997d72932d)
IPv4:
routes
A list of IPv4 destination addresses, prefix length, optional IPv4
next hop addresses, optional route metric, optional attribute. The
valid syntax is: "ip[/prefix] [next-hop] [metric]
[attribute=val]...[,ip[/prefix]...]". For example "192.0.2.0/24
10.1.1.1 77, 198.51.100.0/24".
Various attributes are supported:
• "cwnd" - an unsigned 32 bit integer.
• "initcwnd" - an unsigned 32 bit integer.
• "initrwnd" - an unsigned 32 bit integer.
• "lock-cwnd" - a boolean value.
• "lock-initcwnd" - a boolean value.
• "lock-initrwnd" - a boolean value.
• "lock-mtu" - a boolean value.
• "lock-window" - a boolean value.
• "mtu" - an unsigned 32 bit integer.
• "onlink" - a boolean value.
• "scope" - an unsigned 8 bit integer. IPv4 only.
• "src" - an IPv4 address.
• "table" - an unsigned 32 bit integer. The default depends on
ipv4.route-table.
• "tos" - an unsigned 8 bit integer. IPv4 only.
• "type" - one of unicast, local, blackhole, unavailable,
prohibit. The default is unicast.
• "window" - an unsigned 32 bit integer.
For details see also `man ip-route`.
Format: a comma separated list of routes
IPv6:
routes
A list of IPv6 destination addresses, prefix length, optional IPv6
next hop addresses, optional route metric, optional attribute. The
valid syntax is: "ip[/prefix] [next-hop] [metric]
[attribute=val]...[,ip[/prefix]...]".
Various attributes are supported:
• "cwnd" - an unsigned 32 bit integer.
• "from" - an IPv6 address with optional prefix. IPv6 only.
• "initcwnd" - an unsigned 32 bit integer.
• "initrwnd" - an unsigned 32 bit integer.
• "lock-cwnd" - a boolean value.
• "lock-initcwnd" - a boolean value.
• "lock-initrwnd" - a boolean value.
• "lock-mtu" - a boolean value.
• "lock-window" - a boolean value.
• "mtu" - an unsigned 32 bit integer.
• "onlink" - a boolean value.
• "src" - an IPv6 address.
• "table" - an unsigned 32 bit integer. The default depends on
ipv6.route-table.
• "type" - one of unicast, local, blackhole, unavailable,
prohibit. The default is unicast.
• "window" - an unsigned 32 bit integer.
For details see also `man ip-route`.
Format: a comma separated list of routes
(cherry picked from commit 7b1e9a5c3d)
Certain route types (blackhole, unreachable, prohibit) are not tied to
an interface. They are thus global and we need to track them system wide
(or better: per network namespace). That is done by NMPRouteManager.
For the routing rules, it's NMDevice itself to track/untrack the rules.
That is done for historical reasons, at the time, NML3Cfg did not exit.
Now with NML3Cfg, it seems that also NML3Cfg should be the part that
handles nodev routes. One reason is that we want to move IP
functionality out of NMDevice. So callers (NMDevice) would just add
blackhole routes to the NML3ConfigData and let NML3Cfg handle them.
Still, to handle these routes is rather different from regular routes.
Normally, NML3Cfg tracks an object state (ObjStateData) for each address/route,
and it hooks into platform signals to update the os_plobj field. Those signals
are dispatched by NMNetns and are only per-ifindex. Hence, NML3Cfg
wouldn't be notified about those nodev routes. Consequently, there
os_plobj could not be (efficiently) maintained and there is no
ObjStateData for such routes.
Instead, all that NML3Cfg does is have the routes in the NML3ConfigData and
tell NMPRouteManager about them. Seems simple enough. The only question
is when should NMPRouteManager sync? For now, we sync when the
track/untracking brings any changes and during reapply. Which is
probably fine.
(cherry picked from commit 9ab53e561a)
Specifically, in nm_utils_ip_route_attribute_to_platform() and in
_l3_config_data_add_obj() handle such new route type. For the moment,
they cannot be stored in a valid NMSettingIPConfig, but later this will
be necessary.
(cherry picked from commit 6255e0dcac)
This will be required next, when we will have also routes without a
device. Split the generation of the route list out.
(cherry picked from commit e32bc6d248)
The general idea is that when we have entries tracked by the
route-manager, that we can mark them all as dirty. Then, calling the
"track" function will reset the dirty flag. Finally, there is a method
to delete all dirty entries.
As we can lookup an entry with O(1) (using dictionaries), we can
sync the list of tracked objects with O(n). We just need to track
all the ones we care about, and then delete those that were not touched
(that is, are still dirty).
Previously, we had to explicitly mark all entries as dirty. We can do
better. Just let nmp_route_manager_untrack_all() mark the survivors as
dirty right away. This way, we can save iterating the list once.
It also makes sense because the only purpose of the dirty flag is to
aid this prune mechanism with track/untrack-all. So, untrack-all can
just help out, and leave the remaining entries dirty, so that the next
track does the right thing.
(cherry picked from commit 9e90bb0817)
We now track up to three kinds of object types in NMPRouteManager.
There is only one place, where we need to iterate over all objects of
the same type (e.g. all ipv4-routes), and that is nmp_route_manager_sync().
Previously, we only had one GHashTable with all the object, and when
iterating we had to skip over them after checking the type. That has some
overhead, but OK.
The ugliness with iterating over a GHashTable is that the order is non
deterministic. We should have a defined order in which things happen. To
achieve that, track three different CList, one for each object type.
Also, I expect that to be slightly faster, as you only have to iterate
over the list you care about.
(cherry picked from commit f315ca9e84)
NM_HASH_OBFUSCATE_PTR() is some snake-oil to not log raw pointer values.
It obviously makes debugging harder.
But we don't need to generate differently obfuscated pointer values.
At least, let most users use the same obfuscation, so that the values
are comparable.
(cherry picked from commit 3e6c8d220a)
Let's just always allocate the hash tables. We will likely need them,
and three hash tables are relatively cheap.
(cherry picked from commit 5b3e96451b)
Routes of type blackhole, unreachable, prohibit don't have an
ifindex/device. They are thus in many ways similar to routing rules,
as they are global. We need a mediator to keep track which routes
to configure.
This will be very similar to what NMPRulesManager already does for
routing rules. Rename the API, so that it also can be used for routes.
Renaming the file will be done next, so that git's rename detection
doesn't get too confused.
(cherry picked from commit ea4f6d7994)