From libnl3, we only used the helper function to parse/generate netlink
messages and the socket functions to send/receive messages. We don't
need an external dependency to do that, it is simple enough.
Drop the libnl3 dependency, and replace all missing code by directly
copying it from libnl3 sources. At this point, I mostly tried to
import the required bits to make it working with few modifications.
Note that this increases the binary size of NetworkManager by 4736 bytes
for contrib/rpm build on x86_64. In the future, we can simplify the code
further.
A few modifications from libnl3 are:
- netlink errors NLE_* are now in the domain or regular errno.
The distinction of having to bother with two kinds of error
number domains was annoying.
- parts of the callback handling is copied partially and unused parts
are dropped. Especially, the verbose/debug handlers are not used.
In following commits, the callback handling will be significantly
simplified.
- the complex handling of seleting ports was simplified. We now always
let kernel choose the right port automatically.
Platform invokes change events while reading netlink events. However,
platform code is not re-entrant and calling into platform again is not
allowed (aside operations that do not process the netlink socket, like
lookup of the platform cache).
That basically means, we have to always process events in an idle
handler. That is not a too strong limitation, because we anyway don't
know the call context in which the platform event is emitted and we
should avoid unguarded recursive calls into platform.
Otherwise, we get hit an assertion/crash in nm-iface-helper:
1 raise()
2 abort()
3 g_assertion_message()
4 g_assertion_message_expr()
5 do_delete_object()
6 ip6_address_delete()
>>> 7 nm_platform_ip6_address_delete()
8 nm_platform_ip6_address_sync()
9 nm_ip6_config_commit()
10 ndisc_config_changed()
11 ffi_call_unix64()
12 ffi_call()
13 g_cclosure_marshal_generic_va()
14 _g_closure_invoke_va()
15 g_signal_emit_valist()
16 g_signal_emit()
>>> 17 nm_ndisc_dad_failed()
18 ffi_call_unix64()
19 ffi_call()
20 g_cclosure_marshal_generic()
21 g_closure_invoke()
22 signal_emit_unlocked_R()
23 g_signal_emit_valist()
24 g_signal_emit()
>>> 25 nm_platform_cache_update_emit_signal()
26 event_handler_recvmsgs()
27 event_handler_read_netlink()
28 delayed_action_handle_one()
29 delayed_action_handle_all()
30 do_delete_object()
31 ip6_address_delete()
32 nm_platform_ip6_address_delete()
33 nm_platform_ip6_address_sync()
>>> 34 nm_ip6_config_commit()
35 ndisc_config_changed()
36 ffi_call_unix64()
37 ffi_call()
38 g_cclosure_marshal_generic_va()
39 _g_closure_invoke_va()
40 g_signal_emit_valist()
41 g_signal_emit()
42 check_timestamps()
43 receive_ra()
44 ndp_call_eventfd_handler()
45 ndp_callall_eventfd_handler()
46 event_ready()
47 g_main_context_dispatch()
48 g_main_context_iterate.isra.22()
49 g_main_loop_run()
>>> 50 main()
NMPlatform already has a check to assert against recursive calls
in delayed_action_handle_all():
g_return_val_if_fail (priv->delayed_action.is_handling == 0, FALSE);
priv->delayed_action.is_handling++;
...
priv->delayed_action.is_handling--;
Fixes: f85728ecffhttps://bugzilla.redhat.com/show_bug.cgi?id=1546656
When NM quits it destroys all singletons including NMOvsdb, which
invokes callbacks for every pending method call. In the shutdown,
extra care must be taken to not access objects that are already in a
inconsistent state; for example here, the callback changes the device
state, and this causes an access to data that has already been
cleared:
#0 _g_log_abort (breakpoint=breakpoint@entry=1) at gmessages.c:554
#1 g_logv (log_domain=0x5635653b6817 "NetworkManager", log_level=G_LOG_LEVEL_CRITICAL, format=<optimized out>, args=args@entry=0x7fffb4b2c1e0) at gmessages.c:1362
#2 g_log (log_domain=log_domain@entry=0x5635653b6817 "NetworkManager", log_level=log_level@entry=G_LOG_LEVEL_CRITICAL, format=format@entry=0x7fbb3f58fa4a "%s: assertion '%s' failed") at gmessages.c:1403
#3 g_return_if_fail_warning (log_domain=log_domain@entry=0x5635653b6817 "NetworkManager", pretty_function=pretty_function@entry=0x5635653b6b00 <__func__.34463> "nm_device_factory_manager_find_factory_for_connection", expression=expression@entry=0x5635653b6719 "factories_by_setting") at gmessages.c:2702
#4 nm_device_factory_manager_find_factory_for_connection (connection=connection@entry=0x56356627e0e0) at src/devices/nm-device-factory.c:243
#5 nm_manager_get_connection_iface (self=0x563566241080 [NMManager], connection=connection@entry=0x56356627e0e0, out_parent=out_parent@entry=0x0, error=error@entry=0x0) at src/nm-manager.c:1458
#6 check_connection_compatible (self=<optimized out>, connection=0x56356627e0e0) at src/devices/nm-device.c:4679
#7 check_connection_compatible (device=0x56356647b1b0 [NMDeviceOvsInterface], connection=0x56356627e0e0) at src/devices/ovs/nm-device-ovs-interface.c:95
#8 _nm_device_check_connection_available (self=0x56356647b1b0 [NMDeviceOvsInterface], connection=0x56356627e0e0, flags=NM_DEVICE_CHECK_CON_AVAILABLE_NONE, specific_object=0x0) at src/devices/nm-device.c:12102
#9 nm_device_check_connection_available (self=self@entry=0x56356647b1b0 [NMDeviceOvsInterface], connection=0x56356627e0e0, flags=flags@entry=NM_DEVICE_CHECK_CON_AVAILABLE_NONE, specific_object=specific_object@entry=0x0) at src/devices/nm-device.c:12131
#10 nm_device_recheck_available_connections (self=self@entry=0x56356647b1b0 [NMDeviceOvsInterface]) at src/devices/nm-device.c:12238
#11 _set_state_full (self=self@entry=0x56356647b1b0 [NMDeviceOvsInterface], state=state@entry=NM_DEVICE_STATE_FAILED, reason=reason@entry=NM_DEVICE_STATE_REASON_OVSDB_FAILED, quitting=quitting@entry=0) at src/devices/nm-device.c:13065
#12 nm_device_state_changed (self=self@entry=0x56356647b1b0 [NMDeviceOvsInterface], state=state@entry=NM_DEVICE_STATE_FAILED, reason=reason@entry=NM_DEVICE_STATE_REASON_OVSDB_FAILED) at src/devices/nm-device.c:13328
#13 del_iface_cb (error=<optimized out>, user_data=0x56356647b1b0) at src/devices/ovs/nm-device-ovs-port.c:160
#14 _transact_cb (self=self@entry=0x5635662b9ba0 [NMOvsdb], result=result@entry=0x0, error=0x563566259a10, user_data=user_data@entry=0x5635662ff320) at src/devices/ovs/nm-ovsdb.c:1449
#15 ovsdb_disconnect (self=self@entry=0x5635662b9ba0 [NMOvsdb]) at src/devices/ovs/nm-ovsdb.c:1331
#16 dispose (object=0x5635662b9ba0 [NMOvsdb]) at src/devices/ovs/nm-ovsdb.c:1558
#17 g_object_unref (_object=0x5635662b9ba0) at gobject.c:3293
#18 _nm_singleton_instance_destroy () at src/nm-core-utils.c:138
#19 _dl_fini () at dl-fini.c:253
#20 __run_exit_handlers (status=status@entry=0, listp=0x7fbb3e1ad6c8 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true) at exit.c:77
#21 __GI_exit (status=status@entry=0) at exit.c:99
#22 main (argc=1, argv=0x7fffb4b2cc38) at src/main.c:468
Add a new error code to indicate to callbacks that we are quitting and
no further action must be taken. This is preferable to having
additional references because it allows us to free the resources owned
by callbacks immediately, while references can easily create loops.
https://bugzilla.redhat.com/show_bug.cgi?id=1543871
Example: when dhcpv4 lease renewal fails, if ipv4.may-fail was "yes",
check also if we have a successful ipv6 conf: if not fail.
Previously we just ignored the other ip family status.
Convert the string representation of ipv4.dhcp-client-id property already in
NMDevice to a GBytes. Next, we will support more client ID modes, and we
will need the NMDevice context to generate the client id.
GByteArray is a mutable array of bytes. For every practical purpose, the hwaddr
property of NMDhcpClient is an immutable sequence of bytes. Thus, make it a
GBytes.
The two boolean properties do not need to be ever reset. It's nice
to initialize such properties in the constructor and don't mutate
them afterwards.
Instead of adding two boolean GObject properties, add a new flags property
that can encode these two values. In the end, properties are too
cumbersome, let's combine them.
Optimally, NMDhcpClient would be stateless and all paramters would
be passed on as argument. Clearly that is not feasable, because there
are so many paramters, and in many cases they need to be cached for the
lifetime of the client instance.
Instead of passing info_only paramter to ip6_start() and cache it
both in NMDhcpClient and NMDhcpSystemd, keep it in NMDhcpClient at
one place.
In the next commit, we will initialize info-only only once during the
constructor, so it is immutable and somewhat stateless.
The parent's stop() implementation does nothing interesting
for NMDhcpSystem. Still, call it, it's just unexpected to
not chain up the parent implementation, if all other subclasses
do it.
In general, if the parent's implementation is not suitable to be called
by the derived class, that should be handled differently then just not
chaining up. Otherwise it's inconsistent and confusing.
Don't use message data after calling curl_multi_remove_handle(). Fixes
the following asan error:
=================================================================
==13238==ERROR: AddressSanitizer: heap-use-after-free on address 0x608000091ad0 at pc 0x55750f8d9a10 bp 0x7ffeb7f5f210 sp 0x7ffeb7f5f200
READ of size 8 at 0x608000091ad0 thread T0
#0 0x55750f8d9a0f in curl_check_connectivity (/usr/sbin/NetworkManager+0x190a0f)
#1 0x55750f8da7dd in curl_socketevent_cb (/usr/sbin/NetworkManager+0x1917dd)
#2 0x7f73cb64e8f8 in g_main_context_dispatch (/lib64/libglib-2.0.so.0+0x4a8f8)
#3 0x7f73cb64ec57 (/lib64/libglib-2.0.so.0+0x4ac57)
#4 0x7f73cb64ef29 in g_main_loop_run (/lib64/libglib-2.0.so.0+0x4af29)
#5 0x55750f85c3f4 (/usr/sbin/NetworkManager+0x1133f4)
#6 0x7f73c9f19384 in __libc_start_main (/lib64/libc.so.6+0x22384)
#7 0x55750f85d7f7 (/usr/sbin/NetworkManager+0x1147f7)
0x608000091ad0 is located 48 bytes inside of 88-byte region [0x608000091aa0,0x608000091af8)
freed by thread T0 here:
#0 0x7f73cd61f508 in __interceptor_free (/lib64/libasan.so.4+0xde508)
#1 0x7f73ca710eaa in curl_multi_remove_handle (/lib64/libcurl.so.4+0x32eaa)
previously allocated by thread T0 here:
#0 0x7f73cd61fa88 in __interceptor_calloc (/lib64/libasan.so.4+0xdea88)
#1 0x7f73ca710b3d in curl_multi_add_handle (/lib64/libcurl.so.4+0x32b3d)
SUMMARY: AddressSanitizer: heap-use-after-free (/usr/sbin/NetworkManager+0x190a0f)
After writing the connection to disk and rereading it, in addition to
restoring agent-owned secrets in the cache we must also restore
agent-owned secrets from the original connections since they are lost
during the write.
Reported-by: Märt Bakhoff <anon@sigil.red>
https://bugzilla.gnome.org/show_bug.cgi?id=793324
The bridge test (and no other either) no longer sets sysfs properties,
so this whole madness is no longer needed. That is good, because Linux
got somewhat stricter (at least in 4.15) about mounting sysfs and the
whole thing wouldn't work with containers where /sys is red-only from
the start.
If we're going to use a 'no content' URL (HTTP 204) to check
connectivity, do not try to match prefix when the content is being
received. This issue was making the check not work properly, as the content
returned by the captive portal was assumed as expected (given that
g_str_has_prefix(str,"") always returns TRUE!).
Also, rework a log message that was being emitted on portal detection
to avoid specifying that the reason is the content being shorter than
expected, as that same logic now applies to the case where too much
content is received and none was expected.
Fixes: 88416394f8https://mail.gnome.org/archives/networkmanager-list/2018-February/msg00009.html
The ifcfg-rh plugin provides its own D-Bus service which initscripts
query to determine whether NetworkManager handles an ifcfg file.
Rework the D-Bus glue to hook GDBus with NetworkManager to use
GDBusConnection directly. Don't use generated code, don't use
GDBusInterfaceSkeleton.
We still keep "src/settings/plugins/ifcfg-rh/nm-ifcfg-rh.xml"
and still compile the static generated code. We don't actually need
them anymore, maybe the should be dropped later.
This is a proof of concept for reworking the D-Bus glue in
NetworkManager core to directly use GDBusConnection. Reworking core is
much more complicated, because there we also have properties, and a
class hierarchy.
Arguably, for the trivial ifcfg-rh service all this hardly matters, because
the entire D-Bus service only consists of one method, which is unlikely to
be extended in the future.
Now we get rid of layers of glue code, that were hard to comprehend.
Did you understand how nm_exported_object_skeleton_create() works
and uses the generated code and GDBusInterfaceSkeleton to hook into
GDBusConnection? Congratulations in that case. In my opinion, these
layers of code don't simplify but complicate the code.
The change also reduces the binary size of "libnm-settings-plugin-ifcfg-rh.so"
(build with contrib/rpm --without debug) by 8312 bytes (243672 vs. 235360).
In _dbus_setup(), call _dbus_clear(). It feels more correct to do that.
Although, technically, we never even call _dbus_setup() if there is
anything to clear.
Also, minor refactoring of config_changed_cb(). It's not entirely clear
whether we need that code, or how to handle D-Bus disconnecting, if at all.
Avoid calling nm_device_iwd_set_dbus_object (device, NULL) if the
dbus_object was NULL already. Apparently gdbus guarantees that a
name-owner notification either has a NULL old owner or a NULL new owner
but can also have both old and new owner NULL.
Reuse the apparent workaround from libnm/nm-client.c in which the
GDbusObjectManagerClient is recreated every time the name owner
pops up, instead of creating it once and using that object forever.
Resubscribe to all the signals on the new object. The initial
GDbusObjectManager we create is only used to listed for the name-owner
changes.
There's nothing in gdbus docs that justifies doing that but there
doesn't seem to be any way to reliably receive all the signals from
the dbus service the normal way. The signals do appear on dbus-monitor
and the gdbus apparently subscribes to those signals with AddMatch()
correctly but they sometimes won't be received by the client code,
unless this workaround is applied.
While making changes to got_object_manager, don't destroy the
cancellable there as it is supposed to be used throughout the
NMIwdManager life.
Disconnect from NMManager signals in our cleanup, make sure the
NMManager singleton is not destroyed before we are by keeping a
reference until we've disconnected from its signals.
Add very simple periodic scanning because IWD itself only does periodic
scanning when it is in charge of autoconnecting (by policy). Since we
keep IWD out of the autoconnect state in order to use NM's autoconnect
logic, we need to request the scanning. The policy in this patch is to
use a simple 10s period between the end of one scan the requesting of
another while not connected, and 20s when connected. This is so that
users can expect similar results from both wifi backends but without
duplicating the more elaborate code in the wpa_supplicant backend which
can potentially be moved to a common superclass.
Although IFA_F_TEMPORARY is numerically equal to IFA_F_SECONDARY,
their meaning is different. One applies to IPv6 temporary addresses,
and the other to IPv4 secondary addresses.
During _addr_array_clean_expired() we want to ignore and clear
IPv6 temporary addresses, but not IPv4 secondary addresses.
Fixes: f2c4720bca
While the numerical values of IFA_F_SECONDARY and IFA_F_TEMPORARY
are identical, their meaning is not.
IFA_F_SECONDARY is only relevant for IPv4 addresses, while
IFA_F_TEMPORARY is only relevant for IPv6 addresses.
IFA_F_TEMPORARY is automatically set by kernel for the addresses
that it generates as part of IFA_F_MANAGETEMPADDR. It cannot be
actively set by user-space.
IFA_F_SECONDARY is automatically set by kernel depending on the order
in which the addresses for the same subnet are added.
This essentially reverts 8b4f11927 (core: avoid IFA_F_TEMPORARY alias for
IFA_F_SECONDARY).
In device_ipx_changed() we remember the addresses for which it appears
that DAD failed. Later, on an idle handler, we process them during
queued_ip6_config_change().
Note that nm_plaform_ip6_address_sync() might very well decide to remove
some or all addresses and re-add them immidiately later. It might do so,
to get the address priority/ordering right. At that point, we already
emit platform signals that the device disappeared, and track them in
dad6_failed_addrs.
Hence, later during queued_ip6_config_change() we must check again
whether the address is really not there and not still doing DAD.
Otherwise, we wrongly claim that DAD failed and remove the address,
generate a new one, and the same issue might happen again.
dad6_failed_addrs is populated with addresses from the platform cache.
Inside the cache, all addresses have addr_source NM_IP_CONFIG_SOURCE_KERNEL,
because addr_source property for addresses is only a property that is
used NetworkManager internally.
NMPObjects are never modified after being put into the cache.
Hence, it is safe and encouraged to just keep a reference to them,
instead of cloning them.
Interestingly, NMPlatform's change signals have a platform_object
pointer, which is not the pointer to the NMPObjects itself, but
down-cast to the NMPlatformObject instance. It does so, because commonly
callers want to have a pointer to the NMPlatformObject instance, instead
of the outer NMPObjects. However, NMP_OBJECT_UP_CAST() is guaranteed
to work one would expect.
The order in which we add addresses to dad6_failed_addrs does not
matter. Hence, use g_slist_prepend() which is O(1), instead
g_slist_append() with O(n).
We want to add addresses in a particular order so that source address
selection works.
Note that @known_addresses contains the desired addresses in order of
least-important first, while @plat_addresses contains them in opposite
order. Previously, this inverted order was not considered, and we
essentially ended up removing and re-adding all addresses every time.
Fix that. While at it, get rid of the O(n^2) runtime complexity, and
make it O(n) by iterating both lists simultaneously.
Temporary addresses (RFC4941) are not handled by NetworkManager directly, but by
kernel. If they are in the @known_addresses list, clear them out early.
They shall be ignored.