Now that every call to nm_device_set_ip_iface() and nm_device_set_ip_ifindex()
is checked, and setting an interface that does not exist causes the device
state to fail, we no longer need to allow setting an ip-iface if we are
unable to retrieve the ip-ifindex.
Depending on the bearer's configuration method, the data-port is
either a networking interface, or an tty for ppp.
Let's treat them strictily separate.
Also, rework how NM_MODEM_DATA_PORT was used in both contexts.
Instead, use the that we actually care about.
Also, when nm_device_set_ip_ifindex() fails, fail activation
right away.
Also, we early try to resolve the network interface's name to
an ifindex. If that fails, the device is already gone and we
fail early.
nm_device_modem_new() is only called with a newly created
NMModemBroadband or NMModemOfono instance.
See the callers
- NMModemManager:handle_new_modem()
- NMWwanFactory:modem_added_cb()
- NMDeviceModem:nm_device_modem_new()
Hence, at that point, the modem cannot yet have a data-port
or ip-iface set, because that is only obtained later.
- don't even bother to look into the platform cache, but use
if_indextoname() / if_nametoindex(). In most cases, we obtained
the ifindex/ifname not from the platform cache in the first
place. Hence, there is a race, where the interface might not
exist.
However, try to process events of the platform cache, hoping
that the cache contains an interface for the given ifindex/ifname.
- let set_ip_ifindex() and set_ip_iface() both return a boolean
value to indicate whether a ip-interface is set or not. That is,
whether we have a positive ip_ifindex. That seems more interesting
information, then to return whether anything changed.
- as before, set_ip_ifindex() can only clear an ifindex/ifname,
or error out without doing anything. That is different from
set_ip_iface(), which will also set an ifname if no ifindex
can be resolved. That is curreently ugly, because then ip-ifindex
and ip-iface don't agree. That shall be improved in the future
by:
- trying to set an interface that cannot be resolved shall
lead to a disconnect in any case.
- we shall make less use of the ip-iface and rely more on the
ifindex.
The error should be freed by callback functions, but only
_monitor_bridges_cb() actually does it. Simplify this by letting the
caller own the error.
Fixes: 830a5a14cb
- refactor the loop in event_handler_read_netlink() to mark pending
requests as answered by adding a new helper function
delayed_action_wait_for_nl_response_complete_check()
- delayed_action_wait_for_nl_response_complete_all() can be implemented
in terms of delayed_action_wait_for_nl_response_complete_check()
- if nm_platform_netns_push() fails, also complete all pending requests
with a new error code WAIT_FOR_NL_RESPONSE_RESULT_FAILED_SETNS.
Now that we cleaned up nl_recv(), we have full control over which error
variables are returned when. We no longer need to check "errno"
directly, and we no longer need the NLE_USER_* workaround.
- adjust some coding style (space after function name).
- ensure to use g_free(), as we no longer use malloc
but the g_malloc aliases. Nowadays, glib's malloc
is identical to malloc from the standard library and
so this is no issue in practice. Still it's bad
style to mix g_malloc() with free().
- use cleanup attribute for memory handling.
Glib is not out of memory safe, meaning it always aborts the program
when an allocation fails. It is not possible to meaningfully handle
out of memory when using glib.
Replace all allocation functions for netlink message with their glib
counter part and remove the NULL checks.
With libnl3, each socket has it's own callback structure.
One would often take that callback structure, clone it, modify it
and invoke a receive operation with it.
We don't need this complexity. We got rid of all default handlers,
hence, by default all callbacks are unset.
The only callbacks that are set, are those that we specify immediately
before invoking the receive operation. Just pass the callback structure
at that point.
Also, no more ref-counting, and cloning of the callback structure. It is
so simple, just stack allocate one if you need it.
The only difference between nl_recvmsgs_report() and nl_recvmsgs() is
the return value on success. libnl3 couldn't change that for backward
compatibility reasons. We can merge them.
Originally, these were error numbers from libnl3. These error numbers
are separate from errno, which is unfortunate, because sometimes we
care about the native errno returned from kernel.
Now, refactor them so that the error numbers are in the shared realm
of errno, but failures from kernel or underlying API are still returned
via their native errno.
- NLE_INVAL doesn't exist anymore. Passing invalid arguments to a function
is commonly a bug. g_return_*(NLE_BUG) is the right answer to that.
- NLE_NOMEM and NLE_AGAIN is replaced by their errno counterparts.
- drop several error numbers. If nobody cares about these numbers,
there is no reason to have a specific error number for them.
NLE_UNSPEC is sufficient.
From libnl3, we only used the helper function to parse/generate netlink
messages and the socket functions to send/receive messages. We don't
need an external dependency to do that, it is simple enough.
Drop the libnl3 dependency, and replace all missing code by directly
copying it from libnl3 sources. At this point, I mostly tried to
import the required bits to make it working with few modifications.
Note that this increases the binary size of NetworkManager by 4736 bytes
for contrib/rpm build on x86_64. In the future, we can simplify the code
further.
A few modifications from libnl3 are:
- netlink errors NLE_* are now in the domain or regular errno.
The distinction of having to bother with two kinds of error
number domains was annoying.
- parts of the callback handling is copied partially and unused parts
are dropped. Especially, the verbose/debug handlers are not used.
In following commits, the callback handling will be significantly
simplified.
- the complex handling of seleting ports was simplified. We now always
let kernel choose the right port automatically.
Platform invokes change events while reading netlink events. However,
platform code is not re-entrant and calling into platform again is not
allowed (aside operations that do not process the netlink socket, like
lookup of the platform cache).
That basically means, we have to always process events in an idle
handler. That is not a too strong limitation, because we anyway don't
know the call context in which the platform event is emitted and we
should avoid unguarded recursive calls into platform.
Otherwise, we get hit an assertion/crash in nm-iface-helper:
1 raise()
2 abort()
3 g_assertion_message()
4 g_assertion_message_expr()
5 do_delete_object()
6 ip6_address_delete()
>>> 7 nm_platform_ip6_address_delete()
8 nm_platform_ip6_address_sync()
9 nm_ip6_config_commit()
10 ndisc_config_changed()
11 ffi_call_unix64()
12 ffi_call()
13 g_cclosure_marshal_generic_va()
14 _g_closure_invoke_va()
15 g_signal_emit_valist()
16 g_signal_emit()
>>> 17 nm_ndisc_dad_failed()
18 ffi_call_unix64()
19 ffi_call()
20 g_cclosure_marshal_generic()
21 g_closure_invoke()
22 signal_emit_unlocked_R()
23 g_signal_emit_valist()
24 g_signal_emit()
>>> 25 nm_platform_cache_update_emit_signal()
26 event_handler_recvmsgs()
27 event_handler_read_netlink()
28 delayed_action_handle_one()
29 delayed_action_handle_all()
30 do_delete_object()
31 ip6_address_delete()
32 nm_platform_ip6_address_delete()
33 nm_platform_ip6_address_sync()
>>> 34 nm_ip6_config_commit()
35 ndisc_config_changed()
36 ffi_call_unix64()
37 ffi_call()
38 g_cclosure_marshal_generic_va()
39 _g_closure_invoke_va()
40 g_signal_emit_valist()
41 g_signal_emit()
42 check_timestamps()
43 receive_ra()
44 ndp_call_eventfd_handler()
45 ndp_callall_eventfd_handler()
46 event_ready()
47 g_main_context_dispatch()
48 g_main_context_iterate.isra.22()
49 g_main_loop_run()
>>> 50 main()
NMPlatform already has a check to assert against recursive calls
in delayed_action_handle_all():
g_return_val_if_fail (priv->delayed_action.is_handling == 0, FALSE);
priv->delayed_action.is_handling++;
...
priv->delayed_action.is_handling--;
Fixes: f85728ecffhttps://bugzilla.redhat.com/show_bug.cgi?id=1546656
When NM quits it destroys all singletons including NMOvsdb, which
invokes callbacks for every pending method call. In the shutdown,
extra care must be taken to not access objects that are already in a
inconsistent state; for example here, the callback changes the device
state, and this causes an access to data that has already been
cleared:
#0 _g_log_abort (breakpoint=breakpoint@entry=1) at gmessages.c:554
#1 g_logv (log_domain=0x5635653b6817 "NetworkManager", log_level=G_LOG_LEVEL_CRITICAL, format=<optimized out>, args=args@entry=0x7fffb4b2c1e0) at gmessages.c:1362
#2 g_log (log_domain=log_domain@entry=0x5635653b6817 "NetworkManager", log_level=log_level@entry=G_LOG_LEVEL_CRITICAL, format=format@entry=0x7fbb3f58fa4a "%s: assertion '%s' failed") at gmessages.c:1403
#3 g_return_if_fail_warning (log_domain=log_domain@entry=0x5635653b6817 "NetworkManager", pretty_function=pretty_function@entry=0x5635653b6b00 <__func__.34463> "nm_device_factory_manager_find_factory_for_connection", expression=expression@entry=0x5635653b6719 "factories_by_setting") at gmessages.c:2702
#4 nm_device_factory_manager_find_factory_for_connection (connection=connection@entry=0x56356627e0e0) at src/devices/nm-device-factory.c:243
#5 nm_manager_get_connection_iface (self=0x563566241080 [NMManager], connection=connection@entry=0x56356627e0e0, out_parent=out_parent@entry=0x0, error=error@entry=0x0) at src/nm-manager.c:1458
#6 check_connection_compatible (self=<optimized out>, connection=0x56356627e0e0) at src/devices/nm-device.c:4679
#7 check_connection_compatible (device=0x56356647b1b0 [NMDeviceOvsInterface], connection=0x56356627e0e0) at src/devices/ovs/nm-device-ovs-interface.c:95
#8 _nm_device_check_connection_available (self=0x56356647b1b0 [NMDeviceOvsInterface], connection=0x56356627e0e0, flags=NM_DEVICE_CHECK_CON_AVAILABLE_NONE, specific_object=0x0) at src/devices/nm-device.c:12102
#9 nm_device_check_connection_available (self=self@entry=0x56356647b1b0 [NMDeviceOvsInterface], connection=0x56356627e0e0, flags=flags@entry=NM_DEVICE_CHECK_CON_AVAILABLE_NONE, specific_object=specific_object@entry=0x0) at src/devices/nm-device.c:12131
#10 nm_device_recheck_available_connections (self=self@entry=0x56356647b1b0 [NMDeviceOvsInterface]) at src/devices/nm-device.c:12238
#11 _set_state_full (self=self@entry=0x56356647b1b0 [NMDeviceOvsInterface], state=state@entry=NM_DEVICE_STATE_FAILED, reason=reason@entry=NM_DEVICE_STATE_REASON_OVSDB_FAILED, quitting=quitting@entry=0) at src/devices/nm-device.c:13065
#12 nm_device_state_changed (self=self@entry=0x56356647b1b0 [NMDeviceOvsInterface], state=state@entry=NM_DEVICE_STATE_FAILED, reason=reason@entry=NM_DEVICE_STATE_REASON_OVSDB_FAILED) at src/devices/nm-device.c:13328
#13 del_iface_cb (error=<optimized out>, user_data=0x56356647b1b0) at src/devices/ovs/nm-device-ovs-port.c:160
#14 _transact_cb (self=self@entry=0x5635662b9ba0 [NMOvsdb], result=result@entry=0x0, error=0x563566259a10, user_data=user_data@entry=0x5635662ff320) at src/devices/ovs/nm-ovsdb.c:1449
#15 ovsdb_disconnect (self=self@entry=0x5635662b9ba0 [NMOvsdb]) at src/devices/ovs/nm-ovsdb.c:1331
#16 dispose (object=0x5635662b9ba0 [NMOvsdb]) at src/devices/ovs/nm-ovsdb.c:1558
#17 g_object_unref (_object=0x5635662b9ba0) at gobject.c:3293
#18 _nm_singleton_instance_destroy () at src/nm-core-utils.c:138
#19 _dl_fini () at dl-fini.c:253
#20 __run_exit_handlers (status=status@entry=0, listp=0x7fbb3e1ad6c8 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true) at exit.c:77
#21 __GI_exit (status=status@entry=0) at exit.c:99
#22 main (argc=1, argv=0x7fffb4b2cc38) at src/main.c:468
Add a new error code to indicate to callbacks that we are quitting and
no further action must be taken. This is preferable to having
additional references because it allows us to free the resources owned
by callbacks immediately, while references can easily create loops.
https://bugzilla.redhat.com/show_bug.cgi?id=1543871
Example: when dhcpv4 lease renewal fails, if ipv4.may-fail was "yes",
check also if we have a successful ipv6 conf: if not fail.
Previously we just ignored the other ip family status.
Convert the string representation of ipv4.dhcp-client-id property already in
NMDevice to a GBytes. Next, we will support more client ID modes, and we
will need the NMDevice context to generate the client id.
GByteArray is a mutable array of bytes. For every practical purpose, the hwaddr
property of NMDhcpClient is an immutable sequence of bytes. Thus, make it a
GBytes.
The two boolean properties do not need to be ever reset. It's nice
to initialize such properties in the constructor and don't mutate
them afterwards.
Instead of adding two boolean GObject properties, add a new flags property
that can encode these two values. In the end, properties are too
cumbersome, let's combine them.
Optimally, NMDhcpClient would be stateless and all paramters would
be passed on as argument. Clearly that is not feasable, because there
are so many paramters, and in many cases they need to be cached for the
lifetime of the client instance.
Instead of passing info_only paramter to ip6_start() and cache it
both in NMDhcpClient and NMDhcpSystemd, keep it in NMDhcpClient at
one place.
In the next commit, we will initialize info-only only once during the
constructor, so it is immutable and somewhat stateless.
The parent's stop() implementation does nothing interesting
for NMDhcpSystem. Still, call it, it's just unexpected to
not chain up the parent implementation, if all other subclasses
do it.
In general, if the parent's implementation is not suitable to be called
by the derived class, that should be handled differently then just not
chaining up. Otherwise it's inconsistent and confusing.
Don't use message data after calling curl_multi_remove_handle(). Fixes
the following asan error:
=================================================================
==13238==ERROR: AddressSanitizer: heap-use-after-free on address 0x608000091ad0 at pc 0x55750f8d9a10 bp 0x7ffeb7f5f210 sp 0x7ffeb7f5f200
READ of size 8 at 0x608000091ad0 thread T0
#0 0x55750f8d9a0f in curl_check_connectivity (/usr/sbin/NetworkManager+0x190a0f)
#1 0x55750f8da7dd in curl_socketevent_cb (/usr/sbin/NetworkManager+0x1917dd)
#2 0x7f73cb64e8f8 in g_main_context_dispatch (/lib64/libglib-2.0.so.0+0x4a8f8)
#3 0x7f73cb64ec57 (/lib64/libglib-2.0.so.0+0x4ac57)
#4 0x7f73cb64ef29 in g_main_loop_run (/lib64/libglib-2.0.so.0+0x4af29)
#5 0x55750f85c3f4 (/usr/sbin/NetworkManager+0x1133f4)
#6 0x7f73c9f19384 in __libc_start_main (/lib64/libc.so.6+0x22384)
#7 0x55750f85d7f7 (/usr/sbin/NetworkManager+0x1147f7)
0x608000091ad0 is located 48 bytes inside of 88-byte region [0x608000091aa0,0x608000091af8)
freed by thread T0 here:
#0 0x7f73cd61f508 in __interceptor_free (/lib64/libasan.so.4+0xde508)
#1 0x7f73ca710eaa in curl_multi_remove_handle (/lib64/libcurl.so.4+0x32eaa)
previously allocated by thread T0 here:
#0 0x7f73cd61fa88 in __interceptor_calloc (/lib64/libasan.so.4+0xdea88)
#1 0x7f73ca710b3d in curl_multi_add_handle (/lib64/libcurl.so.4+0x32b3d)
SUMMARY: AddressSanitizer: heap-use-after-free (/usr/sbin/NetworkManager+0x190a0f)
After writing the connection to disk and rereading it, in addition to
restoring agent-owned secrets in the cache we must also restore
agent-owned secrets from the original connections since they are lost
during the write.
Reported-by: Märt Bakhoff <anon@sigil.red>
https://bugzilla.gnome.org/show_bug.cgi?id=793324
The bridge test (and no other either) no longer sets sysfs properties,
so this whole madness is no longer needed. That is good, because Linux
got somewhat stricter (at least in 4.15) about mounting sysfs and the
whole thing wouldn't work with containers where /sys is red-only from
the start.