nm_ppp_manager_stop() wants to ensure that the pppd process is really
gone. For that it uses nm_utils_kill_child_async() to first send
SIGTERM, and sending SIGKILL after a timeout.
Later, we want to fix shutdown of NetworkManager to iterate the mainloop
during shutdown, so that such operations are still handled. However, we
can only delay shutdown for a certain time. After a timeout (NM_SHUTDOWN_TIMEOUT_MS
plus NM_SHUTDOWN_TIMEOUT_MS_GRACE) we really have to give up and
terminate.
That means, the right amount of time between sending SIGTERM and SIGKILL
is exactly NM_SHUTDOWN_TIMEOUT_MS. Hopefully that is of course
sufficient in the first place. If not, send SIGKILL afterwards, and give
a bit more time (NM_SHUTDOWN_TIMEOUT_MS_GRACE) to reap the child.
And if all this time is still not enough, something is really odd and we
abort waiting, with a warning in the logfile.
Since we don't properly handle shutdown yet, the description above is
not really true. But with this patch, we fix it from point of view of
NMPPPManager.
Previously, there were two functions nm_ppp_manager_stop_sync() and
nm_ppp_manager_stop_async().
However, stop-sync() would still kill the process asynchronously (with a
2 seconds timeout before sending SIGKILL).
On the other hand, stop-async() did pretty much the same thing as
sync-code, except also using the GAsyncResult.
Merge the two functions. Stopping the instance for the most part can be
done entirely synchrnous. The only thing that is asynchronous, is
to wait for the process to terminate. For that, add a new callback
argument to nm_ppp_manager_stop(). This replaces the GAsyncResult
pattern.
Also, always ensure that NetworkManager runs the mainloop at least as
long until the process really terminated. Currently we don't get that
right, and during shutdown we just stop iterating the mainloop. However,
fix this from point of view of NMPPPManager and register a wait-object,
that later will correctly delay shutdown.
Also, NMDeviceWwan cared to wait (asynchronously) until pppd really
terminated. Keep that functionality. nm_ppp_manager_stop() returns
a handle that can be used to cancel the asynchrounous request and invoke
the callback right away. However note, that even when cancelling the
request, the wait-object that prevents shutdown of NetworkManager is
kept around, so that we can be sure to properly clean up.
We usually structure our code in a (pseudo) object oriented way.
It makes sense to call the variable for the target object "self",
it is more familiar and usually done.
- add callback arguments to _ppp_kill(). Although most callers don't
care, it makes it more obvious that this kills the process
asynchronously.
- the call to nm_utils_kill_child_async() is complicated, with many
arguments. Only call it from one place, and re-use the simpler wrapper
function _ppp_kill() everywhere.
Eventually we should do a coordinated shutdown when NetworkManager exits.
That is a large work, ensuring that all asynchronous actions are cancellable
(in time), and that we wait for all pending operations to complete.
Add a function nm_shutdown_register_watchdog(), so that objects can register
themselves, to block shutdown until they are destroyed. It's not yet used,
because actually iterating the mainloop during shutdown can only be done,
once the code is prepared to be ready for that. But we already need the
function, so that we can refactor individual parts of the code, in preparation
of using it in the future.
glib 2.56's g_steal_pointer() won't tolerate a function pointer in place
of a gpointer.
CC src/src_libNetworkManager_la-nm-active-connection.lo
src/nm-active-connection.c:1017:17: error: pointer type mismatch
('NMActiveConnectionAuthResultFunc' (aka 'void (*)(struct _NMActiveConnection *,
int, const char *, void *)') and 'gpointer' (aka 'void *'))
[-Werror,-Wpointer-type-mismatch]
result_func = g_steal_pointer (&priv->auth.result_func);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/include/glib-2.0/glib/gmem.h:200:6: note: expanded from macro 'g_steal_pointer'
(0 ? (*(pp)) : (g_steal_pointer) (pp))
^ ~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~
1 error generated.
There's just a single spot we use it that way, so it's perhaps better to
work around the warning instead of disabling it.
Do some preprocessing on the DNS configuration sent to plugins:
- add the '~' default routing (lookup) domain to IP configurations
with the default route or, when there is none, to all non-VPN
IP configurations
- use the dns-priority to decide which connection to use in case
multiple connections have the same domain
- consider a negative dns-priority value as a way to 'shadow' all
subdomains from other connections
- compute reverse DNS domains
and add the resulting domain list to NMDnsIPConfigData so that
split-DNS plugins can use that directly instead of reimplementing the
same logic themselves.
Coccinelle:
@@
expression a, b;
@@
-a ? a : b
+a ?: b
Applied with:
spatch --sp-file ternary.cocci --in-place --smpl-spacing --dir .
With some manual adjustments on spots that Cocci didn't catch for
reasons unknown.
Thanks to the marvelous effort of the GNU compiler developer we can now
spare a couple of bits that could be used for more important things,
like this commit message. Standards commitees yet have to catch up.
It is meant to be rather similar in nature to isblank() or
g_ascii_isspace().
Sadly, isblank() is locale dependent while g_ascii_isspace() also considers
vertical whitespace as a space. That's no good for configuration files that
are strucutured into lines, which happens to be a pretty common case.
If the master has no carrier in act_stage3_ip6_config_start(), we set
IP state WAIT and wait until carrier goes up before starting IP
configuration.
However, in carrier_changed() if the device state is ACTIVATED we only
call nm_device_update_dynamic_ip_setup(), which just restarts DHCP if
it was already running.
Let's also ensure that we start IP configuration if the IP state is
WAIT.
Fixes: b0f6baad90https://bugzilla.redhat.com/show_bug.cgi?id=1575944
The constructor can bail out early, not setting monitor->sd.watch:
(NetworkManager:373): GLib-CRITICAL **: 20:35:58.601: g_source_remove: assertion 'tag > 0' failed
Without ifindex, adding the direct route to gateway fails:
platform: route-sync: failure to add IPv6 route: fd02::/64 via fd01::1 dev 1635 metric 101 mss 0 rt-src user: No route to host (113); try adding direct route to gateway fd01::1/128 via :: metric 101 mss 0 rt-src user
platform: route: append IPv6 route: fd01::1/128 via :: metric 101 mss 0 rt-src user
platform-linux: delayed-action: schedule wait-for-nl-response (seq 269, timeout in 0.199999195, response-type 0)
platform-linux: delayed-action: handle wait-for-nl-response (any)
platform-linux: netlink: recvmsg: new message NLMSG_ERROR, flags 0, seq 269
platform-linux: netlink: recvmsg: error message from kernel: No such device (19) for request 269
Fixes: c9f89cafdf
When passing "/" to destroy all checkpoints, wrongly no
checkpoint was destroyed.
When passing a particular path that should be destroyed,
wrongly all checkpoints were destroyed.
Fixes: 79458a558b
In nm_manager_get_best_device_for_connection(), not only check whether the first found
active-connection has a device. There might be multiple candidates, in which case iterate
over them and figure out which one is the most suitable.
Also, despite the found @ac has the same settings-connection, it does not
mean that the connection is available on the device. Extend the check and
only return compatible and ready devices. This can easily happen that
the settings-connection was modified in the meantime and no longer is
compatible with the device (despite currently being active on the
device, but with the previous settings).
It is possible, that there are multiple such conflicting connections.
Maybe not now, but with connecition cardinality it will soon be.
Find and disconnect them all.
nm_device_steal_connection() was a bit misleading. It only had one caller,
and what _internal_activate_device() really wants it to deactivate all
other active-connections for the same connection. Hence, it already
performed a lookup for the active-connection that should be disconnected,
only to then lookup the device, and tell it to steal the connection.
Note, that if existing_ac happens to be neither the queued nor the currenct
active connection, then previously it would have done nothing. It's
unclear when that exactly can happen, however, we can avoid that
question entirely.
Instead of having steal-connection(), have a disconnect-active-connection().
If there is no matching device, it will just set the active-connection's
state to DISCONNECTED. Which in turn does nothing, if the state is
already DISCONNECTED.
At various places we check whether we have an active-connection already.
For example, when activating a new connection in _internal_activate_device(),
we might want to "nm_device_steal_connection()" if the same profile is
already active.
However, the max-state argument was not accurate in several cases. For
the purpose of finding concurrent activations, we don't care about
active-connections that are already in state DEACTIVATING. Only those
that are ACTIVATED or before.
active_connection_find_by_connection() (or how it was called previously) is
a simpler wrapper around active_connection_find(), which accepts a NMConnection
argument, that may or may not be a NMSettingsConnection.
It always passed NM_ACTIVE_CONNECTION_STATE_DEACTIVATING as max_state, and I think
that one of the two callers don't should do that. A later commit will change that.
So, we also want to pass @max_state as argument to active_connection_find_by_connection().
At that point, make active_connection_find_by_connection() identical to
active_connection_find(), with the only exception, that it accepts a
NMConnection and does automatically do the right thing (either lookup by
pointer equality or by uuid).
There are various places where we do an internal activation (with an
internal auth-subject). In several of these places, the
ACTIVATION_REASON is USER_REQUEST.
I think it is wrong to generally abort all internal activations, except
AUTOCONNECT_SLAVES ones. I think, aborting an activation should be
opt-in instead of opt-out.
To me it seems, we want to abort the activation based on the activation
reason. For now, opt-in to EXTERNAL, ASSUME and AUTOCONNECT only. If
there are additional cases where we should abort activation, we should
add yet another reason and not use USER_REQUEST.
Until now, we only really cared about whether a connection was activated
with reason NM_ACTIVATION_REASON_AUTOCONNECT_SLAVES or not.
Now however, we will care about whether a connection was activated via
(genuine) autoconnect by NMPolicy, or external/assume by NMManager.
Add a new reason to distinguish between them.
There is no need to perform a lookup by path. NMSettings is a singleton,
it has the connection exactly iff the connection is linked.
Also add an assertion to double-check that the results agree with
the previous implementation.
Instead of passing the interval for the timeout, let concheck_periodic_schedule_do()
figure it out on its own. It only depends on cur-interval and
cur-basetime.
Additionally, pass now_ns timestamp, because we already made
decisions based on this particular timestamp. We don't want to
re-evalutate the current time but ensure to use the same timestamp.
There is no change in behavior, it just seems nicer this way.
When the device-state changes to "activated", force a connectivity check
right away. Something possibly happened that affected connectivity.
Also, reduce the interval time down to CONCHECK_P_PROBE_INTERVAL to
start probing again.
if device *is* a NM_DEVICE_IWD, then make sure to not pass that to
_nm_device_wifi_request_scan (which asserts on anything else than a
NM_DEVICE_WIFI device).
The crash can be triggered by enabling wifi.backend=iwd and clicking
on the 'select network' item in gnome shell for example. The journal
output looks like this:
NetworkManager[1861]: invalid cast from 'NMDeviceIwd' to 'NMDeviceWifi'
NetworkManager[1861]: **
NetworkManager[1861]: NetworkManager:ERROR:src/devices/wifi/nm-device-wifi.c:1127:_nm_device_wifi_request_scan: assertion failed: ((((__extension__ ({ GTypeInstance *__inst = (GTypeInstance*) ((_obj)); GType __t = ((nm_device_wifi_get_type ())); gboolean __r; if (!__inst) __r = (0); else if (__inst->g_class && __inst->g_class->g_type == __t) __r = (!(0)); else __r = g_type_check_instance_is_a (__inst, __t); __r; })))))
systemd[1]: NetworkManager.service: Main process exited, code=dumped, status=6/ABRT
systemd[1]: NetworkManager.service: Failed with result 'core-dump'.
Fixes: 297d4985abhttps://github.com/NetworkManager/NetworkManager/pull/107
Without this, nm_device_get_type_description() would quite likely
return "ethernet" for NMDeviceVeth types. This is wrong and was
broken recently.
Fixes: 0775602574
During shutdown, we will need to still iterate the main loop
to do a coordinated shutdown. Currently we do not, and we just
exit, leaving a lot of objects hanging.
If we are going to fix that, we need during shutdown tell
NMDBusManager to reject all future operations.
Note that property getters and "GetManagerObjects" call is not
blocked. It continues to work.
Certainly for some operations, we want to allow them to be called even
during shutdown. However, these have to opt-in.
This also fixes an uglyness, where nm_dbus_manager_start() would
get the set-property-handler and the @manager as user-data. However,
NMDBusManager will always outlife NMManager, hence, after NMManager
is destroyed, the user-data would be a dangling pointer. Currently
that is not an issue, because
- we always leak NMManager
- we don't run the mainloop during shutdown