If the active connection is deactivated because the device is gone,
don't block autoconnection. Otherwise, whenever the device comes
back (e.g. maybe it was reset in the middle of a connection attempt),
the autoconnection logic won't be triggered, as the settings are still
blocked.
I'm able to reproduce this by performing a WWAN modem reset in the
middle of a connection attempt.
https://github.com/NetworkManager/NetworkManager/pull/121
Add new stable-id specifier "${DEVICE}" to explicitly declare that the
connection's identity differs per-device.
Note that for settings like "ipv6.addr-gen-mode=stable" we already hash
the interface's name. So, in combination with addr-gen-mode, using this
specifier has no real use. But for example, we don't do that for
"ipv4.dhcp-client-id=stable".
Point being, in various context we possibly already include a per-device
token into the generation algorithm. But that is not the case for all
contexts and uses.
Especially the DHCPv4 client identifier is supposed to differ between interfaces
(according to RFC). We don't do that by default with "ipv4.dhcp-client-id=stable",
but with "${DEVICE}" can can now be configured by the user.
Note that the fact that the client-id is the same accross interfaces, is not a
common problem, because profiles are usually restricted to one device via
connection.interface-name.
Otherwise, the generated client-id depends purely on the profile's
stable-id. It means, the same profile (that is, either the same UUID
or same stable-id) on different hosts will result in identical client-ids.
That is clearly not desired. Hash a per-host secret-key as well.
Note, that we don't hash the interface name. So, activating the
profile on different interfaces, will still yield the same client-id.
But also note, that commonly a profile is restricted to one device,
via "connection.interface-name".
Note that this is a change in behavior. However, "ipv4.dhcp-client-id=stable"
was only added recently and not yet released.
Fixes: 62a7863979
and add nm_utils_secret_key_get() to cache the secret-key, to only
obtain it once.
nm_utils_secret_key_read() is not expected to fail. However, in case
of an unexpected error, don't propagate the error to the caller,
but instead handle it internally.
That means, in case of error:
- log a warning within nm_utils_secret_key_read() itself.
- always return a generated secret-key. In case of error, the
key won't be persisted (obviously). But the caller can ignore
the error and just proceed with an in-memory key.
Hence, also add nm_utils_secret_key_get() to cache the key. This way,
we only try to actually generate/read the secret-key once. Since that
might fail and return an in-memory key, we must for future invocations
return the same key, without generating a new one.
The secret_key is binary random data, so one shouldn't ever use it as a
NUL terminated string directly.
Still, just ensure that the entire buffer is always NUL terminated.
nm_ppp_manager_stop() wants to ensure that the pppd process is really
gone. For that it uses nm_utils_kill_child_async() to first send
SIGTERM, and sending SIGKILL after a timeout.
Later, we want to fix shutdown of NetworkManager to iterate the mainloop
during shutdown, so that such operations are still handled. However, we
can only delay shutdown for a certain time. After a timeout (NM_SHUTDOWN_TIMEOUT_MS
plus NM_SHUTDOWN_TIMEOUT_MS_GRACE) we really have to give up and
terminate.
That means, the right amount of time between sending SIGTERM and SIGKILL
is exactly NM_SHUTDOWN_TIMEOUT_MS. Hopefully that is of course
sufficient in the first place. If not, send SIGKILL afterwards, and give
a bit more time (NM_SHUTDOWN_TIMEOUT_MS_GRACE) to reap the child.
And if all this time is still not enough, something is really odd and we
abort waiting, with a warning in the logfile.
Since we don't properly handle shutdown yet, the description above is
not really true. But with this patch, we fix it from point of view of
NMPPPManager.
Previously, there were two functions nm_ppp_manager_stop_sync() and
nm_ppp_manager_stop_async().
However, stop-sync() would still kill the process asynchronously (with a
2 seconds timeout before sending SIGKILL).
On the other hand, stop-async() did pretty much the same thing as
sync-code, except also using the GAsyncResult.
Merge the two functions. Stopping the instance for the most part can be
done entirely synchrnous. The only thing that is asynchronous, is
to wait for the process to terminate. For that, add a new callback
argument to nm_ppp_manager_stop(). This replaces the GAsyncResult
pattern.
Also, always ensure that NetworkManager runs the mainloop at least as
long until the process really terminated. Currently we don't get that
right, and during shutdown we just stop iterating the mainloop. However,
fix this from point of view of NMPPPManager and register a wait-object,
that later will correctly delay shutdown.
Also, NMDeviceWwan cared to wait (asynchronously) until pppd really
terminated. Keep that functionality. nm_ppp_manager_stop() returns
a handle that can be used to cancel the asynchrounous request and invoke
the callback right away. However note, that even when cancelling the
request, the wait-object that prevents shutdown of NetworkManager is
kept around, so that we can be sure to properly clean up.
We usually structure our code in a (pseudo) object oriented way.
It makes sense to call the variable for the target object "self",
it is more familiar and usually done.
- add callback arguments to _ppp_kill(). Although most callers don't
care, it makes it more obvious that this kills the process
asynchronously.
- the call to nm_utils_kill_child_async() is complicated, with many
arguments. Only call it from one place, and re-use the simpler wrapper
function _ppp_kill() everywhere.
Eventually we should do a coordinated shutdown when NetworkManager exits.
That is a large work, ensuring that all asynchronous actions are cancellable
(in time), and that we wait for all pending operations to complete.
Add a function nm_shutdown_register_watchdog(), so that objects can register
themselves, to block shutdown until they are destroyed. It's not yet used,
because actually iterating the mainloop during shutdown can only be done,
once the code is prepared to be ready for that. But we already need the
function, so that we can refactor individual parts of the code, in preparation
of using it in the future.
glib 2.56's g_steal_pointer() won't tolerate a function pointer in place
of a gpointer.
CC src/src_libNetworkManager_la-nm-active-connection.lo
src/nm-active-connection.c:1017:17: error: pointer type mismatch
('NMActiveConnectionAuthResultFunc' (aka 'void (*)(struct _NMActiveConnection *,
int, const char *, void *)') and 'gpointer' (aka 'void *'))
[-Werror,-Wpointer-type-mismatch]
result_func = g_steal_pointer (&priv->auth.result_func);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/include/glib-2.0/glib/gmem.h:200:6: note: expanded from macro 'g_steal_pointer'
(0 ? (*(pp)) : (g_steal_pointer) (pp))
^ ~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~
1 error generated.
There's just a single spot we use it that way, so it's perhaps better to
work around the warning instead of disabling it.
Do some preprocessing on the DNS configuration sent to plugins:
- add the '~' default routing (lookup) domain to IP configurations
with the default route or, when there is none, to all non-VPN
IP configurations
- use the dns-priority to decide which connection to use in case
multiple connections have the same domain
- consider a negative dns-priority value as a way to 'shadow' all
subdomains from other connections
- compute reverse DNS domains
and add the resulting domain list to NMDnsIPConfigData so that
split-DNS plugins can use that directly instead of reimplementing the
same logic themselves.
Coccinelle:
@@
expression a, b;
@@
-a ? a : b
+a ?: b
Applied with:
spatch --sp-file ternary.cocci --in-place --smpl-spacing --dir .
With some manual adjustments on spots that Cocci didn't catch for
reasons unknown.
Thanks to the marvelous effort of the GNU compiler developer we can now
spare a couple of bits that could be used for more important things,
like this commit message. Standards commitees yet have to catch up.
It is meant to be rather similar in nature to isblank() or
g_ascii_isspace().
Sadly, isblank() is locale dependent while g_ascii_isspace() also considers
vertical whitespace as a space. That's no good for configuration files that
are strucutured into lines, which happens to be a pretty common case.
If the master has no carrier in act_stage3_ip6_config_start(), we set
IP state WAIT and wait until carrier goes up before starting IP
configuration.
However, in carrier_changed() if the device state is ACTIVATED we only
call nm_device_update_dynamic_ip_setup(), which just restarts DHCP if
it was already running.
Let's also ensure that we start IP configuration if the IP state is
WAIT.
Fixes: b0f6baad90https://bugzilla.redhat.com/show_bug.cgi?id=1575944
The constructor can bail out early, not setting monitor->sd.watch:
(NetworkManager:373): GLib-CRITICAL **: 20:35:58.601: g_source_remove: assertion 'tag > 0' failed
Without ifindex, adding the direct route to gateway fails:
platform: route-sync: failure to add IPv6 route: fd02::/64 via fd01::1 dev 1635 metric 101 mss 0 rt-src user: No route to host (113); try adding direct route to gateway fd01::1/128 via :: metric 101 mss 0 rt-src user
platform: route: append IPv6 route: fd01::1/128 via :: metric 101 mss 0 rt-src user
platform-linux: delayed-action: schedule wait-for-nl-response (seq 269, timeout in 0.199999195, response-type 0)
platform-linux: delayed-action: handle wait-for-nl-response (any)
platform-linux: netlink: recvmsg: new message NLMSG_ERROR, flags 0, seq 269
platform-linux: netlink: recvmsg: error message from kernel: No such device (19) for request 269
Fixes: c9f89cafdf
When passing "/" to destroy all checkpoints, wrongly no
checkpoint was destroyed.
When passing a particular path that should be destroyed,
wrongly all checkpoints were destroyed.
Fixes: 79458a558b
In nm_manager_get_best_device_for_connection(), not only check whether the first found
active-connection has a device. There might be multiple candidates, in which case iterate
over them and figure out which one is the most suitable.
Also, despite the found @ac has the same settings-connection, it does not
mean that the connection is available on the device. Extend the check and
only return compatible and ready devices. This can easily happen that
the settings-connection was modified in the meantime and no longer is
compatible with the device (despite currently being active on the
device, but with the previous settings).
It is possible, that there are multiple such conflicting connections.
Maybe not now, but with connecition cardinality it will soon be.
Find and disconnect them all.
nm_device_steal_connection() was a bit misleading. It only had one caller,
and what _internal_activate_device() really wants it to deactivate all
other active-connections for the same connection. Hence, it already
performed a lookup for the active-connection that should be disconnected,
only to then lookup the device, and tell it to steal the connection.
Note, that if existing_ac happens to be neither the queued nor the currenct
active connection, then previously it would have done nothing. It's
unclear when that exactly can happen, however, we can avoid that
question entirely.
Instead of having steal-connection(), have a disconnect-active-connection().
If there is no matching device, it will just set the active-connection's
state to DISCONNECTED. Which in turn does nothing, if the state is
already DISCONNECTED.
At various places we check whether we have an active-connection already.
For example, when activating a new connection in _internal_activate_device(),
we might want to "nm_device_steal_connection()" if the same profile is
already active.
However, the max-state argument was not accurate in several cases. For
the purpose of finding concurrent activations, we don't care about
active-connections that are already in state DEACTIVATING. Only those
that are ACTIVATED or before.
active_connection_find_by_connection() (or how it was called previously) is
a simpler wrapper around active_connection_find(), which accepts a NMConnection
argument, that may or may not be a NMSettingsConnection.
It always passed NM_ACTIVE_CONNECTION_STATE_DEACTIVATING as max_state, and I think
that one of the two callers don't should do that. A later commit will change that.
So, we also want to pass @max_state as argument to active_connection_find_by_connection().
At that point, make active_connection_find_by_connection() identical to
active_connection_find(), with the only exception, that it accepts a
NMConnection and does automatically do the right thing (either lookup by
pointer equality or by uuid).
There are various places where we do an internal activation (with an
internal auth-subject). In several of these places, the
ACTIVATION_REASON is USER_REQUEST.
I think it is wrong to generally abort all internal activations, except
AUTOCONNECT_SLAVES ones. I think, aborting an activation should be
opt-in instead of opt-out.
To me it seems, we want to abort the activation based on the activation
reason. For now, opt-in to EXTERNAL, ASSUME and AUTOCONNECT only. If
there are additional cases where we should abort activation, we should
add yet another reason and not use USER_REQUEST.
Until now, we only really cared about whether a connection was activated
with reason NM_ACTIVATION_REASON_AUTOCONNECT_SLAVES or not.
Now however, we will care about whether a connection was activated via
(genuine) autoconnect by NMPolicy, or external/assume by NMManager.
Add a new reason to distinguish between them.
There is no need to perform a lookup by path. NMSettings is a singleton,
it has the connection exactly iff the connection is linked.
Also add an assertion to double-check that the results agree with
the previous implementation.