In nm_manager_get_best_device_for_connection(), not only check whether the first found
active-connection has a device. There might be multiple candidates, in which case iterate
over them and figure out which one is the most suitable.
Also, despite the found @ac has the same settings-connection, it does not
mean that the connection is available on the device. Extend the check and
only return compatible and ready devices. This can easily happen that
the settings-connection was modified in the meantime and no longer is
compatible with the device (despite currently being active on the
device, but with the previous settings).
It is possible, that there are multiple such conflicting connections.
Maybe not now, but with connecition cardinality it will soon be.
Find and disconnect them all.
nm_device_steal_connection() was a bit misleading. It only had one caller,
and what _internal_activate_device() really wants it to deactivate all
other active-connections for the same connection. Hence, it already
performed a lookup for the active-connection that should be disconnected,
only to then lookup the device, and tell it to steal the connection.
Note, that if existing_ac happens to be neither the queued nor the currenct
active connection, then previously it would have done nothing. It's
unclear when that exactly can happen, however, we can avoid that
question entirely.
Instead of having steal-connection(), have a disconnect-active-connection().
If there is no matching device, it will just set the active-connection's
state to DISCONNECTED. Which in turn does nothing, if the state is
already DISCONNECTED.
At various places we check whether we have an active-connection already.
For example, when activating a new connection in _internal_activate_device(),
we might want to "nm_device_steal_connection()" if the same profile is
already active.
However, the max-state argument was not accurate in several cases. For
the purpose of finding concurrent activations, we don't care about
active-connections that are already in state DEACTIVATING. Only those
that are ACTIVATED or before.
active_connection_find_by_connection() (or how it was called previously) is
a simpler wrapper around active_connection_find(), which accepts a NMConnection
argument, that may or may not be a NMSettingsConnection.
It always passed NM_ACTIVE_CONNECTION_STATE_DEACTIVATING as max_state, and I think
that one of the two callers don't should do that. A later commit will change that.
So, we also want to pass @max_state as argument to active_connection_find_by_connection().
At that point, make active_connection_find_by_connection() identical to
active_connection_find(), with the only exception, that it accepts a
NMConnection and does automatically do the right thing (either lookup by
pointer equality or by uuid).
There are various places where we do an internal activation (with an
internal auth-subject). In several of these places, the
ACTIVATION_REASON is USER_REQUEST.
I think it is wrong to generally abort all internal activations, except
AUTOCONNECT_SLAVES ones. I think, aborting an activation should be
opt-in instead of opt-out.
To me it seems, we want to abort the activation based on the activation
reason. For now, opt-in to EXTERNAL, ASSUME and AUTOCONNECT only. If
there are additional cases where we should abort activation, we should
add yet another reason and not use USER_REQUEST.
Until now, we only really cared about whether a connection was activated
with reason NM_ACTIVATION_REASON_AUTOCONNECT_SLAVES or not.
Now however, we will care about whether a connection was activated via
(genuine) autoconnect by NMPolicy, or external/assume by NMManager.
Add a new reason to distinguish between them.
There is no need to perform a lookup by path. NMSettings is a singleton,
it has the connection exactly iff the connection is linked.
Also add an assertion to double-check that the results agree with
the previous implementation.
Instead of passing the interval for the timeout, let concheck_periodic_schedule_do()
figure it out on its own. It only depends on cur-interval and
cur-basetime.
Additionally, pass now_ns timestamp, because we already made
decisions based on this particular timestamp. We don't want to
re-evalutate the current time but ensure to use the same timestamp.
There is no change in behavior, it just seems nicer this way.
When the device-state changes to "activated", force a connectivity check
right away. Something possibly happened that affected connectivity.
Also, reduce the interval time down to CONCHECK_P_PROBE_INTERVAL to
start probing again.
if device *is* a NM_DEVICE_IWD, then make sure to not pass that to
_nm_device_wifi_request_scan (which asserts on anything else than a
NM_DEVICE_WIFI device).
The crash can be triggered by enabling wifi.backend=iwd and clicking
on the 'select network' item in gnome shell for example. The journal
output looks like this:
NetworkManager[1861]: invalid cast from 'NMDeviceIwd' to 'NMDeviceWifi'
NetworkManager[1861]: **
NetworkManager[1861]: NetworkManager:ERROR:src/devices/wifi/nm-device-wifi.c:1127:_nm_device_wifi_request_scan: assertion failed: ((((__extension__ ({ GTypeInstance *__inst = (GTypeInstance*) ((_obj)); GType __t = ((nm_device_wifi_get_type ())); gboolean __r; if (!__inst) __r = (0); else if (__inst->g_class && __inst->g_class->g_type == __t) __r = (!(0)); else __r = g_type_check_instance_is_a (__inst, __t); __r; })))))
systemd[1]: NetworkManager.service: Main process exited, code=dumped, status=6/ABRT
systemd[1]: NetworkManager.service: Failed with result 'core-dump'.
Fixes: 297d4985abhttps://github.com/NetworkManager/NetworkManager/pull/107
Without this, nm_device_get_type_description() would quite likely
return "ethernet" for NMDeviceVeth types. This is wrong and was
broken recently.
Fixes: 0775602574
During shutdown, we will need to still iterate the main loop
to do a coordinated shutdown. Currently we do not, and we just
exit, leaving a lot of objects hanging.
If we are going to fix that, we need during shutdown tell
NMDBusManager to reject all future operations.
Note that property getters and "GetManagerObjects" call is not
blocked. It continues to work.
Certainly for some operations, we want to allow them to be called even
during shutdown. However, these have to opt-in.
This also fixes an uglyness, where nm_dbus_manager_start() would
get the set-property-handler and the @manager as user-data. However,
NMDBusManager will always outlife NMManager, hence, after NMManager
is destroyed, the user-data would be a dangling pointer. Currently
that is not an issue, because
- we always leak NMManager
- we don't run the mainloop during shutdown
nm_settings_add_connection_dbus() has two callers. One of them is NMManager
during AddAndActivate. In this case, the NMActiveConnection already created
an auth-subject. Re-use it.
Note how creating an auth-subject involves reading procfs to determine
whether the process still exists. This is not about the additional
overhead of that, but about the race where the process could drop
of in the meantime. The calling process might be gone now, and we would
fail creating the auth-subject. There is no need for that, because we
already evaluated all information we need. Quite likely, in the case
of this race, PolicyKit will also determine that the process is gone
and fail authorization too. But that's PolicyKit's decision to make,
not nm_settings_add_connection_dbus()'s.
We cannot just fire off asynchronous actions without keeping a handle
to them. Otherwise, it's impossible for NMManager to know which
asynchronous operations are pending, and more importantly: it cannot
cancel them.
One day, I want that we do a clean shutdown, where NetworkManager stops
all pending operations, and cleans up everything. That implies, that
every operation is cancellable in a timely manner.
Rework pending nm_active_connection_authorize() calls to be tracked in a
list, so that they are still reachable to NMManager. Note that currently
NMManager does not yet try to cancel these operations ever. However, it
would now be possible to do so.
The async authorization request also carries user-data and its result
must always be handled. For example, it might carry a GDBusMethodInvocation
context, which must be returned and freed.
Hence, when cancelling the request, we must always invoke the callback.
Also, when the NMActiveConnection progresses to state disconnected,
automatically abort the authorization request.
Previously, nm_active_connection_authorize() accepts two user-data
pointers for convenience.
nm_active_connection_authorize() has three callers. One only requires
one user-data, one passes two user-data pointers, and one requires
three pointer.
Also, the way how the third passes the user data (via
g_object_set_qdata_full()) is not great.
Let's only use one user-data pointer. We commonly do that, and it's easy
enough to allocate a buffer to pack multiple pointers together.
We currently start the bus manager only after the creation of a
NMManager because the NMManager is needed to handle set-property bus
calls. However, objects created by NMManager
(e.g. NMDnsSystemdResolved) need a bus connection and so their
initialization currently fail.
To fix this, split nm_dbus_manager_start() in two parts: first only
create the connection and acquire the bus. After this step the
NMManager can be set up. In the second step, set NMManager as the
set-property handler and start exporting objects on the bus.
Fixes: 297d4985ab
Otherwise, the order is undefined and unstable. If you call
GetManagedObjects() on D-Bus multiple times, it's a very nice
property if the diff is small and not full not noise.
It can easily happen that connectivity checks take a long time to
complete (up to 20 seconds, when they time out).
So, before, during the first 20 seconds no connectivity checks would
return and bump the periodic interval. That meant, for the first 20
seconds we would each second schedule a periodic check.
Then, the checks start timing out, each one second apart as we scheduled
them. Previously, during each completion of the checks, we would bump
the interval every second.
Fix that two ways:
1) when the timer expires, also check whether there are still uncomplete
periodic checks. If there are, already bump the interval at that point.
2) at the same time, when this happens mark the handle so that when
they later complete, that they no longer cause another increase of the
interval (no-bump).
Now the bumping is done either by the timeout, or by the completion of
the request. Whatever happens first.
In concheck_periodic_timeout_cb(), we are not sure that we were
scheduled with the current interval. Instead, the timer might
just cover a part of the interval, for example while resetting
the timer interval.
We must always reschedule the timer.
A larger issue is that concheck_periodic_schedule_do() requires an
interval in nanoseconds scale. We passed the wrong timeout there.
A smaller issue is, when we reset the max_interval to something
shorter, *and* the previously schedule timeout is pending for a shorter
time than the new new max-interval, we only need to re-adjust the
timeout, but keep cur_basetime unchanged.
There can be other reasons why the check was cancelled, not only because
the current item was obsoleted. For example, the caller who scheduled a
check externally, might have cancelled it or NMDevice might be
disposed().
Load the thunderbolt-net module if we see a host-to-host connection
and configure the resulting ethernet connection automatically to be
a link-local only one. The latter is done by setting a new udev
property "NM_AUTO_DEFAULT_LINK_LOCAL_ONLY" which is picked up when
we configure the connection for the device.
https://github.com/NetworkManager/NetworkManager/pull/97
If we can't generate a connection and maybe_later is TRUE, it means
that the device can generate/assume connections but it failed for the
moment due to missing master/slaves/addresses. In this case, just
assume the connection from state file.
https://bugzilla.redhat.com/show_bug.cgi?id=1551958
When the D-Bus name is already taken, NM crashes in the following
way. That's because disposed object are not unexported when quitting
and so they linger in the bus-manager's list of exported objects,
causing an invalid access when a neighboring item is accessed. Instead
of just clearing the path, fully unexport the object.
The behavior of not forcefully exporting objects on quit was added in
f9ee20a7b2 ("core: explicitly unexport objects when we're done with
them"), but such behavior doesn't seem to be needed by the stated
goal.
<error> [1524062008.1886] bus-manager: fatal failure to acquire D-Bus service "org.freedesktop.NetworkManager" (3). Service already taken
<trace> [1524062008.2327] config: state: success writing state file "/var/lib/NetworkManager/NetworkManager.state"
<trace> [1524062008.2338] dns-mgr: stopping...
<info> [1524062008.2344] exiting (error)
<debug> [1524062008.2628] disposing NMManager singleton (0xce587e0)
<trace> [1524062008.2640] dns-mgr: disposing
<debug> [1524062008.2651] disposing NMDnsManager singleton (0xceb8b50)
<debug> [1524062008.2666] disposing NMFirewallManager singleton (0xceb62b0)
<debug> [1524062008.2709] disposing NMHostnameManager singleton (0xce7b370)
<trace> [1524062008.2722] dbus-object[0xce70f40]: unexport: "/org/freedesktop/NetworkManager/AgentManager"
==16381== Invalid write of size 8
==16381== at 0x42F511: c_list_unlink_stale (c-list.h:158)
==16381== by 0x42F511: c_list_unlink (c-list.h:171)
==16381== by 0x42F511: _nm_dbus_manager_obj_unexport (nm-dbus-manager.c:1135)
==16381== by 0x4C5E35: nm_dbus_object_unexport (nm-dbus-object.c:165)
==16381== by 0x5C01E9: dispose (nm-agent-manager.c:1634)
==16381== by 0x6636F37: g_object_unref (gobject.c:3303)
==16381== by 0x4BDC89: _nm_singleton_instance_destroy (nm-core-utils.c:138)
==16381== by 0x400FA85: _dl_fini (in /usr/lib64/ld-2.27.so)
==16381== by 0x7F806AB: __run_exit_handlers (in /usr/lib64/libc-2.27.so)
==16381== by 0x7F807DB: exit (in /usr/lib64/libc-2.27.so)
==16381== by 0x41DA34: main (main.c:463)
==16381== Address 0xce706a0 is 48 bytes inside a block of size 176 free'd
==16381== at 0x4C2EDAC: free (vg_replace_malloc.c:530)
==16381== by 0x6ACA3E1: g_free (gmem.c:194)
==16381== by 0x6AE2572: g_slice_free1 (gslice.c:1136)
==16381== by 0x66550AE: g_type_free_instance (gtype.c:1943)
==16381== by 0x4505F8: dispose (nm-manager.c:6867)
==16381== by 0x6636F37: g_object_unref (gobject.c:3303)
==16381== by 0x4BDC89: _nm_singleton_instance_destroy (nm-core-utils.c:138)
==16381== by 0x400FA85: _dl_fini (in /usr/lib64/ld-2.27.so)
==16381== by 0x7F806AB: __run_exit_handlers (in /usr/lib64/libc-2.27.so)
==16381== by 0x7F807DB: exit (in /usr/lib64/libc-2.27.so)
==16381== by 0x41DA34: main (main.c:463)
==16381== Block was alloc'd at
==16381== at 0x4C2DBAB: malloc (vg_replace_malloc.c:299)
==16381== by 0x6ACA2D5: g_malloc (gmem.c:99)
==16381== by 0x6AE1E36: g_slice_alloc (gslice.c:1025)
==16381== by 0x6AE247C: g_slice_alloc0 (gslice.c:1051)
==16381== by 0x6654E09: g_type_create_instance (gtype.c:1848)
==16381== by 0x66376C7: g_object_new_internal (gobject.c:1799)
==16381== by 0x6638E14: g_object_new_with_properties (gobject.c:1967)
==16381== by 0x66399D0: g_object_new (gobject.c:1639)
==16381== by 0x5D6F18: nm_settings_new (nm-settings.c:1897)
==16381== by 0x4514B4: constructed (nm-manager.c:6489)
==16381== by 0x66378FA: g_object_new_internal (gobject.c:1839)
==16381== by 0x6638E14: g_object_new_with_properties (gobject.c:1967)
https://github.com/NetworkManager/NetworkManager/pull/96