During _new_active_connection() we just create the NMActiveConnection
instance to proceed with authorization. The caller might not even
authorize, so we must not touch the device yet.
Do that only later.
Often, functions perform a series of steps and bail out when one of
them fails. It's simpler if the code is structured that way, so you
can read it from top to bottom and, whenever something goes wrong,
either return directly or jump to a cleanup label at the bottom.
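A minimal sketch of that structure (the context type and the step_*()
helpers are made up for illustration):

  /* Read top to bottom; bail out as soon as a step fails, either with a
   * direct return or via a single cleanup label at the bottom. */
  static gboolean
  do_steps (Context *ctx, GError **error)
  {
      char *buf = NULL;
      gboolean success = FALSE;

      if (!step_one (ctx, error))
          return FALSE;              /* nothing to clean up yet */

      buf = g_strdup (ctx->input);
      if (!step_two (ctx, buf, error))
          goto out;                  /* from here on, cleanup is needed */

      if (!step_three (ctx, buf, error))
          goto out;

      success = TRUE;
  out:
      g_free (buf);
      return success;
  }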
At the D-Bus layer, the absence of a specific-object is represented by "/".
We should normalize such values to NULL early on, and not expect or
handle them later (for example during _new_active_connection()).
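A minimal sketch of such a normalization at the D-Bus entry point (the
helper name is made up):

  /* Map the D-Bus "no specific object" sentinel "/" to NULL once, so that
   * later code only has to deal with NULL. */
  static const char *
  specific_object_normalize (const char *specific_object_path)
  {
      if (   specific_object_path
          && specific_object_path[0] == '/'
          && specific_object_path[1] == '\0')
          return NULL;
      return specific_object_path;
  }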
Merge _new_vpn_active_connection() into _new_active_connection(). It was the
only caller, and it is simpler to have all the code visible in one place.
That also makes it apparent that the device argument is ignored and not handled.
Ensure that no device is specified for VPN type activations.
Also, in _add_and_activate_auth_done(), always steal the connection
from active's user-data. Otherwise, the lifetime of the connection
is extended until active gets destroyed. For example, if we leaked
active, we would also leak the connection that way.
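Roughly, the difference between peeking and stealing (the user-data key
is made up; this assumes the connection was stored with
g_object_set_data_full() together with g_object_unref as destroy notify):

  NMConnection *connection;

  /* g_object_get_data() would only peek, leaving the reference owned by
   * "active" and tying the connection's lifetime to it. Stealing transfers
   * ownership to the caller instead: */
  connection = g_object_steal_data (G_OBJECT (active), "connection");
  /* ... use the connection ... */
  g_object_unref (connection);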
- pass is-vpn to _new_active_connection(). It is confusing that _new_active_connection()
would re-determine whether a connection is a VPN type, although that was already
established previously. The confusing part is: will they come to the
same result? Why? What if not?
Instead, pass it as an argument and assert that we got it right.
- the check for existing connections would look whether there is an existing
active connection of type NMVpnConnection. Instead, what matters is:
  - do we have a connection of type VPN (otherwise, don't even bother
    to search for an existing-ac)
  - is the connection already active?
Checking whether the connection is already active, and asking backwards
whether it's of type NMVpnConnection, is odd, maybe even wrong in
some cases.
- there are only two callers of validate_activation_request(). One of them
might already look up the device before calling the validate function.
Save looking it up again. But this is not only an optimization; more importantly,
it feels odd to first look up a device and then later look it up again. Are
we guaranteed to use the same path? Why? Just avoid that question.
- re-order some error checking for a missing device, so that it is clearer.
- use a cleanup attribute to handle the return value and drop the "goto error".
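A sketch of that pattern, shown here with GLib's g_autofree (NetworkManager
has its own cleanup macros in the gs_*/nm_auto* family; the function is made
up for illustration):

  /* With a cleanup attribute, every early return releases the resources
   * automatically, so no "goto error" label is needed. */
  static gboolean
  do_work (const char *input, GError **error)
  {
      g_autofree char *copy = g_strdup (input);

      if (!copy || !copy[0]) {
          g_set_error_literal (error, G_IO_ERROR, G_IO_ERROR_INVALID_ARGUMENT,
                               "empty input");
          return FALSE;   /* "copy" is freed automatically */
      }

      /* ... further steps may also "return FALSE" directly ... */
      return TRUE;
  }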
"NMSettingsConnectionFlags" was an internal enum. Soon, we will add such
a type in libnm. Avoid the naming conflict by renaming. The "Int" stands
for "internal".
NMAuthChain is not really ref-counted. True, we have an internal ref-counter
to ensure that the instance stays alive while the callback is invoked. However,
the user cannot take additional references as there is no nm_auth_chain_ref().
When the user wants to get rid of the auth-chain, with the current API it
is important that the callback won't be called after that point. From the
name nm_auth_chain_unref(), it sounds as if there could be multiple references
to the auth-chain, and as if merely unreferencing the object might not guarantee that
the callback is canceled. However, that is luckily not the case, because
there is no real ref-counting involved here.
Just rename the destroy function to make this clearer.
Essentially, nm_connection_get_path() mirrors nm_dbus_object_get_path().
However, when cloning a simple-connection, the path also gets cloned.
I think this field doesn't belong to NMConnection in the first place,
because NMConnection is not a D-Bus object. NMSettingsConnection (in
core) and NMRemoteConnection (in libnm) are.
Don't use the misleading alias, but use nm_dbus_object_get_path()
directly.
NMManager very much cares about changes to the connectivity state
of the device and was therefore listening to notify::connectivity
signals. However, property changed signals can be suppressed by
g_object_freeze_notify(). That is something we even encourage for
NMDBusObject instances, because the D-Bus glue makes use of the
property changed notifications, and encourages combining multiple
changes by freezing the signal.
Using the property changed notifications of NMDBusObject instances is
ugly. Don't do that and instead add a special signal.
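A rough sketch of such a dedicated signal (the signal name and prototype
are illustrative):

  /* class_init: an explicit signal, emitted whenever the connectivity
   * state changes; unlike notify::connectivity it is not affected by
   * g_object_freeze_notify(). */
  signals[CONNECTIVITY_CHANGED] =
      g_signal_new ("connectivity-changed",
                    G_OBJECT_CLASS_TYPE (object_class),
                    G_SIGNAL_RUN_FIRST,
                    0, NULL, NULL, NULL,
                    G_TYPE_NONE, 0);

  /* wherever the device updates its connectivity state: */
  g_signal_emit (self, signals[CONNECTIVITY_CHANGED], 0);

NMManager then connects to this signal instead of notify::connectivity.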
It might happen that connectivity is lost only for a moment and
returns soon after. Based on that assumption, when we lose connectivity
we want to have a probe interval where we check for returning
connectivity more frequently.
For that, we handle tracking of the timeouts per-device.
The interval starts at 1 second, and the interval time doubles until
the full interval is reached. Actually, due to the implementation, it's unlikely
that we already perform the second check 1 second later. That is because commonly
the first check returns before the one-second timeout is reached and bumps the
interval to 2 seconds right away.
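A simplified sketch of the doubling logic (the struct and helper names are
made up; the real bookkeeping is more involved):

  typedef struct {
      guint check_source_id;
      guint cur_interval_sec;   /* reset to 1 when connectivity is lost */
      guint full_interval_sec;  /* the configured periodic interval */
  } DeviceConcheck;             /* illustrative only */

  static void schedule_next_check (DeviceConcheck *dev);

  static gboolean
  check_cb (gpointer user_data)
  {
      DeviceConcheck *dev = user_data;

      start_connectivity_check (dev);   /* hypothetical: kick off the HTTP probe */

      /* back off: 1, 2, 4, ... seconds, capped at the full interval */
      dev->cur_interval_sec = MIN (dev->cur_interval_sec * 2, dev->full_interval_sec);
      schedule_next_check (dev);
      return G_SOURCE_REMOVE;
  }

  static void
  schedule_next_check (DeviceConcheck *dev)
  {
      dev->check_source_id = g_timeout_add_seconds (dev->cur_interval_sec, check_cb, dev);
  }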
Also, we go to extra lengths so that manual connectivity checks
delay the periodic checks. By being smarter about that, we can reduce
the number of connectivity checks while still keeping the promise to
check at least within the requested interval.
The complexity of bookkeeping the timeouts is remarkable. But I think
it is worth the effort and we should try hard to
- have a connectivity state that is as accurate as possible. Clearly,
connectivity checking means that we are probing, so being more intelligent
about timeout and backoff timers can result in a better connectivity
state. The connectivity state is important because we use it for
the default-route penalty and the GUI indicates bad connectivity.
- be intelligent about avoiding redundant connectivity checks. While
we want to check often to get an accurate connectivity state, we
also want to minimize the number of HTTP requests, in case the
connectivity is established and supposedly stable.
Also, perform connectivity checks in every state of the device.
Even if a device is disconnected, it still might have connectivity,
for example if the user externally adds an IP address on an unmanaged
device.
https://bugzilla.gnome.org/show_bug.cgi?id=792240
An asynchronous request should either be cancellable or not keep
the target object alive. Preferably both.
Otherwise, it is impossible to do a controlled shutdown when terminating
NetworkManager. Currently, when NetworkManager is about to terminate,
it just quits the mainloop and essentially leaks everything. That is a
bug. If we ever want to fix that, every asynchronous request must be
cancellable in a controlled way (or it must not prevent objects from
getting disposed, where disposing the object automatically cancels the
callback).
Rework the asynchronous request for connectivity check to
- return a handle that can be used to cancel the operation (a sketch of
such a handle-based API follows at the end of this message).
Cancelling is optional. The caller may choose to ignore the handle
because the asynchronous operation does not keep the target object
alive. That means it is still possible to shut down by everybody
giving up their reference to the target object, in which case the
callback will be invoked during dispose() of the target object.
- also, the callback will always be invoked exactly once, and never
synchronously from within the asynchronous start call. But during
cancel(), the callback is invoked synchronously from within cancel().
Note that it's only allowed to cancel an action at most once, and
never after the callback is invoked (also not from within the callback
itself).
- also, NMConnectivity already supports a fake handler, in case
connectivity check is disabled via configuration. Hence, reuse
the same code paths also when compiling without --enable-concheck.
That means, instead of having #if WITH_CONCHECK at various callers,
move them into NMConnectivity. The downside is that if you build
without concheck, there is a small overhead compared to before. The
upside is that we reuse the same code paths whether compiling with or
without concheck.
- also, the patch synchronizes the connectivity states. For example,
previously `nmcli networking connectivity check` would schedule
requests in parallel, and return the accumulated result of the individual
requests.
However, the global connectivity state of the manager might not have
been the same as the answer to the explicit connectivity check,
because while the answer for the manual check is waiting for all
pending checks to complete, the global connectivity state could
already change. That is just wrong. There are not multiple global
connectivity states at the same time, there is just one. A manual
connectivity check should have the meaning of ensuring that the global
state is up to date, but it still should return the global
connectivity state -- not the answers for several connectivity checks
issued in parallel.
This is related to commit b799de281b
(libnm: update property in the manager after connectivity check),
which tries to address a similar problem client side.
Similarly, each device has a connectivity state. While there might
be several connectivity checks per device pending, whenever a check
completes, it can update the per-device state (and return that device
state as result), but the immediate answer of the individual check
might not matter. This is especially the case, when a later request
returns earlier and obsoletes all earlier requests. In that case,
earlier requests return with the result of the current device's
connectivity state.
This patch cleans up the internal API and gives a better defined behavior
to the user (that is, a simpler API which simplifies the implementation for
the caller). However, getting this API right and properly handling cancel
and destruction of the target object makes the implementation more
complicated and complex. But this is not just for the sake of a nicer API.
This fixes actual issues explained above.
Also, get rid of GAsyncResult to track information about the pending request.
Instead, allocate our own handle structure, which ends up being nicer
because it's strongly typed and has exactly the properties that are
useful to track the request. Also, it gets rid of the awkward
_finish() API by passing the relevant arguments to the callback
directly.
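For illustration, the shape of such a handle-based API looks roughly like
this (all names are made up; this is not the exact NMConnectivity API):

  typedef struct _ConCheckHandle ConCheckHandle;   /* opaque, one per request */
  typedef struct _Target         Target;           /* the target object */

  typedef void (*ConCheckCallback) (Target         *target,
                                    ConCheckHandle *handle,
                                    int             connectivity_state,
                                    GError         *error,
                                    gpointer        user_data);

  /* Starts the check. Never invokes the callback synchronously and does not
   * take a reference on "target"; if "target" gets disposed first, the
   * callback is invoked (with an error) from within dispose(). */
  ConCheckHandle *concheck_start  (Target *target, ConCheckCallback cb, gpointer user_data);

  /* May be called at most once, and only before the callback was invoked.
   * Invokes the callback synchronously with a "cancelled" error. */
  void            concheck_cancel (ConCheckHandle *handle);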
Specify a reason when creating active connections. The reason will be
used in the next commit to tell whether slaves must be reconnected or
not if a master has autoconnect-slaves=yes.
This allows adjusting the timeout of an existing checkpoint.
The main use case of checkpoints is to have a fail-safe when
configuring the network remotely. By allowing the timeout to be reset,
the user can perform a series of actions and keep bumping the
timeout. That way, the entire series is still guarded by the same
checkpoint, but the user can start with a short timeout and
re-adjust the timeout as they go along.
The libnm API only implements the async form (at least for now).
Sync methods are fundamentally wrong with D-Bus, and a sync variant is
probably not needed. Also, follow the glib convention, where the async
form doesn't have an _async name suffix. Also, accept a D-Bus path
as argument, not an NMCheckpoint instance. The libnm API should
not be more restricted than the underlying D-Bus API. It would
be cumbersome to require the user to look up the NMCheckpoint
instance first, especially since libnm doesn't provide an efficient
or convenient lookup-by-path method. On the other hand, retrieving
the path from an NMCheckpoint instance is always possible.
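For illustration, a caller could look roughly like this (assuming the libnm
entry points follow the usual naming, i.e. an async
nm_client_checkpoint_adjust_rollback_timeout() with a matching _finish();
check the actual headers for the exact names and signatures):

  static void
  adjust_done (GObject *source, GAsyncResult *result, gpointer user_data)
  {
      GError *error = NULL;

      /* assumed boolean _finish() variant */
      if (!nm_client_checkpoint_adjust_rollback_timeout_finish (NM_CLIENT (source),
                                                                result, &error)) {
          g_printerr ("adjusting checkpoint timeout failed: %s\n", error->message);
          g_clear_error (&error);
      }
  }

  /* "checkpoint_path" is the D-Bus path of the checkpoint; no NMCheckpoint
   * instance needs to be looked up first. */
  nm_client_checkpoint_adjust_rollback_timeout (client, checkpoint_path,
                                                60 /* seconds */,
                                                NULL /* cancellable */,
                                                adjust_done, NULL);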
NMCheckpointManager was added as a non-ref-countable type, because
ref-counting is not needed there.
For such objects (that are not ref-countable), I still often like
their destroy function to be called "unref", both for consistency
and in case we later add ref-counting to the object.
However, NMCheckpointManager keeps a pointer to NMManager. So, when
NMManager gets destroyed, it *MUST* destroy the NMCheckpointManager.
It cannot be accepted that the checkpoint manager outlives NMManager,
but the "unref" name suggests that somebody else might still hold
a reference to this object, keeping it alive. That is not the case.
Rename so that this is clear.
I would name it nm_checkpoint_manager_destroy(), but "destroy" already
has a meaning for NMCheckpoint instances, so use "free".
Instead of generating a connection and checking whether it is
compatible with the connection indicated in the state file, just pick
the indicated connection after a basic check of compatibility with the
device.
https://bugzilla.redhat.com/show_bug.cgi?id=1551958
We already track an index of exported objects in NMDBusManager.
Actually, that index was previously unused. We could either drop
it or use it. Let's use it.
For comparing MAC addresses, they have to be normalized
to binary form anyway. Convert the address once outside the loop and pass
the binary form to nm_utils_hwaddr_matches(). Otherwise, we need to
re-convert it on every iteration.
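Roughly, assuming the usual libnm-core helpers nm_utils_hwaddr_aton() and
nm_utils_hwaddr_matches() (the surrounding loop is simplified):

  guint8   hwaddr_bin[ETH_ALEN];
  gboolean hwaddr_valid;

  /* convert the ASCII MAC address to binary once, outside the loop */
  hwaddr_valid = nm_utils_hwaddr_aton (hwaddr, hwaddr_bin, sizeof (hwaddr_bin)) != NULL;

  for (iter = devices; iter; iter = iter->next) {
      NMDevice *device = iter->data;

      if (   hwaddr_valid
          && nm_utils_hwaddr_matches (hwaddr_bin, sizeof (hwaddr_bin),
                                      nm_device_get_hw_address (device), -1))
          return device;
  }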
Instead of using a GSList for tracking the devices, use a CList.
I think a CList is in most cases a more suitable data structure
than GSList:
- you can find out in O(1) whether the object is linked. That
is nice, for example to assert in NMDevice's destructor that
the object was unlinked, and we will use that later in
nm_manager_get_device_by_path().
- you can unlink the element in O(1) and you can unlink the
element without having access to the link's head
- Contrary to GSList, this does not require an extra slice
allocation for the link node. It quite possibly consumes
slightly less memory because the CList structure is embedded
in a struct that we already allocate. Even if slice allocation
were perfect and consumed only 2*sizeof(gpointer) for the link
node, it would at best be as good as CList. Quite possibly,
there is an overhead though.
- CList possibly has better memory locality, because the link
structure and the data are close to each other.
Something which could be seen as a disadvantage is that with CList
one device can only be tracked in one NMManager instance at a time.
But that is fine. There exists only one NMManager instance for now,
and even if we ever introduced multiple managers, we probably
would not associate one NMDevice instance with multiple managers.
The advantages are arguably not huge, but CList is IMHO clearly the
more suited data structure. No need to stick to a suboptimal data
structure for the job. Refactor it.
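For illustration, the embedded-link pattern looks roughly like this (the
struct and field names are made up; CList is the c-list helper already
bundled in the tree):

  typedef struct {
      CList devices_lst_head;   /* in the manager: head of the device list */
  } Manager;

  typedef struct {
      CList devices_lst;        /* embedded in the device, no extra allocation */
  } Device;

  /* track / untrack; the node must be initialized (c_list_init()) first */
  c_list_link_tail (&manager->devices_lst_head, &device->devices_lst);
  c_list_unlink (&device->devices_lst);     /* O(1), no access to the head needed */

  /* O(1) check whether the device is currently linked, e.g. to assert in the
   * destructor that it was already untracked: */
  nm_assert (c_list_is_empty (&device->devices_lst));

  /* iteration */
  Device *device;
  c_list_for_each_entry (device, &manager->devices_lst_head, devices_lst) {
      /* ... */
  }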
Since commit ed640f857a ("manager: ignore unmanaged devices when
looking for parent by UUID"), unmanaged devices are ignored when
looking for potential parent connection matches. Therefore, a software
device can fail autoactivation because the parent is not managed yet
and NM never tries to reactivate it. Ensure that we retry other
devices when a parent device becomes managed.
Fixes: ed640f857a ("manager: ignore unmanaged devices when looking for parent by UUID")
https://bugzilla.redhat.com/show_bug.cgi?id=1553595
NMSettings exposes a cached list of all connections. We don't need
to clone it. Note that this is not safe against concurrent modification,
meaning, add/remove of connections in NMSettings will invalidate the
list.
However, it wasn't safe against that previously either, because
although we cloned the container (GSList), we didn't take an additional
reference to the elements.
This is purely a performance optimization; we don't need to clone the
list. Also, since the original list is of type "NMConnection *const*",
use that type consistently, instead of having dependent API require GSList.
IMO, GSList is anyway not a very nice API for many use cases because
it requires an additional slice allocation for each element. It's
slower, and often less convenient to use.
We create the all_connections list in impl_manager_add_and_activate_connection().
With this list we either call nm_utils_complete_generic() directly,
or pass it to nm_device_complete_connection(). The latter also ends
up only calling nm_utils_complete_generic() with this argument.
nm_utils_complete_generic() doesn't care about the order in which
the connections are returned. That also becomes apparent because
we first sorted the list and then reversed the order with
g_slist_prepend().
Do not sort the list; then we also don't need to clone it.
I dislike the static hash table to cache the integer counter for
numbered paths. Let's instead cache the counter at the class instance
itself -- since the class contains the information of how the export
path should look.
However, we cannot use a plain integer field inside the class structure,
because the class is copied between derived classes. For example,
NMDeviceEthernet and NMDeviceBridge both get a copy of the NMDeviceClass
instance. Hence, the class doesn't contain the counter directly, but
a pointer to one counter that can be shared between sibling classes.
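For illustration (not the exact code): GObject copies the parent's class
struct into each derived class, so a plain integer field would be duplicated
per subclass, while a pointer keeps all siblings counting from the same
counter.

  typedef struct {
      GObjectClass parent;

      const char *export_path;          /* e.g. ".../Devices" */
      guint64    *export_path_counter;  /* the pointer gets copied into
                                         * NMDeviceEthernetClass, NMDeviceBridgeClass, ...
                                         * so all siblings bump one shared counter */
  } ExampleDBusObjectClass;             /* hypothetical layout */

  static char *
  create_numbered_path (ExampleDBusObjectClass *klass)
  {
      return g_strdup_printf ("%s/%" G_GUINT64_FORMAT,
                              klass->export_path,
                              ++(*klass->export_path_counter));
  }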
Previously, we used the generated GDBusInterfaceSkeleton types and glued
them via the NMExportedObject base class to our NM types. We also used
GDBusObjectManagerServer.
Don't do that anymore. The resulting code was more complicated despite (or
because?) using generated classes. It was hard to understand, complex, had
ordering-issues, and had a runtime and memory overhead.
This patch refactors this entirely and uses the lower layer API GDBusConnection
directly. It replaces the generated code, GDBusInterfaceSkeleton, and
GDBusObjectManagerServer. All this is now done by NMDBusObject and NMDBusManager
and static descriptor instances of type GDBusInterfaceInfo.
This adds a net plus of more than 1300 lines of hand-written code. I claim
that this implementation is easier to understand. Note that previously we
also required extensive and complex glue code to bind our objects to the
generated skeleton objects. Instead, now glue our objects directly to
GDBusConnection. The result is more immediate and gets rid of layers of
code in between.
Now that the D-Bus glue is more under our control, we can address issues and
bottlenecks better, instead of adding code to bend the generated skeletons
to our needs.
Note that the current implementation now only supports one D-Bus connection.
That was effectively the case already, although there were places (and still are)
where the code pretends it could also support connections from a private socket.
We dropped private socket support mainly because it was unused, untested and
buggy, but also because GDBusObjectManagerServer could not export the same
objects on multiple connections. Now, it would be rather straightforward to
fix that and re-introduce ObjectManager on each private connection. But this
commit doesn't do that yet, and the new code intentionally supports only one
D-Bus connection.
Also, the D-Bus startup was simplified. There is no retry: either nm_dbus_manager_start()
succeeds, or it detects the initrd case. In the initrd case, the bus manager never tries to
connect to D-Bus. Since the initrd scenario is not yet used/tested, this is good enough
for the moment. It could easily be extended later, for example with polling whether the
system bus appears (like was done previously). Also, a restart of the D-Bus daemon isn't
supported either -- just like before.
Note how NMDBusManager now implements the ObjectManager D-Bus interface
directly.
Also, this fixes race issues in the server, by no longer delaying
PropertiesChanged signals. NMExportedObject would collect changed
properties and send the signal out in idle_emit_properties_changed()
on idle. This messes up the ordering of change events w.r.t. other
signals and events on the bus. Note that not only NMExportedObject
messed up the ordering. Also the generated code would hook into
notify() and process change events in an idle handler, exhibiting the
same ordering issue too.
No longer do that. PropertiesChanged signals will be sent right away
by hooking into dispatch_properties_changed(). This means, changing
a property multiple times in quick succession will no longer be combined and is
guaranteed to emit signals for each individual state. Quite possibly
we now emit more PropertiesChanged signals than before.
However, we are now able to group a set of changes by using standard
g_object_freeze_notify()/g_object_thaw_notify(). We probably should
make more use of that.
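For example, grouping several property updates into one combined
notification now works with the standard GObject mechanism (property names
are illustrative):

  /* All pending notify::* emissions -- and thus the resulting D-Bus
   * PropertiesChanged signal -- are queued while frozen and dispatched
   * together on thaw. */
  g_object_freeze_notify (G_OBJECT (obj));
  g_object_set (obj,
                "some-property",  value1,
                "other-property", value2,
                NULL);
  g_object_thaw_notify (G_OBJECT (obj));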
Also, now that our signals are all handled in the right order, we
might find places where we still emit them in the wrong order. But that
is then due to the order in which our GObjects emit signals, not due
to an ill behavior of the D-Bus glue. Possibly we need to identify
such ordering issues and fix them.
Numbers (for contrib/rpm --without debug on x86_64):
- the patch changes the code size of NetworkManager by
- 2809360 bytes
+ 2537528 bytes (-9.7%)
- Runtime measurements are harder because there is a large variance
during testing. In other words, the numbers are not reproducible.
Currently, the implementation performs no caching of GVariants at all,
but it would be rather simple to add it, if that turns out to be
useful.
Anyway, without making a strong claim, it seems that the new form tends to
perform slightly better. That would be no surprise.
$ time (for i in {1..1000}; do nmcli >/dev/null || break; echo -n .; done)
- real 1m39.355s
+ real 1m37.432s
$ time (for i in {1..2000}; do busctl call org.freedesktop.NetworkManager /org/freedesktop org.freedesktop.DBus.ObjectManager GetManagedObjects > /dev/null || break; echo -n .; done)
- real 0m26.843s
+ real 0m25.281s
- Regarding RSS size, just looking at the processes under similar
conditions doesn't show a large difference. On my system they
consume about 19MB RSS. It seems that the new version has a
slightly smaller RSS size.
- 19356 RSS
+ 18660 RSS
The next commit will completely rework NMBusManager and replace
NMExportedObject by a new type NMDBusObject.
Originally, NMDBusObject was added alongside NMExportedObject to ease
the rework and have compilable, intermediate stages of refactoring. Now,
I think the new name is better, because NMDBusObject is very strongly related
to the bus manager and the old name NMExportedObject didn't make that
clear.
I also slightly prefer the name NMDBusObject over NMBusObject, hence
for consistency, also rename NMBusManager to NMDBusManager.
This commit only renames the file for a nicer diff in the next commit.
It does not actually update the type name in sources. That will be done
later.
On exit, during NMManager's dispose(), we must first remove active connections
via active_connection_remove(), before clearing the volatile-connection-list.
Otherwise, while deleting the active connection, we schedule an idle action
to delete the volatile connection on idle, but at that point dispose()
has already cleaned up the idle list.
==3150== 72 (24 direct, 48 indirect) bytes in 1 blocks are definitely lost in loss record 3,411 of 6,079
==3150== at 0x4C2FB6B: malloc (vg_replace_malloc.c:299)
==3150== by 0x6AB7358: g_malloc (gmem.c:94)
==3150== by 0x6ACEF35: g_slice_alloc (gslice.c:1025)
==3150== by 0x1686B1: connection_flags_changed (nm-manager.c:1823)
==3150== by 0x661F73C: g_closure_invoke (gclosure.c:804)
==3150== by 0x66324DD: signal_emit_unlocked_R (gsignal.c:3635)
==3150== by 0x663AD04: g_signal_emit_valist (gsignal.c:3391)
==3150== by 0x663B66E: g_signal_emit (gsignal.c:3447)
==3150== by 0x2EC753: connection_flags_changed (nm-settings.c:824)
==3150== by 0x661F73C: g_closure_invoke (gclosure.c:804)
==3150== by 0x66324DD: signal_emit_unlocked_R (gsignal.c:3635)
==3150== by 0x663AD04: g_signal_emit_valist (gsignal.c:3391)
==3150== by 0x663B66E: g_signal_emit (gsignal.c:3447)
==3150== by 0x6623C03: g_object_dispatch_properties_changed (gobject.c:1080)
==3150== by 0x1DFD47: dispatch_properties_changed (nm-dbus-object.c:237)
==3150== by 0x6626178: g_object_notify_by_spec_internal (gobject.c:1173)
==3150== by 0x6626178: g_object_notify_by_pspec (gobject.c:1283)
==3150== by 0x2E7377: _notify (nm-settings-connection.c:53)
==3150== by 0x2E7377: nm_settings_connection_set_flags_full (nm-settings-connection.c:2346)
==3150== by 0x2E744D: nm_settings_connection_set_flags (nm-settings-connection.c:2316)
==3150== by 0x2E7466: set_visible (nm-settings-connection.c:316)
==3150== by 0x2E7774: nm_settings_connection_delete (nm-settings-connection.c:795)
==3150== by 0x1665A8: _delete_volatile_connection_do (nm-manager.c:598)
==3150== by 0x1668F4: active_connection_remove (nm-manager.c:625)
==3150== by 0x16ABA7: dispose (nm-manager.c:6726)
==3150== by 0x6624607: g_object_unref (gobject.c:3293)
==3150== by 0x1D779B: _nm_singleton_instance_destroy (nm-core-utils.c:138)
==3150== by 0x4011332: _dl_fini (in /usr/lib64/ld-2.26.so)
==3150== by 0x815FB57: __run_exit_handlers (in /usr/lib64/libc-2.26.so)
==3150== by 0x815FBA9: exit (in /usr/lib64/libc-2.26.so)
==3150== by 0x1383C7: main (main.c:467)
There is a small change in behavior:
Previously, the DEACTIVATING/DEACTIVATED states were set if and only if
the previous state was less than or equal to ACTIVATED. For example,
if the state was already DEACTIVATING, it would have done nothing.
Now, nm_active_connection_set_state_fail() transitions the states
depending on the previous state. E.g. it only sets the DEACTIVATING
state if the previous state was ACTIVATING/ACTIVATED. On the other hand,
it always progresses the state to DEACTIVATED.
The new behavior makes more sense to me, although I doubt that there is
a visible difference.
unmanaged_to_disconnected() is supposed to mark the device as managed.
However, it may easily be unable to do so, for example if the device
is unmanaged by NM_UNMANAGED_USER_SETTINGS.
Shortly before actually enqueuing the activation request, check and
error out. Otherwise, we might hit an assertion later in
_device_activate().
Note how recheck_assume_connection() called:
nm_exported_object_export (NM_EXPORTED_OBJECT (active));
active_connection_add (self, active);
nm_device_queue_activation (device, NM_ACT_REQUEST (active));
That differs from the order during _internal_activate_generic(), where
we would end up with:
nm_exported_object_export (NM_EXPORTED_OBJECT (active));
nm_device_queue_activation (device, NM_ACT_REQUEST (active));
active_connection_add (self, active);
It makes more sense to me to *first* add the connection, and only then
start the activation with nm_device_queue_activation().
Also, let active_connection_add() always export the new active
connection object, if it is not already exported. All callers of
active_connection_add() ensured that the new object was already
exported.