If the active connection is deactivated because the device is gone,
don't block autoconnection. Otherwise, whenever the device comes
back (e.g. maybe it was reset in the middle of a connection attempt),
the autoconnection logic won't be triggered, as the settings are still
blocked.
I'm able to reproduce this by performing a WWAN modem reset in the
middle of a connection attempt.
https://github.com/NetworkManager/NetworkManager/pull/121
Coccinelle:
@@
expression a, b;
@@
-a ? a : b
+a ?: b
Applied with:
spatch --sp-file ternary.cocci --in-place --smpl-spacing --dir .
With some manual adjustments in spots that Coccinelle didn't catch,
for reasons unknown.
Thanks to the marvelous effort of the GNU compiler developers we can now
spare a couple of bits that could be used for more important things,
like this commit message. Standards committees have yet to catch up.
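For illustration, the GNU "?:" extension omits the middle operand and,
unlike the spelled-out ternary, evaluates the first operand only once
(the names below are made up):

/* plain C: "x" is written twice and, if it has side effects,
 * also evaluated twice */
foo = x ? x : default_value;

/* GNU extension: equivalent, but "x" appears and is evaluated
 * only once */
foo = x ?: default_value;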
"NMSettingsConnectionFlags" was an internal enum. Soon, we will add such
a type in libnm. Avoid the naming conflict by renaming. The "Int" stands
for "internal".
Specify a reason when creating active connections. The reason will be
used in the next commit to tell whether slaves must be reconnected or
not if a master has autoconnect-slaves=yes.
I find it slightly nicer and more explicit. Also, the list elements
are strictly speaking private; we had better not use them directly
outside of NMManager/NMDevice. The macro hides this.
Instead of using a GSList for tracking the devices, use a CList.
I think that in most cases a CList is a more suitable data structure
than a GSList:
- you can find out in O(1) whether the object is linked. That
is nice, for example to assert in NMDevice's destructor that
the object was unlinked, and we will use that later in
nm_manager_get_device_by_path().
- you can unlink the element in O(1), and you can do so
without having access to the list's head
- Contrary to GSList, this does not require an extra slice
allocation for the link node. It quite possibly consumes
slightly less memory because the CList structure is embedded
in a struct that we already allocate. Even if slice allocation
were perfect and consumed only 2*sizeof(gpointer) for the link
node, it would at best be as good as CList. Quite possibly,
there is an overhead though.
- CList possibly has better memory locality, because the link
structure and the data are close to each other.
Something which could be seen as a disadvantage is that with CList
one device can only be tracked in one NMManager instance at a time.
But that is fine. There exists only one NMManager instance for now,
and even if we ever introduced multiple managers, we probably
would not associate one NMDevice instance with multiple managers.
The advantages are arguably not huge, but CList is IMHO clearly the
more suitable data structure. No need to stick to a suboptimal data
structure for the job. Refactor it.
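The general idea, as a minimal sketch (the struct and function names
are made up; only the CList calls are meant to be the actual c-list API):

#include "c-list.h"

typedef struct {
    CList devices_lst_head;   /* list head, embedded in the manager */
} Manager;

typedef struct {
    CList devices_lst;        /* link node, embedded in the device itself */
} Device;

static void
manager_add_device (Manager *m, Device *d)
{
    /* O(1) append; no extra slice allocation for a node */
    c_list_link_tail (&m->devices_lst_head, &d->devices_lst);
}

static void
manager_remove_device (Device *d)
{
    /* O(1) unlink, without needing access to the list head */
    c_list_unlink (&d->devices_lst);
}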
Previously, we used the generated GDBusInterfaceSkeleton types and glued
them via the NMExportedObject base class to our NM types. We also used
GDBusObjectManagerServer.
Don't do that anymore. The resulting code was more complicated despite (or
because?) using generated classes. It was hard to understand, complex, had
ordering-issues, and had a runtime and memory overhead.
This patch refactors this entirely and uses the lower-layer API GDBusConnection
directly. It replaces the generated code, GDBusInterfaceSkeleton, and
GDBusObjectManagerServer. All this is now done by NMDBusObject and NMDBusManager
and static descriptor instances of type GDBusInterfaceInfo.
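Schematically, the lower-layer GIO approach looks like this (a generic
sketch with made-up names and an XML-parsed descriptor for brevity; it
is not the actual NMDBusObject/NMDBusManager code, which uses static
GDBusInterfaceInfo structures instead):

#include <gio/gio.h>

static const char introspection_xml[] =
    "<node>"
    "  <interface name='org.example.Foo'>"
    "    <method name='Bar'/>"
    "  </interface>"
    "</node>";

static void
method_call (GDBusConnection *connection,
             const char *sender,
             const char *object_path,
             const char *interface_name,
             const char *method_name,
             GVariant *parameters,
             GDBusMethodInvocation *invocation,
             gpointer user_data)
{
    /* dispatch "Bar" here; no out-arguments in this example */
    g_dbus_method_invocation_return_value (invocation, NULL);
}

static const GDBusInterfaceVTable vtable = {
    .method_call = method_call,
};

static void
export_object (GDBusConnection *connection)
{
    GDBusNodeInfo *node;

    node = g_dbus_node_info_new_for_xml (introspection_xml, NULL);

    /* export the object directly on the connection, without
     * GDBusInterfaceSkeleton or GDBusObjectManagerServer */
    g_dbus_connection_register_object (connection,
                                       "/org/example/Foo",
                                       node->interfaces[0],
                                       &vtable,
                                       NULL, NULL, NULL);
    g_dbus_node_info_unref (node);
}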
This adds a net plus of more than 1300 lines of hand-written code. I claim
that this implementation is easier to understand. Note that previously we
also required extensive and complex glue code to bind our objects to the
generated skeleton objects. Instead, we now glue our objects directly to
GDBusConnection. The result is more immediate and gets rid of layers of
code in between.
Now that the D-Bus glue is more under our control, we can address issues and
bottlenecks better, instead of adding code to bend the generated skeletons
to our needs.
Note that the current implementation now only supports one D-Bus connection.
That was effectively the case already, although there were places (and still are)
where the code pretends it could also support connections from a private socket.
We dropped private socket support mainly because it was unused, untested and
buggy, but also because GDBusObjectManagerServer could not export the same
objects on multiple connections. Now, it would be rather straightforward to
fix that and re-introduce ObjectManager on each private connection. But this
commit doesn't do that yet, and the new code intentionally supports only one
D-Bus connection.
Also, the D-Bus startup was simplified. There is no retry: either nm_dbus_manager_start()
succeeds, or it detects the initrd case. In the initrd case, the bus manager never tries to
connect to D-Bus. Since the initrd scenario is not yet used/tested, this is good enough
for the moment. It could easily be extended later, for example by polling whether the
system bus appears (as was done previously). Also, restart of the D-Bus daemon isn't
supported either -- just like before.
Note how NMDBusManager now implements the ObjectManager D-Bus interface
directly.
Also, this fixes race issues in the server, by no longer delaying
PropertiesChanged signals. NMExportedObject would collect changed
properties and send the signal out in idle_emit_properties_changed()
on idle. This messes up the ordering of change events w.r.t. other
signals and events on the bus. Note that not only NMExportedObject
messed up the ordering: the generated code would also hook into
notify() and process change events in an idle handler, exhibiting the
same ordering issue.
No longer do that. PropertiesChanged signals will be sent right away
by hooking into dispatch_properties_changed(). This means that changing
a property multiple times in quick succession will no longer be combined and is
guaranteed to emit signals for each individual state. Quite possibly
we now emit more PropertiesChanged signals than before.
However, we are now able to group a set of changes by using standard
g_object_freeze_notify()/g_object_thaw_notify(). We probably should
make more use of that.
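For example, something like this (a minimal sketch; the object and
property names are placeholders):

#include <glib-object.h>

static void
set_two_props (GObject *obj)
{
    /* notifications are queued while frozen, duplicates are squashed,
     * and everything pending is dispatched together on thaw -- which
     * the D-Bus glue can then emit as one grouped change */
    g_object_freeze_notify (obj);
    g_object_set (obj, "prop-a", 1, "prop-b", 2, NULL);
    g_object_thaw_notify (obj);
}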
Also, now that our signals are all handled in the right order, we
might find places where we still emit them in the wrong order. But that
is then due to the order in which our GObjects emit signals, not due
to an ill behavior of the D-Bus glue. Possibly we need to identify
such ordering issues and fix them.
Numbers (for contrib/rpm --without debug on x86_64):
- the patch changes the code size of NetworkManager by
- 2809360 bytes
+ 2537528 bytes (-9.7%)
- Runtime measurements are harder because there is a large variance
during testing. In other words, the numbers are not reproducible.
Currently, the implementation performs no caching of GVariants at all,
but it would be rather simple to add it, if that turns out to be
useful.
Anyway, without making a strong claim, it seems that the new form tends to
perform slightly better. That would be no surprise.
$ time (for i in {1..1000}; do nmcli >/dev/null || break; echo -n .; done)
- real 1m39.355s
+ real 1m37.432s
$ time (for i in {1..2000}; do busctl call org.freedesktop.NetworkManager /org/freedesktop org.freedesktop.DBus.ObjectManager GetManagedObjects > /dev/null || break; echo -n .; done)
- real 0m26.843s
+ real 0m25.281s
- Regarding RSS size, just looking at the processes under similar
conditions doesn't show a large difference. On my system they
consume about 19MB RSS. It seems that the new version has a
slightly smaller RSS size.
- 19356 RSS
+ 18660 RSS
The condition was obviously inverted, blocking autoconnect when
it should not, and not blocking it when it should.
[thaller@redhat.com: modified original patch and rewrite commit message]
Fixes: e2c8ef45ac
https://bugzilla.gnome.org/show_bug.cgi?id=794014
XXX was used to either raise attention (NOTE) or to indicate
that this is ugly code that should be fixed (FIXME). The usage
was inconsistent.
Let's avoid XXX and use either NOTE or FIXME.
Don't track the per-device configuration in NMDnsManager by
the ifname, but by the ifindex. We should consistently treat
the ifindex as the ID of a link, like the kernel does.
At the few places where we actually need the ifname, resolve
it by looking into the platform cache. That is not necessarily
the same as the ifname that is currently tracked by NMDevice,
because netdev interfaces can be renamed, and NMDevice updates
its link properties with a delay. However, the platform cache has
the most recent notion of the correct interface name for an
ifindex, so if we ever hit a race here, we now handle it more
correctly.
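For illustration only, this is the kind of ifindex-to-name lookup meant
here, shown with the plain kernel API rather than NM's platform cache
(which is what the code actually uses):

#include <net/if.h>
#include <stdio.h>

static void
print_ifname (int ifindex)
{
    char name[IF_NAMESIZE];

    /* the ifindex stays stable across renames, the name does not */
    if (if_indextoname (ifindex, name))
        printf ("ifindex %d is currently named %s\n", ifindex, name);
    else
        printf ("ifindex %d has no link\n", ifindex);
}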
This also temporarily drops support for mdns. Will be re-added next,
but differently.
"UNKNOWN" is not a good name. If you don't set the property
in the connection explicitly, it should be "DEFAULT".
Also, make "DEFAULT" -1. For one, that ensures that the enum's
underlying integer type is signed. Otherwise, it's cumbersome
to test "if (mdns >= DEFAULT)" because in case of unsigned types,
the compiler will warn about the check always being true.
Also, it allows for "NO" to be zero. These are not strong reasons,
but I tend to think this is better.
Also, don't make the property of NMSettingConnection a CONSTRUCT property.
Initialize the default manually in the init function.
Also, order the numeric values so that DEFAULT < NO < RESOLVE < YES with
YES being largest because it enables *the most*.
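The resulting ordering, sketched as an enum (the constant names here
are illustrative; the values are the point):

typedef enum {
    /* -1 forces a signed underlying type and keeps
     * "if (mdns >= MDNS_DEFAULT)" checks meaningful */
    MDNS_DEFAULT = -1,
    MDNS_NO      = 0,
    MDNS_RESOLVE = 1,
    /* largest, because it enables the most */
    MDNS_YES     = 2,
} MdnsSetting;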
Update nm-policy.c and nm-dns-manager.c so that the connection-specific
settings get propagated to the DNS manager. Currently the only such value is
the mDNS status.
Add an update_mdns() function to the DNS plugin interface. If a DNS plugin
supports mDNS, it can set an interface with a given index to support
mDNS resolving, or additionally register the current hostname.
The mDNS support is currently added only to systemd-resolved DNS plugin.
Previously, NMPolicy would explicitly check whether the connection is not visible,
to skip autoconnect.
We have the nm_settings_connection_autoconnect_is_blocked() function, which can do that.
The advantage is that at various places we call nm_settings_connection_autoconnect_is_blocked()
to determine whether autoconnect is blocked. By declaring invisible connections
as blocked from autoconnect as well, we short-cut various autoconnection attempts
that previously only failed later during auto_activate_device().
The accessor functions just look at whether a certain flag is set. As these
functions have a different name than the flags, this is more confusing
than helpful. For example, if you wanted to know where the NM_GENERATED
flag matters, you had to know to grep for nm_settings_connection_get_nm_generated()
in addition to NM_SETTINGS_CONNECTION_FLAGS_NM_GENERATED.
The accessor function hid that the property was implemented as
a connection flag. For example, it was not immediately obvious
that nm_settings_connection_get_nm_generated() is the same
as having the NM_SETTINGS_CONNECTION_FLAGS_NM_GENERATED flag
set.
Drop them.
We already need to re-emit the notify::flags signal.
It's cumbersome to do this for boolean properties, so
re-use the flags to also track the visibility state.
When activation of the connection fails with no-secrets, we block
autoconnect due to that. However, NMPolicy also unblocks such
autoconnect, whenever a new secret-agent registers. The reason
is obviously, that the new secret-agent might be able to provide
the previously missing secrets.
However, there is a race between
- making the secret request, failing activation and blocking autoconnect
- new secret-agent registers
If the secret-agent registers after making the request, but before we
block autoconnect, then autoconnect stays blocked.
[1511468634.5759] device (wlp4s0): state change: config -> need-auth (reason 'none', sys-iface-state: 'managed')
[1511468634.5772] device (wlp4s0): No agents were available for this request.
[1511468638.4082] agent-manager: req[0x55ea7e58a5d0, :1.32/org.kde.plasma.networkmanagement/1000]: agent registered
[1511468638.4082] policy: re-enabling autoconnect for all connections with failed secrets
[1511468664.6280] device (wlp4s0): state change: need-auth -> failed (reason 'no-secrets', sys-iface-state: 'managed')
[1511468664.6287] policy: connection 'tuxmobil' now blocked from autoconnect due to no secrets
Note the long time between making the secret request and the
activation failure. This race already existed before, but now with the
WPS push-button method enabled by default, the duration of the
activation is much longer and the race is easy to hit.
https://bugzilla.gnome.org/show_bug.cgi?id=790571
We want to only check for autoconnect-all if something happened that
makes it possible that we can autoconnect now (while we couldn't
previously).
It's not a real problem to check more often than strictly necessary.
But add a check to rule out a few false positives and avoid the
overhead of checking all devices for autoconnect.
reset_autoconnect_all() has two callers with only_no_secrets=FALSE:
- sleeping_changed() when NM returns from sleep.
- when device changes to state DISCONNECTED with reason
CARRIER.
In both cases, this should not overwrite a previous decision by
the user that the connection should not autoconnect.
schedule_activate_all() is expensive. It iterates over all devices and asks
them to autoactivate (which might involve iterating over all connections for
each device). Avoid it if nothing changed.
Extend the enum and API to use flags for the blocked reasons.
A connection is blocked from autoconnect if it has any reason
set.
There is no behavioral change in this patch beyond that, because
where we previously would set blocked-reason NONE, we would still
clear all flags, and not only a particular one.
Later of course, we want to set and clear individual flags
independently.
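Schematically (a sketch with illustrative names, not necessarily the
exact enum in the tree):

#include <glib.h>

typedef enum {
    BLOCKED_REASON_NONE         = 0,
    BLOCKED_REASON_USER_REQUEST = (1 << 0),
    BLOCKED_REASON_FAILED       = (1 << 1),
    BLOCKED_REASON_NO_SECRETS   = (1 << 2),
} BlockedReasonFlags;

/* blocked from autoconnect if any reason bit is set */
static gboolean
is_blocked (BlockedReasonFlags flags)
{
    return flags != BLOCKED_REASON_NONE;
}

/* later, individual reasons can be set and cleared independently:
 *   flags |=  BLOCKED_REASON_NO_SECRETS;     set one reason
 *   flags &= ~BLOCKED_REASON_NO_SECRETS;     clear only that reason
 */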
NMPolicy printed
policy: connection 'a' failed to autoconnect; 1 tries left
settings-connection[0x55a485553b60,ab9f3891-3420-335e-89da-f14c1b94c540]: autoconnect: retries set 0
That is, it claimed there was one more try, when in fact there wasn't.
Our GError codes are mostly meaningless; only the message is interesting.
And our messages should be unique anyway, so that one can figure out
the corresponding error code (by inspecting the source code).
While at it, use gs_free_error.
Note that for the
if (nm_device_state_reason_check (reason) == NM_DEVICE_STATE_REASON_NO_SECRETS)
case we no longer do the
if (nm_settings_connection_autoconnect_retries_get (connection) == 0)
check. But that is fine, because we only skip scheduling a reset_connections_retries()
action. But note that previously we also would never actually
schedule a new timeout, because
- either nm_settings_connection_autoconnect_retries_get (connection) != 0
- or the retries count was zero, in which case we already have a
reset_connections_retries action pending (from the time when we
set it to zero).
So, there is no change in behavior at all except dropping a redundant
logging line.
NMSettingsConnection has 3 properties that are related to autoconnect:
- autoconnect_retries
- autoconnect_blocked_until
- autoconnect_blocked_reason
autoconnect_blocked_reason is entirely independent of the other two.
A connection can have autoconnect blocked via a blocked-reason, but the
retry count is not affected by that. The retry count is an independent
mechanism that may additionally prevent autoconnect.
However autoconnect_retries and autoconnect_retries_blocked_until are
strongly related. The latter is set if and only if autoconnect_retries is
at zero.
Rename to reflect that better.
Using CList, we embed the list element in the NMActiveConnection struct
itself. That means, for example, that you couldn't track an
NMActiveConnection more than once. But we never want that anyway.
The advantage is that removing an active connection from the list
is O(1), and we save additional GSlice allocations for each node
element.
We also do this for libnm, where it causes visible changes
in behavior. But if somebody relied on the hashing implementation
for hash tables, that would be seriously flawed.
When a master connection is deactivated by user, we set the
autoconnect-blocked reason 'user-request' for the connection and we
propagate the same reason to slaves. Doing so prevents the
autoactivation of slaves when the master is manually activated again,
because the only way to override the 'user-request' blocked reason is
through manual activation of slaves.
Instead what should happen is that the manual deactivation of a master
marks slaves as blocked for failed dependencies. When the master
becomes available again, slaves can autoactivate if the profile allows
it.
https://bugzilla.redhat.com/show_bug.cgi?id=1437598