Instead of returning a boolean @is_user_config value from
get_configured_mtu(), return an mtu-source enum with possible values
NONE,CONNECTION. This enum will be expanded later; for now there is no
change in behavior.
(cherry picked from commit 9f8b0697de)
nm_device_steal_connection() was a bit misleading. It only had one caller,
and what _internal_activate_device() really wants it to deactivate all
other active-connections for the same connection. Hence, it already
performed a lookup for the active-connection that should be disconnected,
only to then lookup the device, and tell it to steal the connection.
Note, that if existing_ac happens to be neither the queued nor the currenct
active connection, then previously it would have done nothing. It's
unclear when that exactly can happen, however, we can avoid that
question entirely.
Instead of having steal-connection(), have a disconnect-active-connection().
If there is no matching device, it will just set the active-connection's
state to DISCONNECTED. Which in turn does nothing, if the state is
already DISCONNECTED.
Without this, nm_device_get_type_description() would quite likely
return "ethernet" for NMDeviceVeth types. This is wrong and was
broken recently.
Fixes: 0775602574
NMManager very much cares about changes to the connectivity state
of the device and was therefore listening to notify::connectivity
signals. However, property changed signals can be suppressed by
g_object_freeze_notify(). That is something we even encourage for
NMDBusObject instances, because the D-Bus glue makes use of the
property changed notifications, and encourages to combine multiple
changes by freezing the signal.
Using the property changed notifications of NMDBusObject instances is
ugly. Don't do that and instead add a special signal.
It might happen, that connectivitiy is lost only for a moment and
returns soon after. Based on that assumption, when we loose connectivity
we want to have a probe interval where we check for returning
connectivity more frequently.
For that, we handle tracking of the timeouts per-device.
The intervall shall start with 1 seconds, and double the interval time until
the full interval is reached. Actually, due to the implementation, it's unlikely
that we already perform the second check 1 second later. That is because commonly
the first check returns before the one second timeout is reached and bumps the
interval to 2 seconds right away.
Also, we go through extra lengths so that manual connectivity check
delay the periodic checks. By being more smart about that, we can reduce
the number of connectivity checks, but still keeping the promise to
check at least within the requested interval.
The complexity of book keeping the timeouts is remarkable. But I think
it is worth the effort and we should try hard to
- have a connectivity state as accurate as possible. Clearly,
connectivity checking means that we probing, so being more intelligent
about timeout and backoff timers can result in a better connectivity
state. The connectivity state is important because we use it for
the default-route penaly and the GUI indicates bad connectivity.
- be intelligent about avoiding redundant connectivity checks. While
we want to check often to get an accurate connectivity state, we
also want to minimize the number of HTTP requests, in case the
connectivity is established and suppossedly stable.
Also, perform connectivity checks in every state of the device.
Even if a device is disconnected, it still might have connectivity,
for example if the user externally adds an IP address on an unmanaged
device.
https://bugzilla.gnome.org/show_bug.cgi?id=792240
An asynchronous request should either be cancellable or not keep
the target object alive. Preferably both.
Otherwise, it is impossible to do a controlled shutdown when terminating
NetworkManager. Currently, when NetworkManager is about to terminate,
it just quits the mainloop and essentially leaks everything. That is a
bug. If we ever want to fix that, every asynchronous request must be
cancellable in a controlled way (or it must not prevent objects from
getting disposed, where disposing the object automatically cancels the
callback).
Rework the asynchronous request for connectivity check to
- return a handle that can be used to cancel the operation.
Cancelling is optional. The caller may choose to ignore the handle
because the asynchronous operation does not keep the target object
alive. That means, it is still possible to shutdown, by everybody
giving up their reference to the target object. In which case the
callback will be invoked during dispose() of the target object.
- also, the callback will always be invoked exactly once, and never
synchronously from within the asynchronous start call. But during
cancel(), the callback is invoked synchronously from within cancel().
Note that it's only allowed to cancel an action at most once, and
never after the callback is invoked (also not from within the callback
itself).
- also, NMConnectivity already supports a fake handler, in case
connectivity check is disabled via configuration. Hence, reuse
the same code paths also when compiling without --enable-concheck.
That means, instead of having #if WITH_CONCHECK at various callers,
move them into NMConnectivity. The downside is, that if you build
without concheck, there is a small overhead compared to before. The
upside is, we reuse the same code paths when compiling with or without
concheck.
- also, the patch synchronizes the connecitivty states. For example,
previously `nmcli networking connectivity check` would schedule
requests in parallel, and return the accumulated result of the individual
requests.
However, the global connectivity state of the manager might have have
been the same as the answer to the explicit connecitivity check,
because while the answer for the manual check is waiting for all
pending checks to complete, the global connectivity state could
already change. That is just wrong. There are not multiple global
connectivity states at the same time, there is just one. A manual
connectivity check should have the meaning of ensure that the global
state is up to date, but it still should return the global
connectivity state -- not the answers for several connectivity checks
issued in parallel.
This is related to commit b799de281b
(libnm: update property in the manager after connectivity check),
which tries to address a similar problem client side.
Similarly, each device has a connectivity state. While there might
be several connectivity checks per device pending, whenever a check
completes, it can update the per-device state (and return that device
state as result), but the immediate answer of the individual check
might not matter. This is especially the case, when a later request
returns earlier and obsoletes all earlier requests. In that case,
earlier requests return with the result of the currend devices
connectivity state.
This patch cleans up the internal API and gives a better defined behavior
to the user (thus, the simple API which simplifies implementation for the
caller). However, the implementation of getting this API right and properly
handle cancel and destruction of the target object is more complicated and
complex. But this but is not just for the sake of a nicer API. This fixes
actual issues explained above.
Also, get rid of GAsyncResult to track information about the pending request.
Instead, allocate our own handle structure, which ends up to be nicer
because it's strongly typed and has exactly the properties that are
useful to track the request. Also, it gets rid of the awkward
_finish() API by passing the relevant arguments to the callback
directly.
Instead of using a GSList for tracking the devices, use a CList.
I think a CList is in most cases the more suitable data structure
then GSList:
- you can find out in O(1) whether the object is linked. That
is nice, for example to assert in NMDevice's destructor that
the object was unlinked, and we will use that later in
nm_manager_get_device_by_path().
- you can unlink the element in O(1) and you can unlink the
element without having access to the link's head
- Contrary to GSList, this does not require an extra slice
allocation for the link node. It quite possibliy consumes
slightly less memory because the CList structure is embedded
in a struct that we already allocate. Even if slice allocation
would be perfect to only consume 2*sizeof(gpointer) for the link
note, it would at most be as-good as CList. Quite possibly,
there is an overhead though.
- CList possibly has better memory locality, because the link
structure and the data are close to each other.
Something which could be seen as disavantage, is that with CList
one device can only be tracked in one NMManager instance at a time.
But that is fine. There exists only one NMManager instance for now,
and even if we would ever introduce multiple managers, we probably
would not associate one NMDevice instance with multiple managers.
The advantages are arguably not huge, but CList is IMHO clearly the
more suited data structure. No need to stick to a suboptimal data
structure for the job. Refactor it.
NMSettings exposes a cached list of all connection. We don't need
to clone it. Note that this is not save against concurrent modification,
meaning, add/remove of connections in NMSettings will invalidate the
list.
However, it wasn't save against that previously either, because
altough we cloned the container (GSList), we didn't take an additional
reference to the elements.
This is purely a performance optimization, we don't need to clone the
list. Also, since the original list is of type "NMConnection *const*",
use that type insistently, instead of dependent API requiring GSList.
IMO, GSList is anyway not a very nice API for many use cases because
it requires an additional slice allocation for each element. It's
slower, and often less convenient to use.
Previously, we used the generated GDBusInterfaceSkeleton types and glued
them via the NMExportedObject base class to our NM types. We also used
GDBusObjectManagerServer.
Don't do that anymore. The resulting code was more complicated despite (or
because?) using generated classes. It was hard to understand, complex, had
ordering-issues, and had a runtime and memory overhead.
This patch refactors this entirely and uses the lower layer API GDBusConnection
directly. It replaces the generated code, GDBusInterfaceSkeleton, and
GDBusObjectManagerServer. All this is now done by NMDbusObject and NMDBusManager
and static descriptor instances of type GDBusInterfaceInfo.
This adds a net plus of more then 1300 lines of hand written code. I claim
that this implementation is easier to understand. Note that previously we
also required extensive and complex glue code to bind our objects to the
generated skeleton objects. Instead, now glue our objects directly to
GDBusConnection. The result is more immediate and gets rid of layers of
code in between.
Now that the D-Bus glue us more under our control, we can address issus and
bottlenecks better, instead of adding code to bend the generated skeletons
to our needs.
Note that the current implementation now only supports one D-Bus connection.
That was effectively the case already, although there were places (and still are)
where the code pretends it could also support connections from a private socket.
We dropped private socket support mainly because it was unused, untested and
buggy, but also because GDBusObjectManagerServer could not export the same
objects on multiple connections. Now, it would be rather straight forward to
fix that and re-introduce ObjectManager on each private connection. But this
commit doesn't do that yet, and the new code intentionally supports only one
D-Bus connection.
Also, the D-Bus startup was simplified. There is no retry, either nm_dbus_manager_start()
succeeds, or it detects the initrd case. In the initrd case, bus manager never tries to
connect to D-Bus. Since the initrd scenario is not yet used/tested, this is good enough
for the moment. It could be easily extended later, for example with polling whether the
system bus appears (like was done previously). Also, restart of D-Bus daemon isn't
supported either -- just like before.
Note how NMDBusManager now implements the ObjectManager D-Bus interface
directly.
Also, this fixes race issues in the server, by no longer delaying
PropertiesChanged signals. NMExportedObject would collect changed
properties and send the signal out in idle_emit_properties_changed()
on idle. This messes up the ordering of change events w.r.t. other
signals and events on the bus. Note that not only NMExportedObject
messed up the ordering. Also the generated code would hook into
notify() and process change events in and idle handle, exhibiting the
same ordering issue too.
No longer do that. PropertiesChanged signals will be sent right away
by hooking into dispatch_properties_changed(). This means, changing
a property in quick succession will no longer be combined and is
guaranteed to emit signals for each individual state. Quite possibly
we emit now more PropertiesChanged signals then before.
However, we are now able to group a set of changes by using standard
g_object_freeze_notify()/g_object_thaw_notify(). We probably should
make more use of that.
Also, now that our signals are all handled in the right order, we
might find places where we still emit them in the wrong order. But that
is then due to the order in which our GObjects emit signals, not due
to an ill behavior of the D-Bus glue. Possibly we need to identify
such ordering issues and fix them.
Numbers (for contrib/rpm --without debug on x86_64):
- the patch changes the code size of NetworkManager by
- 2809360 bytes
+ 2537528 bytes (-9.7%)
- Runtime measurements are harder because there is a large variance
during testing. In other words, the numbers are not reproducible.
Currently, the implementation performs no caching of GVariants at all,
but it would be rather simple to add it, if that turns out to be
useful.
Anyway, without strong claim, it seems that the new form tends to
perform slightly better. That would be no surprise.
$ time (for i in {1..1000}; do nmcli >/dev/null || break; echo -n .; done)
- real 1m39.355s
+ real 1m37.432s
$ time (for i in {1..2000}; do busctl call org.freedesktop.NetworkManager /org/freedesktop org.freedesktop.DBus.ObjectManager GetManagedObjects > /dev/null || break; echo -n .; done)
- real 0m26.843s
+ real 0m25.281s
- Regarding RSS size, just looking at the processes in similar
conditions, doesn't give a large difference. On my system they
consume about 19MB RSS. It seems that the new version has a
slightly smaller RSS size.
- 19356 RSS
+ 18660 RSS
The connection.mdns setting is a per-connection setting,
so one might expect that one activated device can only have
one MDNS setting at a time.
However, with certain VPN plugins (those that don't have their
own IP interface, like libreswan), the VPN configuration is merged
into the configuration of the device. So, in this case, there
might be multiple settings for one device that must be merged.
We already have a mechanism for that. It's NMIP4Config. Let NMIP4Config
track this piece of information. Although, stricitly speaking this
is not tied to IPv4, the alternative would be to introduce a new
object to track such data, which would be a tremendous effort
and more complicated then this.
Luckily, NMDnsManager and NMDnsPlugin are already equipped to
handle multiple NMIPConfig instances per device (IPv4 vs. IPv6,
and Device vs. VPN).
Also make "connection.mdns" configurable via global defaults in
NetworkManager.conf.
Instead, intern the string and cache it in the NMDeviceClass instance.
It anyway depends entirely on the GObject type (name), hence it should
also be cached at the type.
The _NM_GET_PRIVATE() macro already preserved and propagated
the constness of @self to the resulting private pointer.
_NM_GET_PRIVATE_PTR() didn't do that. Extend the macro,
to make that possible.
- split NM_DEVICE_AUTOCONNECT_BLOCKED_INTERN in two parts:
"wrong-pin" and "manual-disconnect". Setting/unsetting them
should be tracked differently, as their reason differs.
- no longer initialize/clear the autoconnect-blocked reasons
during realize/unrealize of the device. Instead, initialize
it once when the object gets created (nm_device_init()), and
keep the settings beyond unrealize/realize cycles. This only
matters for software devices, as regular devices get deleted
after unrealizing once. But for software devices it is essential,
because we don't want to forget the autoconnect settings of
the device instance.
- drop verbose logging about blocking autoconnect due to failed
pin. We already log changes to autoconnect-blocked flags with
TRACE level. An additional message about this particular issue
seems not necessary at INFO level.
- in NMManager's do_sleep_wake(), no longer block autoconnect
for devices during sleep. We already unmanage the device, which
is a far more effective measure to prevent activation. We should
not also block autoconnect.
The flags allow for more then two reasons. Currently the only reasons
for allowing or disallowing autoconnect are "user" and "intern".
It's a bit odd, that NMDeviceAutoconnectBlockedFlags has a negative
meaning. So
nm_device_set_autoconnect_intern (device, FALSE);
gets replaced by
nm_device_set_autoconnect_blocked_set (device, NM_DEVICE_AUTOCONNECT_BLOCKED_INTERN);
and so on.
However, it's chosen this way, because autoconnect shall be allowed,
unless any blocked-reason is set. That is, to check whether autoconnect
is allowed, we do
if (!nm_device_get_autoconnect_blocked (device, NM_DEVICE_AUTOCONNECT_BLOCKED_ALL))
The alternative check would be
if (nm_device_get_autoconnect_allowed (device, NM_DEVICE_AUTOCONNECT_ALLOWED_ALL) == NM_DEVICE_AUTOCONNECT_ALLOWED_ALL)
which seems odd too.
So, add the inverse flags to block autoconnect.
Beside refactoring and inverting the meaning of the autoconnect
settings, there is no change in behavior.
The number of authentication retires is useful also for passwords aside
802-1x settings. For example, src/devices/wifi/nm-device-wifi.c also has
a retry counter and uses a hard-coded value of 3.
Move the setting, so that it can be used in general. Although it is still
not implemented for other settings.
This is an API and ABI break.
Since commit 4a6fd0e83e (device: honor the
connection.autoconnect-retries for 802.1X) and the related bug bgo#723084,
we reuse the autoconnect-retries setting to control the retry count
for requesting passwords.
I think that is wrong. These are two different settings, we should not
reuse the autoconnect retry counter while the device is still active.
For example, the user might wish to set autoconnect-retries to infinity
(zero). In that case, we would retry indefinitly to request a password.
That could be problematic, if there is a different issue with the
connection, that makes it appear tha the password is wrong.
A full re-activation might succeed, but we would never stop retrying
to authenticate. Instead, we should have two different settings for
retrying to authenticate and to autoconnect.
This is a change in behavior compared to 1.8.
For some software devices, the platform link appears only after they've been
realized. Update their properties and let them know that the link has changed
so they can eventually proceed with activation.
Also, reset the properties (udi, iface, driver) that are set from the platform
link when the link goes away. At that point they don't reflect reality anymore.
Removes some code duplication too.
We added "ipv4.route-table-sync" and "ipv6.route-table-sync" to not change
behavior for users that configured policy routing outside of NetworkManager,
for example, via a dispatcher script. Users had to explicitly opt-in
for NetworkManager to fully manage all routing tables.
These settings were awkward. Replace them with new settings "ipv4.route-table"
and "ipv6.route-table". Note that this commit breaks API/ABI on the unstable
development branch by removing recently added API.
As before, a connection will have no route-table set by default. This
has the meaning that policy-routing is not enabled and only the main table
will be fully synced. Once the user sets a table, we recognize that and
NetworkManager manages all routing tables.
The new route-table setting has other important uses: analog to
"ipv4.route-metric", it is the default that applies to all routes.
Currently it only works for static routes, not DHCP, SLAAC,
default-route, etc. That will be implemented later.
For static routes, each route still can explicitly set a table, and
overwrite the per-connection setting in "ipv4.route-table" and
"ipv6.route-table".
The name nm_device_get_priority() is misleading. Nowadays it's only used
for the default route metric, and nothing else.
Rename it, and make it static.
We already have various ways to mark a device as unmanaged.
1) via udev-rule ENV{NM_UNMANAGED}. This can be overwritten via D-Bus
at runtime.
2) via settings plugin. That is NM_CONTROLLED=no for ifcfg-rh and
keyfile.unmanaged-devices in NetworkManager.conf.
3) at runtime, via D-Bus. This is persisted in the run state file
and persists restarts (but not reboot).
This adds another way via NetworkManager.conf file. Note that the
existing keyfile.unmanaged-devices (above 2) is also a configuration
optin in NetworkManager.conf. However it has various downsides:
- it cannot be overwritten at runtime (see commit
c210134bd5).
- you can only explicitly mark a device as unmanaged. That means,
you cannot use it to manage a device which is unmanaged due to
a udev rule.
- the name "keyfile.*" sounds like it's only relevant for the keyfile settings
plugin. Nowadays the keyfile plugin is always loaded, so the option applies
to NetworkManager in general.
https://github.com/NetworkManager/NetworkManager/pull/29
- cleanup data type and use guint32 consistently. We might want to
introduce a new "infinity" value. But since libnm's
NM_SETTING_IP_CONFIG_DHCP_TIMEOUT asserts against the range
0 - G_MAXINT32, we cannot express it as -1 anyway. So, infinity
will have the numerical value G_MAXINT32, hence guint32 is just
fine.
- make use of existing ipv6.dhcp-timeout setting and add global
default configuration in NetworkManager.conf
- instead of having subclasses call nm_device_set_dhcp_timeout(),
add a virtual function get_dhcp_timeout().
Remove NMDefaultRouteManager. Instead, add the default-route to the
NMIP4Config/NMIP6Config instance.
This basically reverts commit e8824f6a52.
We added NMDefaultRouteManager because we used the corresponding to `ip
route replace` when configuring routes. That would replace default-routes
on other interfaces so we needed a central manager to coordinate routes.
Now, we use the corresponding of `ip route append` to configure routes,
and each interface can configure routes indepdentently.
In NMDevice, when creating the default-route, ignore @auto_method for
external devices. We shall not touch these devices.
Especially the code in NMPolicy regarding selection of the best-device
seems wrong. It probably needs further adjustments in the future.
Especially get_best_ip_config() should be replaced, because this
distinction VPN vs. devices seems wrong to me.
Thereby, remove the @ignore_never_default argument. It was added by
commit bb75026004, I don't think it's
needed anymore.
This brings another change. Now that we track default-routes in
NMIP4Config/NMIP6Config, they are also exposed on D-Bus like regular
routes. I think that makes sense, but it is a change in behavior, as
previously such routes were not exposed there.
NMIP4Config, NMIP6Config, and NMPlatform shall share one
NMDedupMultiIndex instance.
For that, pass an NMDedupMultiIndex instance to NMPlatform and NMNetns.
NMNetns than passes it on to NMDevice, NMDhcpClient, NMIP4Config and NMIP6Config.
So currently NMNetns is the access point to the shared NMDedupMultiIndex
instance, and it gets it from it's NMPlatform instance.
The NMDedupMultiIndex instance is really a singleton, we don't want
multiple instances of it. However, for testing, instead of adding a
singleton instance, pass the instance explicitly around.
Currently, device types like Bond hack around ignore-carrier
setting, as they always want to ignore-carrier.
Prepare so that also for such master types, we rely and honor the
ignore-carrier setting better. In the next commit, bond, bridge and
team devices they will get ignore-carrier turned on by default.
For externally managed interfaces, we create an in-memory connection
and keep the device with sys-iface-state=external.
When the user actively modifies the connection, we persist it to
storage. But we also must take over managing the device.
One problem is that nm_device_reapply() errors out if the device
is still activating. It's unclear how to reapply the connection
while the device is in the process of activation. So, if the user
modifies the created connection very quickly, reapplying the settings
will fail.
https://bugzilla.redhat.com/show_bug.cgi?id=1462223
Don't log in a function that basically just inspects state, without
mutating it. Instead, pass the reason why a connection could not be
generated to the caller so that we have one sensible log message.
Originally 850c977 "device: track system interface state in NMDevice",
intended that a connection can only be assumed initially when seeing
a device for the first time. Assuming a connection later was to be
prevented by setting device's sys-iface-state to MANAGED.
That changed too much in behavior, because we used to assume external
connections also when they are activated later on. So this was attempted
to get fixed by
- acf1067 nm-manager: try assuming connections on managed devices
- b6b7d90 manager: avoid generating in memory connections during startup for managed devices
It's probably just wrong to prevent assuming connections based on the
sys-iface-state. So drop the check for sys-iface-state from
recheck_assume_connection(). Now, we can assume anytime on managed,
disconnected interfaces, like previously.
Btw, note that priv->startup is totally wrong to check there, because
priv->startup has the sole purpose of tracking startup-complete property.
Startup, as far as NMManager is concerned, is platform_query_devices().
However, the problem is that we only assume connections (contrary to
doing external activation) when we have a connection-uuid from the state
file or with guess-assume during startup.
When assuming a master device, it can fail with
(nm-bond): ignoring generated connection (IPv6LL-only and not in master-slave relationship)
thus, for internal reason the device cannot be assumed yet.
Fix that by attatching the assume-state to the device, so that on multiple
recheck_assume_connection() calls we still try to assume. Whenever we try
to assume the connection and it fails due to external reasons (like, the connection
no longer matching), we clear the assume state, so that we only try as
long as there are internal reasons why assuming fails.
https://bugzilla.redhat.com/show_bug.cgi?id=1452062
The state file should only be read initially when NM starts, that is:
during NMManager's platform_query_devices().
At all later points, for example when a software device gets destroyed
and re-realized, the state file is clearly no longer relevant.
Hence, pass the set-nm-owned flag from NMManager to realize_start_setup().
This is very much the same as with the NM_UNMANAGED_FLAG_USER_EXPLICT flag,
which we also read from the state-file.
The plugins may use stuff from core, but not the other way around.
Including "bluetooth/nm-bluez-common.h" is wrong.
The UUID argument is always "nap" in the NAP case. We don't need
the flexibility that it might be anything else. Just drop it.
As far as NMDevice is concerned, it anyway wouldn't (or shouldn't
know what the uuid is. It says register, and NMBluez5Manager should
figure out the details.
We'll use this to let the devices know they can retry autoactivation
because some component became available without actually having any
data that would be useful for that device.
Adjust the comment.
Don't give the subclass the ability to override the parents
behavior. The parent implementation is not intended to allow
for that. Instead, restrict the flexibility of how the virtual
function integrates with the larger picture. That means, the
virtual function is only called at one place, and there is only
one implementation in NMDeviceEthernet (and it doesn't really
matter whether the implementation chains up the parent implementation
or not).
Before, setting a device to unmanaged causes it to go down and clear
the interface state.
It may be useful to instruct NetworkManager not to touch the device
anymore but leave the current state up. Changing behavior for
nmcli device set "$DEV" managed no
To get the previous behavior, one has to first disconnect the interface
via
nmcli device disconnect "$DEV"
nmcli device set "$DEV" managed no
Note that non-permanent addresses like from DHCP will eventually time
out because NetworkManager stops the DHCP client. When instructing
NetworkManager to let go of the device, you have to take it over in
any way you see fit.
https://bugzilla.redhat.com/show_bug.cgi?id=1371433
(cherry picked from commit 9e8218f99a)
This also ensures that we own a reference to the
NMPlatform, NMRouteManager and NMDefaultRouteManager
instances. See bug rh#1440089 where we might access
the singleton getter after destroing the singleton
instance of NMRouteManager. This is prevented by
keeping a reference to those instances -- indirectly
via the netns instance.
Later, we may add support for multiple namespaces. Then it might
make sense to swap the NMNetns instance of a device when moving
the device between namespaces.
Also, drop the use of singelton instances.
https://bugzilla.redhat.com/show_bug.cgi?id=1440089
(cherry picked from commit c48a19b7c6)