NMManager tries to assign unique route-metrics in an increasing manner
so that the device which activates first keeps to have the best routes.
This information is also persisted in the device's state file, however
we not only need to persist the effective route-metric which was
eventually chosen by NMManager, but also the aspired metric.
The reason is that when a metric is chosen for a device, the entire
range between aspired and effective route-metric is reserved for that
device. We must remember the entire range so that after restart the
entire range is still considered to be in use.
Fixes: 6a32c64d8f
First check that the limit of 50 metric points is not surpassed.
Otherwise, if you have an ethernet device (aspired 100, effective
130) and a MACSec devic (aspired 125, effective 155), activating a
new ethernet device would bump it's metric to 155 -- more then
the 50 points limit.
It doesn't matter too much, because the cases where the limit of
50 could have been surpassed were very specific. Still, change
it to ensure that the limit is always honored as one would expect.
Fixes: 6a32c64d8f
In the past we had NMDefaultRouteManager which would coordinate adding
the default-route with identical metrics. That especially happened, when
activating two devices of the same type, without explicitly specifying
ipv4.route-metric. For example, with ethernet devices, the routes on
both interfaces would get a metric of 100.
Coordinating routes was especially necessary, because we added
routes with NLM_F_EXCL flag, akin to `ip route replace`. We not
only had to avoid that activating two devices in NetworkManager would
result in a fight over the default-route, but more importently
to preserve externally added default-routes on unmanaged interfaces.
NMDefaultRouteManager would ensure that in case of duplicate
metrics, that the device that activated first would keep the
best default-route. It would do so by bumping the metric
of the second device to find a unused metric. The bumping itself
was not very important -- MDefaultRouteManager could also just not
configure any default-routes that show up as second, the result
would be quite similar. More important was to keep the best
default-route on the first activating device until the device
deactivates or a device activates that really has a better
default-route..
Likewise, NMRouteManager would globally manage non-default-routes.
It would not do any bumping of metrics, but it would also ensure that the routes
of the device that activates first are not overwritten by a device activating
later.
However, the `ip route replace` approach has downsides, especially
that it messes with routes on other interfaces, interfaces that are
possibly not managed by NetworkManager. Another downside is, that
binding a socket to an interface might not result in correct
routes, because the route might just not be there (in case of
NMRouteManager, which wouldn't configure duplicate routes by bumping
their metric).
Since commit 77ec302714 we would no longer
use NLM_F_EXCL, but add routes akin to `ip route append`. When
activating for example two ethernet devices with no explict route
metric configuration, there are two routes like
default via 10.16.122.254 dev eth0 proto dhcp metric 100
default via 192.168.100.1 dev eth1 proto dhcp metric 100
This does not only affect default routes. In case of a multi-homing
setup you'd get
192.168.100.0/24 dev eth0 proto kernel scope link src 192.168.100.1 metric 100
192.168.100.0/24 dev eth1 proto kernel scope link src 192.168.100.1 metric 100
but it's visible the most for default-routes.
Note that we would append the routes that are activated later, as the order
of `ip route show` confirms. One might hence expect, that kernel selects
a route based on the order in the routing tables. However, that isn't
the case, and activating the second interface will non-deterministically
re-route traffic via the new interface. That will interfere badly with
with NAT, stateful firewalls, and existing connections (like TCP).
The solution is to have NMManager keep a global index of the default route-metrics
currently in use. So, instead of determining the default-route metric based solely
on the device-type, we now in addition generate default metrics that do not
overlap. For example, if you activate eth0 first, it gets route-metric 100,
and if you then activate eth1, it gets 101. Note that if you deactivate
and re-activate eth0, then it will get route-metric 102, because the
best route should stick on eth1 (which reserves the range 100 to 101).
Note that when a connection explititly selects a particular metric, then that
choice is honored (contrary to NMDefaultRouteManager which was more concerned
with avoiding conflicts, then keeping the exact metric).
https://bugzilla.redhat.com/show_bug.cgi?id=1505893
NMManager will need to know the state of all device at once.
Hence, load it once and cache it in NMConfig.
Note that this wastes a bit of memory in the order of
O(number-of-interfaces). But each device state entry is
rather small, and we always consume memory in the order
of O(number-of-interfaces).
Previously, we would only set a connection as volatile before
adding it to manager. As we never would set it volatile last on,
there was no need to handle deletion.
Now support that. Watch the volatile flag, and if the connection
has currently not active connection that keeps it alive, delete
it in an idle handler.
The accessor functions just look whether a certain flag is set. As these
functions have a different name then the flags, this is more confusing
then helpful. For example, if you want to know where the NM_GENERATED
flag matters, you had to know to grep for nm_settings_connection_get_nm_generated()
in addition to NM_SETTINGS_CONNECTION_FLAGS_NM_GENERATED.
The accessor function hid that the property was implemented as
a connection flag. For example, it was not immediately obvious
that nm_settings_connection_get_nm_generated() is the same
as having the NM_SETTINGS_CONNECTION_FLAGS_NM_GENERATED flag
set.
Drop them.
The matching works fuzzy and is not reliable. That is why we store
which connection should be assumed after restart in the state file
of NetworkManager.
In that case, we don't need to do a full check (with the possibility
of a false-reject). Just check for the minimum required properties:
the type and slave-type.
Yes, if the user modifies the connection while restarting NM, then
we might wrongly assume a connection that no longer would match.
But NM should not read minds, it should do as indicated.
Using CList, we embed the list element in NMActiveConnection struct
itself. That means for example, that you couldn't track a
NMActiveConnection more then once. But we anyway never want that.
The advantage is, that removing an active connection from the list
is O(1), and we safe additional GSlice allocations for each node
element.
We also do this for libnm, where it causes visible changes
in behavior. But if somebody would rely on the hashing implementation
for hash tables, it would be seriously flawed.
GHashTable optimizes a NULL equality function to use direct pointer
comparison. That saves the overhead of calling g_direct_equal().
This is also documented behavior for g_hash_table_new().
While at it, also don't pass g_direct_hash() but use the default
of %NULL. The behavior is the same, but consistently don't use
g_direct_hash().
Don't include unrealized devices in checkpoint because, as the name
says, they are not real.
While at it, remove nm_manager_get_device_paths() as it is no longer
used.
- split NM_DEVICE_AUTOCONNECT_BLOCKED_INTERN in two parts:
"wrong-pin" and "manual-disconnect". Setting/unsetting them
should be tracked differently, as their reason differs.
- no longer initialize/clear the autoconnect-blocked reasons
during realize/unrealize of the device. Instead, initialize
it once when the object gets created (nm_device_init()), and
keep the settings beyond unrealize/realize cycles. This only
matters for software devices, as regular devices get deleted
after unrealizing once. But for software devices it is essential,
because we don't want to forget the autoconnect settings of
the device instance.
- drop verbose logging about blocking autoconnect due to failed
pin. We already log changes to autoconnect-blocked flags with
TRACE level. An additional message about this particular issue
seems not necessary at INFO level.
- in NMManager's do_sleep_wake(), no longer block autoconnect
for devices during sleep. We already unmanage the device, which
is a far more effective measure to prevent activation. We should
not also block autoconnect.
The flags allow for more then two reasons. Currently the only reasons
for allowing or disallowing autoconnect are "user" and "intern".
It's a bit odd, that NMDeviceAutoconnectBlockedFlags has a negative
meaning. So
nm_device_set_autoconnect_intern (device, FALSE);
gets replaced by
nm_device_set_autoconnect_blocked_set (device, NM_DEVICE_AUTOCONNECT_BLOCKED_INTERN);
and so on.
However, it's chosen this way, because autoconnect shall be allowed,
unless any blocked-reason is set. That is, to check whether autoconnect
is allowed, we do
if (!nm_device_get_autoconnect_blocked (device, NM_DEVICE_AUTOCONNECT_BLOCKED_ALL))
The alternative check would be
if (nm_device_get_autoconnect_allowed (device, NM_DEVICE_AUTOCONNECT_ALLOWED_ALL) == NM_DEVICE_AUTOCONNECT_ALLOWED_ALL)
which seems odd too.
So, add the inverse flags to block autoconnect.
Beside refactoring and inverting the meaning of the autoconnect
settings, there is no change in behavior.
It has almost no callers, and it is a bit of a strange API. Let's
not cache the last accessed value inside NMConfigData. Instead, free
it right after use. It was not reused anyway, it only hangs around
as convenience for the caller.
For some software devices, the platform link appears only after they've been
realized. Update their properties and let them know that the link has changed
so they can eventually proceed with activation.
Also, reset the properties (udi, iface, driver) that are set from the platform
link when the link goes away. At that point they don't reflect reality anymore.
Removes some code duplication too.
During write, it can regularly happen that the connection gets modified.
For example, keyfile never writes blobs as-is, it always writes the
blob to an external file, and replaces the certificate property with
a path.
Other reasons could be just bugs, where the reader and writer are not doing
a proper round trip (these cases should be fixed).
Refactor commit_changes(), to return the re-read connection to
the settings-connection class, and handle replacing the settings
there.
Also, prepare for another change. Sometimes we first call replace_settings()
followed by commit_changes(). It would be better to instead call commit_changes()
first, and only on success proceed with replace_settings(). Hence, commit_changes()
gets a new argument new_connection, that can be used to write another
connection to disk.
gcc doesn't consider variables with cleanup attribute as unused.
clang does, and warns about them.
In one case, clang is right, in the other one the warning is bogus.
Fix both.
Sometimes, when we have a platform object, we need to keep it
alive, because any subsequent platform operation might invalidate
the object.
Previously, we achieved that by copying the NMPlatformLink data.
For a while now, all platform object are immuable and recounted.
We should not copy the instance to a NMPlatformLink object, because
then the instance is no longer a full NMPObject. Instead, just take an
additional reference. Since the object must be immutable, that is
just as safe. But now callees down the stack get a proper NMPObject
instance, and might reference it too.
We call _platform_link_cb_idle() on idle, so we must take care of the lifetime
of NMManager.
We don't want to take a reference, so that the manager is not kept alive
by platform events.
Refactor the previous implementation with weak pointers to use a linked list
instead. Let's not have any pending idle actions after the manager instance
is destroyed. Instead, properly track and cancel the events.
We should reduce uses of singletons in general. Instead, the platform
instance should be passed around and kept for as long as it's needed.
Especially, as we subscribe platform_link_cb() signal. Currently, we
never unsubscribe it (wrongly). Subscribing signals is a strong
indication that the target object should keep the source object alive
until the signal is unsubscribed.
At startup the manager tries to create virtual devices without a
specific order and spits warnings when a device can't be realized
because the parent device is not yet created. These failures are not
something the user should worry about because the creation will be
retried when the parent appears.
A better approach is to return an error code from the device's
create_and_realize() telling that it failed because the parent doesn't
exist. In this way, the manager knows that the device isn't ready and
can avoid printing warning messages.
After a device is created in system_create_virtual_device(), the
manager tries to activate connections that depend on the device
even if the device isn't realized, as in the following log:
# team0 gets created
<info> manager: (team0): new Team device (/org/freedesktop/NetworkManager/Devices/7)
# team0.23 gets created
<debug> device[0x28079b0] (team0.23): constructed (NMDeviceVlan)
<debug> manager: (team0-vlan23) create virtual device team0.23
<debug> device[0x28079b0] (team0.23): unmanaged: flags set to [platform-init,!sleeping=0x10/0x11/unmanaged/unrealized], set-managed [sleeping
<info> manager: (team0.23): new VLAN device (/org/freedesktop/NetworkManager/Devices/8)
# the manager tries to realize team0.23
<debug> device[0x28079b0] (team0.23): create (is nm-owned)
<warn> manager: (team0-vlan23) couldn't create the device: cannot retrieve ifindex of interface team0 (Team): skip VLAN creation for now
<debug> manager: (team0.23): removing device (allow_unmanage 1, managed 0)
<debug> device[0x28079b0] (team0.23): ip4-config: update (commit=0, new-config=(nil))
<debug> device[0x28079b0] (team0.23): ip6-config: update (commit=0, new-config=(nil))
<debug> device[0x28079b0] (team0.23): disposing
<debug> device[0x28079b0] (team0.23): ip4-config: update (commit=1, new-config=(nil))
<debug> device[0x28079b0] (team0.23): ip6-config: update (commit=1, new-config=(nil))
<debug> device[0x28079b0] (team0.23): finalize(): NMDeviceVlan
# the manager realizes team0
<debug> device[0x2800870] (team0): create (is nm-owned)
<debug> platform: link: add link 'team0' of type 'team' (196610)
Change the order of operations and try the child connection only after
the parent has been realized.
It seems the assert there is too strict. I don't really understand why
it fails, but I also don't see why the assert is supposed to hold.
Just return in case the device is unmanagable at this point.
The activation shall fail later.
Traceback from a test build of commit a7aca2ab08:
#0 0x00007fdb28ffb643 in g_logv (log_domain=0x7fdb2b584cc9 "NetworkManager", log_level=G_LOG_LEVEL_CRITICAL, format=<optimized out>, args=args@entry=0x7fff10630200) at gmessages.c:1086
#1 0x00007fdb28ffb7bf in g_log (log_domain=log_domain@entry=0x7fdb2b584cc9 "NetworkManager", log_level=log_level@entry=G_LOG_LEVEL_CRITICAL, format=format@entry=0x7fdb29069190 "%s: assertion '%s' failed") at gmessages.c:1119
#2 0x00007fdb28ffb7f9 in g_return_if_fail_warning (log_domain=log_domain@entry=0x7fdb2b584cc9 "NetworkManager", pretty_function=pretty_function@entry=0x7fdb2b54fee0 <__func__.38922> "unmanaged_to_disconnected", expression=expression@entry=0x7fdb2b54d450 "nm_device_get_managed (device, FALSE)") at gmessages.c:1128
#3 0x00007fdb2b36e05b in unmanaged_to_disconnected (device=device@entry=0x7fdb2d2384f0 [NMDeviceVlan]) at src/nm-manager.c:3201
#4 0x00007fdb2b37eb3a in _internal_activate_generic (error=0x7fff106303d0, active=0x7fdb2d1d4550 [NMActRequest], self=0x0) at src/nm-manager.c:3430
#5 0x00007fdb2b37eb3a in _internal_activate_generic (self=self@entry=0x7fdb2d02b090 [NMManager], active=active@entry=0x7fdb2d1d4550 [NMActRequest], error=error@entry=0x7fff10630450) at src/nm-manager.c:3458
#6 0x00007fdb2b37fe90 in _activation_auth_done (active=0x7fdb2d1d4550 [NMActRequest], success=1, error_desc=0x0, user_data1=0x7fdb2d02b090, user_data2=0x7fdb0800bec0) at src/nm-manager.c:3866
#7 0x00007fdb2b4cc9d7 in auth_done (chain=0x7fdb2d17de30, error=0x0, unused=<optimized out>, user_data=<optimized out>) at src/nm-active-connection.c:929
#8 0x00007fdb2b4d6884 in auth_chain_finish (user_data=0x7fdb2d17de30) at src/nm-auth-utils.c:92
#9 0x00007fdb28ff4d7a in g_main_context_dispatch (context=0x7fdb2cff2e00) at gmain.c:3152
#10 0x00007fdb28ff4d7a in g_main_context_dispatch (context=context@entry=0x7fdb2cff2e00) at gmain.c:3767
#11 0x00007fdb28ff50b8 in g_main_context_iterate (context=0x7fdb2cff2e00, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>) at gmain.c:3838
#12 0x00007fdb28ff538a in g_main_loop_run (loop=0x7fdb2cff2ec0) at gmain.c:4032
#13 0x00007fdb2b349ed7 in main (argc=1, argv=0x7fff106307a8) at src/main.c:438
https://bugzilla.redhat.com/show_bug.cgi?id=1478911
Add code to NMPppDevice to activate new-style PPPoE connections. This
is a bit tricky because we can't create the link as usual in
create_and_realize(). Instead, we create a device without ifindex and
start pppd in stage2; when pppd reports a new configuration, we rename
the platform link to the correct name and set the ifindex into the
device.
This mechanism is inherently racy, but there is no way to tell pppd to
create an arbitrary interface name.