Kernel does not all allow to configure a route via a gateway, if the
gateway is not directly reachable.
For non-manually added routes (e.g. from DHCP), we ignore them as a
server configuration errors. For manually added routes, we try to work
around them.
Note that if the user adds a manual route that references a gateway,
maybe he should be required to also add a matching onlink route for
the gateway (or an address that results in a device-route), otherwise
the configuration could be considered invalid. That was however not
done historically, and also, it seems a rather unhelpful behavior.
NetworkManage should just make it work, not not assume anything is
wrong with the configuration. Similarly, for IPv4, the user could
configure the route as onlink, however, that still requires extra
configuration of which the user might not be aware.
This would apply for example, when a connection has method=auto,
and would obtain the routes automatically. It seems sensible to
allow the user to add a route via the gateway, if he ~knows~ that
this particular network will provide such a configuration via DHCP.
In the past however, we tried not to automatically add a device route,
but instead see whether we will get a suitable route via DHCP. If we
wouldn't get such a route, we would however fail the connection.
However, this is really very hard to get right.
We call ip_config_merge_and_apply() possibly before receiving automatic
IP configuration (commit 7070d17ced, "device: reset
@con_ip6_config on failure before RA"). In this case, we could not yet
configure the route. Instead, we also cannot fail (yet), because we should
wait whether we will receive a route that makes this configuration
feasable.
That is hard to get right. How long should we wait? If we get a DHCP lease
and still cannot add the route, should we fail the IP configuration or wait
longer for another lease? Worse, if we decide to fail the IP configuration,
it might not fail the entire activation. Instead, we would only mark the
current address family as failed. If we later get a DHCP lease, should we
retry to add the route again? -- probably yes. If we still fail, we would
need to keep the IP configuration in failed state, regardless that DHCP
succeeded. Part of the problem is, that we are bad at tracking the
failed state per IP method. So, if manual configuration fails but DHCP
succeeds, we get the state wrong. That should be fixed separately, but it
just shows how hard it is to have this route that we currently cannot
add, and wanting to wait for something that might never come, but still
fail at some point.
Instead, if we cannot add a route due to a missing onlink gateway,
just retry and add the /32 or /128 direct route ourself.
Note that for IPv6 routes that have a "src" address which is still
TENTATIVE, we also cannot currently add the route and retry later.
However, that is fundamentally different, because:
- the configuration here is correct, it's only that the address
didn't yet pass IPv6 DAD and kernel is being unhelpful (rh#1457196).
- we only have to wait a few seconds for DAD to complete or fail.
So, it's easy to implement this sensibly.
The device must not directly add addresses or routes. Instead,
it must track the addresses/routes it wants to add in the NMIP6Config.
Otherwise, during reapply, the information is lost and the next
sync will remove them.
Fixes-test: @ipv6_preserve_cached_routes
We already have IP4LL and maybe should re-use that also for IPv6.
However, when adding the prefix route for IPv6 link local addresses,
we want to add it with protocol "kernel", unlike "user" for IPv4.
There is no strong reason for this. I don't think the protocol matters
much. But up to now kernel automatically adds this prefix route, so
as we are going to change that and let NetworkManager add it, keep the
protocol at "kernel".
- indent with spaces for wrapped line
- drop g_return_val_if_fail(). It really shouldn't happen,
and if it does, we can just continue and will hit g_critical()
warnings below, when using the variables.
- get the NMActRequest instance first, and from there the applied
and settings connection. Not the other way around.
nm_device_get_settings_connection() and nm_device_get_applied_connection()
are just convenience functions to get the fields from
act_request.
This allows to adjust the timeout of an existing checkpoint.
The main usecase of checkpoints, is to have a fail-safe when
configuring the network remotely. By allowing to reset the timeout,
the user can perform a series of actions, and keep bumping the
timeout. That way, the entire series is still guarded by the same
checkpoint, but the user can start with short timeout, and
re-adjust the timeout as he goes along.
The libnm API only implements the async form (at least for now).
Sync methods are fundamentally wrong with D-Bus, and it's probably
not needed. Also, follow glib convenction, where the async form
doesn't have the _async name suffix. Also, accept a D-Bus path
as argument, not a NMCheckpoint instance. The libnm API should
not be more restricted than the underlying D-Bus API. It would
be cumbersome to require the user to lookup the NMCheckpoint
instance first, especially since libnm doesn't provide an efficient
or convenient lookup-by-path method. On the other hand, retrieving
the path from a NMCheckpoint instance is always possible.
Introduce a new flag NM_CHECKPOINT_CREATE_FLAG_ALLOW_OVERLAPPING
that allows the creation of overlapping checkpoints. Before, and
by default, checkpoints that reference a same device conflict,
and creating such a checkpoint failed.
Now, allow this. But during rollback automatically destroy all
overlapping checkpoints that were created after the checkpoint
that is about to rollback.
With this, you can create a series of checkpoints, and rollback them
individually. With the restriction, that if you once rolled back to an
older checkpoint, you no longer can roll"forward" to a younger one.
What this implies and what is new here, is that the checkpoint might be
automatically destroyed by NetworkManager before the timeout expires. When
the user later would try to manually destroy/rollback such a checkpoint, it
would fail because the checkpoint no longer exists.
We already do error checking in nm_checkpoint_manager_create(). No need
to split it in two places. Let all error conditions be handled by
nm_checkpoint_manager_create() first, and then once we decide all is
good, nm_checkpoint_new() can no longer fail.
Instead of scheduling one timeout only, let each checkpoint instance
individually schedule a timeout. This has some overhead, but glib
is supposed to make scheduling many timers efficient. Otherwise,
glib should be fixed.
This simplifies in my opinion the code, because it's up to each
checkpoint to maintain its own timeout.
Later we will also add a AdjustRollbackTimeout operation, which
allow to reschedule the timeout. It also seems slightly simpler,
if scheduling of the timeout is done by the NMCheckpoint instance
itself.
If no device paths are given, we can take the devices directly.
We don't need to first create a list of paths, and then
look them up by path again to add them to the list.
NMCheckpointManager was added and is not ref-countable, because it
is not needed.
I still often like for such objects (that are not ref-countable),
that their destroy function is called "unref". Both for consistency,
and also if we would later add ref-counting to the object.
However, NMCheckpointManager keeps a pointer to NMManager. So, when
NMManager gets destroyed, it *MUST* destroy the NMCheckpointManager.
It cannot accept that the checkpoint manager outlives NMManager,
but the "unref" name suggests that somebody else might have still
a reference to this object keeping it alive. That is not the case.
Rename so that this is clear.
I would name it nm_checkpoint_manager_destroy(), but "destroy" already
has a meaning for NMCheckpoint instances, so use "free".
We don't need an external CheckpointItem, just to wrap the
CList instance. Embed it directly in NMCheckpoint.
Sure, that exposes the checkpoints_lst field in the (internal)
header file, hiding the private member less.
Similar cases of such a field are named "manager". Also,
internal names shall not have an "nm" prefix, contrary
to names in a header file, which shall have such a prefix.
I find it slightly nicer and explict. Also, the list elements
are strictly speaking private, we should better not explicitly
use them outside of NMManager/NMDevice. The macro hides this.
Instead of generating a connection and checking whether it is
compatible with the connection indicated in the state file, just pick
the indicated connection after a basic check of compatibility with the
device.
https://bugzilla.redhat.com/show_bug.cgi?id=1551958
- consistently set options, searches, domains fields to %NULL,
if there are no values.
- in nm_global_dns_config_update_checksum(), ensure that we uniquely
hash values. E.g. a config with "searches[a], options=[b]" should
hash differently from "searches=[ab], options=[]".
- in nm_global_dns_config_to_dbus(), reuse the sorted domain list.
We already have it, and it guarantees a consistent ordering of
fields.
- in global_dns_domain_from_dbus(), fix memleaks if D-Bus strdict
contains duplicate entries.
- no longer track APs in a hash table with their exported path
as key. The exported path is already tracked by NMDBusManager's
lookup index, so we can reuse that for fast lookup by path. Otherwise,
track the APs in a CList per device.
- as we now track APs in a CList, their order is well defined.
We no longer need to sort APs and obsoletes nm_wifi_aps_get_sorted()
and simplifies nm_wifi_aps_find_first_compatible().
We already track an index of exported objects in NMDBusManager.
Actually, that index was unused previously. We either could drop
it, or use it. Let's use it.
For comparing MAC addresses, they anyway have to be normalized
to binary. Convert it once outside the loop and pass the binary
form to nm_utils_hwaddr_matches(). Otherwise, we need to re-convert
it every time.
Instead of using a GSList for tracking the devices, use a CList.
I think a CList is in most cases the more suitable data structure
then GSList:
- you can find out in O(1) whether the object is linked. That
is nice, for example to assert in NMDevice's destructor that
the object was unlinked, and we will use that later in
nm_manager_get_device_by_path().
- you can unlink the element in O(1) and you can unlink the
element without having access to the link's head
- Contrary to GSList, this does not require an extra slice
allocation for the link node. It quite possibliy consumes
slightly less memory because the CList structure is embedded
in a struct that we already allocate. Even if slice allocation
would be perfect to only consume 2*sizeof(gpointer) for the link
note, it would at most be as-good as CList. Quite possibly,
there is an overhead though.
- CList possibly has better memory locality, because the link
structure and the data are close to each other.
Something which could be seen as disavantage, is that with CList
one device can only be tracked in one NMManager instance at a time.
But that is fine. There exists only one NMManager instance for now,
and even if we would ever introduce multiple managers, we probably
would not associate one NMDevice instance with multiple managers.
The advantages are arguably not huge, but CList is IMHO clearly the
more suited data structure. No need to stick to a suboptimal data
structure for the job. Refactor it.
GVariant is immutable and can nicely be shared and cached.
Cache the property variants. This makes getting the properties
faster, at the expense of using some extra memory.
Tested with https://tratt.net/laurie/src/multitime/
$ multitime -n 200 -s 0 bash -c 'echo -n .; exec busctl call org.freedesktop.NetworkManager /org/freedesktop org.freedesktop.DBus.ObjectManager GetManagedObjects &>/dev/null'
# Mean Std.Dev. Min Median Max
# real(before) 0.013+/-0.0000 0.001 0.012 0.013 0.019
# real(after) 0.013+/-0.0000 0.002 0.011 0.012 0.034
$ multitime -n 100 -s 0 bash -c 'for i in {1..5}; do busctl call org.freedesktop.NetworkManager /org/freedesktop org.freedesktop.DBus.ObjectManager GetManagedObjects &>/dev/null & done; wait; echo -n .'
# Mean Std.Dev. Min Median Max
# real(before) 0.040+/-0.0000 0.002 0.037 0.040 0.049
# real(after) 0.037+/-0.0000 0.002 0.034 0.036 0.045
$ multitime -n 30 -s 0 bash -c 'for i in {1..100}; do busctl call org.freedesktop.NetworkManager /org/freedesktop org.freedesktop.DBus.ObjectManager GetManagedObjects &>/dev/null & done; wait; echo -n .'
# Mean Std.Dev. Min Median Max
# real(before) 0.704+/-0.0000 0.016 0.687 0.701 0.766
# real(after) 0.639+/-0.0000 0.013 0.622 0.637 0.687
$ multitime -n 200 -s 0 bash -c 'echo -n .; exec nmcli &>/dev/null'
# Mean Std.Dev. Min Median Max
# real(before) 0.092+/-0.0000 0.005 0.081 0.092 0.119
# real(after) 0.092+/-0.0000 0.005 0.080 0.091 0.123
$ multitime -n 100 -s 0 bash -c 'for i in {1..5}; do nmcli &>/dev/null & done; wait; echo -n .'
# Mean Std.Dev. Min Median Max
# real(before) 0.436+/-0.0000 0.043 0.375 0.424 0.600
# real(after) 0.413+/-0.0000 0.022 0.380 0.410 0.558
$ multitime -n 20 -s 0 bash -c 'for i in {1..100}; do nmcli &>/dev/null & done; wait; echo -n .'
# Mean Std.Dev. Min Median Max
# real(before) 8.796+/-0.1070 0.291 8.073 8.818 9.247
# real(after) 8.736+/-0.0893 0.243 8.017 8.780 9.101
The time savings are small, but that is because caching mostly speeds up
the GetManagedObjects calls, and that is only a small part of the entire
nmcli call from client side.
Move handling non-NM_IP_CONFIG_SOURCE_USER routes first. These are
routes that were added manually by the user in the connection.
Note that there is no change in behavior, because of how
_err_inval_due_to_ipv6_tentative_pref_src() would only accept
user routes already.
Since commit ed640f857a ("manager: ignore unmanaged devices when
looking for parent by UUID"), unmanaged devices are ignored when
looking for potential parent connection matches. Therefore, a software
device can fail autoactivation because the parent is not managed yet
and NM never tries to reactivate it. Ensure that we retry other
devices when a parent device becomes managed.
Fixes: ed640f857ahttps://bugzilla.redhat.com/show_bug.cgi?id=1553595