If the device *is* an NM_DEVICE_IWD, make sure not to pass it to
_nm_device_wifi_request_scan() (which asserts on anything other than an
NM_DEVICE_WIFI device).
The crash can be triggered by enabling wifi.backend=iwd and clicking
on the 'select network' item in gnome shell for example. The journal
output looks like this:
NetworkManager[1861]: invalid cast from 'NMDeviceIwd' to 'NMDeviceWifi'
NetworkManager[1861]: **
NetworkManager[1861]: NetworkManager:ERROR:src/devices/wifi/nm-device-wifi.c:1127:_nm_device_wifi_request_scan: assertion failed: ((((__extension__ ({ GTypeInstance *__inst = (GTypeInstance*) ((_obj)); GType __t = ((nm_device_wifi_get_type ())); gboolean __r; if (!__inst) __r = (0); else if (__inst->g_class && __inst->g_class->g_type == __t) __r = (!(0)); else __r = g_type_check_instance_is_a (__inst, __t); __r; })))))
systemd[1]: NetworkManager.service: Main process exited, code=dumped, status=6/ABRT
systemd[1]: NetworkManager.service: Failed with result 'core-dump'.
Fixes: 297d4985ab
https://github.com/NetworkManager/NetworkManager/pull/107
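The guard described above can be modelled in plain C. This is only a sketch: the enum, the struct and all function names are hypothetical stand-ins for the GObject types, and what the real code does for an IWD device is an assumption here (the sketch simply routes it to a separate helper).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for NM_DEVICE_WIFI / NM_DEVICE_IWD type checks. */
typedef enum { DEVICE_KIND_WIFI, DEVICE_KIND_IWD, DEVICE_KIND_OTHER } DeviceKind;

typedef struct {
    DeviceKind kind;
    int scans_requested;
} Device;

/* Models a function that asserts on anything other than a wifi device. */
static void wifi_request_scan(Device *dev)
{
    assert(dev->kind == DEVICE_KIND_WIFI);
    dev->scans_requested++;
}

/* Assumed counterpart for the IWD backend (not taken from the source). */
static void iwd_request_scan(Device *dev)
{
    assert(dev->kind == DEVICE_KIND_IWD);
    dev->scans_requested++;
}

/* Check the concrete type first, so the asserting function never sees
 * a device of the wrong type. Returns false if no scan was requested. */
static bool request_scan(Device *dev)
{
    switch (dev->kind) {
    case DEVICE_KIND_IWD:
        iwd_request_scan(dev);
        return true;
    case DEVICE_KIND_WIFI:
        wifi_request_scan(dev);
        return true;
    default:
        return false;
    }
}
```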
Without this, nm_device_get_type_description() would quite likely
return "ethernet" for NMDeviceVeth types. This is wrong and was
broken recently.
Fixes: 0775602574
It can easily happen that connectivity checks take a long time to
complete (up to 20 seconds, when they time out).
So, before, during the first 20 seconds no connectivity checks would
return and bump the periodic interval. That meant, for the first 20
seconds we would each second schedule a periodic check.
Then, the checks start timing out, each one second apart as we scheduled
them. Previously, during each completion of the checks, we would bump
the interval every second.
Fix that in two ways:
1) when the timer expires, also check whether there are still incomplete
periodic checks. If there are, already bump the interval at that point.
2) at the same time, mark the handle so that when the checks later
complete, they no longer cause another increase of the interval
(no-bump).
Now the bumping is done either by the timeout, or by the completion of
the request, whichever happens first.
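A minimal sketch of the "bump once, by whichever comes first" rule, under simplifying assumptions: the struct and function names are hypothetical, a single pending check per timer is modelled, and the bump itself is reduced to a counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-device state; a bump is just counted here. */
typedef struct {
    unsigned bump_count;
} PeriodicState;

/* Hypothetical handle for one pending periodic check. */
typedef struct {
    bool pending;
    bool no_bump; /* set once the timer already bumped for this check */
} CheckHandle;

/* Timer expired: if the check is still incomplete, bump now and mark
 * the handle so that its later completion does not bump again. */
static void on_timer_expired(PeriodicState *st, CheckHandle *h)
{
    if (h->pending && !h->no_bump) {
        st->bump_count++;
        h->no_bump = true;
    }
}

/* Check completed: bump only if the timer did not already do so. */
static void on_check_complete(PeriodicState *st, CheckHandle *h)
{
    assert(h->pending);
    h->pending = false;
    if (!h->no_bump)
        st->bump_count++;
}
```

Either order of the two events results in exactly one bump per check.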
In concheck_periodic_timeout_cb(), we are not sure that we were
scheduled with the current interval. Instead, the timer might
just cover a part of the interval, for example while resetting
the timer interval.
We must always reschedule the timer.
A larger issue is that concheck_periodic_schedule_do() requires an
interval in nanoseconds scale. We passed the wrong timeout there.
A smaller issue is that when we reset the max_interval to something
shorter, *and* the previously scheduled timeout is pending for a shorter
time than the new max-interval, we only need to re-adjust the
timeout, but keep cur_basetime unchanged.
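To illustrate the unit issue and the re-adjustment, here is a small sketch. It assumes (this is not stated in the source) that cur_basetime is the timestamp at which the current interval started and that the interval is configured in seconds; the helper name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC INT64_C(1000000000)

/* Time in nanoseconds until the periodic timer should fire, given an
 * unchanged base time and a (possibly just shortened) interval in
 * seconds. Returns 0 if the expiry already lies in the past. */
static int64_t remaining_ns(int64_t cur_basetime_ns,
                            uint32_t interval_sec,
                            int64_t now_ns)
{
    int64_t expiry_ns;

    assert(now_ns >= cur_basetime_ns);
    /* Convert to nanoseconds before adding, to stay in one unit. */
    expiry_ns = cur_basetime_ns + (int64_t) interval_sec * NSEC_PER_SEC;
    return expiry_ns > now_ns ? expiry_ns - now_ns : 0;
}
```

With a base time of 10 s and the clock at 13 s, shortening the interval from 8 s to 4 s only moves the remaining time from 5 s to 1 s; the base time itself stays as it was.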
There can be other reasons why the check was cancelled, not only because
the current item was obsoleted. For example, the caller who scheduled a
check externally, might have cancelled it or NMDevice might be
disposed().
Load the thunderbolt-net module if we see a host-to-host connection
and configure the resulting ethernet connection automatically to be
a link-local only one. The latter is done by setting a new udev
property "NM_AUTO_DEFAULT_LINK_LOCAL_ONLY" which is picked up when
we configure the connection for the device.
https://github.com/NetworkManager/NetworkManager/pull/97
Print a warning whenever we find an IP conflict on the network. In the
future we may export a flag on the device or send a signal so that
clients can notify the user of the conflict.
Don't return an error from nm_arping_manager_start_probe() since it is
currently useless and the arping-manager already prints the failure
reason. Also, drop a log print from add_address().
NMArpingManager previously spawned an arping process for each
probed/announced address and watched it. This has the disadvantage of
being inefficient and also that for small timeouts we can't be sure
that arping actually started the probe.
Switch to an implementation that doesn't need to spawn external
processes, by using the n-acd code [1] currently imported in our
source tree. The long term plan is that n-acd will become a shared
library we can link against.
The file is still called nm-arping-manager out of laziness, even if a
better name would be nm-acd-manager.
[1] https://github.com/nettools/n-acd/
https://bugzilla.redhat.com/show_bug.cgi?id=1507864
"NMSettingsConnectionFlags" was an internal enum. Soon, we will add such
a type in libnm. Avoid the naming conflict by renaming. The "Int" stands
for "internal".
The GObject property "path" exists for the sole reason that
other components can connect to the "notify::path" signal.
However, notifications are blocked by g_object_freeze_notify(),
and especially for NMDBusObject we want to make use of that to
combine multiple PropertiesChanged events into one.
This blocking of the signal is not desired for the case where
we wait for "notify::path". Convert that to a signal, which
will not be blocked.
Essentially, nm_connection_get_path() mirrors nm_dbus_object_get_path().
However, when cloning a simple-connection, the path also gets cloned.
I think this field doesn't belong to NMConnection in the first place,
because NMConnection is not a D-Bus object. NMSettingsConnection (in
core) and NMRemoteConnection (in libnm) are.
Don't use the misleading alias, but use nm_dbus_object_get_path()
directly.
Currently we overwrite the interface rp_filter value with 2 ("loose")
only when it is 1 ("strict") because when it is 0 ("no validation") it
is already more permissive.
So, if the value for the interface is 0 and
net/ipv4/conf/all/rp_filter is 1 (like it happens by default on Fedora
28), we don't overwrite it; since kernel considers the maximum between
{all,$dev}/rp_filter, the effective value remains 'strict'.
We should instead combine the two {all,$dev}/rp_filter, and if it's 1
overwrite the value with 2.
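The combined rule can be written down as a small helper. This is a sketch only: the function and enum names are hypothetical, and reading and writing the sysctl files is left out.

```c
#include <assert.h>

/* rp_filter values as used by the kernel sysctl. */
enum {
    RP_FILTER_NONE   = 0,
    RP_FILTER_STRICT = 1,
    RP_FILTER_LOOSE  = 2,
};

/* The kernel uses the maximum of {all,$dev}/rp_filter. */
static int rp_filter_effective(int all_value, int dev_value)
{
    assert(all_value >= 0 && all_value <= 2);
    assert(dev_value >= 0 && dev_value <= 2);
    return all_value > dev_value ? all_value : dev_value;
}

/* Returns the value to write to the interface's rp_filter, or -1 if
 * nothing needs to be written. Only an effective "strict" setting is
 * overwritten with "loose". */
static int rp_filter_override(int all_value, int dev_value)
{
    if (rp_filter_effective(all_value, dev_value) == RP_FILTER_STRICT)
        return RP_FILTER_LOOSE;
    return -1;
}
```

The case from the text, all=1 and dev=0, now yields an override to 2, whereas looking at the interface value alone would have left it untouched.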
https://bugzilla.redhat.com/show_bug.cgi?id=1565529
There are multiple tests with the same name in different directories; add a
unique prefix to test names so that it is clear from the output which
one is running.
Otherwise, if connectivity checking was disabled, we would never
reset the connectivity state and leave it wrongly at UNKNOWN.
nm_device_check_connectivity_update_interval() is already called
during state-changes, so this is the right place. However,
it's still far from perfect, because we might not notice when
a default-route gets added or removed. Also, devices that are not
in ACTIVATED state are considered to have connectivity NONE, which
might not be correct.
Fixes: 0a62a0e903
NMManager very much cares about changes to the connectivity state
of the device and was therefore listening to notify::connectivity
signals. However, property changed signals can be suppressed by
g_object_freeze_notify(). That is something we even encourage for
NMDBusObject instances, because the D-Bus glue makes use of the
property changed notifications, and encourages to combine multiple
changes by freezing the signal.
Using the property changed notifications of NMDBusObject instances is
ugly. Don't do that and instead add a special signal.
It might happen that connectivity is lost only for a moment and
returns soon after. Based on that assumption, when we lose connectivity
we want to have a probe interval where we check for returning
connectivity more frequently.
For that, we handle tracking of the timeouts per-device.
The interval shall start at 1 second, and double until
the full interval is reached. Actually, due to the implementation, it's unlikely
that we already perform the second check 1 second later. That is because commonly
the first check returns before the one second timeout is reached and bumps the
interval to 2 seconds right away.
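The doubling schedule can be sketched as follows. The helper name is hypothetical, and the 300 second maximum used in the example below is just an illustrative value, not taken from the source.

```c
#include <assert.h>
#include <stdint.h>

/* Next probe interval in seconds: double the current one, but never
 * exceed the configured full interval. The comparison against
 * max_sec / 2 avoids overflow when doubling. */
static uint32_t next_probe_interval(uint32_t cur_sec, uint32_t max_sec)
{
    assert(cur_sec > 0 && max_sec > 0);
    if (cur_sec > max_sec / 2)
        return max_sec;
    return cur_sec * 2;
}
```

Starting at 1 second with a full interval of 300 seconds, the sequence is 1, 2, 4, 8, 16, 32, 64, 128, 256, 300, and it stays at 300 from then on.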
Also, we go to extra lengths so that manual connectivity checks
delay the periodic checks. By being smarter about that, we can reduce
the number of connectivity checks, while still keeping the promise to
check at least within the requested interval.
The complexity of bookkeeping the timeouts is remarkable. But I think
it is worth the effort and we should try hard to
- have a connectivity state as accurate as possible. Clearly,
connectivity checking means that we are probing, so being more intelligent
about timeout and backoff timers can result in a better connectivity
state. The connectivity state is important because we use it for
the default-route penalty and the GUI indicates bad connectivity.
- be intelligent about avoiding redundant connectivity checks. While
we want to check often to get an accurate connectivity state, we
also want to minimize the number of HTTP requests, in case the
connectivity is established and supposedly stable.
Also, perform connectivity checks in every state of the device.
Even if a device is disconnected, it still might have connectivity,
for example if the user externally adds an IP address on an unmanaged
device.
https://bugzilla.gnome.org/show_bug.cgi?id=792240
An asynchronous request should either be cancellable or not keep
the target object alive. Preferably both.
Otherwise, it is impossible to do a controlled shutdown when terminating
NetworkManager. Currently, when NetworkManager is about to terminate,
it just quits the mainloop and essentially leaks everything. That is a
bug. If we ever want to fix that, every asynchronous request must be
cancellable in a controlled way (or it must not prevent objects from
getting disposed, where disposing the object automatically cancels the
callback).
Rework the asynchronous request for connectivity check to
- return a handle that can be used to cancel the operation.
Cancelling is optional. The caller may choose to ignore the handle
because the asynchronous operation does not keep the target object
alive. That means, it is still possible to shutdown, by everybody
giving up their reference to the target object. In which case the
callback will be invoked during dispose() of the target object.
- also, the callback will always be invoked exactly once, and never
synchronously from within the asynchronous start call. But during
cancel(), the callback is invoked synchronously from within cancel().
Note that it's only allowed to cancel an action at most once, and
never after the callback is invoked (also not from within the callback
itself).
- also, NMConnectivity already supports a fake handler, in case
connectivity check is disabled via configuration. Hence, reuse
the same code paths also when compiling without --enable-concheck.
That means, instead of having #if WITH_CONCHECK at various callers,
move them into NMConnectivity. The downside is, that if you build
without concheck, there is a small overhead compared to before. The
upside is, we reuse the same code paths when compiling with or without
concheck.
- also, the patch synchronizes the connectivity states. For example,
previously `nmcli networking connectivity check` would schedule
requests in parallel, and return the accumulated result of the individual
requests.
However, the global connectivity state of the manager might not have
been the same as the answer to the explicit connectivity check,
because while the answer for the manual check is waiting for all
pending checks to complete, the global connectivity state could
already change. That is just wrong. There are not multiple global
connectivity states at the same time, there is just one. A manual
connectivity check should have the meaning of ensuring that the global
state is up to date, but it still should return the global
connectivity state -- not the answers for several connectivity checks
issued in parallel.
This is related to commit b799de281b
(libnm: update property in the manager after connectivity check),
which tries to address a similar problem client side.
Similarly, each device has a connectivity state. While there might
be several connectivity checks per device pending, whenever a check
completes, it can update the per-device state (and return that device
state as result), but the immediate answer of the individual check
might not matter. This is especially the case when a later request
returns earlier and obsoletes all earlier requests. In that case,
earlier requests return with the result of the current device's
connectivity state.
This patch cleans up the internal API and gives a better defined behavior
to the user (thus, the simple API which simplifies implementation for the
caller). However, the implementation of getting this API right and properly
handling cancel and destruction of the target object is more complicated and
complex. But this is not just for the sake of a nicer API. This fixes
actual issues explained above.
Also, get rid of GAsyncResult to track information about the pending request.
Instead, allocate our own handle structure, which ends up to be nicer
because it's strongly typed and has exactly the properties that are
useful to track the request. Also, it gets rid of the awkward
_finish() API by passing the relevant arguments to the callback
directly.
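The handle contract described above (callback invoked exactly once, never from within the start call, synchronously from within cancel) can be sketched in plain C. All names are hypothetical and the real implementation is considerably more involved; in particular, the mainloop, the link to the target object and the dispose() path are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical connectivity states for the sketch. */
typedef enum { CONN_UNKNOWN, CONN_NONE, CONN_FULL } ConnState;

typedef struct CheckHandle CheckHandle;
typedef void (*CheckCallback)(CheckHandle *handle, ConnState state,
                              bool cancelled, void *user_data);

/* Strongly typed handle that carries exactly what the request needs. */
struct CheckHandle {
    CheckCallback callback;
    void *user_data;
};

/* Starts a request. The callback is never invoked from here. */
static CheckHandle *check_start(CheckCallback callback, void *user_data)
{
    CheckHandle *h = malloc(sizeof(*h));

    assert(h && callback);
    h->callback = callback;
    h->user_data = user_data;
    return h;
}

/* Single exit path: invoke the callback once, then destroy the handle.
 * The handle must not be used (or cancelled) after this point. */
static void check_finish(CheckHandle *h, ConnState state, bool cancelled)
{
    h->callback(h, state, cancelled, h->user_data);
    free(h);
}

/* Called by the (not modelled) mainloop when the result arrives. */
static void check_complete(CheckHandle *h, ConnState state)
{
    check_finish(h, state, false);
}

/* Cancelling invokes the callback synchronously, at most once. */
static void check_cancel(CheckHandle *h)
{
    check_finish(h, CONN_UNKNOWN, true);
}

/* Example callback that records what it was told. */
typedef struct {
    int calls;
    bool last_cancelled;
    ConnState last_state;
} CallLog;

static void log_cb(CheckHandle *handle, ConnState state,
                   bool cancelled, void *user_data)
{
    CallLog *log = user_data;

    (void) handle;
    log->calls++;
    log->last_cancelled = cancelled;
    log->last_state = state;
}
```

Passing the result as callback arguments is what makes a separate _finish() step unnecessary in this sketch.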
For completeness, extend the API to support non-persistent
devices. That requires that nm_platform_link_tun_add()
returns the file descriptor.
While NetworkManager doesn't create such devices itself,
it recognizes the IFLA_TUN_PERSIST / IFF_PERSIST flag.
Since ip-tuntap (obviously) cannot create such devices,
we cannot add a test for how non-persistent devices look
in the platform cache. Well, we could instead add them
with ioctl directly, but instead, just extend the platform
API to allow for that.
Also, use the function from test-lldp.c to (optionally) use
nm_platform_link_tun_add() to create the tap device.
The kernel does not allow configuring a route via a gateway if the
gateway is not directly reachable.
For non-manually added routes (e.g. from DHCP), we ignore them as
server configuration errors. For manually added routes, we try to work
around the problem.
Note that if the user adds a manual route that references a gateway,
maybe he should be required to also add a matching onlink route for
the gateway (or an address that results in a device-route), otherwise
the configuration could be considered invalid. That was however not
done historically, and also, it seems a rather unhelpful behavior.
NetworkManager should just make it work, and not assume anything is
wrong with the configuration. Similarly, for IPv4, the user could
configure the route as onlink, however, that still requires extra
configuration of which the user might not be aware.
This would apply for example, when a connection has method=auto,
and would obtain the routes automatically. It seems sensible to
allow the user to add a route via the gateway, if he ~knows~ that
this particular network will provide such a configuration via DHCP.
In the past however, we tried not to automatically add a device route,
but instead see whether we will get a suitable route via DHCP. If we
wouldn't get such a route, we would however fail the connection.
However, this is really very hard to get right.
We call ip_config_merge_and_apply() possibly before receiving automatic
IP configuration (commit 7070d17ced, "device: reset
@con_ip6_config on failure before RA"). In this case, we could not yet
configure the route. However, we also cannot fail (yet), because we should
wait to see whether we will receive a route that makes this configuration
feasible.
That is hard to get right. How long should we wait? If we get a DHCP lease
and still cannot add the route, should we fail the IP configuration or wait
longer for another lease? Worse, if we decide to fail the IP configuration,
it might not fail the entire activation. Instead, we would only mark the
current address family as failed. If we later get a DHCP lease, should we
retry to add the route again? -- probably yes. If we still fail, we would
need to keep the IP configuration in failed state, regardless that DHCP
succeeded. Part of the problem is, that we are bad at tracking the
failed state per IP method. So, if manual configuration fails but DHCP
succeeds, we get the state wrong. That should be fixed separately, but it
just shows how hard it is to have this route that we currently cannot
add, and wanting to wait for something that might never come, but still
fail at some point.
Instead, if we cannot add a route due to a missing onlink gateway,
just retry and add the /32 or /128 direct route ourselves.
Note that for IPv6 routes that have a "src" address which is still
TENTATIVE, we also cannot currently add the route and retry later.
However, that is fundamentally different, because:
- the configuration here is correct, it's only that the address
didn't yet pass IPv6 DAD and kernel is being unhelpful (rh#1457196).
- we only have to wait a few seconds for DAD to complete or fail.
So, it's easy to implement this sensibly.
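The fallback for a missing onlink gateway (add a direct host route to the gateway, then retry) can be illustrated with a toy IPv4 route table. Everything here is a hypothetical model: the table, the reachability rule and the function names only imitate the behavior described above and are not the platform API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_ROUTES 16

/* IPv4 addresses in host byte order; gateway 0 means a direct route. */
typedef struct {
    uint32_t dest;
    uint8_t plen;
    uint32_t gateway;
} Route;

typedef struct {
    Route routes[MAX_ROUTES];
    int n;
} RouteTable;

static uint32_t prefix_mask(uint8_t plen)
{
    return plen == 0 ? 0 : UINT32_C(0xFFFFFFFF) << (32 - plen);
}

/* A gateway is reachable if some direct route covers its address. */
static bool gateway_reachable(const RouteTable *t, uint32_t gw)
{
    for (int i = 0; i < t->n; i++) {
        const Route *r = &t->routes[i];

        if (r->gateway == 0 && ((gw ^ r->dest) & prefix_mask(r->plen)) == 0)
            return true;
    }
    return false;
}

/* Models the kernel: refuses a route whose gateway is not reachable. */
static bool route_add(RouteTable *t, Route r)
{
    assert(r.plen <= 32);
    if (t->n >= MAX_ROUTES)
        return false;
    if (r.gateway != 0 && !gateway_reachable(t, r.gateway))
        return false;
    t->routes[t->n++] = r;
    return true;
}

/* On failure, add a /32 direct route to the gateway and retry once. */
static bool route_add_with_fallback(RouteTable *t, Route r)
{
    Route direct = { r.gateway, 32, 0 };

    if (route_add(t, r))
        return true;
    if (r.gateway == 0)
        return false;
    if (!route_add(t, direct))
        return false;
    return route_add(t, r);
}
```

For IPv6 the same idea applies with a /128 route.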
The device must not directly add addresses or routes. Instead,
it must track the addresses/routes it wants to add in the NMIP6Config.
Otherwise, during reapply, the information is lost and the next
sync will remove them.
Fixes-test: @ipv6_preserve_cached_routes
Similar cases of such a field are named "manager". Also,
internal names shall not have an "nm" prefix, contrary
to names in a header file, which shall have such a prefix.
I find it slightly nicer and more explicit. Also, the list elements
are, strictly speaking, private; we had better not use them explicitly
outside of NMManager/NMDevice. The macro hides this.
- no longer track APs in a hash table with their exported path
as key. The exported path is already tracked by NMDBusManager's
lookup index, so we can reuse that for fast lookup by path. Otherwise,
track the APs in a CList per device.
- as we now track APs in a CList, their order is well defined.
We no longer need to sort APs, which obsoletes nm_wifi_aps_get_sorted()
and simplifies nm_wifi_aps_find_first_compatible().
Instead of using a GSList for tracking the devices, use a CList.
I think a CList is in most cases the more suitable data structure
than GSList:
- you can find out in O(1) whether the object is linked. That
is nice, for example to assert in NMDevice's destructor that
the object was unlinked, and we will use that later in
nm_manager_get_device_by_path().
- you can unlink the element in O(1) and you can unlink the
element without having access to the link's head
- Contrary to GSList, this does not require an extra slice
allocation for the link node. It quite possibly consumes
slightly less memory because the CList structure is embedded
in a struct that we already allocate. Even if slice allocation
were perfect and consumed only 2*sizeof(gpointer) for the link
node, it would at most be as good as CList. Quite possibly,
there is an overhead though.
- CList possibly has better memory locality, because the link
structure and the data are close to each other.
Something which could be seen as a disadvantage is that with CList
one device can only be tracked in one NMManager instance at a time.
But that is fine. There exists only one NMManager instance for now,
and even if we would ever introduce multiple managers, we probably
would not associate one NMDevice instance with multiple managers.
The advantages are arguably not huge, but CList is IMHO clearly the
more suited data structure. No need to stick to a suboptimal data
structure for the job. Refactor it.
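The properties listed above follow from the list node being embedded in the tracked object. The following is a minimal intrusive circular list written for illustration; it is not the CList implementation itself, and all names (Link, Device, devices_lst, DEVICE_FROM_LINK) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Link node that gets embedded in the tracked struct. */
typedef struct Link {
    struct Link *next;
    struct Link *prev;
} Link;

/* An unlinked node (and an empty list head) points to itself. */
static void link_init(Link *l)
{
    l->next = l;
    l->prev = l;
}

/* O(1) check whether the element is currently in some list. */
static bool link_is_linked(const Link *l)
{
    return l->next != l;
}

/* Append at the tail, so insertion order is preserved. */
static void link_append(Link *head, Link *elem)
{
    assert(!link_is_linked(elem));
    elem->prev = head->prev;
    elem->next = head;
    head->prev->next = elem;
    head->prev = elem;
}

/* O(1) unlink; note that the list head is not needed. */
static void link_unlink(Link *elem)
{
    elem->prev->next = elem->next;
    elem->next->prev = elem->prev;
    link_init(elem);
}

/* The tracked object: no separate allocation for the link node. */
typedef struct {
    int ifindex;
    Link devices_lst;
} Device;

/* Get back from the embedded link to the containing object. */
#define DEVICE_FROM_LINK(l) \
    ((Device *) ((char *) (l) - offsetof(Device, devices_lst)))

static int count_devices(const Link *head)
{
    int n = 0;

    for (const Link *l = head->next; l != head; l = l->next)
        n++;
    return n;
}
```

Because the link lives inside Device, one Device can be in only one such list at a time, which is the limitation mentioned above.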
These fields have the same purpose for IPv4 and IPv6. Also, they have an alias
with name _x, that can be indexed by an IS_IPv4 1/0 value.
Rename the fields so that the distinguisher 4/6/x is at the end. The point
is to make the names more similar.
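As an illustration of the naming scheme, here is a sketch of a pair of fields with an _x alias, using C11 anonymous structs and unions. The field name ip_state and the struct are hypothetical; only the _4/_6/_x suffix convention and the indexing by an IS_IPv4 1/0 value come from the text.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    union {
        struct {
            int ip_state_6; /* index 0: IS_IPv4 == 0 */
            int ip_state_4; /* index 1: IS_IPv4 == 1 */
        };
        int ip_state_x[2]; /* alias, indexed by IS_IPv4 */
    };
} DevicePriv;

/* Make sure the alias really overlays the named fields. */
_Static_assert(offsetof(DevicePriv, ip_state_6) == offsetof(DevicePriv, ip_state_x),
               "ip_state_6 must alias ip_state_x[0]");
_Static_assert(offsetof(DevicePriv, ip_state_4)
                   == offsetof(DevicePriv, ip_state_x) + sizeof(int),
               "ip_state_4 must alias ip_state_x[1]");

/* Address-family independent accessor using the _x alias. */
static int get_ip_state(const DevicePriv *priv, int is_ipv4)
{
    assert(is_ipv4 == 0 || is_ipv4 == 1);
    return priv->ip_state_x[is_ipv4];
}
```

With the distinguisher at the end, ip_state_4, ip_state_6 and ip_state_x share a common prefix and sort next to each other.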