Instead of passing the interval for the timeout, let concheck_periodic_schedule_do()
figure it out on its own. It only depends on cur-interval and
cur-basetime.
Additionally, pass now_ns timestamp, because we already made
decisions based on this particular timestamp. We don't want to
re-evalutate the current time but ensure to use the same timestamp.
There is no change in behavior, it just seems nicer this way.
When the device-state changes to "activated", force a connectivity check
right away. Something possibly happened that affected connectivity.
Also, reduce the interval time down to CONCHECK_P_PROBE_INTERVAL to
start probing again.
if device *is* a NM_DEVICE_IWD, then make sure to not pass that to
_nm_device_wifi_request_scan (which asserts on anything else than a
NM_DEVICE_WIFI device).
The crash can be triggered by enabling wifi.backend=iwd and clicking
on the 'select network' item in gnome shell for example. The journal
output looks like this:
NetworkManager[1861]: invalid cast from 'NMDeviceIwd' to 'NMDeviceWifi'
NetworkManager[1861]: **
NetworkManager[1861]: NetworkManager:ERROR:src/devices/wifi/nm-device-wifi.c:1127:_nm_device_wifi_request_scan: assertion failed: ((((__extension__ ({ GTypeInstance *__inst = (GTypeInstance*) ((_obj)); GType __t = ((nm_device_wifi_get_type ())); gboolean __r; if (!__inst) __r = (0); else if (__inst->g_class && __inst->g_class->g_type == __t) __r = (!(0)); else __r = g_type_check_instance_is_a (__inst, __t); __r; })))))
systemd[1]: NetworkManager.service: Main process exited, code=dumped, status=6/ABRT
systemd[1]: NetworkManager.service: Failed with result 'core-dump'.
Fixes: 297d4985abhttps://github.com/NetworkManager/NetworkManager/pull/107
Without this, nm_device_get_type_description() would quite likely
return "ethernet" for NMDeviceVeth types. This is wrong and was
broken recently.
Fixes: 0775602574
During shutdown, we will need to still iterate the main loop
to do a coordinated shutdown. Currently we do not, and we just
exit, leaving a lot of objects hanging.
If we are going to fix that, we need during shutdown tell
NMDBusManager to reject all future operations.
Note that property getters and "GetManagerObjects" call is not
blocked. It continues to work.
Certainly for some operations, we want to allow them to be called even
during shutdown. However, these have to opt-in.
This also fixes an uglyness, where nm_dbus_manager_start() would
get the set-property-handler and the @manager as user-data. However,
NMDBusManager will always outlife NMManager, hence, after NMManager
is destroyed, the user-data would be a dangling pointer. Currently
that is not an issue, because
- we always leak NMManager
- we don't run the mainloop during shutdown
nm_settings_add_connection_dbus() has two callers. One of them is NMManager
during AddAndActivate. In this case, the NMActiveConnection already created
an auth-subject. Re-use it.
Note how creating an auth-subject involves reading procfs to determine
whether the process still exists. This is not about the additional
overhead of that, but about the race where the process could drop
of in the meantime. The calling process might be gone now, and we would
fail creating the auth-subject. There is no need for that, because we
already evaluated all information we need. Quite likely, in the case
of this race, PolicyKit will also determine that the process is gone
and fail authorization too. But that's PolicyKit's decision to make,
not nm_settings_add_connection_dbus()'s.
We cannot just fire off asynchronous actions without keeping a handle
to them. Otherwise, it's impossible for NMManager to know which
asynchronous operations are pending, and more importantly: it cannot
cancel them.
One day, I want that we do a clean shutdown, where NetworkManager stops
all pending operations, and cleans up everything. That implies, that
every operation is cancellable in a timely manner.
Rework pending nm_active_connection_authorize() calls to be tracked in a
list, so that they are still reachable to NMManager. Note that currently
NMManager does not yet try to cancel these operations ever. However, it
would now be possible to do so.
The async authorization request also carries user-data and its result
must always be handled. For example, it might carry a GDBusMethodInvocation
context, which must be returned and freed.
Hence, when cancelling the request, we must always invoke the callback.
Also, when the NMActiveConnection progresses to state disconnected,
automatically abort the authorization request.
Previously, nm_active_connection_authorize() accepts two user-data
pointers for convenience.
nm_active_connection_authorize() has three callers. One only requires
one user-data, one passes two user-data pointers, and one requires
three pointer.
Also, the way how the third passes the user data (via
g_object_set_qdata_full()) is not great.
Let's only use one user-data pointer. We commonly do that, and it's easy
enough to allocate a buffer to pack multiple pointers together.
We currently start the bus manager only after the creation of a
NMManager because the NMManager is needed to handle set-property bus
calls. However, objects created by NMManager
(e.g. NMDnsSystemdResolved) need a bus connection and so their
initialization currently fail.
To fix this, split nm_dbus_manager_start() in two parts: first only
create the connection and acquire the bus. After this step the
NMManager can be set up. In the second step, set NMManager as the
set-property handler and start exporting objects on the bus.
Fixes: 297d4985ab
Otherwise, the order is undefined and unstable. If you call
GetManagedObjects() on D-Bus multiple times, it's a very nice
property if the diff is small and not full not noise.
It can easily happen that connectivity checks take a long time to
complete (up to 20 seconds, when they time out).
So, before, during the first 20 seconds no connectivity checks would
return and bump the periodic interval. That meant, for the first 20
seconds we would each second schedule a periodic check.
Then, the checks start timing out, each one second apart as we scheduled
them. Previously, during each completion of the checks, we would bump
the interval every second.
Fix that two ways:
1) when the timer expires, also check whether there are still uncomplete
periodic checks. If there are, already bump the interval at that point.
2) at the same time, when this happens mark the handle so that when
they later complete, that they no longer cause another increase of the
interval (no-bump).
Now the bumping is done either by the timeout, or by the completion of
the request. Whatever happens first.
In concheck_periodic_timeout_cb(), we are not sure that we were
scheduled with the current interval. Instead, the timer might
just cover a part of the interval, for example while resetting
the timer interval.
We must always reschedule the timer.
A larger issue is that concheck_periodic_schedule_do() requires an
interval in nanoseconds scale. We passed the wrong timeout there.
A smaller issue is, when we reset the max_interval to something
shorter, *and* the previously schedule timeout is pending for a shorter
time than the new new max-interval, we only need to re-adjust the
timeout, but keep cur_basetime unchanged.
There can be other reasons why the check was cancelled, not only because
the current item was obsoleted. For example, the caller who scheduled a
check externally, might have cancelled it or NMDevice might be
disposed().
Load the thunderbolt-net module if we see a host-to-host connection
and configure the resulting ethernet connection automatically to be
a link-local only one. The latter is done by setting a new udev
property "NM_AUTO_DEFAULT_LINK_LOCAL_ONLY" which is picked up when
we configure the connection for the device.
https://github.com/NetworkManager/NetworkManager/pull/97
If we can't generate a connection and maybe_later is TRUE, it means
that the device can generate/assume connections but it failed for the
moment due to missing master/slaves/addresses. In this case, just
assume the connection from state file.
https://bugzilla.redhat.com/show_bug.cgi?id=1551958
When the D-Bus name is already taken, NM crashes in the following
way. That's because disposed object are not unexported when quitting
and so they linger in the bus-manager's list of exported objects,
causing an invalid access when a neighboring item is accessed. Instead
of just clearing the path, fully unexport the object.
The behavior of not forcefully exporting objects on quit was added in
f9ee20a7b2 ("core: explicitly unexport objects when we're done with
them"), but such behavior doesn't seem to be needed by the stated
goal.
<error> [1524062008.1886] bus-manager: fatal failure to acquire D-Bus service "org.freedesktop.NetworkManager" (3). Service already taken
<trace> [1524062008.2327] config: state: success writing state file "/var/lib/NetworkManager/NetworkManager.state"
<trace> [1524062008.2338] dns-mgr: stopping...
<info> [1524062008.2344] exiting (error)
<debug> [1524062008.2628] disposing NMManager singleton (0xce587e0)
<trace> [1524062008.2640] dns-mgr: disposing
<debug> [1524062008.2651] disposing NMDnsManager singleton (0xceb8b50)
<debug> [1524062008.2666] disposing NMFirewallManager singleton (0xceb62b0)
<debug> [1524062008.2709] disposing NMHostnameManager singleton (0xce7b370)
<trace> [1524062008.2722] dbus-object[0xce70f40]: unexport: "/org/freedesktop/NetworkManager/AgentManager"
==16381== Invalid write of size 8
==16381== at 0x42F511: c_list_unlink_stale (c-list.h:158)
==16381== by 0x42F511: c_list_unlink (c-list.h:171)
==16381== by 0x42F511: _nm_dbus_manager_obj_unexport (nm-dbus-manager.c:1135)
==16381== by 0x4C5E35: nm_dbus_object_unexport (nm-dbus-object.c:165)
==16381== by 0x5C01E9: dispose (nm-agent-manager.c:1634)
==16381== by 0x6636F37: g_object_unref (gobject.c:3303)
==16381== by 0x4BDC89: _nm_singleton_instance_destroy (nm-core-utils.c:138)
==16381== by 0x400FA85: _dl_fini (in /usr/lib64/ld-2.27.so)
==16381== by 0x7F806AB: __run_exit_handlers (in /usr/lib64/libc-2.27.so)
==16381== by 0x7F807DB: exit (in /usr/lib64/libc-2.27.so)
==16381== by 0x41DA34: main (main.c:463)
==16381== Address 0xce706a0 is 48 bytes inside a block of size 176 free'd
==16381== at 0x4C2EDAC: free (vg_replace_malloc.c:530)
==16381== by 0x6ACA3E1: g_free (gmem.c:194)
==16381== by 0x6AE2572: g_slice_free1 (gslice.c:1136)
==16381== by 0x66550AE: g_type_free_instance (gtype.c:1943)
==16381== by 0x4505F8: dispose (nm-manager.c:6867)
==16381== by 0x6636F37: g_object_unref (gobject.c:3303)
==16381== by 0x4BDC89: _nm_singleton_instance_destroy (nm-core-utils.c:138)
==16381== by 0x400FA85: _dl_fini (in /usr/lib64/ld-2.27.so)
==16381== by 0x7F806AB: __run_exit_handlers (in /usr/lib64/libc-2.27.so)
==16381== by 0x7F807DB: exit (in /usr/lib64/libc-2.27.so)
==16381== by 0x41DA34: main (main.c:463)
==16381== Block was alloc'd at
==16381== at 0x4C2DBAB: malloc (vg_replace_malloc.c:299)
==16381== by 0x6ACA2D5: g_malloc (gmem.c:99)
==16381== by 0x6AE1E36: g_slice_alloc (gslice.c:1025)
==16381== by 0x6AE247C: g_slice_alloc0 (gslice.c:1051)
==16381== by 0x6654E09: g_type_create_instance (gtype.c:1848)
==16381== by 0x66376C7: g_object_new_internal (gobject.c:1799)
==16381== by 0x6638E14: g_object_new_with_properties (gobject.c:1967)
==16381== by 0x66399D0: g_object_new (gobject.c:1639)
==16381== by 0x5D6F18: nm_settings_new (nm-settings.c:1897)
==16381== by 0x4514B4: constructed (nm-manager.c:6489)
==16381== by 0x66378FA: g_object_new_internal (gobject.c:1839)
==16381== by 0x6638E14: g_object_new_with_properties (gobject.c:1967)
https://github.com/NetworkManager/NetworkManager/pull/96
Introduce a new ifcfg-rh variable ACD_TIMEOUT that stores the exact
value of ipv4.dad-timeout without rounding. We still write the
initscripts-compatible ARPING_WAIT variable, and read it when
ACD_TIMEOUT is missing.
Print a warning whenever we find a IP conflict on the network. In the
future we may export a flag on the device or send a signal so that
clients can notify the user of the conflict.
Don't return an error from nm_arping_manager_start_probe() since it is
currently useless and the arping-manager already prints the failure
reason. Also, drop a log print from add_address().
NMArpingManager previously spawned an arping process for each
probed/announced address and watched it. This has the disadvantage of
being inefficient and also that for small timeouts we can't be sure
that arping actually started the probe.
Switch to an implementation that doesn't need to spawn external
processes, by using the n-acd code [1] currently imported in our
source tree. The long term plan is that n-acd will become a shared
library we can link against.
The file is still called nm-arping-manager for lazyness, even if a
better name would be nm-acd-manager.
[1] https://github.com/nettools/n-acd/https://bugzilla.redhat.com/show_bug.cgi?id=1507864
During _new_active_connection() we just create the NMActiveConnection
instance to proceed with authorization. The caller might not even
authorize, so we must not touch the device yet.
Do that only later.
Often, functions perform a series of steps, and when they fail,
they bail out. It's simpler if the code is structured that way,
so you can read it from top to bottom and whenever something is
wrong, either return directly (or goto a cleanup label at the
bottom).
From the D-Bus layer, no specific-object is represented by "/". We
should early on normalize such values to NULL, and not expect or
handle them later (like during _new_active_connection()).
Merge _new_vpn_active_connection() into _new_active_connection(). It was the
only caller, and it is simpler to have all the code visible at one place.
That also shows, that the device argument is ignored and not handled.
Ensure that no device is specified for VPN type activations.
Also, in _add_and_activate_auth_done(), always steal the connection
from active's user-data. Otherwise, the lifetime of the connection
is extended until active gets destroyed. For example, if we would leak
active, we would also leak connection that way.