There is only one user of the NM_DEVICE_MASTER property (NMActiveConnection).
device_master_changed() uses the property for change notifications,
but returns early if !nm_device_get_master().
We might emit the "notify::master" signal too often, even when nothing
changed (because currently we always emit it when priv->master changes).
If we would fix that to only emit the signal when nm_device_get_master()
actually changes, it would still be correct, because device_master_changed()
doesn't care about notifications when the master is unset.
Hence, the only user doesn't care about the difference. So change it because
it's unexpected that the property and function are not completely equal.
Commit cd3df12c8f reused the
virtual function component_added() to notify the vlan device
about a possibly new parent.
This reuse of the virtual function for another purpose is confusing.
Clean that up by splitting the implementation and add a new
virtual function nm_device_notify_new_device_added() which gets
(only implemented by NMDeviceVlan).
When deleting a master-device either via `nmcli device delete`
or `ip link delete`, the slave-device would hang.
This seems to be broken for a very long time already.
This is due to the following:
#0 0x00005555555f548c in nm_device_slave_notify_release (self=0x555555dc1300, reason=NM_DEVICE_STATE_REASON_NONE) at devices/nm-device.c:2175
#1 0x00005555555d6de2 in nm_device_release_one_slave (self=0x555555de3dd0, slave=0x555555dc1300, configure=0, reason=NM_DEVICE_STATE_REASON_NONE) at devices/nm-device.c:1117
#2 0x00005555555f02b7 in device_link_changed (self=0x555555dc1300) at devices/nm-device.c:1460
Previously, nm_device_slave_notify_release() being called with reason
"NONE" did not actually transition the device-state, thus keeping the
device wrongly in activated state.
There were two callers that passed configure=FALSE to nm_device_release_one_slave(),
(and thus reason=NONE to nm_device_slave_notify_release()):
- (1) device_link_changed():
nm_device_release_one_slave (priv->master, self, FALSE, /*wrong reason NONE*/);
- (2) nm_device_removed():
nm_device_release_one_slave (priv->master, self, FALSE, NM_DEVICE_STATE_REASON_REMOVED);
We should always change the device-state during nm_device_slave_notify_release()
regardless of the reason.
(2) was added by commit c83b40aca7, later
refined by commit 5dd48f7527. In a way
change the second fix to perform some of the configuration (but still
not unenslaving the device).
The new object type represents tunnels over IPv4 and IPv6.
We have a single setting type (NMSettingIPTunnel) for tunnels and it
can't be shared among different device factories. So we define also a
single device type for all tunnels.
This new object will also represent GRE tunnels, which before were
instantiated as NMDeviceGre and had a ".Device.Gre" D-Bus
interface. This commit introduces a change in behavior.
This enum was unused and meaningless because the platform signals
are emitted as a consequence of netlink messages. It is not clear
whether a netlink message was received due to an external event
or an internal action.
The DHCP client from new libsystemd-network requires a link-local IPv6
address to be passed to the library; add a new argument to
nm_dhcp_manager_start_ip6() and related functions.
When deconfiguring a device, we must also explicitly clear the
default-route -- unless the device was assumed.
This can easily reproduced by disconnecting the cable from the
wired connection that has the default rout. Prevously, the
default-route was not cleared and lingered around.
https://bugzilla.gnome.org/show_bug.cgi?id=757587
Device subclasses can call nm_device_recheck_available() at any time,
and the function would change the device's state to UNKNOWN in cases
where the device was available already. For WWAN devices, availability
is rechecked every time the modem state changes, resulting in:
NetworkManager[28919]: <info> (ttyUSB4): modem state changed, 'disabled' --> 'enabling' (reason: user-requested)
NetworkManager[28919]: <debug> [1445538582.116727] [devices/nm-device.c:2769] recheck_available(): [0x23bd710] (ttyUSB4): device is available, will transition to unknown
NetworkManager[28919]: <info> (ttyUSB4): modem state changed, 'enabling' --> 'searching' (reason: user-requested)
NetworkManager[28919]: <debug> [1445538582.776317] [devices/nm-device.c:2769] recheck_available(): [0x23bd710] (ttyUSB4): device is available, will transition to unknown
Previously most objects were implicitly unexported when they were
destroyed, but since refcounts may make the object live longer than
intended, we should explicitly unexport them when they should no
longer be present on the bus.
This means we can assume that objects will always be un-exported
already when they are destroyed, *except* when quitting where most
objects will live until exit because NM leaves interfaces up and
running on quit.
Otherwise we'd hit an assert and rightly so!
Program received signal SIGTRAP, Trace/breakpoint trap.
g_logv (log_domain=0x5555556b2f80 "NetworkManager", log_level=G_LOG_LEVEL_WARNING, format=<optimized out>, args=args@entry=0x7fffffffcd10) at gmessages.c:1046
1046 g_private_set (&g_log_depth, GUINT_TO_POINTER (depth));
(gdb) bt
#0 g_logv (log_domain=0x5555556b2f80 "NetworkManager", log_level=G_LOG_LEVEL_WARNING, format=<optimized out>, args=args@entry=0x7fffffffcd10) at gmessages.c:1046
#1 0x00007ffff4a4ea3f in g_log (log_domain=log_domain@entry=0x5555556b2f80 "NetworkManager", log_level=log_level@entry=G_LOG_LEVEL_WARNING, format=format@entry=0x7ffff4ac1e4c "%s") at gmessages.c:1079
#2 0x00007ffff4a4ed56 in g_warn_message (domain=domain@entry=0x5555556b2f80 "NetworkManager", file=file@entry=0x5555556aca93 "devices/nm-device.c", line=line@entry=1101,
func=func@entry=0x5555556b22e0 <__FUNCTION__.35443> "nm_device_release_one_slave", warnexpr=warnexpr@entry=0x0) at gmessages.c:1112
#3 0x00005555555ba80a in nm_device_release_one_slave (self=self@entry=0x5555559ec4c0, slave=slave@entry=0x5555559f7800, configure=configure@entry=1, reason=reason@entry=NM_DEVICE_STATE_REASON_NONE)
at devices/nm-device.c:1101
#4 0x00005555555c264b in slave_state_changed (slave=0x5555559f7800, slave_new_state=NM_DEVICE_STATE_FAILED, slave_old_state=NM_DEVICE_STATE_IP_CONFIG, reason=NM_DEVICE_STATE_REASON_NONE, self=0x5555559ec4c0)
at devices/nm-device.c:1700
#5 0x00007ffff339cdac in ffi_call_unix64 () at ../src/x86/unix64.S:76
#6 0x00007ffff339c6d5 in ffi_call (cif=cif@entry=0x7fffffffd1c0, fn=<optimized out>, rvalue=0x7fffffffd130, avalue=avalue@entry=0x7fffffffd0b0) at ../src/x86/ffi64.c:522
#7 0x00007ffff4d45678 in g_cclosure_marshal_generic (closure=0x5555559b0160, return_gvalue=0x0, n_param_values=<optimized out>, param_values=<optimized out>, invocation_hint=<optimized out>, marshal_data=0x0)
at gclosure.c:1454
#8 0x00007ffff4d44e38 in g_closure_invoke (closure=0x5555559b0160, return_value=return_value@entry=0x0, n_param_values=4, param_values=param_values@entry=0x7fffffffd3c0,
invocation_hint=invocation_hint@entry=0x7fffffffd360) at gclosure.c:768
#9 0x00007ffff4d5675d in signal_emit_unlocked_R (node=node@entry=0x55555598a6f0, detail=detail@entry=0, instance=instance@entry=0x5555559f7800, emission_return=emission_return@entry=0x0,
instance_and_params=instance_and_params@entry=0x7fffffffd3c0) at gsignal.c:3553
#10 0x00007ffff4d5e4c1 in g_signal_emit_valist (instance=instance@entry=0x5555559f7800, signal_id=signal_id@entry=72, detail=detail@entry=0, var_args=var_args@entry=0x7fffffffd5f8) at gsignal.c:3309
#11 0x00007ffff4d5ecc8 in g_signal_emit_by_name (instance=instance@entry=0x5555559f7800, detailed_signal=detailed_signal@entry=0x5555556c0405 "state-changed") at gsignal.c:3405
#12 0x00005555555bd0e0 in _set_state_full (self=self@entry=0x5555559f7800, state=state@entry=NM_DEVICE_STATE_FAILED, reason=reason@entry=NM_DEVICE_STATE_REASON_NONE, quitting=quitting@entry=0)
at devices/nm-device.c:8580
#13 0x00005555555be0e7 in nm_device_state_changed (self=self@entry=0x5555559f7800, state=state@entry=NM_DEVICE_STATE_FAILED, reason=reason@entry=NM_DEVICE_STATE_REASON_NONE) at devices/nm-device.c:8741
#14 0x00005555555c0a45 in queued_set_state (user_data=<optimized out>) at devices/nm-device.c:8765
#15 0x00007ffff4a4779a in g_main_dispatch (context=0x5555559433c0) at gmain.c:3109
#16 g_main_context_dispatch (context=context@entry=0x5555559433c0) at gmain.c:3708
#17 0x00007ffff4a47ae8 in g_main_context_iterate (context=0x5555559433c0, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>) at gmain.c:3779
#18 0x00007ffff4a47dba in g_main_loop_run (loop=0x555555943480) at gmain.c:3973
#19 0x000055555559713d in main (argc=1, argv=0x7fffffffdb78) at main.c:512
(gdb)
Device activation normally fails during one of the stages and in that
case the activation chain is implicitly interrupted.
But in some cases the device fails for external events (as a failure
of master connection) while the activation sequence is still running
and so we need to ensure that any pending activation source gets
cleared upon entering the failed state.
https://bugzilla.redhat.com/show_bug.cgi?id=1270814
RFC7217 introduces an alternative mechanism for creating addresses during
stateless IPv6 address configuration. It's supposed to create addresses whose
host part stays stable in a particular network but changes when the hosts
enters another network to mitigate possibility of tracking the host movement.
It can be used alongside RFC 4941 privacy extensions (temporary addresses)
and replaces the use of RFC 4862 interface identifiers.
The address creation mode is controlld by ip6.addr_gen_mode property
(ADDR_GEN_MODE in ifcfg-rh), with values of "stable-privacy" and "eui-64",
defaulting to "eui-64" if unspecified.
The host part of an address is computed by hashing a system-specific secret
salted with various stable values that identify the connection with a secure
hash algorithm:
RID = F(Prefix, Net_Iface, Network_ID, DAD_Counter, secret_key)
For NetworkManager we use these parameters:
* F()
SHA256 hash function.
* Prefix
This is a network part of the /64 address
* Net_Iface
We use the interface name (e.g. "eth0"). This ensures the address won't
change with the change of interface hardware.
* Network_ID
We use the connection UUID here. This ensures the salt is different for
wireless networks with a different SSID as suggested by RFC7217.
* DAD_Counter
A per-address counter that increases with each DAD failure.
* secret_key
We store the secret key in /var/lib/NetworkManager/secret_key. If it's
shorter than 128 bits then it's rejected. If the file is not present we
initialize it by fetching 256 pseudo-random bits from /dev/urandom on
first use.
Duplicate address detection uses IDGEN_RETRIES = 3 and does not utilize the
IDGEN_DELAY delay (despite it SHOULD). This is for ease of implementation
and may change in future. Neither parameter is currently configurable.
NMDevice detects the DAD failures by watching the removal of tentative
addresses (happens for DAD of addresses with valid lifetime, typically
discovered addresses) or changes to addresses with dadfailed flag (permanent
addresses, typically link-local and manually configured addresses).
It retries creation of link-local addresses itself and lets RDisc know about
the rest so that it can decide if it's rdisc-managed address and retry
with a new address.
Currently NMDevice doesn't do anything useful about link-local address DAD
failures -- it just fails the link-local address addition instead of just
timing out, which happened before. RDisc just logs a warning and removes
the address from the list.
However, with RFC7217 stable privacy addresses the use of a different address
and thus a recovery from DAD failures would be possible.
Instead of using libnl-route-3 library to serialize netlink messages,
construct the netlink messages ourselves.
This has several advantages:
- Creating the netlink message ourself is actually more straight
forward then having an intermediate layer between NM and the kernel.
Now it is immediately clear, how a platform request translates to
a netlink/kernel request.
You can look at the kernel sources how a certain netlink attribute
behaves, and then it's immediately clear how to set that (and vice
versa).
- Older libnl versions might have bugs or missing features for which
we needed to workaround (often by offering a reduced/broken/untested
functionality). Now we can get rid or workaround like _nl_has_capability(),
check_support_libnl_extended_ifa_flags(), HAVE_LIBNL_INET6_TOKEN.
Another example is a libnl bug when setting vlan ingress map which
isn't even yet fixed in libnl upstream.
- We no longer need libnl-route-3 at all and can drop that runtime
requirement, saving some 400k.
Constructing the messages ourselves also gives better performance
because we don't have to create the intermediate libnl object.
- In the future we will add more link-type support which is easier
to support by basing directly on the plain kernel/netlink API,
instead of requiring also libnl3 to expose this functionality.
E.g. adding macvtap support: we already parsed macvtap properties
ourselves because of missing libnl support. To *add* macvtap
support, we also would have to do it ourself (or extend libnl).
The peer-address (IFA_ADDRESS) can also be all-zero (0.0.0.0).
That is distinct from an usual address without explicit peer-address,
which implicitly has the same peer and local address.
Previously, we treated an all-zero peer_address as having peer and
local address equal. This is especially grave, because the peer is part
of the primary key for an IPv4 address. So we not only get a property of
the address wrong, but we wrongly consider two different addresses as
one and the same.
To properly handle these addresses, we always must explicitly set the peer.
Usually, the peer-address is the same as the local address.
In case where it is not, it is the peer-address that determines
the IPv4 device-route. So we must use the peer-address.
Also, don't consider device-routes with the first octet of zero,
just like kernel does.
Also, nm_ip4_config_get_subnet_for_host() is effectively the same
as nm_ip4_config_destination_is_direct(). So drop it.
The peer-address seems less important then the prefix-length.
Also, nm_platform_ip4_address_delete() has the peer-address
argument as last.
Soon ip4_address_get() also receives a peer-address argument,
so get the order right first.
This adds a LldpNeighbors property to the Device D-Bus interface
carrying information about devices discovered through LLDP. The
property is an array of hashes and each hash describes the values of
LLDP TLVs for a specific neighbor.
The unmanaged-flag NM_UNMANAGED_EXTERNAL_DOWN is initially set during
nm_device_finish_init(). But it was only set if the device was down at
that point.
If due to a race the platform device was not yet initialized, a later
initialization in device_link_changed() would clear NM_UNMANAGED_PLATFORM_INIT.
If the device is not external-down (because it was already up during
nm_device_finish_init()), the device will be managed right away with
reason NM_DEVICE_STATE_REASON_NOW_MANAGED.
Together with commit e29ab54335, this
is a race that causes a failure to assume the external-down device.
https://bugzilla.redhat.com/show_bug.cgi?id=1269199
Executing:
# brctl addbr lbr0
# ip addr add 10.1.1.1/24 dev lbr0
# ip link set lbr0 up
can result in a race so that NetworkManager would manage the device
(and clear the IP addresses).
It happens, when NetworkManager first receives platform signals that
the device is already up:
signal: link changed: 11: lbr0 <UP,LOWER_UP;broadcast,multicast,up,running,lowerup> mtu 1500 arp 1 bridge* not-init addrgenmode eui64 addr D2:A1:B4:17:18:F2 driver bridge
Note that the device is still unknown via udev (not-init). The
unmanaged-state NM_UNMANAGED_EXTERNAL_DOWN gets cleared, but the
device still stays unmanaged.
Only afterwards the device is known in udev:
signal: link changed: 11: lbr0 <UP,LOWER_UP;broadcast,multicast,up,running,lowerup> mtu 1500 arp 1 bridge* init addrgenmode eui64 addr D2:A1:B4:17:18:F2 driver bridge
At this point, we also clear NM_UNMANAGED_PLATFORM_INIT, making
the device managed with reason NM_DEVICE_STATE_REASON_NOW_MANAGED.
That results in managing the external device.
Fix that by only clearing NM_UNMANAGED_EXTERNAL_DOWN after the device
is no longer NM_UNMANAGED_PLATFORM_INIT.
https://bugzilla.redhat.com/show_bug.cgi?id=1269199
schedule_stage3() used to set the firewall before really
scheduling the stage3. In that case, fw_change_zone_cb() would
then directly call:
activation_source_schedule (self, activate_stage3_ip_config_start, AF_INET);
This was different from all other places. Usually, only the
nm_device_schedule_*() functions would directly call to
activation_source_schedule().
Change this, to behave similar as when we wait for master-ready
while scheduling stage2. As such, it is more idiomatic, and
it would still work correctly even if there were multiple conditions
that would block scheduling-stage3.
Instead of adding explicit logging lines while scheduling and processing
the activation-stages, only log them in activation_source_schedule()
and activation_source_handle_cb().
Effectively, lines like:
"Activation: Stage 1 of 5 (Device Prepare) started..."
are now:
"activation-stage: invoke activate_stage1_device_prepare,2 (id x)"
The function name is logged, so change them to be more suitable:
- drop the "nm_device_" prefix. It is redundant in the logging,
and usually our static functions don't have an nm-prefix.
- have all functions now start with "activate_stagex_". The
stage number corresponds to what we are currently logging
during activation:
"Activation: Stage 1 of 5 (Device Prepare)..."
No other functions start with a "activate_stagex_" prefix,
so it's easy to grep.
Every activation-source handler first cleared the current
activation-source (activation_source_clear()) and later
returned FALSE/G_SOURCE_REMOVE.
Do not directly attach the handlers to g_idle_add().
Instead wrap the actual handler. This wrapper
- always clears the activation source first
- always returns G_SOURCE_REMOVE
- does trace logging about invoking the handler
This also avoids a possibility for a bug where the handler returns
G_SOURCE_REMOVE, without also clearing activation_source_clear().
Don't handle master-ready at the beginning of stage2, but instead while
scheduling (and then possibly delaying the scheduling of stage2).
This seems more idiomatic:
When inside a stage and your part is done: call schedule-next-stage.
That is, always schedule the next stage, not the current one.
schedule-next-stage then might delay to really scheduling until the
device is ready for the next state.
Fixes: 85ac903bb8
During stage2, if the slave detected that it would need to wait for
the master, it would return FALSE (which removes the g-idle-handler).
However, it would not clear the activation-source, so later, when
the master becomes ready, its attempt to schedule stage2 again would
result in an error-log and the idle-handler would not be scheduled
again.
Fixes: 85ac903bb8https://bugzilla.redhat.com/show_bug.cgi?id=1268797https://bugzilla.redhat.com/show_bug.cgi?id=1183444