If NM fails to connect to teamd, it currently just sets the device
state to FAILED and waits for deactivate() to be called later. However,
the 5-second timeout on teamd process start can hit in the meantime,
which fails with an assertion "nm_device_is_activating (device)".
Clean up the device state when the connection to teamd fails.
https://bugzilla.redhat.com/show_bug.cgi?id=1697900
We call GetConnectionUnixProcessID and GetConnectionUnixUser *a lot*.
And we do so synchronously. Both are a problem.
To avoid the first problem, cache the last few requests with each cached
value being valid for one second.
On a quick test, this saves 98% of the requests:
59 GetConnectionUnixProcessID(*)
3201 GetConnectionUnixProcessID(*) (served from cache)
59 GetConnectionUnixUser(*)
3201 GetConnectionUnixUser(*) (served from cache)
Note that now as we serve requests from the cache, it might be the case
that the D-Bus endpoint already disconnected. Previously, the request would
have failed but now we return the cached user-id and process-id. This
problem is mitigated by only caching the values for up to one second.
Also, it's not really a problem because we cache sender names. Those
are supposed to be unique and not repeat. So, even if the peer already
disconnected, it is still true that the corresponding PID/UID was as
we have cached it. We don't use this API for checking whether the peer
is still connected, but what UID/PID it has/had. That answer is still
correct for the cached value after the peer disconnected.
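A minimal sketch of such a cache, for illustration only. The names (SenderCache, cache_lookup, cache_store), the slot count and the round-robin replacement are assumptions and not the actual NetworkManager code; only the one-second validity comes from the description above.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define CACHE_SIZE   8             /* "the last few requests"; the size is an assumption */
#define CACHE_TTL_NS 1000000000LL  /* cached values are valid for one second */

typedef struct {
    char     sender[64];  /* unique D-Bus sender name, e.g. ":1.23" */
    uint32_t value;       /* cached PID or UID */
    int64_t  stamp_ns;    /* monotonic time at which the value was fetched */
    bool     used;
} CacheEntry;

typedef struct {
    CacheEntry entries[CACHE_SIZE];
    unsigned   next;      /* next slot to overwrite (round-robin) */
} SenderCache;

/* Returns true and sets *out if a fresh entry for @sender exists. */
static bool
cache_lookup(const SenderCache *c, const char *sender, int64_t now_ns, uint32_t *out)
{
    for (unsigned i = 0; i < CACHE_SIZE; i++) {
        const CacheEntry *e = &c->entries[i];

        if (e->used
            && strcmp(e->sender, sender) == 0
            && now_ns - e->stamp_ns <= CACHE_TTL_NS) {
            *out = e->value;
            return true;
        }
    }
    return false;
}

/* Remembers the result of a real D-Bus request, overwriting the oldest slot. */
static void
cache_store(SenderCache *c, const char *sender, uint32_t value, int64_t now_ns)
{
    CacheEntry *e = &c->entries[c->next];

    c->next = (c->next + 1) % CACHE_SIZE;
    strncpy(e->sender, sender, sizeof(e->sender) - 1);
    e->sender[sizeof(e->sender) - 1] = '\0';
    e->value    = value;
    e->stamp_ns = now_ns;
    e->used     = true;
}
```

A caller would first try cache_lookup() and only issue the synchronous D-Bus request (followed by cache_store()) on a miss.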
The proxy does nothing for us, except overhead.
We can directly subscribe to "NameOwnerChanged" signals on the
GDBusConnection. Also, instead of asynchronously creating the
GDBusProxy, asynchronously call "GetNameOwner". That's what the
proxy does anyway.
GDBusConnection is actually a decent API. We don't need another layer on
top of that, for functionality that we don't use.
Also, don't use G_BUS_TYPE_SYSTEM, but use the GDBusConnection that
the bus-manager also uses. For all practical purposes, that is the
connection we want to use in NMDnsSystemdResolved as well.
Every (failed) attempt to D-Bus activate a service results in log-messages
from dbus-daemon. We must avoid spamming the logs that way.
Let the connectivity check not only ask whether systemd-resolved is enabled
(and NetworkManager would like to push information there), but also
whether it looks like the service is actually available. That is,
either it has a name-owner or it's not blocked from starting.
The previous workaround was to configure main.systemd-resolved=no
in NetworkManager.conf. But that requires explicit configuration.
Previously, we would create the D-Bus proxy without
%G_DBUS_PROXY_FLAGS_DO_NOT_AUTO_START_AT_CONSTRUCTION
flag.
That means, when systemd-resolved was not available or masked, the creation
of the D-Bus proxy would fail with
dns-sd-resolved[0x561905dc92d0]: failure to create D-Bus proxy for systemd-resolved: Error calling StartServiceByName for org.freedesktop.resolve1: GDBus.Error:org.freedesktop.systemd1.NoSuchUnit: Unit dbus-org.freedesktop.resolve1.service not found.
and never retried.
Now, when creating the D-Bus proxy don't autostart the instance.
Instead, each D-Bus call will try to poke and start the service.
There is a problem, however: if systemd-resolved is not available, then
we must not constantly try to start it, because that results in a slew
of syslog messages from dbus-daemon:
dbus-daemon[991]: [system] Activating via systemd: service name='org.freedesktop.resolve1' unit='dbus-org.freedesktop.resolve1.service' requested by ':1.23' (uid=0 pid=1012 comm="/usr/bin/NetworkManager --no-daemon ")
dbus-daemon[991]: [system] Activation via systemd failed for unit 'dbus-org.freedesktop.resolve1.service': Unit dbus-org.freedesktop.resolve1.service not found.
dbus-daemon[991]: [system] Activating via systemd: service name='org.freedesktop.resolve1' unit='dbus-org.freedesktop.resolve1.service' requested by ':1.23' (uid=0 pid=1012 comm="/usr/bin/NetworkManager --no-daemon ")
Avoid that by watching the name owner.
But, since systemd-resolved is D-Bus activated, watching the name owner
alone is not enough to know whether we should try to autostart the service.
Instead:
- if we have a name owner, assume the service runs and we send the update
- if we have no name owner, and we did not recently try to start
the service by name, poke it via "StartServiceByName". The idea
is, that in total we only try this once and remember a previous
attempt in priv->try_start_blocked.
- if we get a name-owner, priv->try_start_blocked gets reset.
Either it was us who started the service, or somebody else.
Either way, we are good to send updates again.
The nice thing is that we only try once to start resolved and only
generate one logging message from dbus-daemon about failure to do so.
But still, after blocking start on failure, when somebody else starts
resolved, we notice it and start using it again.
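The decision logic described above can be sketched as a small state machine. This is an illustrative sketch, not the actual implementation: the type and function names are hypothetical, and only the try_start_blocked flag is taken from the description.

```c
#include <stdbool.h>

typedef enum {
    ACTION_SEND_UPDATE,    /* service has a name owner: send the update */
    ACTION_START_SERVICE,  /* no owner and not tried yet: call "StartServiceByName" */
    ACTION_SKIP,           /* no owner and a previous start attempt is remembered */
} Action;

typedef struct {
    bool has_name_owner;
    bool try_start_blocked;
} ResolvedState;

/* Decide what to do when an update should be pushed to systemd-resolved. */
static Action
decide_on_update(ResolvedState *s)
{
    if (s->has_name_owner)
        return ACTION_SEND_UPDATE;
    if (!s->try_start_blocked) {
        /* Remember the attempt so that we poke the service only once. */
        s->try_start_blocked = true;
        return ACTION_START_SERVICE;
    }
    return ACTION_SKIP;
}

/* Called from the "NameOwnerChanged" handler. */
static void
on_name_owner_changed(ResolvedState *s, bool has_owner)
{
    s->has_name_owner = has_owner;
    if (has_owner) {
        /* Somebody (maybe us) started the service: allow future start attempts. */
        s->try_start_blocked = false;
    }
}
```

With this shape, a missing service costs exactly one start attempt (and one dbus-daemon log message) until a name owner shows up again.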
As we frequently send updates to systemd-resolved and for each update
send multiple messages, it can happen that we log a large number of
warnings if they all fail.
Rate limit the warnings to only warn once (until the failure is
recovered).
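One possible shape for this kind of "warn once until recovered" rate limiting is sketched below. The WarnOnce type and report_result() are hypothetical names for illustration; the real code uses NetworkManager's own logging.

```c
#include <stdbool.h>
#include <stdio.h>

typedef struct {
    bool warned;  /* true while a failure has already been reported */
} WarnOnce;

/* Records the outcome of one request. Returns true if a warning was emitted. */
static bool
report_result(WarnOnce *w, bool success, const char *msg)
{
    if (success) {
        /* The failure recovered: the next failure may warn again. */
        w->warned = false;
        return false;
    }
    if (w->warned)
        return false;  /* already warned about the ongoing failure */

    w->warned = true;
    fprintf(stderr, "warning: %s\n", msg);
    return true;
}
```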
Currently, if systemd-resolved is not installed (or disabled) we already
fail once to create the D-Bus proxy (and never retry). That should be
fixed, to create the proxy with G_DBUS_PROXY_FLAGS_DO_NOT_AUTO_START_AT_CONSTRUCTION.
If we allow creating the proxy we would repeatedly try to send messages
and they would all fail. This is one example, where we need to ratelimit
the warning.
Open vSwitch is the special kid on the block -- it likes to be in charge of
the link lifetime and so we shouldn't be. This means that we shouldn't be
attempting to remove the link: we'd just (gracefully) fail anyways.
More importantly, this also means that we shouldn't care if we see the link
go away.
https://bugzilla.redhat.com/show_bug.cgi?id=1543557
If the ovsdb entry gets removed without the device being deactivated,
it's because its parent was removed and we should use the
DEPENDENCY_FAILED reason.
This is important because, with that reason, policy knows not to
autoconnect and bring the port that was being removed back.
Going directly to unmanaged just to prevent auto-connection turns out to
be the wrong thing to do. Perhaps we're reactivating the device, and
unmanaging it would interfere with the new activation.
This reverts commit 045b88a5b5.
In general shortcutting state is a no-no. But putting a device to FAILED
state because its master is going down is a crime. It's the wrong state:
the devices should enter it when their connections themselves failed
unexpectedly, and can potentially recover with another activation.
Otherwise bad things happen.
In particular, the devices automatically enter DISCONNECTED state and
eventually retry autoconnecting. In this case they would attempt to
bring the master back up. Ugh.
This situation happens when a topmost master of multiple levels of
master-slave relationship is deactivated.
Aside from that, shortcutting to DISCONNECTED on unknown change reason
doesn't make sense either. Like, wtf, just traverse through DEACTIVATING
like all the other kids do.
Seems on a busy system, we can hit this timeout. Increase it.
ERROR:../src/platform/tests/test-common.c:939:_ip_address_add: code should not be reached
Connection defaults should correspond in range to the per-profile values.
"infiniband.mtu" is required to be not larger than 65520, so we also
need to honor that when parsing the connection default.
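As a rough sketch of such a range check: parse_default_mtu() and its fallback argument are hypothetical, and the lower bound of 0 is an assumption. Only the upper limit of 65520 comes from the text above.

```c
#include <errno.h>
#include <stdlib.h>

#define IB_MTU_MAX 65520  /* upper limit of "infiniband.mtu" */

/* Parses a connection default for the MTU. Returns @fallback if the string
 * is missing, not a plain decimal number, or outside [0, IB_MTU_MAX]. */
static long
parse_default_mtu(const char *str, long fallback)
{
    char *end;
    long  v;

    if (!str || !*str)
        return fallback;

    errno = 0;
    v = strtol(str, &end, 10);
    if (errno != 0 || *end != '\0' || v < 0 || v > IB_MTU_MAX)
        return fallback;
    return v;
}
```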
'sriov_drivers_autoprobe' was added in kernel 4.12. With previous
kernel versions NM is currently unable to set any SR-IOV parameter
because it tries to read 'sriov_drivers_autoprobe' which doesn't
exist, assumes that current value is -1 and tries to change it,
failing.
When the file doesn't exist, drivers are automatically probed so we
can assume the value is 1. In this way NM is able to activate a
connection with sriov.autoprobe-drivers=1 (the default) even on older
kernel versions.
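The fallback can be illustrated with plain stdio. This is a sketch only: read_autoprobe() is a hypothetical helper, and the actual code goes through NetworkManager's platform sysctl layer instead.

```c
#include <errno.h>
#include <stdio.h>

/* Reads the integer stored in the 'sriov_drivers_autoprobe' file at @path.
 * If the file does not exist (kernel older than 4.12), drivers are probed
 * automatically, so report 1. Returns -1 on any other error. */
static int
read_autoprobe(const char *path)
{
    FILE *f = fopen(path, "r");
    int   value;

    if (!f)
        return errno == ENOENT ? 1 : -1;

    if (fscanf(f, "%d", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}
```

Because a missing file now reads as 1, a profile with sriov.autoprobe-drivers=1 sees no difference from the current value and nothing needs to be written.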
Fixes: 1e41495d9a ('platform: sriov: write new values when we can't read old ones')
https://bugzilla.redhat.com/show_bug.cgi?id=1695093
... and nm_acd_manager_announce_addresses().
The test will need more information to know why it may fail.
Return a NetworkManager error code, instead of a boolean.
When a device is removed (like when the user unplugs a usb network
device) the device object is removed, so it doesn't emit a notify signal
for a change in its connectivity and so, device_connectivity_changed
is not called. This means that nobody updates the global connectivity
value which is potentially wrong if the device was the one providing
network connectivity.
Since device_connectivity_changed's first two parameters aren't actually
used and are there just for the signal to be able to be connected, I
moved the code from device_connectivity_changed to a new
update_connectivity_value function that just takes a NMManager
parameter and also call it from remove_device.
[thaller@redhat.com: fix coding style regarding whitespace]
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/issues/141
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/merge_requests/101
Go straight to unmanaged. That's what all the other devices do when
their backing resources vanish. If the device reached disconnected
state, an autoconnect check would try to connect it back, in vain.
https://github.com/NetworkManager/NetworkManager/pull/324
Open vSwitch is the special kid on the block -- it likes to be in charge of
the link lifetime and so we shouldn't be. This means that we shouldn't be
attempting to remove the link: we'd just (gracefully) fail anyways.
More importantly, this also means that we shouldn't care if we see the link
go away. Once the device reaches DISCONNECTED state, its configuration is
cleaned up and we may already be activating another connection. We shouldn't
alter the device state when OpenVSwitch decides to drop the old link.
https://bugzilla.redhat.com/show_bug.cgi?id=1543557
https://github.com/NetworkManager/NetworkManager/pull/324
Fixes a crash on failed AddAndActivate:
$ ip link set eth0 down
$ nmcli d conn eth0
Error: Failed to add/activate new connection: Connection 'eth0' is not available on device eth0 because device has no carrier
<NetworkManager crashes>
#3 0x000055555558b6c5 in _nm_g_return_if_fail_warning
#4 0x00005555557008c7 in nm_settings_has_connection
#5 0x0000555555700e5f in pk_add_cb
#6 0x0000555555726e30 in pk_call_cb
#7 0x0000555555726e30 in pk_call_cb
#8 0x0000555555726e30 in pk_call_cb
#9 0x00005555555aaea8 in _call_id_invoke_callback
#10 0x00005555555ab2e8 in _call_on_idle
https://github.com/NetworkManager/NetworkManager/pull/325
initscripts support rule-* and rule6-* files for that.
Up until now, we ignored these files for the most part, except if
a user configured such files, the profile could not contain any static
routes (or specify a route-table setting). This also worked together
with the dispatcher script "examples/dispatcher/10-ifcfg-rh-routes.sh".
We cannot now start taking over that file format for rules. It might
break existing setups, because we can never fully understand all rules as
they are understood by iproute2. Also, if a user has a rule/rule6 file and
uses NetworkManager successfully today, then clearly there is a script
in place to make that work. We must not break that when adding rules
support.
Hence, store routing rules as numbered "ROUTING_RULE_#" and
"ROUTING_RULE6_#" keys.
Note that we use different keys for IPv4 and IPv6. The main reason is
that the string format is mostly compatible with iproute2. That means,
you can take the value and pass it to `ip rule add`.
However, `ip rule add` only accepts IPv4 rules. For IPv6 rules, the user
needs to call `ip -6 rule add`. If we used the same key for IPv4
and IPv6, then it would be hard to write a script to do this.
Also, nm_ip_routing_rule_from_string() does take the address family as
a hint in this case. This makes
ROUTING_RULE_1="pref 1"
ROUTING_RULE6_1="pref 1"
automatically determine the address families. Otherwise, such
abbreviated forms would not be valid.
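To illustrate how the key name alone can supply the address family hint, here is a small sketch. routing_rule_key_family() is a hypothetical helper, not part of the ifcfg-rh plugin, and it assumes the "#" part is a plain decimal number.

```c
#include <ctype.h>
#include <string.h>
#include <sys/socket.h>

/* Returns 1 if @s is a non-empty string of decimal digits. */
static int
all_digits(const char *s)
{
    if (!*s)
        return 0;
    for (; *s; s++) {
        if (!isdigit((unsigned char) *s))
            return 0;
    }
    return 1;
}

/* Maps "ROUTING_RULE_#" to AF_INET and "ROUTING_RULE6_#" to AF_INET6.
 * Any other key yields AF_UNSPEC. */
static int
routing_rule_key_family(const char *key)
{
    static const char p4[] = "ROUTING_RULE_";
    static const char p6[] = "ROUTING_RULE6_";

    if (strncmp(key, p6, sizeof(p6) - 1) == 0 && all_digits(key + sizeof(p6) - 1))
        return AF_INET6;
    if (strncmp(key, p4, sizeof(p4) - 1) == 0 && all_digits(key + sizeof(p4) - 1))
        return AF_INET;
    return AF_UNSPEC;
}
```

The returned family would then be passed as the hint when parsing the value, so that an abbreviated rule such as "pref 1" becomes unambiguous.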
It's called NM_MORE_ASSERTS not WITH_MORE_ASSERTS.
Also, NM_MORE_ASSERTS is always enabled. It's wrong to check whether it
is defined.
Fixes: e1e428b21e
Add support for IEEE 802.3 organizationally specific TLVs:
- MAC/PHY configuration/status (IEEE 802.1AB-2009 clause F.2)
- power via medium dependent interface (clause F.3)
- maximum frame size (clause F.4)
Previously we exported the contents of VLAN Name TLV in the 'vid'
(uint32) and 'vlan-name' (string) attributes. This is not entirely
correct as the TLV can appear multiple times.
We need a way to export all the VLAN IDs and names for the
neighbor. Add a new 'vlans' attribute which obsoletes the other two
and is an array of dictionaries, where each dictionary contains the
'vid' and 'name' keys.
Support the management address TLV (IEEE 802.1AB-2009 clause
8.5.9). The TLV can appear multiple times and so it is exported on
D-Bus as an array of dictionaries.