CentOS 7's headers don't yet contains IFLA_BOND_PEER_NOTIF_DELAY.
Define it ourselves.
Fixes: f900f7bc2c ('platform: add netlink support for bond link')
_host_id_read() is the only place where we really care to have good
random numbers, because that is the secret key that we persist to disk.
Previously, we tried only nm_random_get_bytes_full(), which is a best
effort to get strong random numbers. If it fails to generate those,
it would simply remember the generated key in memory and proceed, but not
persist it to disk.
nm_random_get_bytes_full() does not block waiting for good numbers.
Change that. Now, first call nm_random_get_crypto_bytes(), which would
block and try hard to get good random numbers. Only if that fails,
fallback to nm_random_get_bytes_full() as before. The difference is of
course only in early boot, when we might not yet have entropy. In that
case, I think it's better for NetworkManager to block.
Heavily inspired by systemd ([1]).
We now also have nm_random_get_bytes{,_full}() and
nm_random_get_crypto_bytes(), like systemd's random_bytes()
and crypto_random_bytes(), respectively.
Differences:
- instead of systemd's random_bytes(), our nm_random_get_bytes_full()
also estimates whether the output is of high quality. The caller
may find that interesting. Due to that, we will first try to call
getrandom(GRND_NONBLOCK) before getrandom(GRND_INSECURE). That is
reversed from systemd's random_bytes(), because we want to find
out whether we can get good random numbers. In most cases, kernel
should have entropy already, and it makes no difference.
Otherwise, heavily rework the code. It should be easy to understand
and correct.
There is also a major bugfix here. Previously, if getrandom() failed
with ENOSYS and we fell back to /dev/urandom, we would assume that we
have high quality random numbers. That assumption is not warranted.
Now instead poll on /dev/random to find out.
[1] a268e7f402/src/basic/random-util.c (L81)
sysfs is deprecated and kernel people will not add new bond options to
sysfs. Netlink is a stable API and therefore is the right method to
communicate with kernel in order to set the link options.
Add parentheses around macro arguments.
Yes, it's not technically necessary when using macro arguments are
surrounded by commas. Still do it, for consistency and for not having
special exceptions to the rule.
The devices generally need to be IFF_UP and wait a little before the
carrier detection is reliable. Some devices, actually need to wait
more than a little -- r8169 needs up to 5 seconds.
For this reason, we delay startup complete while the carrier is down
after we bring the device up. We do this so that we don't reject
activations due to carrier down until we're sure it's really down.
This works well as long as it's us who brought the device up.
If we're restarting the daemon, the device is going to be already up
when we start up the daemon for the second time. There's, however, a
slim chance that the device was brought down and up very shortly before
the restart and therefore the carrier reporting is still not reliable.
As a matter of fact, we bring the devices down and back up on some
occassions, such as when enslaving to a team device.
Therefore, the following events in quick succession cause trouble:
# nmcli con up team-slave-eth0
[20099.205355] Generic FE-GE Realtek PHY r8169-0-300:00: attached PHY driver (mii_bus:phy_addr=r8169-0-300:00, irq=MAC)
[20099.365641] nm-team: Port device eth0 added
[20099.370728] r8169 0000:03:00.0 eth0: Link is Down
[20099.436631] nm-team: Port device eth0 removed
[20099.463422] Generic FE-GE Realtek PHY r8169-0-300:00: attached PHY driver (mii_bus:phy_addr=r8169-0-300:00, irq=MAC)
[20099.628505] r8169 0000:03:00.0 eth0: Link is Down
[20099.669425] Generic FE-GE Realtek PHY r8169-0-300:00: attached PHY driver (mii_bus:phy_addr=r8169-0-300:00, irq=MAC)
[20099.833457] r8169 0000:03:00.0 eth0: Link is Down
[20099.838471] nm-team: Port device eth0 added
The device has been brought down, enslaved and brought up.
"Link is Down" indicates carrier not being detected.
Connection successfully activated (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/7)
# systemctl restart NetworkManager
Now NM sees the device being up, but carrier down.
# nmcli con up testeth0
Error: Connection activation failed: No suitable device found for this connection (...).
Activation failed, because eth0 carrier still appears down.
# [20102.943464] r8169 0000:03:00.0 eth0: Link is Up - 1Gbps/Full - flow control rx/tx
Now it's up, but the party is already over. Shiet.
Let's wait whenever the device reaches unavailable state, whether we
bring it up at that point or not.
Fixes-test: @restart_L2_only_lacp
https://bugzilla.redhat.com/show_bug.cgi?id=2092361https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1316
nm_utils_random_bytes() is supposed to give us good random number from
kernel. It guarantees to always provide some bytes, and it has a
boolean return value that estimates whether the bytes are good
randomness. In practice, most callers ignore that return value, because
what would they do about it anyway?
Of course, we want to primarily use getrandom() (or "/dev/urandom"). But
if we fail to get random bytes from them, we have a fallback path that
tries to generate "random" bytes.
It does so, by initializing a global seed from various sources, and keep
sha256 hashing the buffer in a loop. That's certainly not efficient nor
elegant, but we already are in a fallback path.
Still, we can do slightly better. Instead of just using the global state
and keep updating it (entirely deterministically), every time also mix in
the results from getrandom() and a current timestamp. The idea is that if you
have a virtual machine that gets cloned, we don't want that our global
state keeps giving the same random numbers. In particular, because
getrandom() might handle that case, even if it doesn't have good
entropy.
The reproducer for another problem tripped an assertion failure:
$ nmcli con del act-conn
Connection 'act-conn' (...) successfully deleted.
$ nmcli con down another-conn
(process:94552): nm-CRITICAL **: 17:07:21.170: ((src/libnm-client-impl/nm-remote-connection.c:593)): assertion '<dropped>' failed
Connection 'another-conn' successfully deactivated (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/4)
$
What happens is that the second invocation, when resolving the
connection name into a NMRemoteConnection object, assumes an active
connection has a settings connection.
This assumption is likely to be wrong immediately after deleting a
connection was active, before giving the active connection enough time
to fully deactivate.
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1317
This doesn't use NM_UNIQ_T() to create a truly unique name.
Instead, it avoids "_arr" as local variable name, which other
macros also use. By choosing a different name, we can nest such
macro calls, without getting a "-Wshadow" warning.
If you do:
nm_assert_addr_family(NMP_OBJECT_CAST_MPTCP_ADDR(obj)->addr_family));
then there are two nested NM_IN_SET() macro invocations. Once,
NMP_OBJECT_CAST_MPTCP_ADDR() checks that the object type is one of
a few selected (using NM_IN_SET()). Then, that is passed to
nm_assert_addr_family(), which checks NM_IN_SET(addr_family, AF_INET,
AF_INET6).
In general, it's easy to end up in a situation like this.
And it mostly works just fine. The only problem was that NM_IN_SET()
uses an internal, local variable "_x". The compiler will emit a very
verbose failure about the shadowed variable:
./src/libnm-std-aux/nm-std-aux.h:802:14: error: declaration of '_x' shadows a previous local [-Werror=shadow]
802 | type _x = (x); \
NM_UNIQ_T() exists for this purpose. Use it. NM_IN_SET() is
popular enough to warrant a special treatment to avoid this pitfall.
The strength of CList is of course to use it as a stack of queue,
and only append/remove from the front/tail.
However, since this is an intrusive list, it can also be useful to
just use it to track elements, and -- when necessary -- sort them
via c_list_sort().
If we have a sorted list, we might want to insert a new element
honoring the sort order. This function achieves that.
Don't mix <net/ethernet.h> and <linux/if_ether.h>.
Fixes the following build error with musl libc:
In file included from /usr/include/net/ethernet.h:10,
from ../src/libnm-platform/nm-linux-platform.c:17:
/usr/include/netinet/if_ether.h:115:8: error: redefinition of 'struct ethhdr'
115 | struct ethhdr {
| ^~~~~~
In file included from ../src/linux-headers/ethtool.h:19,
from ../src/libnm-std-aux/nm-linux-compat.h:22,
from ../src/libnm-platform/nm-linux-platform.c:10:
/usr/include/linux/if_ether.h:169:8: note: originally defined here
169 | struct ethhdr {
| ^~~~~~
Fixes: dc98ab807c ('platform: include "linux-headers" via "libnm-std-aux/nm-linux-compat.h"')
NetworkManager primarily manages interfaces in an independent fashion.
That means, whenever possible, we want to have a interface specific
view. In many cases, the underlying kernel API also supports that view.
For example, when configuring IP addresses or unicast routes, we do so
per interfaces and don't need a holistic view.
However, that is not always sufficient. For routing rules and certain
route types (blackhole, unreachable, etc), we need a system wide view
of all the objects in the network namespace.
Originally, NMPRulesManager was added to track routing rules. Then, it
was extended to also track certain route types, and the API was renamed to
NMPRouteManager.
This will also be used to track MPTCP addresses.
So rename again, to give it a general name that is suitable for what it
does. Still, the name is not great (suggestion welcome), but it should
cover the purpose of the API well enough. And it's the best I came
up with.
Rename.
For IPv6, kernel does not accept the ifa_scope parameter and always
determines the scope based on the address itself.
For IPv4, it honors whatever scope the user sets via netlink.
NetworkManager does not allow to directly configure the address
scope, but autodetects it.
Use nm_platform_ip4_address_get_scope() for detecting the scopt.
This also fixes the issue that to detect loopback addresses 127.0.0.0/8
and use scope "host".
Try:
$ nmcli device modify "$IFACE" +ipv4.addresses 127.0.0.5/8
We have our own copy of linux kernel headers, and we must never
directly include the corresponding versions from the system.
Avoid that, by only including the clones via "libnm-std-aux/nm-linux-compat.h"
and by including the compat wrapper header before other system headers.
It's similar to nm_ip_addr_cmp(), but it can be used as an argument
to g_qsort_with_data() to sort a list of NMIPAddr (or in_addr_t or
struct in6_addr).
The address family needs to be given as user-data.
By default, wpa_supplicant sets these parameters according to the
802.11 standard:
dot11RSNAConfigPMKLifetime = 43200 seconds (12 hours)
dot11RSNAConfigPMKReauthThreshold = 70%
With these, the supplicant triggers a new EAP authentication every 8
hours and 24 minutes. If the network uses one-time secrets, the
reauthentication fails and the supplicant disconnects. It doesn't seem
desirable that the client starts a reauthentication so early; bump the
lifetime to a week.
Currently, due to a bug, the new value is ignored by wpa_supplicant
when set via D-Bus. This patch needs the fix at [1], not yet merged.
[1] http://lists.infradead.org/pipermail/hostap/2022-July/040664.htmlhttps://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1306
The current behavior of "nmcli networking off" is that it starts
disconnecting the devices, but doesn't wait for them to actually
come down.
That is not too helpful: the user never knows when the network is
actually disconnected.
Some users, notably the NetworkManager-CI test suite, seem to expect the
devices are all disconnected after the command finishes. Even worse,
it immediately proceeds activating the connections:
@ovs_cloned_mac_set_on_iface
...
* Execute "nmcli networking off && nmcli networking on"
This results in pure utter chaos. In particular, the slave connections
sometimes refuse to activate after "nmcli networking on", because the
master connections are still getting disconnected in response to
preceding "nmcli networking off".
Let's make Enable(FALSE) and Sleep(TRUE) block until none of the devices
are expected to go down.
Note that this makes those call also return when Enable(TRUE) and
Sleep(FALSE) is issued in meanwhile. Therefore a return from
Enable(FALSE) doesn't necessarily imply the networking is disabled.
This is a feature, not a bug -- the actual manager state is available in
the "state" property.
Fixes-test: @ovs_cloned_mac_set_on_iface
https://bugzilla.redhat.com/show_bug.cgi?id=2093175https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1292
Usually we want no difference between the upstream project that we fork
via git-subtree, and our copy. However, for the subprojects, we need to
patch them. Do it.
If you know a better way, that allows to overwriting the subprojects
please send a patch.
n-dhcp4 only supports calling ACCEPT during the GRANTED state.
Not during a EXTENDED event. So usually, we would not want
to call accept in that case.
And we didn't. During EXTENDED event, we would usually skip ACD (because
it's either not enabled or we already passed ACD for the current address).
In that case, in _nm_dhcp_client_notify() we hit the line
if (client_event_type == NM_DHCP_CLIENT_EVENT_TYPE_BOUND && priv->l3cd_curr
&& nm_l3_config_data_get_num_addresses(priv->l3cd_curr, priv->config.addr_family) > 0)
priv->l3cfg_notify.wait_dhcp_commit = TRUE;
else
priv->l3cfg_notify.wait_dhcp_commit = FALSE;
and would not set `wait_dhpc_commit`. That means, we never called _dhcp_client_accept().
For nettools, that doesn't really matter because calling ACCEPT during EXTENDED
is invalid anyway. However, for dhclient that is fatal because we wouldn't reply the
D-Bus request from nm-dhcp-helper. The helper times out after 60 seconds and dhclient
would misbehave.
We need to fix that by also calling _dhcp_client_accept() in the case when we don't
need to wait (the EXTENDED case).
However, previously _dhcp_client_accept() was rather peculiar and didn't like to be
called in an unexpected state. Relax that. Now, when calling accept in an unexpected
state, just do nothing and signal success. That frees the caller from the complexity
to understand when they must/must not call accept.
https://bugzilla.redhat.com/show_bug.cgi?id=2109285
Fixes: 156d84217c ('dhcp/dhclient: implement accept/decline (ACD) for dhclient plugin')
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1308