CentOS 7's headers don't yet contains IFLA_BOND_PEER_NOTIF_DELAY.
Define it ourselves.
Fixes: f900f7bc2c ('platform: add netlink support for bond link')
Leave a hint about core-dumps.
Also, now we have `contrib/fedora/rpm/configure-for-system.sh` script,
which can configure the build in a way similar to what we get
when doing an RPM build.
That means, inside "contrib/scripts/nm-in-container.sh" we
can just type `make install`, and it will replace the pre-installed
NetworkManager.
The main advantage is that it becomes convenient to run NetworkManager
as a systemd service. Previously, the suggested was to to install
NetworkManager inside another prefix, and run it in the terminal.
Running NetworkManager as systemd service is also necessary for NM-ci,
which restarts the NetworkManager service, and you couldn't run a test,
if you just started NetworkManager in a terminal.
Previously, you had to build a complete RPM, which takes a lot of time.
Yes, it's a large dependency. But on your outer host you
probably configured NetworkManager with QT enabled (for the
example scripts). We want to compile the same work tree inside
the container. So install qt-devel.
This will use the same option as when we do an RPM build.
The purpose is that you could type `make install` with such
a build, and it would replace the files that you'd get by installing
the NetworkManager RPMs.
Of course, you would not want to do that on your work station, but it
will be useful in a container, where we don't mind messing up the
installation.
autotools accepts both --with-$OPTION/--without-$OPTION and
--with-$OPTION=yes|no. Consistently use the latter.
The advantage is that whether it's enabled becomes an argument, so in a
script you could do
"--with-$OPTION=$VALUE"
Same for enable/disable option.
_host_id_read() is the only place where we really care to have good
random numbers, because that is the secret key that we persist to disk.
Previously, we tried only nm_random_get_bytes_full(), which is a best
effort to get strong random numbers. If it fails to generate those,
it would simply remember the generated key in memory and proceed, but not
persist it to disk.
nm_random_get_bytes_full() does not block waiting for good numbers.
Change that. Now, first call nm_random_get_crypto_bytes(), which would
block and try hard to get good random numbers. Only if that fails,
fallback to nm_random_get_bytes_full() as before. The difference is of
course only in early boot, when we might not yet have entropy. In that
case, I think it's better for NetworkManager to block.
Heavily inspired by systemd ([1]).
We now also have nm_random_get_bytes{,_full}() and
nm_random_get_crypto_bytes(), like systemd's random_bytes()
and crypto_random_bytes(), respectively.
Differences:
- instead of systemd's random_bytes(), our nm_random_get_bytes_full()
also estimates whether the output is of high quality. The caller
may find that interesting. Due to that, we will first try to call
getrandom(GRND_NONBLOCK) before getrandom(GRND_INSECURE). That is
reversed from systemd's random_bytes(), because we want to find
out whether we can get good random numbers. In most cases, kernel
should have entropy already, and it makes no difference.
Otherwise, heavily rework the code. It should be easy to understand
and correct.
There is also a major bugfix here. Previously, if getrandom() failed
with ENOSYS and we fell back to /dev/urandom, we would assume that we
have high quality random numbers. That assumption is not warranted.
Now instead poll on /dev/random to find out.
[1] a268e7f402/src/basic/random-util.c (L81)
sysfs is deprecated and kernel people will not add new bond options to
sysfs. Netlink is a stable API and therefore is the right method to
communicate with kernel in order to set the link options.
Add parentheses around macro arguments.
Yes, it's not technically necessary when using macro arguments are
surrounded by commas. Still do it, for consistency and for not having
special exceptions to the rule.
The devices generally need to be IFF_UP and wait a little before the
carrier detection is reliable. Some devices, actually need to wait
more than a little -- r8169 needs up to 5 seconds.
For this reason, we delay startup complete while the carrier is down
after we bring the device up. We do this so that we don't reject
activations due to carrier down until we're sure it's really down.
This works well as long as it's us who brought the device up.
If we're restarting the daemon, the device is going to be already up
when we start up the daemon for the second time. There's, however, a
slim chance that the device was brought down and up very shortly before
the restart and therefore the carrier reporting is still not reliable.
As a matter of fact, we bring the devices down and back up on some
occassions, such as when enslaving to a team device.
Therefore, the following events in quick succession cause trouble:
# nmcli con up team-slave-eth0
[20099.205355] Generic FE-GE Realtek PHY r8169-0-300:00: attached PHY driver (mii_bus:phy_addr=r8169-0-300:00, irq=MAC)
[20099.365641] nm-team: Port device eth0 added
[20099.370728] r8169 0000:03:00.0 eth0: Link is Down
[20099.436631] nm-team: Port device eth0 removed
[20099.463422] Generic FE-GE Realtek PHY r8169-0-300:00: attached PHY driver (mii_bus:phy_addr=r8169-0-300:00, irq=MAC)
[20099.628505] r8169 0000:03:00.0 eth0: Link is Down
[20099.669425] Generic FE-GE Realtek PHY r8169-0-300:00: attached PHY driver (mii_bus:phy_addr=r8169-0-300:00, irq=MAC)
[20099.833457] r8169 0000:03:00.0 eth0: Link is Down
[20099.838471] nm-team: Port device eth0 added
The device has been brought down, enslaved and brought up.
"Link is Down" indicates carrier not being detected.
Connection successfully activated (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/7)
# systemctl restart NetworkManager
Now NM sees the device being up, but carrier down.
# nmcli con up testeth0
Error: Connection activation failed: No suitable device found for this connection (...).
Activation failed, because eth0 carrier still appears down.
# [20102.943464] r8169 0000:03:00.0 eth0: Link is Up - 1Gbps/Full - flow control rx/tx
Now it's up, but the party is already over. Shiet.
Let's wait whenever the device reaches unavailable state, whether we
bring it up at that point or not.
Fixes-test: @restart_L2_only_lacp
https://bugzilla.redhat.com/show_bug.cgi?id=2092361https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1316
The test bcond is used to make test suite failure terminate the %check
phase. That should generally be done for distro builds.
What the distro builds shouldn't do is terminate the build on compiler
warnings, because they're fairly likely to appear on toolchain updates
without indicating actual bugs.
However, currently that's precisely what do we do now.
Let's use -Werror on debug builds instead. They are intentionally made
more prone to fail (e.g. trip more runtime assertions) because failures
can be more rapidly addressed and don't affect distro builds.
nm_utils_random_bytes() is supposed to give us good random number from
kernel. It guarantees to always provide some bytes, and it has a
boolean return value that estimates whether the bytes are good
randomness. In practice, most callers ignore that return value, because
what would they do about it anyway?
Of course, we want to primarily use getrandom() (or "/dev/urandom"). But
if we fail to get random bytes from them, we have a fallback path that
tries to generate "random" bytes.
It does so, by initializing a global seed from various sources, and keep
sha256 hashing the buffer in a loop. That's certainly not efficient nor
elegant, but we already are in a fallback path.
Still, we can do slightly better. Instead of just using the global state
and keep updating it (entirely deterministically), every time also mix in
the results from getrandom() and a current timestamp. The idea is that if you
have a virtual machine that gets cloned, we don't want that our global
state keeps giving the same random numbers. In particular, because
getrandom() might handle that case, even if it doesn't have good
entropy.
The reproducer for another problem tripped an assertion failure:
$ nmcli con del act-conn
Connection 'act-conn' (...) successfully deleted.
$ nmcli con down another-conn
(process:94552): nm-CRITICAL **: 17:07:21.170: ((src/libnm-client-impl/nm-remote-connection.c:593)): assertion '<dropped>' failed
Connection 'another-conn' successfully deactivated (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/4)
$
What happens is that the second invocation, when resolving the
connection name into a NMRemoteConnection object, assumes an active
connection has a settings connection.
This assumption is likely to be wrong immediately after deleting a
connection was active, before giving the active connection enough time
to fully deactivate.
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1317
This doesn't use NM_UNIQ_T() to create a truly unique name.
Instead, it avoids "_arr" as local variable name, which other
macros also use. By choosing a different name, we can nest such
macro calls, without getting a "-Wshadow" warning.
If you do:
nm_assert_addr_family(NMP_OBJECT_CAST_MPTCP_ADDR(obj)->addr_family));
then there are two nested NM_IN_SET() macro invocations. Once,
NMP_OBJECT_CAST_MPTCP_ADDR() checks that the object type is one of
a few selected (using NM_IN_SET()). Then, that is passed to
nm_assert_addr_family(), which checks NM_IN_SET(addr_family, AF_INET,
AF_INET6).
In general, it's easy to end up in a situation like this.
And it mostly works just fine. The only problem was that NM_IN_SET()
uses an internal, local variable "_x". The compiler will emit a very
verbose failure about the shadowed variable:
./src/libnm-std-aux/nm-std-aux.h:802:14: error: declaration of '_x' shadows a previous local [-Werror=shadow]
802 | type _x = (x); \
NM_UNIQ_T() exists for this purpose. Use it. NM_IN_SET() is
popular enough to warrant a special treatment to avoid this pitfall.
The strength of CList is of course to use it as a stack of queue,
and only append/remove from the front/tail.
However, since this is an intrusive list, it can also be useful to
just use it to track elements, and -- when necessary -- sort them
via c_list_sort().
If we have a sorted list, we might want to insert a new element
honoring the sort order. This function achieves that.
Don't mix <net/ethernet.h> and <linux/if_ether.h>.
Fixes the following build error with musl libc:
In file included from /usr/include/net/ethernet.h:10,
from ../src/libnm-platform/nm-linux-platform.c:17:
/usr/include/netinet/if_ether.h:115:8: error: redefinition of 'struct ethhdr'
115 | struct ethhdr {
| ^~~~~~
In file included from ../src/linux-headers/ethtool.h:19,
from ../src/libnm-std-aux/nm-linux-compat.h:22,
from ../src/libnm-platform/nm-linux-platform.c:10:
/usr/include/linux/if_ether.h:169:8: note: originally defined here
169 | struct ethhdr {
| ^~~~~~
Fixes: dc98ab807c ('platform: include "linux-headers" via "libnm-std-aux/nm-linux-compat.h"')