When there are many VFs the default buffer size of 1 memory page is
not enough. Each VF can take up to ~120 bytes and so when the page
size is 4KiB at most ~34 VFs can be added.
Specify the buffer size when allocating the message.
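The arithmetic above can be sketched as follows. This is only an
illustration: the 120 byte figure is the estimate from the text, while
BASE_OVERHEAD and both function names are hypothetical and not taken
from the actual patch.

```c
#include <assert.h>
#include <stddef.h>

/* Rough per-VF attribute size quoted in the text (~120 bytes). */
#define VF_SIZE_ESTIMATE 120u
/* Hypothetical allowance for the rest of the message; not from the source. */
#define BASE_OVERHEAD 256u

/* How many VFs fit into a buffer of the given size, ignoring other attributes. */
static size_t
max_vfs_for_buffer(size_t buf_size)
{
    return buf_size / VF_SIZE_ESTIMATE;
}

/* Buffer size to request for num_vfs VFs: at least one page, more if needed. */
static size_t
buffer_size_for_vfs(size_t page_size, size_t num_vfs)
{
    size_t needed = BASE_OVERHEAD + num_vfs * VF_SIZE_ESTIMATE;

    assert(page_size > 0);
    return needed > page_size ? needed : page_size;
}
```

With a 4096 byte page, max_vfs_for_buffer() gives 34, which matches the
limit mentioned above.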
(cherry picked from commit f7ac887502)
Add a len argument to nlmsg_alloc() and nlmsg_alloc_simple(). After
that, nlmsg_alloc_size() can be dropped. Also, rename
nlmsg_alloc_simple() to nlmsg_alloc_new().
(cherry picked from commit f12d96f0fa)
At startup, we remove from ovsdb any existing interface created by NM
and later an interface with the same name might be readded. This can
cause race conditions. Consider this series of events:
1. at startup NM removes the entry from ovsdb;
2. ovsdb reports success;
3. NM inserts an interface with the same name again;
4. ovs-vswitchd monitors ovsdb changes, and gets events for removal and
insertion. Depending on how those events are split in different
batches, it might decide:
4a. to delete the link and add it back, or
4b. to keep the existing link because the delete and insertion
cancel each other out.
When NM sees the link staying in platform, it doesn't know if it's
because of 4b or because 4a will happen eventually.
To avoid this ambiguity, after ovsdb reports the successful deletion
NM should also wait until the link disappears from platform.
Unfortunately, this means that ovsdb gets a dependency on the platform
code.
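A minimal sketch of the resulting condition, assuming the cleanup is
tracked with two flags. The struct and function names are hypothetical
and only illustrate that both events must be seen before the interface
is added again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical state of one startup cleanup. */
typedef struct {
    bool ovsdb_delete_done; /* ovsdb reported that the removal succeeded */
    bool link_gone;         /* the link disappeared from platform */
} OvsCleanupState;

/* The cleanup is only complete once both events were observed. */
static bool
ovs_cleanup_is_complete(const OvsCleanupState *state)
{
    assert(state != NULL);
    return state->ovsdb_delete_done && state->link_gone;
}
```

Waiting for ovsdb alone would leave cases 4a and 4b indistinguishable,
which is why the second flag is needed.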
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1386
(cherry picked from commit 4f60fe293c)
In nm_dns_manager_set_ip_config() we try to avoid calling update_dns()
unless something changes, because updating DNS is expensive and can
trigger other actions such as a new hostname resolution.
When we add a new ip_data, even if the new element is equivalent to
the old one that was removed, we need to sort the list again.
Fixes: ce0a36d20f ('dns: better track l3cd changes')
https://bugzilla.redhat.com/show_bug.cgi?id=2098574
(cherry picked from commit 3cc7801779)
The dhclient plugin already supports sending a decline when IPv4 ACD
fails. Also implement support for IPv6 DAD.
See-also: 156d84217c ("dhcp/dhclient: implement accept/decline (ACD) for dhclient plugin")
(cherry picked from commit e4aefbc556)
Currently we accept the DHCPv6 lease just after the addresses are
configured in the kernel, without waiting for the DAD result. Instead,
wait until DAD completes and decline the lease if all addresses are
detected as duplicate.
Note that when an address has non-infinite lifetime and fails DAD,
kernel removes it automatically. With iproute2 we see something like:
602: testX6 inet6 2620::1234:5678/128 scope global tentative dynamic noprefixroute
valid_lft 7500sec preferred_lft 7200sec
Deleted 602: testX6 inet6 2620::1234:5678/128 scope global dadfailed tentative dynamic noprefixroute
valid_lft 7500sec preferred_lft 7200sec
Since the address gets removed from the platform cache, at the moment
we don't have a way to check the flags of the removal
message. Therefore, we assume that any address that goes away in
tentative state was detected as duplicate.
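The decision described above could look roughly like this. It is a
sketch with hypothetical type and function names, not the code from the
patch. Accepting a lease that has no addresses at all is an assumption
of the example.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    ADDR_TENTATIVE,            /* DAD is still running */
    ADDR_DAD_OK,               /* address is no longer tentative */
    ADDR_GONE_WHILE_TENTATIVE, /* removed while tentative: assumed duplicate */
} AddrState;

typedef enum { LEASE_WAIT, LEASE_ACCEPT, LEASE_DECLINE } LeaseDecision;

static LeaseDecision
dhcp6_lease_decide(const AddrState *addrs, size_t n_addrs)
{
    size_t n_failed = 0;
    size_t i;

    assert(addrs != NULL || n_addrs == 0);

    for (i = 0; i < n_addrs; i++) {
        if (addrs[i] == ADDR_TENTATIVE)
            return LEASE_WAIT; /* DAD has not completed yet */
        if (addrs[i] == ADDR_GONE_WHILE_TENTATIVE)
            n_failed++;
    }

    /* Assumption: a lease without addresses has nothing to decline. */
    if (n_addrs > 0 && n_failed == n_addrs)
        return LEASE_DECLINE;
    return LEASE_ACCEPT;
}
```

The lease is declined only when every address failed. A single address
that passes DAD is enough to accept it.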
https://bugzilla.redhat.com/show_bug.cgi?id=2096386
(cherry picked from commit a7eb77260a)
This partly reverts 1fe8166fc9 ('device: only deactivate when the master
we've enslaved to goes away').
If the controller fails while the port is not yet fully attached,
before this patch the following happened:
<info> [1664299566.1065] device (bond0): state change: ip-config -> failed (reason 'config-failed', sys-iface-state: 'managed')
...
<warn> [1664299566.1073] device (bond0): Activation: failed for connection 'bond0'
<trace> [1664299566.1073] device[6b76ac7314eb0b53] (bond0): master: release one slave a9f10ea824bb1725/eth1 (not enslaved) (configure)
<debug> [1664299566.1073] device[a9f10ea824bb1725] (eth1): unmanaged: flags set to [!sleeping,!by-type,!platform-init,!user-explicit,!user-settings,!user-conf=0x0/0x179/managed], forget [is-slave=0x800], reason removed)
...
<info> [1664299566.1080] device (eth1): state change: config -> ip-config (reason 'none', sys-iface-state: 'managed')
Note that now eth1 has no controller, but it lingers in "ip-config" state indefinitely.
If we look at a case where the port is already attached we see:
<info> [1664299540.9661] device (bond0): state change: secondaries -> failed (reason 'config-failed', sys-iface-state: 'managed')
...
<warn> [1664299540.9667] device (bond0): Activation: failed for connection 'bond0'
<trace> [1664299540.9667] device[6b76ac7314eb0b53] (bond0): master: release one slave a9f10ea824bb1725/eth1 (enslaved) (configure)
<debug> [1664299540.9667] platform: (eth1) link: releasing 10 from master 'bond0' (80)
...
<info> [1664299540.9740] device (bond0): detached bond port eth1
...
<debug> [1664299540.9749] device[a9f10ea824bb1725] (eth1): Activation: connection 'eth1' master failed
...
<warn> [1664299540.9749] device (eth1): queue-state[secondaries, reason:none, id:520]: replace previously queued state change
...
<debug> [1664299540.9750] device[a9f10ea824bb1725] (eth1): queue-state[deactivating, reason:dependency-failed, id:533]: queue state change
<debug> [1664299540.9751] device[a9f10ea824bb1725] (eth1): unmanaged: flags set to [!sleeping,!by-type,!platform-init,!user-explicit,!user-settings,!user-conf=0x0/0x179/managed], forget [is-slave=0x800], reason removed)
...
<debug> [1664299541.0201] device[a9f10ea824bb1725] (eth1): enslaved to unknown device 0 (??)
...
<debug> [1664299541.0227] device[a9f10ea824bb1725] (eth1): queue-state[deactivating, reason:dependency-failed, id:533]: change state
<info> [1664299541.0228] device (eth1): state change: ip-check -> deactivating (reason 'dependency-failed', sys-iface-state: 'managed')
Fix that by not ignoring the nm_device_slave_notify_release() call. Now we get:
<info> [1664391684.9757] device (bond0): state change: ip-config -> failed (reason 'config-failed', sys-iface-state: 'managed')
...
<debug> [1664391684.9759] active-connection[69c2b12d61f5b171]: set state deactivated (was activating)
<debug> [1664391684.9760] active-connection[142bb8240f6a696d]: check-master-ready: already signalled (state activating, master 0x56116f1480a0 is in state deactivated)
...
<debug> [1664391684.9762] manager: ActivatingConnection now (none)
...
<warn> [1664391684.9763] device (bond0): Activation: failed for connection 'bond0'
<trace> [1664391684.9763] device[142828814dec6e26] (bond0): master: release one slave 720791275fe8a68c/eth1 (not enslaved) (configure)
<debug> [1664391684.9763] device[720791275fe8a68c] (eth1): Activation: connection 'eth1' master failed
...
<debug> [1664391684.9764] device[720791275fe8a68c] (eth1): queue-state[deactivating, reason:dependency-failed, id:3047]: queue state change
<debug> [1664391684.9765] device[720791275fe8a68c] (eth1): unmanaged: flags set to [!sleeping,!by-type,!platform-init,!user-explicit,!user-settings,!user-conf=0x0/0x179/managed], forget [is-slave=0x800], reason removed)
...
<debug> [1664391684.9797] device[720791275fe8a68c] (eth1): queue-state[deactivating, reason:dependency-failed, id:3047]: change state
<info> [1664391684.9797] device (eth1): state change: config -> deactivating (reason 'dependency-failed', sys-iface-state: 'managed')
Commit 1fe8166fc9 ('device: only deactivate when the master we've
enslaved to goes away') added the "return", but it seems to also take
effect in cases that we need to handle. Restrict the return to the
cases where we do "no-config".
https://bugzilla.redhat.com/show_bug.cgi?id=2130287
Fixes: 1fe8166fc9 ('device: only deactivate when the master we've enslaved to goes away')
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1406
(cherry picked from commit 2be9c693d9)
Instead, hack gettext's Makefile.
gettext has an issue with parallel make. See [1] and [2].
Reproduce with:
git reset --hard &&
git clean -fdx &&
NOCONFIGURE=yes ./autogen.sh &&
./configure --enable-gtk-doc --enable-introspection &&
make -j distcheck V=1
We worked around this by setting "DIST_DEPENDS_ON_UPDATE_PO = yes",
however that (obviously) results in regenerating source files during
dist. "Source files" in the sense that the po files are committed to git
and get distributed in the release. Doing this is very ugly.
In particular it's ugly, because `make -C po update-po` is not reproducible
and the output depends on the current time (*had one job*).
Otherwise, we could just regenerate the files before doing a release.
This means, running "release.sh" script ends up with a dirty tree
afterwards. Also, the distributed po files are not the ones from the source
tree when we did the release. Also, since "release.sh rc1" does two distributions
(once for the rc1 and once for the next devel snapshot), the commit for the
second distribution will have a large diff for the po files.
This reverts commit 978d8eb699 ('po: make dist depend on update-po')
and hacks around the problem.
[1] https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1094#note_1435313
[2] https://lists.gnu.org/archive/html/bug-gettext/2022-06/msg00022.html
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1405
(cherry picked from commit 7ee0da3eaf)
Previously, we used _nm_utils_ascii_str_to_bool(). That can accept any
kind of input (like "true"), so one might think that this is better to
use on user-input. However, NMSettingBond already validates that these
options are integers (either "0" or "1"). So a value like "true"
could never be here.
Use _nm_setting_bond_opt_value_as_intbool() because that asserts that
the option is of the expected type (integer).
(cherry picked from commit b1a72d0f21)
Bond option values are just strings, however, some of them get
validated to be numbers, etc.
We also have effectively boolean values, like "use-carrier". Internally,
this is not validated as a boolean (_nm_utils_ascii_str_to_bool()) but
instead is an integer of either "0" or "1".
Add a helper function _nm_setting_bond_opt_value_as_intbool() to access
and parse such values.
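To illustrate what "integer boolean" means here, a simplified stand-in
for such a helper might look like the following. The function name and
signature are hypothetical and the real helper in libnm is not
reproduced. The sketch relies on the value having been validated
earlier.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: interprets an already validated "0"/"1" option value. */
static bool
example_opt_value_as_intbool(const char *value)
{
    assert(value != NULL);
    /* The setting was validated before, so anything else is a bug. */
    assert(strcmp(value, "0") == 0 || strcmp(value, "1") == 0);

    return value[0] == '1';
}
```

Unlike a lenient boolean parser, this rejects inputs like "true" by
assertion instead of silently accepting them.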
(cherry picked from commit 489a1b8f1e)
The bond setting does some minimal validation of the options.
At least for those number typed values, it validates that the
string can be interpreted as a number and is within a certain range.
Add nm_assert() checks to our opt_value_u$SIZE() functions, that the
requested option is validated to be in a range which is sufficiently
narrow to be converted to the requested type. If that were not the case,
we would need some special handling (or question whether the option should
be retrieved as this type).
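The kind of check meant here can be sketched as below. OptRange,
example_u8_opt and opt_value_u8() are hypothetical names for this
example, and the range 0..255 is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical description of a number-typed option and its valid range. */
typedef struct {
    const char *name;
    uint64_t    min;
    uint64_t    max;
} OptRange;

static const OptRange example_u8_opt = {"example-u8-option", 0, 255};

static uint8_t
opt_value_u8(const OptRange *opt, const char *value)
{
    unsigned long long v;

    assert(opt != NULL && value != NULL);
    /* The validated range must be narrow enough for the requested type. */
    assert(opt->max <= UINT8_MAX);

    v = strtoull(value, NULL, 10);
    /* Validation happened earlier; a value outside the range is a bug. */
    assert(v >= opt->min && v <= opt->max);

    return (uint8_t) v;
}
```

If an option with a wider range were requested as u8, the first range
assertion would fail, which signals that special handling is needed.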
(cherry picked from commit a19458e11d)
Sync/blocking methods are ugly. Their name should highlight this.
Also, we may have an async variant, so we will need the "good" name
for apply() and apply_finish().
(cherry picked from commit dc66fb7d04)
Blocking calls are ugly. Rename those to have a "_sync()" suffix.
Also, split from _fw_nft_set_shared() the part that constructs the
stdin for nft.
(cherry picked from commit 7362ad6266)
NMStrBuf can also contain NUL characters. We thus cannot use g_strndup(),
which uses strncpy() and truncates at the first NUL.
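A length-based copy avoids the truncation. The following is a plain C
sketch of the idea and not the code from the patch. The function name
is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Copies exactly len bytes, including embedded NULs, and appends a
 * terminating NUL. Returns NULL if the allocation fails. */
static char *
dup_bytes_nul_terminated(const char *buf, size_t len)
{
    char *copy;

    assert(buf != NULL || len == 0);

    copy = malloc(len + 1);
    if (!copy)
        return NULL;
    if (len > 0)
        memcpy(copy, buf, len);
    copy[len] = '\0';
    return copy;
}
```

For the three bytes 'a', NUL, 'b', the copy keeps the 'b' after the
embedded NUL. A strncpy() based copy would stop copying at the NUL.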
Fixes: 13d25f9d0b ('glib-aux: add support for starting with stack-allocated buffer in NMStrBuf')
(cherry picked from commit 520411623d)
It is allowed to have a connection with empty connection.slave-type
and a NMSettingBondPort; the property will be set automatically during
normalization if a master is set, otherwise the setting will be removed.
With this change, it becomes possible to remove a port from a bond
from nmcli, turning it into a non-slave connection. Before, this used
to fail with:
$ nmcli connection add type ethernet ifname test con-name test+ connection.master bond0 connection.slave-type bond
$ nmcli connection modify test+ connection.master '' connection.slave-type ''
Error: Failed to modify connection 'test+': connection.slave-type: A connection with a 'bond-port' setting must have the slave-type set to 'bond'
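The normalization rule described above can be sketched like this. The
Profile struct is a hypothetical, heavily simplified stand-in for a
connection profile and not a libnm type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified view of a profile. */
typedef struct {
    const char *master;        /* connection.master, may be NULL or "" */
    const char *slave_type;    /* connection.slave-type, may be NULL or "" */
    bool        has_bond_port; /* whether a bond-port setting is present */
} Profile;

static bool
is_empty(const char *s)
{
    return s == NULL || s[0] == '\0';
}

static Profile
normalize_bond_port(Profile p)
{
    if (p.has_bond_port && is_empty(p.slave_type)) {
        if (!is_empty(p.master))
            p.slave_type = "bond";   /* master set: fill in the slave-type */
        else
            p.has_bond_port = false; /* no master: drop the setting */
    }

    /* After normalization a bond-port setting implies a slave-type. */
    assert(!p.has_bond_port || !is_empty(p.slave_type));
    return p;
}
```

With both connection.master and connection.slave-type cleared, as in the
nmcli example, the bond-port setting is removed instead of causing an
error.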
https://bugzilla.redhat.com/show_bug.cgi?id=2126262
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1382
Fixes: 9958510f28 ('bond: add support of queue_id of bond port')
(cherry picked from commit 23ce9cff99)
When disposing NMPolicy all the devices in the devices hash-table should
be unregistered and removed from the hash-table.
Fixes: 7e3d090acb ('policy: refactor tracking of registered devices')
(cherry picked from commit 5a87683b14)
NMPGlobalTracker allows tracking objects for independent users/callers.
That is, callers that are not aware whether another caller tracks the
same/similar object. It thus groups all objects by their nmp_object_id_equal()
(as `TrackObjData` struct), while keeping a list of each individually tracked
object (as `TrackData` struct which honors the object and the user-tag parameter).
When the same caller (based on the user-tag) tracks the same object again, then
NMPGlobalTracker will only track it once and combine the objects. That is done by
also having a dictionary for the `TrackData` entries (`self->by_data`).
This latter dictionary lookup wrongly considered nmp_object_id_equal().
Instead, it needs to consider all minor differences of the objects, and
use nmp_object_equal().
For example, for NMPlatformMptcpAddress, only the "address" is part of
the ID. Other fields, like the MPTCP flags are not. Imagine a profile is
active with MPTCP endpoints configured with flags "subflow". During reapply,
the user can only update the MPTCP flags (e.g. to "signal"). When that happens,
the caller (NML3Cfg) would track a new NMPlatformMptcpAddress instance, that only
differs by MPTCP flags. In this case, we need to track the new address for the
differences that it has according to nmp_object_equal(), and not
nmp_object_id_equal().
Due to this bug, reapply might not work correctly. For other supported types (routing
rules and routes) this bug may have been harder to reproduce, because most attributes
of rules/routes are also part of the ID and because it's uncommon to reapply a minor
change to a rule/route.
https://bugzilla.redhat.com/show_bug.cgi?id=2120471
Fixes: b8398b9e79 ('platform: add NMPRulesManager for syncing routing rules')
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1375
(cherry picked from commit d8aacba3b2)
An ipoib connection is still considered valid if the profile does not
set the device name. Also, the ifcfg reader should not duplicate the
checks that `nm_connection_verify()` performs (especially not wrongly).
Therefore, NM should skip validating DEVICE when reading the ifcfg file
for an ipoib connection.
https://bugzilla.redhat.com/show_bug.cgi?id=2122703
(cherry picked from commit 4c32dd9d25)
When writing the p-key setting to the ifcfg file and reading it back,
the value has to stay the same. This is not specific to p-key: any
setting must survive an ifcfg write and read unchanged.
This was probably added in commit cb5606cf1c ('ifcfg-rh:
add support for Infiniband partitions') as this is also what
ifup-ib does ([1]). For NetworkManager profiles however, the
p-key is also valid without the high bit set, so the ifcfg-rh
reader must honor that.
[1] 0c9fb6ca7b/rdma.ifup-ib (L75)
(cherry picked from commit a4fe16a426)
We've been outright ignoring master-slave checks if the link ended up
without a master since commit 2e22880894 ('device: don't remove the
device from master if its link has no master').
This was done to deal with OpenVSwitch port-interface relationship,
where the interface's platform link lacked an actual master in platform
(what matters there is the OVSDB entry), but the fix was too wide.
Let's limit the special case to devices whose masters lack a platform
link, which pretty much happens for OpenVSwitch only.
Moral: Write better commit messages, or future you is going to be upset.
Fixes: 2e22880894 ('device: don't remove the device from master if its link has no master')
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1358
(cherry picked from commit a1de6810df)
It's wrong, and it breaks certain uses.
Fixes: 13d25f9d0b ('glib-aux: add support for starting with stack-allocated buffer in NMStrBuf')
(cherry picked from commit c5ec4ebd77)
Bond option netlink support requires the primary property to be an ifindex
instead of the interface name. This is a workaround for supporting
specifying a primary that does not exist yet.
```
nmcli con add type bond ifname mybond0 bond.options "mode=active-backup,primary=veth1"
Connection 'bond-mybond0' (38100ef9-11e2-4003-aff9-cb2d152ce34f) successfully added.
nmcli con add type ethernet ifname veth1 master mybond0
cat /sys/class/net/mybond0/bonding/primary
veth1
```
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1362
Fixes: e064eb9d13 ('bond: use netlink to set bond options')
(cherry picked from commit 4fd90fb6cc)