Currently we accept the DHCPv6 just after addresses are configured on
kernel, without waiting DAD result. Instead, wait that DAD completes
and decline the lease if all addresses are detected as duplicate.
Note that when an address has non-infinite lifetime and fails DAD,
kernel removes it automatically. With iproute2 we see something like:
602: testX6 inet6 2620:🔢5678/128 scope global tentative dynamic noprefixroute
valid_lft 7500sec preferred_lft 7200sec
Deleted 602: testX6 inet6 2620:🔢5678/128 scope global dadfailed tentative dynamic noprefixroute
valid_lft 7500sec preferred_lft 7200sec
Since the address gets removed from the platform cache, at the moment
we don't have a way to check the flags of the removal
message. Therefore, we assume that any address that goes away in
tentative state was detected as duplicate.
https://bugzilla.redhat.com/show_bug.cgi?id=2096386
(cherry picked from commit a7eb77260a)
This partly reverts 1fe8166fc9 ('device: only deactivate when the master
we've enslaved to goes away').
If the controller fails while the port is not yet fully attached,
before this patch the following happened:
<info> [1664299566.1065] device (bond0): state change: ip-config -> failed (reason 'config-failed', sys-iface-state: 'managed')
...
<warn> [1664299566.1073] device (bond0): Activation: failed for connection 'bond0'
<trace> [1664299566.1073] device[6b76ac7314eb0b53] (bond0): master: release one slave a9f10ea824bb1725/eth1 (not enslaved) (configure)
<debug> [1664299566.1073] device[a9f10ea824bb1725] (eth1): unmanaged: flags set to [!sleeping,!by-type,!platform-init,!user-explicit,!user-settings,!user-conf=0x0/0x179/managed], forget [is-slave=0x800], reason removed)
...
<info> [1664299566.1080] device (eth1): state change: config -> ip-config (reason 'none', sys-iface-state: 'managed')
Note that now eth1 has no controller, but it lingers in "ip-config" state indefinitely.
If we look at a case where the port is already attached we see:
<info> [1664299540.9661] device (bond0): state change: secondaries -> failed (reason 'config-failed', sys-iface-state: 'managed')
...
<warn> [1664299540.9667] device (bond0): Activation: failed for connection 'bond0'
<trace> [1664299540.9667] device[6b76ac7314eb0b53] (bond0): master: release one slave a9f10ea824bb1725/eth1 (enslaved) (configure)
<debug> [1664299540.9667] platform: (eth1) link: releasing 10 from master 'bond0' (80)
...
<info> [1664299540.9740] device (bond0): detached bond port eth1
...
<debug> [1664299540.9749] device[a9f10ea824bb1725] (eth1): Activation: connection 'eth1' master failed
...
<warn> [1664299540.9749] device (eth1): queue-state[secondaries, reason:none, id:520]: replace previously queued state change
...
<debug> [1664299540.9750] device[a9f10ea824bb1725] (eth1): queue-state[deactivating, reason:dependency-failed, id:533]: queue state change
<debug> [1664299540.9751] device[a9f10ea824bb1725] (eth1): unmanaged: flags set to [!sleeping,!by-type,!platform-init,!user-explicit,!user-settings,!user-conf=0x0/0x179/managed], forget [is-slave=0x800], reason removed)
...
<debug> [1664299541.0201] device[a9f10ea824bb1725] (eth1): enslaved to unknown device 0 (??)
...
<debug> [1664299541.0227] device[a9f10ea824bb1725] (eth1): queue-state[deactivating, reason:dependency-failed, id:533]: change state
<info> [1664299541.0228] device (eth1): state change: ip-check -> deactivating (reason 'dependency-failed', sys-iface-state: 'managed')
Fix that by not ignoring the nm_device_slave_notify_release() call. Now we get:
<info> [1664391684.9757] device (bond0): state change: ip-config -> failed (reason 'config-failed', sys-iface-state: 'managed')
...
<debug> [1664391684.9759] active-connection[69c2b12d61f5b171]: set state deactivated (was activating)
<debug> [1664391684.9760] active-connection[142bb8240f6a696d]: check-master-ready: already signalled (state activating, master 0x56116f1480a0 is in state deactivated)
...
<debug> [1664391684.9762] manager: ActivatingConnection now (none)
...
<warn> [1664391684.9763] device (bond0): Activation: failed for connection 'bond0'
<trace> [1664391684.9763] device[142828814dec6e26] (bond0): master: release one slave 720791275fe8a68c/eth1 (not enslaved) (configure)
<debug> [1664391684.9763] device[720791275fe8a68c] (eth1): Activation: connection 'eth1' master failed
...
<debug> [1664391684.9764] device[720791275fe8a68c] (eth1): queue-state[deactivating, reason:dependency-failed, id:3047]: queue state change
<debug> [1664391684.9765] device[720791275fe8a68c] (eth1): unmanaged: flags set to [!sleeping,!by-type,!platform-init,!user-explicit,!user-settings,!user-conf=0x0/0x179/managed], forget [is-slave=0x800], reason removed)
...
<debug> [1664391684.9797] device[720791275fe8a68c] (eth1): queue-state[deactivating, reason:dependency-failed, id:3047]: change state
<info> [1664391684.9797] device (eth1): state change: config -> deactivating (reason 'dependency-failed', sys-iface-state: 'managed')
Commit 1fe8166fc9 ('device: only deactivate when the master we've
enslaved to goes away') added the "return", but it seems to also add it
in cases where we need to handle this. Restrict the return to cases if
we do "no-config".
https://bugzilla.redhat.com/show_bug.cgi?id=2130287
Fixes: 1fe8166fc9 ('device: only deactivate when the master we've enslaved to goes away')
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1406
(cherry picked from commit 2be9c693d9)
Instead, hack gettext's Makefile.
gettext has an issue with parallel make. See [1] and [2].
Reproduce with:
git reset --hard &&
git clean -fdx &&
NOCONFIGURE=yes ./autogen.sh &&
./configure --enable-gtk-doc --enable-introspection &&
make -j distcheck V=1
We worked around this by setting "DIST_DEPENDS_ON_UPDATE_PO = yes",
however that (obviously) results in regenerating source files during
dist. "Source files" in the sense that the po files are commited to git
and get distributed in the release. Doing this is very ugly.
In particular it's ugly, because `make -C po update-po` is not reproducible
and the output depends on the current time (*had one job*).
Otherwise, we could just regenerate the files before doing a release.
This means, running "release.sh" script ends up with a dirty tree
afterwards. Also, the distributed po files are not the ones from the source
tree when we did the release. Also, since "release.sh rc1" does two distributions
(once for the rc1 and once for the next devel snapshot), the commit for the
second distribution will have a large diff for the po files.
This reverts commit 978d8eb699 ('po: make dist depend on update-po')
and hacks around the problem.
[1] https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1094#note_1435313
[2] https://lists.gnu.org/archive/html/bug-gettext/2022-06/msg00022.htmlhttps://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1405
(cherry picked from commit 7ee0da3eaf)
Previously, we used _nm_utils_ascii_str_to_bool(). That can accept any
kind of input (like "true"), so one might think that this is better to
use on user-input. However, NMSettingBond already validates the these
options are integers (either "0" or "1"). So a value like "true"
could never be here.
Use _nm_setting_bond_opt_value_as_intbool() because that asserts that
the option if of the expected type (integer).
(cherry picked from commit b1a72d0f21)
Bond option values are just strings, however, some of them get
validated to be numbers, etc.
We also have effectively boolean values, like "use-carrier". Internally,
this is not validates as a boolean (_nm_utils_ascii_str_to_bool()) but
instead is an integer of either "0" or "1".
Add a helper function_nm_setting_bond_opt_value_as_intbool() to access
and parse such values.
(cherry picked from commit 489a1b8f1e)
The bond setting does some minimal validation of the options.
At least for those number typed values, it validates that the
string can be interpreted as a number and is within a certain range.
Add nm_assert() checks to our opt_value_u$SIZE() functions, that the
requested option is validated to be in a range which is sufficiently
narrow to be converted to the requested type. If that were not the case,
we would need some special handling (or question whether the option should
be retrieved as this type).
(cherry picked from commit a19458e11d)
Sync/blocking methods are ugly. Their name should highlight this.
Also, we may have an async variant, so we will need the "good" name
for apply() and apply_finish().
(cherry picked from commit dc66fb7d04)
Blocking calls are ugly. Rename those to have a "_sync()" suffix.
Also, split from _fw_nft_set_shared() the part that constructs the
stdin for nft.
(cherry picked from commit 7362ad6266)
NMStrBuf can also contains NUL characters. We thus cannot use g_strndup(),
which uses strncpy() and truncates at the first NUL.
Fixes: 13d25f9d0b ('glib-aux: add support for starting with stack-allocated buffer in NMStrBuf')
(cherry picked from commit 520411623d)
It is allowed to have a connection with empty connection.slave-type
and a NMSettingBondPort; the property will be set automatically during
normalization if a master is set, otherwise the setting will be removed.
With this change, it becomes possible to remove a port from a bond
from nmcli, turning it into a non-slave connection. Before, this used
to fail with:
$ nmcli connection add type ethernet ifname test con-name test+ connection.master bond0 connection.slave-type bond
$ nmcli connection modify test+ connection.master '' connection.slave-type ''
Error: Failed to modify connection 'test+': connection.slave-type: A connection with a 'bond-port' setting must have the slave-type set to 'bond'
https://bugzilla.redhat.com/show_bug.cgi?id=2126262https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1382
Fixes: 9958510f28 ('bond: add support of queue_id of bond port')
(cherry picked from commit 23ce9cff99)
When disposing NMPolicy all the devices in the devices hash-table should
be unregistered and removed from the hash-table.
Fixes: 7e3d090acb ('policy: refactor tracking of registered devices')
(cherry picked from commit 5a87683b14)
NMPGlobalTracker allows to track objects for independent users/callers.
That is, callers that are not aware whether another caller tracks the
same/similar object. It thus groups all objects by their nmp_object_id_equal()
(as `TrackObjData` struct), while keeping a list of each individually tracked
object (as `TrackData` struct which honors the object and the user-tag parameter).
When the same caller (based on the user-tag) tracks the same object again, then
NMPGlobalTracker will only track it once and combine the objects. That is done by
also having a dictionary for the `TrackData` entries (`self->by_data`).
This latter dictionary lookup wrongly considered nmp_object_id_equal().
Instead, it needs to consider all minor differences of the objects, and
use nmp_object_equal().
For example, for NMPlatformMptcpAddress, only the "address" is part of
the ID. Other fields, like the MPTCP flags are not. Imagine a profile is
active with MPTCP endpoints configured with flags "subflow". During reapply,
the user can only update the MPTCP flags (e.g. to "signal"). When that happens,
the caller (NML3Cfg) would track a new NMPlatformMptcpAddress instance, that only
differs by MPTCP flags. In this case, we need to track the new address for the
differences that it has according to nmp_object_equal(), and not
nmp_object_id_equal().
Due to this bug, reapply might not work correctly. For other supported types (routing
rules and routes) this bug may have been harder to reproduce, because most attributes
of rules/routes are also part of the ID and because it's uncommon to reapply a minor
change to a rule/route.
https://bugzilla.redhat.com/show_bug.cgi?id=2120471
Fixes: b8398b9e79 ('platform: add NMPRulesManager for syncing routing rules')
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1375
(cherry picked from commit d8aacba3b2)
For the ipoib connection, it is still considered as valid if the
profile does not set the device name. Also, the ifcfg reader should not
duplicate the checks that `nm_connection_verify()` performs (especially
not wrongly). Therefore, NM should skip validating the DEVICE when
reading the ifcfg file for the ipoib connection.
https://bugzilla.redhat.com/show_bug.cgi?id=2122703
(cherry picked from commit 4c32dd9d25)
When writing the p-key setting to the ifcfg file and reading the
setting back, the value has to be consistent. This is not limited to
p-key only, any setting value during the ifcfg write and read also has
to be consistent.
This was probably added in commit cb5606cf1c ('ifcfg-rh:
add support for Infiniband partitions') as this is also what
ifup-ib does ([1]). For NetworkManager profiles however, the
p-key is also valid without the high bit set, so the ifcfg-rh
reader must honor that.
[1] 0c9fb6ca7b/rdma.ifup-ib (L75)
(cherry picked from commit a4fe16a426)
We've been outright ignoring master-slave checks if the link ended up
without a master since commit 2e22880894 ('device: don't remove the
device from master if its link has no master').
This was done to deal with OpenVSwitch port-interface relationship,
where the interface's platform link lacked an actual master in platform
(what matters there is the OVSDB entry), but the fix was too wide.
Let's limit the special case to devices whose were not enslaved to
masters that lack a platform link, which pretty much happens for
OpenVSwitch only.
Morale: Write better commit messages of future you is going to be upset
Fixes: 2e22880894 ('device: don't remove the device from master if its link has no master')
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1358
(cherry picked from commit a1de6810df)
It's wrong, and it breaks certain uses.
Fixes: 13d25f9d0b ('glib-aux: add support for starting with stack-allocated buffer in NMStrBuf')
(cherry picked from commit c5ec4ebd77)
Bond option netlink support requires primary property to be a ifindex
instead of the interface name. This is a workaround for supporting
specifying a primary that does not exist yet.
```
nmcli con add type bond ifname mybond0 bond.options "mode=active-backup,primary=veth1"
Connection 'bond-mybond0' (38100ef9-11e2-4003-aff9-cb2d152ce34f) successfully added.
nmcli con add type ethernet ifname veth1 master mybond0
cat /sys/class/net/mybond0/bonding/primary
veth1
```
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1362
Fixes: e064eb9d13 ('bond: use netlink to set bond options')
(cherry picked from commit 4fd90fb6cc)
On netlink API, the attribute is indeed u32. However, this is an ifindex
which in most other kernel APIs and in NetworkManager code is a signed
integer. Note that of course kernel would only ever assign numbers that
are valid ifindexes, thus in the suitable range.
(cherry picked from commit c28dd78c05)
The IFLA_BOND_ACTIVE_SLAVE and IFLA_BOND_PRIMARY are not the same.
If the primary is not set, then that's it. Don't fallback.
Only NetworkManager API deprecated "active-slave" and uses it as
alias for "primary". That does not mean, kernel/netlink does that.
(cherry picked from commit 6d95c406db)
With C literals as prefix, this macro is more efficient as it
just expands to a strncmp(). Also, it accepts NULL string.
(cherry picked from commit f1c1d93bc5)
_nm_config_data_log_sort() is used for sorting the groups in the
keyfile during nm_config_data_log(). The idea is to present the keyfile
in a defined, but useful order.
However, it is not a total order. That is, it will return c=0 (equal) for
certain groups, if the pre-existing order in the GKeyFile should be
honored. For example, we want to sort all [device*] sections close to
each other, but we want to preserve their relative order. In that case,
the function would return 0 although the group names differed.
Also, _nm_config_data_log_sort() does not expect to receive duplicate names.
It would return c!=0 for comparing "device" and "device".
This means, _nm_config_data_log_sort() is fine for sorting the input as
we have it. However, it cannot be used to binary search the groups. This
caused that some sections might be duplicated in the `NetworkManager
--print-config` output. Otherwise, it had no bad effects.
Fixes(no-backport): 78d34d7c2e ('config: fix printing default values for missing sections')
(cherry picked from commit f6345180b1)
When serializing setting properties to GVariant/D-Bus, we usually
omit values that are set to the default. That is done by libnm(-core),
so it happens both on the daemon and client side. That might be
useful to avoid a large number of properties on D-Bus.
Before changing the default value for "ipv6.addr-gen-mode" ([1]), we
would not serialize the previous default value ("stable-privacy").
Now we would serialize the new default value ("default). This change
causes problems.
Scenario 1: have a profile in the daemon with "ipv6.addr-gen-mode=stable-privacy",
have an older daemon version before [1] and a newer client after [1]. Result:
The daemon exposes the profile on D-Bus without the addr-gen-mode
field (because it's the default). To the client, that is interpreted
differently, as "ipv6.addr-gen-mode=default". This is already somewhat
a problem.
More severe is when modifying the profile, the client would now serialize
the value "default" on D-Bus, which the older daemon rejects as invalid.
That means, you could not modify the profile, unless also resetting
addr-gen-mode to "stable-privacy" or "eui64".
You can imagine other scenarios where either the daemon or the client is
before/after change [1] and the addr-gen-mode is set to either "default"
or "stable-privacy". Depending on what scenario you look, that can either be
good or bad.
Scenario 1 is pretty bad, because it means `dnf upgrade NetworkManager
&& nmcli connection modify ...` will fail (if the daemon was not
restated). So try to fix Scenario 1, by also not serializing the new
default value on D-Bus. Of course, some of the scenarios will get
different problems, by exacerbating one side misunderstanding the actually
set value and interpreting a missing value on D-Bus wrongly. But those
problems are likely less severe.
In case both client and daemon are older/newer than [1], it doesn't
matter either way. The problem happens with different version and is
caused by a change of the default value.
[1] e6a33c04ebhttps://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1372
(cherry picked from commit 70060d570b)
The @dracut_NM_vlan_over_team_no_boot sometimes fails, among other
things, because it fails to assume an indicated connection after a
restart.
That seems to happen because after the decision to activate the
indicated connection, the device does not move from DISCONNECTED state
quickly enough. Another assumption recheck runs in between and decides
to generate a connection, because the assume state was already reset
in between.
First start, creates and activates b3a61b68-f744-4a4c-a513-61399c154a67
on vlan0017:
NetworkManager (version 1.41.1-30921.55767cf5.el9) is starting...
(asserts:10000, boot:caf7301a-19cd-498b-b5ba-5d36ee939ffe)
...
settings: update[b3a61b68-f744-4a4c-a513-61399c154a67]: adding connection "vlan0017"
(45113870df0a4cfb/keyfile)
Second start:
NetworkManager (version 1.41.1-30921.55767cf5.el9) is starting...
(after a restart, asserts:10000, boot:caf7301a-19cd-498b-b5ba-5d36ee939ffe)
Assumption attempt successfully picks the right connection and thus
proceeds to reset the assume state:
manager: (vlan0017): assume: will attempt to assume matching connection 'vlan0017'
(b3a61b68-f744-4a4c-a513-61399c154a67) (indicated)
device[c7c5101cf0b73f5f] (vlan0017): assume-state: set guess-assume=0, connection=(null)
Everything great so far, activation of the right connection is enqueued
and the device moves away from unavailable state. However, the
activation can't proceed immediately:
device (vlan0017): state change: unmanaged -> unavailable
(reason 'connection-assumed', sys-iface-state: 'assume')
device (vlan0017): state change: unavailable -> disconnected
(reason 'connection-assumed', sys-iface-state: 'assume')
active-connection[0x55ba1162f1c0]: set device "vlan0017" [0x55ba1163c4f0]
device[c7c5101cf0b73f5f] (vlan0017): queue activation request waiting for carrier
Now another assumption attempt is done. The original assume state is
gone, so a connection is generated:
platform-linux: UDEV event: action 'add' subsys 'net' device 'vlan0017' (6); seqnum=1959
device[c7c5101cf0b73f5f] (vlan0017): queued link change for ifindex 6
manager: (vlan0017): assume: generated connection 'vlan0017' (57627119-8c20-4f9e-bf4d-4fc427b4a6a9)
keyfile: commit: 57627119-8c20-4f9e-bf4d-4fc427b4a6a9 (vlan0017) added as
"/run/NetworkManager/system-connections/vlan0017-57627119-8c20-4f9e-bf4d-4fc427b4a6a9.nmconnection"
(nm-generated,volatile,external)
I think this shouldn't have happened. We've picked the correct
connection already and it's enqueued for activation!
Change the check in nm_device_emit_recheck_assume() to also consider
any queued activation.
Fixes-test: @dracut_NM_vlan_over_team_no_boot
Co-authored-by: Lubomir Rintel <lkundrak@v3.sk>
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1351
(cherry picked from commit 9eb8cbca76)