When managing a device after it is realized, we previously always set
the NOW_MANAGED reason, that makes the device fully-managed.
This works based on the assumption that initially an external device
has unmanaged flag EXTERNAL_DOWN set, and therefore the device stays
unmanaged during realization.
It is possible that an external device appears already with addresses
(or attached to a controller); we need to set reason
CONNECTION_ASSUMED if it's an external device, so that we don't set
sys-iface-state=managed.
Reproducer:
ip link add br1 type bridge
killall -STOP NetworkManager
ip link add dummy1 type dummy
ip link set dummy1 master br1
ip link set dummy1 up
sleep .5
killall -CONT NetworkManager
After this, dummy1 is fully managed by NM while it shouldn't.
https://bugzilla.redhat.com/show_bug.cgi?id=2149012
If there are ports that refer the controllers by a device name, and
multiple autoconnectable controller devices of that name, the
situation gets messy. In particular, the autoconnect logic can start
activating a device with a higher autoconnect priority, but then a port
can override it by bringing up another controller of possibly lower
autoconnect priority.
Let's
1.) prefer controller connections with higher autoconnect priority
and
2.) prefer connections that are already active so that we don't
disrupt existing activation.
https://bugzilla.redhat.com/show_bug.cgi?id=2121451
Historically, initscripts' ifup-ib would set the highest bit of
PKEY_ID=. That changed and needs to be restored.
Note that it probably makes little sense to ever configure p-keys
without the highest bit set, because that flag indicates full membership
and kernel will automatically add it. At least, kernel will add the flag
for the p-key, but not for the automatically chosen interface name.
Meaning, writing 0x00f0 to create_child sysctl, results in an interface
"$parent.00f0", but `ip -d link` shows pkey 0x80f0.
As NetworkManager otherwise supports p-keys without the highest bit set,
and since that high bit is honored for the interface name, we cannot
just always add the high bit. NetworkManager always assuming the highest
bit is set, would change the interface names of existing configuration.
With this revert, when a user configures a small p-key and the profile
is stored in ifcfg-rh format, the settings backend will automatically
mangle the profile and set 0x8000. That is different from when the
profile is stored in keyfile format. Since using small p-keys is
probably an odd case, we don't try to workaround that any other way
(like that ifcfg format could represent the orignal value of the profile
and not doing such mangling, or to add the high bit throughout
NetworkManager to the p-key). It's an inconsistency, but given the
existing behaviors it seems best to stick (revert) to it.
This reverts commit a4fe16a426.
Affected versions were 1.42.2+ and 1.40.2+.
See-also: 05333c3602/f/rdma.ifup-ib (_75)https://bugzilla.redhat.com/show_bug.cgi?id=2209164
When a fixed address is assigned by the P2P group owner, then the code
would set the IPv4 configuration method to DISABLED internally. However,
this causes issues, because it means that IPv4 is considered to not have
come up internally which can cause the connection to later time out even
though it was configured properly.
As such, map this method to MANUAL in this case. The AUTO mapping
becomes then:
* MANUAL: If the remote part is the GO and assigned an IP address
* DHCP: If the remote part is the GO and did not assign an address
* SHARED: If we are the GO
This fixes an issue where the connection established by GNOME Network
Displays would fail once IPv6 configuration also times out.
See-also: https://gitlab.gnome.org/GNOME/gnome-network-displays/-/issues/279https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1636
This is the version shipped in Fedora 38. As Fedora 38 is now out, the
core developers switch to it. Our gitlab-ci will also use that as base
image for the check-{patch.tree} tests and to generate the pages. There
is a need that everybody agrees on which clang-format version to use,
and that version should be the one of the currently used Fedora release.
Also update the used Fedora image in "contrib/scripts/nm-code-format-container.sh"
script.
The gitlab-ci still needs update in the following commit. This change
in isolation will break the "check-tree" test.
In constructed(), NMDevice starts watching the D-Bus name owner or
monitoring the unix socket, and so it is always aware if teamd is
running. When it is, NMDevice connects to it and initializes
priv->tdc.
It is not useful to try to connect to teamd in update_connection()
because warnings will be generated by NM and by libteam if teamd is
not running. As explained above the connection is always initialized
when teamd is available, and so we can just check priv->tdc.
Fixes: ab586236e3 ('core: implement update_connection() for Team')
https://bugzilla.redhat.com/show_bug.cgi?id=2182029https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1631
Currently if the IPv6 link-local address is removed after it passed
DAD, NetworkManager tries to generate a new link-local address. If
this fails, which is always the case for EUI64, ipv6ll is considered
as failed and the connection can go down (depending on may-fail).
This is particularly bad for virtual interfaces because if somebody
removes the link-local address, the activation can fail and destroy
the interface, breaking all services that require it. Also, it's a
change in behavior introduced in 1.36.0.
It seems that a better approach here is to re-add the address that was
removed externally.
Fixes: aa070fb821 ('core: add NML3IPv6LL helper')
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1622
When managing the interface after wake/reenable, the reason determines
whether the device will be sys-iface-state=managed or external.
Commit 5a9a7623c5 ('core: set STATE_REASON_CONNECTION_ASSUMED when
waking up') changed the reason from 'now-managed' to
'connection-assumed'; the effect was that devices that were fully
managed before sleeping become external after a wake up. For example:
$ nmcli connection add type ethernet ifname enp1s0
Connection 'ethernet-enp1s0' (47fcd81e-bf00-4c02-b25b-354894f5657e) successfully added.
$ nmcli device | grep enp1s0
enp1s0 ethernet connected ethernet-enp1s0
$ nmcli networking off
$ nmcli device | grep enp1s0
enp1s0 ethernet unmanaged --
$ nmcli networking on
$ nmcli device | grep enp1s0
enp1s0 ethernet unavailable --
Set the correct reason during wake up so that the previous state is
restored.
Fixes: 5a9a7623c5 ('core: set STATE_REASON_CONNECTION_ASSUMED when waking up')
https://bugzilla.redhat.com/show_bug.cgi?id=2193422
According to systemd, IPv6 forwarding is special anyway, and they only
enable forwarding for "net.ipv6.conf.all.forwarding" ([1]).
Since commit 46e63e03af ('device: announce the managed IPv6
configuration with ipv6.method=shared') we support "ipv6.method=shared"
and enable forwarding for IPv6, on the interface. Whether that makes
sense is questionable, given [1] and the claim that setting it
per-interface is not useful.
Anyway, since that change we always reset the "forwarding" sysctl to
zero, when we don't enable shared mode. That is not right, because the
user didn't explicitly ask for that (and there is no configuration
option like systemd-networkd's "IPForward=" setting to control that).
What we instead should do, not touch/reset the sysctl, unless we really
want to.
No longer set "forwarding" to zero by default. And only restore the
previous value (_dev_sysctl_save_ip6_properties()) if we actually
changed the value to "1".
[1] b8fba0cded/src/network/networkd-sysctl.c (L79)https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/issues/923
Fixes: 46e63e03af ('device: announce the managed IPv6 configuration with ipv6.method=shared')
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1616
For efficiently and conveniently lookup an NMSetting from the
NMConnection inside the NMSettingsConnection.
Note that this uses the NMMetaSettingType as lookup key. That is a novel
approach, compared to lookup by name (nm_connection_get_setting_by_name())
or GType (nm_connection_get_setting()).
Using the NMMetaSettingType enum is however faster, because it does not
require resolving the name/GType first. This is perfecly fine internal API,
we should use it.
It seems better for readability, because reacting based on the state-reason
is ugly already. This way, we access nm_device_state_reason_check(reason) only
at once place. With the if, it's not immediately obvious that both if/else
parts only switch on the reason too.
Cleanup logging to always print a "block-autoconnect:" prefix to related
lines. Also, make sure that everywhere where the state changes, a line
gets logged. Also, for devconf data print both the interface and the
profile.
We only have a few blocked reasons. Some of them can be only set on the
devcon data, and some only on the settings connection. Assert that we
don't mix that up.
Add per port priority support for bond active port re-selection during
failover. A higher number means a higher priority in selection. The
primary port still has the highest priority. This option is only
compatible with active-backup, balance-tlb and balance-alb modes.
sysfs is deprecated and kernel will not add new bond port options to
sysfs. Netlink is a stable API and therefore is the right method to
communicate with kernel in order to set the link options.
The previous logging happened, when the value did not change. Log
instead, when the value changes.
Fixes: 86bb09c93b ('dns: generate correct search domain for hostnames on non-public TLD')
dns-manager uses the Mozilla Public Suffix List to determine an
appropriate search domain when generating /etc/resolv.conf. It is
presumed that if the hostname is "example.com", the user does not want
to automatically search "com" for unqualified hostnames, which is
reasonable. To implement that, prior to the fix, domain_is_valid()
implicitly used the PSL "prevailing star rule", which had the
consequence of assuming that any top-level domain (TLD) is public
whether it is on the official suffix list or not. That meant
"example.local" or "example.localdomain" would not result in searching
"local" or "localdomain" respectively, but rather /etc/resolv.conf would
contain the full hostname "example.local" as the search domain and not
give users what they expect. The fix here uses the newer PSL API
function that allows us to turn off the "prevailing star rule" so that
"local" and "localdomain" are NOT considered public TLDs because they
are not literally on the suffix list. That in turn gives us the search
domain "local" or "localdomain" in /etc/resolv.conf and allows
unqualified hostname lookups "e.g., resolvectl query example" to find
example.local while example.com still maintains the previous behavior
(i.e., search domain of "example.com" rather than "com").
[thaller@redhat.com: reworded commit message]
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/issues/1281https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1613
Before commit a42682d44f ('device: take reference to device object
before 'delete_on_deactivate''), we used a weak pointer to track the
idle action.
As we now use a strong reference, we can store all data about the idle
action in NMDevice itself. Drop DeleteOnDeactivateData.
NMDevice holds a reference to NMManager, which holds a reference to NMPolicy.
It is not possible that we try to dispose NMPolicy while there are still devices
registered. That would be a bug, that we need to find and solve
differently. Add an assertion instead of trying to handle it.
Add an assertion to nm_policy_device_recheck_auto_activate_schedule(),
that the device is currently registered in NMPolicy. Calling it outside
would be odd, and likely a bug.
But if we only register the auto-activate while being registered, we
don't need to take an additional reference. We know that the object must
be be alive (also, we have assertions that in fact it is still alive).
Hook the information for tracking the activation of a device, to the
NMDevice itself. Sure, that slightly couples the NMPolicy closer to
NMDevice, but the result is still simpler code because we don't need a
separate ActivateData.
It also means we can immediately tell whether the auto activation check
for NMDevice is already scheduled and don't need to search through the
list.
NMPolicy really should be merged into NMManager. It has not a clear responsiblity
so that there are two separate objects only makes things confusing. Anyway. It
is permissible to look up the NMPolicy instance of a NMManager. Add an accessor.
It's the better name. Especially since there is no more signal involved,
the term "emit" doesn't match.
Note also how the previous approach using a signal tried to abstract
what is happening. So we were no longer rechecking-autoconnect, instead,
we were emitting-a-signal-to-recheck-autoconnect. Just be plain about
what it is doing and don't go through a layer of signal.
GObject signals don't make the code easier to understand, on the
contrary. They may have their purpose, when objects truly must/should
not be aware of each other, and need to be composed very loosely. That
is not the case here.
There really is only one subscriber to NM_DEVICE_RECHECK_AUTO_ACTIVATE
signal, and it only makes sense this way. Instead of going through a
signal invocation, just call the well known method directly. It becomes
clearer who calls this code (and it has a lower overhead).
When using cscope/ctags it also is easier to follow the code because the
tools understand function calls.
The delete_on_deactivate_link_delete() handler may be called after the
device was already removed from NMManager. Don't allow that.
Check whether the device is still exported on D-Bus as indication.
NM_reboot_openvswitch_vlan_configuration_var2 test exposes a race. What
the test does, is to create OVS profiles and repeatedly restart
NetworkManager, checking that those profiles autoconnect and the OVS
configuration gets created.
There is a race, where:
- the OVS interface exists, and an NMDeviceOvsInterface is created
- first ovsdb cleans up old interfaces, sending a json request.
- OVS deletes the interface, and NetworkManager first picks up the
platform signal (there is a race here, usually the ovsdb request
completes first, which will cleanup the NMDeviceOvsInterface in
a different way).
- when the device gets unrealized, we don't schedule a
check-autoactivate, so the device stays down.
See https://bugzilla.redhat.com/show_bug.cgi?id=2152864#c5 for a log
file with more details.
What should instead happen, is to autoactivate the OVS interface, which
then also fully configures the port and bridge interfaces.
Explicitly schedule an autoactivate when unrealizing devices.
Note that there are now several cases, where NetworkManager autoconnects
more eagerly. This even affects some CI tests and user-visible behavior.
But I think relying on "just don't call nm_device_emit_recheck_auto_activate()
to hope that autoconnect doesn't happen is wrong. It must always be
possible to trigger an autoconnect check, and the right thing must
happen. We only don't trigger autoconnect checks *all* the time, because
it would be a waste of CPU resources, but whenever we slightly suspect
that an autoconnect may happen, we must be allowed to trigger a check.
If a device is in a condition where it previously did not autoconnect,
and it also *should* not autoconnect, then we need to fix the code that
evaluates whether an autoconnect may happen (not avoid triggering a
check).
https://bugzilla.redhat.com/show_bug.cgi?id=2152864
Fixes-test: @NM_reboot_openvswitch_vlan_configuration_var2
Currently, when we delete a device then autoconnect does not kick in
right away. But that is only, because we happen not to schedule a
"autoactivate" recheck.
What should be happen, is that rechecking whether to autoconnect is
always allowed, and that we have the necessary state to know that
autoconnect currently should not work.
Instead, block autoconnect of the involved profile. That makes sense,
because clearly we don't want to autoconnect right again after `nmcli
device delete $iface`.
The "connection.stable-id" supports placeholders like "${CONNECTION}" or
"${DEVICE}".
The stable-id can also be specified in global connection defaults in
NetworkManager.conf, by leaving it unset in the profile. Global
connection defaults always follow the pattern, that they correspond to a
per-profile property, and only when the per-profile value indicates a
special default/unset value, the global connection default is consulted.
Finally, if the global connection default is also not configured in
NetworkManager.conf, a built-in default is used (which may not be
constant either, for example ipv6.ip6-privacy's built-in default depends
on a sysctl value).
In any case, every possible configuration that can be achieved should be
configurable both per-profile and via global connection default. That
was not given for the stable-id, because the built-in default generated
an ID in a way that could not be explicitly expressed otherwise.
So you could not:
- explicitly set the per-profile value to the built-in default, to avoid
that the global-connection-default overwrites it.
- explicitly set the global-connection-default to the built-in default,
to avoid that a lower priority [connection*] section overwrites the
stable-id again.
Fix that inconsistency to make it possible to explicitly set the
built-in default.
Change behavior for literally "default${CONNECTION}" and make it behave
as the built-in default. Also document that the built-in default has that
value.
It's unlikely that this breaks an existing configuration, but of course,
if any user configured "connection.stable-id=default${CONNECTION}", then
the behavior changes for them.