Commit graph

1484 commits

Author SHA1 Message Date
Beniamino Galvani
9d18510437 manager: set the right reason when managing device after realize
When managing a device after it is realized, we previously always set
the NOW_MANAGED reason, which makes the device fully managed.

This works based on the assumption that initially an external device
has unmanaged flag EXTERNAL_DOWN set, and therefore the device stays
unmanaged during realization.

It is possible that an external device appears already with addresses
(or attached to a controller); we need to set reason
CONNECTION_ASSUMED if it's an external device, so that we don't set
sys-iface-state=managed.

Reproducer:

   ip link add br1 type bridge
   killall -STOP NetworkManager
   ip link add dummy1 type dummy
   ip link set dummy1 master br1
   ip link set dummy1 up
   sleep .5
   killall -CONT NetworkManager

After this, dummy1 is fully managed by NM while it shouldn't.
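
A minimal sketch of the reason selection described above, written in Python for illustration only (the actual code is C). The enum and the function name are hypothetical; only the two reason names come from the message.

```python
from enum import Enum


class ManageReason(Enum):
    NOW_MANAGED = "now-managed"                # device becomes fully managed
    CONNECTION_ASSUMED = "connection-assumed"  # device stays external


def manage_reason_after_realize(is_external: bool) -> ManageReason:
    """Pick the state reason used when managing a freshly realized device."""
    # Hypothetical helper: an external device (for example one that appeared
    # with addresses or already attached to a controller) must not end up
    # with sys-iface-state=managed.
    if is_external:
        return ManageReason.CONNECTION_ASSUMED
    return ManageReason.NOW_MANAGED
```

In the reproducer, dummy1 would count as external, so the sketch returns CONNECTION_ASSUMED for it.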

https://bugzilla.redhat.com/show_bug.cgi?id=2149012
2023-05-29 14:23:23 +02:00
Beniamino Galvani
8bdb53f7f8 device: add nm_device_get_manage_reason_external()
Move some code to determine the reason for managing devices to a new
function.
2023-05-29 14:23:23 +02:00
Lubomir Rintel
6adfd60630 manager: refine the find_master() logic
If there are ports that refer to the controllers by a device name, and
multiple autoconnectable controller devices of that name, the
situation gets messy. In particular, the autoconnect logic can start
activating a device with a higher autoconnect priority, but then a port
can override it by bringing up another controller of possibly lower
autoconnect priority.

Let's

1.) prefer controller connections with higher autoconnect priority

  and

2.) prefer connections that are already active so that we don't
    disrupt existing activation.
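
The two preferences can be sketched as a sort key. This is a Python illustration, not the find_master() code; the Candidate type and pick_controller() are hypothetical, and the way the two criteria are combined (already-active first, then priority) is an assumption that the message does not spell out.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Candidate:
    name: str
    autoconnect_priority: int
    is_active: bool


def pick_controller(candidates: Iterable[Candidate]) -> Optional[Candidate]:
    """Choose the controller connection a port should bring up."""
    # Assumption: an already-active connection wins, so an ongoing activation
    # is not disrupted; the autoconnect priority decides among the rest.
    return max(
        candidates,
        key=lambda c: (c.is_active, c.autoconnect_priority),
        default=None,
    )
```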

https://bugzilla.redhat.com/show_bug.cgi?id=2121451
2023-05-29 12:50:16 +02:00
Thomas Haller
82f5bff882
ifcfg-rh: adjust infiniband p-key for later normalization when writing to file
2023-05-25 22:06:49 +02:00
Thomas Haller
f8e5e07355
Revert "infiniband: avoid normalizing the p-key when reading from ifcfg"
Historically, initscripts' ifup-ib would set the highest bit of
PKEY_ID=. That changed and needs to be restored.

Note that it probably makes little sense to ever configure p-keys
without the highest bit set, because that flag indicates full membership
and kernel will automatically add it. At least, kernel will add the flag
for the p-key, but not for the automatically chosen interface name.

Meaning, writing 0x00f0 to the create_child sysctl results in an interface
"$parent.00f0", but `ip -d link` shows pkey 0x80f0.

As NetworkManager otherwise supports p-keys without the highest bit set,
and since that high bit is honored for the interface name, we cannot
just always add the high bit. If NetworkManager always assumed the
highest bit to be set, the interface names of existing configurations
would change.
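
The mismatch between the p-key and the interface name can be illustrated with a few lines of Python. The function names and the parent name "ib0" are hypothetical; the 0x8000 flag and the 0x00f0 example come from the message.

```python
PKEY_FULL_MEMBERSHIP = 0x8000  # highest bit of the 16-bit p-key


def kernel_pkey(requested: int) -> int:
    """P-key as shown by `ip -d link`: the kernel adds the high bit."""
    return (requested | PKEY_FULL_MEMBERSHIP) & 0xFFFF


def child_ifname(parent: str, requested: int) -> str:
    """Automatically chosen name: uses the value as written, without the flag."""
    return f"{parent}.{requested:04x}"
```

With these helpers, a requested p-key of 0x00f0 yields the name "ib0.00f0" while the effective p-key is 0x80f0, which is why always adding the high bit would rename existing interfaces.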

With this revert, when a user configures a small p-key and the profile
is stored in ifcfg-rh format, the settings backend will automatically
mangle the profile and set 0x8000. That is different from when the
profile is stored in keyfile format. Since using small p-keys is
probably an odd case, we don't try to work around that in any other way
(for example, by letting the ifcfg format represent the original value
of the profile without such mangling, or by adding the high bit to the
p-key throughout NetworkManager). It's an inconsistency, but given the
existing behaviors it seems best to stick (revert) to it.

This reverts commit a4fe16a426.

Affected versions were 1.42.2+ and 1.40.2+.

See-also: 05333c3602/f/rdma.ifup-ib (_75)

https://bugzilla.redhat.com/show_bug.cgi?id=2209164
2023-05-25 14:55:37 +02:00
Benjamin Berg
d07383d3f3
wifi: fix IP address assignment by group owner
When a fixed address is assigned by the P2P group owner, then the code
would set the IPv4 configuration method to DISABLED internally. However,
this causes issues, because it means that IPv4 is considered to not have
come up internally which can cause the connection to later time out even
though it was configured properly.

As such, map this method to MANUAL in this case. The AUTO mapping
then becomes:
 * MANUAL: If the remote part is the GO and assigned an IP address
 * DHCP: If the remote part is the GO and did not assign an address
 * SHARED: If we are the GO
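
The mapping above can be sketched as a small decision function. This is a Python illustration with a hypothetical function name and parameters; only the three method names are taken from the list.

```python
def resolve_auto_method(we_are_go: bool, go_assigned_address: bool) -> str:
    """Map the AUTO IPv4 method of a P2P connection to a concrete method."""
    if we_are_go:
        # We are the group owner and hand out addresses ourselves.
        return "SHARED"
    if go_assigned_address:
        # The remote group owner gave us a fixed address.
        return "MANUAL"
    # The remote group owner did not assign an address.
    return "DHCP"
```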

This fixes an issue where the connection established by GNOME Network
Displays would fail once IPv6 configuration also times out.

See-also: https://gitlab.gnome.org/GNOME/gnome-network-displays/-/issues/279

https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1636
2023-05-23 22:15:42 +02:00
Thomas Haller
1a0fa85397
platform/tests: check errno on failure to umount() in test_netns_bind_to_path()
The umount() call failed in a test. Rework the assertion, so
we might see the errno of the problem.
2023-05-22 13:06:35 +02:00
Thomas Haller
bcadcc173a
core: improve logging of used IPv6 interface identifier
2023-05-19 12:51:58 +02:00
Thomas Haller
c275d24637
clang-format: reformat code with clang-format 16.0.2-1.fc38
This is the version shipped in Fedora 38. As Fedora 38 is now out, the
core developers switch to it. Our gitlab-ci will also use that as base
image for the check-{patch,tree} tests and to generate the pages.
Everybody needs to agree on which clang-format version to use, and that
version should be the one of the currently used Fedora release.

Also update the used Fedora image in "contrib/scripts/nm-code-format-container.sh"
script.

The gitlab-ci still needs an update in the following commit. This change
in isolation will break the "check-tree" test.
2023-05-19 10:53:13 +02:00
Beniamino Galvani
93430627c2 team: don't try to connect to teamd in update_connection()
In constructed(), NMDevice starts watching the D-Bus name owner or
monitoring the unix socket, and so it is always aware if teamd is
running. When it is, NMDevice connects to it and initializes
priv->tdc.

It is not useful to try to connect to teamd in update_connection()
because warnings will be generated by NM and by libteam if teamd is
not running. As explained above the connection is always initialized
when teamd is available, and so we can just check priv->tdc.

Fixes: ab586236e3 ('core: implement update_connection() for Team')

https://bugzilla.redhat.com/show_bug.cgi?id=2182029
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1631
2023-05-16 13:18:36 +02:00
Beniamino Galvani
53ba9f4701 ipv6ll: don't regenerate the address when it's removed externally
Currently if the IPv6 link-local address is removed after it passed
DAD, NetworkManager tries to generate a new link-local address. If
this fails, which is always the case for EUI64, ipv6ll is considered
as failed and the connection can go down (depending on may-fail).

This is particularly bad for virtual interfaces because if somebody
removes the link-local address, the activation can fail and destroy
the interface, breaking all services that require it. Also, it's a
change in behavior introduced in 1.36.0.

It seems that a better approach here is to re-add the address that was
removed externally.

Fixes: aa070fb821 ('core: add NML3IPv6LL helper')
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1622
2023-05-15 10:23:39 +02:00
Beniamino Galvani
5e3bef6ae9 manager: use the right reason for managing devices after wake/reenable
When managing the interface after wake/reenable, the reason determines
whether the device will be sys-iface-state=managed or external.

Commit 5a9a7623c5 ('core: set STATE_REASON_CONNECTION_ASSUMED when
waking up') changed the reason from 'now-managed' to
'connection-assumed'; the effect was that devices that were fully
managed before sleeping become external after a wake up. For example:

  $ nmcli connection add type ethernet ifname enp1s0
  Connection 'ethernet-enp1s0' (47fcd81e-bf00-4c02-b25b-354894f5657e) successfully added.
  $ nmcli device | grep enp1s0
  enp1s0  ethernet  connected               ethernet-enp1s0
  $ nmcli networking off
  $ nmcli device | grep enp1s0
  enp1s0  ethernet  unmanaged    --
  $ nmcli networking on
  $ nmcli device | grep enp1s0
  enp1s0  ethernet  unavailable  --

Set the correct reason during wake up so that the previous state is
restored.
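
A minimal sketch of the idea, in Python for illustration: remember the sys-iface-state before sleeping and derive the reason from it on wake. The function name is hypothetical, and only the two states named in the message are modeled; how other states are handled is not covered here.

```python
def reason_after_wake(sys_iface_state_before_sleep: str) -> str:
    """Return the state reason to use when re-managing a device on wake."""
    if sys_iface_state_before_sleep == "managed":
        # Fully managed before sleeping: restore it as fully managed.
        return "now-managed"
    if sys_iface_state_before_sleep == "external":
        # External before sleeping: keep it external.
        return "connection-assumed"
    # Assumption: anything else is out of scope for this sketch.
    raise ValueError(f"unhandled state: {sys_iface_state_before_sleep!r}")
```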

Fixes: 5a9a7623c5 ('core: set STATE_REASON_CONNECTION_ASSUMED when waking up')
https://bugzilla.redhat.com/show_bug.cgi?id=2193422
2023-05-15 10:11:16 +02:00
Beniamino Galvani
1494774bd1 device: add functions to get and set sys-iface-state before sleep
2023-05-15 10:10:42 +02:00
Thomas Haller
a20d4a7a91
core/tests: add test for nm_firewall_nft_stdio_mlag()
If only to hit some of the code paths in our test, and to have valgrind
check (some of) the code paths.
2023-05-10 19:03:40 +02:00
Thomas Haller
2c716f04f9
bond: don't configure "counter" on nft rules for slb-bonding/mlag
Counters are convenient for debugging, but have a performance overhead.
Configure them only when debug logging in NetworkManager is enabled.
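
As a rough Python illustration, the rule text could be assembled with an optional "counter" statement. The helper name and the example match expression are hypothetical placeholders, not the rules NetworkManager actually generates.

```python
def build_rule(match: str, verdict: str, debug_logging: bool) -> str:
    """Assemble an nft rule body, adding a counter only for debugging."""
    parts = [match]
    if debug_logging:
        # Counters help when debugging but cost performance otherwise.
        parts.append("counter")
    parts.append(verdict)
    return " ".join(parts)
```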
2023-05-10 19:03:40 +02:00
Thomas Haller
4c48301594
device: don't reset "net.ipv6.conf.$IFACE.forwarding"
According to systemd, IPv6 forwarding is special anyway, and they only
enable forwarding for "net.ipv6.conf.all.forwarding" ([1]).

Since commit 46e63e03af ('device: announce the managed IPv6
configuration with ipv6.method=shared') we support "ipv6.method=shared"
and enable forwarding for IPv6, on the interface. Whether that makes
sense is questionable, given [1] and the claim that setting it
per-interface is not useful.

Anyway, since that change we always reset the "forwarding" sysctl to
zero, when we don't enable shared mode. That is not right, because the
user didn't explicitly ask for that (and there is no configuration
option like systemd-networkd's "IPForward=" setting to control that).

What we should do instead is not touch or reset the sysctl, unless we
really want to.

No longer set "forwarding" to zero by default. And only restore the
previous value (_dev_sysctl_save_ip6_properties()) if we actually
changed the value to "1".
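
The save-and-restore behavior can be sketched as follows. This is a Python model with hypothetical names; the read and write callables stand in for access to "net.ipv6.conf.$IFACE.forwarding" and are assumptions made for the example.

```python
from typing import Callable, Optional


class ForwardingSysctl:
    """Track whether we changed the forwarding sysctl ourselves."""

    def __init__(self, read: Callable[[], str], write: Callable[[str], None]):
        self._read = read
        self._write = write
        self._saved: Optional[str] = None  # set only if we changed the value

    def enable_for_shared(self) -> None:
        current = self._read()
        if current != "1":
            self._saved = current  # remember what to restore later
            self._write("1")

    def cleanup(self) -> None:
        # Restore only if we actually changed the value to "1";
        # otherwise leave the sysctl untouched.
        if self._saved is not None:
            self._write(self._saved)
            self._saved = None
```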

[1] b8fba0cded/src/network/networkd-sysctl.c (L79)

https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/issues/923

Fixes: 46e63e03af ('device: announce the managed IPv6 configuration with ipv6.method=shared')

https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1616
2023-05-09 10:21:25 +02:00
Thomas Haller
429cf416fd
core: add nm_settings_connection_get_setting() helper
Efficiently and conveniently look up an NMSetting from the
NMConnection inside the NMSettingsConnection.

Note that this uses the NMMetaSettingType as lookup key. That is a novel
approach, compared to lookup by name (nm_connection_get_setting_by_name())
or GType (nm_connection_get_setting()).

Using the NMMetaSettingType enum is however faster, because it does not
require resolving the name/GType first. This is perfectly fine internal API;
we should use it.
2023-05-04 12:01:57 +02:00
Thomas Haller
6e229a852f
core: only trigger recheck when something changes in activate_slave_connections()
We need to detect when nothing relevant changes, and shortcut doing things when they
are unnecessary.
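
The pattern is "report whether anything changed, and act only then". A minimal Python sketch with hypothetical names (the retry table, the default value, and the callback are assumptions for illustration):

```python
from typing import Callable, Dict


def reset_autoconnect_retries(retries: Dict[str, int], default: int) -> bool:
    """Reset all retry counters; return True if any counter changed."""
    changed = False
    for profile in list(retries):
        if retries[profile] != default:
            retries[profile] = default
            changed = True
    return changed


def activate_ports(
    retries: Dict[str, int], default: int, schedule_recheck: Callable[[], None]
) -> None:
    # Shortcut: skip the recheck when nothing relevant changed.
    if reset_autoconnect_retries(retries, default):
        schedule_recheck()
```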
2023-05-04 10:34:12 +02:00
Thomas Haller
7e15b4d562
core: return whether anything changed from nm_manager_devcon_autoconnect_retries_reset()
2023-05-04 10:34:12 +02:00
Thomas Haller
5492945fdc
core: use switch statement in device_state_changed()
It seems better for readability, because reacting based on the state-reason
is ugly already. This way, we access nm_device_state_reason_check(reason) in
only one place. With the if, it's not immediately obvious that both if/else
parts only switch on the reason too.
2023-05-04 10:34:12 +02:00
Thomas Haller
a019d965f7
core: avoid creating devcon data that we don't need
Otherwise, we create device × profiles entries, most of
them nonsensical.
2023-05-04 10:34:12 +02:00
Thomas Haller
87b46e1663
core: improve handling for blocking autoconnect
Clean up logging to always print a "block-autoconnect:" prefix on related
lines. Also, make sure that a line gets logged everywhere the state
changes. Also, for devcon data print both the interface and the
profile.
2023-05-04 10:34:12 +02:00
Thomas Haller
fc624b8de8
core: assert for valid blocked reasons in autoconnect code
We only have a few blocked reasons. Some of them can be only set on the
devcon data, and some only on the settings connection. Assert that we
don't mix that up.
2023-05-04 10:34:12 +02:00
Fernando Fernandez Mancera
2f0571f193 bonding: add support to prio property in bond ports
Add per port priority support for bond active port re-selection during
failover. A higher number means a higher priority in selection. The
primary port still has the highest priority. This option is only
compatible with active-backup, balance-tlb and balance-alb modes.
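
A rough Python model of the selection rule, for illustration only. The Port type and the function are hypothetical, all listed ports are assumed to be usable, and rejecting other modes with an exception is an assumption of this sketch rather than documented behavior.

```python
from dataclasses import dataclass
from typing import Sequence

# Modes named in the message as compatible with the prio option.
PRIO_MODES = frozenset({"active-backup", "balance-tlb", "balance-alb"})


@dataclass(frozen=True)
class Port:
    name: str
    prio: int = 0
    is_primary: bool = False


def select_active_port(mode: str, ports: Sequence[Port]) -> Port:
    """Pick the port to activate on failover among usable ports."""
    if mode not in PRIO_MODES:
        raise ValueError(f"prio is not supported in mode {mode!r}")
    # The primary port always wins; otherwise a higher prio is preferred.
    return max(ports, key=lambda p: (p.is_primary, p.prio))
```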
2023-05-03 10:44:06 +02:00
Fernando Fernandez Mancera
e200b16291 platform: add support to prio property in bond ports
2023-05-03 10:43:58 +02:00
Fernando Fernandez Mancera
bb435674b5 platform: add netlink support for bond port options
sysfs is deprecated and kernel will not add new bond port options to
sysfs. Netlink is a stable API and therefore is the right method to
communicate with kernel in order to set the link options.
2023-05-03 09:55:45 +02:00
Thomas Haller
d3b5496362
firewall: create "dynamic" sets for nft rules for slb-bonding
A workaround for a nftables issue ([1]). I don't know why that matters.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=2177667

Fixes: e9268e3924 ('firewall: add mlag firewall utils for multi chassis link aggregation (MLAG) for bonding-slb')

https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1614
2023-05-03 08:12:15 +02:00
Thomas Haller
db3da65c6c
dns: refactor domain_is_valid() to combine #if blocks
2023-05-02 11:42:55 +02:00
Thomas Haller
4ddbf32f1b
dns/trivial: rename check_public_suffix parameter of domain_is_valid()
Names are important. The previous name was counterintuitive for what
the behavior was.
2023-05-02 11:42:49 +02:00
Thomas Haller
601605dbea
dns: use NM_STR_HAS_SUFFIX() instead of g_str_has_suffix()
It translates to a plain memcmp() as the argument is a string literal.
2023-05-02 11:40:34 +02:00
Thomas Haller
b4338de984
dns: fix logging for resetting the host-domain
Previously, the message was logged when the value did not change. Log
it when the value changes instead.

Fixes: 86bb09c93b ('dns: generate correct search domain for hostnames on non-public TLD')
2023-05-02 11:40:33 +02:00
Tom Sobczynski
86bb09c93b
dns: generate correct search domain for hostnames on non-public TLD
dns-manager uses the Mozilla Public Suffix List (PSL) to determine an
appropriate search domain when generating /etc/resolv.conf. It is
presumed that if the hostname is "example.com", the user does not want
to automatically search "com" for unqualified hostnames, which is
reasonable.

To implement that, prior to the fix, domain_is_valid() implicitly used
the PSL "prevailing star rule", which had the consequence of assuming
that any top-level domain (TLD) is public whether it is on the official
suffix list or not. That meant "example.local" or "example.localdomain"
would not result in searching "local" or "localdomain" respectively.
Instead, /etc/resolv.conf would contain the full hostname
"example.local" as the search domain, which is not what users expect.

The fix here uses the newer PSL API function that allows us to turn off
the "prevailing star rule", so that "local" and "localdomain" are NOT
considered public TLDs because they are not literally on the suffix
list. That in turn gives us the search domain "local" or "localdomain"
in /etc/resolv.conf and allows unqualified hostname lookups (e.g.,
`resolvectl query example`) to find example.local, while example.com
still maintains the previous behavior (i.e., search domain of
"example.com" rather than "com").
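
The effect of the "prevailing star rule" can be shown with a toy model in Python. This does not use libpsl; the function name, the tiny suffix set, and the flag are assumptions made for the example, and real suffix matching is more involved.

```python
from typing import Optional, Set


def search_domain(
    hostname: str, public_suffixes: Set[str], prevailing_star: bool
) -> Optional[str]:
    """Derive the search domain for a hostname such as "example.com"."""
    _, _, parent = hostname.partition(".")
    if not parent:
        return None  # a bare hostname gives no search domain here
    # With the star rule, any single-label TLD counts as public even if it
    # is not on the list; without it, only listed suffixes are public.
    is_public = parent in public_suffixes or (
        prevailing_star and "." not in parent
    )
    # Never search a public suffix; fall back to the full hostname.
    return hostname if is_public else parent
```

With a suffix set containing only "com", "example.com" keeps "example.com" either way, while "example.local" yields "example.local" with the star rule and "local" without it.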

[thaller@redhat.com: reworded commit message]

https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/issues/1281

https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1613
2023-05-02 11:23:09 +02:00
Thomas Haller
b48c314328
core: simplify tracking of delete_on_deactivate idle action
Before commit a42682d44f ('device: take reference to device object
before 'delete_on_deactivate''), we used a weak pointer to track the
idle action.

As we now use a strong reference, we can store all data about the idle
action in NMDevice itself. Drop DeleteOnDeactivateData.
2023-04-27 08:40:12 +02:00
Thomas Haller
aede228974
core: assert that devices are not registered when disposing NMPolicy
NMDevice holds a reference to NMManager, which holds a reference to NMPolicy.
It is not possible that we try to dispose NMPolicy while there are still devices
registered. That would be a bug, that we need to find and solve
differently. Add an assertion instead of trying to handle it.
2023-04-27 08:40:12 +02:00
Thomas Haller
0dd4724446
core: don't take reference on NMDevice to track auto-activate
Add an assertion to nm_policy_device_recheck_auto_activate_schedule(),
that the device is currently registered in NMPolicy. Calling it outside
would be odd, and likely a bug.

But if we only register the auto-activate while being registered, we
don't need to take an additional reference. We know that the object must
be alive (also, we have assertions that in fact it is still alive).
2023-04-27 08:40:12 +02:00
Thomas Haller
a22e5080a0
core: rework tracking of auto-activating devices in NMPolicy
Hook the information for tracking the activation of a device, to the
NMDevice itself. Sure, that slightly couples the NMPolicy closer to
NMDevice, but the result is still simpler code because we don't need a
separate ActivateData.

It also means we can immediately tell whether the auto activation check
for NMDevice is already scheduled and don't need to search through the
list.
2023-04-27 08:40:12 +02:00
Thomas Haller
520fcc8667
core: add nm_manager_get_policy() accessor
NMPolicy really should be merged into NMManager. It has no clear responsibility,
so having two separate objects only makes things confusing. Anyway, it
is permissible to look up the NMPolicy instance of an NMManager. Add an accessor.
2023-04-27 08:40:12 +02:00
Thomas Haller
a81925ad32
core: call nm_manager_device_recheck_auto_activate_schedule() from "nm-manager.c"
No need to call down to the device, to call back up to the NMManager.
2023-04-27 08:40:12 +02:00
Thomas Haller
751b927cf2
core: rename nm_device_emit_recheck_auto_activate() to nm_device_recheck_auto_activate_schedule()
It's the better name. Especially since there is no more signal involved,
the term "emit" doesn't match.

Note also how the previous approach using a signal tried to abstract
what is happening. So we were no longer rechecking-autoconnect, instead,
we were emitting-a-signal-to-recheck-autoconnect. Just be plain about
what it is doing and don't go through a layer of signal.
2023-04-27 08:40:12 +02:00
Thomas Haller
3c59c6b393
core: drop NM_DEVICE_RECHECK_AUTO_ACTIVATE signal and call policy directly
GObject signals don't make the code easier to understand, on the
contrary.  They may have their purpose, when objects truly must/should
not be aware of each other, and need to be composed very loosely. That
is not the case here.

There really is only one subscriber to NM_DEVICE_RECHECK_AUTO_ACTIVATE
signal, and it only makes sense this way. Instead of going through a
signal invocation, just call the well known method directly. It becomes
clearer who calls this code (and it has a lower overhead).

When using cscope/ctags it also is easier to follow the code because the
tools understand function calls.
2023-04-27 08:35:28 +02:00
Thomas Haller
aa2569a9cd
core: use GSource for tracking reset_connections_retries idle action
The numeric source IDs are discouraged. Use a GSource instead.
2023-04-27 08:35:28 +02:00
Thomas Haller
1559c37b9f
core: use GSource for tracking _device_recheck_auto_activate_all_cb idle action
The numeric source IDs are discouraged. Use a GSource instead.
2023-04-27 08:35:28 +02:00
Thomas Haller
886786ee0b
core: rename internal function nm_policy_device_recheck_auto_activate_all_schedule()
The "all" variant is strongly related to nm_policy_device_recheck_auto_activate_schedule().
Rename, so that the names express that better.
2023-04-27 08:35:28 +02:00
Thomas Haller
f1c15f0ae7
core: expose and rename nm_policy_device_recheck_auto_activate_schedule()
Let's simplify this part of the code. This is the first step.
2023-04-27 08:35:27 +02:00
Thomas Haller
49c1e01519
core: don't trigger recheck to auto activate for deleted devices
The delete_on_deactivate_link_delete() handler may be called after the
device was already removed from NMManager. Don't allow that.

Check whether the device is still exported on D-Bus as indication.
2023-04-27 08:35:27 +02:00
Thomas Haller
e699dff46a
device: trigger a recheck to autoconnect when unrealizing ovs-interface
NM_reboot_openvswitch_vlan_configuration_var2 test exposes a race. What
the test does, is to create OVS profiles and repeatedly restart
NetworkManager, checking that those profiles autoconnect and the OVS
configuration gets created.

There is a race, where:

- the OVS interface exists, and an NMDeviceOvsInterface is created
- first ovsdb cleans up old interfaces, sending a json request.
- OVS deletes the interface, and NetworkManager first picks up the
  platform signal (there is a race here, usually the ovsdb request
  completes first, which will cleanup the NMDeviceOvsInterface in
  a different way).
- when the device gets unrealized, we don't schedule a
  check-autoactivate, so the device stays down.

See https://bugzilla.redhat.com/show_bug.cgi?id=2152864#c5 for a log
file with more details.

What should instead happen, is to autoactivate the OVS interface, which
then also fully configures the port and bridge interfaces.

Explicitly schedule an autoactivate when unrealizing devices.

Note that there are now several cases, where NetworkManager autoconnects
more eagerly. This even affects some CI tests and user-visible behavior.
But I think relying on "just don't call nm_device_emit_recheck_auto_activate()"
and hoping that autoconnect doesn't happen is wrong. It must always be
possible to trigger an autoconnect check, and the right thing must
happen. We only don't trigger autoconnect checks *all* the time, because
it would be a waste of CPU resources, but whenever we slightly suspect
that an autoconnect may happen, we must be allowed to trigger a check.
If a device is in a condition where it previously did not autoconnect,
and it also *should* not autoconnect, then we need to fix the code that
evaluates whether an autoconnect may happen (not avoid triggering a
check).

https://bugzilla.redhat.com/show_bug.cgi?id=2152864
Fixes-test: @NM_reboot_openvswitch_vlan_configuration_var2
2023-04-26 17:11:52 +02:00
Thomas Haller
14d429dd17
device: block autoconnect of profile when deleting device
Currently, when we delete a device, autoconnect does not kick in
right away. But that is only because we happen not to schedule an
"autoactivate" recheck.

What should happen is that rechecking whether to autoconnect is
always allowed, and that we have the necessary state to know that
autoconnect currently should not work.

Instead, block autoconnect of the involved profile. That makes sense,
because clearly we don't want to autoconnect again right after `nmcli
device delete $iface`.
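
The design can be sketched in Python: rechecks are always allowed, and explicit blocked state decides the outcome. The class and method names are hypothetical and greatly simplified compared to the real autoconnect code.

```python
from typing import Iterable, List, Set


class AutoconnectState:
    """Keep explicit state about which profiles must not autoconnect."""

    def __init__(self) -> None:
        self._blocked: Set[str] = set()

    def on_device_deleted(self, profile: str) -> None:
        # Deleting the device blocks autoconnect of the involved profile.
        self._blocked.add(profile)

    def unblock(self, profile: str) -> None:
        self._blocked.discard(profile)

    def recheck(self, profiles: Iterable[str]) -> List[str]:
        # A recheck may run at any time; blocked profiles are skipped.
        return [p for p in profiles if p not in self._blocked]
```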
2023-04-26 11:05:18 +02:00
Thomas Haller
c68cbcb8fa
device: minor cleanup of code path in delete_cb()
2023-04-26 11:05:18 +02:00
Thomas Haller
7deea767d3
core: use NMStrBuf in nm_utils_stable_id_parse()
2023-04-21 12:51:43 +02:00
Thomas Haller
21cf2dc58f
libnm,core: make "default${CONNECTION}" the built-in stable ID
The "connection.stable-id" supports placeholders like "${CONNECTION}" or
"${DEVICE}".

The stable-id can also be specified in global connection defaults in
NetworkManager.conf, by leaving it unset in the profile. Global
connection defaults always follow the pattern, that they correspond to a
per-profile property, and only when the per-profile value indicates a
special default/unset value, the global connection default is consulted.
Finally, if the global connection default is also not configured in
NetworkManager.conf, a built-in default is used (which may not be
constant either, for example ipv6.ip6-privacy's built-in default depends
on a sysctl value).

In any case, every possible configuration that can be achieved should be
configurable both per-profile and via global connection default. That
was not the case for the stable-id, because the built-in default generated
an ID in a way that could not be explicitly expressed otherwise.

So you could not:
- explicitly set the per-profile value to the built-in default, to avoid
  that the global-connection-default overwrites it.
- explicitly set the global-connection-default to the built-in default,
  to avoid that a lower priority [connection*] section overwrites the
  stable-id again.

Fix that inconsistency to make it possible to explicitly set the
built-in default.

Change behavior for literally "default${CONNECTION}" and make it behave
as the built-in default. Also document that the built-in default has that
value.
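
The lookup order described above can be sketched as follows. This is a Python illustration with a hypothetical function name; an unset value is modeled as None or an empty string, which is an assumption of the example.

```python
from typing import Optional

BUILTIN_STABLE_ID = "default${CONNECTION}"


def effective_stable_id(
    profile_value: Optional[str], global_default: Optional[str]
) -> str:
    """Resolve connection.stable-id: profile, then global, then built-in."""
    if profile_value:
        return profile_value
    if global_default:
        return global_default
    return BUILTIN_STABLE_ID
```

Because the built-in default now has a literal spelling, setting it explicitly in a profile shields that profile from a global connection default such as "${DEVICE}".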

It's unlikely that this breaks an existing configuration, but of course,
if any user configured "connection.stable-id=default${CONNECTION}", then
the behavior changes for them.
2023-04-21 12:49:18 +02:00