Otherwise we're likely interfering with an in-progress activation.
Consider the following connections, first two being active:
id=bond0a type=bond interface-name=bond0, (Active)
id=dummy0a type=dummy interface-name=dummy0 master=bond0a, (Active)
id=bond0b type=bond interface-name=bond0
id=dummy0b type=dummy interface-name=dummy0 master=bond0b
Note there's two hierarchies with bond0 bond having a dummy0 port,
first one (bond0a, dummy0a) being active.
Suppose the users wants to bring the other one up (bond0b, dummy0b) and
does a "nmcli c up bond0b". This is what happens:
1.) bond0b starts activation due to user request
2.) bond0a starts deactivation due to new activation
3.) dummy0 loses its master, begins deactivation
4.) dummy0 finishes deactivation
5.) both dummy0 being deactivated and bond0b check for slaves enqueues
auto-activation check for dummy0
6.) auto-activation picks dummy0a for dummy0
7.) dummy0a begins activation
8.) dummy0a looks for master connection, picks bond0a
9.) bond0a starts activating on bond0, kicks bond0b away
10.) bond0a and dummy0a end up finishing activation
11.) Everybody unhappy :(
NM's auto-activation logic is only takes autoconnect priority into
account when figuring out a connection to activate and can't be expected
to bring up most sensible combination of connection when there's
multiple ones for the same devices with complex dependencies.
Nevertheless, it shouldn't ever undo the activations if the user is
bringing up the connections manually.
This patch prevents bringing up of master devices that are not
DISCONNECTED and therefore shouldn't be up for grabs. This was
previously done for hardware devices only whereas I believe it should be
the case for *all* realized devices.
https://gitlab.freedesktop.org/NetworkManager/NetworkManager-ci/-/merge_requests/1172https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1364
The current behavior of "nmcli networking off" is that it starts
disconnecting the devices, but doesn't wait for them to actually
come down.
That is not too helpful: the user never knows when the network is
actually disconnected.
Some users, notably the NetworkManager-CI test suite, seem to expect the
devices are all disconnected after the command finishes. Even worse,
it immediately proceeds activating the connections:
@ovs_cloned_mac_set_on_iface
...
* Execute "nmcli networking off && nmcli networking on"
This results in pure utter chaos. In particular, the slave connections
sometimes refuse to activate after "nmcli networking on", because the
master connections are still getting disconnected in response to
preceding "nmcli networking off".
Let's make Enable(FALSE) and Sleep(TRUE) block until none of the devices
are expected to go down.
Note that this makes those call also return when Enable(TRUE) and
Sleep(FALSE) is issued in meanwhile. Therefore a return from
Enable(FALSE) doesn't necessarily imply the networking is disabled.
This is a feature, not a bug -- the actual manager state is available in
the "state" property.
Fixes-test: @ovs_cloned_mac_set_on_iface
https://bugzilla.redhat.com/show_bug.cgi?id=2093175https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1292
DHCP leases for a given interface are already exported on D-Bus
through DHCP4Config and DHCP6Config objects. It is useful to have the
same information also available on the filesystem so that it can be
easily used by scripts.
NM already saves some information about DHCP leases in /var, however
that directory can only be accessed by root, for good reasons.
Append lease options to the existing state file
/run/NetworkManager/devices/$ifindex. Contrary to /var this directory
is not persistent, but it seems more correct to expose the lease only
when it is active and not after it expired or after a reboot.
Since the file is in keyfile format, we add new [dhcp4] and [dhcp6]
sections; however, since some options have the same name for DHCPv4
and DHCPv6, we add a "dhcp4." or "dhcp6." prefix to make the parsing
by scripts (e.g. via "grep") easier.
The option name is the same we use on D-Bus. Since some DHCPv6 options
also have a "dhcp6_" prefix, the key name can contain "dhcp6" twice.
The new sections look like this:
[dhcp4]
dhcp4.broadcast_address=172.25.1.255
dhcp4.dhcp_lease_time=120
dhcp4.dhcp_server_identifier=172.25.1.4
dhcp4.domain_name_servers=172.25.1.4
dhcp4.domain_search=example.com
dhcp4.expiry=1641214444
dhcp4.ip_address=172.25.1.182
dhcp4.next_server=172.25.1.4
dhcp4.routers=172.25.1.4
dhcp4.subnet_mask=255.255.255.0
[dhcp6]
dhcp6.dhcp6_name_servers=fd01::1
dhcp6.dhcp6_ntp_servers=ntp.example.com
dhcp6.ip6_address=fd01::1aa
nm_dns_manager_get() is already a singleton. So users usually
can just get it whenever they need -- except during shutdown
after the singleton was destroyed. This is usually fine, because
users really should not try to get it late during shutdown.
However, if you subscribe a signal handler on the singleton, then you
will also eventually want to unsubscribe it. While the moment when you
subscribe it is clearly not during late-shutdown, it's not clear how
to ensure that the signal listener gets destroyed before the DNS manager
singleton.
So usually, whenever you are going to subscribe a signal, you need to
make sure that the target object stays alive long enough. Which may
mean to keep a reference to it.
Next, we will have NMDevice subscribe to the singleton. With above said,
that would mean that potentially every NMDevice needs to keep a
reference to the NMDnsManager. That is not best. Also, later NMManager
will face the same problem, because it will also subscribe to
NMDnsManager.
So, instead let NMManager own a reference to the NMDnsManager. This
ensures the lifetimes are properly guarded (NMDevice also references
NMManager already).
Also, access nm_dns_manager_get() lazy on first use, to only initialize
it when needed the first time (which might be quite late).
We store the timestamp when a profile activated the last time to
"/var/lib/NetworkManager/timestamps". There was also a timer which
would update the timestamp of activated connections every 300 seconds.
That seems unnecessary, drop it.
For one, waking up every 5 minutes and rewriting a file to disk seems
undesirable, for example if /var is a device where unnecessary writes
should be minimized.
Note that we already update the timestamp when a device goes down,
and of course when it comes up. Updating the timestamp in between seems
unnecessary.
This reverts commit 607350294d ('core: update timestamp in active
system connections every 5 mins (bgo #583756)').
An alternative would be to only update the timestamp in memory (so that
it would appear updated on D-Bus), but delay writing the file until
something important happens. `nm_key_file_db_*()` already tracks whether
there are changes ("dirty") and whether it's necessary to write the
file. It would be possible to track two dirty flags: one that requires
immediate update, and one that only ensures we will re-write dirty files
eventually.
See-also: https://bugzilla.gnome.org/show_bug.cgi?id=583756https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1171
Introduce a RadioFlags property on the manager object. For now it
contains two bits WLAN_AVAILABLE, WWAN_AVAILABLE to indicate whether
any radio interface is present in the system. The presence of a radio
is detected by looking at devices and rfkill switches.
In future, any radio-related read-only boolean flag can be exposed via
this property, including the already existing WirelessHardwareEnabled
and WwanHardwareEnabled properties.
This will allow migrating a connection. If specified, the connection will
be confined to a particular settings plugin when written back. If the
plugin differs from the existing one, it will be removed from the old one.
When we have a bridge interface with ports attached externally (that is,
not by NetworkManager itself), then it can make sense that during
checkpoint rollback we want to keep those ports attached.
During rollback, we may need to deactivate the bridge device and
re-activate it. Implement this, by setting a flag before deactivating,
which prevents external ports to be detached. The flag gets cleared,
when the device state changes to activated (the following activation)
or unmanaged.
This is an ugly solution, for several reasons.
For one, NMDevice tracks its ports in the "slaves" list. But what
it does is ugly. There is no clear concept to understand what it
actually tacks. For example, it tracks externally added interfaces
(nm_device_sys_iface_state_is_external()) that are attached while
not being connected. But it also tracks interfaces that we want to attach
during activation (but which are not yet actually enslaved). It also tracks
slaves that have no actual netdev device (OVS). So it's not clear what this
list contains and what it should contain at any point in time. When we skip
the change of the slaves states during nm_device_master_release_slaves_all(),
it's not really clear what the effects are. It's ugly, but probably correct
enough. What would be better, if we had a clear purpose of what the
lists (or several lists) mean. E.g. a list of all ports that are
currently, physically attached vs. a list of ports we want to attach vs.
a list of OVS slaves that have no actual netdev device.
Another problem is that we attach state on the device
("activation_state_preserve_external_ports"), which should linger there
during the deactivation and reactivation. How can we be sure that we don't
leave that flag dangling there, and that the desired following activation
is the one we cared about? If the follow-up activation fails short (e.g. an
unmanaged command comes first), will we properly disconnect the slaves?
Should we even? In practice, it might be correct enough.
Also, we only implement this for bridges. I think this is where it makes
the most sense. And after all, it's an odd thing to preserve unknown,
external things during a rollback -- unknown, because we have no knowledge
about why these ports are attached and what to do with them.
Also, the change doesn't remember the ports that were attached when the
checkpoint was created. Instead, we preserve all ports that are attached
during rollback. That seems more useful and easier to implement. So we
don't actually rollback to the configuration when the checkpoint was
created. Instead, we rollback, but keep external devices.
Also, we do this now by default and introduce a flag to get the previous
behavior.
https://bugzilla.redhat.com/show_bug.cgi?id=2035519https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/issues/ # 909
The kernel may add a reason for hardware rfkill. Make the NetworkManager
able eto fetch it and parse it.
For now, no action will be taken upon the new reasons.
The different reasons that the kernel can expose are either the radio
was switched off by a hardware rfkill switch. This reason is adveritsed
by bit 0 in the bitmap returned by RFKILL_STATE_REASON udev property.
This is the rfkill that existed until now.
The new reason is mapped to bit 1 and teaches the user space that the
wifi device is currently used by the CSME firmware on the platform. In
that case, the NetworkManager can ask CSME (through the iwlmei kernel
module) what BSSID the CSME firmware is associated to. Once the
NetworkManager gets to the conclusion is has the credentials to connect
to that very same AP, it can request the wifi device and the CSME
firmware will allow the host to take the ownership on the device. CSME
will give 3 seconds to the host to get an IP or it'll take the device
back. In order to complete all the process until we get the DHCP ACK
within 3 seconds, the NetworkManager will need to optimize the scan and
limit the scan to that specific BSSID on that specific channel.
All this flow is not implemented yet, but the first step is to identify
that the device is not owned by the host.
The signal parameters are G_TYPE_UINT. We should not assume that our
enums are a compatible integer type.
In practice of course they always were. Let's just be clear about when
we have an integer and when we have an enum.
Naming is important. Especially when we have a 8k LOC monster, that
manages everything.
Rename things related to Rfkill to give them a common prefix.
Also, move code around so it's beside each other.
RadioState contained both the mutable user-data and meta-data about the
radio state. The latter is immutable. The parts that cannot change, should
be separate from those that can change. Move them to a separate array.
Of course, we only have on NMManager instance, so having these fields embedded
in NMManagerPrivate did not waste memory. This change is done, because immutable
fields should be const. In this case, they are now const global data, which
is even protected by the linker from accidental mutation.
Names in header files should have an "nm" prefix. We do that pretty
consistently. Fix the offenders RfKillState and RfKillType.
Also, rename the RfKillState enums to follow the type name. For example,
NM_RFKILL_STATE_SOFT_BLOCKED instead of RFKILL_SOFT_BLOCKED.
Also, when we camel-case a typedef (NMRfKillState) we would want that
the lower-case names use underscore between the words. So it should be
`nm_rf_kill_state_to_string()`. But that looks awkward. So the right solution
here is to also rename "RfKill" to "Rfkill". That make is consistent
with the spelling of the existing `NMRfkillManager` type and the
`nm-rfkill-manager.h` file.
- use "const RadioState" where possible
- use "bool" bitfields in RadioState (boolean flags in structs
should be `bool` types for consistency) and order fields by
alignment.
- break lines for variable declaration in manager_rfkill_update_one_type().
- return (and nm_assert()) from update_rstate_from_rfkill(). By
not adding a default case, compiler would warn if we forget to
handle an enum value. We can easily do that, by just returning,
and let the "default" case be handled by nm_assert_not_reached()
-- which unlike g_warn_if_reached() compiles to nothing in
release build.
- add nm_assert() that `priv->radio_states[rtype]` is not out of range.
- use designated initializers for priv->radio_states[].
When the NM_UNMANAGED_PLATFORM_INIT flag is cleared last in
device_link_changed(), a recheck-assume is scheduled and then the
device goes immediately to UNAVAILABLE. During the state transition,
addresses and routes are removed from the interface. Then,
recheck-assume finds that the device can be assumed but it's too late
since the device was already deconfigured.
This is a problem as the whole point of assuming a device is to
activate a connection while leaving the device untouched.
In the NMCI "dracut_NM_vlan_over_bridge and dracut_NM_vlan_over_bond"
test, NM in real root tries to assume a vlan device that was activated
in initrd. When the interface gets deconfigured in UNAVAILABLE, the
connection to the NFS server breaks and the rootfs becomes
inaccessible.
The fix to this problem is to delay state transitions in
device_link_changed() to a idle handler, so that recheck-assume can
run before.
Fixes-test: @dracut_NM_vlan_over_bridge
Fixes-test: @dracut_NM_vlan_over_bond
https://bugzilla.redhat.com/show_bug.cgi?id=2047302
We have at least static and transient hostnames. Let's be clear which
one we are talking about.
Note that also NM_SETTINGS_HOSTNAME gets renamed to
NM_SETTINGS_STATIC_HOSTNAME, because it seems clearer.
The only purpose of NM_SETTINGS_STATIC_HOSTNAME is to be the backing
property for the "Hostname" D-Bus property for the NMDBusObject glue.
So, while the new name makes more sense to me, it's now also
inconsistent with it's primary use (the D-Bus property). Still...
When the device gets realized, similar to the situation that the device
is unmanaged by platform-init, if the device is still unmanaged by
parent and we clear the assume state. Then, when the device becomes
managed, NM is not able to properly assume the device using the UUID.
Therefore, we should not clear the assume state if the device has only
the NM_UNMANAGED_PLATFORM_INIT or the NM_UNMANAGED_PARENT flag set
in the unmanaged flags.
The previous commit 3c4450aa4d ('core: don't reset assume state too
early') did something similar for NM_UNMANAGED_PLATFORM_INIT flag only.
We use clang-format for automatic formatting of our source files.
Since clang-format is actively maintained software, the actual
formatting depends on the used version of clang-format. That is
unfortunate and painful, but really unavoidable unless clang-format
would be strictly bug-compatible.
So the version that we must use is from the current Fedora release, which
is also tested by our gitlab-ci. Previously, we were using Fedora 34 with
clang-tools-extra-12.0.1-1.fc34.x86_64.
As Fedora 35 comes along, we need to update our formatting as Fedora 35
comes with version "13.0.0~rc1-1.fc35".
An alternative would be to freeze on version 12, but that has different
problems (like, it's cumbersome to rebuild clang 12 on Fedora 35 and it
would be cumbersome for our developers which are on Fedora 35 to use a
clang that they cannot easily install).
The (differently painful) solution is to reformat from time to time, as we
switch to a new Fedora (and thus clang) version.
Usually we would expect that such a reformatting brings minor changes.
But this time, the changes are huge. That is mentioned in the release
notes [1] as
Makes PointerAligment: Right working with AlignConsecutiveDeclarations. (Fixes https://llvm.org/PR27353)
[1] https://releases.llvm.org/13.0.0/tools/clang/docs/ReleaseNotes.html#clang-format
Completely rework IP configuration in the daemon. Use NML3Cfg as layer 3
manager for the IP configuration of an interface. Use NML3ConfigData as
pieces of configuration that the various components collect and
configure. NMDevice is managing most of the IP configuration at a higher
level, that is, it starts DHCP and other IP methods. Rework the state
handling there.
This is a huge rework of how NetworkManager daemon handles IP
configuration. Some fallout is to be expected.
It appears the patch deletes many lines of code. That is not accurate, because
you also have to count the files `src/core/nm-l3*`, which were unused previously.
Co-authored-by: Beniamino Galvani <bgalvani@redhat.com>
The two nm-sudo helper functions are only used by the OVS device plugin,
but they are part of NetworkManager core binary.
This is done commonly, where the NetworkManager binary has symbols that
are used by the plugins. But in this case, NetworkManager itself doesn't
use the symbols. That will case the linker to drop them.
A previous solution for that was commit 684f2acffe ('build: add way to
keep unused symbols when linking NetworkManager'), but that doesn't seem
to work with clang 3.4 (rhel-7).
Instead, actually use the symbol so that it cannot be dropped.
Naming is important, because the name of a thing should give you a good
idea what it does. Also, to find a thing, it needs a good name in the
first place. But naming is also hard.
Historically, some strv helper API was named as nm_utils_strv_*(),
and some API had a leading underscore (as it is internal API).
This was all inconsistent. Do some renaming and try to unify things.
We get rid of the leading underscore if this is just a regular
(internal) helper. But not for example from _nm_strv_find_first(),
because that is the implementation of nm_strv_find_first().
- _nm_utils_strv_cleanup() -> nm_strv_cleanup()
- _nm_utils_strv_cleanup_const() -> nm_strv_cleanup_const()
- _nm_utils_strv_cmp_n() -> _nm_strv_cmp_n()
- _nm_utils_strv_dup() -> _nm_strv_dup()
- _nm_utils_strv_dup_packed() -> _nm_strv_dup_packed()
- _nm_utils_strv_find_first() -> _nm_strv_find_first()
- _nm_utils_strv_sort() -> _nm_strv_sort()
- _nm_utils_strv_to_ptrarray() -> nm_strv_to_ptrarray()
- _nm_utils_strv_to_slist() -> nm_strv_to_gslist()
- nm_utils_strv_cmp_n() -> nm_strv_cmp_n()
- nm_utils_strv_dup() -> nm_strv_dup()
- nm_utils_strv_dup_packed() -> nm_strv_dup_packed()
- nm_utils_strv_dup_shallow_maybe_a() -> nm_strv_dup_shallow_maybe_a()
- nm_utils_strv_equal() -> nm_strv_equal()
- nm_utils_strv_find_binary_search() -> nm_strv_find_binary_search()
- nm_utils_strv_find_first() -> nm_strv_find_first()
- nm_utils_strv_make_deep_copied() -> nm_strv_make_deep_copied()
- nm_utils_strv_make_deep_copied_n() -> nm_strv_make_deep_copied_n()
- nm_utils_strv_make_deep_copied_nonnull() -> nm_strv_make_deep_copied_nonnull()
- nm_utils_strv_sort() -> nm_strv_sort()
Note that no names are swapped and none of the new names existed
previously. That means, all the new names are really new, which
simplifies to find errors due to this larger refactoring. E.g. if
you backport a patch from after this change to an old branch, you'll
get a compiler error and notice that something is missing.
Add a new 'keep-configuration' device option, set to 'yes' by
default. When set to 'no', on startup NetworkManager ignores that the
interface is pre-configured and doesn't try to keep its
configuration. Instead, it activates one of the persistent
connections.
Turns out, we call nm_settings_get_connection_clone() *a lot* with sort order
nm_settings_connection_cmp_autoconnect_priority_p_with_data().
As we cache the (differently sorted) list of connections, also cache
the presorted list. The only complication is that every time we still
need to check whether the list is still sorted, because it would be
more complicated to invalidate the list when an entry changes which
affects the sort order. Still, such a check is usually successful
and requires "only" N-1 comparisons.
Commit 33b9fa3a3c ("manager: Keep volatile/external connections
while referenced by async_op_lst") changed active_connection_find() to
also return active connections that are not yet activating but are
waiting authorization.
This has side effect for other callers of the function. In particular,
_get_activatable_connections_filter() should exclude only ACs that are
really active, not those waiting for authorization.
Otherwise, in ensure_master_active_connection() all the ACs waiting
authorization are missed and we might fail to find the right master
AC.
Add an argument to active_connection_find to select whether include
ACs waiting authorization.
Fixes: 33b9fa3a3c ('manager: Keep volatile/external connections while referenced by async_op_lst')
https://bugzilla.redhat.com/show_bug.cgi?id=1955101
If the device is still unmanaged by platform-init (which means that
udev didn't emit the event for the interface) when the device gets
realized, we currently clear the assume state. Later, when the device
becomes managed, NM is not able to properly assume the device using
the UUID.
This situation arises, for example, when NM already configured the
device in initrd; after NM is restarted in the real root, udev events
can be delayed causing this race condition.
Among all unamanaged flags, platform-init is the only one that can be
delayed externally. We should not clear the assume state if the device
has only platform-init in the unmanaged flags.
D-Bus 1.3.1 (2010) introduced the standard "PropertiesChanged" signal
on "org.freedesktop.DBus.Properties". NetworkManager is old, and predates
this API. From that time, it still had it's own PropertiesChanged signal
that are emitted together with the standard ones. NetworkManager
supports the standard PropertiesChanged signal since it switched to
gdbus library in version 1.2.0 (2016).
These own signals are deprecated for a long time already ([1], 2016), and
are hopefully not used by anybody anymore. libnm-glib was using them and
relied on them, but that library is gone. libnm does not use them and neither
does plasma-nm.
Hopefully no users are left that are affected by this API break.
[1] 6fb917178a
Active-connections in the async_op_lst are not guaranteed to have a
settings-connection. In particular, the settings-connection for an
AddAndActivate() AC is set only after the authorization succeeds. Use
the non-asserting variant of the function to fix the following
failure:
nm_active_connection_get_settings_connection: assertion 'sett_conn' failed
1 _g_log_abort()
2 g_logv()
3 g_log()
4 _nm_g_return_if_fail_warning.constprop.14()
5 nm_active_connection_get_settings_connection()
6 active_connection_find()
7 _get_activatable_connections_filter()
8 nm_settings_get_connections_clone()
9 nm_manager_get_activatable_connections()
10 auto_activate_device_cb()
11 g_idle_dispatch()
12 g_main_context_dispatch()
13 g_main_context_iterate.isra.21()
14 g_main_loop_run()
15 main()
Fixes: 33b9fa3a3c ('manager: Keep volatile/external connections while referenced by async_op_lst')
https://bugzilla.redhat.com/show_bug.cgi?id=1933719https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/834
When the link goes away the manager keeps software devices alive as
unrealized because there is still a connection for them.
If the device is software and has a NM-generated connection, keeping
the device alive means that also the generated connection stays
alive. The result is that both stick around forever even if there is
no longer a kernel link.
Add a check to avoid this situation.
https://bugzilla.redhat.com/show_bug.cgi?id=1945282
Fixes: cd0cf9229d ('veth: add support to configure veth interfaces')
Along with NM_SETTINGS_CONNECTION_UPDATE_REASON_RESET_SYSTEM_SECRETS
and NM_SETTINGS_CONNECTION_UPDATE_REASON_RESET_AGENT_SECRETS, which can
be used in the NMSettingConnection's "updated" handlers to track secrets
updates, add NM_SETTINGS_CONNECTION_UPDATE_REASON_UPDATE_NON_SECRET so
that the handlers can tell when something other than secrets has been
updated in the connection.
It can also potentially be used in _connection_changed_update in
src/core/settings/nm-settings.c to stop emitting the
NetworkManager.Settings.Connection.Updated() dbus signal if only secrets
are being updated (on agent queries etc.) if it is deemed to be correct.