This is done so that AddAndActivate() will return sensible errors in a
future patch that makes it support creating virtual devices.
In effect, all errors are now logged in one place, which is why the
log levels differ. I don't think we're losing anything of value by
being a little less verbose here.
(cherry picked from commit 45d82f720c)
Preventing the activation of unavailable devices for all device types
is too aggressive and leads to race conditions, e.g. when a non-virtual
bond port gets a carrier, preventing the device from being a good
candidate for the connection.
Instead, enforce this check only on OVS interfaces, where NetworkManager
just makes sure that ovsdb->ready is set to TRUE.
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/2139
Fixes: 774badb151 ('core: prevent the activation of unavailable devices')
(cherry picked from commit a1c05d2ce6)
When autoconnecting ports of a controller, we look for all candidate
(device,connection) tuples through the following call trace:
-> autoconnect_ports()
   -> find_ports()
      -> nm_manager_get_best_device_for_connection()
         -> nm_device_check_connection_available()
            -> _nm_device_check_connection_available()
The last function checks that a specific device is available to be
activated with the given connection. For virtual devices, it only
checks that the device is compatible with the connection based on the
device type and characteristics, without considering any live network
information.
For OVS interfaces, this doesn't work as expected. During startup, NM
performs a cleanup of the ovsdb to remove entries that were previously
added by NM. When the cleanup finishes, NMOvsdb sets the "ready"
flag and is ready to start the activation of new OVS interfaces. With
the current mechanism, it is possible that an OVS-interface connection
gets activated via the autoconnect-ports mechanism without checking
the "ready" flag.
Fix that by also checking that the device is available for activation.
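Roughly, the resulting gate looks like this -- a compilable sketch
with stub types, not NetworkManager's real API (the actual check
lives in _nm_device_check_connection_available()):

  #include <stdbool.h>
  #include <stdio.h>

  typedef enum { DEV_BOND_PORT, DEV_OVS_INTERFACE } DeviceType;

  typedef struct {
      DeviceType type;
      bool       ovsdb_ready; /* mirrors NMOvsdb's "ready" flag */
  } Device;

  /* Only OVS interfaces gate their availability on ovsdb readiness;
   * applying the check to every device type raced with, e.g., a bond
   * port that just got carrier. */
  static bool
  available_for_activation(const Device *dev)
  {
      if (dev->type == DEV_OVS_INTERFACE)
          return dev->ovsdb_ready;
      return true;
  }

  int
  main(void)
  {
      Device ovs  = { DEV_OVS_INTERFACE, false };
      Device port = { DEV_BOND_PORT, false };

      printf("ovs: %d, bond port: %d\n",
             available_for_activation(&ovs),    /* 0: wait for ready */
             available_for_activation(&port));  /* 1: not gated */
      return 0;
  }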
Rename "unavailable_devices" to "exclude_devices", as the
"unavailable" term has a specific, different meaning in NetworkManager
(i.e. the device is in the UNAVAILABLE state). Also, use
nm_g_hash_table_contains() when needed.
If the connection didn't exist in advance, there's no unrealized device,
and find_device_by_iface() is not going to get us one.
Call system_create_virtual_device() after nm_utils_complete_generic()
completes the connection for virtual devices. Make sure we do proper
cleanup if we happen to fail the activation later, so that the device
doesn't end up hanging there.
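The intended ordering, as a compilable sketch; the helpers here are
trivial stand-ins for nm_utils_complete_generic(),
system_create_virtual_device() and the activation path:

  #include <stdbool.h>
  #include <stdio.h>
  #include <stdlib.h>

  typedef struct { bool complete; } Connection;
  typedef struct { int ifindex; } Device;

  static void    complete_generic(Connection *c)      { c->complete = true; }
  static Device *create_virtual_device(Connection *c) { return c->complete ? calloc(1, sizeof(Device)) : NULL; }
  static bool    activate(Device *d)                  { (void) d; return false; /* pretend activation fails */ }
  static void    remove_device(Device *d)             { free(d); }

  static bool
  add_and_activate(Connection *c)
  {
      /* Complete the connection first, so there is enough
       * information to create the virtual device. */
      complete_generic(c);

      Device *device = create_virtual_device(c);
      if (!device)
          return false;

      /* If activation fails later, do proper cleanup so the device
       * doesn't end up hanging there. */
      if (!activate(device)) {
          remove_device(device);
          return false;
      }
      return true;
  }

  int
  main(void)
  {
      Connection c = { false };
      printf("activated: %d\n", add_and_activate(&c));
      return 0;
  }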
Make validate_activation_request() only do the validation -- split the
determination of the device into find_device_for_activation().
The point of this is to be able to complete the connection and
actually create a virtual device after the validation.
I believe this is also somewhat easier to follow now that the procedure
does what its name says.
The point is to get rid of device/connection type specific arguments, to
eventually be able to complete the connection on AddAndActivate before knowing
which factory is going to take care of creating the device.
Aside from that, the whole thing is pretty awful -- with complicated
macros and variadic arguments (ugh). Let's get rid of that.
By default, on reapply we were only syncing the main route table. As a
result, routes added by NM to other tables were not removed on
reapply. This was done to preserve routes added externally, but routes
added by NM itself should be removed.
Add a new route table syncing mode "main + NM routes". This mode
keeps the normal behaviour of fully syncing the main table, and for
other tables removes only the routes that were added by us, leaving
the rest untouched. Use this mode by default, as this is what a user
would expect on reapply.
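The table filtering can be expressed as a small predicate; this is an
illustrative sketch (names are made up, the real logic lives in the
l3cfg route sync code):

  #include <stdbool.h>
  #include <stdio.h>

  #define RT_TABLE_MAIN 254 /* from linux/rtnetlink.h */

  typedef struct {
      unsigned table;
      bool     added_by_nm; /* NM remembers which routes it configured */
  } Route;

  /* "main + NM routes" sync mode: the main table is synced fully,
   * while in other tables only NM's own routes are removed. */
  static bool
  should_remove_on_reapply(const Route *r, bool in_profile)
  {
      if (in_profile)
          return false;              /* still wanted, keep it */
      if (r->table == RT_TABLE_MAIN)
          return true;               /* full sync of the main table */
      return r->added_by_nm;         /* leave external routes alone */
  }

  int
  main(void)
  {
      Route external = { .table = 100, .added_by_nm = false };
      Route stale_nm = { .table = 100, .added_by_nm = true  };

      printf("%d %d\n",
             should_remove_on_reapply(&external, false),  /* 0: kept */
             should_remove_on_reapply(&stale_nm, false)); /* 1: removed */
      return 0;
  }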
Note: this might not work if NM is restarted between the profile being
modified and the reapply, because across a restart NM forgets which
routes it added itself. This is a rare corner case, though.
Use the D-Bus property "VersionInfo" to expose a capability flag
indicating that this bug is fixed. It is the first capability that we
expose in this way. However, it is convenient because it's something
that clients like nmstate need to know, so they can decide whether a
connection down is needed or not. It is not enough to decide that by
version number, because the fix might be backported via a downstream
patch in distros like RHEL.
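A sketch of how a client could test such a capability bit, assuming
the layout described above (element 0 of the "au" array encodes the
version, the following elements are capability bitfields); CAP_EXAMPLE
is a placeholder, the real capability numbers live in the D-Bus API
headers:

  #include <glib.h>

  #define CAP_EXAMPLE 1 /* placeholder capability number */

  static gboolean
  version_info_has_capability(GVariant *version_info, guint cap)
  {
      gsize          n;
      const guint32 *words = g_variant_get_fixed_array(version_info,
                                                       &n, sizeof(guint32));
      gsize          idx = 1 + cap / 32; /* element 0 is the version */

      return idx < n && (words[idx] & (1u << (cap % 32))) != 0;
  }

  int
  main(void)
  {
      guint32   sample[] = { 0x012c00 /* version */, 1u << CAP_EXAMPLE };
      GVariant *v = g_variant_ref_sink(
          g_variant_new_fixed_array(G_VARIANT_TYPE_UINT32, sample,
                                    G_N_ELEMENTS(sample), sizeof(guint32)));

      g_print("has capability: %d\n",
              version_info_has_capability(v, CAP_EXAMPLE));
      g_variant_unref(v);
      return 0;
  }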
https://issues.redhat.com/browse/RHEL-67324
https://issues.redhat.com/browse/RHEL-66262
Fixes: e9c17fcc9b ('l3cfg: default to 'main' route table sync mode')
Managed type = managed is a bit unclear, because all managed types are
for devices that are managed, just at different levels. Managed type =
managed could be interpreted as meaning that the other types are
unmanaged. Change it to managed type = full.
As part of the conscious language effort, we must provide an
alternative option to configure autoconnect-ports system-wide in the
NetworkManager configuration file.
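For illustration, a system-wide default could then look like this;
the drop-in path is hypothetical and the key mirrors the per-profile
property:

  # /etc/NetworkManager/conf.d/30-autoconnect-ports.conf
  [connection]
  connection.autoconnect-ports=1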
When activating a port while its controller is being deactivated by a
new activation, NM registers a `state-change` signal handler and waits
for the controller to get a new active connection. Once the controller
has one, the port invokes `nm_active_connection_set_controller()`,
which leads to an assertion failure at
g_return_if_fail(!nm_dbus_object_is_exported(NM_DBUS_OBJECT(self)))
because this active connection is already exported as a D-Bus object.
To fix the problem, remove the restriction that the controller
property is write-only and notify the D-Bus object changes for the
controller property.
Signed-off-by: Gris Ge <fge@redhat.com>
Connection timestamps are updated (saved to disk) on connection up and
down. This way, the last used connection takes precedence for
autoconnect among connections with the same priority.
But as we don't actually take connections down when NM stops, the last
timestamp of each still-active connection is from when it was brought
up. The activation order might therefore be wrong on the next start.
One case where timestamps are wrong (although it is not clear how
important it is because the connections are activated on different
interfaces):
1. Activate con1 <- timestamp updated
2. Activate con2 <- timestamp updated
3. Deactivate con2 <- timestamp updated
4. Stop NM <- the timestamp of con2 is higher than con1's, but con1
   was still active when con2 was brought down.
Another case, which is reproducible (from
https://issues.redhat.com/browse/RHEL-35539):
1. Activate con1
2. Activate con2 on same interface:
- As a consequence con1 is deactivated and its timestamp updated
- The timestamp of con2 is also updated
3. Stop NM <- the timestamps of con1 and con2 are the same; the next
   activation order will be undefined.
Fix by saving the timestamps on NM shutdown.
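Conceptually, the fix amounts to this (a compilable stub, not the
real NMSettings code, which persists these values under
/var/lib/NetworkManager):

  #include <stdio.h>
  #include <time.h>

  typedef struct {
      const char *uuid;
      int         active;    /* still active at shutdown */
      long        timestamp; /* last up/down, seconds since epoch */
  } Profile;

  /* On shutdown, stamp every still-active profile with "now" before
   * persisting, so the next start sees the right autoconnect order. */
  static void
  save_timestamps_on_shutdown(Profile *profiles, int n)
  {
      long now = (long) time(NULL);

      for (int i = 0; i < n; i++) {
          if (profiles[i].active)
              profiles[i].timestamp = now;
          printf("%s=%ld\n", profiles[i].uuid, profiles[i].timestamp);
      }
  }

  int
  main(void)
  {
      Profile profiles[] = {
          { "con1-uuid", 1, 1000 }, /* still active, gets stamped */
          { "con2-uuid", 0, 2000 }, /* already down, keeps its stamp */
      };
      save_timestamps_on_shutdown(profiles, 2);
      return 0;
  }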
Problem:
Given an OVS port with `autoconnect-ports` set to default or false,
when a reactivation is required for checkpoint rollback, the
previously activated OVS interface will be in deactivated state after
the checkpoint rollback.
The root cause:
`activate_stage1_device_prepare()` marks the device as failed when the
controller is deactivating or deactivated.
In `activate_stage1_device_prepare()`, the controller device is
retrieved from the NMActiveConnection; it is NULL when the
NMActiveConnection is in deactivated state. This causes the device to
be set to `NM_DEVICE_STATE_REASON_DEPENDENCY_FAILED`, which prevents
all follow-up `autoconnect` actions.
Fix:
When noticing that the controller is deactivating or deactivated with
reason `NM_DEVICE_STATE_REASON_NEW_ACTIVATION`, use the new function
`nm_active_connection_set_controller_dev()` to wait for the controller
device state to be between NM_DEVICE_STATE_PREPARE and
NM_DEVICE_STATE_ACTIVATED. After that, use the existing
`nm_active_connection_set_controller()` to pick up the controller's
new NMActiveConnection and move on.
Resolves: https://issues.redhat.com/browse/RHEL-31972
Signed-off-by: Gris Ge <fge@redhat.com>
Fix the following:
NetworkManager: file ../src/libnm-core-impl/nm-connection.c: line 321 (nm_connection_get_setting): should not be reached
NetworkManager.service: Main process exited, code=dumped, status=5/TRAP
Fixes: bd38a19832 ('connection: add support to down-on-poweroff')
While enumerating devices at startup, we take a snapshot of existing
links from platform and we start creating device instances for
them. It's possible that in the meantime, while processing netlink
events in platform_link_added(), a link gets renamed. If that happens,
then we have two different views of the same ifindex: the cached link
from `links` and the link in platform.
This can cause issues: in platform_link_added() we create the device
with the cached name; then in NMDevice's constructor(), we look up
from platform the ifindex for the given name. Because of the rename,
this lookup can match a newly created, different link.
The end result is that the ifindex from the initial snapshot doesn't
get an NMDevice and is not handled by NetworkManager.
Fix this problem by fetching the latest version of the link from
platform to make sure we have a consistent view of the state.
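The gist, as a compilable sketch with stub data standing in for the
platform cache:

  #include <stdio.h>

  typedef struct { int ifindex; char name[16]; } Link;

  /* Stand-in for a lookup in platform, the authoritative view. */
  static Link platform_view = { 3, "eth0-renamed" };

  static const Link *
  platform_link_get(int ifindex)
  {
      return ifindex == platform_view.ifindex ? &platform_view : NULL;
  }

  int
  main(void)
  {
      /* The startup snapshot still has the old name... */
      Link snapshot = { 3, "eth0" };

      /* ...so re-fetch by ifindex before creating the device, instead
       * of later resolving the (possibly stale) name back to an
       * ifindex, which could match a different, newly created link. */
      const Link *latest = platform_link_get(snapshot.ifindex);

      if (latest)
          printf("create device for ifindex %d, name %s\n",
                 latest->ifindex, latest->name);
      return 0;
  }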
https://issues.redhat.com/browse/RHEL-25808
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1897
When creating a VLAN over an OVS internal interface that holds the
same name as its controller OVS bridge, NetworkManager fails with this
error:
Error: Connection activation failed: br0.101 failed to create
resources: cannot retrieve ifindex of interface br0 (Open vSwitch
Bridge)
Expand `find_device_by_iface()` with an additional argument
`child: NMConnection *`, used to validate whether the candidate is
suitable to be the parent device.
In `nm_device_check_parent_connection_compatible()`, we only disallow
OVS bridges and OVS ports from being the parent.
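The shape of that check, sketched with stub types (the real hook is
nm_device_check_parent_connection_compatible()):

  #include <stdbool.h>
  #include <stdio.h>

  typedef enum {
      DEV_OVS_BRIDGE,
      DEV_OVS_PORT,
      DEV_OVS_INTERFACE,
      DEV_ETHERNET
  } DeviceType;

  /* An OVS bridge, port and internal interface may all be named
   * "br0"; when resolving a VLAN's parent by name, rule out the
   * device types that cannot act as a parent. */
  static bool
  parent_compatible(DeviceType candidate)
  {
      return candidate != DEV_OVS_BRIDGE && candidate != DEV_OVS_PORT;
  }

  int
  main(void)
  {
      printf("bridge: %d, interface: %d\n",
             parent_compatible(DEV_OVS_BRIDGE),     /* 0: not a parent */
             parent_compatible(DEV_OVS_INTERFACE)); /* 1: valid parent */
      return 0;
  }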
Resolves: https://issues.redhat.com/browse/RHEL-26753
Signed-off-by: Gris Ge <fge@redhat.com>
With the `NM_CHECKPOINT_CREATE_FLAG_TRACK_INTERNAL_GLOBAL_DNS` flag
set on checkpoint creation, the checkpoint rollback will restore the
global DNS from the internal configuration file
`/var/lib/NetworkManager/NetworkManager-intern.conf`.
If the user has set global DNS in the /etc folder, this flag has no
effect.
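From a client's point of view, the flag is passed at checkpoint
creation time. A minimal libnm sketch, assuming a libnm version that
defines this flag:

  #include <NetworkManager.h>

  static void
  created_cb(GObject *source, GAsyncResult *result, gpointer user_data)
  {
      GError       *error = NULL;
      NMCheckpoint *checkpoint;

      (void) user_data;
      checkpoint = nm_client_checkpoint_create_finish(NM_CLIENT(source),
                                                      result, &error);
      if (!checkpoint) {
          g_printerr("checkpoint creation failed: %s\n", error->message);
          g_error_free(error);
          return;
      }
      g_object_unref(checkpoint);
  }

  /* Create a checkpoint over all devices that also tracks the
   * internally-set global DNS, so a rollback restores
   * NetworkManager-intern.conf too. */
  static void
  create_checkpoint(NMClient *client)
  {
      nm_client_checkpoint_create(client,
                                  NULL, /* empty/NULL: all devices */
                                  120,  /* rollback timeout, seconds */
                                  NM_CHECKPOINT_CREATE_FLAG_TRACK_INTERNAL_GLOBAL_DNS,
                                  NULL, created_cb, NULL);
  }

  int
  main(void)
  {
      GError    *error  = NULL;
      NMClient  *client = nm_client_new(NULL, &error);
      GMainLoop *loop   = g_main_loop_new(NULL, FALSE);

      if (!client) {
          g_printerr("%s\n", error->message);
          return 1;
      }
      create_checkpoint(client);
      g_main_loop_run(loop); /* demo only; quit with Ctrl-C */
      return 0;
  }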
Resolves: https://issues.redhat.com/browse/RHEL-23446
Signed-off-by: Gris Ge <fge@redhat.com>
The new option in NMSettingConnection allows the user to specify
whether the connection needs to be brought down when powering off the
system. This is useful for removing IP addresses prior to powering
off. In order to accomplish that, we listen for the "Shutdown" systemd
D-Bus signal.
The option is FALSE by default; it can be specified globally in the
configuration file or per profile.
The code that adds the devices to the sleeping list and takes them
down should be moved to a separate function. This way we can reuse it
and avoid duplicating code.
In order to give NMSleepMonitor a more generic use, let's rename the
whole module to NMPowerMonitor. Nothing is exposed in the API, so it
is a trivial renaming.
If a generic device is present and the name matches, it is compatible
with any link type.
For example, if a generic connection has a device-handler that creates
a dummy interface, the link is compatible with the NMDeviceGeneric.
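Sketched as a predicate with stub types (illustrative only; in NM
this is part of matching a device against a platform link):

  #include <stdbool.h>
  #include <stdio.h>
  #include <string.h>

  typedef struct { const char *iface; bool is_generic; } Device;
  typedef struct { const char *name;  const char *kind; } Link;

  /* A generic device (e.g. backed by a device-handler that created a
   * dummy interface) accepts any link type, as long as the name
   * matches. */
  static bool
  device_compatible_with_link(const Device *dev, const Link *link)
  {
      if (strcmp(dev->iface, link->name) != 0)
          return false;
      return dev->is_generic; /* non-generic type matching elided */
  }

  int
  main(void)
  {
      Device generic = { "dh0", true };
      Link   dummy   = { "dh0", "dummy" };

      printf("compatible: %d\n",
             device_compatible_with_link(&generic, &dummy));
      return 0;
  }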
(cherry picked from commit 5978fb2b27)
When a generic connection has a custom device-handler, it always
generates an NMDeviceGeneric, even when the link that gets created is
of a type natively supported by NM. On service restart, we need to
keep track of the fact that the device is generic, or otherwise a
different device type will be instantiated.
(cherry picked from commit f2613be150)
Mark the methods/properties as deprecated in the D-Bus API (via
org.freedesktop.DBus.Introspectable.Introspect(), [1]).
This affects the properties that are documented as deprecated in the
introspection XML.
$ busctl -j call \
org.freedesktop.NetworkManager \
/org/freedesktop/NetworkManager \
org.freedesktop.DBus.Introspectable \
Introspect | \
jq '.data[0]' -r | \
grep -5 Deprecated
[1] https://dbus.freedesktop.org/doc/dbus-specification.html#standard-interfaces-introspectable
This option was introduced only to allow keeping the old behavior in
RHEL7, while the default order was changed from 'ifindex' to 'name'
in RHEL8. The usefulness of this option is questionable, as 'name'
together with predictable interface names should give a predictable
order. When not using predictable interface names, the name is
unpredictable, but so is the ifindex.
https://issues.redhat.com/browse/NMT-926
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1814
Remove all the code that was added for the CSME coexistence.
The Intel WiFi team can't commit to when, if at all, this feature will
be completely integrated and tested in NetworkManager.
The preferred solution for now is the one that involves the kernel
only.
Remove the code that was merged so far.
The device authentication request is an async process: it cannot know
the answer right away, and there is no guarantee that the device is
still exported on D-Bus when the authentication finishes. Thus, when
the device is no longer alive, do not return SUCCESS; abort the
authentication request instead.
https://bugzilla.redhat.com/show_bug.cgi?id=2210271
Activating a port connection requires that the controller connection
is active or that a valid controller device candidate is available
for activation.
One of the conditions for a controller device to be a valid candidate
for the connection is that it is not active; therefore, we should also
consider as valid a device that is currently deactivating.
Otherwise, we could fail the port activation just because the
deactivation of the controller device candidate hasn't finished yet.
https://bugzilla.redhat.com/show_bug.cgi?id=2125615
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1693
By default, bond/bridge/team devices ignore carrier, and so do their
ports. However, it can make sense to set '[device*].ignore-carrier' for
the controller device. Meaningfully support that.
This is a follow up to commit 8c91422954 ('device: handle carrier
changes for master device differently'), which didn't fully solve the
problem.
What already works is that when you set ignore-carrier=no for the
controller, then after loss of carrier and a carrier wait timeout, the
controller and ports go down. If both the controller and port profiles
have autoconnect disabled, they stay down and that's it. This works as
expected, but is not very useful, because when we want to automatically
react to carrier loss, we also want to automatically reconnect.
For controller profiles, carrier only makes sense when ports are
attached. However, we can (auto)activate controller profiles without
ports. So when the user enables autoconnect for the controller
profile, the profile will eagerly reconnect: after loss of carrier,
the device goes down and reconnects right away. When configuring a
bond with ignore-carrier=no and autoconnect=yes, the technically
sensible thing happens (an immediate reconnect), but that is just not
a useful configuration.
To usefully configure ignore-carrier=no for a controller device,
autoconnect on the controller must be disabled while being enabled on
the ports. After all, it's the ports that will autoconnect based on
the carrier state and bring up the controller with them.
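Concretely, the working combination could look like this (interface
and file names are illustrative):

  # /etc/NetworkManager/conf.d/bond-carrier.conf
  [device-bond0]
  match-device=interface-name:bond0
  ignore-carrier=0

  # Profiles: autoconnect disabled on the controller, enabled on the
  # ports, e.g.:
  #   bond0 profile:      connection.autoconnect=no
  #   bond0 port profile: connection.autoconnect=yes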
Note that at the moment when a port decides to autoconnect, the
controller profile is not yet selected. That only happens later during
_internal_activate_device() after searching it with find_master(). At
that point, the port profile checks whether it should autoconnect based
on its own carrier state, and aborts if not.
If autoconnect is aborted due to lack of carrier, the profile gets
blocked from autoconnect with reason "failed". Hence, when the carrier
returns, we need to clear any "failed" blocked reason and schedule
another autoconnect check.
Note that this really only works if the port is itself a simple device,
like an ethernet. If the port is itself a software device (like a bond,
or a VLAN), then the carrier state in _internal_activate_device() is
unknown, and we cannot avoid autoconnect. It's unclear how that could
make sense, if at all.
This setup can be combined with "connection.autoconnect-slaves=yes". In
that case, the first port to get carrier autoconnects and brings up
the controller too. Usually the other ports that don't have carrier
would not autoconnect, but with autoconnect-slaves they will.
The effect is that we autoconnect whenever any of the ports has
carrier, and then we immediately also bring up the ports that don't
have carrier (which we usually would not).
https://bugzilla.redhat.com/show_bug.cgi?id=2156684
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1658
When managing a device after it is realized, we previously always set
the NOW_MANAGED reason, which makes the device fully managed.
This works based on the assumption that initially an external device
has unmanaged flag EXTERNAL_DOWN set, and therefore the device stays
unmanaged during realization.
It is possible that an external device appears already with addresses
(or attached to a controller); we need to set reason
CONNECTION_ASSUMED if it's an external device, so that we don't set
sys-iface-state=managed.
Reproducer:
ip link add br1 type bridge
killall -STOP NetworkManager
ip link add dummy1 type dummy
ip link set dummy1 master br1
ip link set dummy1 up
sleep .5
killall -CONT NetworkManager
After this, dummy1 is fully managed by NM while it shouldn't be.
https://bugzilla.redhat.com/show_bug.cgi?id=2149012