The deactivation can happen while we are waiting for the ifindex, and
it can happen via two code paths, depending on the state. For a
regular deactivation, method deactivate_async() is called. Otherwise,
if the device goes directly to UNMANAGED or UNAVAILABLE, deactivate()
is called. We need to make sure that signal and source handlers are
disconnected, so that they are not called at the wrong time.
Fixes: 99a6c6eda6 ('ovs, dpdk: fix creating ovs-interface when the ovs-bridge is netdev')
nm_strv_find_first() is useful (and used) to find the first index (if
any). I can thus also used to check for membership.
However, we also have nm_strv_contains(), which seems better for
readability, when we check for membership. Use it.
When IPv6 is disabled in kernel but ipv6.method is set to auto, NetworkManager repeatedly attempts
IPv6 configuration internally, resulting in unnecessary warning messages being output infinitely.
platform-linux: do-add-ip6-address[2: fe80::5054:ff:fe7c:4293]: failure 95 (Operation not supported)
ipv6ll[e898db403d9b5099,ifindex=2]: changed: no IPv6 link local address to retry after Duplicate Address Detection failures (back off)
platform-linux: do-add-ip6-address[2: fe80::5054:ff:fe7c:4293]: failure 95 (Operation not supported)
ipv6ll[e898db403d9b5099,ifindex=2]: changed: no IPv6 link local address to retry after Duplicate Address Detection failures (back off)
platform-linux: do-add-ip6-address[2: fe80::5054:ff:fe7c:4293]: failure 95 (Operation not supported)
ipv6ll[e898db403d9b5099,ifindex=2]: changed: no IPv6 link local address to retry after Duplicate Address Detection failures (back off)
To prevent this issue, let's disable IPv6 in NetworkManager when it is disabled in the kernel.
In order to do it in activate_stage3_ip_config() only once during activation,
the firewall initialization needed to be moved earlier. Otherwise, the IPv6 disablement could occur
twice during activation because activate_stage3_ip_config() is also executed from subsequent of fw_change_zone().
`_ethtool_*_reset()` functions already check that the state is not
NULL, no need to check it before. The only exception was for "feature"
settings, where the check was missing.
Sending a client-id is not mandatory according to RFC2131. It is
mandatory according to RFC4361 that superseedes it.
Some weird DHCP servers conforming RFC2131 can get confused and break
existing DHCP leases if they start receiving a client-id when it was not
being previously received. Users that were using other DHCP client like
dhclient, but want to use NetworkManager's internal DHCP client, can
suffer this problem.
Add "none" as accepted value in ipv4.dhcp-client-id to specify that
client-id must not be sent. Note that this is generally not recommended
unless it's explicitly needed for some reason like the explained above.
Client-id is mandatory in DHCPv6.
This commit allow to set the "none" value and properly parse it in the
NMDhcpClientConfig struct. Next commits will modify the different DHCP
plugins to honor it.
sysfs is deprecated and kernel will not add new bridge port options to
sysfs. Netlink is a stable API and therefore is the right method to
communicate with kernel in order to set the link options.
It seems more useful to have a best effort approach and configure
everything we can; in that way we achieve at least some connectivity,
and then sysadmin can check the logs in case something is
missing. Currently instead, the whole activation fails (so, no address
is configured) if just one of the addresses fails DAD.
Ideally, we should have a way to make this configurable; but for now,
implement the more useful behavior as default.
IPv4 and IPv6 DAD work slightly differently: for IPv4 the presence or
absence of carrier doesn't have any effect on the duration of the
probe; for IPv6, DAD never completes without carrier because kernel
never removes the tentative flag.
In both cases, we shouldn't ignore the DAD result because that would
mean that we complete the ipmanual method without addresses actually
configured.
Currently, IPv4 shared mode fails to start when DAD is enabled because
dnsmasq tries to bind to an address that is not yet configured on the
interface. Delay the start of dnsmasq until the shared4 l3cd is ready.
Fixes: 58287cbcc0 ('core: rework IP configuration in NetworkManager using layer 3 configuration')
The condition in `_get_maybe_ipv6_disabled()` is improperly set which
returns the wrong value on if an device is disabled or not when
generating the assume connection. And when
`/proc/sys/net/ipv6/conf/$DEV/disable_ipv6` is not existed (not
disabling ipv6 through sysctl setting), IPv6 is disabled by default.
Fixes: be655e6ed1 ('core: read "disable_ipv6" sysctl before nm_ip6_config_create_setting()')
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1743
On very particular timing, if a connection is currently activating
on a modem device and user remove the remote settings associated
an device state change:
prepare -> deactivating (reason 'connection-removed', sys-iface-state: 'managed')
pops before entering into modem_prepare_result, resulting to a crash
on assertion.
We can simply check for the modem state to failed, set the success flag
to FALSE and continue.
Closes: https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/issues/1354
Signed-off-by: Frederic Martinsons <frederic.martinsons@unabiz.com>
l3cfg emits a log for ACD conflicts. However, l3cfg is not aware of
what are the related NMDevice or the currently active connection, and
so it can't log the proper metadata fields (NM_DEVICE and
NM_CONNECTION) to the journal.
Instead, let NMDevice log about ACD collisions; in this way, it is
possible to get the message when filtering by device and connection.
For example:
$ journalctl -e NM_CONNECTION=d1df47be-721f-472d-a1bf-51815ac7ec3d + NM_DEVICE=veth0
<info> device (veth0): IP address 172.25.42.1 cannot be configured because it is already in use in the network by host 00:99:88:77:66:55
<info> device (veth0): state change: ip-config -> failed (reason 'ip-config-unavailable', sys-iface-state: 'managed')
<warn> device (veth0): Activation: failed for connection 'veth0+'
Parse the access point announced bandwidth in MHz. This is considering
both HT and VHT. Please notice that for VHT 80+80 MHz we are representing it
as 160 MHz.
Software devices that are controllers like bond/bridge/team when
configured to not ignore carrier are being deleted when deactivating the
device. Software devices that are not controllers, shouldn't be deleted.
Otherwise, if a VLAN link is deleted because the ethernet carrier-change
then NetworkManager won't be able to reactivate the VLAN once the
ethernet gets carrier because the link is not present.
This is restoring the previous behaviour and it's know to be relied on
by users.
https://bugzilla.redhat.com/show_bug.cgi?id=2224479https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1701
Fixes: efa63aef3a ('device: delete software device when software devices lose carrier')
The `nm_device_hw_addr_reset()` should only set MAC address on NIC
with valid(>0) interface index.
The failure was found by `ovs_mtu` test of NMCI, failed to reproduce
the original problem (`ovs_mtu` test of NMCI) with 100 times retry.
And no trace log found for original test failure, hence cannot tell why
`nm_device_hw_addr_reset()` been invoked with iface index 0.
Signed-off-by: Gris Ge <fge@redhat.com>
We delete devices when the connection goes down and NetworkManager
created the device earlier.
Software devices like bond/bridge/team default to ignoring carrier.
However, when configuring them to not ignore carrier
([device].ignore-carrier), they were not deleted when deactivating the
devices.
This adjusts commit d0c2a24b71 ('device: do not remove software devices
on initial disconnected (rh #1035814)'). Note that back then there was
no check whether the device has an activation queued, so it behaved
differently then.
When the software device enters the UNAVAILABLE state from UNMANAGED,
during cleanup we shouldn't delete the link.
Co-Authored-By: Beniamino Galvani <bgalvani@redhat.com>
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1686
When user are changing SR-IOV VF settings for options like `max-tx-rate`
which some hardware not supported yet, the failure of this VF will fail
the whole activation, then the SR-IOV will be disabled means all the VFs
will be deleted.
Deleting VFs might break network connectivity and this collateral
damage of VF option failure is not acceptable for OpenShift use cases
even they have checkpoint protection.
This patch only log warn message on failure of VF options and will not
fail the activation.
NetworkManager also ignore MTU failure during activation, I believe this
fit into the same assumption.
User case reference: https://bugzilla.redhat.com/show_bug.cgi?id=2210164
Signed-off-by: Gris Ge <fge@redhat.com>
The macro uses g_alloca(). Using alloca() is potentially dangerous. For
example, it must never be used in an unbounded loop. This should be
immediately obvious from the name, so we don't accidentally use them
in the wrong context.
All other alloca() macros should have such a prefix already. And they
always have to be macros, because you couldn't use alloca() to return
memory from a function.
By default, bond/bridge/team devices ignore carrier, and so do their
ports. However, it can make sense to set '[device*].ignore-carrier' for
the controller device. Meaningfully support that.
This is a follow up to commit 8c91422954 ('device: handle carrier
changes for master device differently'), which didn't fully solve the
problem.
What already works, is that when you set ignore-carrier for the
controller, then after loss of carrier and a carrier wait timeout, the
controller and ports go down. If both the controller and port profiles
have autoconnect disabled, they stay down and that's it. It works as
expected, but is not very useful, because when we want to automatically
react on carrier loss, we also want to automatically reconnect.
For controller profiles, carrier only makes sense when ports are
attached. However, we can (auto) activate controller profiles without
ports. So when the user enables autoconnect for the controller profile,
then the profile will eagerly reconnect. That means, after loss of
carrier, the device goes down and reconnects right away. It means, when
configuring a bond with ignore-carrier=no and autoconnect=yes, then
the sensible thing happens (an immediate reconnect). That is just not
a useful configuration.
The useful way to configure configure ignore-carrier=no for a controller
device, autoconnect on the master must be disabled while being enabled
on the ports. After all, it's the ports that will autoconnect based on
the carrier state and bring up the controller with them.
Note that at the moment when a port decides to autoconnect, the
controller profile is not yet selected. That only happens later during
_internal_activate_device() after searching it with find_master(). At
that point, the port profile checks whether it should autoconnect based
on its own carrier state, and abort if not.
If autoconnect is aborted due to lack of carrier, the profile gets
blocked from autoconnect with reason "failed". Hence, when the carrier
returns, we need to clear any "failed" blocked reasons and schedule
another autoconnect check,
Note that this really only works if the port is itself a simple device,
like an ethernet. If the port is itself a software device (like a bond,
or a VLAN), then the carrier state in _internal_activate_device() is
unknown, and we cannot avoid autoconnect. It's unclear how that could
make sense, if at all.
This setup can be combined with "connection.autoconnect-slaves=yes". In
that case, we have the first port to autoconnect when they get carrier,
bringing up the controller too. Usually the other ports that don't have
carrier would not autoconnect, but with autoconnect-slaves they will.
The effect is, that we autoconnect whenever any of the ports has
carrier, and then we immediately also bring up the ports that don't have
carrier (which we usually would not).
https://bugzilla.redhat.com/show_bug.cgi?id=2156684https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1658
Struct allow named arguments, which seems easier to maintain instead of
a function with many arguments. Also, adding a new parameter does not
require changes to most of the callers.
The real advantage of this is that we encode all the search parameters
in one argument. And we can add that argument to
_match_section_infos_lookup(), alongside lookup by NMDevice or
NMPlatformLink.
All callers eventually want a boolean instead of a NMMatchSpecMatchType.
I think the NMMatchSpecMatchType enum still has value at the lower
layers, where the enum values are clearer (when reading the code). So
don't drop NMMatchSpecMatchType entirely.
However, let's add nm_match_spec_match_type_to_bool() to convert the
match-type to a boolean to avoid duplicating the code.
Arguably, a kernel link is needed for DHCP and so the interface name
univocally identifies a device (for example, the OVS interface). But
for consistency and clarity, store the device type to be used for
logging.
When logging, messages include the interface name to specify what
device they refer to. In most case the interface name is unique.
There are some devices that don't have a kernel link associated, and
their interface name is not guaranteed to be unique. This is currently
the case for OVS bridges and OVS ports. When reading a log with
duplicate interface names, it is difficult to understand what is
happening. And this is made worse by the fact that it is common
practice to assign the same name to all devices in a OVS hierarchy
(bridge, port, interface).
To make logs unambiguous, we want to print the device type together
with the name; however we don't want to *always* print the type
because in most cases it's not useful and it would consume valuable
real estate on the screen. Adopt a simple heuristic of showing the
type only for OVS devices.
This commit adds a helper function to return the device type to show
in logs, when it is needed.
It's not obvious, why we couldn't have a pending dever action
at that point. Maybe we cannot, but just to be explicit about it,
handle that we potentially might.
For example, we tend to schedule the timeout priv->carrier_defer_source
only from within nm_device_set_carrier() if `priv->carrier` is FALSE.
At the same time, nm_device_set_carrier() does nothing `if
(priv->carrier == carrier)`. So probably there is no problem.
However, we also set priv->carrier directly in
nm_device_set_carrier_from_platform() without clearing the timer. It's
hard to imagine whether there can be a case where we might have two
timeouts pending.
commit_option() was used in the past to set both bridge and bridge port
options using sysfs. Currently it is only used for bridge port options.
This patch removes the dead code for bridge options and unify it on
commit_port_options(). This is simplifying the work needed to support
bridge port option through netlink.
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1643