while the default value of the same in NM is 0, which causes warnings.
Change this default value.
Allow the range in NM to stay 0-255, as 0 is used to indicate that
arp_missed_max is unset (for modes that don't support the setting).
However, for the modes that support it, do not let it be set beyond the
kernel's permissible range; set it to the kernel default of 2 instead.
Do not apply or reapply the arp_missed_max setting when it is not
supported.
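
The rule above can be sketched as follows. This is a minimal sketch,
not NM's code: the helper name and the constants are hypothetical, and
the lower kernel bound of 1 is an assumption (only the default of 2 and
the 0-255 NM range are stated here).

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed kernel limits for modes that support arp_missed_max. */
#define EXAMPLE_ARP_MISSED_MAX_MIN     1u
#define EXAMPLE_ARP_MISSED_MAX_MAX     255u
#define EXAMPLE_ARP_MISSED_MAX_DEFAULT 2u

/* Hypothetical helper: returns the value to apply, or 0 for
 * "unset, do not apply or reapply". */
static unsigned
example_arp_missed_max_normalize(unsigned configured, bool mode_supports_it)
{
    assert(configured <= 255u); /* NM accepts 0-255 */

    if (!mode_supports_it)
        return 0u;
    if (configured < EXAMPLE_ARP_MISSED_MAX_MIN
        || configured > EXAMPLE_ARP_MISSED_MAX_MAX)
        return EXAMPLE_ARP_MISSED_MAX_DEFAULT;
    return configured;
}
```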
Signed-off-by: Pradyumn Rahar <pradyumn.rahar@oracle.com>
In the kernel, the onlink flag (RTNH_F_ONLINK) is associated with each
nexthop (rtnh_flags) rather than the route as a whole. NM previously
stored it only per-route in NMPlatformIPRoute.r_rtm_flags, which meant
that two nexthops differing only in the onlink flag were merged into
a single entry in the platform cache.
Fix this by tracking the onlink flag per-nexthop.
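
To illustrate why the flag must be part of the nexthop identity, here
is a minimal sketch. The struct and function are hypothetical and much
simpler than NM's real platform types.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified nexthop; not NM's actual platform type. */
typedef struct {
    int      ifindex;
    uint32_t gateway;   /* IPv4 gateway, network byte order */
    bool     is_onlink; /* mirrors RTNH_F_ONLINK in rtnh_flags */
} ExampleNextHop;

/* Two nexthops are the same cache entry only if the onlink flag
 * matches too. */
static bool
example_nexthop_equal(const ExampleNextHop *a, const ExampleNextHop *b)
{
    assert(a && b);
    return a->ifindex == b->ifindex
           && a->gateway == b->gateway
           && a->is_onlink == b->is_onlink;
}
```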
Resolves: https://issues.redhat.com/browse/NMT-1486
GENEVE (Generic Network Virtualization Encapsulation) is a network
tunneling protocol that provides a flexible encapsulation format for
overlay networks. It uses UDP as the transport protocol and supports
variable-length metadata in the tunnel header.
This patch adds GENEVE tunnel support to NM's platform layer:
- Add platform API functions (nm_platform_link_geneve_add,
nm_platform_link_get_lnk_geneve)
- Parse netlink messages for the following attributes:
* IFLA_GENEVE_ID - VNI (Virtual Network Identifier)
* IFLA_GENEVE_REMOTE, IFLA_GENEVE_REMOTE6 - IPv4 and IPv6 remote
* IFLA_GENEVE_TTL, IFLA_GENEVE_TOS, IFLA_GENEVE_DF - TTL, TOS, and DF flags
* IFLA_GENEVE_PORT - UDP destination port
- Add test cases for GENEVE tunnel creation and detection with two test
modes covering IPv4 and IPv6.
The implementation tries to follow the same patterns as other tunnel
types (GRE, VXLAN, etc.) and integrates with the existing platform
abstraction layer.
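
As a rough picture of what the parsed attributes amount to, the sketch
below collects them in one struct. The type, field names, and the check
are hypothetical and not the NMPlatformLnkGeneve definition; the only
fact relied on is that a GENEVE VNI is 24 bits wide.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the attributes listed above. */
typedef struct {
    uint32_t id;          /* IFLA_GENEVE_ID: VNI, 24 bits */
    uint32_t remote;      /* IFLA_GENEVE_REMOTE */
    uint8_t  remote6[16]; /* IFLA_GENEVE_REMOTE6 */
    uint8_t  ttl;         /* IFLA_GENEVE_TTL */
    uint8_t  tos;         /* IFLA_GENEVE_TOS */
    uint8_t  df;          /* IFLA_GENEVE_DF */
    uint16_t dst_port;    /* IFLA_GENEVE_PORT */
} ExampleLnkGeneve;

/* The VNI must fit in 24 bits. */
static bool
example_geneve_vni_is_valid(const ExampleLnkGeneve *props)
{
    assert(props);
    return props->id <= UINT32_C(0xFFFFFF);
}
```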
This new endpoint type has been recently added to the kernel in v6.18
[1]. It will be used to create new subflows from the associated address
to additional addresses announced by the other peer. This will be done
if allowed by the MPTCP limits, and if the associated address is not
already being used by another subflow from the same MPTCP connection.
Note that the fullmesh flag takes precedence over the laminar one.
Without either of these two flags, the path-manager will create new
subflows to additional addresses announced by the other peer by
selecting the source address from the routing tables, which is harder to
configure if the announced address is not known in advance.
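
The precedence described above can be sketched like this. The flag and
enum names are hypothetical, and the bit values are illustrative only;
they are not the kernel's MPTCP_PM_ADDR_FLAG_* values.

```c
#include <assert.h>

/* Illustrative flag bits, not the kernel values. */
enum {
    EXAMPLE_FLAG_FULLMESH = 1u << 0,
    EXAMPLE_FLAG_LAMINAR  = 1u << 1,
};

typedef enum {
    EXAMPLE_SRC_ROUTING_TABLE, /* neither flag: source picked by routing */
    EXAMPLE_SRC_LAMINAR,
    EXAMPLE_SRC_FULLMESH,
} ExampleSubflowSource;

/* How subflows to newly announced addresses pick their source. */
static ExampleSubflowSource
example_subflow_source(unsigned endpoint_flags)
{
    /* This sketch only knows the two flags above. */
    assert((endpoint_flags & ~(EXAMPLE_FLAG_FULLMESH | EXAMPLE_FLAG_LAMINAR)) == 0);

    if (endpoint_flags & EXAMPLE_FLAG_FULLMESH)
        return EXAMPLE_SRC_FULLMESH; /* fullmesh takes precedence */
    if (endpoint_flags & EXAMPLE_FLAG_LAMINAR)
        return EXAMPLE_SRC_LAMINAR;
    return EXAMPLE_SRC_ROUTING_TABLE;
}
```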
Supporting the new flag is easy: declare a new flag for NM, and add it
to the related helpers and to the existing checks that look at the
different MPTCP endpoint types. The documentation now references the
new endpoint type.
Note that only the new 'define' has been added to the Linux header file:
this file has changed a bit since the last sync and is now split into
two files. Only this new line is needed, so only the minimum has been
modified here.
Link: https://git.kernel.org/torvalds/c/539f6b9de39e [1]
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
(cherry picked from commit 2b03057de0)
The function should modify the "ip6_address" member of the union. In
practice, it doesn't matter because the ifindex is the first member of
both "ip4_address" and "ip6_address".
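
A minimal sketch of why the wrong member happened to work. The types
are hypothetical stand-ins with ifindex as the first member of both
structs, as described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; ifindex is the first member of both. */
typedef struct { int ifindex; uint32_t address; } ExampleIP4Address;
typedef struct { int ifindex; uint8_t address[16]; } ExampleIP6Address;

typedef union {
    ExampleIP4Address ip4_address;
    ExampleIP6Address ip6_address;
} ExampleIPAddress;

/* Both members start with ifindex, so either one aliases it. */
_Static_assert(offsetof(ExampleIP4Address, ifindex)
                   == offsetof(ExampleIP6Address, ifindex),
               "ifindex must sit at the same offset");

/* The IPv6 helper should still name the IPv6 member explicitly. */
static ExampleIPAddress
example_ip6_with_ifindex(int ifindex)
{
    ExampleIPAddress a = {0};

    assert(ifindex > 0);
    a.ip6_address.ifindex = ifindex;
    return a;
}
```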
Rename nm_linux_platform_get_link_fdb_table() to
nm_linux_platform_get_bridge_fdb(). The new name better indicates that
the function returns the bridge FDB entries.
The string is freed with g_free(), so it needs to be allocated with
g_strdup(). In practice, the GLib allocator uses malloc() nowadays,
but it is better to be consistent.
The RT_VIA attribute is used to specify a gateway of a different
address family. It is currently used only for IPv4 routes.
[bgalvani@redhat.com: amended the commit message]
Introduce some basic infrastructure to perform ethtool operations via
netlink. As a proof of concept, implement the pause settings.
Netlink has some advantages over ioctl():
- it can be easily extended with new attributes;
- it can return descriptive error messages via the extended ack
mechanism. For example, when setting the ring parameters to a value
outside the allowed range, userspace receives error code -EINVAL
and message "requested ring size exceeds maximum". ioctl() gets
only -EINVAL, which is shared among many error reasons;
- since it's possible to specify an ifindex in the request, there are
no race conditions when the interface name changes.
New ethtool APIs are available only via netlink; however, it makes
sense to start using netlink also for the old API that NM is already
using (pause, eee, rings, etc.) over ioctl() because of the advantages
described above.
We're going to replace most of the ioctl-based ethtool functions with
a netlink-based equivalent. Move the ioctl ones to a separate file so
that it's easier to see what still needs to be converted. Also add a
common prefix to the function names.
iproute2 and the kernel accept 0 as a valid rto_min value:
# ip route add 172.16.0.1 dev enp1s0 rto_min 0ms
# ip route show
172.16.0.1 dev enp1s0 scope link rto_min lock 0ms
Even if a value of 0ms would not be useful in practice, it is better
to exactly track what kernel reports, instead of assuming that when
the value is zero it is "not set".
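
One way to track this is a separate "is set" flag next to the value.
The sketch below is only an illustration; the struct and field names
are hypothetical and may not match how NM stores the metric.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical representation: the value plus an explicit "set" bit. */
typedef struct {
    bool     is_set;
    uint32_t msec;
} ExampleRtoMin;

/* "unset" and "set to 0ms" are different states. */
static bool
example_rto_min_equal(const ExampleRtoMin *a, const ExampleRtoMin *b)
{
    assert(a && b);
    if (a->is_set != b->is_set)
        return false;
    return !a->is_set || a->msec == b->msec;
}
```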
The function introduced queries the FDB table via a netlink socket. It
accepts a list of ifindexes and filters out the FDB content not related
to them. It returns an array of MAC addresses.
To clarify, this function is, unusually, exposed directly in
nm-linux-platform.h, as we don't want it to be part of the whole
NMPlatform object or cache. This is an exception to the rule, made to
simplify the integration of this functionality into NetworkManager.
In addition, it doesn't use the async mechanism that is widely used
for netlink communication across nm-linux-platform. Again, the reason is
to simplify its use, as async communication won't provide a benefit to
the use cases we have planned for this, i.e. balance-slb RARP announcing.
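
The filtering step can be sketched as below. This is an illustration
over an in-memory array; the entry type and function are hypothetical,
and the netlink query itself is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical FDB entry as it might look after parsing. */
typedef struct {
    int     ifindex;
    uint8_t mac[6];
} ExampleFdbEntry;

/* Copy the MACs of entries whose ifindex is in the given list.
 * Returns the number of MACs written to out_macs. */
static size_t
example_fdb_filter(const ExampleFdbEntry *entries, size_t n_entries,
                   const int *ifindexes, size_t n_ifindexes,
                   uint8_t (*out_macs)[6], size_t out_len)
{
    size_t i, j, n_out = 0;

    assert(entries && ifindexes && out_macs);

    for (i = 0; i < n_entries && n_out < out_len; i++) {
        for (j = 0; j < n_ifindexes; j++) {
            if (entries[i].ifindex == ifindexes[j]) {
                memcpy(out_macs[n_out++], entries[i].mac, 6);
                break;
            }
        }
    }
    return n_out;
}
```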
By default, on reapply we were only syncing the main route table. As a
result, routes added by NM to other tables were not removed on
reapply. This was done to preserve routes added externally, but routes
added by NM itself should be removed.
Add a new route table syncing mode "main + NM routes". This mode
maintains the normal behaviour of completely syncing the main table,
and for other tables removes only routes that were added by us, leaving
the rest untouched. Use this mode by default, as this is what a user
would expect on reapply.
Note: this might not work if NM is restarted between the profile being
modified and the reapply, because after the restart NM no longer knows
which routes it added itself. This is a rare corner case, though.
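
The decision for a route that is present in the kernel but no longer
wanted can be sketched like this. The enum and function names are
hypothetical; 254 is the kernel's main table ID.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EXAMPLE_RT_TABLE_MAIN 254u

/* Hypothetical names for the old and the new sync mode. */
typedef enum {
    EXAMPLE_SYNC_MAIN,
    EXAMPLE_SYNC_MAIN_AND_NM_ROUTES,
} ExampleSyncMode;

/* Should a stale route be removed on reapply? */
static bool
example_stale_route_should_be_removed(ExampleSyncMode mode,
                                      uint32_t        table,
                                      bool            added_by_nm)
{
    assert(mode == EXAMPLE_SYNC_MAIN || mode == EXAMPLE_SYNC_MAIN_AND_NM_ROUTES);

    if (table == EXAMPLE_RT_TABLE_MAIN)
        return true; /* the main table is always fully synced */
    /* Other tables: only touch what NM added itself. */
    return mode == EXAMPLE_SYNC_MAIN_AND_NM_ROUTES && added_by_nm;
}
```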
Use the D-Bus property "VersionInfo" to expose a capability flag
indicating that this bug is fixed. It is the first capability that we
expose in this way. However, it is convenient to do it this way as it's
something that clients like nmstate need to know, so they can decide
whether a conn down is needed or not. It is not enough to decide that by
version number because it might be fixed via a downstream patch in distros
like RHEL.
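
A client-side check could look like the sketch below. It assumes that
"VersionInfo" is an array of uint32 whose first element is the encoded
version and whose following elements form a capability bitfield; the
helper name and the capability index used in any call are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: version_info[0] is the version, capability N is
 * bit (N % 32) of version_info[1 + N / 32]. */
static bool
example_version_info_has_capability(const uint32_t *version_info,
                                    size_t          len,
                                    unsigned        capability)
{
    size_t idx = 1u + capability / 32u;

    assert(version_info || len == 0);

    if (idx >= len)
        return false; /* older daemon: capability not advertised */
    return (version_info[idx] & (UINT32_C(1) << (capability % 32u))) != 0;
}
```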
https://issues.redhat.com/browse/RHEL-67324
https://issues.redhat.com/browse/RHEL-66262
Fixes: e9c17fcc9b ('l3cfg: default to 'main' route table sync mode')
The difference between FULL and ALL was not obvious without reading the
documentation. Moreover, a new mode is going to be introduced so the
confusion could grow. Rename to a more explicit name.
Introduce support for the ethtool FEC mode:
D-Bus API: `fec-mode: uint32_t`.
Keyfile:
```
[ethtool]
fec-mode=<uint32_t>
```
nmcli: `ethtool.fec-mode`, where allowed values are any combination of:
* auto
* off
* rs
* baser
* llrs
Unit test cases included.
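
Since the property is a uint32 built from a combination of the names
above, the mapping can be pictured as a bitmask. The sketch below uses
made-up bit values and a hypothetical helper; it does not reproduce the
kernel's or NM's actual flag values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative bit values only. */
enum {
    EXAMPLE_FEC_AUTO  = 1u << 0,
    EXAMPLE_FEC_OFF   = 1u << 1,
    EXAMPLE_FEC_RS    = 1u << 2,
    EXAMPLE_FEC_BASER = 1u << 3,
    EXAMPLE_FEC_LLRS  = 1u << 4,
};

/* Map one mode name to its bit; returns 0 for an unknown name. */
static uint32_t
example_fec_flag_from_name(const char *name)
{
    static const struct {
        const char *name;
        uint32_t    flag;
    } table[] = {
        {"auto", EXAMPLE_FEC_AUTO},   {"off", EXAMPLE_FEC_OFF},
        {"rs", EXAMPLE_FEC_RS},       {"baser", EXAMPLE_FEC_BASER},
        {"llrs", EXAMPLE_FEC_LLRS},
    };
    size_t i;

    assert(name);

    for (i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
        if (strcmp(name, table[i].name) == 0)
            return table[i].flag;
    }
    return 0;
}
```

A combination such as "rs,baser" is then the bitwise OR of the two
flags.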
Resolves: https://issues.redhat.com/browse/RHEL-24055
Signed-off-by: Gris Ge <fge@redhat.com>
This patch adds support for IPVLAN interfaces. IPVLAN is a driver for a
virtual network device that can be used in container environments to
access the host network. IPVLAN exposes a single MAC address to the
external network regardless of the number of IPVLAN devices created
inside the host network. This means that a user can have multiple IPVLAN
devices in multiple containers and the corresponding switch sees a
single MAC address. The IPVLAN driver is useful when the local switch
imposes constraints on the total number of MAC addresses that it can
manage.
Move the static _ip4_address_is_link_local() check to a new global
nm_platform_ip4_address_is_link_local() helper so we can check whether
an IPv4 address is link-local in other files.
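
For reference, such a check boils down to testing for the
169.254.0.0/16 prefix. The function below is a hypothetical stand-in,
not the body of the NM helper.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* addr_be is an IPv4 address in network byte order. */
static bool
example_ip4_is_link_local(uint32_t addr_be)
{
    /* 169.254.0.0/16 */
    return (ntohl(addr_be) & UINT32_C(0xFFFF0000)) == UINT32_C(0xA9FE0000);
}
```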
If the socket's RX buffer is full, it's probably because another
process is making a lot of changes very quickly, faster than we
can process them. Let's give the writer a short time to finish:
1. Avoid contending for the kernel's RTNL lock, so we don't make
the whole situation even worse and the writer can finish earlier.
2. Avoid having to resync again and again because we try to
resync while the writer is still making quick changes, so
we are unable to catch up yet.
This won't help if this situation lasts a long time or is
continuous, but that's unlikely to happen, and if it does,
it's the writer's fault for starving the whole system.
There is no need to progressively increase the backoff time
for the same reason: if this situation lasts a long time,
it's the writer's fault. It's not a good idea either, because the
whole NM process would end up sleeping for long periods, not doing
anything at all, and unable to react when the burst of Netlink
messages stops.
Add a function to compare two arrays of NMPlatformBridgeVlan. It will
be used in the next commit to compare the VLANs from platform to the
ones we want to set.
To compare efficiently, the VLANs need to be normalized (no
duplicated VLANs, ranges reduced to their minimal expression...). Add
the function nmp_utils_bridge_vlan_normalize.
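
The idea of normalizing before comparing can be sketched as follows.
This is a simplified illustration with hypothetical types: it only
merges VID ranges and ignores per-VLAN flags such as pvid or untagged,
which the real function would have to take into account.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, flag-less VLAN range. */
typedef struct {
    uint16_t vid_start;
    uint16_t vid_end;
} ExampleVlanRange;

static int
example_vlan_cmp(const void *a, const void *b)
{
    const ExampleVlanRange *x = a, *y = b;

    if (x->vid_start != y->vid_start)
        return x->vid_start < y->vid_start ? -1 : 1;
    if (x->vid_end != y->vid_end)
        return x->vid_end < y->vid_end ? -1 : 1;
    return 0;
}

/* Sort, then merge duplicate, overlapping and adjacent ranges in
 * place. Returns the new number of ranges. */
static size_t
example_vlan_normalize(ExampleVlanRange *v, size_t n)
{
    size_t i, out = 0;

    if (n == 0)
        return 0;
    for (i = 0; i < n; i++)
        assert(v[i].vid_start <= v[i].vid_end);

    qsort(v, n, sizeof(*v), example_vlan_cmp);
    for (i = 1; i < n; i++) {
        if ((unsigned) v[i].vid_start <= (unsigned) v[out].vid_end + 1u) {
            if (v[i].vid_end > v[out].vid_end)
                v[out].vid_end = v[i].vid_end;
        } else {
            v[++out] = v[i];
        }
    }
    return out + 1;
}

/* Compare two arrays after normalizing both (modifies the inputs). */
static bool
example_vlans_equal(ExampleVlanRange *a, size_t na,
                    ExampleVlanRange *b, size_t nb)
{
    size_t i;

    na = example_vlan_normalize(a, na);
    nb = example_vlan_normalize(b, nb);
    if (na != nb)
        return false;
    for (i = 0; i < na; i++) {
        if (a[i].vid_start != b[i].vid_start || a[i].vid_end != b[i].vid_end)
            return false;
    }
    return true;
}
```

Once both arrays are normalized, the comparison is a single linear
pass.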
Co-authored-by: Íñigo Huguet <ihuguet@redhat.com>
Currently, nm_platform_link_set_bridge_vlans() accepts an array of
pointers to vlan objects; to avoid multiple allocations,
setting_vlans_to_platform() creates the array by piggybacking the
actual data after the pointers array.
In the next commits, the array will need to be manipulated and
extended, which is difficult with the current structure. Instead, pass
separately an array of objects and its size.
If the platform fails to dump a specific route protocol, retry
multiple times. If all attempts fail, emit a warning and proceed, as
there is nothing more to do.
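
The retry logic can be sketched as below. The callback type, the
function names, and the flaky test callback are hypothetical; the real
dump goes through the platform's netlink code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical callback that dumps routes of one protocol. */
typedef bool (*ExampleDumpFn)(int protocol, void *user_data);

/* Try the dump up to max_attempts times; warn and give up after that. */
static bool
example_dump_with_retry(ExampleDumpFn dump, int protocol,
                        void *user_data, unsigned max_attempts)
{
    unsigned i;

    assert(dump);

    for (i = 0; i < max_attempts; i++) {
        if (dump(protocol, user_data))
            return true;
    }
    fprintf(stderr, "warning: dumping routes for protocol %d failed\n", protocol);
    return false; /* the caller proceeds anyway */
}

/* Demo callback: fails as many times as the counter in user_data says. */
static bool
example_flaky_dump(int protocol, void *user_data)
{
    int *failures_left = user_data;

    (void) protocol;
    if (*failures_left > 0) {
        (*failures_left)--;
        return false;
    }
    return true;
}
```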
When doing a dump of routes, we want to exclude routes having
protocols we do not care about. Since the netlink socket has
STRICT_CHK enabled, we can request multiple dumps for the protocols we
need.
While doing 6 dumps is less efficient than doing 1, it normally
doesn't matter. However, the new implementation is more efficient when
there are e.g. millions of BGP routes that can be excluded from the
results.
Introduce an array of tracked route protocols that will be used in the
next commit. To have the list of protocols defined in a single place,
define a macro.
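
A common way to keep such a list in one place is an X-macro, sketched
below. The macro and array names are hypothetical, and the six
protocols shown are only an illustrative selection of rtm_protocol
values, not necessarily the list NM tracks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Single definition of the list: name and rtm_protocol value. */
#define EXAMPLE_TRACKED_PROTOCOLS(X) \
    X(REDIRECT, 1)                   \
    X(KERNEL, 2)                     \
    X(BOOT, 3)                       \
    X(STATIC, 4)                     \
    X(RA, 9)                         \
    X(DHCP, 16)

/* Expand the list into an array of values. */
#define EXAMPLE_X_VALUE(name, value) value,
static const uint8_t example_tracked_protocols[] = {
    EXAMPLE_TRACKED_PROTOCOLS(EXAMPLE_X_VALUE)
};
#undef EXAMPLE_X_VALUE

static bool
example_protocol_is_tracked(uint8_t protocol)
{
    size_t i;

    for (i = 0; i < sizeof(example_tracked_protocols); i++) {
        if (example_tracked_protocols[i] == protocol)
            return true;
    }
    return false;
}
```

The same macro can be expanded again elsewhere, for example to issue
one dump request per protocol, without repeating the list.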