The function should modify the "ip6_address" member of the union. In
practice, it doesn't matter because the ifindex is the first member of
both "ip4_address" and "ip6_address".
We're going to replace most of the ioctl-based ethtool functions with
a netlink-based equivalent. Move the ioctl ones to a separate file so
that it's easier to see what still needs to be converted. Also add a
common prefix to the function names.
This patch add support to IPVLAN interface. IPVLAN is a driver for a
virtual network device that can be used in container environment to
access the host network. IPVLAN exposes a single MAC address to the
external network regardless the number of IPVLAN device created inside
the host network. This means that a user can have multiple IPVLAN
devices in multiple containers and the corresponding switch reads a
single MAC address. IPVLAN driver is useful when the local switch
imposes constraints on the total number of MAC addresses that it can
manage.
When we recibe a Netlink message with a "route change" event, normally
we just ignore it if it's a route that we don't track (i.e. because of
the route protocol).
However, it's not that easy if it has the NLM_F_REPLACE flag because
that means that it might be replacing another route. If the kernel has
similar routes which are candidates for the replacement, it's hard for
NM to guess which one of those is being replaced (as the kernel doesn't
have a "route ID" or similar field to indicate it). Moreover, the kernel
might choose to replace a route that we don't have on cache, so we know
nothing about it.
It is important to note that we cannot just discard Netlink messages of
routes that we don't track if they has the NLM_F_REPLACE. For example,
if we are tracking a route with proto=static, we might receive a replace
message, changing that route to proto=other_proto_that_we_dont_track. We
need to process that message and remove the route from our cache.
As NM doesn't know what route is being replaced, trying to guess will
lead to errors that will leave the cache in an inconsistent state.
Because of that, it just do a cache resync for the routes.
For IPv4 there was an optimization to this: if we don't have in the
cache any route candidate for the replacement there are only 2 possible
options: either add the new route to the cache or discard it if we are
not interested on it. We don't need a resync for that.
This commit is extending that optimization to IPv6 routes. There is no
reason why it shouldn't work in the same way than with IPv4. This
optimization will only work well as long as we find potential candidate
routes in the same way than the kernel (comparing the same fields). NM
calls to this "comparing by WEAK_ID". But this can also happen with IPv4
routes.
It is worth it to enable this optimization because there are routing
daemons using custom routing protocols that makes tens or hundreds of
updates per second. If they use NLM_F_REPLACE, this caused NM to do a
resync hundreds of times per second leading to a 100% CPU usage:
https://issues.redhat.com/browse/RHEL-26195
An additional but smaller optimization is done in this commit: if we
receive a route message for routes that we don't track AND doesn't have
the NLM_F_REPLACE flag, we can ignore the entire message, thus avoiding
the memory allocation of the nmp_object. That nmp_object was going to be
ignored later, anyway, so better to avoid these allocations that, with
the routing daemon of the above's example, can happen hundreds of times
per second.
With this changes, the CPU usage doing `ip route replace` 300 times/s
drops from 100% to 1%. Doing `ip route replace` as fast as possible,
without any rate limitting, still keeps NM with a 3% CPU usage in the
system that I have used to test.
This patch add support to HSR/PRP interface. Please notice that PRP
driver is represented as HSR too. They are different drivers but on
kernel they are integrated together.
HSR/PRP is a network protocol standard for Ethernet that provides
seamless failover against failure of any network component. It intends
to be transparent to the application. These protocols are useful for
applications that request high availability and short switchover time
e.g electrical substation or high power inverters.
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1791
There really is no way around this. As we don't cache all the routes
(e.g. ignored based on rtm_protocol or rtm_type), we cannot know which
route was replaced, when we get a NLM_F_REPLACE message.
We need to request a new dump in that case, which can be expensive, if
there are a lot of routes or if replace happens frequently.
The only possible solutions would be:
1) NetworkManager caches all routes, but it also needs to make sure to
get *everything* right. In particular, to understand every relevant
route attribute (including those added in the future, which is
impossible).
2) kernel provides a reasonable API (rhbz#1337855, rhbz#1337860) that
allows to sufficiently understand what is going on based on the
netlink notifications.
We don't cache certain routes, for example based on the protocol. This is
a performance optimization to ignore routes that we usually don't care
about.
Still, if the user does `ip route replace` with such a route, then we
need to pass it to nmp_cache_update_netlink_route(), so that we can
properly remove the replaced route.
Knowing which route was replaces might be impossible, as our cache does
not contain all routes. Likely all that nmp_cache_update_netlink_route()
can to is to set "resync_required" for NLM_F_REPLACE. But for that it
should see the object first.
This also means, if we ever write a BPF filter to filter out messages
that contain NLM_F_REPLACE, because that would lead to cache inconsistencies.
The route table is part of the weak-id. You can see that with:
ip route replace unicast 1.2.3.4/32 dev eth0 table 57
ip route replace unicast 1.2.3.4/32 dev eth0 table 58
afterwards, `ip route show table all` will list both routes. The replace
operation is only per-table. Note that NMP_CACHE_ID_TYPE_ROUTES_BY_WEAK_ID
already got this right.
Fixes: 10ac675299 ('platform: add support for routing tables to platform cache')
We sometimes have functions foo() and foo_full(), in which case
foo() has fewer arguments and just calls foo_full(). The "full"
function here is the more powerful one, and foo() is implemented
in terms of the former.
nm_platform_ip4_route_cmp_full() and m_platform_ip4_route_cmp() inverted
that pattern. The "_full" there stands for the full comparison, to not
allowing to select the comparison type.
That inconsistency is ugly. Also, these wrappers were used at only few
places. Let's drop them.
While at it, also drop nm_platform_qdisc_cmp() and rename
nm_platform_qdisc_cmp_full(). Here cmp()/cmp_full() followed the common
pattern foo()/foo_full(), but it's still hardly used and unnecessary.
When reading from netlink an ECMP IPv4 route, we need to parse the
multiple nexthops. In order to do that, we are introducing
NMPlatformIP4RtNextHop struct.
The first nexthop information will be kept at the original
NMPlatformIP4Route and the new property n_nexthops will indicate how
many nexthops we need to consider.
The NMPObject is a tagged union. There is no need to initialize anything
after the size of the actually used union field. Change this, so maybe we
get a valgrind warning about uninitialized memory if we wrongly try to
access it.
On the other hand, the object really is supposed to be a full
NMPObject. Previously, we would get a valgrind warning, if we tried to
pass fewer data there.
It really doesn't matter much, but all other functions don't assume
that there is any important data after the size indicated by the class.
We will extend IPv4 routes with the list of next hops. This field will
be heap allocated and be part of the NMPObjectIP4Route object, while
also being part of the identity. To support the ID operator that checks
fields of the NMPObject, add a "for_id" argument to the hash/cmp hooks.
Also, a function that sets cmd_obj_{hash_update,cmp}() MUST not set
cmd_plobj_id_{hashupdate,cmp}(), as it would have overlapping
functionality. Therefore, the objects that define
cmd_obj_{hash_update,cmp}() need to fully implement the ID comparison.
A NMPClass that has data outside the plobj part, needs
to implement the cmd_obj_*() hooks, instead of cmd_plobj_*().
For those objects, reasoning only about the plobj part is not
sufficient. Implementing both hooks is also unnecessary and
confusing.
Ensure that if we have cmd_obj_*() hooks set, that the corresponding
cmd_plobj_*() hooks are unset.
The if-else-if was wrong. It meant that if an object did not implement
cmd_plobj_id_copy(), nothign was copied (for id-only).
I think this code path was not actually hit, because we never clone
an object only by ID.
Fixes: c91a4617a1 ('nmp-object: allow missing implementations for certain virtual functions')
- name things related to `in_addr_t`, `struct in6_addr`, `NMIPAddr` as
`nm_ip4_addr_*()`, `nm_ip6_addr_*()`, `nm_ip_addr_*()`, respectively.
- we have a wrapper `nm_inet_ntop()` for `inet_ntop()`. This name
of our wrapper is chosen to be familiar with the libc underlying
function. With this, also name functions that are about string
representations of addresses `nm_inet_*()`, `nm_inet4_*()`,
`nm_inet6_*()`. For example, `nm_inet_parse_str()`,
`nm_inet_is_normalized()`.
<<<<
R() {
git grep -l "$1" | xargs sed -i "s/\<$1\>/$2/g"
}
R NM_CMP_DIRECT_IN4ADDR_SAME_PREFIX NM_CMP_DIRECT_IP4_ADDR_SAME_PREFIX
R NM_CMP_DIRECT_IN6ADDR_SAME_PREFIX NM_CMP_DIRECT_IP6_ADDR_SAME_PREFIX
R NM_UTILS_INET_ADDRSTRLEN NM_INET_ADDRSTRLEN
R _nm_utils_inet4_ntop nm_inet4_ntop
R _nm_utils_inet6_ntop nm_inet6_ntop
R _nm_utils_ip4_get_default_prefix nm_ip4_addr_get_default_prefix
R _nm_utils_ip4_get_default_prefix0 nm_ip4_addr_get_default_prefix0
R _nm_utils_ip4_netmask_to_prefix nm_ip4_addr_netmask_to_prefix
R _nm_utils_ip4_prefix_to_netmask nm_ip4_addr_netmask_from_prefix
R nm_utils_inet4_ntop_dup nm_inet4_ntop_dup
R nm_utils_inet6_ntop_dup nm_inet6_ntop_dup
R nm_utils_inet_ntop nm_inet_ntop
R nm_utils_inet_ntop_dup nm_inet_ntop_dup
R nm_utils_ip4_address_clear_host_address nm_ip4_addr_clear_host_address
R nm_utils_ip4_address_is_link_local nm_ip4_addr_is_link_local
R nm_utils_ip4_address_is_loopback nm_ip4_addr_is_loopback
R nm_utils_ip4_address_is_zeronet nm_ip4_addr_is_zeronet
R nm_utils_ip4_address_same_prefix nm_ip4_addr_same_prefix
R nm_utils_ip4_address_same_prefix_cmp nm_ip4_addr_same_prefix_cmp
R nm_utils_ip6_address_clear_host_address nm_ip6_addr_clear_host_address
R nm_utils_ip6_address_same_prefix nm_ip6_addr_same_prefix
R nm_utils_ip6_address_same_prefix_cmp nm_ip6_addr_same_prefix_cmp
R nm_utils_ip6_is_ula nm_ip6_addr_is_ula
R nm_utils_ip_address_same_prefix nm_ip_addr_same_prefix
R nm_utils_ip_address_same_prefix_cmp nm_ip_addr_same_prefix_cmp
R nm_utils_ip_is_site_local nm_ip_addr_is_site_local
R nm_utils_ipaddr_is_normalized nm_inet_is_normalized
R nm_utils_ipaddr_is_valid nm_inet_is_valid
R nm_utils_ipx_address_clear_host_address nm_ip_addr_clear_host_address
R nm_utils_parse_inaddr nm_inet_parse_str
R nm_utils_parse_inaddr_bin nm_inet_parse_bin
R nm_utils_parse_inaddr_bin_full nm_inet_parse_bin_full
R nm_utils_parse_inaddr_prefix nm_inet_parse_with_prefix_str
R nm_utils_parse_inaddr_prefix_bin nm_inet_parse_with_prefix_bin
R test_nm_utils_ip6_address_same_prefix test_nm_ip_addr_same_prefix
./contrib/scripts/nm-code-format.sh -F
Since the generic netlink API does (currently) not support notifications
about changes of the MPTCP addresses, we won't get notifications when
they change, and it seems wrong to put such things in the NMPlatform
cache.
We can just get the list of endpoints by polling, so add a function
nm_platform_mptcp_addrs_dump() for that.
Also, add nm_platform_mptcp_addr_update() which can add/remove/update
MPTCP addresses.
It's not very clear what the best identity is.
For example, in kernel you cannot add two MPTCP addresses that only differ by
ifindex. Thus (as far as kernel is concerned), the ifindex is not part of the
identity. Still, as we will have an interface centric view, this will be
useful for us.
On the other hand, to kernel is the "id" a second primary key, along
side "addr:port". However, to us it's not useful to consider that as
part of nmp_object_id_equal(), because usually kernel will pick an "id"
for us, and when we track objects that we are about to add, they don't
have an "id" yet.
So, adjust nmp_object_id_equal(). However -- somewhat unusual -- let it
deviate from kernel's understanding of what defines an MPTCP address.
sysfs is deprecated and kernel people will not add new bond options to
sysfs. Netlink is a stable API and therefore is the right method to
communicate with kernel in order to set the link options.
An NMPObject is hashable, can be compared and printed. That is useful.
Make an NMPObject for MPTCP addresses. It will hold the content of
MPTCP_PM_ATTR_ADDR netlink attribute. But like other NMPObject types it
will also be used to represent the data as NetworkManager tracks it.
An object type that doesn't implement cmd_plobj_id_copy(), either:
- implements cmd_obj_copy(), but then we cannot copy the ID only
to a stack instance, because that cannot track ownership.
This is a bug in the caller. We cannot use stackinit for an
object of a type that is not a plain old data (in C++ terms).
- fallback to plain memcpy(). That is in line with nmp_object_clone().
and nmp_object_copy().
In the past, nmp_lookup_init_object() could both lookup all object for a
certain ifindex, and lookup all objects of a type. That fallback path
already leads to an assertion failure fora while now, so nobody should
be using this function to lookup all objects of a certain type (for
what, we have nmp_lookup_init_obj_type()).
Now, remove the fallback path, and rename the function to what it really
does.
NMPObject is a union. It's not clear to me that C guarnatees that
designated initializers will meaningfully set all fields to zero. Use
memset() instead.
We already have a comparison of NMPlatformIPXAddress with the modes
"full" and "id". The former is needed to fully compare two addresses,
the latter as identity for tracking addresses in the cache.
In NetworkManager we also use the NMPlatformIP[46]Address structure to
track the addresses we want to configure. When we add them in kernel,
we will later see them in the platform cache. However, some fields
will be slightly different. For example, "addr_source" address will
always be "kernel", because that one is not a field we configure in
kernel. Also, the "n_ifa_flags" probably differ (getting "permanent"
and "secondary" flags).
Add a compare function that can ignore such differences.
Also add nm_platform_vtable_address for accessing the IPv4 and IPv6
methods generically (based on an "IS_IPv4" variable).
(cherry picked from commit ef1b60c061)
So far, certain NMObject types could not have an ifindex of zero. Hence,
nmp_lookup_init_object() took such an ifindex to mean lookup all objects
of that type.
Soon, we will support blackhole/unreachable/prohibit route types, which
have their ifindex set to zero. It is still useful to lookup those routes
types via nmp_lookup_init_object().
Change behaviour how to interpret the ifindex. Note that this also
affects various callers of nmp_lookup_init_object(). If somebody was
relying on the previous behavior, it would need fixing.
(cherry picked from commit d4ad9666bd)
_vt_cmd_obj_is_alive_ipx_route() is called by nmp_object_is_alive().
Non-alive objects are not put into the cache.
That certainly makes sense for RTM_F_CLONED routes, because they are
generated ad-hoc during the `ip route get` request.
Checking for the ifindex is not necessary. For one, some route types
(blackhole, unreachable, prohibit) don't have an ifindex. Also, the
purpose of _vt_cmd_obj_is_alive_ipx_route() is not to validate the
object. Just don't create objects without an ifindex, if you think the
route needs an ifindex. Checking here is not useful.
We also don't check that other fields like rt_source are valid, so there
is no need to do it for the ifindex either.
(cherry picked from commit 1123d3a5fb)
We use clang-format for automatic formatting of our source files.
Since clang-format is actively maintained software, the actual
formatting depends on the used version of clang-format. That is
unfortunate and painful, but really unavoidable unless clang-format
would be strictly bug-compatible.
So the version that we must use is from the current Fedora release, which
is also tested by our gitlab-ci. Previously, we were using Fedora 34 with
clang-tools-extra-12.0.1-1.fc34.x86_64.
As Fedora 35 comes along, we need to update our formatting as Fedora 35
comes with version "13.0.0~rc1-1.fc35".
An alternative would be to freeze on version 12, but that has different
problems (like, it's cumbersome to rebuild clang 12 on Fedora 35 and it
would be cumbersome for our developers which are on Fedora 35 to use a
clang that they cannot easily install).
The (differently painful) solution is to reformat from time to time, as we
switch to a new Fedora (and thus clang) version.
Usually we would expect that such a reformatting brings minor changes.
But this time, the changes are huge. That is mentioned in the release
notes [1] as
Makes PointerAligment: Right working with AlignConsecutiveDeclarations. (Fixes https://llvm.org/PR27353)
[1] https://releases.llvm.org/13.0.0/tools/clang/docs/ReleaseNotes.html#clang-format
With LTO builds, it assumes that the assertion-failed code-paths
can be reached, and thus a warning gets emitted:
In function nmp_cache_lookup,
inlined from nm_platform_lookup at src/libnm-platform/nm-platform.c:3377:12,
inlined from nm_platform_lookup_object at ./src/libnm-platform/nmp-object.h:975:12:
src/libnm-platform/nmp-object.h:742:46: error: lookup.cache_id_type may be used uninitialized [-Werror=maybe-uninitialized]
742 | return nmp_cache_lookup_all(cache, lookup->cache_id_type, &lookup->selector_obj);
| ^
./src/libnm-platform/nmp-object.h: In function nm_platform_lookup_object:
./src/libnm-platform/nmp-object.h:972:15: note: lookup declared here
972 | NMPLookup lookup;
| ^
The "utils" part does not seem useful in the name.
Note that we also have NMStrBuf, which is named nm_str_buf_*().
There is an unfortunate similarity between the two, but it's still
distinct enough (in particular, because one takes an NMStrBuf and
the other not).
For one, "src_n_map" must always be greater than zero at this point.
lgtm.com warns about that, and the point of this patch is to avoid
that warning.
Still, the check really isn't needed, also because nm_memdup() explicitly
handles buffers sizes of zero.