G_TYPE_CHECK_INSTANCE_CAST() can trigger a "-Wcast-align":
src/core/devices/nm-device-macvlan.c: In function 'parent_changed_notify':
/usr/include/glib-2.0/gobject/gtype.h:2421:42: error: cast increases required alignment of target type [-Werror=cast-align]
2421 | # define _G_TYPE_CIC(ip, gt, ct) ((ct*) ip)
| ^
/usr/include/glib-2.0/gobject/gtype.h:501:66: note: in expansion of macro '_G_TYPE_CIC'
501 | #define G_TYPE_CHECK_INSTANCE_CAST(instance, g_type, c_type) (_G_TYPE_CIC ((instance), (g_type), c_type))
| ^~~~~~~~~~~
src/core/devices/nm-device-macvlan.h:13:6: note: in expansion of macro 'G_TYPE_CHECK_INSTANCE_CAST'
13 | (G_TYPE_CHECK_INSTANCE_CAST((obj), NM_TYPE_DEVICE_MACVLAN, NMDeviceMacvlan))
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
Avoid that by using _NM_G_TYPE_CHECK_INSTANCE_CAST().
This can only be done for our internal usages. The public headers
of libnm are not changed.
For MACsec interfaces, kernel announces the parent ifindex in the
generic IFLA_LINK netlink attribute, which we save in
NMPlatformLink.parent. There is no need to have a dedicate member in
NMPlatformLnkMacsec.
The dedicate member was never set and during a restart of
NetworkManager the parent of the MACsec device could be unset leading
to a failed assertion:
act_stage2_config: assertion 'parent' failed
Fixes: 85103656e9 ('platform: add support for macsec links')
https://bugzilla.redhat.com/show_bug.cgi?id=2122564https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1481
With gcc-12.2.1-4.fc37 on i686 we get:
./src/libnm-platform/nmp-object.h: In function 'nmp_object_ref':
./src/libnm-platform/nmp-object.h:626:12: error: cast increases required alignment of target type [-Werror=cast-align]
626 | return (const NMPObject *) nm_dedup_multi_obj_ref((const NMDedupMultiObj *) obj);
| ^
cc1: all warnings being treated as errors
Work around that be increasing the alignment of NMDedupMultiObj.
It has no downsides, because we usually put a NMDedupMultiObj in heap
allocated memory, which is already suitably aligned. Or we put it on
the stack, where wasting a few bytes for the alignment doesn't matter.
We basically never embed NMDedupMultiObj in an array where the increase
of alignment would waste additional space.
The warning "-Wcast-align=strict" seems useful and will be enabled
next. Fix places that currently cause the warning by using the
new macro NM_CAST_ALIGN(). This macro also nm_assert()s that the alignment
is correct.
We put all these structs inside the tagged union NMPObject.
Also, in a sense NMPlatformObject is the base "type" of all
these structs, meaning, it should be able to up and downcast.
Ensure the alignment matches.
This helps to avoid "-Wcast-align" warnings when trying to cast
a (NMPlatformObject*) to another (NMPlatformXXX *) type. Something
we commonly do.
Support managing the loopback interface through NM as the users want to
set the proper mtu for loopback interface when forwarding the packets.
Additionally, the IP addresses, DNS, route and routing rules are also
allowed to configure for the loopback connection profiles.
https://bugzilla.redhat.com/show_bug.cgi?id=2060905
clang-3.4.2-9.el7 does not like nesting NM_MAX() macro inside nm_hash_update_vals() macro.
Workaround by using MAX() instead. NM_MAX() uses an expression statement and NM_UNIQ()
to evaluate the arguments only once. We don't need that here and glib's MAX() suffices.
CC src/libnm-platform/src_libnm_platform_libnm_platform_la-nm-platform.lo
../src/libnm-platform/nm-platform.c:8247:53: error: in-class initializer for static data member is not a constant expression
(guint8) NM_MAX(obj->weight, 1u));
^
../src/libnm-std-aux/nm-std-aux.h:399:40: note: expanded from macro 'NM_MAX'
#define NM_MAX(a, b) __NM_MAX(NM_UNIQ, a, NM_UNIQ, b)
^
../src/libnm-std-aux/nm-std-aux.h:402:39: note: expanded from macro '__NM_MAX'
typeof(a) NM_UNIQ_T(A, aq) = (a); \
^
../src/libnm-glib-aux/nm-hash-utils.h:124:36: note: expanded from macro 'nm_hash_update_vals'
NM_HASH_COMBINE_VALS(_val, __VA_ARGS__); \
^
Fixes: 8cc41d41fe ('platform: add NM_PLATFORM_IP_ROUTE_CMP_TYPE_ECMP_ID for comparing ECMP base route')
This is the version shipped in Fedora 37. As Fedora 37 is now out, the
core developers switch to it. Our gitlab-ci will also use that as base
image for the check-{patch.tree} tests and to generate the pages. There
is a need that everybody agrees on which clang-format version to use,
and that version should be the one of the currently used Fedora release.
Also update the used Fedora image in "contrib/scripts/nm-code-format-container.sh"
script.
The gitlab-ci still needs update in the following commit. The change
in isolation will break the "check-tree" test.
We sometimes have functions foo() and foo_full(), in which case
foo() has fewer arguments and just calls foo_full(). The "full"
function here is the more powerful one, and foo() is implemented
in terms of the former.
nm_platform_ip4_route_cmp_full() and m_platform_ip4_route_cmp() inverted
that pattern. The "_full" there stands for the full comparison, to not
allowing to select the comparison type.
That inconsistency is ugly. Also, these wrappers were used at only few
places. Let's drop them.
While at it, also drop nm_platform_qdisc_cmp() and rename
nm_platform_qdisc_cmp_full(). Here cmp()/cmp_full() followed the common
pattern foo()/foo_full(), but it's still hardly used and unnecessary.
When adding a new route we need to consider it contains extra nexthops
i.e it is a ECMP route. As we cannot modify the NMPObject once created,
we need to pass the extra nexthops as an argument.
We cannot use the original NMPObject because normalization is happening
during when adding the route.
When reading from netlink an ECMP IPv4 route, we need to parse the
multiple nexthops. In order to do that, we are introducing
NMPlatformIP4RtNextHop struct.
The first nexthop information will be kept at the original
NMPlatformIP4Route and the new property n_nexthops will indicate how
many nexthops we need to consider.
The NMPObject is a tagged union. There is no need to initialize anything
after the size of the actually used union field. Change this, so maybe we
get a valgrind warning about uninitialized memory if we wrongly try to
access it.
On the other hand, the object really is supposed to be a full
NMPObject. Previously, we would get a valgrind warning, if we tried to
pass fewer data there.
It really doesn't matter much, but all other functions don't assume
that there is any important data after the size indicated by the class.
We will extend IPv4 routes with the list of next hops. This field will
be heap allocated and be part of the NMPObjectIP4Route object, while
also being part of the identity. To support the ID operator that checks
fields of the NMPObject, add a "for_id" argument to the hash/cmp hooks.
Also, a function that sets cmd_obj_{hash_update,cmp}() MUST not set
cmd_plobj_id_{hashupdate,cmp}(), as it would have overlapping
functionality. Therefore, the objects that define
cmd_obj_{hash_update,cmp}() need to fully implement the ID comparison.
A NMPClass that has data outside the plobj part, needs
to implement the cmd_obj_*() hooks, instead of cmd_plobj_*().
For those objects, reasoning only about the plobj part is not
sufficient. Implementing both hooks is also unnecessary and
confusing.
Ensure that if we have cmd_obj_*() hooks set, that the corresponding
cmd_plobj_*() hooks are unset.
The if-else-if was wrong. It meant that if an object did not implement
cmd_plobj_id_copy(), nothign was copied (for id-only).
I think this code path was not actually hit, because we never clone
an object only by ID.
Fixes: c91a4617a1 ('nmp-object: allow missing implementations for certain virtual functions')
If an address is removed during pruning and it had the TENTATIVE flag
before, the most likely cause of the removal is that it failed DAD. It
could also be that the user removed it at the same time we needed to
resync the platform cache, but that seems more unlikely.
Cleanup the handling of close().
First of all, closing an invalid (non-negative) file descriptor (EBADF) is
always a serious bug. We want to catch that. Hence, we should use nm_close()
(or nm_close_with_error()) which asserts against such bugs. Don't ever use
close() directly, to get that additional assertion.
Also, our nm_close() handles EINTR internally and correctly. Recent
POSIX defines that on EINTR the close should be retried. On Linux,
that is never correct. After close() returns, the file descriptor is
always closed (or invalid). nm_close() gets this right, and pretends
that EINTR is a success (without retrying).
The majority of our file descriptors are sockets, etc. That means,
often an error from close isn't something that we want to handle. Adjust
nm_close() to return no error and preserve the caller's errno. That is
the appropriate reaction to error (ignoring it) in most of our cases.
And error from close may mean that there was an IO error (except EINTR
and EBADF). In a few cases, we may want to handle that. For those
cases we have nm_close_with_error().
TL;DR: use almost always nm_close(). Unless you want to handle the error
code, then use nm_close_with_error(). Never use close() directly.
There is much reading on the internet about handling errors of close and
in particular EINTR. See the following links:
https://lwn.net/Articles/576478/https://askcodes.net/coding/what-to-do-if-a-posix-close-call-fails-https://www.austingroupbugs.net/view.php?id=529https://sourceware.org/bugzilla/show_bug.cgi?id=14627https://news.ycombinator.com/item?id=3363819https://peps.python.org/pep-0475/
When there are many VFs the default buffer size of 1 memory page is
not enough. Each VF can take up to ~120 bytes and so when the page
size is 4KiB at most ~34 VFs can be added.
Specify the buffer size when allocating the message.
Add a len argument to nlmsg_alloc() and nlmsg_alloc_simple(). After
that, nlmsg_alloc_size() can be dropped. Also, rename
nlmsg_alloc_simple() to nlmsg_alloc_new().
Prefer it over strncmp(), because it seems easier to understand (to me).
Prefer it over g_str_has_prefix(), because it can directly expand
to a plain strncmp() -- instead of first humping to glib, then calling
strlen() before calling strncmp().
Turns out, modern rmnet_* devices doesn't use ARPHRD_ETHER arptype, but
ARPHRD_RAWIP. Also complicating the fact is that ARPHRD_RAWIP is
actually added in v4.14, but devices using kernel before that version
define this value as "530" in an out-of-tree patch [1].
Recognize this case and check explicitly for 3 values of arptype.
[1] 54948008c2https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1392
Later, we should move all such objects. And we should rename
the API to have a unique prefix, like "NMPPlObjIP[4]Address".
This is just a first step that introduces more inconsistencies than it
solves. It will get better afterwards.
"nmp-base.h" really should only contain simple defines like enum types
or #define. As such, it almost does not need a source file.
However, the enum-to-string methods for the enums of "nmp-base.h" need a
place. Add "nmp-base.c" for that.
Our naming in libnm-platform is bad.
We have NMPlatform, which is a cache of objects. Consequently we have
platform methods like nm_platform_get_link().
We also have various other types that share the NMPlatform prefix, like
NMPlatformIP4Address. For those we have nm_platform_ip4_address_to_string().
"methods" of a type should have the same prefix as the type,
and we should not have types that share the same prefix.
Also, "NMPlatformIP4Address" is a long name, and inconsistent with the
strongly related NMPObjectIP4Address.
Add new files to move and rename parts of the platform API.
This changes a few places where we might have looked up the ifname in
NMPlatform to only print the ifindex. Since the ifindex is the real identifier,
and the logfile is already full of lines that associate the ifname with the ifindex,
this is fine.
NMPGlobalTracker allows to track objects for independent users/callers.
That is, callers that are not aware whether another caller tracks the
same/similar object. It thus groups all objects by their nmp_object_id_equal()
(as `TrackObjData` struct), while keeping a list of each individually tracked
object (as `TrackData` struct which honors the object and the user-tag parameter).
When the same caller (based on the user-tag) tracks the same object again, then
NMPGlobalTracker will only track it once and combine the objects. That is done by
also having a dictionary for the `TrackData` entries (`self->by_data`).
This latter dictionary lookup wrongly considered nmp_object_id_equal().
Instead, it needs to consider all minor differences of the objects, and
use nmp_object_equal().
For example, for NMPlatformMptcpAddress, only the "address" is part of
the ID. Other fields, like the MPTCP flags are not. Imagine a profile is
active with MPTCP endpoints configured with flags "subflow". During reapply,
the user can only update the MPTCP flags (e.g. to "signal"). When that happens,
the caller (NML3Cfg) would track a new NMPlatformMptcpAddress instance, that only
differs by MPTCP flags. In this case, we need to track the new address for the
differences that it has according to nmp_object_equal(), and not
nmp_object_id_equal().
Due to this bug, reapply might not work correctly. For other supported types (routing
rules and routes) this bug may have been harder to reproduce, because most attributes
of rules/routes are also part of the ID and because it's uncommon to reapply a minor
change to a rule/route.
https://bugzilla.redhat.com/show_bug.cgi?id=2120471
Fixes: b8398b9e79 ('platform: add NMPRulesManager for syncing routing rules')
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1375
These variants provide additional nm_assert() checks, and are thus
preferable.
Note that we cannot just blindly replace &g_array_index() with
&nm_g_array_index(), because the latter would not allow getting a
pointer at index [arr->len]. That might be a valid (though uncommon)
usecase. The correct replacement of &g_array_index() is thus
nm_g_array_index_p().
I checked the code manually and replaced uses of nm_g_array_index_p()
with &nm_g_array_index(), if that was a safe thing to do. The latter
seems preferable, because it is familar to &g_array_index().
On netlink API, the attribute is indeed u32. However, this is an ifindex
which in most other kernel APIs and in NetworkManager code is a signed
integer. Note that of course kernel would only ever assign numbers that
are valid ifindexes, thus in the suitable range.
The IFLA_BOND_ACTIVE_SLAVE and IFLA_BOND_PRIMARY are not the same.
If the primary is not set, then that's it. Don't fallback.
Only NetworkManager API deprecated "active-slave" and uses it as
alias for "primary". That does not mean, kernel/netlink does that.
This optimization seems unnecessary. Just initialize a new route struct
and use it. The advantage is that we can have the variable in the scope
closer to where it's used, and don't need to think about what happens
outside the scope.
Previously, nm_platform_ip_address_sync() would always add the "IFA_F_NOPREFIXROUTE"
flag. Add a way to let the caller control that.
Add a flags argument, with a new flag "with-noprefixroute". By default
(with flags "none"), nm_platform_ip_address_sync() would no longer
add "IFA_F_NOPREFIXROUTE" flag, but the caller can now opt-in to that.
The purpose is that on "lo" interface we will want to let kernel
handle the prefix route. So have a per-ifindex opt-in for controlling
this.
During nm_platform_ip_address_flush() we use "none" flags, because the
function anyway doesn't add any addresses, so it wouldn't matter.
There is no change in behavior.
Co-authored-by: Thomas Haller <thaller@redhat.com>