Routing rules are unlike addresses or routes not tied to an interface.
NetworkManager thinks in terms of connection profiles. That works well
for addresses and routes, as one profile configures addresses and routes
for one device. For example, when activating a profile on a device, the
configuration does not interfere with the addresses/routes of other
devices. That is not the case for routing rules, which are global, netns-wide
entities.
When one connection profile specifies rules, then this per-device configuration
must be merged with the global configuration. And when a device disconnects later,
the rules must be removed.
Add a new NMPRulesManager API to track/untrack routing rules. Devices can
register/add there the routing rules they require. And the sync method will
apply the configuration. This is be implemented on top of NMPlatform's
caching API.
Add and implement NMPlatformRoutingRule types and let the platform cache
handle rules.
Rules are special in two ways:
- they don't have an ifindex. That makes them different from all other
currently existing NMPlatform* types, which have an "ifindex" field and
"implement" NMPlatformObjWithIfindex.
- they have an address family, but contrary to addresses and routes, there
is only one NMPlatformRoutingRule object to handle both address
families.
Both of these points require some special considerations.
Kernel treats routing-rules quite similar to routes. That is, kernel
allows to add different rules/routes, as long as they differ in certain
fields. These "fields" make up the identity of the rules/routes. But
in practice, it's not defined which fields contribute to the identity
of these objects. That makes using the netlink API very hard. For
example, when kernel gains support for a new attribute which
NetworkManager does not know yet, then users can add two rules/routes
that look the same to NetworkManager. That can easily result in cache
inconsistencies.
Another problem is, that older kernel versions may not yet support all
fields, which NetworkManager (and newer kernels) considers for identity.
The older kernel will not simply reject netlink messages with these unknown
keys, instead it will proceed adding the route/rule without it. That means,
the added route/rule will have a different identity than what NetworkManager
intended to add.
Currently, there is a directy one to one relation between
- DELAYED_ACTION_TYPE_REFRESH_ALL_*
- REFRESH_ALL_TYPE_*
- NMP_OBJECT_TYPE_*
For IP addresses, routes and routing policy rules, when we request
a refresh-all (NLM_F_DUMP), we want to specify the address family.
For addresses and routes that is currently solved by having two
sets of NMPObject types, for each address family one.
I think that is cumbersome because the implementations of both address
families are quite similar. By implementing both families as different
object types, we have a lot of duplicate code and it's hard to see where
the families actually differ. It would be better to have only one NMPObject
type, but then when we "refresh-all" such types, we still want to be able
to dump all (AF_UNSPEC) or only a particular address family (AF_INET, AF_INET6).
Decouple REFRESH_ALL_TYPE_* from NMP_OBJECT_TYPE_* to make that
possible.
While these numbers are strongly related to DELAYED_ACTION_TYPE_REFRESH_ALL_*,
they differ in their meaning.
These are the refresh-all-types that we support. While some of the delayed-actions
are indeed for refresh-all, they are not the same thing.
Rename the enum.
The function is unused. It would require redesign to work with
future changes, and since it's unused, just drop it.
The long reasoning is:
Currently, a refresh-all is tied to an NMPObjectType. However, with
NMPObjectRoutingRule (for policy-routing-rules) that will no longer
be the case.
That is because NMPObjectRoutingRule will be one object type for
AF_INET and AF_INET6. Contrary to IPv4 addresses and routes, where
there are two sets of NMPObject types.
The reason is, that it's preferable to treat IPv4 and IPv6 objects
similarly, that is: as the same type with an address family property.
That also follows netlink, which uses RTM_GET* messages for both
address families, and the address family is expressed inside the
message.
But then an API like nm_platform_refresh_all() makes little sense,
it would require at least an addr_family argument. But since the
API is unused, just drop it.
This allows the compiler to see that this function does nothing for %NULL.
That is not so unusual, as we use nm_auto_nmpobj to free objects. Often
the compiler can see that these pointers are %NULL.
Until now, all implemented NMPObject types have an ifindex field (from
links, addresses, routes, qdisc to tfilter).
The NMPObject structure contains a union of all available types, that
makes it easier to down-case from an NMPObject pointer to the actual
content.
The "object" field of NMPObject of type NMPlatformObject is the lowest
common denominator.
We will add NMPlatformRoutingRules (for policy routing rules). That type
won't have an ifindex field.
Hence, drop the "ifindex" field from NMPlatformObject type. But also add
a new type NMPlatformObjWithIfindex, that can represent all types that
have an ifindex.
The condition got accidentally reversed, which means we're always
undecided and thus wrongly assuming support and never being able to set
any addresses.
This would bother the few that are struck with 3.4 android kernels. Very
few indeed, given this got unnoticed since 1.10.
Fixes: 8670aacc7c ('platform: cleanup detecting kernel support for IFA_FLAGS and IPv6LL')
Older versions of meson don't support building the same names
multiple times.
Meson encountered an error in file src/tests/meson.build, line 14, column 2:
Tried to create target "test-general", but a target of that name already exists.
We really need to use unique filenames everywhere. Revert the name
change for now.
This breaks again the valgrind workaround in "tools/run-nm-test.sh".
This reverts commit 5466edc63e.
Meson and autotools should name the tests the same way.
Also, all tests binaries built by autotools start on purpose
with "test-". Do that for meson too.
Also, otherwise "tools/run-nm-test.sh" fails to workaround
valgrind failures for platform tests as it does not expect
the tests to be named that way:
if [ $HAS_ERRORS -eq 0 ]; then
# valgrind doesn't support setns syscall and spams the logfile.
# hack around it...
if [ "$TEST_NAME" = 'test-link-linux' -o \
"$TEST_NAME" = 'test-acd' ]; then
if [ -z "$(sed -e '/^--[0-9]\+-- WARNING: unhandled .* syscall: /,/^--[0-9]\+-- it at http.*\.$/d' "$LOGFILE")" ]; then
HAS_ERRORS=1
fi
fi
fi
The defaults for test timeouts in meson is 30 seconds. That is not long
enough when running
$ NMTST_USE_VALGRIND=1 ninja -C build test
Note that meson supports --timeout-multiplier, and automatically
increases the timeout when running under valgrind. However, meson
does not understand that we are running tests under valgrind via
NMTST_USE_VALGRIND=1 environment variable.
Timeouts are really not expected to be reached and are a mean of last
resort. Hence, increasing the timeout to a large value is likely to
have no effect or to fix test failures where the timeout was too rigid.
It's unlikely that the test indeed hangs and the increase of timeout
causes a unnecessary increase of waittime before aborting.
We already have NMPLookup to express a set of objects that can be looked
up via the multi-index.
Change the behavior of nmp_cache_dirty_set_all(), not to accept an
object-type. Instead, make it generally useful and accept a lookup
instance that is used to filter the elements that should be set as
dirty.
for-each macros can be nice to use. However, such macros are potentially
error prone, as they abstract C control statements and scoping blocks.
I find it ugly to expand the macro to:
for (...)
if (...)
Instead, move the additional "if" check inside the loop's condition
expression, so that the macro only expands to one "for(;;)" statement.
It does not matter too much, but the code optimizes for deletion events.
In that case, we don't require parsing the full netlink message.
Instead, it's enough to parse only the ID part so that we can find the
object in the cache that is to be removed removed.
Fix that for qdisc/tfilter objects.
I think the code before was correct. At the very least because
we only run the while-loop at most once because multipath routes
are not supported.
However, it seems odd that the while loop checks for
"tlen >= rtnh->rtnh_len"
but later we do
"tlen -= RTNH_ALIGN (rtnh->rtnh_len)"
Well, arguably, tlen itself is aligned to 4 bytes (as kernel sends
the netlink message that way). So, it was indeed fine.
Still, confusing. Try to check more explicitly for the buffer sizes.
nla_data_as() has a static assertion that casting to the pointer is
valid (regarding the alignment of the structure). It also contains
an nm_assert() that the data is in fact large enough.
It's safer, hence prefer it a some places where it makes sense.
- nlmsg_append_struct() determines the size based on the argument.
It avoids typing, but more importantly: it avoids typing redundant
information (which we might get wrong).
- also, declare the structs as const, where possible.
nla_get_u64() was unlike all other nla_get_u*() implementations, in that it
would allow for a missing/invalid nla argument, and return 0.
Don't do this. For one, don't behave different than other getters.
Also, there really is no space to report errors. Hence, the caller must
make sure that the attribute is present and suitable -- like for other
nla_get_*() functions.
None of the callers relied on being able to pass NULL attribute.
Also, inline the function and use unaligned_read_ne64(). That is our
preferred way for reading unaligned data, not memcpy().
These are only nm_assert(), meaning on non-DEBUG builds they
are not enabled.
Callers are supposed to first check that the netlink attribute
is suitable. Hence, we just assert.
The policy for strings must indicate a minlen of at least 1.
Everything else is a bug, because the policy contains invalid
data -- and is determined at compile-time.
- use size_t arguments for the memory sizes. While sizes from netlink
API currently are int typed and inherrently limited, use the more
appropriate data type.
- rename the arguments. The "count" is really the size of the
destination buffer.
- return how many bytes we wanted to write (like g_strlcpy()).
That makes more sense than how many bytes we actually wrote
because previously, we could not detect truncation.
Anyway, none of the callers cared about the return-value either
way.
- let nla_strlcpy() return how many bytes we would like to have
copied. That way, the caller could detect string truncation.
In practice, no caller cared about that.
- the code before would also fill the entire buffer with zeros first,
like strncpy(). We still do that. However, only copy the bytes up
to the first NUL byte. The previous version would have copied
"a\0b\0" (with srclen 4) as "a\0b". Strip all bytes after the
first NUL character from src. That seems more correct here.
- accept nla argument as %NULL.
- drop explicit MAX sizes like
static const struct nla_policy policy[IFLA_INET6_MAX+1] = {
The compiler will deduce that.
It saves redundant information (which is possibly wrong). Also,
the max define might be larger than we actually need it, so we
just waste a few bytes of static memory and do unnecesary steps
during validation.
Also, the compiler will catch bugs, if the array size of policy/tb
is too short for what we access later (-Warray-bounds).
- avoid redundant size specifiers like:
static const struct nla_policy policy[IFLA_INET6_MAX+1] = {
...
struct nlattr *tb[IFLA_INET6_MAX+1];
...
err = nla_parse_nested (tb, IFLA_INET6_MAX, attr, policy);
- use the nla_parse*_arr() macros that determine the maxtype
based on the argument types.
- move declaration of "static const struct nla_policy policy" variable
to the beginning, before auto variables.
- drop unneeded temporay error variables.
The common idiom is to stack allocate the tb array. Hence,
the maxtype is redundant. Add macros that autodetect the
maxtype based on the C type infomation.
Also, there is a static assertion that the size of the policy
(if provided) matches.