Surisingly, the compiler may detect the remaining obj_type in
the default switch. Then, inlining nmp_class_from_type() it may detect
that this is only possible to hit with an out or range access to
_nmp_classes array.
Rework the code to avoid that compiler warning. It's either way not
supposed to happen.
Also, drop the default switch case and explicitly list the enum values.
Otherwise it is error prone to forget a switch case.
(cherry picked from commit 9848589fbf)
(cherry picked from commit 6f189da7b6)
(cherry picked from commit 6da20c24cd)
GCC 10 complains about accesses to elements of zero-length arrays that
overlap other members of the same object:
src/platform/nm-platform-utils.c: In function ‘nmp_utils_ethtool_get_permanent_address’:
src/platform/nm-platform-utils.c:854:29: error: array subscript 0 is outside the bounds of an interior zero-length array ‘__u8[0]’ {aka ‘unsigned char[0]’} [-Werror=zero-length-bounds]
854 | if (NM_IN_SET (edata.e.data[0], 0, 0xFF)) {
./shared/nm-glib-aux/nm-macros-internal.h:731:20: note: in definition of macro ‘_NM_IN_SET_EVAL_N’
Fix this warning.
(cherry picked from commit d892a35395)
(cherry picked from commit c1417087c8)
(cherry picked from commit f7b9d06306)
GCC 10 complains about accesses to elements of zero-length arrays that
overlap other members of the same object:
src/platform/nm-platform-utils.c: In function ‘ethtool_get_stringset’:
src/platform/nm-platform-utils.c:355:27: error: array subscript 0 is outside the bounds of an interior zero-length array ‘__u32[0]’ {aka ‘unsigned int[0]’} [-Werror=zero-length-bounds]
355 | len = sset_info.info.data[0];
| ~~~~~~~~~~~~~~~~~~~^~~
In file included from src/platform/nm-platform-utils.c:12:
/usr/include/linux/ethtool.h:647:8: note: while referencing ‘data’
647 | __u32 data[0];
| ^~~~
Fix this warning.
(cherry picked from commit 16e1e44c5e)
(cherry picked from commit 286bb2f029)
(cherry picked from commit b474ed0044)
Clang (3.4.2-9.el7) on CentOS 7.6 fails related to nm_hash_update_vals().
Clang seems to dislike passing certain complex arguments to typeof().
I'd prefer to fix nm_hash_update_vals() to not have this problem,
but I don't know how.
This works around the issue.
(cherry picked from commit 5113c5bd00)
clang (3.4.2-9.el7) on CentOS 7.6 fails related to nm_hash_update_vals().
I am not even quoting the error message, it's totally non-understandable.
nm_hash_update_vals() uses typeof(), and in some obscure cases, clang dislikes
when the argument itself is some complex macro. I didn't fully understand why,
but this works around it.
I would prefer to fix nm_hash_update_vals() to not have this limitation.
But I don't know how.
There is probably no downside to have this an inline function instead of
a macro.
(cherry picked from commit ad06cc78dc)
Older versions of meson don't like building multiple artifacts
with the same name (even if they are in different directories). We
have multiple tests called "test-general.c", and it would be natural
to compile a test binary of the same name.
Meson encountered an error in file src/tests/meson.build, line 14, column 2:
Tried to create target "test-general", but a target of that name already exists.
It's generally a bad idea to have in our source tree multiple files with the
same name. Rename the test.
Fixes: 16cd84d346 ('build/meson: rename platform tests to use same name as autotools'):
(cherry picked from commit 041aa3d605)
First of all, all file names in our source-tree should be unique. We should
not have stuff like "libnm-core/tests/test-general.c" and "src/tests/test-general.c".
The problem here are the C source files, and consequently also the test
binaries have duplicate names. We should avoid that in general. However,
our binaries should have a matching name with the C source. If
"test-general.c" is not good enough, that needs renaming. Not building
"platform-test-general" out of it.
On the other hand, all our tests should have a filename "*/tests/test-*", like
they do for autotools.
Rename the meson platform tests.
It's also important because "tools/run-nm-test.sh" relies on the test
name to workaround valgrind warnings.
(cherry picked from commit 16cd84d346)
IP addresses, routes, TC and QDiscs are all tied to a certain interface.
So when NetworkManager manages an interface, it can be confident that
all related entires should be managed, deleted and modified by NetworkManager.
Routing policy rules are global. For that we have NMPRulesManager which
keeps track of whether NetworkManager owns a rule. This allows multiple
connection profiles to specify the same rule, and NMPRulesManager can
consolidate this information to know whether to add or remove the rule.
NMPRulesManager would also support to explicitly block a rule by
tracking it with negative priority. However that is still unused at
the moment. All that devices do is to add rules (track with positive
priority) and remove them (untrack) once the profile gets deactivated.
As rules are not exclusively owned by NetworkManager, NetworkManager
tries not to interfere with rules that it knows nothing about. That
means in particular, when NetworkManager starts it will "weakly track"
all rules that are present. "weakly track" is mostly interesting for two
cases:
- when NMPRulesManager had the same rule explicitly tracked (added) by a
device, then deactivating the device will leave the rule in place.
- when NMPRulesManager had the same rule explicitly blocked (tracked
with negative priority), then it would restore the rule when that
block gets removed (as said, currently nobody actually does this).
Note that when restarting NetworkManager, then the device may stay and
the rules kept. However after restart, NetworkManager no longer knows
that it previously added this route, so it would weakly track it and
never remove them again.
That is a problem. Avoid that, by whenever explicitly tracking a rule we
also make sure to no longer weakly track it. Most likely this rule was
indeed previously managed by NetworkManager. If this was really a rule
added by externally, then the user really should choose distinct
rule priorities to avoid such conflicts altogether.
(cherry picked from commit 15b1304477)
In general, all fields of public NMPlatform* structs must be
plain/simple. Meaning: copying the struct must be possible without
caring about cloning/duplicating memory.
In other words, if there are fields which lifetime is limited,
then these fields cannot be inside the public part NMPlatform*.
That is why
- "NMPlatformLink.kind", "NMPlatformQdisc.kind", "NMPlatformTfilter.kind"
are set by platform code to an interned string (g_intern_string())
that has a static lifetime.
- the "ingress_qos_map" field is inside the ref-counted struct NMPObjectLnkVlan
and not NMPlatformLnkVlan. This field requires managing the lifetime
of the array and NMPlatformLnkVlan cannot provide that.
See also for example NMPClass.cmd_obj_copy() which can deep-copy an object.
But this is only suitable for fields in NMPObject*. The purpose of this
rule is that you always can safely copy a NMPlatform* struct without
worrying about the ownership and lifetime of the fields (the field's
lifetime is unlimited).
This rule and managing of resource lifetime is the main reason for the
NMPlatform*/NMPObject* split. NMPlatform* structs simply have no mechanism
for copying/releasing fields, that is why the NMPObject* counterpart exists
(which is ref-counted and has a copy and destructor function).
This is violated in tc_commit() for the "kind" strings. The lifetime
of these strings is tied to the setting instance.
We cannot intern the strings (because these are arbitrary strings
and interned strings are leaked indefinitely). We also cannot g_strdup()
the strings, because NMPlatform* is not supposed to own strings.
So, just add comments that warn about this ugliness.
The more correct solution would be to move the "kind" fields inside
NMPObjectQdisc and NMPObjectTfilter, but that is a lot of extra effort.
(cherry picked from commit f2ae994b23)
There is only one caller, hence it's simpler to see it all in one place.
I prefer this, because then I can read the code top to bottom and
see what's happening, without following helper functions.
Also, this way we can "reuse" the nla_put_failure label and assertion. Previously,
if the assertion was hit we would not rewind the buffer but continue
constructing the message (which is already borked). Not that it matters
too much, because this was on an "failed-assertion" code path.
(cherry picked from commit 04bd404dff)
These lines can be reached if the allocated buffer is too
small to hold the netlink message. That is actually a bug
that we need to fix. Assert.
(cherry picked from commit 3784a2a2ec)
Arguably, the structure is used inside a union with another (larger)
struct, hence no memory is saved.
In fact, it may well be slower performance wise to access a boolean bitfield
than a gboolean (int).
Still, boolean fields in structures should be bool:1 bitfields for
consistency.
(cherry picked from commit 36d6aa3bcd)
Kernel calls the netlink attribute TCA_FQ_CODEL_MEMORY_LIMIT. Likewise,
iproute2 calls this "memory_limit".
Rename because TC parameters are inherrently tied to the kernel
implementation and we should use the familiar name.
(cherry picked from commit 666d58802b)
iproute2 uses the special value ~0u to indicate not to set
TCA_FQ_CODEL_CE_THRESHOLD in RTM_NEWQDISC. When not explicitly
setting the value, kernel treats the threshold as disabled.
However note that 0xFFFFFFFFu is not an invalid threshold (as far as
kernel is concerned). Thus, we should not use that as value to indicate
that the value is unset. Note that iproute2 uses the special value ~0u
only internally thereby making it impossible to set the threshold to
0xFFFFFFFFu). But kernel does not have this limitation.
Maybe the cleanest way would be to add another field to NMPlatformQDisc:
guint32 ce_threshold;
bool ce_threshold_set:1;
that indicates whether the threshold is enable or not.
But note that kernel does:
static void codel_params_init(struct codel_params *params)
{
...
params->ce_threshold = CODEL_DISABLED_THRESHOLD;
static int fq_codel_change(struct Qdisc *sch, struct nlattr *opt,
struct netlink_ext_ack *extack)
{
...
if (tb[TCA_FQ_CODEL_CE_THRESHOLD]) {
u64 val = nla_get_u32(tb[TCA_FQ_CODEL_CE_THRESHOLD]);
q->cparams.ce_threshold = (val * NSEC_PER_USEC) >> CODEL_SHIFT;
}
static int fq_codel_dump(struct Qdisc *sch, struct sk_buff *skb)
{
...
if (q->cparams.ce_threshold != CODEL_DISABLED_THRESHOLD &&
nla_put_u32(skb, TCA_FQ_CODEL_CE_THRESHOLD,
codel_time_to_us(q->cparams.ce_threshold)))
goto nla_put_failure;
This means, kernel internally uses the special value 0x83126E97u to indicate
that the threshold is disabled (WTF). That is because
(((guint64) 0x83126E97u) * NSEC_PER_USEC) >> CODEL_SHIFT == CODEL_DISABLED_THRESHOLD
So in kernel API this value is reserved (and has a special meaning
to indicate that the threshold is disabled). So, instead of adding a
ce_threshold_set flag, use the same value that kernel anyway uses.
(cherry picked from commit 973db2d41b)
The memory-limit is an unsigned integer. It is ugly (if not wrong) to compare unsigned
values with "-1". When comparing with the default value we must also use an u32 type.
Instead add a define NM_PLATFORM_FQ_CODEL_MEMORY_LIMIT_UNSET.
Note that like iproute2 we treat NM_PLATFORM_FQ_CODEL_MEMORY_LIMIT_UNSET
to indicate to not set TCA_FQ_CODEL_MEMORY_LIMIT in RTM_NEWQDISC. This
special value is entirely internal to NetworkManager (or iproute2) and
kernel will then choose a default memory limit (of 32MB). So setting
NM_PLATFORM_FQ_CODEL_MEMORY_LIMIT_UNSET means to leave it to kernel to
choose a value (which then chooses 32MB).
See kernel's net/sched/sch_fq_codel.c:
static int fq_codel_init(struct Qdisc *sch, struct nlattr *opt,
struct netlink_ext_ack *extack)
{
...
q->memory_limit = 32 << 20; /* 32 MBytes */
static int fq_codel_change(struct Qdisc *sch, struct nlattr *opt,
struct netlink_ext_ack *extack)
...
if (tb[TCA_FQ_CODEL_MEMORY_LIMIT])
q->memory_limit = min(1U << 31, nla_get_u32(tb[TCA_FQ_CODEL_MEMORY_LIMIT]));
Note that not having zero as default value is problematic. In fields like
"NMPlatformIP4Route.table_coerced" and "NMPlatformRoutingRule.suppress_prefixlen_inverse"
we avoid this problem by storing a coerced value in the structure so that zero is still
the default. We don't do that here for memory-limit, so the caller must always explicitly
set the value.
(cherry picked from commit 46a904389b)
When using nm_utils_strbuf_*() API, the buffer gets always moved to the current
end. We must thus remember and return the original start of the buffer.
(cherry picked from commit b658e3da08)
In practice, there is no difference when representing 0 or 1 as signed/unsigned 32
bit integer. But still use the correct type that also kernel uses.
Also, the implicit conversation from uint32 to bool was correct already.
Still, explicitly convert the uint32 value to boolean in _new_from_nl_qdisc().
It's no change in behavior.
(cherry picked from commit a1099a1fab)
"NM_CMP_FIELD (a, b, fq_codel.ecn == TRUE)" is quite a hack as it relies on
the implementation of the macro in a particular way. The problem is, that
NM_CMP_FIELD() uses typeof() which cannot be used with bitfields. So, the
nicer solution is to use NM_CMP_FIELD_UNSAFE() which exists exactly for bitfields
(it's "unsafe", because it evaluates arguments more than once as it avoids
the temporary variable with typeof()).
Same with nm_hash_update_vals() which uses typeof() to avoid evaluating
arguments more than once. But that again does not work with bitfields.
The "proper" way is to use NM_HASH_COMBINE_BOOLS().
(cherry picked from commit 47d8bee113)
On Ubuntu 14.04 kernel (4.4.0-146-generic, x86_64) this easily causes
test failures:
make -j 8 src/platform/tests/test-route-linux \
&& while true; do \
NMTST_SEED_RANDOM= ./tools/run-nm-test.sh src/platform/tests/test-route-linux -p /route/rule \
|| break; \
done
outputs:
...
/route/rule/1:
nmtst: initialize nmtst_get_rand() with NMTST_SEED_RAND=22892021
OK
/route/rule/2: >>> failing...
>>> no fuzzy match between: [routing-rule,0x205ab30,1,+alive,+visible; [6] 0: from all suppress_prefixlen 8 none]
>>> and: [routing-rule,0x205c0c0,1,+alive,+visible; [6] 0: from all suppress_prefixlen -1579099242 none]
**
test:ERROR:src/platform/tests/test-route.c:1695:test_rule: code should not be reached
(cherry picked from commit d5a2b70909)
Why didn't we get a compiler warning about this bug?
At least clang (3.8.0-2ubuntu4, Ubuntu 16.04) warns:
CC src/platform/src_libNetworkManagerBase_la-nm-platform.lo
../src/platform/nm-platform.c:5389:14: error: data argument not used by format string [-Werror,-Wformat-extra-args]
lnk->remote ? nm_sprintf_buf (str_remote, " remote %s", nm_utils_inet4_ntop (lnk->remote, str_remote1)) : "",
^
Fixes: 4c2862b958 ('platform: add gretap tunnels support')
(cherry picked from commit dfb899f465)
"libnm-core" implements common functionality for "NetworkManager" and
"libnm".
Note that clients like "nmcli" cannot access the internal API provided
by "libnm-core". So, if nmcli wants to do something that is also done by
"libnm-core", , "libnm", or "NetworkManager", the code would have to be
duplicated.
Instead, such code can be in "libnm-libnm-core-{intern|aux}.la".
Note that:
0) "libnm-libnm-core-intern.la" is used by libnm-core itsself.
On the other hand, "libnm-libnm-core-aux.la" is not used by
libnm-core, but provides utilities on top of it.
1) they both extend "libnm-core" with utlities that are not public
API of libnm itself. Maybe part of the code should one day become
public API of libnm. On the other hand, this is code for which
we may not want to commit to a stable interface or which we
don't want to provide as part of the API.
2) "libnm-libnm-core-intern.la" is statically linked by "libnm-core"
and thus directly available to "libnm" and "NetworkManager".
On the other hand, "libnm-libnm-core-aux.la" may be used by "libnm"
and "NetworkManager".
Both libraries may be statically linked by libnm clients (like
nmcli).
3) it must only use glib, libnm-glib-aux.la, and the public API
of libnm-core.
This is important: it must not use "libnm-core/nm-core-internal.h"
nor "libnm-core/nm-utils-private.h" so the static library is usable
by nmcli which couldn't access these.
Note that "shared/nm-meta-setting.c" is an entirely different case,
because it behaves differently depending on whether linking against
"libnm-core" or the client programs. As such, this file must be compiled
twice.
(cherry picked from commit af07ed01c0)
From the files under "shared/nm-utils" we build an internal library
that provides glib-based helper utilities.
Move the files of that basic library to a new subdirectory
"shared/nm-glib-aux" and rename the helper library "libnm-core-base.la"
to "libnm-glib-aux.la".
Reasons:
- the name "utils" is overused in our code-base. Everything's an
"utils". Give this thing a more distinct name.
- there were additional files under "shared/nm-utils", which are not
part of this internal library "libnm-utils-base.la". All the files
that are part of this library should be together in the same
directory, but files that are not, should not be there.
- the new name should better convey what this library is and what is isn't:
it's a set of utilities and helper functions that extend glib with
funcitonality that we commonly need.
There are still some files left under "shared/nm-utils". They have less
a unifying propose to be in their own directory, so I leave them there
for now. But at least they are separate from "shared/nm-glib-aux",
which has a very clear purpose.
(cherry picked from commit 80db06f768)
We built (among others) two libraries from the sources in "shared/nm-utils":
"libnm-utils-base.la" and "libnm-utils-udev.la".
It's confusing. Instead use directories so there is a direct
correspondence between these internal libraries and the source files.
(cherry picked from commit 2973d68253)
"shared/nm-utils" contains general purpose utility functions that only
depend on glib (and extend glib with some helper functions).
We will also add code that does not use glib, hence it would be good
if the part of "shared/nm-utils" that does not depend on glib, could be
used by these future projects.
Also, we use the term "utils" everywhere. While that covers the purpose
and content well, having everything called "nm-something-utils" is not
great. Instead, call this "nm-std-aux", inspired by "c-util/c-stdaux".
(cherry picked from commit b434b9ec07)
Also, in nm_platform_routing_rule_cmp() always compare the routing
table field, also if l3mdev is set. For kernel, we cannot set table and
l3mdev together, hence such rules don't really exist (or if we try to
configure it, it will be rejected by kernel). But as far as
nm_platform_routing_rule_cmp() is concerned, if the table is set,
always compare it.
(cherry picked from commit b6ff02e76f)
For routes and routing rules, kernel uses a certain (not stictly defined) set
of attributes to decide whether to routes/rules are identical.
That is a problem, as different kernel versions disagree on whether
two routes/rules are the same (EEXIST) or not.
Note that when NetworkManager tries to add a rule with protocol set to
anything but RTPROT_UNSPEC, then kernel will ignore the attribute if it
doesn't have support for it. Meaning: the added rule will have a
different protocol setting then intended.
Note that NMPRulesManager will add a rule if it doesn't find it in the
platform cache so far. That means, when looking into the platform cache
we must ignore or honor the protocol like kernel does.
This does not only affect FRA_PROTOCOL, but all attributes where kernel
and NetworkManager disagrees. But the protocol is the most prominent
one, because the rules tracked by nmp_rules_manager_track_default()
specify the protocol.
(cherry picked from commit ef4f8ccf6d)
Next we will need to detect more kernel features. First refactor the
handling of these to require less code changes and be more efficient.
A plain nm_platform_kernel_support_get() only reqiures to access an
array in the common case.
The other important change is that the function no longer requires a
NMPlatform instance. This allows us to check kernel support from
anywhere. The only thing is that we require kernel support to be
initialized before calling this function. That means, an NMPlatform
instance must have detected support before.
(cherry picked from commit ee269b318e)
In some cases it is convenient to specify ranges of bridge vlans, as
already supported by iproute2 and natively by kernel. With this commit
it becomes possible to add a range in this way:
nmcli connection modify eth0-slave +bridge-port.vlans "100-200 untagged"
vlan ranges can't be PVIDs because only one PVID vlan can exist.
https://bugzilla.redhat.com/show_bug.cgi?id=1652910
(cherry picked from commit 7093515777)
Policy routing rules are global, and unlike routes not tied to an interface by ifindex.
That means, while we take full control over all routes of an interface during a sync,
we need to consider that multiple parties can contribute to the global set of rules.
That might be muliple connection profiles providing the same rule, or rules that are added
externally by the user. NMPRulesManager mediates for that.
This is done by NMPRulesManager "tracking" rules.
Rules that are not tracked by NMPRulesManager are completely ignored (and
considered externally added).
When tracking a rule, the caller provides a track-priority. If multiple
parties track a rule, then the highest (absolute value of the) priority
wins.
If the highest track-priority is positive, NMPRulesManager will add the rule if
it's not present.
When the highest track-priority is negative, then NMPRulesManager will remove the
rule if it's present (enforce its absence).
The complicated part is, when a rule that was previously tracked becomes no
longer tracked. In that case, we need to restore the previous state.
If NetworkManager added the rule earlier, then untracking the rule
NMPRulesManager will remove the rule again (restore its previous absent
state).
By default, if NetworkManager had a negative tracking-priority and removed the
rule earlier (enforced it to be absent), then when the rule becomes no
longer tracked, NetworkManager will not restore the rule.
Consider: the user adds a rule externally, and then activates a profile that
enforces the absence of the rule (causing NetworkManager to remove it).
When deactivating the profile, by default NetworkManager will not
restore such a rule! It's unclear whether that is a good idea, but it's
also unclear why the rule is there and whether NetworkManager should
really restore it.
Add weakly tracked rules to account for that. A tracking-priority of
zero indicates such weakly tracked rules. The only difference between an untracked
rule and a weakly tracked rule is, that when NetworkManager earlier removed the
rule (due to a negative tracking-priority), it *will* restore weakly
tracked rules when the rules becomes no longer (negatively) tracked.
And it attmpts to do that only once.
Likewise, if the rule is weakly tracked and already exists when
NMPRulesManager starts posively tracking the rule, then it would not
remove again, when no longer positively tracking it.
- fix the argument type to be "gint32" and not "int".
- assert in nmp_rules_manager_track_default() for the input
arguments.
- use boolean bitfield in private data.
The name "priority" is overused. Also rules have a "priority", but that'
something else.
Rename the priority of how rules are tracked by NMPRuleManager to
"track_priority".
All that setting track-default does, is calling nmp_rules_manager_track_default()
when the rules are first accessed.
That is not right API. Since nmp_rules_manager_track_default() is already public
API (good), every caller that wishes this behavior should track these routes explicitly.
Seems on a busy system, we can hit this timeout. Increase it.
ERROR:../src/platform/tests/test-common.c:939:_ip_address_add: code should not be reached
'sriov_drivers_autoprobe' was added in kernel 4.12. With previous
kernel versions NM is currently unable to set any SR-IOV parameter
because it tries to read 'sriov_drivers_autoprobe' which doesn't
exist, assumes that current value is -1 and tries to change it,
failing.
When the file doesn't exist, drivers are automatically probed so we
can assume the value is 1. In this way NM is able to activate a
connection with sriov.autoprobe-drivers=1 (the default) even on older
kernel versions.
Fixes: 1e41495d9a ('platform: sriov: write new values when we can't read old ones')
https://bugzilla.redhat.com/show_bug.cgi?id=1695093