When searching an element that is lower than the first list element (for
example RTNL type "batadv"), imax will be -1 after the last iteration.
Use int instead of unsigned to make the termination condition imin > imax
work in this case. This fixes NetworkManager crashing due to an
out-of-bounds array access whenever interfaces of such types exist.
Fixes: 19ad044359 ('platform: use binary search to lookup NMLinkType for rtnl_type')
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/515
Add nm_utils_invoke_on_timeout() beside nm_utils_invoke_on_idle().
They are fundamentally similar, except one schedules an idle handler
and the other a timeout.
Also, use the current g_main_context_get_thread_default() as context
instead of the singleton instance. That is a change in behavior, but
the only caller of nm_utils_invoke_on_idle() is the daemon, which
doesn't use different main contexts. Anyway, to avoid anybody being
tripped up by this also change the order of arguments. It anyway
seems nicer to first pass the cancellable, and the callback and user
data as last arguments. It's more in line with glib's asynchronous
methods.
Also, in the unlikely case that the cancellable is already cancelled
from the start, always schedule an idle action to complete fast.
(cherry picked from commit cd5157a0c3)
g_clear_pointer() would always cast the destroy notify function
pointer to GDestroyNotify. That means, it lost some type safety, like
GPtrArray *ptr_arr = ...
g_clear_pointer (&ptr_arr, g_array_unref);
Since glib 2.58 ([1]), g_clear_pointer() is also more type safe. But
this is not used by NetworkManager, because we don't set
GLIB_VERSION_MIN_REQUIRED to 2.58.
[1] f9a9902aac
We have nm_clear_pointer() to avoid this issue for a long time (pre
1.12.0). Possibly we should redefine in our source tree g_clear_pointer()
as nm_clear_pointer(). However, I don't like to patch glib functions
with our own variant. Arguably, we do patch g_clear_error() in
such a manner. But there the point is to make the function inlinable.
Also, nm_clear_pointer() returns a boolean that indicates whether
anything was cleared. That is sometimes useful. I think we should
just consistently use nm_clear_pointer() instead, which does always
the preferable thing.
Replace:
sed 's/\<g_clear_pointer *(\([^;]*\), *\([a-z_A-Z0-9]\+\) *)/nm_clear_pointer (\1, \2)/g' $(git grep -l g_clear_pointer) -i
Several macros are used to define function. They had a "_STATIC" variant,
to define the function as static.
I think those macros should not try to abstract entirely what they do.
They should not accept the function scope as argument (or have two
variants per scope). This also because it might make sense to add
additional __attribute__(()) to the function. That only works, if
the macro does not pretend to *not* define a plain function.
Instead, embrace what the function does and let the users place the
function scope as they see fit.
This also follows what is already done with
static NM_CACHED_QUARK_FCN ("autoconnect-root", autoconnect_root_quark)
- track the broadcast address in NMPlatformIP4Address. For addresses
that we receive from kernel and that we cache in NMPlatform, this
allows us to show the additional information. For example, we
can see it in debug logging.
- when setting the address, we still mostly generate our default
broadcast address. This is done in the only relevant caller
nm_platform_ip4_address_sync(). Basically, we merely moved setting
the broadcast address to the caller.
That is, because no callers explicitly set the "use_ip4_broadcast_address"
flag (yet). However, in the future some caller might want to set an explicit
broadcast address.
In practice, we currently don't support configuring special broadcast
addresses in NetworkManager. Instead, we always add the default one with
"address|~netmask" (for plen < 31).
Note that a main point of IFA_BROADCAST is to add a broadcast route to
the local table. Also note that kernel anyway will add such a
"address|~netmask" route, that is regardless whether IFA_BROADCAST is
set or not. Hence, setting it or not makes very little difference for
normal broadcast addresses -- because kernel tends to add this route either
way. It would make a difference if NetworkManager configured an unusual
IFA_BROADCAST address or an address for prefixes >= 31 (in which cases
kernel wouldn't add them automatically). But we don't do that at the
moment.
So, while what NM does has little effect in practice, it still seems
more correct to add the broadcast address, only so that you see it in
`ip addr show`.
Previously NetworkManager would wrongly add a broadcast address for the
network prefix that would collide with the IP address of the host on
the other end of the point-to-point link thus exhausting the IP address
space of the /31 network and preventing communication between the two
nodes.
Configuring a /31 address before this commit:
IP addr -> 10.0.0.0/31, broadcast addr -> 10.0.0.1
If 10.0.0.1 is configured as a broadcast address the communication
with host 10.0.0.1 will not be able to take place.
Configuring a /31 address after this commit:
IP addr -> 10.0.0.0/31, no broadcast address
Thus 10.0.0.0/31 and 10.0.0.1/31 are able to correctly communicate.
See RFC-3021. https://tools.ietf.org/html/rfc3021https://gitlab.freedesktop.org/NetworkManager/NetworkManager/issues/295https://bugzilla.redhat.com/show_bug.cgi?id=1764986
The abbreviations "ns" and "ms" seem not very clear to me. Spell them
out to nsec/msec. Also, in parts we already used the longer abbreviations,
so it wasn't consistent.
... and nm_utils_fd_get_contents() and nm_utils_file_set_contents().
Don't mix negative errno return value with a GError output. Instead,
return a boolean result indicating success or failure.
Also, optionally
- output GError
- set out_errsv to the positive errno (or 0 on success)
Obviously, the return value and the output arguments (contents, length,
out_errsv, error) must all agree in their success/failure result.
That means, you may check any of the return value, out_errsv, error, and
contents to reliably detect failure or success.
Also note that out_errsv gives the positive(!) errno. But you probably
shouldn't care about the distinction and use nm_errno_native() either
way to normalize the value.
We usually want to combine the fields from "struct timespec" to
have one timestamp in either nanoseconds or milliseconds.
Use nm_utils_clock_gettime_*() util for that.
We no longer add these. If you use Emacs, configure it yourself.
Also, due to our "smart-tab" usage the editor anyway does a subpar
job handling our tabs. However, on the upside every user can choose
whatever tab-width he/she prefers. If "smart-tabs" are used properly
(like we do), every tab-width will work.
No manual changes, just ran commands:
F=($(git grep -l -e '-\*-'))
sed '1 { /\/\* *-\*- *[mM]ode.*\*\/$/d }' -i "${F[@]}"
sed '1,4 { /^\(#\|--\|dnl\) *-\*- [mM]ode/d }' -i "${F[@]}"
Check remaining lines with:
git grep -e '-\*-'
The ultimate purpose of this is to cleanup our files and eventually use
SPDX license identifiers. For that, first get rid of the boilerplate lines.
While at it, rename the "addr" field to "l_address". The term "addr" is
used over and over. Instead we should use distinct names that make it
easier to navigate the code.
When changing the number of VFs the kernel can block for very long
time in the write() to sysfs, especially if autoprobe-drivers is
enabled. Turn the nm_platform_link_set_sriov_params() into an
asynchronous function.
In general, all fields of public NMPlatform* structs must be
plain/simple. Meaning: copying the struct must be possible without
caring about cloning/duplicating memory.
In other words, if there are fields which lifetime is limited,
then these fields cannot be inside the public part NMPlatform*.
That is why
- "NMPlatformLink.kind", "NMPlatformQdisc.kind", "NMPlatformTfilter.kind"
are set by platform code to an interned string (g_intern_string())
that has a static lifetime.
- the "ingress_qos_map" field is inside the ref-counted struct NMPObjectLnkVlan
and not NMPlatformLnkVlan. This field requires managing the lifetime
of the array and NMPlatformLnkVlan cannot provide that.
See also for example NMPClass.cmd_obj_copy() which can deep-copy an object.
But this is only suitable for fields in NMPObject*. The purpose of this
rule is that you always can safely copy a NMPlatform* struct without
worrying about the ownership and lifetime of the fields (the field's
lifetime is unlimited).
This rule and managing of resource lifetime is the main reason for the
NMPlatform*/NMPObject* split. NMPlatform* structs simply have no mechanism
for copying/releasing fields, that is why the NMPObject* counterpart exists
(which is ref-counted and has a copy and destructor function).
This is violated in tc_commit() for the "kind" strings. The lifetime
of these strings is tied to the setting instance.
We cannot intern the strings (because these are arbitrary strings
and interned strings are leaked indefinitely). We also cannot g_strdup()
the strings, because NMPlatform* is not supposed to own strings.
So, just add comments that warn about this ugliness.
The more correct solution would be to move the "kind" fields inside
NMPObjectQdisc and NMPObjectTfilter, but that is a lot of extra effort.
There is only one caller, hence it's simpler to see it all in one place.
I prefer this, because then I can read the code top to bottom and
see what's happening, without following helper functions.
Also, this way we can "reuse" the nla_put_failure label and assertion. Previously,
if the assertion was hit we would not rewind the buffer but continue
constructing the message (which is already borked). Not that it matters
too much, because this was on an "failed-assertion" code path.
Kernel calls the netlink attribute TCA_FQ_CODEL_MEMORY_LIMIT. Likewise,
iproute2 calls this "memory_limit".
Rename because TC parameters are inherrently tied to the kernel
implementation and we should use the familiar name.
iproute2 uses the special value ~0u to indicate not to set
TCA_FQ_CODEL_CE_THRESHOLD in RTM_NEWQDISC. When not explicitly
setting the value, kernel treats the threshold as disabled.
However note that 0xFFFFFFFFu is not an invalid threshold (as far as
kernel is concerned). Thus, we should not use that as value to indicate
that the value is unset. Note that iproute2 uses the special value ~0u
only internally thereby making it impossible to set the threshold to
0xFFFFFFFFu). But kernel does not have this limitation.
Maybe the cleanest way would be to add another field to NMPlatformQDisc:
guint32 ce_threshold;
bool ce_threshold_set:1;
that indicates whether the threshold is enable or not.
But note that kernel does:
static void codel_params_init(struct codel_params *params)
{
...
params->ce_threshold = CODEL_DISABLED_THRESHOLD;
static int fq_codel_change(struct Qdisc *sch, struct nlattr *opt,
struct netlink_ext_ack *extack)
{
...
if (tb[TCA_FQ_CODEL_CE_THRESHOLD]) {
u64 val = nla_get_u32(tb[TCA_FQ_CODEL_CE_THRESHOLD]);
q->cparams.ce_threshold = (val * NSEC_PER_USEC) >> CODEL_SHIFT;
}
static int fq_codel_dump(struct Qdisc *sch, struct sk_buff *skb)
{
...
if (q->cparams.ce_threshold != CODEL_DISABLED_THRESHOLD &&
nla_put_u32(skb, TCA_FQ_CODEL_CE_THRESHOLD,
codel_time_to_us(q->cparams.ce_threshold)))
goto nla_put_failure;
This means, kernel internally uses the special value 0x83126E97u to indicate
that the threshold is disabled (WTF). That is because
(((guint64) 0x83126E97u) * NSEC_PER_USEC) >> CODEL_SHIFT == CODEL_DISABLED_THRESHOLD
So in kernel API this value is reserved (and has a special meaning
to indicate that the threshold is disabled). So, instead of adding a
ce_threshold_set flag, use the same value that kernel anyway uses.
The memory-limit is an unsigned integer. It is ugly (if not wrong) to compare unsigned
values with "-1". When comparing with the default value we must also use an u32 type.
Instead add a define NM_PLATFORM_FQ_CODEL_MEMORY_LIMIT_UNSET.
Note that like iproute2 we treat NM_PLATFORM_FQ_CODEL_MEMORY_LIMIT_UNSET
to indicate to not set TCA_FQ_CODEL_MEMORY_LIMIT in RTM_NEWQDISC. This
special value is entirely internal to NetworkManager (or iproute2) and
kernel will then choose a default memory limit (of 32MB). So setting
NM_PLATFORM_FQ_CODEL_MEMORY_LIMIT_UNSET means to leave it to kernel to
choose a value (which then chooses 32MB).
See kernel's net/sched/sch_fq_codel.c:
static int fq_codel_init(struct Qdisc *sch, struct nlattr *opt,
struct netlink_ext_ack *extack)
{
...
q->memory_limit = 32 << 20; /* 32 MBytes */
static int fq_codel_change(struct Qdisc *sch, struct nlattr *opt,
struct netlink_ext_ack *extack)
...
if (tb[TCA_FQ_CODEL_MEMORY_LIMIT])
q->memory_limit = min(1U << 31, nla_get_u32(tb[TCA_FQ_CODEL_MEMORY_LIMIT]));
Note that not having zero as default value is problematic. In fields like
"NMPlatformIP4Route.table_coerced" and "NMPlatformRoutingRule.suppress_prefixlen_inverse"
we avoid this problem by storing a coerced value in the structure so that zero is still
the default. We don't do that here for memory-limit, so the caller must always explicitly
set the value.
In practice, there is no difference when representing 0 or 1 as signed/unsigned 32
bit integer. But still use the correct type that also kernel uses.
Also, the implicit conversation from uint32 to bool was correct already.
Still, explicitly convert the uint32 value to boolean in _new_from_nl_qdisc().
It's no change in behavior.
ethtool/mii API is based on the ifname. As an interface can be renamed,
this API is inherently racy. We would prefer to use the ifindex instead.
The ifindex of a device cannot change (altough it can repeat, which opens a
different race *sigh*).
Anyway, we were already trying to minimize the race be resolving the
name from ifindex immediately before the call to ethtool/mii.
Do better than that. Now resolve the name before and after the call. If
the name changed in the meantime, we have an indication that a race
might have happend (but we cannot be sure).
Note that this can not catch every possible kind of rename race. If you are very
unlucky a swapping of names cannot be detected.
For getters this is relatively straight forward. Just retry when we
have an indication to fall victim to a race (up to a few times). Yes, we
still cannot be 100% sure, but this should be very reliable in practice.
For setters (that modify the device) we also retry. We do so under the
assumption that setting the same options multiple times has no bad effect.
Note that for setters the race of swapping interface names is particularly
bad. If we hit a very unlucky race condition, we might set the setting on
the wrong interface and there is nothing we can do about it. The retry only
ensures that eventually we will set it on the right interface.
Note that this involves one more if_indextoname() call for each operation (in
the common case when there is no renaming race). In cases where we make
multiple ioctl calls, we cache and reuse the information though. So, for such
calls the overhead is even smaller.