NM_HASH_OBFUSCATE_PTR() is some snake-oil to not log raw pointer values.
It obviously makes debugging harder.
But we don't need to generate differently obfuscated pointer values.
At least, let most users use the same obfuscation, so that the values
are comparable.
(cherry picked from commit 3e6c8d220a)
Let's just always allocate the hash tables. We will likely need them,
and three hash tables are relatively cheap.
(cherry picked from commit 5b3e96451b)
Routes of type blackhole, unreachable, prohibit don't have an
ifindex/device. They are thus in many ways similar to routing rules,
as they are global. We need a mediator to keep track of which routes
to configure.
This will be very similar to what NMPRulesManager already does for
routing rules. Rename the API, so that it also can be used for routes.
Renaming the file will be done next, so that git's rename detection
doesn't get too confused.
(cherry picked from commit ea4f6d7994)
So far, certain NMObject types could not have an ifindex of zero. Hence,
nmp_lookup_init_object() took such an ifindex to mean: look up all objects
of that type.
Soon, we will support blackhole/unreachable/prohibit route types, which
have their ifindex set to zero. It is still useful to look up those route
types via nmp_lookup_init_object().
Change how the ifindex is interpreted. Note that this also
affects various callers of nmp_lookup_init_object(). If somebody was
relying on the previous behavior, it would need fixing.
(cherry picked from commit d4ad9666bd)
_vt_cmd_obj_is_alive_ipx_route() is called by nmp_object_is_alive().
Non-alive objects are not put into the cache.
That certainly makes sense for RTM_F_CLONED routes, because they are
generated ad-hoc during the `ip route get` request.
Checking for the ifindex is not necessary. For one, some route types
(blackhole, unreachable, prohibit) don't have an ifindex. Also, the
purpose of _vt_cmd_obj_is_alive_ipx_route() is not to validate the
object. Just don't create objects without an ifindex, if you think the
route needs an ifindex. Checking here is not useful.
We also don't check that other fields like rt_source are valid, so there
is no need to do it for the ifindex either.
(cherry picked from commit 1123d3a5fb)
Currently, NMPlatformIP[46]Route always has a gateway, even if it's
possibly set to 0.0.0.0/::. Not sure whether the kernel makes a further
distinction between no-gateway and all-zero gateway.
Anyway. For us, a gateway of 0.0.0.0/:: means the same as having no
gateway. We cannot differentiate the two (nor do we need to).
Don't print that in nm_platform_ip[46]_route_to_string().
Do this also because we are going to add blackhole route types, which cannot
have a next-hop. But we make this change for all route types, because
it makes sense in general (and it is also what `ip route show` prints).
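
A minimal sketch of the printing rule, assuming a simplified route struct.
DemoIPRoute and the demo_*() helpers are hypothetical and are not the real
NMPlatformIP[46]Route or nm_platform_ip[46]_route_to_string():

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified route. Addresses are in network byte order. */
typedef struct {
    int     addr_family; /* AF_INET or AF_INET6 */
    uint8_t network[16];
    uint8_t gateway[16]; /* all-zero (0.0.0.0 or ::) means "no gateway" */
    uint8_t plen;
} DemoIPRoute;

static int
demo_route_has_gateway(const DemoIPRoute *route)
{
    static const uint8_t zero[16] = {0};
    size_t               addr_len = (route->addr_family == AF_INET) ? 4 : 16;

    return memcmp(route->gateway, zero, addr_len) != 0;
}

/* Prints "<network>/<plen>" and appends " via <gateway>" only if a
 * gateway is set. */
static const char *
demo_route_to_string(const DemoIPRoute *route, char *buf, size_t len)
{
    char s_network[INET6_ADDRSTRLEN];
    char s_gateway[INET6_ADDRSTRLEN];
    int  n;

    assert(route->addr_family == AF_INET || route->addr_family == AF_INET6);

    inet_ntop(route->addr_family, route->network, s_network, sizeof(s_network));
    n = snprintf(buf, len, "%s/%u", s_network, (unsigned) route->plen);

    if (demo_route_has_gateway(route) && n > 0 && (size_t) n < len) {
        inet_ntop(route->addr_family, route->gateway, s_gateway, sizeof(s_gateway));
        snprintf(buf + n, len - (size_t) n, " via %s", s_gateway);
    }
    return buf;
}
```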
(cherry picked from commit b58711f20d)
The variable with this purpose is usually called "IS_IPv4".
It's upper case, because usually this is a const variable, and because
it reminds of the NM_IS_IPv4(addr_family) macro. That letter case
is unusual, but it makes sense to me for the special purpose that this
variable has.
Anyway. The naming of this variable is a different point. Let's
use the variable name that is consistent and widely used.
(cherry picked from commit 8085c0121f)
_nm_ip_route_attribute_validate_all() validates all attributes together.
As such, it calls nm_ip_route_attribute_validate(), which in turn
validates one attribute at a time.
Such full validation needs to check that (potentially conflicting)
attributes are valid together. Hence, _nm_ip_route_attribute_validate_all()
needs to peek into the attributes again.
Refactor the code, so that we can extract the pieces that we need and
do not need to parse them twice.
(cherry picked from commit 0413b1bf8a)
First of all, all of NMVariantAttributeSpec is internal API. We only
expose the typedef itself as public API, but not its fields nor
their meaning. So we can change things.
Change "str_type" to "type_detail", so that it can work for any kind of
attribute, not only for strings. Usually, we want to avoid special
cases and treat all attributes the same, based on their GVariant type.
But sometimes, it is necessary to do something special with an
attribute. This is what the "type_detail" encodes, but it's not only
relevant for strings.
(cherry picked from commit 6f277d8fa6)
Usually the normalization (canonicalization) and the validation of the IP
address string both require parsing the string. As we always do
validation first, we can reuse the parsed address and don't need to parse
it a second time.
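
A minimal sketch of the idea, using plain inet_pton()/inet_ntop() rather
than the actual libnm code. The demo_*() helpers are hypothetical; the
point is only that the address parsed during validation is reused to
produce the canonical string:

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <string.h>

/* Validates @str and, on success, writes the canonical form to @out.
 * The string is parsed exactly once. */
static bool
demo_ip_validate_and_normalize(int addr_family, const char *str, char *out, size_t out_len)
{
    unsigned char addr[sizeof(struct in6_addr)];

    assert(str != NULL && out != NULL);

    if (inet_pton(addr_family, str, addr) != 1)
        return false; /* validation failed, nothing to normalize */

    /* Reuse the parsed binary address instead of parsing again. */
    return inet_ntop(addr_family, addr, out, (socklen_t) out_len) != NULL;
}

/* Returns true if @str is valid and already in canonical form. */
static bool
demo_ip_is_normalized(int addr_family, const char *str)
{
    char canonical[INET6_ADDRSTRLEN];

    return demo_ip_validate_and_normalize(addr_family, str, canonical, sizeof(canonical))
           && strcmp(str, canonical) == 0;
}
```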
(cherry picked from commit 00e4f21629)
Order the fields by their size, to minimize the alignment gaps.
I guess that doesn't matter, because the alignment of the heap
allocation is larger than what we can save here. Still, there is
no reason to do it any other way.
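
To illustrate the effect of field ordering, here is a small sketch with two
hypothetical structs (not the struct this commit touches). Both hold the
same fields; only the order differs. The exact sizes depend on the ABI, so
the sketch only checks that the ordered layout is not larger:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Small fields interleaved with larger ones: the compiler has to insert
 * padding before next_hop and before refcount. */
typedef struct {
    uint8_t  family;
    void    *next_hop;
    uint8_t  prefix;
    uint32_t refcount;
} DemoUnordered;

/* Same fields, ordered from largest to smallest alignment: the two 8 bit
 * fields share what would otherwise be padding. */
typedef struct {
    void    *next_hop;
    uint32_t refcount;
    uint8_t  family;
    uint8_t  prefix;
} DemoOrdered;

static_assert(sizeof(DemoOrdered) <= sizeof(DemoUnordered),
              "ordering fields by size must not increase the struct size");

/* Number of bytes saved by the ordered layout on this ABI. */
static size_t
demo_bytes_saved(void)
{
    return sizeof(DemoUnordered) - sizeof(DemoOrdered);
}
```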
Also, it's not possible via API to set family/prefix to values outside
their range, so an 8bit integer is always sufficient. And we don't want
that invariant to change. We don't ever want to allow the caller to set
values that are clearly invalid, and will assert against that early (g_return()).
Point is, we can do this and there is no danger of future problems.
And even if we will support larger values, it's all an implementation
detail anyway.
(cherry picked from commit 6208a1bb84)
`git bisect run` is peculiar about the exit code:
error: bisect run failed: exit code 134 from '...' is < 0 or >= 128
If we just "exec" the test, it usually will fail on an assert. That results
in SIGABRT or exit code 134. So out of the box that is annoying with
git-bisect.
Work around that and let the test wrapper always coerce any test failure
to exit code 1.
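
The coercion itself can be sketched in C as follows. This is not the actual
test wrapper, only an illustration of the same idea; the function name
demo_run_coerce_exit_code() is hypothetical:

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs argv[0] with the given arguments and returns 0 on success.
 * Every kind of failure (non-zero exit code, death by signal such as
 * SIGABRT, failure to exec) is reported as 1. */
static int
demo_run_coerce_exit_code(char *const argv[])
{
    pid_t pid;
    int   status;

    assert(argv != NULL && argv[0] != NULL);

    pid = fork();
    if (pid < 0)
        return 1;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127); /* exec failed */
    }

    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return 1;
    }

    if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
        return 0;
    return 1;
}
```

A main() that returns demo_run_coerce_exit_code(&argv[1]) would then be
usable as the command for `git bisect run`.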
(cherry picked from commit f65747f6e9)
We made the choice that NMPlatformIPRoute does not contain the actual
route table; instead it contains a "remapped" number: table_coerced.
That remapping is done so that the default (which we want semantically to
be 254, RT_TABLE_MAIN) is numerical zero, so that struct initialization
doesn't require you to explicitly set the default.
Hence, we must always distinguish whether we have the real table number
or the "table_coerced", and you must convert back and forth between the
two.
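
One possible remapping with these properties is to swap 0 and 254 and leave
all other numbers alone. The following sketch assumes exactly that; the
demo_*() names are hypothetical and the real helpers may differ in detail:

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_RT_TABLE_MAIN 254u

/* Real table number -> table_coerced. The main table (254) becomes 0, so
 * that a zero-initialized struct refers to the main table. */
static uint32_t
demo_route_table_coerce(uint32_t table)
{
    switch (table) {
    case 0:
        return DEMO_RT_TABLE_MAIN;
    case DEMO_RT_TABLE_MAIN:
        return 0;
    default:
        return table;
    }
}

/* table_coerced -> real table number. Swapping two values is its own
 * inverse, so the same mapping converts back. */
static uint32_t
demo_route_table_uncoerce(uint32_t table_coerced)
{
    uint32_t table = demo_route_table_coerce(table_coerced);

    assert(demo_route_table_coerce(table) == table_coerced);
    return table;
}
```

With such a mapping, storing a real table number of 254 without coercing it
first would be read back as a different table, which is the kind of mix-up
this commit is about.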
Now, the parameters of nm_l3_config_data_merge() are real table numbers
(as also indicated by their names not containing the term "coerced"). So
usually they are actually set to 254.
When we set the field of NMPlatformIPRoute, we must coerce it. This was
wrong, and we would see wrong table numbers in the log:
l3cfg[17b98e59a477b0f4,ifindex=2]: obj-state: track: [2a32eca99405767e, ip4-route, type unicast table 0 0.0.0.0/0 via ...
Fixes: b4aa35e72d ('l3cfg: extend nm_l3cfg_add_config() to accept default route table and metric')
(cherry picked from commit e23ebe9183)
In systemd, it's common that a D-Bus activatable service references
`SystemdService=dbus-$BUSNAME.service` instead of the real service name.
Together with an `[Install].Alias=dbus-$BUSNAME.service` directive,
this allows enabling/disabling D-Bus activation without uninstalling the
service altogether ([1]).
Currently, when we install the RPM then `nm-priv-helper.service` is not
enabled, consequently the alias is not created, and D-Bus activation
does not work. I guess, we should fix that by enabling the service in
the %post section or via a systemd preset? Dunno.
Anyway. It seems that nm-priv-helper.service is more of an
implementation detail of NetworkManager. It makes no sense for the user
to interact with it directly, or to enable/disable D-Bus activation (because
that is how it works).
So, drop the alias.
See-also: [1] https://docs.fedoraproject.org/en-US/packaging-guidelines/Systemd/#activation_dbus
(cherry picked from commit d849807521)
"-" is not allowed as D-Bus path and interface name, and discouraged as
bus name. This causes nm-priv-helper to crash, because GDBus asserts that
the object path is valid.
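
The restriction can be sketched as follows. Per the D-Bus specification, an
object path element may only contain the characters [A-Za-z0-9_]. Both
helpers are hypothetical and only illustrate the rule and a mapping of '-'
to '_'; they are not part of GDBus or NetworkManager:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Checks one element of a D-Bus object path (the part between two '/'). */
static bool
demo_dbus_path_element_is_valid(const char *element)
{
    static const char allowed[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
                                  "abcdefghijklmnopqrstuvwxyz"
                                  "0123456789_";

    assert(element != NULL);
    return element[0] != '\0' && strspn(element, allowed) == strlen(element);
}

/* Replaces every '-' with '_' in place and returns @name. */
static char *
demo_dbus_name_sanitize(char *name)
{
    char *p;

    assert(name != NULL);
    for (p = name; (p = strchr(p, '-')) != NULL; p++)
        *p = '_';
    return name;
}
```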
Replace the '-' with '_'. This way, it's consistent with
"nm_dispatcher".
Fixes: d68ab6b8f0 ('nm-sudo: rename to nm-priv-helper')
(cherry picked from commit 16a45d07ed)
gcc-4.8.5-44.el7.x86_64 warns:
In file included from ./src/libnm-systemd-shared/src/basic/hashmap.h:10:0,
from ./src/libnm-systemd-shared/src/shared/dns-domain.h:10,
from src/libnm-systemd-shared/nm-sd-utils-shared.c:12:
./src/libnm-systemd-shared/src/basic/util.h: In function 'log2u64':
./src/libnm-systemd-shared/src/basic/util.h:30:20: error: first argument to '__builtin_choose_expr' not a constant
#define LOG2ULL(x) __builtin_choose_expr(__builtin_constant_p(x), CONST_LOG2ULL(x), NONCONST_LOG2ULL(x))
^
./src/libnm-systemd-shared/src/basic/util.h:34:16: note: in expansion of macro 'LOG2ULL'
return LOG2ULL(x);
^
./src/libnm-systemd-shared/src/basic/util.h: In function 'log2i':
./src/libnm-systemd-shared/src/basic/util.h:53:18: error: first argument to '__builtin_choose_expr' not a constant
#define LOG2U(x) __builtin_choose_expr(__builtin_constant_p(x), CONST_LOG2U(x), NONCONST_LOG2U(x))
^
./src/libnm-systemd-shared/src/basic/util.h:56:16: note: in expansion of macro 'LOG2U'
return LOG2U(x);
^
./src/libnm-systemd-shared/src/basic/util.h: In function 'log2u':
./src/libnm-systemd-shared/src/basic/util.h:53:18: error: first argument to '__builtin_choose_expr' not a constant
#define LOG2U(x) __builtin_choose_expr(__builtin_constant_p(x), CONST_LOG2U(x), NONCONST_LOG2U(x))
^
./src/libnm-systemd-shared/src/basic/util.h:60:16: note: in expansion of macro 'LOG2U'
return LOG2U(x);
^
The metered status can depend on the DHCP lease, as we accept the
ANDROID_METERED vendor option. That means, on a DHCP update we need
to re-evaluate the metered flag.
This fixes a potential race, where IPv6 might succeed first and
activation completes (with GUESS_NO metered flag). A subsequent
DHCPv4 update requires re-evaluating that decision.
Fixes-test: @connection_metered_guess_yes
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1080
We always call nl_recv() in a loop. If it would be necessary to clear
the variable, then it would need to happen inside the loop. But it's
not necessary.
Instead of allocating a receive buffer for each nl_recv() call, re-use a
pre-allocated buffer.
The buffer is part of NMPlatform and kept around. As we would not have more
than one NMPlatform instance per netns, we waste a limited amount of
memory.
The buffer gets initialized with 32k, which should be large enough for
any rtnetlink message that we might receive. As before, if the buffer
would be too small, then the first large message would be lost (as we don't
peek). But then we would allocate a larger buffer, resync the platform cache
and recover.
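
A minimal sketch of this pattern, assuming Linux and a datagram socket.
DemoRecvBuf and the demo_*() functions are hypothetical, and the sketch uses
recv() with MSG_TRUNC to learn the real message size, which is not
necessarily how nl_recv() detects truncation:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Long-lived receive buffer, kept around between calls. */
typedef struct {
    unsigned char *data;
    size_t         size;
} DemoRecvBuf;

static void
demo_recv_buf_init(DemoRecvBuf *rb, size_t size)
{
    rb->data = malloc(size);
    assert(rb->data != NULL);
    rb->size = size;
}

/* Receives one datagram into the reused buffer. Returns the real size of
 * the datagram, or -1 on error. If the datagram did not fit, it is lost:
 * *out_lost is set and the buffer is grown, so that the next message of
 * that size fits. The caller is expected to resync in that case. */
static ssize_t
demo_recv_reuse(int fd, DemoRecvBuf *rb, bool *out_lost)
{
    ssize_t n;

    *out_lost = false;

    /* With MSG_TRUNC, Linux returns the real datagram length even if it
     * is larger than the buffer. */
    n = recv(fd, rb->data, rb->size, MSG_TRUNC);
    if (n < 0)
        return -1;

    if ((size_t) n > rb->size) {
        unsigned char *data = realloc(rb->data, (size_t) n);

        assert(data != NULL);
        rb->data  = data;
        rb->size  = (size_t) n;
        *out_lost = true;
    }
    return n;
}
```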
Add parameter to accept a pre-allocated buffer for nl_recv(). In
practice, rtnetlink messages are not larger than 32k, so we can always
pre-allocate it and avoid the need to heap allocate it.
In the past, nl_recv() was libnl3 code and thus used malloc()/realloc() to
allocate the buffer. That meant, we should free the buffer with libc's free()
function instead of glib's g_free(). That is what nm_auto_free is for.
Nowadays, nl_recv() is forked and glib-ified, and uses the glib wrappers to
allocate the buffer. Thus the buffer should also be freed with g_free()
(and gs_free).
In practice there is no difference, because glib's allocation directly
uses libc's malloc/free. This is purely a matter of style.
When the NM_UNMANAGED_PLATFORM_INIT flag is cleared last in
device_link_changed(), a recheck-assume is scheduled and then the
device goes immediately to UNAVAILABLE. During the state transition,
addresses and routes are removed from the interface. Then,
recheck-assume finds that the device can be assumed but it's too late
since the device was already deconfigured.
This is a problem as the whole point of assuming a device is to
activate a connection while leaving the device untouched.
In the NMCI "dracut_NM_vlan_over_bridge and dracut_NM_vlan_over_bond"
test, NM in real root tries to assume a vlan device that was activated
in initrd. When the interface gets deconfigured in UNAVAILABLE, the
connection to the NFS server breaks and the rootfs becomes
inaccessible.
The fix to this problem is to delay state transitions in
device_link_changed() to an idle handler, so that recheck-assume can
run first.
Fixes-test: @dracut_NM_vlan_over_bridge
Fixes-test: @dracut_NM_vlan_over_bond
https://bugzilla.redhat.com/show_bug.cgi?id=2047302
nm_device_set_unmanaged_by_user_settings() does nothing when the
device is unmanaged by platform-init. Remove the if branch to make
this more explicit.
The ACD state handling is unfortunately very complicated. That is because
we obviously need to track state about how ACD is going (the acd_data, and
in particular NML3AcdAddrState). Then there are various things that can
happen, which are the AcdStateChangeMode enums. All these state-changes
come together in one function: _l3_acd_data_state_change(), which is
therefore complicated (I don't think that it would become simpler by
spreading this code out to different functions, on the contrary).
Anyway.
So, what happens when we need to reset the n-acd instance? For example,
because the MAC address of the link changed or because of some error. I guess we
need to restart probing.
Previously, I think this was not handled properly. We already tried to
fix this several times, the last was commit b331606386 ('l3cfg: on
n-acd instance-reset clear also ready ACD state'). There is still an
issue ([1]).
The bug [1] is that we are in state NM_L3_ACD_ADDR_STATE_READY during
an ACD_STATE_CHANGE_MODE_TIMEOUT event. That leads to an assertion
failure.
#5 0x00007f23be74698f in g_assertion_message_expr (domain=0x5629aca70359 "nm", file=0x5629aca62aab "src/core/nm-l3cfg.c", line=2395, func=0x5629acb26b30 <__func__.72.lto_priv.4> "_l3_acd_data_state_change", expr=<optimized out>) at ../glib/gtestutils.c:3091
#6 0x00005629ac937e46 in _l3_acd_data_state_change (self=0x5629add69790, acd_data=0x5629add8d520, state_change_mode=ACD_STATE_CHANGE_MODE_TIMEOUT, sender_addr=0x0, p_now_msec=0x7ffded506460) at src/core/nm-l3cfg.c:2395
#7 0x00005629ac939f4d in _l3_acd_data_timeout_cb (user_data=user_data@entry=0x5629add8d520) at src/core/nm-l3cfg.c:1933
#8 0x00007f23be71c5a1 in g_timeout_dispatch (source=0x5629addd7a80, callback=0x5629ac939ee0 <_l3_acd_data_timeout_cb>, user_data=0x5629add8d520) at ../glib/gmain.c:4889
#9 0x00007f23be71bd4f in g_main_dispatch (context=0x5629adc6da00) at ../glib/gmain.c:3337
#10 g_main_context_dispatch (context=0x5629adc6da00) at ../glib/gmain.c:4055
That can only happen (I think) when we scheduled the timeout
during an earlier ACD_STATE_CHANGE_MODE_INSTANCE_RESET event. Meaning,
we need to handle instance-reset better.
Instead, during instance-reset, always switch back to state PROBING, and
let the timeout figure it out.
[1] https://bugzilla.redhat.com/show_bug.cgi?id=2047788
The first point is that the ACD timeout is strongly tied to the current state. That
is (somewhat) obvious, because _l3_acd_data_state_set_full() will clear any pending
timeout. So you can only schedule a timeout *after* setting the state,
and setting the next state will clear the timeout.
Likewise, note that _l3_acd_data_state_change() for the event
ACD_STATE_CHANGE_MODE_TIMEOUT asserts that it is only called in the few
states where that is expected. See rhbz#2047788 where that assertion
gets hit.
The first point means that we must only schedule a timer when we are
also in a state that supports that. Add an assertion for that at the
point when scheduling the timeout. The assert at this point is useful,
because it catches the moment when we do the wrong thing (instead of
getting the assertion later during the timeout, when we no longer know
where the error happened).
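
The invariant can be sketched with a much simplified state machine. The
state names, the struct and the choice of which state supports a timeout
are all hypothetical here and do not match the real NML3AcdAddrState
handling:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced set of states. */
typedef enum {
    DEMO_ACD_STATE_INIT,
    DEMO_ACD_STATE_PROBING,
    DEMO_ACD_STATE_READY,
} DemoAcdState;

typedef struct {
    DemoAcdState state;
    bool         timeout_pending;
} DemoAcdData;

/* Assumption of this sketch: only PROBING expects a timeout event. */
static bool
demo_state_supports_timeout(DemoAcdState state)
{
    return state == DEMO_ACD_STATE_PROBING;
}

/* Setting a state always clears any pending timeout. */
static void
demo_state_set(DemoAcdData *acd_data, DemoAcdState state)
{
    acd_data->state           = state;
    acd_data->timeout_pending = false;
}

/* Assert at the point of scheduling: a wrong caller is caught here, not
 * later when the timeout fires and the origin is no longer known. */
static void
demo_timeout_schedule(DemoAcdData *acd_data)
{
    assert(demo_state_supports_timeout(acd_data->state));
    acd_data->timeout_pending = true;
}
```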
See-also: https://bugzilla.redhat.com/show_bug.cgi?id=2047788