When NMClient gets destroyed, it unrefs all NMObject. We need to unbreak
cycles then, and the property getters must return NULL. In particular,
for "o" type properties (NMLDBusPropertyO), this was not done correctly.
For example, calling nm_device_get_active_connection() while/after
destroying the NMClient can give a dangling pointer and assertion
failure. This will also be covered by test_activate_virtual(). Probably
a similar issue can happen, when a D-Bus object gets removed (without
destroying NMClient altogether).
The fix is that nml_dbus_property_o_clear() needs to clear "nmobj". That
is correct, because the pointer is no longer valid and should not be there.
And the unit test shows that in fact a pointer is left there, and
clearing it fixes it.
That was different from an earlier attempt to fix this (in commit 62b2aa85e8
('Revert "libnm: fix dangling pointer in public API while destructing NMClient"')),
where clearing the pointer at a different place broke things. That
attempt was wrong, because nml_dbus_property_o_notify_changed() needs to be the
one that sets/clears nmobj field during a regular update. But the case
here is not a regular update, nml_dbus_property_o_clear() happens during
unregister/cleanup, and then we need to clear the pointer.
Fixes: ce0e898fb4 ('libnm: refactor caching of D-Bus objects in NMClient')
https://bugzilla.redhat.com/show_bug.cgi?id=2039331https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/issues/896
See-also: https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1064https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1075
For IPv6 the lease doesn't necessarily have an address. If the address
is missing or the DHCP client doesn't implement accept(), we don't
need to wait for the address in platform.
From https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1066#note_1233210 :
0 0x00007ffff760f88c in __pthread_kill_implementation () at /lib64/libc.so.6
1 0x00007ffff75c26a6 in raise () at /lib64/libc.so.6
2 0x00007ffff75ac7d3 in abort () at /lib64/libc.so.6
3 0x00007ffff77c5d4c in g_assertion_message (domain=<optimized out>, file=<optimized out>, line=<optimized out>, func=<optimized out>, message=<optimized out>)
at ../glib/gtestutils.c:3223
4 0x00007ffff782645f in g_assertion_message_expr
(domain=domain@entry=0x5555559e7c96 "nm", file=file@entry=0x5555559deac0 "src/core/dhcp/nm-dhcp-client.c", line=line@entry=609, func=func@entry=0x5555559e0090 <__func__.31> "l3_cfg_notify_cb", expr=expr@entry=0x5555559df5cf "lease_address") at ../glib/gtestutils.c:3249
5 0x00005555558b2866 in l3_cfg_notify_cb (l3cfg=0x555555c29790, notify_data=<optimized out>, self=0x555555e9a1b0) at src/core/dhcp/nm-dhcp-client.c:609
9 0x00007ffff791abe3 in <emit signal ??? on instance ???> (instance=instance@entry=0x555555c29790, signal_id=<optimized out>, detail=detail@entry=0) at ../gobject/gsignal.c:3553
6 0x00007ffff78fcc7f in g_closure_invoke (closure=0x555555ca3900, return_value=0x0, n_param_values=2, param_values=0x7fffffffd420, invocation_hint=0x7fffffffd3a0)
at ../gobject/gclosure.c:830
7 0x00007ffff7919106 in signal_emit_unlocked_R
(node=node@entry=0x555555bbadc0, detail=detail@entry=0, instance=instance@entry=0x555555c29790, emission_return=emission_return@entry=0x0, instance_and_params=instance_and_params@entry=0x7fffffffd420) at ../gobject/gsignal.c:3742
8 0x00007ffff791a9ca in g_signal_emit_valist (instance=<optimized out>, signal_id=<optimized out>, detail=<optimized out>, var_args=var_args@entry=0x7fffffffd5f0)
at ../gobject/gsignal.c:3497
10 0x000055555564c8a6 in _nm_l3cfg_emit_signal_notify (self=self@entry=0x555555c29790, notify_data=notify_data@entry=0x7fffffffdf30) at src/core/nm-l3cfg.c:576
11 0x000055555564ce77 in _nm_l3cfg_emit_signal_notify_simple (self=self@entry=0x555555c29790, notify_type=notify_type@entry=NM_L3_CONFIG_NOTIFY_TYPE_POST_COMMIT)
at src/core/nm-l3cfg.c:585
12 0x0000555555656082 in _l3_commit (self=self@entry=0x555555c29790, commit_type=NM_L3_CFG_COMMIT_TYPE_UPDATE, commit_type@entry=NM_L3_CFG_COMMIT_TYPE_AUTO, is_idle=is_idle@entry=1)
at src/core/nm-l3cfg.c:4201
13 0x0000555555656189 in _l3_commit_on_idle_cb (user_data=user_data@entry=0x555555c29790) at src/core/nm-l3cfg.c:2961
14 0x00007ffff77f847b in g_idle_dispatch (source=0x555555d65680, callback=0x55555565612c <_l3_commit_on_idle_cb>, user_data=0x555555c29790) at ../glib/gmain.c:5897
15 0x00007ffff77fc130 in g_main_dispatch (context=0x555555aa5020) at ../glib/gmain.c:3381
16 g_main_context_dispatch (context=0x555555aa5020) at ../glib/gmain.c:4099
17 0x00007ffff7851208 in g_main_context_iterate.constprop.0 (context=0x555555aa5020, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>) at ../glib/gmain.c:4175
18 0x00007ffff77fb853 in g_main_loop_run (loop=0x555555aa5970) at ../glib/gmain.c:4373
19 0x0000555555593c56 in main (argc=<optimized out>, argv=<optimized out>) at src/core/main.c:509
Fixes: e1648d0665 ('core: commit l3cd asynchronously on DHCP bound event')
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1074
Update DNS only when something relevant changes:
- an old l3cd gets removed, without adding a new one
- a new one is added without removing an old one
- an old is removed and it differs (in routes and DNS) from the new
added one
NMPolicy stores the last hostname set in priv->cur_hostname and checks
if it changed before notifying the DNS manager.
_set_hostname() should be called one time early during startup so that
the priv->cur_hostname gets initialized without triggering a DNS
update.
I see a significant performance increase with many parallel
activations if DNS updates are deferred to the SECONDARIES state.
Since there is no guarantee that device_l3cd_changed() is called again
when the device becomes ACTIVATED, we need also to change
device_state_changed().
When a lease is obtained, currently NMDevice performs a synchronous
commit of IP configuration and then accepts the lease.
Instead, let NMDevice only schedule a async commit; when the DHCP
client notices that the new address was committed it will
automatically accept it and emit a new signal so that the device can
succeed the activation.
Sync commits should be avoided because a commit does of things which
are outside the control of the caller (see the comment in
nm_device_l3cfg_commit()). Furthermore, when there are many pending
activations, async commits seem to help in reducing the CPU usage.
While making the commit async, also move the responsibility of the
accept() to NMDhcpClient.
This breaks test @nmcli_monitor. With this patch, `nmcli monitor` no
longer prints "There's no primary connection". Need to investigate why.
For now, revert the patch.
This reverts commit 2afecaf908.
Currently, ip4 new config signal is being emitted twice, due
to the lack of a barrier to a possible situation. This
commit includes this.
This fixes a crash on NMCI tests due to an assertion failure,
now it will not go ahead in the function anymore if it is under the
undesired conditions.
The backtrace of the crashes can be found at
https://bugzilla.redhat.com/show_bug.cgi?id=2028385
While (and after) NMClient gets destroyed, nm_device_get_active_connection()
gives a dangling pointer. That can lead to a crash. This probably
affects all NMLDBusPropertyO type properties.
It's not clear how to fix that best. Usually, NMClient does updates in
two phases, first it processes the D-Bus events and tracks internal
data, then it emits all GObject signals and notifications.
When an object gets removed from the NMClient cache, then the second
phase is not fully processed, because the object is already removed
from the cache. Thus, the property was not properly cleared leaving
a dangling pointer.
A simple fix is to always clear the pointer during the first phase. Note that
effectively we do the same also for NMLDBusPropertyAO (by clearing the
"pr_ao->arr"), so at least this is consistent.
Somehow it seems that we should make sure that the "second" phase gets
full processed in this case too. But it's complicated, and it's not
clear how to do that. So this solution seems fine.
https://bugzilla.redhat.com/show_bug.cgi?id=2039331https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/issues/896
When destroying NMClient, nm_device_get_active_connection() still
return dangling pointers. Add a unit test for that bug.
Obviously, the bug currently exists, so the relevant code is commented
out.
In function 'nm_uuid_unparse',
inlined from 'nm_uuid_generate_from_string_str' at src/libnm-glib-aux/nm-uuid.c:393:12,
inlined from 'nm_uuid_generate_from_strings.constprop' at src/libnm-glib-aux/nm-uuid.c:430:16:
src/libnm-glib-aux/nm-uuid.h:37:12: error: 'uuid' may be used uninitialized [-Werror=maybe-uninitialized]
37 | return nm_uuid_unparse_case(uuid, out_str, FALSE);
| ^
src/libnm-glib-aux/nm-uuid.c: In function 'nm_uuid_generate_from_strings.constprop':
src/libnm-glib-aux/nm-uuid.c:20:1: note: by argument 1 of type 'const struct NMUuid *' to 'nm_uuid_unparse_case.constprop' declared here
20 | nm_uuid_unparse_case(const NMUuid *uuid, char out_str[static 37], gboolean upper_case)
| ^
src/libnm-glib-aux/nm-uuid.c:390:12: note: 'uuid' declared here
390 | NMUuid uuid;
| ^
lto1: all warnings being treated as errors
The problem are code paths with failed g_return*() assertions. Being in
a bad state already, they don't bother to ensure proper return values,
and with LTO the compiler might think there are valid code paths wrongly
handled. Work around.
If the connection has wfd_ies set, we want to ensure that a WFD
connection is being established so we want to check that the target peer
also supports WFD. For now this in the IWD backend only.
However, WFD peers will send us their WFD IEs in their Probe Responses
only if our own Probe Request contained a WFD IE, or at least that's
when the spec mandates that they include the WFD IE. This implies that
the discovery phase (NM.Device.WifiP2P.StartFind(options) / .StopFind())
needs to know the value of the local WFD IE and include it in the Probe
Requests, which existing clients (gnome-network-displays) doesn't do and
in fact can't do because there's no API for the client to pass the WFD
IE -- that is easy to add in the options parameter though (a
dictionary).
The current situation is that with the wpa_supplicant backend we'll
sometimes see the WFD IE (when the peer is discovered from a Probe
Request that it has sent, rather than from a Probe Response) and
sometimes not, while with the IWD backend we'll never see the WFD
information because IWD assumes that the client (NM) is not interested
in seeing it if it hasn't registered the local WFD information before
starting discovery.
So, for compatibility with this existing situation and with the
wpa_supplicant backend, ignore whether the peer is a WFD peer except in
the PREPARE stage. In PREPARE, if we have a peer compatible with the
connection being activated -- except for the WFD information -- force a
new discovery, this time passing the WFD information from the
NMSettingConnection's wfd_ies, and only progress to CONFIG if the target
peer is re-discovered as a WFD peer within 10 seconds.
We don't actually check the contents of the WFD IEs to match, e.g. we
don't check the sink/source/dual role compatibility between our and the
peer's properties, but IWD will do some of these checks later during
activation.
Similarly as with wpa_supplicant, create NMDeviceIwdP2P objects for P2P
devices based on data from IWD -- not in NMWifiFactory::create_device
because that is only triggered for system netdevs and a P2P-Device
virtual interface has no netdev. Unlike NMDeviceWifiP2P,
NMDeviceIwdP2P's iface property is a unique string that likely doesn't
match any system interface name -- in theory there doesn't need to be
any related netdev on the system (such as an Infrastructure-mode
interface) before a P2P-Device is added.
[thaller@redhat.com: modified original patch]
Since the NM D-Bus API talks to the client using raw WFD IE bytes and
IWD's API wants/provides some of the actual values from the WFD IE
instead (e.g. boolean Source and Sink properties), we need to be able to
parse and build the WFD IE from those property values to do the
translation, add utilities for this. Use one of them to build the WFD IE
property on NMWifiP2PPeer objects.
[thaller@redhat.com: modify original patch to use
nm_g_variant_unref() and add missing #define]
In NMWifiFactory, move the check that prevents NMDevice's from being
created for interfaces in Monitor and other modes, out of the
wpa_supplicant-specific block so that it applies to both IWD and
wpa_supplicant. This check allows P2P device creation to work
differently that Infrastructure, Ad-Hoc and AP modes so it's needed for
P2P support in the IWD backend.
While there change the check to list the modes that are accepted rather
than rely on some of the modes (Monitor, P2P-{Device,Client,GO}) being
bunched together as _NM_802_11_MODE_UNKNOWN.
The interface name has changed in 2019 but the WSC interface has
never been used by NM. Update #define NM_IWD_WSC_INTERFACE in
nm-iwd-manager.h accordingly.
In function '_nm_auto_g_free',
inlined from 'test_tc_config_tfilter_matchall_mirred' at src/libnm-core-impl/tests/test-setting.c:2955:24:
./src/libnm-glib-aux/nm-macros-internal.h:58:1: error: 'str' may be used uninitialized [-Werror=maybe-uninitialized]
58 | NM_AUTO_DEFINE_FCN_VOID0(void *, _nm_auto_g_free, g_free);
| ^
src/libnm-core-impl/tests/test-setting.c: In function 'test_tc_config_tfilter_matchall_mirred':
src/libnm-core-impl/tests/test-setting.c:2955:24: note: 'str' was declared here
2955 | gs_free char *str;
| ^
lto1: all warnings being treated as errors
lto-wrapper: fatal error: gcc returned 1 exit status
We need to support systems where there are hundereds of thousand routes
(e.g. BGP software).
For that, we will ignore routes that have a rtm_protocol value which was
certainly not created by NetworkManager. Before we had a BPF filer to
filter them out, but that had issues and is removed for now. Still,
don't put those routes in the platform cache and ignore them early.
Note that we still deserialize the RTM_NEWROUTE message to a NMPObject,
and don't shortcut earlier. The reason is that we should still call
delayed_action_refresh_all_in_progress() and handle the RTM_GETROUTE
response, even if the route has a different protocol. So we error out
later, shortly before putting the object in the cache. This means we
will malloc() a NMPObject and initialize it, but that is probably cheap,
compared to the first problem that the process already had to wake up
and read the netlink socket.
This restores the effective behavior of the BPF filter, albeit with a
higher overhead, as the route is rejected later. The important part for
now is that we stick to the behavior of not caching certain routes of a
certain protocol. If that can in the future be optimized (e.g. by a new
BPF filter), then we should do that. But for now that would be only a
future performance improvement, which requires new profiling first and
that has not the highest priority. Note that not caching certain routes
already should reduce the largest part of the overhead that those routes
brings. Whether this form is sufficient to reach the expected
performance goals, needs to be measured in the future.
The socket BPF filter operates on the entire message (skb). One netlink
message (for RTM_NEWROUTE) usually contains a list of routes (struct
nlmsghdr), but the BPF filter was only looking at the first route to decide
whether to throw away the entire message. That is wrong.
This causes that we miss routes. It also means that the response to our
RTM_GETROUTE dump request could be filtered and we poll/wait (unsuccessfully)
for it:
<trace> [1641829930.4111] platform-linux: netlink: read: wait for ACK for sequence number 26...
To get this right, the BPF filter would have to look at all routes in the
list to find out whether there are any we want to receive (and if
there are any, to pass the entire message, regardless that is might also
contain routes we want to ignore). As BPF does not support loops, that
is not trivial. Arguably, we still could do something where we look at a
bounded, unrolled number of routes, in the hope that there are not more
routes in one message that we support. The problem is that this BPF
filter is supposed to filter out massive amounts of routes, so most
messages will be filled to the brim with routes and we would have to
expect a high number of routes in the RTM_NEWROUTE that need checking.
We could still ignore routes in _new_from_nl_route(), to enter the
cache. That means, we would catch them later than with the BPF filter,
but still before a lot of the overhead of handling them. That probably
will be done next, but for now revert the commit.
This reverts commit e9ca5583e5.
https://bugzilla.redhat.com/show_bug.cgi?id=2037411
NMConnection is an interface, implemented by NMSimpleConnection
and NMRemoteConnection. A connection is basically a set of NMSetting
instances.
Usually you would expect that one NMSetting instance only gets added to
zero or one NMConnection. It seems a bit ugly, to have one setting tracked by
multiple NMConnection. Still, technically I am not aware of a single problem
with doing that, where it not for NMSimpleConnection:dispose() to clear the
secrets.
There is no need to clear the secrets of an NMSetting, when the
NMConnection gets destroyed. Either this destroys the NMSetting instance
right away (and NMSetting's destructor will clear the secrets anyway), or
somebody else (e.g. another NMConnection instance), keeps the setting
alive. In the latter case, it is wrong to clear the secrets at
this point.
This was done since commit 6a19e68a7d ('libnm-core: clear secrets from
NMSimpleConnection and NMSettingsConnection dispose()'), but let's stop
doing that.
This also causes problems in practice, see [1].
[1] https://gitlab.gnome.org/GNOME/gnome-control-center/-/merge_requests/1099#note_1334333https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/issues/876https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1056
The n-acd instance gets reset in various cases, for example
when the MAC address changes or during errors.
When that happens, we also need to forget all our pending probes,
because they would reference to a now-defunct n-acd instance.
When the address is currently in state NM_L3_ACD_ADDR_STATE_READY,
we possibly have a pending probe. We need to clean that up. Handle
it the same as in the other cases. Yes, this means we forget about
that the address was ready. But a reset of the n-acd state is a dramatic
event. It warrants some drastic start-over.
See-also: https://bugzilla.redhat.com/show_bug.cgi?id=2026288
See-also: https://bugzilla.redhat.com/show_bug.cgi?id=2028422
See-also: https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1035
Fixes: 9a76b07f74 ('l3cfg: fix assertion failure')
While NetworkManager is of course (mostly) single threaded,
our static functions really should be thread safe.
"mostly" single threaded because we have GDBus's worker
thread, we use a thread for writing non-blocking SR-IOV sysctl,
in the past (or still?) we used a thread for async glibc resolver.
Anyway, a low-level function like must be thread safe, when it
uses global data.
Granted, the initialize-once pattern with the flag and the
int variable, is probably in many cases good enough. But it
makes me unhappy, the thought of accessing static data without
a synchronized operation.
This partly restores the previous behavior. The point of the
file owner check is to ensure that the file cannot be read
by unpriviledged processes as it may contain secrets. If the
file is owned by root, that is considered secure (even if our
euid is different).
Possibly, if our euid is not root, then we couldn't read the
file, but that is a different problem.
There is a hierarchy of how files include each other. "main-utils.h"
is pretty much at the bottom (one above "main.c"), in the sense that
it only includes other headers, but is not included itself (aside
"main.c").
Move the utils function to a place where its accessible from everywhere
and rename.
All callers of _nm_setting_get_private() got the offset from the
property info. Add a wrapper _nm_setting_get_private_field() that
takes the property info. This way, it can add some assertions.
Preferably, we embed the private struct in the GObject struct itself.
In the past, we didn't do that, because the struct was in public headers
and changing that would have been an ABI break. For those struct, we
still use g_type_class_add_private().
We have some structs, where the private struct is embedded. An
alternative to that would be, to not have the private struct at all,
like done for NMSettingOvsBridge.
Anyway. So for direct properties we need to capture the offset of the
field (in the private struct). We can either set the offset of the
private struct in _nm_setting_class_commit() to zero and let the field
offset include the private structure offset. Or, the offset of the
private struct is accounted during _nm_setting_class_commit().
Both approaches are basically the same. Just do it consistently. For no
particular reason, choose to set the offset of the private data to zero
for those types.