We now only call the idle action with the same reason: authorized.
That is since we no longer use GDBusProxy, there are no other reasons
where we would fail.
Drop the unused code.
I encountered a failure in the log
<trace> [1564647990.7822] keyfile: commit: deleting nmmeta file "/etc/NetworkManager/system-connections/35370b0b-e53b-42ea-9fe3-f1b1d552343b.nmmeta" failed
<trace> [1564647990.7822] keyfile: commit: deleting nmmeta file "/etc/NetworkManager/system-connections/35370b0b-e53b-42ea-9fe3-f1b1d552343b.nmmeta" simulated
I think that was due to SELinux (rh #1738010).
Let nms_keyfile_nmmeta_write() return an errno code so we can log
more information about the failure.
... and nm_utils_fd_get_contents() and nm_utils_file_set_contents().
Don't mix negative errno return value with a GError output. Instead,
return a boolean result indicating success or failure.
Also, optionally
- output GError
- set out_errsv to the positive errno (or 0 on success)
Obviously, the return value and the output arguments (contents, length,
out_errsv, error) must all agree in their success/failure result.
That means, you may check any of the return value, out_errsv, error, and
contents to reliably detect failure or success.
Also note that out_errsv gives the positive(!) errno. But you probably
shouldn't care about the distinction and use nm_errno_native() either
way to normalize the value.
nm_utils_file_set_contents() is a re-implementation of g_file_set_contents(),
as such it returned merely a boolean success value.
It's sometimes interesting to get the native error code. Let the function
deviate from glib's original g_file_set_contents() and return the error code
(as negative value) instead.
This requires all callers to change. Also, it's potentially a dangerous
change, as this is easy to miss.
Note that nm_utils_file_get_contents() also returns an errno, and
already deviates from g_file_get_contents() in the same way. This patch
resolves at least the inconsistency with nm_utils_file_get_contents().
The secret-agent D-Bus API knows 4 methods: GetSecrets, SaveSecrets,
DeleteSecrets and CancelGetSecrets. When we cancel a GetSecrets
request, we must issue another CancelGetSecrets to tell the agent
that the request was aborted. This is also true during shutdown.
Well, technically, during shutdown we anyway drop off the bus and
it woudn't matter. In practice, I think we should get this right and
always cancel properly.
To better handle shutdown change the following:
- each request now takes a reference on NMSecretAgent. That means,
as long as there are pending requests, the instance stays alive.
The way to get this right during shutdown, is that NMSecretAgent
registers itself via nm_shutdown_wait_obj_register() and
NetworkManager is supposed to keep running as long as requests
are keeping the instance alive.
- now, the 3 regular methods are cancellable (which means: we are
no longer interested in the result). CancelGetSecrets is not
cancellable, but it has a short timeout NM_SHUTDOWN_TIMEOUT_MS
to handle this. We anyway don't really care about the result,
aside logging and to be sure that the request fully completed.
- this means, a request (NMSecretAgentCallId) can now immediately
be cancelled and destroyed, both when the request returns and
when the caller cancels it. The exception is GetSecrets which
keeps the request alive while waiting for CancelGetSecrets. But
this is easily handled by unlinking the call-id and pass it on
to the CancelGetSecrets callback.
Previously, the NMSecretAgentCallId was only destroyed when
the D-Bus call returns, even if it was cancelled earlier. That's
unnecessary complicated.
- previously, D-Bus requests SaveSecrets and DeleteSecrets were not cancellable.
That is a problem. We need to be able to cancel them in order to shutdown in
time.
- use GDBusConnection instead of GDBusProxy. As most of the time, GDBusProxy
provides features we don't use.
- again, don't log direct pointer values, but obfuscate the indentifiers.
In the past, we had a private unix socket. That is long gone.
Drop the remains in "nm-secret-agent.c". The request here really
always comes from the main D-Bus connection.
Maybe the private unix socket makes sense and we might resurrect it one
day. But at that point it would be an entire rewrite and the existing
code is probably not useful either way. Drop it.
- add nm_c_list_elem_find_first() macro that takes a predicate
and returns the first match.
This macro has a non-function-like behavior, which we often try to
avoid because macros should behave like functions. In this case it's
however convenient, so let's do it.
Also, despite being non-function-like, it should be pretty hard to
use wrongly.
- rename nm_c_list_elem_find_first() to nm_c_list_elem_find_first_ptr().
- Don't use GDBusProxy but plain GDBusConnection. NMFirewallManager
is very simple, it doesn't use any of the features that GDBusProxy
provides.
- make NMFirewallManagerCallId typedef a pointer to the opaque call-id
struct, instead of the struct itself. It's confusing to have a
variable that does not look like a pointer and assigning %NULL to
it.
- internally drop the CBInfo typename and name the call-id variable
constsistantly as "call_id".
- no need to keep the call-id struct alive after cancelling it. That
simplifies the lifetime managment of the pending call because the
completion callback is always invoked shortly before destroying
the call-id.
- note that the caller is no longer allowed to cancel a call-id from
inside the completion callback. That just complicates the
implementation and is not necessary. Assert against that.
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/merge_requests/230
They should be "static" and only visible to this source file.
Also, they should be "const", that allows the linker to place them
into read-only memory.
For better or worse, the API does not require the value to be a
UTF-8 string. We cannot just concatenate binary to a string.
Instead, backslash escape it with utf8safe-escape.
Also, this will shut up a (wrong) coverity warning at this place.
Seen on gitlab-ci.
NMPlatformSignalAssert: ../src/platform/tests/test-link.c:260, test_slave(): failure to accept signal [0,2] times: link-changed-changed ifindex 15 (3 times received)
ERROR: src/platform/tests/test-link-linux - too few tests run (expected 76, got 6)
ERROR: src/platform/tests/test-link-linux - exited with status 133 (terminated by signal 5?)
CID 59391 (#1 of 1): Copy into fixed size buffer (STRING_OVERFLOW)
31. fixed_size_dest: You might overrun the 16-character fixed-size string be.ifspec.spec.ifname by copying priv->nas_ifname without checking the length.
get_word() only moves the "argument" pointer forward. It never sets it
to %NULL. Also, above we already dereference argument, so Coverity thinks
that this NULL check indicates a bug.
Drop it to silence Coverity.
Warned by coverity: we assert above that brfd is -1, so we must always
restore it to -1 in the error case.
Technically, not a problem because socket() is documented to return
only -1 on error already. Apparently coverity does not believe that.
nl80211_alloc_msg() just allocates some memory, using glib's allocators.
Hence it cannot fail, and we don't need to check for that.
Drop the unnecessary %NULL checks.
Usually we check the result of nla_nest_start(). Also, in most cases where this
function would return %NULL, it's an actual bug. That is, because our netlink
message is allocated with a large buffer, and in most cases we append there a well
known, small amount of data.
To make coverity happy, handle the case and assert.
This triggers a coverity warning because we above already
check that not all relevant keys are NULL together.
Work around warning by modifying the code.
- use nm_g_variant_unref_floating()
- rename _lldp_attr_take_str_ptr() to _lldp_attr_set_str_take().
The new name has the same "_lldp_attr_set_" prefix as other setters.
Also, with the previous name it is unclear why it takes a "str-ptr".
- setting the same attribute multiple times, ignores all but the first
value. Avoid cloning the string in that case, and explicitly choose
the set or take function.
Valgrind complains:
==26355== 32 bytes in 2 blocks are definitely lost in loss record 2,829 of 6,716
==26355== at 0x4838748: malloc (vg_replace_malloc.c:308)
==26355== by 0x483AD63: realloc (vg_replace_malloc.c:836)
==26355== by 0x4F6AD4F: g_realloc (in /usr/lib64/libglib-2.0.so.0.6000.6)
==26355== by 0x4F87B33: ??? (in /usr/lib64/libglib-2.0.so.0.6000.6)
==26355== by 0x4F87B96: g_string_sized_new (in /usr/lib64/libglib-2.0.so.0.6000.6)
==26355== by 0x2D66E1: nm_utils_buf_utf8safe_escape (nm-shared-utils.c:1911)
==26355== by 0x4113B0: lldp_neighbor_new (nm-lldp-listener.c:676)
==26355== by 0x412788: process_lldp_neighbor (nm-lldp-listener.c:882)
==26355== by 0x4135CF: lldp_event_handler (nm-lldp-listener.c:931)
==26355== by 0x422CDB: lldp_callback (sd-lldp.c:50)
==26355== by 0x4235F9: lldp_add_neighbor (sd-lldp.c:166)
==26355== by 0x423679: lldp_handle_datagram (sd-lldp.c:189)
==26355== by 0x423C8B: lldp_receive_datagram (sd-lldp.c:235)
==26355== by 0x2F887A: source_dispatch (sd-event.c:2832)
==26355== by 0x2FAD43: sd_event_dispatch (sd-event.c:3245)
==26355== by 0x2D9237: event_dispatch (nm-sd.c:51)
==26355== by 0x4F64EDC: g_main_context_dispatch (in /usr/lib64/libglib-2.0.so.0.6000.6)
==26355== by 0x4F6526F: ??? (in /usr/lib64/libglib-2.0.so.0.6000.6)
==26355== by 0x4F655A2: g_main_loop_run (in /usr/lib64/libglib-2.0.so.0.6000.6)
==26355== by 0x140932: main (main.c:465)
==26355==
Not all masters type have a platform link and so it's wrong to check
for it to decide whether the slave should be really released. Move the
check to master devices that need it (bond, bridge and team).
OVS ports don't need the check because they don't call to platform to
remove a slave.
https://bugzilla.redhat.com/show_bug.cgi?id=1733709
We set nm-owned to indicate whether a software device was created by
NM or it was pre-existing. When checking the existence, we must verify
also whether the link type is compatible with the device, otherwise it
is possible to match unrelated interfaces. For example, when checking
for the existence of an ovs-bridge (which is not compatible with any
platform link) we could match a unrelated platform link with the same
name.
https://bugzilla.redhat.com/show_bug.cgi?id=1733709
These are special -- initramfs configured them and killed dhclient. Bad
things would happen if we let the addresses expire though.
Let's act as if initramfs actually passed the configuration to us.
It actually tries to do so by the means of writing an ifcfg file, but
that one is too broken to be useful, so the ifcfg-rh plugin ignores it.
Notably, it doesn't have the actual addresses or correct BOOTPROTO.
The generated connection is better.
Co-authored-by: Thomas Haller <thaller@redhat.com>
With "wireguard.ip4-auto-default-route" and "wireguard.ip6-auto-default-route",
NetworkManager automatically adds policy routing rules for the default
route.
For that, it needs to pick (up to) two numbers:
- the fwmark. This is used both for WireGuard's fwmark setting and
is also the number of the routing table where the default-route is
added.
- the rule priority. NetworkManager adds two policy routing rules, and
we need to place them somewhere before the default rules (at 32766).
Previously, we looked at exiting platform configuration and picked
numbers that were not yet used. However, during restart of
NetworkManager, we leave the interface up and after restart we will
take over the previous configuration. At that point, we need to choose
the same fwmark/priority, otherwise the configuration changes.
But since we picked numbers that were not yet used, we would always choose
different numbers. For routing rules that meant that after restart a second
pair of rules was added.
We possibly could store this data in the device's state-file. But that
is complex. Instead, just pick numbers deterministically based on the
connection's UUID.
If the picked numbers are not suitable, then the user can still work
around that:
- for fwmark, the user can explicitly configure wireguard.fwmark
setting.
- for the policy routes, the user can explicitly add the rules with
the desired priorirites (arguably, currently the default-route cannot
be added like a regular route, so the table cannot be set. Possibly
the user would have to add two /1 routes instead with
suppress_prefixlength=1).
We call _auto_default_route_init() at various places, for example during
coerce_route_table(). We cannot be sure that we don't call it before
activation starts (before stage1) or after device-cleanup.
That means, upon activation, we need to clear the state first. Do that in
act_stage1_prepare().
#3 0x00007fb0aa9e7d3d in g_return_if_fail_warning
(log_domain=log_domain@entry=0x562295fd5ee3 "libnm", pretty_function=pretty_function@entry=0x562295fd71d0 <__func__.35180> "_connection_get_setting_check", expression=expression@entry=0x562295f8edf7 "NM_IS_CONNECTION (connection)") at ../glib/gmessages.c:2767
#4 0x0000562295df151a in _connection_get_setting_check (connection=0x0, setting_type=0x562297b17050 [NMSettingWireGuard/NMSetting]) at libnm-core/nm-connection.c:207
#5 0x0000562295df151a in _connection_get_setting_check (connection=0x0, setting_type=0x562297b17050 [NMSettingWireGuard/NMSetting]) at libnm-core/nm-connection.c:205
#6 0x0000562295ef132a in _nm_connection_get_setting (type=<optimized out>, connection=0x0) at ./libnm-core/nm-core-internal.h:483
#7 0x0000562295ef132a in _auto_default_route_init (self=self@entry=0x562297bf82b0 [NMDeviceWireGuard]) at src/devices/nm-device-wireguard.c:443
#8 0x0000562295ef1b98 in coerce_route_table (device=0x562297bf82b0 [NMDeviceWireGuard], addr_family=2, route_table=0, is_user_config=<optimized out>)
at src/devices/nm-device-wireguard.c:565
#9 0x0000562295ea42ae in _get_route_table (self=self@entry=0x562297bf82b0 [NMDeviceWireGuard], addr_family=addr_family@entry=2) at src/devices/nm-device.c:2311
#10 0x0000562295ea4593 in nm_device_get_route_table (self=0x562297bf82b0 [NMDeviceWireGuard], addr_family=2) at src/devices/nm-device.c:2338
#11 0x0000562295eabde0 in ip_config_merge_and_apply (self=0x562297bf82b0 [NMDeviceWireGuard], addr_family=2, commit=1) at src/devices/nm-device.c:7590
#12 0x0000562295ed2f0b in device_link_changed (self=self@entry=0x562297bf82b0 [NMDeviceWireGuard]) at src/devices/nm-device.c:3939
#13 0x00007fb0aa9dc7db in g_idle_dispatch (source=source@entry=0x562297bf0b30, callback=0x562295ed2880 <device_link_changed>, user_data=0x562297bf82b0) at ../glib/gmain.c:5627
#14 0x00007fb0aa9dfedd in g_main_dispatch (context=0x562297a28090) at ../glib/gmain.c:3189
#15 0x00007fb0aa9dfedd in g_main_context_dispatch (context=context@entry=0x562297a28090) at ../glib/gmain.c:3854
#16 0x00007fb0aa9e0270 in g_main_context_iterate (context=0x562297a28090, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>) at ../glib/gmain.c:3927
#17 0x00007fb0aa9e05a3 in g_main_loop_run (loop=0x562297a0b380) at ../glib/gmain.c:4123
#18 0x0000562295d0b147 in main (argc=<optimized out>, argv=<optimized out>) at src/main.c:465
https://bugzilla.redhat.com/show_bug.cgi?id=1734383
For WireGuard (like for all IP-tunnels and IP-based VPNs), the IP addresses of
the peers must be reached outside the tunnel/VPN itself.
For VPN connections, NetworkManager usually adds a direct /32 route to
the external VPN gateway to the underlying device. For WireGuard that is
not done, because injecting a route to another device is ugly and error
prone. Worse: WireGuard with automatic roaming and multiple peers makes this
more complicated.
This is commonly a problem when setting the default-route via the VPN,
but there are also other subtle setups where special care must be taken
to prevent such routing loops.
WireGuard's wg-quick provides a simple, automatic solution by adding two policy
routing rules and relying on the WireGuard packets having a fwmark set (see [1]).
Let's also do that. Add new properties "wireguard.ip4-auto-default-route"
and "wireguard.ip6-auto-default-route" to enable/disable this. Note that
the default value lets NetworkManager automatically choose whether to
enable it (depending on whether there are any peers that have a default
route). This means, common scenarios should now work well without additional
configuration.
Note that this is also a change in behavior and upon package upgrade
NetworkManager may start adding policy routes (if there are peers that
have a default-route). This is a change in behavior, as the user already
clearly had this setup working and configured some working solution
already.
The new automatism picks the rule priority automatically and adds the
default-route to the routing table that has the same number as the fwmark.
If any of this is unsuitable, then the user is free to disable this
automatism. Note that since 1.18.0 NetworkManager supports policy routing (*).
That means, what this automatism does can be also achieved via explicit
configuration of the profile, which gives the user more flexibility to
adjust all parameters explicitly).
(*) but only since 1.20.0 NetworkManager supports the "suppress_prefixlength"
rule attribute, which makes it impossible to configure exactly this rule-based
solution with 1.18.0 NetworkManager.
[1] https://www.wireguard.com/netns/#improved-rule-based-routing