Revert this change. One problem is that none of the current GUIs
(nm-connection-editor, gnome-control-center, plasma-nm) expose the
dns-priority option. So, users tend to have their profile value set to
0. Changing the default means for them not only a change in behavior,
but its hard to fix via the GUI.
Also, what other call DNS leaks, is Split DNS to some. Both uses make
sense, but have conflicting goals. The default cannot accommodate both
at the same time.
Also, with split DNS enabled (dnsmasq, systemd-resolved), the concern
for DNS leaks is smaller. Imagine:
Wi-Fi profile with ipv4.dns-priority (effectively) 100, domain "example.com".
VPN profile with ipv4.dns-priority (effectively) 50 and a default route.
That is a common setup that one gets by default (and what probably many
users have today). In such a case with split DNS enabled, the Wi-Fi's DNS
server only sees requests for "*.example.com". So, it does not leak
everything.
Hence, revert this change before 1.28.0 release to the earlier behavior.
This reverts commit af13081bec.
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/688
We used to set "~." domains for all devices that should be used for
resolving unknown domains.
Systemd-resolved also supports setting "SetLinkDefaultRoute()".
We should only set the wildcard domain if we want that this
interface is used exclusively. Otherwise, we should only set
DefaultRoute. See ([1], [2], [3], [4]).
Otherwise the bad effect is if other components (wg-quick) want
to set exclusive DNS lookups on their link. That is achieved by
explicitly adding "~." and that is also what resolved's
`/usr/sbin/resolvconf -x` does. If NetworkManager sets "~." for
interfaces that are not important and should not be used exclusively,
then this steals the DNS requests from those external components.
In NetworkManager we know whether a link should get exclusive lookups
based on the "ipv[46].dns-priority" setting.
[1] https://www.freedesktop.org/software/systemd/man/org.freedesktop.resolve1.html
[2] https://www.freedesktop.org/software/systemd/man/systemd-resolved.service.html
[3] https://github.com/systemd/systemd/issues/17529#issuecomment-730522444
[4] https://github.com/systemd/systemd/pull/17678
domain_is_shadowed() only works, because we pre-sort all items. When
we call domain_is_shadowed(), then "priority" must be not smaller than
any priority already in the dictionary.
Let's add an nm_assert() for that.
While at it, I also found it ugly to rely on
GPOINTER_TO_INT(g_hash_table_lookup(ht, domain))
returning zero to know whether the domain is tracked. While more
cumbersome, we should check whether the value is in the hash (and not).
Not whether the value does not translate to zero.
Add domain_ht_get_priority() for that.
Change the default DNS priority of VPNs to -50, to avoid leaking
queries out of full-tunnel VPNs.
This is a change in behavior. In particular:
- when using dns=default (i.e. no split-dns) before this patch both
VPN and the local name server were added (in this order) to
resolv.conf; the result was that depending on resolv.conf options
and resolver implementation, the name servers were tried in a
certain manner which does not prevent DNS leaks.
With this change, only the VPN name server is added to resolv.conf.
- When using a split-dns plugin (systemd-resolved or dnsmasq), before
this patch the full-tunnel VPN would get all queries except those
ending in a local domain, that would instead be directed to the
local server.
After this patch, the VPN gets all queries.
To revert to the old behavior, set the DNS priority to 50 in the
connection profile.
If a VPN has never-default=no but doesn't get a default route (this
can happen for example when the server pushes routes with
openconnect), and there are no search domains, then the name servers
pushed by the server would be unused. It is preferable in this case to
use the VPN DNS server for all queries.
https://bugzilla.redhat.com/show_bug.cgi?id=1863041
Run:
./contrib/scripts/nm-code-format.sh -i
./contrib/scripts/nm-code-format.sh -i
Yes, it needs to run twice because the first run doesn't yet produce the
final result.
Signed-off-by: Antonio Cardace <acardace@redhat.com>
Add a new `main.rc-manager=auto` setting, that favours to use
systemd-resolved (and not touch "/etc/resolv.conf" but configure
it via D-Bus), or falls back to `resolvconf`/`netconfig` binaries
if they are installed and enabled at compile time.
As final fallback use "symlink", like before.
Note that on Fedora there is no "openresolv" package ([1]). Instead, "systemd"
package provides "/usr/sbin/resolvconf" as a wrapper for systemd-resolved's
"resolvectl". On such a system the fallback to resolvconf is always
wrong, because NetworkManager should either talk to systemd-resolved
directly or not but never call "/usr/sbin/resolvconf". So, the special handling
for resolvconf and netconfig is only done if NetworkManager was build with these
applications explicitly enabled.
Note that SUSE builds NetworkManager with
--with-netconfig=yes
--with-config-dns-rc-manager-default=netconfig
and the new option won't be used there either. But of course, netconfig
already does all the right things on SUSE.
[1] https://bugzilla.redhat.com/show_bug.cgi?id=668153
Suggested-by: Jason A. Donenfeld <Jason@zx2c4.com>
Arguably, a fixme comment isn't useful. It would be better to fix it.
On the other hand, nowadays these modes are not very popular and usually
not used. If somebody cares, please provide a patch.
NM_IS_IP_CONFIG() is a standard name for GObject related macros. Next,
we will add NMIPConfig object, so this macro (and name) will have a use.
Rename, and adjust the existing macro to avoid the name conflict.
g_clear_pointer() would always cast the destroy notify function
pointer to GDestroyNotify. That means, it lost some type safety, like
GPtrArray *ptr_arr = ...
g_clear_pointer (&ptr_arr, g_array_unref);
Since glib 2.58 ([1]), g_clear_pointer() is also more type safe. But
this is not used by NetworkManager, because we don't set
GLIB_VERSION_MIN_REQUIRED to 2.58.
[1] f9a9902aac
We have nm_clear_pointer() to avoid this issue for a long time (pre
1.12.0). Possibly we should redefine in our source tree g_clear_pointer()
as nm_clear_pointer(). However, I don't like to patch glib functions
with our own variant. Arguably, we do patch g_clear_error() in
such a manner. But there the point is to make the function inlinable.
Also, nm_clear_pointer() returns a boolean that indicates whether
anything was cleared. That is sometimes useful. I think we should
just consistently use nm_clear_pointer() instead, which does always
the preferable thing.
Replace:
sed 's/\<g_clear_pointer *(\([^;]*\), *\([a-z_A-Z0-9]\+\) *)/nm_clear_pointer (\1, \2)/g' $(git grep -l g_clear_pointer) -i
I think it's preferable to use nm_clear_g_free() instead of
g_clear_pointer(, g_free). The reasons are not very strong,
but I think it is overall preferable to have a shorthand for this
frequently used functionality.
sed 's/\<g_clear_pointer *(\([^;]*\), *\(g_free\) *)/nm_clear_g_free (\1)/g' $(git grep -l g_clear_pointer) -i
It was not very clear whether update_dns() would always set the error
output variable if (and only if) it would return false.
Rework the code to make it clearer.
Not using cleanup attribute is error prone.
In theory, a function should only return a GError if (and only if) it
signals a failure. However, for example in commit 324f67956a ('dns:
ensure to log a warning when writing /etc/resolv.conf fails') due to
a bug we was violated. In that case, it resulted in a leak.
Avoid explicit frees and use the gs_free_error cleanup attribute
instead. That would also work correctly in face of such a bug and in
general it seems preferable to explicitly assign ownership to auto
variables on the stack.
When setting "main.rc-manager=symlink" (the default) and /etc/resolv.conf
is a file, NetworkManager tries to write the file directly. When that fails,
we need to make sure to propagate the error so that we log a warning about that.
With this change:
<debug> [1583320004.3122] dns-mgr: update-dns: updating plugin systemd-resolved
<trace> [1583320004.3123] dns-sd-resolved[f9e3febb7424575d]: send-updates: start 8 requests
<trace> [1583320004.3129] dns-mgr: update-resolv-no-stub: '/var/run/NetworkManager/no-stub-resolv.conf' successfully written
<trace> [1583320004.3130] dns-mgr: update-resolv-conf: write to /etc/resolv.conf failed (rc-manager=symlink, $ERROR_REASON)
<trace> [1583320004.3132] dns-mgr: update-resolv-conf: write internal file /var/run/NetworkManager/resolv.conf succeeded
<trace> [1583320004.3133] dns-mgr: current configuration: [{ [...] }]
<warn> [1583320004.3133] dns-mgr: could not commit DNS changes: $ERROR_REASON
<info> [1583320004.3134] device (eth0): Activation: successful, device activated.
https://bugzilla.redhat.com/show_bug.cgi?id=1809181
Several macros are used to define function. They had a "_STATIC" variant,
to define the function as static.
I think those macros should not try to abstract entirely what they do.
They should not accept the function scope as argument (or have two
variants per scope). This also because it might make sense to add
additional __attribute__(()) to the function. That only works, if
the macro does not pretend to *not* define a plain function.
Instead, embrace what the function does and let the users place the
function scope as they see fit.
This also follows what is already done with
static NM_CACHED_QUARK_FCN ("autoconnect-root", autoconnect_root_quark)
Most callers would pass FALSE to nm_utils_error_is_cancelled(). That's
not very useful. Split the two functions and have nm_utils_error_is_cancelled()
and nm_utils_error_is_cancelled_is_disposing().
We should use the same "is-valid" function everywhere.
Since nm_utils_ipaddr_valid() is part of libnm, it does not qualify.
Use nm_utils_ipaddr_is_valid() instead.
and _nm_utils_inet6_ntop() instead of nm_utils_inet6_ntop().
nm_utils_inet4_ntop()/nm_utils_inet6_ntop() are public API of libnm.
For one, that means they are only available in code that links with
libnm/libnm-core. But such basic helpers should be available everywhere.
Also, they accept NULL as destination buffers. We keep that behavior
for potential libnm users, but internally we never want to use the
static buffers. This patch needs to take care that there are no callers
of _nm_utils_inet[46]_ntop() that pass NULL buffers.
Also, _nm_utils_inet[46]_ntop() are inline functions and the compiler
can get rid of them.
We should consistently use the same variant of the helper. The only
downside is that the "good" name is already taken. The leading
underscore is rather ugly and inconsistent.
Also, with our internal variants we can use "static array indices in
function parameter declarations" next. Thereby the compiler helps
to ensure that the provided buffers are of the right size.
The abbreviations "ns" and "ms" seem not very clear to me. Spell them
out to nsec/msec. Also, in parts we already used the longer abbreviations,
so it wasn't consistent.
Note that the only DNS plugin that actually emits the FAILED signal was
NMDnsDnsmasq. Let's not handle restart, retry and rate-limiting by
NMDnsManager but by NMDnsDnsmasq itself.
There are three goals here:
(1) we want that when dnsmasq (infrequently) crashes, that we always keep
retrying. A random crash should be automatically resolved and
eventually dnsmasq should be working again.
Note that we anyway cannot fully detect whether something is wrong.
OK, we detect crashes, but if dnsmasq just gets catatonic, it's just
as broken. Point being: our ability to detect non-working dnsmasq is limited.
(2) when dnsmasq keeps crashing all the time, then rate limit the retry.
Of course, at this point there is already something seriously wrong,
but we shouldn't kill the system by respawning the process without rate
limiting.
(3) previously, when NMDnsManager noticed that the pluging was broken
(and rate-limiting kicked in), it would temporarily disable the plugin.
Basically, that meant to write the real name servers to /etc/resolv.conf
directly, instead of setting localhost. This partly conflicts with
(1), because we want to retry and recover automatically. So what good
is it to notice a problem, resort to plain /etc/resolv.conf for a
short time, and then run into the issues again? If something is really
broken, there is no way but to involve the user to investigate and
fix the issue. Hence, we don't need to concern NMDnsManager with this either.
The only thing that the manager notices is when the dnsmasq binary is not
available. In that case, update() fails right away, and the manager falls back
to configure the name servers in /etc/resolv.conf directly.
Also, change the backoff time from 5 minutes to 1 minute (twice the
burst interval). There is not particularly strong reason for either
choice, I think that if the ratelimit kicks in, then something is
already so wrong that it doesn't matter either way. Anyway, also 60
seconds is long enough to not kill the machine otherwise.
Several points.
- We spawn the dnsmasq process directly. That has several downsides:
- The lifetime of the process is tied to NetworkManager's. When
stopping NetworkManager, we usually also stop dnsmasq. Or we keep
the process running, but later the process is no longer a child process
of NetworkManager and we can only kill it using the pidfile.
- We don't do special sandboxing of the dnsmasq process.
- Note that we want to ensure that only one dnsmasq process is running
at any time. We should track that in a singletone. Note that NMDnsDnsmasq
is not a singleton. While there is only one instance active at any time,
the DNS plugin can be swapped (e.g. during SIGHUP). Hence, don't track the
process per-NMDnsDnsmasq instance, but in a global variable "gl_pid".
- Usually, when NetworkManager quits, it also stops the dnsmasq process.
Previously, we would always try to terminate the process based on the
pidfile. That is wrong. Most of the time, NetworkManager spawned the
process itself, as a child process. Hence, the PID is known and NetworkManager
will get a signal when dnsmasq exits. The only moment when NetworkManager should
use the pidfile, is the first time when checking to kill the previous instance.
That is: only once at the beginning, to kill instances that were
intentionally or unintentionally (crash) left running earlier.
This is now done by _gl_pid_kill_external().
- Previously, before starting a new dnsmasq instance we would kill a
possibly already running one, and block while waiting for the process to
disappear. We should never block. Especially, since we afterwards start
the process also in non-blocking way, there is no reason to kill the
existing process in a blocking way. For the most part, starting dnsmasq
is already asynchronous and so should be the killing of the dnsmasq
process.
- Drop GDBusProxy and only use GDBusConnection. It fully suffices.
- When we kill a dnsmasq instance, we actually don't have to wait at
all. That can happen fully in background. The only pecularity is that
when we restart a new instance before the previous instance is killed,
then we must wait for the previous process to terminate first. Also, if
we are about to exit while killing the dnsmasq instance, we must register
nm_shutdown_wait_obj_*() to wait until the process is fully gone.
We only have two real DNS plugins: "dnsmasq" and "systemd-resolved" (the "unbound"
plugin is very incomplete and should eventually be dropped).
Of these two, only "dnsmasq" spawns a child process. A lot of the logic
for that is in the parent class NMDnsPlugin, with the purpose for that
logic to be reusable.
However:
- We are unlikely to add more DNS plugins. Especially because
systemd-resolved seems the way forward.
- If we happen to add more plugins, then probably NetworkManager
should not spawn the process itself. That causes problems with
restarting the service. Rather, we should let the service manager
handle the lifetime of such "child" processes. Aside separating
the lifetime of the DNS plugin process from NetworkManager's,
this also would allow to sandbox NetworkManager and the DNS plugin
differently. Currently, NetworkManager itself may might need
capabilities only to pass them on to the DNS plugin, or (more likely)
NetworkManager would want to drop additional capabilities for the
DNS plugin (which we would rather not implement ourself, since that
seems job of the service management already).
- The current implementation is far from beautiful. For example,
it does synchronous (blocking) killing of the running process
from the PID file, and it uses PID fils. This is not something
we would want to reuse for other plugins. Also, note that
dnsmasq already spawns the service asynchronosly (of course).
Hence, we should also kill it asynchronously, but that is complicated
by having the logic separated in two different classes while
providing an abstract API between the two.
Move the code to NMDnsDnsmasq. This is the only place that cares about
this. Also, that makes it actually clearer what is happening, by seeing
the lifetime handling of the child proceess all in one place.
For logging, if the plugin fails with update, it should return a reason
that we can log.
Note that both dnsmasq and system-resolved plugins do the update asynchronously
(of course). Hence, usually they never fail right away, and there isn't really
possibility to handle the failure later. Still, we should print something sensible
for that we need information what went wrong.
The plugin name and whether a plugin is caching only depends on the type,
it does not require a virtual function where types would decided depending
on other reasons.
Convert the virtual functions into fields of the class.