The proxy does nothing for us, except overhead.
We can directly subscribe to "NameOwnerChanged" signals on the
GDBusConnection. Also, instead of asynchronously creating the
GDBusProxy, asynchronously call "GetNameOwner". That's what the
proxy does anyway.
GDBusConnection is actually a decent API. We don't need another layer on
top of that, for functionality that we don't use.
Also, don't use G_BUS_TYPE_SYSTEM, but use the GDBusConnection that
also the bus-manager uses. For all practical purposes, that is the
connection was want to use also in NMDnsSystemdResolved.
Every (failed) attempt to D-Bus activate a service results in log-messages
from dbus-daemon. It must be avoided to spam the logs that way.
Let connectivity check not only ask whether systemd-resolved is enabled
(and NetworkManager would like to push information there), but also
whether it looks like the service is actually available. That is,
either it has a name-owner or it's not blocked from starting.
The previous workaround was to configure main.systemd-resolved=no
in NetworkManager.conf. But that requires explict configuration.
Previously, we would create the D-Bus proxy without
%G_DBUS_PROXY_FLAGS_DO_NOT_AUTO_START_AT_CONSTRUCTION
flag.
That means, when systemd-resolved was not available or masked, the creation
of the D-Bus proxy would fail with
dns-sd-resolved[0x561905dc92d0]: failure to create D-Bus proxy for systemd-resolved: Error calling StartServiceByName for org.freedesktop.resolve1: GDBus.Error:org.freedesktop.systemd1.NoSuchUnit: Unit dbus-org.freedesktop.resolve1.service not found.
and never retried.
Now, when creating the D-Bus proxy don't autostart the instance.
Instead, each D-Bus call will try to poke and start the service.
There is a problem however: if systemd-resolved is not available, then
we must not constantly trying to start it, because it results in a slur
or syslog messages from dbus-daemon:
dbus-daemon[991]: [system] Activating via systemd: service name='org.freedesktop.resolve1' unit='dbus-org.freedesktop.resolve1.service' requested by ':1.23' (uid=0 pid=1012 comm="/usr/bin/NetworkManager --no-daemon ")
dbus-daemon[991]: [system] Activation via systemd failed for unit 'dbus-org.freedesktop.resolve1.service': Unit dbus-org.freedesktop.resolve1.service not found.
dbus-daemon[991]: [system] Activating via systemd: service name='org.freedesktop.resolve1' unit='dbus-org.freedesktop.resolve1.service' requested by ':1.23' (uid=0 pid=1012 comm="/usr/bin/NetworkManager --no-daemon ")
Avoid that by watching the name owner.
But, since systemd-resolved is D-Bus activated, watching the name owner
alone is not enough to know whether we should try to autostart the service.
Instead:
- if we have a name owner, assume the service runs and we send the update
- if we have no name owner, and we did not recently try to start
the service by name, poke it via "StartServiceByName". The idea
is, that in total we only try this once and remember a previous
attempt in priv->try_start_blocked.
- if we get a name-owner, priv->try_start_blocked gets reset.
Either it was us who started the service, or somebody else.
Either way, we are good to send updates again.
The nice thing is that we only try once to start resolved and only
generate one logging message from dbus-daemon about failure to do so.
But still, after blocking start on failure, when somebody else starts
resolved, we notice it and start using it again.
As we frequently send updates to systemd-resolved and for each update
send multiple messages, it can happen that we log a large number of
warnings if they all fail.
Rate limit the warnings to only warn once (until the failure is
recovered).
Currently, if systemd-resolved is not installed (or disabled) we already
fail once to create the D-Bus proxy (and never retry). That should be
fixed, to create the proxy with G_DBUS_PROXY_FLAGS_DO_NOT_AUTO_START_AT_CONSTRUCTION.
If we allow creating the proxy we would repeatedly try to send messages
and they would all fail. This is one example, where we need to ratelimit
the warning.
If the child is respawning too fast, consider the plugin failed so
that upstream servers are written to resolv.conf until the plugin gets
restarted after the delay.
When the dnsmasq process dies, two events are generated:
(1) a NM_DNS_PLUGIN_FAILED signal in nm-dns-dnsmasq.c:name_owner_changed()
(2) a NM_DNS_PLUGIN_CHILD_QUIT signal in nm-dns-plugin.c:from watch_cb()
Event (1) is handled by updating resolv.conf with upstream servers,
(2) by restarting the child process.
The order in which the two signals are received is not deterministic,
so when (1) comes after (2) the manager leaves upstream servers in
resolv.conf even if a dnsmasq instance is running.
When dnsmasq disappears from D-Bus and we know that the process is not
running, we should not emit a FAILED signal because the disappearing
is caused by the process termination, and that event is already
handled by the manager.
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/issues/105
If the user disabled systemd-resolved, two things seem apparent:
- the user does not want us to use systemd-resolved
- NetworkManager is not pushing the DNS configuration to
systemd-resoved.
It seems to me, we should not consult systemd-resolved in that case.
Each time when enabling/disabling "systemd-resolved" in combination with another
plugin (which is unchanged), another pair of signal handlers was
connected. That's wrong.
Fixes: d4eb4cb45f
Even when the system resolver is configured to something else that
systemd-resolved, it still is a good idea to keep systemd-resolved up to
date. If not anything else, it does a good job at doing per-interface
resolving for connectivity checks.
If for whatever reasons don't want NetworkManager to push the DNS data
it discovers to systemd-resolved, the functionality can be disabled
with:
[main]
systemd-resolved=false
g_string_new_len() allocates the buffer with length
bytes. Maybe it should be obvious (wasn't to me), but
if a init argument is given, that is taken as containing
length bytes.
So,
str = g_string_new_len (init, len);
is more like
str = g_string_new_len (NULL, len);
g_string_append_len (str, init, len);
and not (how I wrongly thought)
str = g_string_new_len (NULL, len);
g_string_append (str, init);
Fixes: 95b006c244
Previously, if "main.rc-manager" was set to "unmanaged"
and "/etc/resolv.conf" was symlink to our internal file
"/var/run/NetworkManager/resolv.conf", NM would not rewrite
the file, in an attempt to honor the requirement of NetworkManager
not changing resolv.conf.
No longer special case this. I think it was wrong and inconsistent.
If the user specifies rc-manager unmanaged, he also should manage
/etc/resolv.conf accordingly. And if the user decided to symlink
it to our internal file, that is fine. It should not stop NM from
updating that file.
Also, this was the only cases, where NM would not write our internal
resolv.conf (errors aside). It was inconsitent, and also not documented
behavior. Instead, it is documented that `man NetworkManager.conf`:
Regardless of this setting, NetworkManager will always write
resolv.conf to its runtime state directory
/var/run/NetworkManager/resolv.conf.
When a DNS plugin is enabled (like "main.dns=dnsmasq" or "main.dns=systemd-resolved"),
the name servers announced to the rc-manager are coerced to be 127.0.0.1
or 127.0.0.53.
Depending on the "main.rc-manager" setting, also "/etc/resolv.conf"
contains only this coerced name server to the local caching service.
The same is true for "/var/run/NetworkManager/resolv.conf" file, which
contains what we would write to "/etc/resolv.conf" (depending on
the "main.rc-manager" configuration).
Write a new file "/var/run/NetworkManager/no-stub-resolv.conf", which contains
the original name servers, uncoerced. Like "/var/run/NetworkManager/resolv.conf",
this file is always written.
The effect is, when one enables "main.dns=systemd-resolved", then there
is still a file "no-stub-resolv.conf" with the same content as with
"main.dns=default".
The no-stub-resolv.conf may be a possible solution, when a user wants
NetworkManager to update systemd-resolved, but still have a regular
/etc/resolv.conf [1]. For that, the user could configure
[main]
dns=systemd-resolved
rc-manager=unmanaged
and symlink "/etc/resolv.conf" to "/var/run/NetworkManager/no-stub-resolv.conf".
This is not necessarily the only solution for the problem and does not preclude
options for updating systemd-resolved in combination with other DNS plugins.
[1] https://gitlab.freedesktop.org/NetworkManager/NetworkManager/issues/20
Make them just ask for connections from GDBus, as other D-Bus clients
do. GDBus anyway reuses the connection if it has one, but allows us to
deal with errors in a more civilized manner.
The DNS manager reacts to NM_DEVICE_IP4_CONFIG_CHANGED events, which
are generated when there is a relevant change to an IP4 configuration.
Until now, changes to the mdns,llmnr properties were not
considered relevant (and neither minor, this is already a bug).
Promote them to relevant so that the DNS manager is notified and will
rewrite the DNS configuration when one of this properties changes.
Note that the DNS priority should be considered relevant and added
into the checksum as well, but is a problem right now because in the
DNS manager we rely on the fact that an empty configuration (i.e. just
created) has a zero checksum. This is needed to avoid rewriting
resolv.conf when there is no change. The DNS priority initial value
depends on the connection type (VPN or not), so it's a bit difficult
to add it to checksum maintaining the assumption of checksum(empty)=0.
This should be improved in the future.
We commonly don't use the glib typedefs for char/short/int/long,
but their C types directly.
$ git grep '\<g\(char\|short\|int\|long\|float\|double\)\>' | wc -l
587
$ git grep '\<\(char\|short\|int\|long\|float\|double\)\>' | wc -l
21114
One could argue that using the glib typedefs is preferable in
public API (of our glib based libnm library) or where it clearly
is related to glib, like during
g_object_set (obj, PROPERTY, (gint) value, NULL);
However, that argument does not seem strong, because in practice we don't
follow that argument today, and seldomly use the glib typedefs.
Also, the style guide for this would be hard to formalize, because
"using them where clearly related to a glib" is a very loose suggestion.
Also note that glib typedefs will always just be typedefs of the
underlying C types. There is no danger of glib changing the meaning
of these typedefs (because that would be a major API break of glib).
A simple style guide is instead: don't use these typedefs.
No manual actions, I only ran the bash script:
FILES=($(git ls-files '*.[hc]'))
sed -i \
-e 's/\<g\(char\|short\|int\|long\|float\|double\)\>\( [^ ]\)/\1\2/g' \
-e 's/\<g\(char\|short\|int\|long\|float\|double\)\> /\1 /g' \
-e 's/\<g\(char\|short\|int\|long\|float\|double\)\>/\1/g' \
"${FILES[@]}"
With "main.rc-manager=file", if /etc/resolv.conf is a symlink, NetworkManager
would follow the symlink and update the file instead.
However, note that realpath() only returns a target, if the file actually
exists. That means, if /etc/resolv.conf is a dangling symlink, NetworkManager
would replace the symlink with a file.
This was the only case in which NetworkManager would every change a symlink
resolv.conf to a file. I think this is undesired behavior.
This is a change in long established behavior. Although note that there were several
changes regarding rc-manager settings in the past. See for example commit [1] and [2].
Now, first still try using realpath() as before. Only if that fails, try
to resolve /etc/resolv.conf as a symlink with readlink().
Following the dangling symlink is likely not a problem for the user, it
probably is even desired. The part that most likely can cause problems
is if the destination file is not writable. That happens for example, if
the destination's parent directories are missing. In this case, NetworkManager
will now fail to write resolv.conf and log a warning. This has the potential of
breaking existing setups, but it really is a mis-configuration from the user's
side.
This fixes for example the problem, if the user configures
/etc/resolv.conf as symlink to /tmp/my-resolv.conf. At boot, the file
would not exist, and NetworkManager would previously always replace the
link with a plain file. Instead, it should follow the symlink and create
the file.
[1] 718fd22436
[2] 15177a34behttps://github.com/NetworkManager/NetworkManager/pull/127
It's not clear whether this was desired behavior. However, it was
behavior for a long time, so we probably should not change it.
Just document what happens with dangling symlinks.
Do some preprocessing on the DNS configuration sent to plugins:
- add the '~' default routing (lookup) domain to IP configurations
with the default route or, when there is none, to all non-VPN
IP configurations
- use the dns-priority to decide which connection to use in case
multiple connections have the same domain
- consider a negative dns-priority value as a way to 'shadow' all
subdomains from other connections
- compute reverse DNS domains
and add the resulting domain list to NMDnsIPConfigData so that
split-DNS plugins can use that directly instead of reimplementing the
same logic themselves.
- consistently set options, searches, domains fields to %NULL,
if there are no values.
- in nm_global_dns_config_update_checksum(), ensure that we uniquely
hash values. E.g. a config with "searches[a], options=[b]" should
hash differently from "searches=[ab], options=[]".
- in nm_global_dns_config_to_dbus(), reuse the sorted domain list.
We already have it, and it guarantees a consistent ordering of
fields.
- in global_dns_domain_from_dbus(), fix memleaks if D-Bus strdict
contains duplicate entries.
I dislike the static hash table to cache the integer counter for
numbered paths. Let's instead cache the counter at the class instance
itself -- since the class contains the information how the export
path should be exported.
However, we cannot use a plain integer field inside the class structure,
because the class is copied between derived classes. For example,
NMDeviceEthernet and NMDeviceBridge both get a copy of the NMDeviceClass
instance. Hence, the class doesn't contain the counter directly, but
a pointer to one counter that can be shared between sibling classes.
Previously, we used the generated GDBusInterfaceSkeleton types and glued
them via the NMExportedObject base class to our NM types. We also used
GDBusObjectManagerServer.
Don't do that anymore. The resulting code was more complicated despite (or
because?) using generated classes. It was hard to understand, complex, had
ordering-issues, and had a runtime and memory overhead.
This patch refactors this entirely and uses the lower layer API GDBusConnection
directly. It replaces the generated code, GDBusInterfaceSkeleton, and
GDBusObjectManagerServer. All this is now done by NMDbusObject and NMDBusManager
and static descriptor instances of type GDBusInterfaceInfo.
This adds a net plus of more then 1300 lines of hand written code. I claim
that this implementation is easier to understand. Note that previously we
also required extensive and complex glue code to bind our objects to the
generated skeleton objects. Instead, now glue our objects directly to
GDBusConnection. The result is more immediate and gets rid of layers of
code in between.
Now that the D-Bus glue us more under our control, we can address issus and
bottlenecks better, instead of adding code to bend the generated skeletons
to our needs.
Note that the current implementation now only supports one D-Bus connection.
That was effectively the case already, although there were places (and still are)
where the code pretends it could also support connections from a private socket.
We dropped private socket support mainly because it was unused, untested and
buggy, but also because GDBusObjectManagerServer could not export the same
objects on multiple connections. Now, it would be rather straight forward to
fix that and re-introduce ObjectManager on each private connection. But this
commit doesn't do that yet, and the new code intentionally supports only one
D-Bus connection.
Also, the D-Bus startup was simplified. There is no retry, either nm_dbus_manager_start()
succeeds, or it detects the initrd case. In the initrd case, bus manager never tries to
connect to D-Bus. Since the initrd scenario is not yet used/tested, this is good enough
for the moment. It could be easily extended later, for example with polling whether the
system bus appears (like was done previously). Also, restart of D-Bus daemon isn't
supported either -- just like before.
Note how NMDBusManager now implements the ObjectManager D-Bus interface
directly.
Also, this fixes race issues in the server, by no longer delaying
PropertiesChanged signals. NMExportedObject would collect changed
properties and send the signal out in idle_emit_properties_changed()
on idle. This messes up the ordering of change events w.r.t. other
signals and events on the bus. Note that not only NMExportedObject
messed up the ordering. Also the generated code would hook into
notify() and process change events in and idle handle, exhibiting the
same ordering issue too.
No longer do that. PropertiesChanged signals will be sent right away
by hooking into dispatch_properties_changed(). This means, changing
a property in quick succession will no longer be combined and is
guaranteed to emit signals for each individual state. Quite possibly
we emit now more PropertiesChanged signals then before.
However, we are now able to group a set of changes by using standard
g_object_freeze_notify()/g_object_thaw_notify(). We probably should
make more use of that.
Also, now that our signals are all handled in the right order, we
might find places where we still emit them in the wrong order. But that
is then due to the order in which our GObjects emit signals, not due
to an ill behavior of the D-Bus glue. Possibly we need to identify
such ordering issues and fix them.
Numbers (for contrib/rpm --without debug on x86_64):
- the patch changes the code size of NetworkManager by
- 2809360 bytes
+ 2537528 bytes (-9.7%)
- Runtime measurements are harder because there is a large variance
during testing. In other words, the numbers are not reproducible.
Currently, the implementation performs no caching of GVariants at all,
but it would be rather simple to add it, if that turns out to be
useful.
Anyway, without strong claim, it seems that the new form tends to
perform slightly better. That would be no surprise.
$ time (for i in {1..1000}; do nmcli >/dev/null || break; echo -n .; done)
- real 1m39.355s
+ real 1m37.432s
$ time (for i in {1..2000}; do busctl call org.freedesktop.NetworkManager /org/freedesktop org.freedesktop.DBus.ObjectManager GetManagedObjects > /dev/null || break; echo -n .; done)
- real 0m26.843s
+ real 0m25.281s
- Regarding RSS size, just looking at the processes in similar
conditions, doesn't give a large difference. On my system they
consume about 19MB RSS. It seems that the new version has a
slightly smaller RSS size.
- 19356 RSS
+ 18660 RSS