If the child is respawning too fast, consider the plugin failed so
that upstream servers are written to resolv.conf until the plugin gets
restarted after the delay.
When the dnsmasq process dies, two events are generated:
(1) a NM_DNS_PLUGIN_FAILED signal in nm-dns-dnsmasq.c:name_owner_changed()
(2) a NM_DNS_PLUGIN_CHILD_QUIT signal in nm-dns-plugin.c:from watch_cb()
Event (1) is handled by updating resolv.conf with upstream servers,
(2) by restarting the child process.
The order in which the two signals are received is not deterministic,
so when (1) comes after (2) the manager leaves upstream servers in
resolv.conf even if a dnsmasq instance is running.
When dnsmasq disappears from D-Bus and we know that the process is not
running, we should not emit a FAILED signal because the disappearing
is caused by the process termination, and that event is already
handled by the manager.
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/issues/105
If the user disabled systemd-resolved, two things seem apparent:
- the user does not want us to use systemd-resolved
- NetworkManager is not pushing the DNS configuration to
systemd-resoved.
It seems to me, we should not consult systemd-resolved in that case.
Each time when enabling/disabling "systemd-resolved" in combination with another
plugin (which is unchanged), another pair of signal handlers was
connected. That's wrong.
Fixes: d4eb4cb45f
Even when the system resolver is configured to something else that
systemd-resolved, it still is a good idea to keep systemd-resolved up to
date. If not anything else, it does a good job at doing per-interface
resolving for connectivity checks.
If for whatever reasons don't want NetworkManager to push the DNS data
it discovers to systemd-resolved, the functionality can be disabled
with:
[main]
systemd-resolved=false
g_string_new_len() allocates the buffer with length
bytes. Maybe it should be obvious (wasn't to me), but
if a init argument is given, that is taken as containing
length bytes.
So,
str = g_string_new_len (init, len);
is more like
str = g_string_new_len (NULL, len);
g_string_append_len (str, init, len);
and not (how I wrongly thought)
str = g_string_new_len (NULL, len);
g_string_append (str, init);
Fixes: 95b006c244
Previously, if "main.rc-manager" was set to "unmanaged"
and "/etc/resolv.conf" was symlink to our internal file
"/var/run/NetworkManager/resolv.conf", NM would not rewrite
the file, in an attempt to honor the requirement of NetworkManager
not changing resolv.conf.
No longer special case this. I think it was wrong and inconsistent.
If the user specifies rc-manager unmanaged, he also should manage
/etc/resolv.conf accordingly. And if the user decided to symlink
it to our internal file, that is fine. It should not stop NM from
updating that file.
Also, this was the only cases, where NM would not write our internal
resolv.conf (errors aside). It was inconsitent, and also not documented
behavior. Instead, it is documented that `man NetworkManager.conf`:
Regardless of this setting, NetworkManager will always write
resolv.conf to its runtime state directory
/var/run/NetworkManager/resolv.conf.
When a DNS plugin is enabled (like "main.dns=dnsmasq" or "main.dns=systemd-resolved"),
the name servers announced to the rc-manager are coerced to be 127.0.0.1
or 127.0.0.53.
Depending on the "main.rc-manager" setting, also "/etc/resolv.conf"
contains only this coerced name server to the local caching service.
The same is true for "/var/run/NetworkManager/resolv.conf" file, which
contains what we would write to "/etc/resolv.conf" (depending on
the "main.rc-manager" configuration).
Write a new file "/var/run/NetworkManager/no-stub-resolv.conf", which contains
the original name servers, uncoerced. Like "/var/run/NetworkManager/resolv.conf",
this file is always written.
The effect is, when one enables "main.dns=systemd-resolved", then there
is still a file "no-stub-resolv.conf" with the same content as with
"main.dns=default".
The no-stub-resolv.conf may be a possible solution, when a user wants
NetworkManager to update systemd-resolved, but still have a regular
/etc/resolv.conf [1]. For that, the user could configure
[main]
dns=systemd-resolved
rc-manager=unmanaged
and symlink "/etc/resolv.conf" to "/var/run/NetworkManager/no-stub-resolv.conf".
This is not necessarily the only solution for the problem and does not preclude
options for updating systemd-resolved in combination with other DNS plugins.
[1] https://gitlab.freedesktop.org/NetworkManager/NetworkManager/issues/20
Make them just ask for connections from GDBus, as other D-Bus clients
do. GDBus anyway reuses the connection if it has one, but allows us to
deal with errors in a more civilized manner.
The DNS manager reacts to NM_DEVICE_IP4_CONFIG_CHANGED events, which
are generated when there is a relevant change to an IP4 configuration.
Until now, changes to the mdns,llmnr properties were not
considered relevant (and neither minor, this is already a bug).
Promote them to relevant so that the DNS manager is notified and will
rewrite the DNS configuration when one of this properties changes.
Note that the DNS priority should be considered relevant and added
into the checksum as well, but is a problem right now because in the
DNS manager we rely on the fact that an empty configuration (i.e. just
created) has a zero checksum. This is needed to avoid rewriting
resolv.conf when there is no change. The DNS priority initial value
depends on the connection type (VPN or not), so it's a bit difficult
to add it to checksum maintaining the assumption of checksum(empty)=0.
This should be improved in the future.
We commonly don't use the glib typedefs for char/short/int/long,
but their C types directly.
$ git grep '\<g\(char\|short\|int\|long\|float\|double\)\>' | wc -l
587
$ git grep '\<\(char\|short\|int\|long\|float\|double\)\>' | wc -l
21114
One could argue that using the glib typedefs is preferable in
public API (of our glib based libnm library) or where it clearly
is related to glib, like during
g_object_set (obj, PROPERTY, (gint) value, NULL);
However, that argument does not seem strong, because in practice we don't
follow that argument today, and seldomly use the glib typedefs.
Also, the style guide for this would be hard to formalize, because
"using them where clearly related to a glib" is a very loose suggestion.
Also note that glib typedefs will always just be typedefs of the
underlying C types. There is no danger of glib changing the meaning
of these typedefs (because that would be a major API break of glib).
A simple style guide is instead: don't use these typedefs.
No manual actions, I only ran the bash script:
FILES=($(git ls-files '*.[hc]'))
sed -i \
-e 's/\<g\(char\|short\|int\|long\|float\|double\)\>\( [^ ]\)/\1\2/g' \
-e 's/\<g\(char\|short\|int\|long\|float\|double\)\> /\1 /g' \
-e 's/\<g\(char\|short\|int\|long\|float\|double\)\>/\1/g' \
"${FILES[@]}"
With "main.rc-manager=file", if /etc/resolv.conf is a symlink, NetworkManager
would follow the symlink and update the file instead.
However, note that realpath() only returns a target, if the file actually
exists. That means, if /etc/resolv.conf is a dangling symlink, NetworkManager
would replace the symlink with a file.
This was the only case in which NetworkManager would every change a symlink
resolv.conf to a file. I think this is undesired behavior.
This is a change in long established behavior. Although note that there were several
changes regarding rc-manager settings in the past. See for example commit [1] and [2].
Now, first still try using realpath() as before. Only if that fails, try
to resolve /etc/resolv.conf as a symlink with readlink().
Following the dangling symlink is likely not a problem for the user, it
probably is even desired. The part that most likely can cause problems
is if the destination file is not writable. That happens for example, if
the destination's parent directories are missing. In this case, NetworkManager
will now fail to write resolv.conf and log a warning. This has the potential of
breaking existing setups, but it really is a mis-configuration from the user's
side.
This fixes for example the problem, if the user configures
/etc/resolv.conf as symlink to /tmp/my-resolv.conf. At boot, the file
would not exist, and NetworkManager would previously always replace the
link with a plain file. Instead, it should follow the symlink and create
the file.
[1] 718fd22436
[2] 15177a34behttps://github.com/NetworkManager/NetworkManager/pull/127
It's not clear whether this was desired behavior. However, it was
behavior for a long time, so we probably should not change it.
Just document what happens with dangling symlinks.
Do some preprocessing on the DNS configuration sent to plugins:
- add the '~' default routing (lookup) domain to IP configurations
with the default route or, when there is none, to all non-VPN
IP configurations
- use the dns-priority to decide which connection to use in case
multiple connections have the same domain
- consider a negative dns-priority value as a way to 'shadow' all
subdomains from other connections
- compute reverse DNS domains
and add the resulting domain list to NMDnsIPConfigData so that
split-DNS plugins can use that directly instead of reimplementing the
same logic themselves.
- consistently set options, searches, domains fields to %NULL,
if there are no values.
- in nm_global_dns_config_update_checksum(), ensure that we uniquely
hash values. E.g. a config with "searches[a], options=[b]" should
hash differently from "searches=[ab], options=[]".
- in nm_global_dns_config_to_dbus(), reuse the sorted domain list.
We already have it, and it guarantees a consistent ordering of
fields.
- in global_dns_domain_from_dbus(), fix memleaks if D-Bus strdict
contains duplicate entries.
I dislike the static hash table to cache the integer counter for
numbered paths. Let's instead cache the counter at the class instance
itself -- since the class contains the information how the export
path should be exported.
However, we cannot use a plain integer field inside the class structure,
because the class is copied between derived classes. For example,
NMDeviceEthernet and NMDeviceBridge both get a copy of the NMDeviceClass
instance. Hence, the class doesn't contain the counter directly, but
a pointer to one counter that can be shared between sibling classes.
Previously, we used the generated GDBusInterfaceSkeleton types and glued
them via the NMExportedObject base class to our NM types. We also used
GDBusObjectManagerServer.
Don't do that anymore. The resulting code was more complicated despite (or
because?) using generated classes. It was hard to understand, complex, had
ordering-issues, and had a runtime and memory overhead.
This patch refactors this entirely and uses the lower layer API GDBusConnection
directly. It replaces the generated code, GDBusInterfaceSkeleton, and
GDBusObjectManagerServer. All this is now done by NMDbusObject and NMDBusManager
and static descriptor instances of type GDBusInterfaceInfo.
This adds a net plus of more then 1300 lines of hand written code. I claim
that this implementation is easier to understand. Note that previously we
also required extensive and complex glue code to bind our objects to the
generated skeleton objects. Instead, now glue our objects directly to
GDBusConnection. The result is more immediate and gets rid of layers of
code in between.
Now that the D-Bus glue us more under our control, we can address issus and
bottlenecks better, instead of adding code to bend the generated skeletons
to our needs.
Note that the current implementation now only supports one D-Bus connection.
That was effectively the case already, although there were places (and still are)
where the code pretends it could also support connections from a private socket.
We dropped private socket support mainly because it was unused, untested and
buggy, but also because GDBusObjectManagerServer could not export the same
objects on multiple connections. Now, it would be rather straight forward to
fix that and re-introduce ObjectManager on each private connection. But this
commit doesn't do that yet, and the new code intentionally supports only one
D-Bus connection.
Also, the D-Bus startup was simplified. There is no retry, either nm_dbus_manager_start()
succeeds, or it detects the initrd case. In the initrd case, bus manager never tries to
connect to D-Bus. Since the initrd scenario is not yet used/tested, this is good enough
for the moment. It could be easily extended later, for example with polling whether the
system bus appears (like was done previously). Also, restart of D-Bus daemon isn't
supported either -- just like before.
Note how NMDBusManager now implements the ObjectManager D-Bus interface
directly.
Also, this fixes race issues in the server, by no longer delaying
PropertiesChanged signals. NMExportedObject would collect changed
properties and send the signal out in idle_emit_properties_changed()
on idle. This messes up the ordering of change events w.r.t. other
signals and events on the bus. Note that not only NMExportedObject
messed up the ordering. Also the generated code would hook into
notify() and process change events in and idle handle, exhibiting the
same ordering issue too.
No longer do that. PropertiesChanged signals will be sent right away
by hooking into dispatch_properties_changed(). This means, changing
a property in quick succession will no longer be combined and is
guaranteed to emit signals for each individual state. Quite possibly
we emit now more PropertiesChanged signals then before.
However, we are now able to group a set of changes by using standard
g_object_freeze_notify()/g_object_thaw_notify(). We probably should
make more use of that.
Also, now that our signals are all handled in the right order, we
might find places where we still emit them in the wrong order. But that
is then due to the order in which our GObjects emit signals, not due
to an ill behavior of the D-Bus glue. Possibly we need to identify
such ordering issues and fix them.
Numbers (for contrib/rpm --without debug on x86_64):
- the patch changes the code size of NetworkManager by
- 2809360 bytes
+ 2537528 bytes (-9.7%)
- Runtime measurements are harder because there is a large variance
during testing. In other words, the numbers are not reproducible.
Currently, the implementation performs no caching of GVariants at all,
but it would be rather simple to add it, if that turns out to be
useful.
Anyway, without strong claim, it seems that the new form tends to
perform slightly better. That would be no surprise.
$ time (for i in {1..1000}; do nmcli >/dev/null || break; echo -n .; done)
- real 1m39.355s
+ real 1m37.432s
$ time (for i in {1..2000}; do busctl call org.freedesktop.NetworkManager /org/freedesktop org.freedesktop.DBus.ObjectManager GetManagedObjects > /dev/null || break; echo -n .; done)
- real 0m26.843s
+ real 0m25.281s
- Regarding RSS size, just looking at the processes in similar
conditions, doesn't give a large difference. On my system they
consume about 19MB RSS. It seems that the new version has a
slightly smaller RSS size.
- 19356 RSS
+ 18660 RSS
The next commit will completely rework NMBusManager and replace
NMExportedObject by a new type NMDBusObject.
Originally, NMDBusObject was added along NMExportedObject to ease
the rework and have compilable, intermediate stages of refactoring. Now,
I think the new name is better, because NMDBusObject is very strongly related
to the bus manager and the old name NMExportedObject didn't make that
clear.
I also slighly prefer the name NMDBusObject over NMBusObject, hence
for consistancy, also rename NMBusManager to NMDBusManager.
This commit only renames the file for a nicer diff in the next commit.
It does not actually update the type name in sources. That will be done
later.
Previously we always updated resolv.conf on quit. When we are using
systemd-resolved the update is not necessary because the resolver on
127.0.0.53 would still be reachable after NM quits. Also, when NM
manages resolv.conf directly there is no need to update the file
again. Let's rewrite resolv.conf only when using dnsmasq.
https://bugzilla.redhat.com/show_bug.cgi?id=1541031
Fixes the following error when building with gcc 4.8.5 and address
sanitizer:
src/dns/nm-dns-dnsmasq.c: In function 'update':
src/dns/nm-dns-dnsmasq.c:506:44: error: 'first_prio' may be used uninitialized in this function [-Werror=maybe-uninitialized]
} else if (first_prio < 0 && first_prio != prio)
^
Similarly to what systemd-resolved does, introduce the concept of
"routing" domain, which is a domain in the search list that is used
only to decide the interface over which a query must be forwarded, but
is not used to complete unqualified host names. Routing domains are
those starting with a tilde ('~') before the actual domain name.
Domains without the initial tilde are used both for completing
unqualified names and for the routing decision.
The "domain" key of the D-Bus configuration dictionary specifies the
domains a configuration applies to. In DNS code we consider domains
and searches as equivalent, so they should be exported via D-Bus using
the same logic used to populate resolv.conf and for plugins.