Several points.
- We spawn the dnsmasq process directly. That has several downsides:
- The lifetime of the process is tied to NetworkManager's. When
stopping NetworkManager, we usually also stop dnsmasq. Or we keep
the process running, but later the process is no longer a child process
of NetworkManager and we can only kill it using the pidfile.
- We don't do special sandboxing of the dnsmasq process.
- Note that we want to ensure that only one dnsmasq process is running
at any time. We should track that in a singletone. Note that NMDnsDnsmasq
is not a singleton. While there is only one instance active at any time,
the DNS plugin can be swapped (e.g. during SIGHUP). Hence, don't track the
process per-NMDnsDnsmasq instance, but in a global variable "gl_pid".
- Usually, when NetworkManager quits, it also stops the dnsmasq process.
Previously, we would always try to terminate the process based on the
pidfile. That is wrong. Most of the time, NetworkManager spawned the
process itself, as a child process. Hence, the PID is known and NetworkManager
will get a signal when dnsmasq exits. The only moment when NetworkManager should
use the pidfile, is the first time when checking to kill the previous instance.
That is: only once at the beginning, to kill instances that were
intentionally or unintentionally (crash) left running earlier.
This is now done by _gl_pid_kill_external().
- Previously, before starting a new dnsmasq instance we would kill a
possibly already running one, and block while waiting for the process to
disappear. We should never block. Especially, since we afterwards start
the process also in non-blocking way, there is no reason to kill the
existing process in a blocking way. For the most part, starting dnsmasq
is already asynchronous and so should be the killing of the dnsmasq
process.
- Drop GDBusProxy and only use GDBusConnection. It fully suffices.
- When we kill a dnsmasq instance, we actually don't have to wait at
all. That can happen fully in background. The only pecularity is that
when we restart a new instance before the previous instance is killed,
then we must wait for the previous process to terminate first. Also, if
we are about to exit while killing the dnsmasq instance, we must register
nm_shutdown_wait_obj_*() to wait until the process is fully gone.
... and nm_utils_fd_get_contents() and nm_utils_file_set_contents().
Don't mix negative errno return value with a GError output. Instead,
return a boolean result indicating success or failure.
Also, optionally
- output GError
- set out_errsv to the positive errno (or 0 on success)
Obviously, the return value and the output arguments (contents, length,
out_errsv, error) must all agree in their success/failure result.
That means, you may check any of the return value, out_errsv, error, and
contents to reliably detect failure or success.
Also note that out_errsv gives the positive(!) errno. But you probably
shouldn't care about the distinction and use nm_errno_native() either
way to normalize the value.
nm_utils_file_set_contents() is a re-implementation of g_file_set_contents(),
as such it returned merely a boolean success value.
It's sometimes interesting to get the native error code. Let the function
deviate from glib's original g_file_set_contents() and return the error code
(as negative value) instead.
This requires all callers to change. Also, it's potentially a dangerous
change, as this is easy to miss.
Note that nm_utils_file_get_contents() also returns an errno, and
already deviates from g_file_get_contents() in the same way. This patch
resolves at least the inconsistency with nm_utils_file_get_contents().
The define is better, because then we can grep for all the occurances
where they are used. The plain text like "mac:" is not at all unique in
our source-tree.
We usually want to combine the fields from "struct timespec" to
have one timestamp in either nanoseconds or milliseconds.
Use nm_utils_clock_gettime_*() util for that.
The settings plugins are created by NMSettings when the plugin
gets loaded. There is no need for these instances to be singletons
or to have a singleton getter.
Also, while in practice we create a settings plugin instance of
each type only once, there is nothing that would prevent creating
multiple instances. Hence, having a singleton getter is not right.
What is however useful, is to track them and block shutdown
via nm_shutdown_wait_obj_register*(). While the actual waiting
is not yet implemented, we should mark the plugin instances to
block shutdown (in the future).
We no longer add these. If you use Emacs, configure it yourself.
Also, due to our "smart-tab" usage the editor anyway does a subpar
job handling our tabs. However, on the upside every user can choose
whatever tab-width he/she prefers. If "smart-tabs" are used properly
(like we do), every tab-width will work.
No manual changes, just ran commands:
F=($(git grep -l -e '-\*-'))
sed '1 { /\/\* *-\*- *[mM]ode.*\*\/$/d }' -i "${F[@]}"
sed '1,4 { /^\(#\|--\|dnl\) *-\*- [mM]ode/d }' -i "${F[@]}"
Check remaining lines with:
git grep -e '-\*-'
The ultimate purpose of this is to cleanup our files and eventually use
SPDX license identifiers. For that, first get rid of the boilerplate lines.
Always ensure that the entire buffer is initialized with padding NULs.
For example, valgrind checks whether we access uninitalized memory,
so leaving this uninitalized can be unexpected and cause valgrind
failures. In general, one might be tempted to copy the ifname buffer (of
well known size IFNAMSIZ) with memcpy(). In that case, we should not
have trailing garbage there.
We could use strncpy() for that (which guarantees NUL padding), but
then we still would have to ensure NUL termination. But strncpy() is
frowned upon, so let's not use it here.
Note that g_strlcpy() does not guarantee NUL padding, so it's
unsuitable.
We could also implement this with a combination of memcpy() and
memset(). But in this case, it just seems simpler to iterate over the
16 bytes and do it manually.
Ooherwise, the file has wrong permissions:
# ls -la /var/lib/NetworkManager/secret_key
----r-xr-x. 1 root root 50 May 14 13:52 /var/lib/NetworkManager/secret_key
Luckily, /var/lib/NetworkManager should be already
# ls -lad /var/lib/NetworkManager
drwx------. 2 root root 8192 May 14 13:57 /var/lib/NetworkManager
which mitigates this a bit.
Fixes: dbcb1d6d97 ('core: let nm_utils_secret_key_read() handle failures internally')
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/issues/175
From the files under "shared/nm-utils" we build an internal library
that provides glib-based helper utilities.
Move the files of that basic library to a new subdirectory
"shared/nm-glib-aux" and rename the helper library "libnm-core-base.la"
to "libnm-glib-aux.la".
Reasons:
- the name "utils" is overused in our code-base. Everything's an
"utils". Give this thing a more distinct name.
- there were additional files under "shared/nm-utils", which are not
part of this internal library "libnm-utils-base.la". All the files
that are part of this library should be together in the same
directory, but files that are not, should not be there.
- the new name should better convey what this library is and what is isn't:
it's a set of utilities and helper functions that extend glib with
funcitonality that we commonly need.
There are still some files left under "shared/nm-utils". They have less
a unifying propose to be in their own directory, so I leave them there
for now. But at least they are separate from "shared/nm-glib-aux",
which has a very clear purpose.
(cherry picked from commit 80db06f768)
"shared/nm-utils" contains general purpose utility functions that only
depend on glib (and extend glib with some helper functions).
We will also add code that does not use glib, hence it would be good
if the part of "shared/nm-utils" that does not depend on glib, could be
used by these future projects.
Also, we use the term "utils" everywhere. While that covers the purpose
and content well, having everything called "nm-something-utils" is not
great. Instead, call this "nm-std-aux", inspired by "c-util/c-stdaux".
(cherry picked from commit b434b9ec07)
libnm exposes simplified variants of hexstr2bin in its public API. I
think that was a mistake, because libnm should provide NetworkManager
specific utils. It should not provide such string functions.
However, nmcli used to need this, so it was added to libnm.
The better approach is to add it to our internally shared static
library, so that all interested components can make use of it.
For static functions inside a module, the compiler determines on its own
whether to inline the function.
Also, "inline" was used at some places that don't immediatly look like
candidates for inlining. It was most likely a copy&paste error.
_log_connection_get_property() is a hack, as it cannot meaningfully print complex
properties. Also, it uses _nm_setting_get_property() which can only work with GObject
base properties.
Don't assert against _nm_setting_get_property() returning success. Eventually
we should replace _nm_setting_get_property() by something better. But for the moment,
it's fine to being unable to print a property value.
While nm_utils_inet*_ntop() accepts a %NULL buffer to fallback
to a static buffer, don't do that.
I find the possibility of using a static buffer here error prone
and something that should be avoided. There is of course the downside,
that in some cases it requires an additional line of code to allocate
the buffer on the stack as auto-variable.
The timestamp of the host-id is the timestamp of the secret_key file.
Under normal circumstances, reading the timestamp should never fail,
and reading it multiple times should always yield the same result.
If we unexpectedly fail to read the timestamp from the file we want:
- log a warning, so that the user can find out what's wrong. But
do so only once.
- we don't want to handle errors or fail operation due to a missing
timestamp. Remember, it's not supposed to ever fail, and if it does,
just log a warning and proceed with a fake timestamp instead. In
that case something is wrong, but using a non-stable, fake timestamp
is the least of the problems here.
We already have a stable identifier (the host-id) which we can use to
generate a fake timestamp. Use it.
In case the user would replace the secret_key file, we also don't want
that accessing nm_utils_host_id_get_timestamp*() yields different
results. It's not implemented (nor necessary) to support reloading a
different timestamp. Hence, nm_utils_host_id_get_timestamp() should
memoize the value and ensure that it never changes.
Now that the secret-key is hashed with the machine-id, the name is
no longer best.
Sure, part of the key are persisted in /var/lib/NetworkManager/secret_key
file, which the user is well advised to keep secret.
But what nm_utils_secret_key_get() returns is first and foremost a binary
key that is per-host and used for hashing a per-host component. It's
really the "host-id". Compare that to what we also have, the
"machine-id" and the "boot-id".
Rename.
NetworkManager loads (and generates) a secret key as
"/var/lib/NetworkManager/secret_key".
The secret key is used for seeding a per-host component when generating
hashed, stable data. For example, it contributes to "ipv4.dhcp-client-id=duid"
"ipv6.addr-gen-mode=stable-privacy", "ethernet.cloned-mac-address=stable", etc.
As such, it corresponds to the identity of the host.
Also "/etc/machine-id" is the host's identity. When cloning a virtual machine,
it may be a good idea to generate a new "/etc/machine-id", at least in those
cases where the VM's identity shall be different. Systemd provides various
mechanisms for doing that, like accepting a new machine-id via kernel command line.
For the same reason, the user should also regenerate a new NetworkManager's
secrey key when the host's identity shall change. However, that is less obvious,
less understood and less documented.
Support and use a new variant of secret key. This secret key is combined
with "/etc/machine-id" by sha256 hashing it together. That means, when the
user generates a new machine-id, NetworkManager's per-host key also changes.
Since we don't want to change behavior for existing installations, we
only do this when generating a new secret key file. For that, we encode
a version tag inside the "/var/lib/NetworkManager/secret_key" file.
Note that this is all abstracted by nm_utils_secret_key_get(). For
version 2 secret-keys, it internally combines the secret_key file with
machine-id (via sha256). The advantage is that callers don't care that
the secret-key now also contains the machine-id. Also, since we want to
stick to the previous behavior if we have an old secret-key, this is
nicely abstracted. Otherwise, the caller would not only need to handle
two per-host parts, but it would also need to check the version to
determine whether the machine-id should be explicitly included.
At this point, nm_utils_secret_key_get() should be renamed to
nm_utils_host_key_get().
g_file_get_contents() fails with G_FILE_ERROR, G_FILE_ERROR_NOENT when the
file does not exist.
That wasn't obvious to me, nm_utils_error_is_notfound() to the rescue.
Fixes: dbcb1d6d97
If the spec specifies only negative matches (and none of them matches),
then the result shall be positive.
Meaning:
[connection*] match-device=except:dhcp-plugin:dhclient
[connection*] match-device=except:interface-name:eth0
[.config] enabled=except:nm-version:1.14
should be the same as:
[connection*] match-device=*,except:dhcp-plugin:dhclient
[connection*] match-device=*,except:interface-name:eth0
[.config] enabled=*,except:nm-version:1.14
and match by default. Previously, such specs would never yield a
positive match, which seems wrong.
Note that "except:" already has a special meaning. It is not merely
"not:". That is because we don't support "and:" nor grouping, but all
matches are combined by an implicit "or:". With such a meaning, having
a "not:" would be unclear to define. Instead it is defined that any
"except:" match always wins and makes the entire condition to explicitly
not match. As such, it makes sense to treat a match that only consists
of "except:" matches special.
This is a change in behavior, but the alternative meaning makes
little sense.
We already had "${DEVICE}" which uses the interface name.
In times of predictable interface naming, that works well.
It allows the user to generate IDs per device which don't
change when the hardware is replaced.
"${MAC}" is similar, except that is uses the permanent MAC
address of the device. The substitution results in the empty
word, if the device has no permanent MAC address (like software
devices).
The per-device substitutions "${DEVICE}" and "${MAC}" are especially
interesting with "connection.multi-connect=multiple".
Our internal DHCP client (from systemd) defaults to a particular client ID.
It is currently exposed as nm_sd_utils_generate_default_dhcp_client_id()
and is based on the systemd implementation.
One problem with that is, that it internally looks up the interface name
with if_indextoname() and reads /etc/machine-id. Both makes it harder
for testing.
Another problem is, that this way of generating the client-id is
currently limited to internal client. Why? If you use dhclient plugin,
you may still want to use the same algorithm. Also, there is no explict
"ipv4.dhcp-client-id" mode to select this client-id (so that it could
be used in combination with "dhclient" plugin).
As such, this code will be useful also aside systemd DHCP plugin.
Hence, the function should not be obviously tied to systemd code.
The implementation is simple enough, and since we already have a
unit-test, refactor the code to our own implementation.
For testing purpose, it's bad to let nm_utils_stable_id_parse()
directly access nm_utils_get_boot_id_str(). Instead, the function
should have no side-effects.
Since the boot-id is anyway cached, accessing it is cheap. Even
if it likely won't be needed.
Tests might access the secret-key.
For CI builds we may very well build NM as root and also run
unit tests. In such a situation it's bad to persist the secret
key. For example, the SELinux label may be wrong, and subsequently
starting NetworkManager may cause errors. Avoid persisting the secret
key for tests.
Previously, whenever we needed /etc/machine-id we would re-load it
from file. The are 3 downsides of that:
- the smallest downside is the runtime overhead of repeatedly
reading the file and parse it.
- as we read it multiple times, it may change anytime. Most
code in NetworkManager does not expect or handle a change of
the machine-id.
Generally, the admin should make sure that the machine-id is properly
initialized before NetworkManager starts, and not change it. As such,
a change of the machine-id should never happen in practice.
But if it would change, we would get odd behaviors. Note for example
how generate_duid_from_machine_id() already cached the generated DUID
and only read it once.
It's better to pick the machine-id once, and rely to use the same
one for the remainder of the program.
If the admin wants to change the machine-id, NetworkManager must be
restarted as well (in case the admin cares).
Also, as we now only load it once, it makes sense to log an error
(once) when we fail to read the machine-id.
- previously, loading the machine-id could fail each time. And we
have to somehow handle that error. It seems, the best thing what we
anyway can do, is to log an error once and continue with a fake
machine-id. Here we add a fake machine-id based on the secret-key
or the boot-id. Now obtaining a machine-id can no longer fail
and error handling is no longer necessary.
Also, ensure that a machine-id of all zeros is not valid.
Technically, a machine-id is not an RFC 4122 UUID. But it's
the same size, so we also use NMUuid data structure for it.
While at it, also refactor caching of the boot-id and the secret
key. In particular, fix the thread-safety of the double-checked
locking implementations.
- add a commnt about thread-safety.
- minor refactoring initializing the value in nm_utils_get_testing().
Instead of returning the flags we just set, go back to the begin
and re-read the value (which must be initialized by now). No big
difference, but feels a bit nicer to me.
The need for this is the following:
"ipv4.dhcp-client-id" can be specified via global connection defaults.
In absence of any configuration in NetworkManager, the default depends
on the DHCP client plugin. In case of "dhclient", the default further
depends on /etc/dhcp.
For "internal" plugin, we may very well want to change the default
client-id to "mac" by universally installing a configuration
snippet
[connection-use-mac-client-id]
ipv4.dhcp-client-id=mac
However, if we the user happens to enable "dhclient" plugin, this also
forces the client-id and overrules configuration from /etc/dhcp. The real
problem is, that dhclient can be configured via means outside of NetworkManager,
so our defaults shall not overwrite defaults from /etc/dhcp.
With the new device spec, we can avoid this issue:
[connection-dhcp-client-id]
match-device=except:dhcp-plugin:dhclient
ipv4.dhcp-client-id=mac
This will be part of the solution for rh#1640494. Note that merely
dropping a configuration snippet is not yet enough. More fixes for
DHCP will follow. Also, bug rh#1640494 may have alternative solutions
as well. The nice part of this new feature is that it is generally
useful for configuring connection defaults and not specifically for
the client-id issue.
Note that this match spec is per-device, although the plugin is selected
globally. That makes some sense, because in the future we may or may not
configure the DHCP plugin per-device or per address family.
https://bugzilla.redhat.com/show_bug.cgi?id=1640494
Add a helper function for the common check whether a file is
inside a path. Also, this function handles special cases like
repeated file separators. However, as it is still entirely text
based, it also cannot recognize if two (literally) different
paths reference the same inode/file.