Add support to the internal DHCP client for requesting a prefix and
distributing it to interfaces with 'shared' IPv6 mode.
The systemd-networkd API currently allows requesting only a single
prefix and so there will be issues when the number of downstream
interfaces is greater than the number of /64 subnets available in the
returned prefix; but this is still an improvement over the previous
situation when no prefix was requested at all.
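To make the limitation concrete: a delegated prefix of length plen
contains 2^(64 - plen) subnets of size /64. A minimal sketch of that
arithmetic follows; the helper names are hypothetical and not part of
NetworkManager or systemd.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: number of /64 subnets contained in a delegated
 * prefix of length plen. Prefixes longer than /64 contain none. */
static uint64_t
subnets_in_prefix (unsigned plen)
{
    assert (plen <= 128);

    if (plen > 64)
        return 0;
    if (plen == 0)
        return UINT64_MAX; /* 2^64 does not fit in 64 bits; saturate */
    return UINT64_C (1) << (64 - plen);
}

/* Hypothetical helper: can every downstream interface get its own /64? */
static bool
prefix_suffices (unsigned plen, uint64_t n_downstream)
{
    return subnets_in_prefix (plen) >= n_downstream;
}
```

For example, a /62 covers four downstream interfaces but not a fifth.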
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/issues/247
Only reapply the IP configuration on link up if the IP state is CONF
or DONE. Previously we also reapplied it when the device was
disconnected (IP state NONE) and this could lead to a situation where
an incomplete config was applied; then we intersected the desired
configuration with the external - incomplete - one, causing the
removal of part of desired configuration (for example the default
route).
Fixes: d0b16b9283 ('device: unconditionally reapply IP configuration on link up')
https://bugzilla.redhat.com/show_bug.cgi?id=1754511
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/merge_requests/291
While nmi_cmdline_reader_parse() only has one caller, which indeed has the
argv parameter at hand and doesn't care whether it gets modified, I think
modifying it is ugly.
Arguments preferably are strictly either input or output arguments,
with input arguments not being modified by the call.
- nm_setting_wired_add_s390_option() asserts that a "value" argument
is given. Check that the string contains a '=' where we can split.
- pass the requested NM_SETTING_WIRED_SETTING_NAME type to get_conn().
Otherwise, @s_wired might be %NULL, resulting in an assertion.
I do wonder whether this always retrieves a connection of the
appropriate type for modification, or whether a profile could
be returned that was created for a different purpose. But that
isn't changed.
- avoid "g_strcmp0 (nettype, "ctc") != 0". I find it unexpected, that we add the
3rd subchannel component, if the nettype is "ctc" (intuitively, I'd expect it
to be the opposite). The reasons for this are not documented, but I
presume it is correct.
Anyway, using streq() makes this slightly more clear to me, as with
strcmp() I would wonder whether this was just a typo while with
streq() I'd be more confident that this is indeed intended.
- don't initialize local variables unnecessarily. The compiler would
warn if we forgot about this. Also, don't use { } for a
one-line block.
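As an illustration of the first point, splitting an option at '=' without
modifying the input string could look roughly like this. It is only a
sketch: split_option() is a hypothetical helper, not the code in
nm-initrd-generator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: split "key=value" at the first '='. On success,
 * copy the key into key_buf and point *out_value at the text after the
 * '='. Returns false if there is no '=', if the key is empty, or if the
 * key does not fit into key_buf. The input string is never modified. */
static bool
split_option (const char *opt, char *key_buf, size_t key_len, const char **out_value)
{
    const char *eq;
    size_t n;

    assert (opt && key_buf && out_value);

    eq = strchr (opt, '=');
    if (!eq || eq == opt)
        return false;

    n = (size_t) (eq - opt);
    if (n >= key_len)
        return false;

    memcpy (key_buf, opt, n);
    key_buf[n] = '\0';
    *out_value = eq + 1;
    return true;
}
```

Only when this returns true would the caller pass key and value on to
nm_setting_wired_add_s390_option().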
syntax: rd.znet=<nettype>,<subchannels>,<options>
The s390 specific options used to create the network interface in the kernel
are currently not processed by nm-initrd-generator, causing an incomplete ifcfg file.
fixes https://bugzilla.redhat.com/show_bug.cgi?id=1753975
Note that the only DNS plugin that actually emits the FAILED signal was
NMDnsDnsmasq. Let's not handle restart, retry and rate-limiting by
NMDnsManager but by NMDnsDnsmasq itself.
There are three goals here:
(1) when dnsmasq (infrequently) crashes, we want to always keep
retrying. A random crash should be automatically resolved and
eventually dnsmasq should be working again.
Note that we anyway cannot fully detect whether something is wrong.
OK, we detect crashes, but if dnsmasq just gets catatonic, it's just
as broken. Point being: our ability to detect non-working dnsmasq is limited.
(2) when dnsmasq keeps crashing all the time, then rate limit the retry.
Of course, at this point there is already something seriously wrong,
but we shouldn't kill the system by respawning the process without rate
limiting.
(3) previously, when NMDnsManager noticed that the plugin was broken
(and rate-limiting kicked in), it would temporarily disable the plugin.
Basically, that meant writing the real name servers to /etc/resolv.conf
directly, instead of setting localhost. This partly conflicts with
(1), because we want to retry and recover automatically. So what good
is it to notice a problem, resort to plain /etc/resolv.conf for a
short time, and then run into the issues again? If something is really
broken, there is no way but to involve the user to investigate and
fix the issue. Hence, we don't need to concern NMDnsManager with this either.
The only thing that the manager notices is when the dnsmasq binary is not
available. In that case, update() fails right away, and the manager falls back
to configuring the name servers in /etc/resolv.conf directly.
Also, change the backoff time from 5 minutes to 1 minute (twice the
burst interval). There is no particularly strong reason for either
choice, I think that if the ratelimit kicks in, then something is
already so wrong that it doesn't matter either way. Anyway, also 60
seconds is long enough to not kill the machine otherwise.
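A rough sketch of such a respawn rate limit follows. The 30 second burst
interval is implied by "twice the burst interval" being one minute; the
burst size of 5 and all names are assumptions made for illustration, not
the values or code used by NMDnsDnsmasq.

```c
#include <assert.h>
#include <stdint.h>

#define BURST_INTERVAL_SEC 30                       /* implied: backoff is twice this */
#define BURST_MAX          5                        /* assumption, not from the source */
#define BACKOFF_SEC        (2 * BURST_INTERVAL_SEC) /* 60 seconds */

typedef struct {
    int64_t  window_start;  /* start of the current burst window */
    int64_t  blocked_until; /* no respawn before this time */
    unsigned count;         /* respawns in the current window */
} RespawnLimiter;

/* Returns how many seconds to wait before respawning; 0 means "now".
 * Up to BURST_MAX respawns per BURST_INTERVAL_SEC are immediate, after
 * that the next respawn is delayed by BACKOFF_SEC. We never give up. */
static int64_t
respawn_limiter_delay (RespawnLimiter *rl, int64_t now)
{
    assert (rl);

    if (now < rl->blocked_until)
        return rl->blocked_until - now;

    if (now - rl->window_start >= BURST_INTERVAL_SEC) {
        rl->window_start = now;
        rl->count = 0;
    }

    if (rl->count < BURST_MAX) {
        rl->count++;
        return 0;
    }

    rl->blocked_until = now + BACKOFF_SEC;
    rl->window_start = rl->blocked_until;
    rl->count = 0;
    return BACKOFF_SEC;
}
```

The point of the sketch is that the limiter only ever delays the restart;
it never disables the plugin.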
Several points.
- We spawn the dnsmasq process directly. That has several downsides:
- The lifetime of the process is tied to NetworkManager's. When
stopping NetworkManager, we usually also stop dnsmasq. Or we keep
the process running, but later the process is no longer a child process
of NetworkManager and we can only kill it using the pidfile.
- We don't do special sandboxing of the dnsmasq process.
- Note that we want to ensure that only one dnsmasq process is running
at any time. We should track that in a singleton. Note that NMDnsDnsmasq
is not a singleton. While there is only one instance active at any time,
the DNS plugin can be swapped (e.g. during SIGHUP). Hence, don't track the
process per-NMDnsDnsmasq instance, but in a global variable "gl_pid".
- Usually, when NetworkManager quits, it also stops the dnsmasq process.
Previously, we would always try to terminate the process based on the
pidfile. That is wrong. Most of the time, NetworkManager spawned the
process itself, as a child process. Hence, the PID is known and NetworkManager
will get a signal when dnsmasq exits. The only moment when NetworkManager should
use the pidfile, is the first time when checking to kill the previous instance.
That is: only once at the beginning, to kill instances that were
intentionally or unintentionally (crash) left running earlier.
This is now done by _gl_pid_kill_external().
- Previously, before starting a new dnsmasq instance we would kill a
possibly already running one, and block while waiting for the process to
disappear. We should never block. Especially, since we afterwards start
the process also in non-blocking way, there is no reason to kill the
existing process in a blocking way. For the most part, starting dnsmasq
is already asynchronous and so should be the killing of the dnsmasq
process.
- Drop GDBusProxy and only use GDBusConnection. It fully suffices.
- When we kill a dnsmasq instance, we actually don't have to wait at
all. That can happen fully in background. The only peculiarity is that
when we restart a new instance before the previous instance is killed,
then we must wait for the previous process to terminate first. Also, if
we are about to exit while killing the dnsmasq instance, we must register
nm_shutdown_wait_obj_*() to wait until the process is fully gone.
We only have two real DNS plugins: "dnsmasq" and "systemd-resolved" (the "unbound"
plugin is very incomplete and should eventually be dropped).
Of these two, only "dnsmasq" spawns a child process. A lot of the logic
for that is in the parent class NMDnsPlugin, with the purpose for that
logic to be reusable.
However:
- We are unlikely to add more DNS plugins. Especially because
systemd-resolved seems the way forward.
- If we happen to add more plugins, then probably NetworkManager
should not spawn the process itself. That causes problems with
restarting the service. Rather, we should let the service manager
handle the lifetime of such "child" processes. Aside from separating
the lifetime of the DNS plugin process from NetworkManager's,
this also would allow sandboxing NetworkManager and the DNS plugin
differently. Currently, NetworkManager itself might need
capabilities only to pass them on to the DNS plugin, or (more likely)
NetworkManager would want to drop additional capabilities for the
DNS plugin (which we would rather not implement ourselves, since that
seems to be the job of the service management already).
- The current implementation is far from beautiful. For example,
it does synchronous (blocking) killing of the running process
from the PID file, and it uses PID files. This is not something
we would want to reuse for other plugins. Also, note that
dnsmasq already spawns the service asynchronously (of course).
Hence, we should also kill it asynchronously, but that is complicated
by having the logic separated in two different classes while
providing an abstract API between the two.
Move the code to NMDnsDnsmasq. This is the only place that cares about
this. Also, that makes it actually clearer what is happening, by seeing
the lifetime handling of the child process all in one place.
For logging, if the plugin fails with update, it should return a reason
that we can log.
Note that both dnsmasq and systemd-resolved plugins do the update asynchronously
(of course). Hence, usually they never fail right away, and there isn't really
a possibility to handle the failure later. Still, we should print something sensible,
and for that we need information about what went wrong.
The plugin name and whether a plugin is caching only depend on the type.
That does not require a virtual function, where types would decide depending
on other reasons.
Convert the virtual functions into fields of the class.
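Roughly, the idea is that per-type constants live as plain data in the
class struct instead of being returned by overridable functions. The
struct, field names and values below are simplified placeholders for
illustration, not the actual NMDnsPluginClass definition.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, simplified class struct: the per-type constants are
 * plain fields rather than function pointers. */
typedef struct {
    const char *plugin_name;
    bool        is_caching;
} DnsPluginClass;

/* Each type fills in its constants once (values are illustrative). */
static const DnsPluginClass dnsmasq_class = {
    .plugin_name = "dnsmasq",
    .is_caching  = true,
};

static const DnsPluginClass resolved_class = {
    .plugin_name = "systemd-resolved",
    .is_caching  = true,
};

/* The accessor is a simple field read; nothing can be overridden. */
static const char *
dns_plugin_class_get_name (const DnsPluginClass *klass)
{
    assert (klass && klass->plugin_name);
    return klass->plugin_name;
}
```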
The previous two wait-types (NM_SHUTDOWN_WAIT_TYPE_OBJECT and
NM_SHUTDOWN_WAIT_TYPE_CANCELLABLE) both required a GObject/GCancellable,
and the shutdown was automatically unblocked when the object got
destroyed.
Add another wait type NM_SHUTDOWN_WAIT_TYPE_HANDLE, which does not take
an object to wait for. Instead, shutdown is indefinitely blocked, until the
user unregisters the handle again. While other wait-types allow ignoring
ignore the handle, this wait-type only makes sense if the user keeps
track of the handle.
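The semantics can be sketched with a plain list of handles: shutdown is
blocked for as long as any handle is registered. This is a simplified
model with hypothetical names and no GLib, not the actual
nm_shutdown_wait_obj_*() implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical handle type; the caller must keep the pointer around
 * in order to unregister later. */
typedef struct WaitHandle {
    struct WaitHandle *prev;
    struct WaitHandle *next;
    const char        *msg_reason; /* for logging what blocks shutdown */
} WaitHandle;

static WaitHandle *wait_list_head;

static WaitHandle *
shutdown_wait_register_handle (const char *msg_reason)
{
    WaitHandle *h = malloc (sizeof (*h));

    if (!h)
        return NULL;
    h->msg_reason = msg_reason;
    h->prev = NULL;
    h->next = wait_list_head;
    if (wait_list_head)
        wait_list_head->prev = h;
    wait_list_head = h;
    return h;
}

static void
shutdown_wait_unregister (WaitHandle *h)
{
    assert (h);

    if (h->prev)
        h->prev->next = h->next;
    else
        wait_list_head = h->next;
    if (h->next)
        h->next->prev = h->prev;
    free (h);
}

/* Shutdown may only proceed once every handle was unregistered. */
static bool
shutdown_is_blocked (void)
{
    return wait_list_head != NULL;
}
```

Since nothing unblocks automatically, losing the returned pointer would
block shutdown forever in this model.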
nettools does not expose the original lease lifetime. It's a missing
API. Instead, it only exposes the timestamp when the lease will expire.
As a workaround, we calculate the lifetime by subtracting the current
timestamp from the expiration timestamp, assuming that the lease was
received just now. However, it was not received *exactly* now, but a
few milliseconds before. Hence, the calculated lifetime is not exact
here and likely a few milliseconds less than the actual (full integer)
value.
Account for that by rounding the value to the second.
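A minimal sketch of the rounding, assuming both timestamps are in
milliseconds on the same clock; lease_lifetime_sec() is a hypothetical
helper, not a nettools or NetworkManager function.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: derive the lease lifetime in seconds from the
 * expiration timestamp, rounding to the nearest full second so that a
 * few elapsed milliseconds do not turn 3600 s into 3599 s. */
static uint32_t
lease_lifetime_sec (uint64_t expiry_msec, uint64_t now_msec)
{
    uint64_t remaining_msec;
    uint64_t sec;

    if (expiry_msec <= now_msec)
        return 0;

    remaining_msec = expiry_msec - now_msec;
    sec = remaining_msec / 1000 + ((remaining_msec % 1000) >= 500 ? 1 : 0);
    return sec > UINT32_MAX ? UINT32_MAX : (uint32_t) sec;
}

/* Usage example: a 3600 s lease looked at 7 ms after it was received. */
static void
lease_lifetime_example (void)
{
    assert (lease_lifetime_sec (3600000, 7) == 3600);
}
```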
have_connection_for_device() really should just call nm_device_check_connection_compatible().
Note that nm_device_check_connection_compatible() of course checks the
connection type already, so this is redundant.
This check is only useful for devices that implement new_default_connection.
We can shortcut the possibly expensive checks like have_connection_for_device(),
which need to iterate all profiles.
If a profile has only "ethernet.mac-address" set, but
"connection.interface-name" not, then the previous check
    iface = nm_setting_connection_get_interface_name (s_con);
    if (!nm_streq0 (iface, nm_device_get_iface (device)))
        continue;
would wrongly consider the profile not matching for the device.
As a result, we would wrongly create an auto-default connection.
Fix that. We already call nm_device_check_connection_compatible()
above. That is fully suitable to compare the interface name and
the MAC address. We don't need to duplicate this check (wrongly).
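The difference between the two checks can be modelled as follows. The
types and functions are simplified stand-ins written for illustration;
the real nm_device_check_connection_compatible() checks much more than
these two properties.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, reduced models of a profile and a device. */
typedef struct {
    const char *interface_name; /* NULL if unset */
    const char *mac_address;    /* NULL if unset */
} Profile;

typedef struct {
    const char *iface;
    const char *hw_address;
} Device;

/* NULL-safe string comparison, in the spirit of nm_streq0(). */
static bool
streq0 (const char *a, const char *b)
{
    if (!a || !b)
        return a == b;
    return strcmp (a, b) == 0;
}

/* The dropped check: an unset interface name never matches. */
static bool
old_iface_check (const Profile *p, const Device *d)
{
    return streq0 (p->interface_name, d->iface);
}

/* Simplified compatibility check: an unset property matches any
 * device, a set property must match. */
static bool
simplified_compatible (const Profile *p, const Device *d)
{
    assert (p && d);

    if (p->interface_name && !streq0 (p->interface_name, d->iface))
        return false;
    if (p->mac_address && !streq0 (p->mac_address, d->hw_address))
        return false;
    return true;
}
```

A profile that sets only the MAC address fails old_iface_check() but
passes simplified_compatible() for the device with that address.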
See also commit 77d01c9094 ('settings: ignore incompatible connections
when looking for existing ones') for how this code changed.
https://bugzilla.redhat.com/show_bug.cgi?id=1727909
This is a complete refactoring of the bluetooth code.
Now that BlueZ 4 support was dropped, the separation of NMBluezManager
and NMBluez5Manager makes no sense. They should be merged.
At that point, notice that BlueZ 5's D-Bus API is fully centered around
D-Bus's ObjectManager interface. Using that interface, we basically only
call GetManagedObjects() once and register to InterfacesAdded,
InterfacesRemoved and PropertiesChanged signals. There is no need to
fetch individual properties ever.
Note how NMBluezDevice used to query the D-Bus properties itself by
creating a GDBusProxy. This is redundant, because when using the ObjectManager
interfaces, we have all information already.
Instead, let NMBluezManager basically become the client-side cache of
all of BlueZ's ObjectManager interface. NMBluezDevice was mostly concerned
about caching the D-Bus interface's state, tracking suitable profiles
(pan_connection), and mediating between bluez and NMDeviceBt.
These tasks don't get simpler by moving them to a separate file. Let them
also be handled by NMBluezManager.
I mean, just look how it was previously: NMBluez5Manager registers to
ObjectManager interface and sees a device appearing. It creates a
NMBluezDevice object and registers to its "initialized" and
"notify:usable" signal. In the meantime, NMBluezDevice fetches the
relevant information from D-Bus (although it was already present in the
data provided by the ObjectManager) and eventually emits these usable
and initialized signals.
Then, NMBluez5Manager emits a "bdaddr-added" signal, for which NMBluezManager
creates the NMDeviceBt instance. NMBluezManager, NMBluez5Manager and
NMBluezDevice are strongly cooperating to the point that it is simpler
to merge them.
This is not mere refactoring. This patch aims to make everything
asynchronous and always cancellable. Also, it aims to fix races
and inconsistencies of the state.
- Registering to a NAP server now waits for the response and delays
activation of the NMDeviceBridge accordingly.
- For NAP connections we now watch the bnep0 interface in platform, and tear
down the device when it goes away. Bluez doesn't send us a notification
on D-Bus in that case.
- Rework establishing a DUN connection. It no longer uses blocking
connect() and does not block until rfcomm device appears. It's
all async now. It also watches the rfcomm file descriptor for
POLLERR/POLLHUP to notice disconnect.
- drop nm_device_factory_emit_component_added() and instead let
NMDeviceBt directly register to the WWan factory's "added" signal.
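Noticing a disconnect via POLLERR/POLLHUP can be sketched with plain
poll(). This is a one-shot, non-blocking check with hypothetical helper
names; the actual code would watch the file descriptor from the main
loop instead of polling it by hand.

```c
#include <assert.h>
#include <poll.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical helper: returns true if the peer hung up or an error is
 * pending on fd. POLLERR and POLLHUP are reported even though no events
 * are requested. The zero timeout makes the call non-blocking. */
static bool
fd_has_hup_or_err (int fd)
{
    struct pollfd pfd = { .fd = fd, .events = 0 };
    int r;

    assert (fd >= 0);

    r = poll (&pfd, 1, 0);
    if (r <= 0)
        return false; /* nothing pending, or poll() itself failed */
    return (pfd.revents & (POLLERR | POLLHUP)) != 0;
}

/* Usage example with a pipe standing in for the rfcomm device: the
 * read end reports a hang-up once the write end is closed. Returns
 * true if exactly that was observed. */
static bool
fd_hup_example (void)
{
    int fds[2];
    bool before;
    bool after;

    if (pipe (fds) != 0)
        return false;
    before = fd_has_hup_or_err (fds[0]);
    close (fds[1]);
    after = fd_has_hup_or_err (fds[0]);
    close (fds[0]);
    return !before && after;
}
```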
The previous function arguments of nm_modem_act_stage2_config() act as if the
function could fail or even postpone the action. It never did.
We cannot treat this generically. A caller needs to know whether nm_modem_act_stage2_config()
can postpone the action, and when it does, which signal is emitted upon completion. That
is, the caller needs to know how to proceed after postponing.
In other words, since this function currently cannot fail or postpone
the stage, all callers must already rely on that. At this point it makes
no sense to pretend that the function could be any different, if all callers
assume it is not. Simplify the API.
Currently, we cannot ask which modems exist. NMDeviceBt may claim it
via nm_device_factory_emit_component_added(), and NMWWanFactory may
take it by listening to NM_MODEM_MANAGER_MODEM_ADDED. But that's it.
We will drop nm_device_factory_emit_component_added() because it's only
used for passing modems to NMDeviceBt. Instead, NMDeviceBt can directly
subscribe to NM_MODEM_MANAGER_MODEM_ADDED. It already has a reference
to NMModemManager.
Anyway, the NM_MODEM_MANAGER_MODEM_ADDED signal is not enough, because
sometimes when the modem appears, NMDeviceBt might not yet know whether
it should take it (because the DUN connect call is not yet complete).
Currently that never happens because dun_connect() blocks waiting for
the device. That must be fixed, by not waiting. But this opens up a
race, and NMDeviceBt might after NM_MODEM_MANAGER_MODEM_ADDED need to
search for the suitable modem: by iterating the list of all modems.
NMModem-s are either used by NMDeviceModem or by NMDeviceBt.
The mechanism for how that is coordinated is odd:
- the factory emits component-added, and then NMDeviceBt
might take the device (and claim it). In that case, component-added
would return TRUE to indicate that the modem should not be also
used by NMDeviceModem.
- next, if the modem has a driver that looks like bluetooth, NMDeviceModem
ignores it too.
- finally, NMDeviceModem claims the modem (which is now considered to
be non-bluetooth).
I think the first problem is that the device factory tries to have this
generic mechanism of "component-added". It's literally only used to
cover this special case. Note that NMDeviceBt is aware of modems. So,
abstracting this just adds lots of code that could be solved better
by handling the case (of giving the modem to either NMDeviceBt or
NMDeviceModem) specifically.
NMWWanFactory itself registers to the NM_MODEM_MANAGER_MODEM_ADDED
signal and emits nm_device_factory_emit_component_added().
We could just have NMWWanFactory and NMDeviceBt both register to
that signal. Signals even support priorities, so we could have
NMDeviceBt be called first to claim the device.
Anyway, as the modem can only have one owner, the modem should have
a flag that indicates whether it's claimed or not. That will allow
multiple components to all look at the same modem and moderate who is
going to take ownership.
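Such a flag could work along these lines. The Modem struct and the
function names are hypothetical placeholders for illustration, not the
NMModem API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced modem object with a single-owner flag. */
typedef struct {
    const char *uid;
    bool        claimed;
} Modem;

/* Try to take ownership. Returns true if the caller now owns the
 * modem, false if some other component claimed it earlier. */
static bool
modem_claim (Modem *modem)
{
    assert (modem);

    if (modem->claimed)
        return false;
    modem->claimed = true;
    return true;
}

/* Give the modem back so that another component may claim it. */
static void
modem_unclaim (Modem *modem)
{
    assert (modem && modem->claimed);
    modem->claimed = false;
}
```

Whichever of NMDeviceBt and NMDeviceModem calls modem_claim() first
becomes the owner; the other one sees false and leaves the modem alone.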
Now nm_shutdown_wait_obj_*() supports two styles:
- NM_SHUTDOWN_WAIT_TYPE_OBJECT: this just registers a weak pointer
on a source GObject. As long as the object is not destroyed
(and the object is not unregistered), the shutdown gets blocked.
- now new is NM_SHUTDOWN_WAIT_TYPE_CANCELLABLE: this source object
is a GCancellable, and during shutdown, the system will cancel
the instances to notify about the shutdown. That aside, the GCancellable
is tracked exactly like a regular NM_SHUTDOWN_WAIT_TYPE_OBJECT (meaning:
a weak pointer is registered and shutdown gets delayed as long as the instance
lives).
Like the rest of the shutdown, it's not yet implemented on the shutdown-side.
What is now possible is to register such cancellables, so that users can make
use of this API before we fix shutdown. We cannot fix it all at the same time,
so first users must be ready for this approach.
If DHCPv4 fails but IPv6 succeeds it makes sense to continue trying
DHCP so that we will eventually be able to get an address if the DHCP
server comes back. Always keep the client running; it will be only
terminated when the connection is brought down.
https://bugzilla.redhat.com/show_bug.cgi?id=1688329
In the accept() callback, the nettools client creates a UDP socket
with the received address as source, so the address must be already
configured on the interface.
Also, handle errors returned by nm_dhcp_client_accept().
Fixes: 401fee7c20 ('dhcp: support notifying the client of the result of DAD')
FT-SAE is missing in the supplicant configuration verification list,
causing an activation failure when using SAE and the supplicant
supports FT.
Fixes: d17a0a0905 ('supplicant: allow fast transition for WPA-PSK and WPA-EAP')
Drop it from the functions for extracting the dhcp options from the
lease: it was just used for the logging, but now we log all the options
once, at the end of the process.