This is a complete refactoring of the bluetooth code.
Now that BlueZ 4 support was dropped, the separation of NMBluezManager
and NMBluez5Manager makes no sense. They should be merged.
At that point, notice that BlueZ 5's D-Bus API is fully centered around
D-Bus's ObjectManager interface. Using that interface, we basically only
call GetManagedObjects() once and register to InterfacesAdded,
InterfacesRemoved and PropertiesChanged signals. There is no need to
fetch individual properties ever.
Note how NMBluezDevice used to query the D-Bus properties itself by
creating a GDBusProxy. This is redundant, because when using the ObjectManager
interfaces, we have all information already.
Instead, let NMBluezManager basically become the client-side cache of
all of BlueZ's ObjectManager interface. NMBluezDevice was mostly concerned
about caching the D-Bus interface's state, tracking suitable profiles
(pan_connection), and moderate between bluez and NMDeviceBt.
These tasks don't get simpler by moving them to a seprate file. Let them
also be handled by NMBluezManager.
I mean, just look how it was previously: NMBluez5Manager registers to
ObjectManager interface and sees a device appearing. It creates a
NMBluezDevice object and registers to its "initialized" and
"notify:usable" signal. In the meantime, NMBluezDevice fetches the
relevant information from D-Bus (although it was already present in the
data provided by the ObjectManager) and eventually emits these usable
and initialized signals.
Then, NMBlue5Manager emits a "bdaddr-added" signal, for which NMBluezManager
creates the NMDeviceBt instance. NMBluezManager, NMBluez5Manager and
NMBluezDevice are strongly cooperating to the point that it is simpler
to merge them.
This is not mere refactoring. This patch aims to make everything
asynchronously and always cancellable. Also, it aims to fix races
and inconsistencies of the state.
- Registering to a NAP server now waits for the response and delays
activation of the NMDeviceBridge accordingly.
- For NAP connections we now watch the bnep0 interface in platform, and tear
down the device when it goes away. Bluez doesn't send us a notification
on D-Bus in that case.
- Rework establishing a DUN connection. It no longer uses blocking
connect() and does not block until rfcomm device appears. It's
all async now. It also watches the rfcomm file descriptor for
POLLERR/POLLHUP to notice disconnect.
- drop nm_device_factory_emit_component_added() and instead let
NMDeviceBt directly register to the WWan factory's "added" signal.
The previous function arguments of nm_modem_act_stage2_config() act as if the
function could fail or even postpone the action. It never did.
We cannot treat this generic. A caller needs to know whether nm_modem_act_stage2_config()
can postpone the action, and when it does, which signal is emitted upon completion. That
is, the caller needs to know how to proceed after postponing.
In other words, since this function currently cannot fail or postpone
the stage, so must all callers already rely on that. At this point it makes
no sense to pretend that the function could be any different, if all callers
assume it is not. Simplify the API.
Currently, we cannot ask which modems exist. NMDeviceBt may claim it
via nm_device_factory_emit_component_added(), and NMWWanFactory may
take it by listening to NM_MODEM_MANAGER_MODEM_ADDED. But that's it.
We will drop nm_device_factory_emit_component_added() because it's only
used for passing modems to NMDeviceBt. Instead, NMDeviceBt can directly
subscribe to NM_MODEM_MANAGER_MODEM_ADDED. It already has a reference
to NMModemManager.
Anyway, the NM_MODEM_MANAGER_MODEM_ADDED signal is no enough, because
sometimes when the mode appears, NMDeviceBt might not yet know whether
it should take it (because the DUN connect call is not yet complete).
Currently that never happens because dun_connect() blocks waiting for
the device. That must be fixed, by not waiting. But this opens up a
race, and NMDeviceBt might after NM_MODEM_MANAGER_MODEM_ADDED need to
search for the suitable modem: by iterating the list of all modems.
NMModem-s are either used by NMDeviceModem or by NMDeviceBt.
The mechanism how that is coordinated it odd:
- the factory emits component-added, and then NMDeviceBt
might take the device (and claim it). In that case, component-added
would return TRUE to indicate that the modem should not be also
used by NMDeviceModem.
- next, if the modem has a driver that looks like bluetooth, NMDeviceModem
ignores it too.
- finally, NMDeviceModem claims the modem (which is now considered to
be non-bluetooth).
I think the first problem is that the device factory tries to have this
generic mechanism of "component-added". It's literally only used to
cover this special case. Note that NMDeviceBt is aware of modems. So,
abstracting this just adds lots of code that could be solved better
by handling the case (of giving the modem to either NMDeviceBt or
NMDeviceModem) specifically.
NMWWanFactory itself registers to the NM_MODEM_MANAGER_MODEM_ADDED
signal and emits nm_device_factory_emit_component_added().
We could just have NMWWanFactory and NMDeviceBt both register to
that signal. Signals even support priorities, so we could have
NMDeviceBt be called first to claim the device.
Anyway, as the modem can only have one owner, the modem should have
a flag that indicates whether it's claimed or not. That will allow
multiple components all look at the same modem and moderate who is
going to take ownership.
If DHCPv4 fails but IPv6 succeeds it makes sense to continue trying
DHCP so that we will eventually be able to get an address if the DHCP
server comes back. Always keep the client running; it will be only
terminated when the connection is brought down.
https://bugzilla.redhat.com/show_bug.cgi?id=1688329
In the accept() callback, the nettools client creates a UDP socket
with the received address as source, so the address must be already
configured on the interface.
Also, handle errors returned by nm_dhcp_client_accept().
Fixes: 401fee7c20 ('dhcp: support notifying the client of the result of DAD')
This adds capability to hand over the network configuration from
OpenFirmware (and potentially other boot loaders with openfirmware
support such as U-Boot) to NetworkManager.
It's done analogously to ACPI/iBFT. In fact, the same ip=ibft command
line option is used, adding a more general ip=fw alias. This probably
deserves some documentation, but I'm not adding any at this time.
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/merge_requests/257
There's multiple things wrong there, but unnoticed because the error handling
was entirely missing or nobody is using thie anymore.
The Mesh ID needs to be set while the device is down. Also, the channel
needs to be set last, because that's what triggers the connection
attempt. For that the device needs to be up.
Also, fix the error handling.
The comment is wrong. Since 6eaded9071 ('device: add
get_autoconnect_allowed() virtual function'), get_autoconnect_allowed()
is called before the device state is consulted.
The nm-owned flag indicates whether the device was created by NM. If
the realization step fails, the device was not created and so nm-owned
should not be updated.
When the master AC becomes ready, activate_stage1_device_prepare() is
called in a idle handler. If the master AC fails in the meantime, it
will change state to deactivating or deactivated. We must check for
that condition before proceeding with slave activation. Note the the
'master_ready' flag of an AC is never cleared after it is set.
Fixes: 5b677d5a3b ('device: move check for master from nm_device_activate_schedule_stage2_device_config() to end of stage1')
https://bugzilla.redhat.com/show_bug.cgi?id=1747998
With accept_ra set to 1, kernel sends its own router solicitation
messages and parses the advertisements. This duplicates what NM
already does in userspace and has unwanted consequences like [1] and
[2].
The only reason why accept_ra was re-enabled in the past was to apply
RA parameters like ReachableTime and RetransTimer [3]; but now NM
supports them and so accept_ra can be turned off again.
Also, note that previously the option was set in
addrconf6_start_with_link_ready(), and so this was done only when the
method was 'auto'. Instead, now we clear it for all methods except
'ignore'.
[1] https://mail.gnome.org/archives/networkmanager-list/2019-June/msg00027.html
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1734470
[3] https://bugzilla.redhat.com/show_bug.cgi?id=1068673
IPv6 router advertisement messages contain the following parameters
(RFC 4861):
- Reachable time: 32-bit unsigned integer. The time, in
milliseconds, that a node assumes a neighbor is reachable after
having received a reachability confirmation. Used by the Neighbor
Unreachability Detection algorithm. A value of zero means
unspecified (by this router).
- Retrans Timer: 32-bit unsigned integer. The time, in milliseconds,
between retransmitted Neighbor Solicitation messages. Used by
address resolution and the Neighbor Unreachability Detection
algorithm. A value of zero means unspecified (by this router).
Currently NM ignores them; however, since it leaves accept_ra=1, the
kernel parses RAs and applies those parameters for us [1].
In the next commit kernel handling of RAs will be disabled, so let NM
set those neighbor-related parameters.
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/ipv6/ndisc.c?h=v5.2#n1353
Note that by now no callers of nm_device_activate_schedule_stage2_device_config()
are left. All previous callers now re-schedule stage1 instead of directly
scheduling stage2.
Note that if stage2 later also gets re-factored to re-enter itself
instead of scheduling stage3 right away, the function will be used
again.
That means, we can move the check for the master where it belongs: as
part (and at the end of) stage1.
Also, slightly simplify the code. The handler master_ready_cb()
no longer directly calls master_ready(). It's enough to always
enter stage1 again.
Also drop master_ready_handled. We don't need to remember that this
condition was satsified. We can just check it always when we reach
the place in activate_stage1_device_prepare().
I am about to change the when stage1 gets postponed, then the way to
proceed it is to schedule stage1 again (instead of scheduling stage2).
The reason is that stage1 handling should be reentrant and we should
keep entering it until there is no more reason to postpone it. If
a subclass postpones stage1 and then later progresses it by directly
scheduling stage2, then only the subclass is in control over postponing
stage 2.
Instead, anybody should be able to delay stage2 independently. That can
only work if everybody signals readyness to proceed by scheduling stage1
again.
NMDevice's act_stage1_prepare() now does nothing. Calling it is not
useful and has no effect.
In general, when a subclass overwrites a virtual function, it must be
defined whether the subclass must, may or must-not call the parents
implementation. Likewise, it must be clear when the parents
implementation should be chained: first, as last, does it matter?
In any case, that very much depends on how the parent is implemented
and this can only be solved by documentation and common conventions.
It's a forgiving approach to have a parents implementation do nothing,
then the subclass may call it at any time (or not call it at all).
This is especially useful if classes don't know their parent class well.
But in NetworkManager code the relationship between classes are known
at compile time, so every of these classes knows it derives directly
from NMDevice.
This forgingin approach was what NMDevice's act_stage1_prepare() was doing.
However, it also adds lines of code resulting in a different kind of complexity.
So, it's not clear that this forgiving approach is really better. Note
that it also has a (tiny) runtime and code-size overhead.
Change the expectation of how NMDevice's act_stage1_prepare() should be
called: it is no longer implemented, and subclasses *MUST* not chain up.
Note that all subclasses of NMDevice that implement act_stage1_prepare(), call
the parents implementation as very first thing.
Previously, NMDevice's act_stage1_prepare() was setting up SR-IOV. But that is
problemantic. Note that it may have returned NM_ACT_STAGE_RETURN_POSTPONE, in which
case subclasses would just skip their action and return to the caller. Later,
sriov_params_cb() would directly call nm_device_activate_schedule_stage2_device_config(),
and thus act_stage1_prepare() was never executed for the subclass. That
is wrong.
First, I don't think it is good to let subclasses decide whether to call a
parents implementation (and when). It just causes ambiguity. In
the best case we do it always in the same order, in the worst case we
call the parent at the wrong time or never. Instead, we want to initialize
SR-IOV always and early in stage1, so we should just do it directly from
activate_stage1_device_prepare(). Now NMDevice's act_stage1_prepare() does
nothing.
The bigger problem is that when a device wants to resume a stage that
it previously postponed, that it would schedule the next stage!
Instead, it should schedule the same stage again instead. That allows
to postpone the completion of a stage for multiple reasons, and each
call to a certain stage merely notifies that something changed and
we need to check whether we can now complete the stage.
For that to work, stages must become re-entrant. That means we need to
remember whether an action that we care about needs to be started, is pending
or already completed.
Compare this to nm_device_activate_schedule_stage3_ip_config_start(),
which checks whether firewall is configured. That is likewise the wrong
approach. Callers that were in stage2 and postponed stage2, and later would
schedule stage3 when they are ready.
Then nm_device_activate_schedule_stage3_ip_config_start() would check whether
firewall is also ready, and do nothing if that's not the case (relying
that when the firewall code completes to call
nm_device_activate_schedule_stage3_ip_config_start().
- use a [2] array for IPv4/IPv6 variants and a IS_IPv4 variable,
like we do for other places that have similar implementations for
both address families.
- drop ActivationHandleData and use the fields directly. Also drop
activation_source_get_by_family().
- rename "act_handle*" field to "activation_source_*", to follow the
naming of the related accessor functions.
- downgrade the severity of some logging messages.
The only change in behavior is in act_stage1_prepare().
That changes compared to before that we also set the specific
object path if it was already set (and we looked up the AP by
specific object to start with).
Also, for existing APs that we found with nm_wifi_aps_find_first_compatible(),
it changes the order of calling set_current_ap() before nm_active_connection_set_specific_object().
That should not make a different though. I anyway wonder why we even bother to
set the specific object on the AC. Maybe that should be revisited.
act_stage1_prepare() should become re-entrant. That means, we should not clear the state
there. Instead, we clear it where necessary or on deactivate (which we do already).
NM_DEVICE_MODEM_GET_PRIVATE() is based on _NM_GET_PRIVATE(), which has
some smarts to check the pointer type, but is fine with well-known parent
pointer types like "NMDevice *".