With accept_ra set to 1, kernel sends its own router solicitation
messages and parses the advertisements. This duplicates what NM
already does in userspace and has unwanted consequences like [1] and
[2].
The only reason why accept_ra was re-enabled in the past was to apply
RA parameters like ReachableTime and RetransTimer [3]; but now NM
supports them and so accept_ra can be turned off again.
Also, note that previously the option was set in
addrconf6_start_with_link_ready(), and so this was done only when the
method was 'auto'. Instead, now we clear it for all methods except
'ignore'.
[1] https://mail.gnome.org/archives/networkmanager-list/2019-June/msg00027.html
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1734470
[3] https://bugzilla.redhat.com/show_bug.cgi?id=1068673
IPv6 router advertisement messages contain the following parameters
(RFC 4861):
- Reachable time: 32-bit unsigned integer. The time, in
milliseconds, that a node assumes a neighbor is reachable after
having received a reachability confirmation. Used by the Neighbor
Unreachability Detection algorithm. A value of zero means
unspecified (by this router).
- Retrans Timer: 32-bit unsigned integer. The time, in milliseconds,
between retransmitted Neighbor Solicitation messages. Used by
address resolution and the Neighbor Unreachability Detection
algorithm. A value of zero means unspecified (by this router).
Currently NM ignores them; however, since it leaves accept_ra=1, the
kernel parses RAs and applies those parameters for us [1].
In the next commit kernel handling of RAs will be disabled, so let NM
set those neighbor-related parameters.
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/ipv6/ndisc.c?h=v5.2#n1353
Note that by now no callers of nm_device_activate_schedule_stage2_device_config()
are left. All previous callers now re-schedule stage1 instead of directly
scheduling stage2.
Note that if stage2 later also gets re-factored to re-enter itself
instead of scheduling stage3 right away, the function will be used
again.
That means, we can move the check for the master where it belongs: as
part (and at the end of) stage1.
Also, slightly simplify the code. The handler master_ready_cb()
no longer directly calls master_ready(). It's enough to always
enter stage1 again.
Also drop master_ready_handled. We don't need to remember that this
condition was satsified. We can just check it always when we reach
the place in activate_stage1_device_prepare().
I am about to change the when stage1 gets postponed, then the way to
proceed it is to schedule stage1 again (instead of scheduling stage2).
The reason is that stage1 handling should be reentrant and we should
keep entering it until there is no more reason to postpone it. If
a subclass postpones stage1 and then later progresses it by directly
scheduling stage2, then only the subclass is in control over postponing
stage 2.
Instead, anybody should be able to delay stage2 independently. That can
only work if everybody signals readyness to proceed by scheduling stage1
again.
NMDevice's act_stage1_prepare() now does nothing. Calling it is not
useful and has no effect.
In general, when a subclass overwrites a virtual function, it must be
defined whether the subclass must, may or must-not call the parents
implementation. Likewise, it must be clear when the parents
implementation should be chained: first, as last, does it matter?
In any case, that very much depends on how the parent is implemented
and this can only be solved by documentation and common conventions.
It's a forgiving approach to have a parents implementation do nothing,
then the subclass may call it at any time (or not call it at all).
This is especially useful if classes don't know their parent class well.
But in NetworkManager code the relationship between classes are known
at compile time, so every of these classes knows it derives directly
from NMDevice.
This forgingin approach was what NMDevice's act_stage1_prepare() was doing.
However, it also adds lines of code resulting in a different kind of complexity.
So, it's not clear that this forgiving approach is really better. Note
that it also has a (tiny) runtime and code-size overhead.
Change the expectation of how NMDevice's act_stage1_prepare() should be
called: it is no longer implemented, and subclasses *MUST* not chain up.
Note that all subclasses of NMDevice that implement act_stage1_prepare(), call
the parents implementation as very first thing.
Previously, NMDevice's act_stage1_prepare() was setting up SR-IOV. But that is
problemantic. Note that it may have returned NM_ACT_STAGE_RETURN_POSTPONE, in which
case subclasses would just skip their action and return to the caller. Later,
sriov_params_cb() would directly call nm_device_activate_schedule_stage2_device_config(),
and thus act_stage1_prepare() was never executed for the subclass. That
is wrong.
First, I don't think it is good to let subclasses decide whether to call a
parents implementation (and when). It just causes ambiguity. In
the best case we do it always in the same order, in the worst case we
call the parent at the wrong time or never. Instead, we want to initialize
SR-IOV always and early in stage1, so we should just do it directly from
activate_stage1_device_prepare(). Now NMDevice's act_stage1_prepare() does
nothing.
The bigger problem is that when a device wants to resume a stage that
it previously postponed, that it would schedule the next stage!
Instead, it should schedule the same stage again instead. That allows
to postpone the completion of a stage for multiple reasons, and each
call to a certain stage merely notifies that something changed and
we need to check whether we can now complete the stage.
For that to work, stages must become re-entrant. That means we need to
remember whether an action that we care about needs to be started, is pending
or already completed.
Compare this to nm_device_activate_schedule_stage3_ip_config_start(),
which checks whether firewall is configured. That is likewise the wrong
approach. Callers that were in stage2 and postponed stage2, and later would
schedule stage3 when they are ready.
Then nm_device_activate_schedule_stage3_ip_config_start() would check whether
firewall is also ready, and do nothing if that's not the case (relying
that when the firewall code completes to call
nm_device_activate_schedule_stage3_ip_config_start().
- use a [2] array for IPv4/IPv6 variants and a IS_IPv4 variable,
like we do for other places that have similar implementations for
both address families.
- drop ActivationHandleData and use the fields directly. Also drop
activation_source_get_by_family().
- rename "act_handle*" field to "activation_source_*", to follow the
naming of the related accessor functions.
- downgrade the severity of some logging messages.
The only change in behavior is in act_stage1_prepare().
That changes compared to before that we also set the specific
object path if it was already set (and we looked up the AP by
specific object to start with).
Also, for existing APs that we found with nm_wifi_aps_find_first_compatible(),
it changes the order of calling set_current_ap() before nm_active_connection_set_specific_object().
That should not make a different though. I anyway wonder why we even bother to
set the specific object on the AC. Maybe that should be revisited.
act_stage1_prepare() should become re-entrant. That means, we should not clear the state
there. Instead, we clear it where necessary or on deactivate (which we do already).
NM_DEVICE_MODEM_GET_PRIVATE() is based on _NM_GET_PRIVATE(), which has
some smarts to check the pointer type, but is fine with well-known parent
pointer types like "NMDevice *".
If the @append_force argument is set and the object is already in the
list, it must be moved at the end.
Fixes: 22edeb5b69 ('core: track addresses for NMIP4Config/NMIP6Config via NMDedupMultiIndex')
Add test to show a wrong result of ip_ipX_config_replace() due to a
bug in _nm_ip_config_add_obj(). When an address is added to the tail
of the index and another address with the same id already exists, the
existing object is left at the same place, breaking the order of
addresses.
First of all, keyfile writer (and reader) are supposed to be able to store
every profile to disk and re-read a valid profile back. Note that the profile
might be modified in the process, for example, blob certificates are written
to a file. So, the result might no be exactly the same, but it must still be
valid (and should only diverge in expected ways from the original, like mangled
certificates).
Previously, we would re-read the profile after writing to disk. If that failed,
we would only fail an assertion but otherwise proceeed. It is a bug
after all. However, it's bad to check only after writing to file,
because it results in a unreadable profile on disk, and in the first
moment it appears that noting went wrong. Instead, we should fail early.
Note that nms_keyfile_reader_from_keyfile() must entirely operate on the in-memory
representation of the keyfile. It must not actually access any files on disk. Hence,
moving this check before writing the profile must work. Otherwise, that would be
a separate bug. Actually, keyfile reader and writer violate this. I
added FIXME comments for that. But it doesn't interfere with this
patch.
NM didn't support wpa-none for years because kernel drivers used to be
broken. Note that it wasn't even possible to *add* a connection with
wpa-none because it was rejected in nm_settings_add_connection_dbus().
Given that wpa-none is also deprecated in wpa_supplicant and is
considered insecure, drop altogether any reference to it.
This file causes a crash [1], add it to the tests.
Note that the test only check parsing the file and the
crash happens in the "upper" layers. So, it's not really
a test for the crash. But at least have such a file in
our repository.
[1] https://gitlab.freedesktop.org/NetworkManager/NetworkManager/issues/235
Logging pointer values might reveal information that can be used to defeat
ASLR. We should avoid that.
On the other hand, it's useful to tag a logging message with the pointer
value of the "source" of the message. It helps to correlate messages and
search for relevant messages in the log.
As a compromise, use NM_HASH_OBFUSCATE_PTR(), like we do at several places
already. For example, we also log
<debug> [1566550899.7901] setup NMPlatform singleton (29a6af9867f2e5d0)
This obfuscated value is a 64 bit unsigned integer with the siphash24
hash of the raw value with a randomized seed. Of course, contrary to the
pointer value, there is a tiny chance that two different pointers hash
to the same identifier. However, that seems unlikely enough to be of no
concern. Note that this pointer value is only logged to aid debugging.
It is sufficiently unlikely that this causes confusion.
One other downside of printed the obfuscated value, is that you can no
longer read the pointer from the log and use it in gdb directly. That
might be sometimes convenient, but making this impossible is kinda the
purpose of this change.
As such, nm_log_ptr() becomes a bit of a misnomer. But not too bad, it
still is a good name. For example, if we wanted we could redefine the
NM_HASH_OBFUSCATE_PTR* macros when building "--with-more-asserts".
Previously we only cared whether supplicant is build with support for
FT. In that case we would pass FT-PSK to supplicant, like
Config: added 'key_mgmt' value 'WPA-PSK WPA-PSK-SHA256 FT-PSK'
Supplicant would then always try FT with preference, regardless whether
the interface/driver support it. That results in a failure to associate, if
the driver does not support it.
NetworkManager[1356]: <info> [1566296144.9940] Config: added 'key_mgmt' value 'WPA-PSK WPA-PSK-SHA256 FT-PSK'
...
wpa_supplicant[1348]: wlan0: WPA: AP key_mgmt 0x42 network profile key_mgmt 0x142; available key_mgmt 0x42
wpa_supplicant[1348]: wlan0: WPA: using KEY_MGMT FT/PSK
...
wpa_supplicant[1348]: * akm=0xfac04
...
kernel: ERROR @wl_set_key_mgmt :
kernel: invalid cipher group (1027076)
Since we pass a list of acceptable "key_mgmt" options to supplicant,
FT-PSK should not be used when supplicant knows it's not supported.
That is a supplicant bug.
Regardless, work around it by checking the per-interface capability, and
avoid it if support is apparently not present.
Up until now, a default-route (with prefix length zero) could not
be configured directly. The user could only set ipv4.gateway,
ipv4.never-default, ipv4.route-metric and ipv4.route-table to influence
the setting of the default-route (respectively for IPv6).
That is a problematic limitation. For one, whether a route has prefix
length zero or non-zero does not make a fundamental difference. Also,
it makes it impossible to configure all the routing attributes that one can
configure otherwise for static routes. For example, the default-route could
not be configured as "onlink", could not have a special MTU, nor could it be
placed in a dedicated routing table.
Fix that by lifting the restriction. Note that "ipv4.never-default" does
not apply to /0 manual routes. Likewise, the previous manners of
configuring default-routes ("ipv4.gateway") don't conflict with manual
default-routes.
Server-side this all the pieces are already in place to accept a default-route
as static routes. This was done by earlier commits like 5c299454b4
('core: rework tracking of gateway/default-route in ip-config').
A long time ago, NMIPRoute would assert that the prefix length is
positive. That was relaxed by commit a2e93f2de4 ('libnm: allow zero
prefix length for NMIPRoute'), already before 1.0.0. Using libnm from
before 1.0.0 would result in assertion failures.
Note that the default-route-metric-penalty based on connectivity
checking applies to all /0 routes, even these static routes. Be they
added due to DHCP, "ipv4.gateway", "ipv4.routes" or "wireguard.peer-routes".
I wonder whether doing that unconditionally is desirable, and maybe
there should be a way to opt-out/opt-in for the entire profile or even
per-routes.
https://bugzilla.redhat.com/show_bug.cgi?id=1714438
Minor refactoring so that there is only a one-line change necessary to
flip the implementation of the "internal" DHCP plugin for IPv4 from
"systemd" to "nettools".
We don't do that yet, because there are still some issues (e.g. the
lease is not persisted for nettools plugin). Eventually we want to
switch, so prepare the code to be almost there.
We have the "internal" DHCP plugin. That's our preferred plugin,
and eventually we may drop all other plugins.
Currently, the "internal" plugin is based on code from systemd-networkd
and implemented in "src/dhcp/nm-dhcp-systemd.c". As this code is forked
we eventually want to switch to nettools' n-dhcp4 library (for IPv4).
For that reason we already have "src/dhcp/nm-dhcp-nettools.c".
Note that "nettools" can be configured as a DHCP plugin, but this configuration
is only experimental and for testing. There is never supposed to be a
"nettools" plugin, but eventually the "internal" plugin will switch
implementation.
We don't want to replace systemd-based implementation right away. Not until
we are sure that nettools works well. For that reason we keep them
both in parallel for a while.
This commit makes "systemd" DHCP plugin explicitly configurable
in NetworkManager.conf. Like "nettools" this is an undocumented option,
only for testing.
If you choose "internal" (the default), you get one of the
implementations (currently the "systemd" one). But by selecting
"systemd" or "nettools" explicitly, you can select the exact plugin.