Clean up logging to always print a "block-autoconnect:" prefix on related
lines. Also, make sure that a line gets logged wherever the state
changes. Also, for devcon data, print both the interface and the
profile.
We only have a few blocked reasons. Some of them can only be set on the
devcon data, and some only on the settings connection. Assert that we
don't mix that up.
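A minimal sketch of that split, assuming hypothetical reason names and an assumed assignment of reasons to each object (the real enum in NetworkManager differs). Each setter asserts that it only receives reasons from its own mask, and reports whether the state changed, which is when the caller would log a "block-autoconnect:" line.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical blocked-reason flags: names and the split between the two
 * objects are illustrative only. */
enum {
    BLOCKED_USER_REQUEST = 1u << 0, /* settings connection only (assumed) */
    BLOCKED_NO_SECRETS   = 1u << 1, /* settings connection only (assumed) */
    BLOCKED_FAILED       = 1u << 2, /* devcon data only (assumed) */
};

#define BLOCKED_MASK_SETTINGS (BLOCKED_USER_REQUEST | BLOCKED_NO_SECRETS)
#define BLOCKED_MASK_DEVCON   (BLOCKED_FAILED)

typedef struct { unsigned blocked; } DevConData;
typedef struct { unsigned blocked; } SettingsConnection;

/* Set or clear @reason in *@blocked. Returns true if the state changed. */
static bool
blocked_update(unsigned *blocked, unsigned reason, bool set)
{
    unsigned new_blocked = set ? (*blocked | reason) : (*blocked & ~reason);

    if (new_blocked == *blocked)
        return false;
    *blocked = new_blocked;
    return true;
}

static bool
devcon_set_blocked(DevConData *devcon, unsigned reason, bool set)
{
    /* Passing a settings-only reason here would be a bug: assert it. */
    assert(reason != 0 && (reason & ~(unsigned) BLOCKED_MASK_DEVCON) == 0);
    return blocked_update(&devcon->blocked, reason, set);
}

static bool
settings_set_blocked(SettingsConnection *sett, unsigned reason, bool set)
{
    /* Passing a devcon-only reason here would be a bug: assert it. */
    assert(reason != 0 && (reason & ~(unsigned) BLOCKED_MASK_SETTINGS) == 0);
    return blocked_update(&sett->blocked, reason, set);
}
```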
Add per-port priority support for bond active port re-selection during
failover. A higher number means a higher priority in selection. The
primary port still has the highest priority. This option is only
compatible with the active-backup, balance-tlb and balance-alb modes.
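The selection order above can be sketched as follows, assuming a simplified, hypothetical port structure (this is not the kernel's bonding code): a usable primary port always wins; otherwise the usable port with the highest prio is chosen. Keeping the earlier port on a tie is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of a bond port. */
typedef struct {
    const char *name;
    bool        link_up;
    bool        is_primary;
    int         prio;       /* higher number = higher priority */
} BondPort;

/* Return the port to make active on failover, or NULL if none is usable. */
static const BondPort *
select_active_port(const BondPort *ports, size_t n)
{
    const BondPort *best = NULL;

    assert(ports != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        const BondPort *p = &ports[i];

        if (!p->link_up)
            continue;
        if (p->is_primary)
            return p;               /* the primary always has the highest priority */
        if (!best || p->prio > best->prio)
            best = p;               /* strictly greater: ties keep the earlier port */
    }
    return best;
}
```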
sysfs is deprecated, and the kernel will not add new bond port options to
sysfs. Netlink is a stable API and therefore the right way to
communicate with the kernel in order to set the link options.
Previously, logging happened when the value did not change. Log
instead when the value changes.
Fixes: 86bb09c93b ('dns: generate correct search domain for hostnames on non-public TLD')
dns-manager uses the Mozilla Public Suffix List to determine an
appropriate search domain when generating /etc/resolv.conf. It is
presumed that if the hostname is "example.com", the user does not want
to automatically search "com" for unqualified hostnames, which is
reasonable. To implement that, prior to the fix, domain_is_valid()
implicitly used the PSL "prevailing star rule", which had the
consequence of assuming that any top-level domain (TLD) is public
whether it is on the official suffix list or not. That meant
"example.local" or "example.localdomain" would not result in searching
"local" or "localdomain" respectively, but rather /etc/resolv.conf would
contain the full hostname "example.local" as the search domain and not
give users what they expect. The fix here uses the newer PSL API
function that allows us to turn off the "prevailing star rule" so that
"local" and "localdomain" are NOT considered public TLDs because they
are not literally on the suffix list. That in turn gives us the search
domain "local" or "localdomain" in /etc/resolv.conf and allows
unqualified hostname lookups (e.g., "resolvectl query example") to find
example.local, while example.com still maintains the previous behavior
(i.e., search domain of "example.com" rather than "com").
[thaller@redhat.com: reworded commit message]
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/issues/1281
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1613
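To make the effect of the "prevailing star rule" concrete, here is a sketch that does not link against libpsl and instead uses a tiny hard-coded suffix list (an assumption; the real list is far larger and has wildcard and exception rules). It also simplifies the search-domain derivation to "take the parent of the hostname". In libpsl itself, the star rule can be turned off by passing the PSL_TYPE_NO_STAR_RULE flag to psl_is_public_suffix2().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Tiny stand-in for the Public Suffix List (illustrative only). */
static const char *const suffix_list[] = { "com", "org", "co.uk" };

static bool
is_public_suffix(const char *domain, bool prevailing_star_rule)
{
    for (size_t i = 0; i < sizeof(suffix_list) / sizeof(suffix_list[0]); i++) {
        if (strcmp(domain, suffix_list[i]) == 0)
            return true;
    }
    /* With the star rule, any unlisted single label (a TLD) counts as public. */
    return prevailing_star_rule && strchr(domain, '.') == NULL;
}

/* Derive a search domain from @hostname: its parent domain, unless the
 * parent is a public suffix, in which case the full hostname is used. */
static const char *
search_domain_for(const char *hostname, bool prevailing_star_rule)
{
    const char *parent = strchr(hostname, '.');

    assert(parent != NULL); /* this sketch expects a qualified hostname */
    parent++;
    return is_public_suffix(parent, prevailing_star_rule) ? hostname : parent;
}
```

With the star rule, "example.local" yields the search domain "example.local"; without it, "local". "example.com" yields "example.com" either way.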
Before commit a42682d44f ('device: take reference to device object
before 'delete_on_deactivate''), we used a weak pointer to track the
idle action.
As we now use a strong reference, we can store all data about the idle
action in NMDevice itself. Drop DeleteOnDeactivateData.
NMDevice holds a reference to NMManager, which holds a reference to NMPolicy.
So it is not possible that we try to dispose NMPolicy while there are still devices
registered. That would be a bug that we need to find and solve
differently. Add an assertion instead of trying to handle it.
Add an assertion to nm_policy_device_recheck_auto_activate_schedule()
that the device is currently registered in NMPolicy. Calling it otherwise
would be odd, and likely a bug.
But if we only register the auto-activate check while the device is registered, we
don't need to take an additional reference. We know that the object must
be alive (also, we have assertions that in fact it is still alive).
Hook the information for tracking the activation of a device to the
NMDevice itself. Sure, that couples NMPolicy slightly more closely to
NMDevice, but the result is still simpler code because we don't need a
separate ActivateData.
It also means we can immediately tell whether the auto activation check
for NMDevice is already scheduled and don't need to search through the
list.
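A sketch of keeping the scheduling state in the device itself, using hypothetical simplified types: Policy, Device and policy_device_recheck_auto_activate_schedule() only mimic the roles of NMPolicy, NMDevice and nm_policy_device_recheck_auto_activate_schedule(), and the counter stands in for an idle source ID. Checking "already scheduled?" becomes a field lookup, and the registration assertion from above is included.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Policy Policy;

/* Hypothetical, simplified device: scheduling state lives here. */
typedef struct {
    const char *iface;
    Policy     *registered_policy; /* non-NULL while registered in a Policy */
    unsigned    auto_activate_id;  /* 0 means "no check scheduled" */
} Device;

struct Policy {
    unsigned next_id;              /* stand-in for idle source IDs (hypothetical) */
};

static void
policy_device_recheck_auto_activate_schedule(Policy *policy, Device *device)
{
    /* Scheduling for a device that is not registered would be a bug. */
    assert(device->registered_policy == policy);

    if (device->auto_activate_id != 0)
        return;                    /* already scheduled: no list search needed */

    device->auto_activate_id = ++policy->next_id;
}
```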
NMPolicy really should be merged into NMManager. It has no clear responsibility,
so having two separate objects only makes things confusing. Anyway, it
is permissible to look up the NMPolicy instance of an NMManager. Add an accessor.
It's the better name. Especially since there is no more signal involved,
the term "emit" doesn't match.
Note also how the previous approach using a signal tried to abstract
what is happening. So we were no longer rechecking-autoconnect; instead,
we were emitting-a-signal-to-recheck-autoconnect. Just be plain about
what it is doing and don't go through a signal layer.
GObject signals don't make the code easier to understand; on the
contrary. They may have their purpose when objects truly must/should
not be aware of each other and need to be composed very loosely. That
is not the case here.
There really is only one subscriber to the NM_DEVICE_RECHECK_AUTO_ACTIVATE
signal, and it only makes sense this way. Instead of going through a
signal invocation, just call the well-known method directly. It becomes
clearer who calls this code (and it has lower overhead).
When using cscope/ctags, it is also easier to follow the code, because the
tools understand function calls.
The delete_on_deactivate_link_delete() handler may be called after the
device was already removed from NMManager. Don't allow that.
Check whether the device is still exported on D-Bus as an indication.
The NM_reboot_openvswitch_vlan_configuration_var2 test exposes a race.
The test creates OVS profiles and repeatedly restarts
NetworkManager, checking that those profiles autoconnect and the OVS
configuration gets created.
There is a race where:
- the OVS interface exists, and an NMDeviceOvsInterface is created.
- first, ovsdb cleans up old interfaces, sending a JSON request.
- OVS deletes the interface, and NetworkManager first picks up the
platform signal (this part is racy: usually the ovsdb request
completes first, which cleans up the NMDeviceOvsInterface in
a different way).
- when the device gets unrealized, we don't schedule a
check-autoactivate, so the device stays down.
See https://bugzilla.redhat.com/show_bug.cgi?id=2152864#c5 for a log
file with more details.
What should happen instead is that the OVS interface autoactivates, which
then also fully configures the port and bridge interfaces.
Explicitly schedule an autoactivate when unrealizing devices.
Note that there are now several cases where NetworkManager autoconnects
more eagerly. This even affects some CI tests and user-visible behavior.
But I think relying on "just don't call nm_device_emit_recheck_auto_activate()"
in the hope that autoconnect doesn't happen is wrong. It must always be
possible to trigger an autoconnect check, and the right thing must
happen. We only avoid triggering autoconnect checks *all* the time because
it would be a waste of CPU resources, but whenever we even slightly suspect
that an autoconnect may happen, we must be allowed to trigger a check.
If a device is in a condition where it previously did not autoconnect,
and it also *should* not autoconnect, then we need to fix the code that
evaluates whether an autoconnect may happen (not avoid triggering a
check).
https://bugzilla.redhat.com/show_bug.cgi?id=2152864
Fixes-test: @NM_reboot_openvswitch_vlan_configuration_var2
Currently, when we delete a device, autoconnect does not kick in
right away. But that is only because we happen not to schedule an
"autoactivate" recheck.
What should happen is that rechecking whether to autoconnect is
always allowed, and that we have the necessary state to know that
autoconnect currently should not happen.
Instead, block autoconnect of the involved profile. That makes sense,
because clearly we don't want to autoconnect right again after `nmcli
device delete $iface`.
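The principle from the last two messages, as a sketch with hypothetical types and names (not NetworkManager's API): triggering a recheck is always harmless because the decision is made from state, and deleting a device records a block on the profile instead of relying on nobody calling the recheck.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified profile and device state. */
typedef struct {
    const char *id;
    bool        autoconnect;
    bool        blocked_by_delete; /* set on "nmcli device delete" (illustrative) */
} Profile;

typedef struct {
    bool     active;
    Profile *profile;
} Device;

/* The decision lives here and depends only on state. */
static bool
can_autoconnect(const Device *dev)
{
    assert(dev->profile != NULL);
    return !dev->active && dev->profile->autoconnect && !dev->profile->blocked_by_delete;
}

/* Rechecking is always allowed; calling it "too often" only costs CPU. */
static void
recheck_auto_activate(Device *dev)
{
    if (can_autoconnect(dev))
        dev->active = true;
}

static void
device_delete(Device *dev)
{
    dev->active = false;
    /* Record that the profile must not autoconnect right away. */
    dev->profile->blocked_by_delete = true;
}
```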
The "connection.stable-id" supports placeholders like "${CONNECTION}" or
"${DEVICE}".
The stable-id can also be specified in global connection defaults in
NetworkManager.conf, by leaving it unset in the profile. Global
connection defaults always follow the pattern that they correspond to a
per-profile property, and only when the per-profile value indicates a
special default/unset value is the global connection default consulted.
Finally, if the global connection default is also not configured in
NetworkManager.conf, a built-in default is used (which may not be
constant either; for example, ipv6.ip6-privacy's built-in default depends
on a sysctl value).
In any case, every configuration that can be achieved should be
configurable both per-profile and via a global connection default. That
was not the case for the stable-id, because the built-in default generated
an ID in a way that could not be expressed explicitly otherwise.
So you could not:
- explicitly set the per-profile value to the built-in default, to prevent
the global connection default from overwriting it.
- explicitly set the global connection default to the built-in default,
to prevent a lower-priority [connection*] section from overwriting the
stable-id again.
Fix that inconsistency to make it possible to explicitly set the
built-in default.
Change the behavior for the literal value "default${CONNECTION}" and make it
behave as the built-in default. Also document that the built-in default has
that value.
It's unlikely that this breaks an existing configuration, but of course,
if any user configured "connection.stable-id=default${CONNECTION}", then
the behavior changes for them.
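A sketch of the resolution order described above, with hypothetical names; NULL stands for "unset", multiple [connection*] sections are not modeled, and placeholder expansion is left out. It shows that the literal built-in value behaves as the built-in default wherever it is configured, so a profile can pin it and keep a global default from overriding it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define STABLE_ID_BUILTIN "default${CONNECTION}"

typedef enum {
    STABLE_ID_FROM_PROFILE,
    STABLE_ID_FROM_GLOBAL_DEFAULT,
    STABLE_ID_BUILTIN_DEFAULT,
} StableIdSource;

/* Pick the stable-id template: per-profile value, else global connection
 * default, else built-in default. NULL means "unset". */
static StableIdSource
stable_id_resolve(const char *per_profile, const char *global_default, const char **out_template)
{
    const char    *tmpl;
    StableIdSource src;

    assert(out_template != NULL);

    if (per_profile) {
        tmpl = per_profile;
        src  = STABLE_ID_FROM_PROFILE;
    } else if (global_default) {
        tmpl = global_default;
        src  = STABLE_ID_FROM_GLOBAL_DEFAULT;
    } else {
        tmpl = STABLE_ID_BUILTIN;
        src  = STABLE_ID_BUILTIN_DEFAULT;
    }

    /* The literal built-in value means "use the built-in default". */
    if (strcmp(tmpl, STABLE_ID_BUILTIN) == 0)
        src = STABLE_ID_BUILTIN_DEFAULT;

    *out_template = tmpl;
    return src;
}
```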
Using the ppp code is rather ugly.
Historically, the pppd headers don't follow a good naming convention,
and define things that cause conflicts with our headers:
/usr/include/pppd/patchlevel.h:#define VERSION "2.4.9"
/usr/include/pppd/pppd.h:typedef unsigned char bool;
Hence we had to include the pppd headers in a certain order and be
careful.
ppp 2.5 changes the API and cleans that up. But since we also need to support
old versions, it does not immediately simplify anything.
Only include "pppd" headers in "nm-pppd-compat.c" and expose a wrapper
API from "nm-pppd-compat.h". The purpose is that "nm-pppd-compat.h"
exposes clean names, while all the handling of ppp is in the source
file.
This change does the following:
* Add nm-pppd-compat.h to mask details regarding different
versions of pppd.
* Fix nm-pppd-plugin.c regarding API differences between
2.4.9 (current) and the latest pppd 2.5.0 in the master branch.
* Additional fixes to configure.ac to appropriately set defines used
for compilation.
Older versions of iproute2 don't support the "enclimit" argument. Work
around that in the unit tests.
Fixes: 1505ca3626 ('platform/tests: ip6gre & ip6gretap test cases (ip6 tunnel flags)')
If the client was waiting for IPv6 DAD to complete and the lease was
updated or lost, `wait_ipv6_dad` needs to be cleared; otherwise, at
the next platform change the client will try to evaluate the DAD state
with a different or no lease. In particular if there is no lease the
client will try to decline it because there are no valid addresses,
leading to an assertion failure:
../src/core/dhcp/nm-dhcp-client.c:997:_dhcp_client_decline: assertion failed: (l3cd)
Backtrace:
__GI_raise ()
__GI_abort ()
g_assertion_message ()
g_assertion_message_expr ()
_dhcp_client_decline (self=0x1af13b0, l3cd=0x0, error_message=0x8e25e1 "DAD failed", error=0x7ffec2c45cb0) at ../src/core/dhcp/nm-dhcp-client.c:997
l3_cfg_notify_cb (l3cfg=0x1bc47f0, notify_data=0x7ffec2c46c60, self=0x1af13b0) at ../src/core/dhcp/nm-dhcp-client.c:1190
g_closure_invoke ()
g_signal_emit_valist ()
g_signal_emit ()
_nm_l3cfg_emit_signal_notify () at ../src/core/nm-l3cfg.c:629
_nm_l3cfg_notify_platform_change_on_idle () at ../src/core/nm-l3cfg.c:1390
_platform_signal_on_idle_cb () at ../src/core/nm-netns.c:411
g_idle_dispatch ()
Fixes: 393bc628ff ('dhcp: wait DAD completion for DHCPv6 addresses')
https://bugzilla.redhat.com/show_bug.cgi?id=2179890
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1594
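A sketch of the fix with a hypothetical, simplified client structure (the real NMDhcpClient state and callbacks differ): changing or losing the lease clears the pending DAD wait, so a later platform change no longer tries to decline a lease that does not exist. If the new lease needs DAD again, the caller would set the flag anew.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified DHCP client state. */
typedef struct {
    const void *lease;         /* NULL when there is no lease */
    bool        wait_ipv6_dad; /* waiting for DAD on the lease's addresses */
    int         declines;      /* counts decline attempts, for illustration */
} DhcpClient;

static void
client_decline(DhcpClient *c)
{
    assert(c->lease != NULL);  /* mirrors the "assertion failed: (l3cd)" */
    c->declines++;
}

static void
client_set_lease(DhcpClient *c, const void *lease)
{
    c->lease = lease;
    /* The pending DAD wait referred to the old lease: drop it. */
    c->wait_ipv6_dad = false;
}

/* Called on platform changes; @dad_failed stands in for the DAD evaluation. */
static void
client_on_platform_change(DhcpClient *c, bool dad_failed)
{
    if (!c->wait_ipv6_dad)
        return;
    c->wait_ipv6_dad = false;
    if (dad_failed)
        client_decline(c);
}
```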
<error> is mostly about "really should not happen" scenarios. It's
closer to an assertion failure, something that should never happen in
NetworkManager.
Of course, things can go wrong, but <warn> is sufficient. When ovsdb
sends unexpected data, it's just a warning. At least, that's
also what all the similar cases in "nm-ovsdb.c" already do.
GSocketConnection/GOutputStream/GInputStream seem rather unnecessary.
Maybe they make sense when you want to write portable code (for
Windows). Otherwise, watching a file descriptor and reading/writing it
directly is simpler (and also more efficient).
For example, we passed no GCancellable to g_input_stream_read_async().
What does that mean w.r.t. destroying the NMOvsdb instance? I suspect
it's wrong, but it's hard to say, because there are so many layers of
code.
Note that we keep state in NMOvsdb anyway, namely the data we want to
send (output_buf) and the data we partially received (input_buf). All we
need are poll notifications when the file descriptor is ready. To
those, we hook up the read/write callbacks. The code before was also
async, with callbacks when the read/write was done. That does
not simplify the code in any way.
- we no longer use separate NMOvsdbPrivate.buf and NMOvsdbPrivate.input
buffers. There is just a NMOvsdbPrivate.input_buf that we can fill
directly.
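A sketch of reading directly into a single input buffer once the file descriptor is readable. Assumptions: the Conn type and conn_on_readable() are hypothetical, and poll() with a zero timeout stands in for the main-loop fd watch that NetworkManager would hook the read callback to.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical connection state with one input buffer. */
typedef struct {
    int    fd;
    char   input_buf[4096];
    size_t input_len;
} Conn;

/* Append whatever is readable right now to input_buf, without blocking.
 * Returns the bytes read; 0 if nothing is ready, the buffer is full, or
 * on EOF; -1 on error. */
static ssize_t
conn_on_readable(Conn *c)
{
    struct pollfd pfd = { .fd = c->fd, .events = POLLIN };
    size_t        space;
    ssize_t       n;
    int           r;

    assert(c->input_len <= sizeof(c->input_buf));

    r = poll(&pfd, 1, 0);
    if (r < 0)
        return -1;
    if (r == 0 || !(pfd.revents & (POLLIN | POLLHUP)))
        return 0;

    space = sizeof(c->input_buf) - c->input_len;
    if (space == 0)
        return 0; /* the caller must parse and drain the buffer first */

    do {
        n = read(c->fd, c->input_buf + c->input_len, space);
    } while (n < 0 && errno == EINTR);

    if (n > 0)
        c->input_len += (size_t) n;
    return n;
}
```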
The "priv->bufp" offset is only used while parsing a message at a time.
It's unnecessary to track it in NMOvsdbPrivate and keep it between
parsing messages. Tracking the state in NMOvsdbPrivate makes it more
complicated to understand, because one needs to reason at which times
the state is used (when it really is not used).
Also, move the parsing to a separate function.
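A sketch of parsing with a local offset in a separate function, assuming a hypothetical parse_input() helper and, for simplicity, newline-terminated messages (OVSDB speaks JSON-RPC, so a real parser has to find the end of each JSON value instead). The offset only lives for the duration of one call; a trailing partial message is moved to the front of the buffer for the next read.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void (*MessageCb)(const char *msg, size_t len, void *user_data);

/* Parse all complete messages in @buf (*@len bytes), invoking @cb (may be
 * NULL) for each, and keep only the unparsed remainder. Returns the number
 * of messages parsed. */
static size_t
parse_input(char *buf, size_t *len, MessageCb cb, void *user_data)
{
    size_t offset = 0; /* local: not needed between calls */
    size_t n_msgs = 0;

    assert(buf != NULL && len != NULL);

    for (;;) {
        const char *end = memchr(buf + offset, '\n', *len - offset);

        if (!end)
            break;
        if (cb)
            cb(buf + offset, (size_t) (end - (buf + offset)), user_data);
        offset = (size_t) (end - buf) + 1;
        n_msgs++;
    }

    /* Keep the partial message for the next read. */
    memmove(buf, buf + offset, *len - offset);
    *len -= offset;
    return n_msgs;
}
```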
We did not initialize "child_stderr". If that were necessary, we would need
to add it too. However, it is clearly not necessary to initialize those fields.