Run:
./contrib/scripts/nm-code-format.sh -i
./contrib/scripts/nm-code-format.sh -i
Yes, it needs to run twice because the first run doesn't yet produce the
final result.
Signed-off-by: Antonio Cardace <acardace@redhat.com>
Setting "active_slave" fails unless the slave is currently present and
IFF_UP. That complicates the code, because we cannot set the property
at any time, but only under the right circumstances.
But really, "active_slave" option is something for debugging. It's not
an option which should be set by NetworkManager. The right option
instead is "primary", which will tell kernel to make the slave active,
when it is ready.
Deprecate the "active_slave" option and make it an alias for "primary".
https://bugzilla.redhat.com/show_bug.cgi?id=1856640
Code doesn't get simpler by having more functions -- if these functions
are only called once.
What actually is a problem is repeated, redundant code. Like the list of
bond options that can be reapplied. But the function didn't help to
avoid repeating the list.
Add a macro for the list of bond options we are going to set. By seeing
them side-by-side, it is hopefully simpler to see that all options are
specified correctly.
We see that:
- the *_SUBSET defines don't include the options that we are explicitly
setting, that is "mode", "active_slave" and "arp_ip_target".
- OPTIONS_REAPPLY_SUBSET contains 4 options less than OPTIONS_APPLY_SUBSET:
"ad_select", "ad_user_port_key", "lacp_rate" and "tlb_dynamic_lb".
These are the options that are marked as BOND_OPTFLAG_IFDOWN in
kernel.
I guess the idea was to only accept options that can be changed without
taking the interface !IFF_UP. "active_slave" is wrongly omitted from
that list.
Also, "active_slave" option doesn't really make sense for NetworkManager
to configure. Instead "primary" should be used. In the future, we should
re-map the properties and deprecate "active_slave" for "primary" ([1]).
Fixes: 746dc119a6 ('bond: let 'reapply()' reapply all supported options')
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1856640#c19https://bugzilla.redhat.com/show_bug.cgi?id=1876577
Reapply now handles all the options supported by kernel and NM, meaning
that some options are simply not allowed to be set while keeping the
bond up, one of those options is the mode for instance.
https://bugzilla.redhat.com/show_bug.cgi?id=1847814
can_reapply_change() would wrongly return true for
unsupported reapply values because it used 'nm_setting_bond_get_option_default()'
that is ill-named because it returns the overriden option other than
its default value.
https://bugzilla.redhat.com/show_bug.cgi?id=1847814
Fixes: 9bd07336ef ('bond: bond options logic rework')
Kernel will reject setting "active_slave", if the interface is not enslaved or not
up. We already handle that by setting the option whenever we enslave an interface.
However, we also must not set it initially, otherwise we get an ugly error log message:
NetworkManager[939]: <debug> [1594709143.7459] platform-linux: sysctl: setting net:/sys/class/net/bond99/bonding/active_slave to eth1 (current value is )
NetworkManager[939]: <error> [1594709143.7459] platform-linux: sysctl: failed to set bonding/active_slave to eth1: (22) Invalid argument
NetworkManager[939]: <warn> [1594709143.7460] device (bond99): failed to set bonding attribute active_slave to eth1
...
kernel: bond99: (slave eth1): Device is not bonding slave
kernel: bond99: option active_slave: invalid value (eth1)
See-also: https://bugzilla.redhat.com/show_bug.cgi?id=1856640https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/577
- the arp_ip_target option in the settings might not have normalized
IP addresses or duplicates. If there would be duplicates, setting
them twice would fail with EINVAL. Hence, first normalize them
and make them unique.
- if what we want to set is identical to what is already set, don't
do anything.
(cherry picked from commit 6a923a5d57)
Add '_nm_setting_bond_get_option_or_default()' and move all the custom
policies applied by NM for bond options in there.
One such example of a custom policy is to set 'miimon' to 0 (instead of its
default value of 100) if 'arp_interval' is explicitly enabled
and 'miimon' is not.
This means removing every piece of logic from
nm_setting_bond_add_option() which used to clear out 'arp_interval' and
'arp_ip_target' if 'miimon' was set or clear out 'miimon' along with
'downdelay', 'updelay' and 'miimon' if 'arp_interval' was set.
This behaviour is a bug since the kernel allow setting any combination
of this options for bonds and NetworkManager should not limit the user
to do so.
Also use 'set_bond_attr_or_default()' instead of 'set_bond_attr()' as
the former calls '_nm_setting_bond_get_option_or_default()' to implement
the right logic to retrieve bond options according to current bond
configuration.
Add VRF support to the daemon. When the device we are activating is a
VRF or a VRF's slave, put routes in the table specified by the VRF
connection.
Also, introduce a VRF device type in libnm.
NMDevice's act_stage1_prepare() now does nothing. Calling it is not
useful and has no effect.
In general, when a subclass overwrites a virtual function, it must be
defined whether the subclass must, may or must-not call the parents
implementation. Likewise, it must be clear when the parents
implementation should be chained: first, as last, does it matter?
In any case, that very much depends on how the parent is implemented
and this can only be solved by documentation and common conventions.
It's a forgiving approach to have a parents implementation do nothing,
then the subclass may call it at any time (or not call it at all).
This is especially useful if classes don't know their parent class well.
But in NetworkManager code the relationship between classes are known
at compile time, so every of these classes knows it derives directly
from NMDevice.
This forgingin approach was what NMDevice's act_stage1_prepare() was doing.
However, it also adds lines of code resulting in a different kind of complexity.
So, it's not clear that this forgiving approach is really better. Note
that it also has a (tiny) runtime and code-size overhead.
Change the expectation of how NMDevice's act_stage1_prepare() should be
called: it is no longer implemented, and subclasses *MUST* not chain up.
Not all masters type have a platform link and so it's wrong to check
for it to decide whether the slave should be really released. Move the
check to master devices that need it (bond, bridge and team).
OVS ports don't need the check because they don't call to platform to
remove a slave.
https://bugzilla.redhat.com/show_bug.cgi?id=1733709
We no longer add these. If you use Emacs, configure it yourself.
Also, due to our "smart-tab" usage the editor anyway does a subpar
job handling our tabs. However, on the upside every user can choose
whatever tab-width he/she prefers. If "smart-tabs" are used properly
(like we do), every tab-width will work.
No manual changes, just ran commands:
F=($(git grep -l -e '-\*-'))
sed '1 { /\/\* *-\*- *[mM]ode.*\*\/$/d }' -i "${F[@]}"
sed '1,4 { /^\(#\|--\|dnl\) *-\*- [mM]ode/d }' -i "${F[@]}"
Check remaining lines with:
git grep -e '-\*-'
The ultimate purpose of this is to cleanup our files and eventually use
SPDX license identifiers. For that, first get rid of the boilerplate lines.
Platform had it's own scheme for reporting errors: NMPlatformError.
Before, NMPlatformError indicated success via zero, negative integer
values are numbers from <errno.h>, and positive integer values are
platform specific codes. This changes now according to nm-error:
success is still zero. Negative values indicate a failure, where the
numeric value is either from <errno.h> or one of our error codes.
The meaning of positive values depends on the functions. Most functions
can only report an error reason (negative) and success (zero). For such
functions, positive values should never be returned (but the caller
should anticipate them).
For some functions, positive values could mean additional information
(but still success). That depends.
This is also what systemd does, except that systemd only returns
(negative) integers from <errno.h>, while we merge our own error codes
into the range of <errno.h>.
The advantage is to get rid of one way how to signal errors. The other
advantage is, that these error codes are compatible with all other
nm-errno values. For example, previously negative values indicated error
codes from <errno.h>, but it did not entail error codes from netlink.
Note the special error codes NM_UTILS_ERROR_CONNECTION_AVAILABLE_*.
This will be used to determine, whether the profile is fundamentally
incompatible with the device, or whether just some other properties
mismatch. That information will be importand during a plain `nmcli
connection up`, where NetworkManager searches all devices for a device
to activate. If no device is found (and multiple errors happened),
we want to show the error that is most likely relevant for the user.
Also note, how NMDevice's check_connection_compatible() uses the new
class field "device_class->connection_type_check_compatible" to simplify
checks for compatible profiles.
The error reason is still unused.
It seems to me the NM_DEVICE_CLASS_DECLARE_TYPES() macro confuses more
than helping. Let's explicitly initialize the two fields, albeit with
another helper macro NM_DEVICE_DEFINE_LINK_TYPES() to get the list of
link-types right.
For consistency, also leave nop-lines like
device_class->connection_type_supported = NULL;
device_class->link_types = NM_DEVICE_DEFINE_LINK_TYPES ();
because all NMDevice class init methods should have this same
boiler plate code and to make it explicit that this is intended.
And there are only 3 occurences where this actually comes into play.
The majority of device implementations name their parent-class variable
"device_class". That also makes more sense as it is more consistant.
E.g. "parent" sounds like it's the direct parent, but that is not
the crucial point here. The crucial point at this place, is that we
access the NMDeviceClass typed pointer. Rename.
'num_grat_arp' and 'num_unsol_na' are actually the same attribute on
kernel side, so if only 'num_grat_arp' is set in configuration, we
first write its value and then overwrite it with the 'num_unsol_na'
default value (1). Instead, just write one of the two option.
https://bugzilla.redhat.com/show_bug.cgi?id=1591734
Coccinelle:
@@
expression a, b;
@@
-a ? a : b
+a ?: b
Applied with:
spatch --sp-file ternary.cocci --in-place --smpl-spacing --dir .
With some manual adjustments on spots that Cocci didn't catch for
reasons unknown.
Thanks to the marvelous effort of the GNU compiler developer we can now
spare a couple of bits that could be used for more important things,
like this commit message. Standards commitees yet have to catch up.
NMSettings exposes a cached list of all connection. We don't need
to clone it. Note that this is not save against concurrent modification,
meaning, add/remove of connections in NMSettings will invalidate the
list.
However, it wasn't save against that previously either, because
altough we cloned the container (GSList), we didn't take an additional
reference to the elements.
This is purely a performance optimization, we don't need to clone the
list. Also, since the original list is of type "NMConnection *const*",
use that type insistently, instead of dependent API requiring GSList.
IMO, GSList is anyway not a very nice API for many use cases because
it requires an additional slice allocation for each element. It's
slower, and often less convenient to use.
Previously, we used the generated GDBusInterfaceSkeleton types and glued
them via the NMExportedObject base class to our NM types. We also used
GDBusObjectManagerServer.
Don't do that anymore. The resulting code was more complicated despite (or
because?) using generated classes. It was hard to understand, complex, had
ordering-issues, and had a runtime and memory overhead.
This patch refactors this entirely and uses the lower layer API GDBusConnection
directly. It replaces the generated code, GDBusInterfaceSkeleton, and
GDBusObjectManagerServer. All this is now done by NMDbusObject and NMDBusManager
and static descriptor instances of type GDBusInterfaceInfo.
This adds a net plus of more then 1300 lines of hand written code. I claim
that this implementation is easier to understand. Note that previously we
also required extensive and complex glue code to bind our objects to the
generated skeleton objects. Instead, now glue our objects directly to
GDBusConnection. The result is more immediate and gets rid of layers of
code in between.
Now that the D-Bus glue us more under our control, we can address issus and
bottlenecks better, instead of adding code to bend the generated skeletons
to our needs.
Note that the current implementation now only supports one D-Bus connection.
That was effectively the case already, although there were places (and still are)
where the code pretends it could also support connections from a private socket.
We dropped private socket support mainly because it was unused, untested and
buggy, but also because GDBusObjectManagerServer could not export the same
objects on multiple connections. Now, it would be rather straight forward to
fix that and re-introduce ObjectManager on each private connection. But this
commit doesn't do that yet, and the new code intentionally supports only one
D-Bus connection.
Also, the D-Bus startup was simplified. There is no retry, either nm_dbus_manager_start()
succeeds, or it detects the initrd case. In the initrd case, bus manager never tries to
connect to D-Bus. Since the initrd scenario is not yet used/tested, this is good enough
for the moment. It could be easily extended later, for example with polling whether the
system bus appears (like was done previously). Also, restart of D-Bus daemon isn't
supported either -- just like before.
Note how NMDBusManager now implements the ObjectManager D-Bus interface
directly.
Also, this fixes race issues in the server, by no longer delaying
PropertiesChanged signals. NMExportedObject would collect changed
properties and send the signal out in idle_emit_properties_changed()
on idle. This messes up the ordering of change events w.r.t. other
signals and events on the bus. Note that not only NMExportedObject
messed up the ordering. Also the generated code would hook into
notify() and process change events in and idle handle, exhibiting the
same ordering issue too.
No longer do that. PropertiesChanged signals will be sent right away
by hooking into dispatch_properties_changed(). This means, changing
a property in quick succession will no longer be combined and is
guaranteed to emit signals for each individual state. Quite possibly
we emit now more PropertiesChanged signals then before.
However, we are now able to group a set of changes by using standard
g_object_freeze_notify()/g_object_thaw_notify(). We probably should
make more use of that.
Also, now that our signals are all handled in the right order, we
might find places where we still emit them in the wrong order. But that
is then due to the order in which our GObjects emit signals, not due
to an ill behavior of the D-Bus glue. Possibly we need to identify
such ordering issues and fix them.
Numbers (for contrib/rpm --without debug on x86_64):
- the patch changes the code size of NetworkManager by
- 2809360 bytes
+ 2537528 bytes (-9.7%)
- Runtime measurements are harder because there is a large variance
during testing. In other words, the numbers are not reproducible.
Currently, the implementation performs no caching of GVariants at all,
but it would be rather simple to add it, if that turns out to be
useful.
Anyway, without strong claim, it seems that the new form tends to
perform slightly better. That would be no surprise.
$ time (for i in {1..1000}; do nmcli >/dev/null || break; echo -n .; done)
- real 1m39.355s
+ real 1m37.432s
$ time (for i in {1..2000}; do busctl call org.freedesktop.NetworkManager /org/freedesktop org.freedesktop.DBus.ObjectManager GetManagedObjects > /dev/null || break; echo -n .; done)
- real 0m26.843s
+ real 0m25.281s
- Regarding RSS size, just looking at the processes in similar
conditions, doesn't give a large difference. On my system they
consume about 19MB RSS. It seems that the new version has a
slightly smaller RSS size.
- 19356 RSS
+ 18660 RSS
Change the output of nm_platform_error_to_string() to print the numeric value.
Also, accept a string buffer instead of using an alloca() allocated buffer.
There is still a macro to provide the previous functionality, but it
was ill-suited to call from inside a loop.
The default value for miimon, when missing in the setting, is 0 if
arp_interval is != 0, and 100 otherwise. So, when generating a
connection, let's ignore miimon=0 (which means that miimon is
disabled) and accept any other value. Adding miimon=100 does not cause
any harm to the connection assumption.
While at it, slightly improve the code: ignore_if_zero() is not useful
for 'updelay','downdelay','arp_interval' because zero is their default
value, so introduce a new function that checks if the value is the
default (and specially handles 'miimon').
Reported-by: Taketo Kabe <rkabe@vega.pgw.jp>
https://bugzilla.redhat.com/show_bug.cgi?id=1463077
Previously, master device types like bridge, bond, and team
would overwrite is_available() and check_connection_available()
and always return TRUE.
The device already expresses via nm_device_is_master() that it
is of a master kind. Refactor the code, so, instead of having these
device types overwrite is_available() and check_connection_available(),
let the parents implementation react on nm_device_is_master().
There is no change in behavior at all. Instead, the knowledge how to
treat a master device moves from the device implementation to the
parent class.
Don't crash if the bond mode can't be read from sysfs - for example
when the interface disappears. The generated connection will be bogus,
but at that point it doesn't matter because the in-memory connection
will be destroyed.
Fixes: 056a973a4fhttps://bugzilla.redhat.com/show_bug.cgi?id=1459580
Reduce the use of NM_PLATFORM_GET / nm_platform_get() to get
the platform singleton instance.
For one, this is a step towards supporting namespaces, where we need
to use different NMNetns/NMPlatform instances depending on in which
namespace the device lives.
Also, we should reduce our use of singletons. They are difficult to
coordinate on shutdown. Instead there should be a clear order of
dependencies, expressed by owning a reference to those singelton
instances. We already own a reference to the platform singelton,
so use it and avoid NM_PLATFORM_GET.
(cherry picked from commit 94d9ee129d)
src/devices/nm-device-bond.c: In function 'check_changed_options':
src/devices/nm-device-bond.c:529:4: error: 'name' may be used uninitialized in this function [-Werror=maybe-uninitialized]
g_set_error (error,
^
src/devices/nm-device-bond.c:505:14: note: 'name' was declared here
const char *name, *value_a, *value_b;
^
src/devices/nm-device-bond.c:528:8: error: 'value_a' may be used uninitialized in this function [-Werror=maybe-uninitialized]
if (!nm_streq0 (value_a, value_b)) {
^
src/devices/nm-device-bond.c:505:21: note: 'value_a' was declared here
const char *name, *value_a, *value_b;
^
(cherry picked from commit f66de1dd0f)