The _GET_PRIVATE() macros are all implemented based on
_NM_GET_PRIVATE(). That macro tries to be more type safe and uses
_Generic() to do the right thing. Explicitly casting is not only
unnecessary, it defeats these (static) type checks.
Don't do that.
We try to set only one time the MTU from the connection to not
interfere with manual user changes.
If at some point the parent interface changes temporarily MTU to a
lower value (for example, because the connection was reactivated), the
kernel will also lower the MTU on child interface and we will not
update it ever again.
Add a workaround to this. If we detect that the MTU we want to set
from connection is higher that the allowed one, go into a state where
we follow the parent MTU until it is possible to set again the desired
MTU. This is a bit ugly, but I can't think of any nicer way to do it.
https://bugzilla.redhat.com/show_bug.cgi?id=1751079
NMDevice's act_stage1_prepare() now does nothing. Calling it is not
useful and has no effect.
In general, when a subclass overwrites a virtual function, it must be
defined whether the subclass must, may or must-not call the parents
implementation. Likewise, it must be clear when the parents
implementation should be chained: first, as last, does it matter?
In any case, that very much depends on how the parent is implemented
and this can only be solved by documentation and common conventions.
It's a forgiving approach to have a parents implementation do nothing,
then the subclass may call it at any time (or not call it at all).
This is especially useful if classes don't know their parent class well.
But in NetworkManager code the relationship between classes are known
at compile time, so every of these classes knows it derives directly
from NMDevice.
This forgingin approach was what NMDevice's act_stage1_prepare() was doing.
However, it also adds lines of code resulting in a different kind of complexity.
So, it's not clear that this forgiving approach is really better. Note
that it also has a (tiny) runtime and code-size overhead.
Change the expectation of how NMDevice's act_stage1_prepare() should be
called: it is no longer implemented, and subclasses *MUST* not chain up.
We no longer add these. If you use Emacs, configure it yourself.
Also, due to our "smart-tab" usage the editor anyway does a subpar
job handling our tabs. However, on the upside every user can choose
whatever tab-width he/she prefers. If "smart-tabs" are used properly
(like we do), every tab-width will work.
No manual changes, just ran commands:
F=($(git grep -l -e '-\*-'))
sed '1 { /\/\* *-\*- *[mM]ode.*\*\/$/d }' -i "${F[@]}"
sed '1,4 { /^\(#\|--\|dnl\) *-\*- [mM]ode/d }' -i "${F[@]}"
Check remaining lines with:
git grep -e '-\*-'
The ultimate purpose of this is to cleanup our files and eventually use
SPDX license identifiers. For that, first get rid of the boilerplate lines.
If the AddAndActivate() caller didn't explicitely a MAC address, default
to pinpointing the connection to the device by the means of an interface
name. This makes more sense than a MAC address with stable device names.
Platform had it's own scheme for reporting errors: NMPlatformError.
Before, NMPlatformError indicated success via zero, negative integer
values are numbers from <errno.h>, and positive integer values are
platform specific codes. This changes now according to nm-error:
success is still zero. Negative values indicate a failure, where the
numeric value is either from <errno.h> or one of our error codes.
The meaning of positive values depends on the functions. Most functions
can only report an error reason (negative) and success (zero). For such
functions, positive values should never be returned (but the caller
should anticipate them).
For some functions, positive values could mean additional information
(but still success). That depends.
This is also what systemd does, except that systemd only returns
(negative) integers from <errno.h>, while we merge our own error codes
into the range of <errno.h>.
The advantage is to get rid of one way how to signal errors. The other
advantage is, that these error codes are compatible with all other
nm-errno values. For example, previously negative values indicated error
codes from <errno.h>, but it did not entail error codes from netlink.
In the past, the headers "linux/if.h" and "net/if.h" were incompatible.
That means, we can either include one or the other, but not both.
This is fixed in the meantime, however the issue still exists when
building against older kernel/glibc.
That means, including one of these headers from a header file
is problematic. In particular if it's a header like "nm-platform.h",
which itself is dragged in by many other headers.
Avoid that by not including these headers from "platform.h", but instead
from the source files where needed (or possibly from less popular header
files).
Currently there is no problem. However, this allows an unknowing user to
include <net/if.h> at the same time with "nm-platform.h", which is easy
to get wrong.
Note the special error codes NM_UTILS_ERROR_CONNECTION_AVAILABLE_*.
This will be used to determine, whether the profile is fundamentally
incompatible with the device, or whether just some other properties
mismatch. That information will be importand during a plain `nmcli
connection up`, where NetworkManager searches all devices for a device
to activate. If no device is found (and multiple errors happened),
we want to show the error that is most likely relevant for the user.
Also note, how NMDevice's check_connection_compatible() uses the new
class field "device_class->connection_type_check_compatible" to simplify
checks for compatible profiles.
The error reason is still unused.
It seems to me the NM_DEVICE_CLASS_DECLARE_TYPES() macro confuses more
than helping. Let's explicitly initialize the two fields, albeit with
another helper macro NM_DEVICE_DEFINE_LINK_TYPES() to get the list of
link-types right.
For consistency, also leave nop-lines like
device_class->connection_type_supported = NULL;
device_class->link_types = NM_DEVICE_DEFINE_LINK_TYPES ();
because all NMDevice class init methods should have this same
boiler plate code and to make it explicit that this is intended.
And there are only 3 occurences where this actually comes into play.
The majority of device implementations name their parent-class variable
"device_class". That also makes more sense as it is more consistant.
E.g. "parent" sounds like it's the direct parent, but that is not
the crucial point here. The crucial point at this place, is that we
access the NMDeviceClass typed pointer. Rename.
Instead of returning a boolean @is_user_config value from
get_configured_mtu(), return an mtu-source enum with possible values
NONE,CONNECTION. This enum will be expanded later; for now there is no
change in behavior.
NMSettings exposes a cached list of all connection. We don't need
to clone it. Note that this is not save against concurrent modification,
meaning, add/remove of connections in NMSettings will invalidate the
list.
However, it wasn't save against that previously either, because
altough we cloned the container (GSList), we didn't take an additional
reference to the elements.
This is purely a performance optimization, we don't need to clone the
list. Also, since the original list is of type "NMConnection *const*",
use that type insistently, instead of dependent API requiring GSList.
IMO, GSList is anyway not a very nice API for many use cases because
it requires an additional slice allocation for each element. It's
slower, and often less convenient to use.
Previously, we used the generated GDBusInterfaceSkeleton types and glued
them via the NMExportedObject base class to our NM types. We also used
GDBusObjectManagerServer.
Don't do that anymore. The resulting code was more complicated despite (or
because?) using generated classes. It was hard to understand, complex, had
ordering-issues, and had a runtime and memory overhead.
This patch refactors this entirely and uses the lower layer API GDBusConnection
directly. It replaces the generated code, GDBusInterfaceSkeleton, and
GDBusObjectManagerServer. All this is now done by NMDbusObject and NMDBusManager
and static descriptor instances of type GDBusInterfaceInfo.
This adds a net plus of more then 1300 lines of hand written code. I claim
that this implementation is easier to understand. Note that previously we
also required extensive and complex glue code to bind our objects to the
generated skeleton objects. Instead, now glue our objects directly to
GDBusConnection. The result is more immediate and gets rid of layers of
code in between.
Now that the D-Bus glue us more under our control, we can address issus and
bottlenecks better, instead of adding code to bend the generated skeletons
to our needs.
Note that the current implementation now only supports one D-Bus connection.
That was effectively the case already, although there were places (and still are)
where the code pretends it could also support connections from a private socket.
We dropped private socket support mainly because it was unused, untested and
buggy, but also because GDBusObjectManagerServer could not export the same
objects on multiple connections. Now, it would be rather straight forward to
fix that and re-introduce ObjectManager on each private connection. But this
commit doesn't do that yet, and the new code intentionally supports only one
D-Bus connection.
Also, the D-Bus startup was simplified. There is no retry, either nm_dbus_manager_start()
succeeds, or it detects the initrd case. In the initrd case, bus manager never tries to
connect to D-Bus. Since the initrd scenario is not yet used/tested, this is good enough
for the moment. It could be easily extended later, for example with polling whether the
system bus appears (like was done previously). Also, restart of D-Bus daemon isn't
supported either -- just like before.
Note how NMDBusManager now implements the ObjectManager D-Bus interface
directly.
Also, this fixes race issues in the server, by no longer delaying
PropertiesChanged signals. NMExportedObject would collect changed
properties and send the signal out in idle_emit_properties_changed()
on idle. This messes up the ordering of change events w.r.t. other
signals and events on the bus. Note that not only NMExportedObject
messed up the ordering. Also the generated code would hook into
notify() and process change events in and idle handle, exhibiting the
same ordering issue too.
No longer do that. PropertiesChanged signals will be sent right away
by hooking into dispatch_properties_changed(). This means, changing
a property in quick succession will no longer be combined and is
guaranteed to emit signals for each individual state. Quite possibly
we emit now more PropertiesChanged signals then before.
However, we are now able to group a set of changes by using standard
g_object_freeze_notify()/g_object_thaw_notify(). We probably should
make more use of that.
Also, now that our signals are all handled in the right order, we
might find places where we still emit them in the wrong order. But that
is then due to the order in which our GObjects emit signals, not due
to an ill behavior of the D-Bus glue. Possibly we need to identify
such ordering issues and fix them.
Numbers (for contrib/rpm --without debug on x86_64):
- the patch changes the code size of NetworkManager by
- 2809360 bytes
+ 2537528 bytes (-9.7%)
- Runtime measurements are harder because there is a large variance
during testing. In other words, the numbers are not reproducible.
Currently, the implementation performs no caching of GVariants at all,
but it would be rather simple to add it, if that turns out to be
useful.
Anyway, without strong claim, it seems that the new form tends to
perform slightly better. That would be no surprise.
$ time (for i in {1..1000}; do nmcli >/dev/null || break; echo -n .; done)
- real 1m39.355s
+ real 1m37.432s
$ time (for i in {1..2000}; do busctl call org.freedesktop.NetworkManager /org/freedesktop org.freedesktop.DBus.ObjectManager GetManagedObjects > /dev/null || break; echo -n .; done)
- real 0m26.843s
+ real 0m25.281s
- Regarding RSS size, just looking at the processes in similar
conditions, doesn't give a large difference. On my system they
consume about 19MB RSS. It seems that the new version has a
slightly smaller RSS size.
- 19356 RSS
+ 18660 RSS
XXX was used to either raise attention (NOTE) or to indicate
that this is ugly code that should be fixed (FIXME). The usage
was inconsistent.
Let's avoid XXX and use either NOTE or FIXME.
At startup the manager tries to create virtual devices without a
specific order and spits warnings when a device can't be realized
because the parent device is not yet created. These failures are not
something the user should worry about because the creation will be
retried when the parent appears.
A better approach is to return an error code from the device's
create_and_realize() telling that it failed because the parent doesn't
exist. In this way, the manager knows that the device isn't ready and
can avoid printing warning messages.
Change the output of nm_platform_error_to_string() to print the numeric value.
Also, accept a string buffer instead of using an alloca() allocated buffer.
There is still a macro to provide the previous functionality, but it
was ill-suited to call from inside a loop.
Since commit 2b51d3967 "device: merge branch 'th/device-mtu-bgo777251'",
we always set the MTU for certain device types during activation. Even
if the MTU is neither specified via the connection nor other means, like
DHCP.
Revert that change. On activation, if nothing explicitly configures the
MTU, leave it unchanged. This is like what we do with ethernet's
cloned-mac-address, which has a default value "preserve".
So, as last resort the default value for MTU is now 0 (don't change),
instead of depending on the device type.
Note that you also can override the default value in global
configuration via NetworkManager.conf.
This behavior makes sense, because whenever NM actively resets the MTU,
it remembers the previous value and restores it when deactivating
the connection. That wasn't implemented before 2b51d3967, and the
MTU would depend on which connection was previously active. That
is no longer an issue as the MTU gets reset when deactivating.
https://bugzilla.redhat.com/show_bug.cgi?id=1460760
Reduce the use of NM_PLATFORM_GET / nm_platform_get() to get
the platform singleton instance.
For one, this is a step towards supporting namespaces, where we need
to use different NMNetns/NMPlatform instances depending on in which
namespace the device lives.
Also, we should reduce our use of singletons. They are difficult to
coordinate on shutdown. Instead there should be a clear order of
dependencies, expressed by owning a reference to those singelton
instances. We already own a reference to the platform singelton,
so use it and avoid NM_PLATFORM_GET.
(cherry picked from commit 94d9ee129d)
This argument is only relevant when the NMActStageReturn argument
indicates NM_ACT_STAGE_RETURN_FAILURE. In all other cases it is ignored.
Rename the argument to make the meaning clearer. The argument is passed
through several layers of code, it isn't obvious that this argument only
matters for the failure case. Also, the distinct name makes it easier
to distinguish from other uses of the "reason" name.
While at it, do some drive-by cleanup:
- use g_return_*() instead of g_assert() to have a more graceful
assertion.
- functions like dhcp4_start() don't need to return a failure reason.
Most callers don't care, and the caller who does can determine the
proper reason.
- allow omitting the out-argument via NM_SET_OUT().
This allows a user to restore the previous behavior where NetworkManager
would not reconfigure the MTU during device activation, if no MTU is
available (commit "22e8af6 device: set a per-device default MTU on
activation").
Well, not exactly. The previous behavior was to use per-connection
configuration, then DHCP provided value, or finally leave the MTU
unspecified.
Now, we prefer a per-connection configuration, followed by a global
connection default. If "ethernet.mtu=0", the MTU is left unspecified.
In absense of a global connection default, the value from DHCP is used
or finally a per-device-type default. That is effectively 1500 for most
types, except for infiniband where the MTU is still left unspecified.
Instead of overwriting ip4_config_pre_commit(), add a new function
get_mtu().
This also adds a default value in case there is no user-configuration.
This will allow us later to reset a default MTU based on the device
type.
This makes it easier to install the files with proper names.
Also, it makes the makefile rules slightly simpler.
Lastly, the documentation is now generated into docs/api, which makes it
possible to get rid of the awkward relative file names in docbook.
Keep the include paths clean and separate. We use directories to group source
files together. That makes sense (I guess), but then we should use this
grouping also when including files. Thus require to #include files with their
path relative to "src/".
Also, we build various artifacts from the "src/" tree. Instead of having
individual CFLAGS for each artifact in Makefile.am, the CFLAGS should be
unified. Previously, the CFLAGS for each artifact differ and are inconsistent
in which paths they add to the search path. Fix the inconsistency by just
don't add the paths at all.
We only needed proper glib enum types for having properties
and signal arguments. These got all converted to plain int,
so no longer generate such an enum type.
Had to rename "nm-enum-types.h" because it works badly with
"libnm/nm-enum-types.h". Maybe I could fix that differently,
but duplicate names is anyway error prone.
Note that "nm-core-enum-types.h" is already taken too, so
"nm-src-enum-types.h" it is.
An interface would make sense to allow the actual device-factory to inherit
from another type.
However, glib interfaces make code much harder to follow and less
efficient. The device factory shall be a very simple type with meta data
about supported device types and the ability to create device instances.
There is no need to make this an interface implementation, instead just
let the factories inherit from NM_TYPE_DEVICE_FACTORY directly.
- use _NM_GET_PRIVATE() and _NM_GET_PRIVATE_PTR() everywhere.
- reorder statements, to have GObject related functions (init, dispose,
constructed) at the bottom of each file and in a consistent order w.r.t.
each other.
- unify whitespaces in signal and properties declarations.
- use NM_GOBJECT_PROPERTIES_DEFINE() and _notify()
- drop unused signal slots in class structures
- drop unused header files for device factories
Note that usually for infiniband we cannot get a permanent MAC address
via ethtool. Thus, nm_device_get_permanent_hw_address() will return the
current address due to fallback_fake=TRUE.
The infiniband drivers don't implement the rtnetlink link deletions.
Therefore we unrealize the NMDevice instance but the backing resources
stay around, preventing us from ever realizing the device again.
The device creation can be attempted if the name can be determined. It
alone is doesn't mean that there's a parent device -- the name could
just have been hardcoded in the connection.
NetworkManager[21519]: nm_device_get_ifindex: assertion 'NM_IS_DEVICE (self)' failed
Program received signal SIGTRAP, Trace/breakpoint trap.
g_logv (log_domain=0x5555557fb2e5 "NetworkManager", log_level=G_LOG_LEVEL_CRITICAL, format=<optimized out>, args=args@entry=0x7fffffffd3d0) at gmessages.c:1046
1046 g_private_set (&g_log_depth, GUINT_TO_POINTER (depth));
(gdb) bt
#0 0x00007ffff4ec88c3 in g_logv (log_domain=0x5555557fb2e5 "NetworkManager", log_level=G_LOG_LEVEL_CRITICAL, format=<optimized out>, args=args@entry=0x7fffffffd3d0) at gmessages.c:1046
#1 0x00007ffff4ec8a3f in g_log (log_domain=<optimized out>, log_level=<optimized out>, format=<optimized out>) at gmessages.c:1079
#2 0x00005555555d2090 in nm_device_get_ifindex (self=0x0) at devices/nm-device.c:562
#3 0x00005555555ef77a in nm_device_supports_vlans (self=0x0) at devices/nm-device.c:9865
#4 0x00005555555bf2f9 in create_and_realize (device=0x555555c549b0 [NMDeviceVlan], connection=0x555555b451e0, parent=0x0, out_plink=0x7fffffffd5f8, error=0x7fffffffd700) at devices/nm-device-vlan.c:225
#5 0x00005555555d5757 in nm_device_create_and_realize (self=0x555555c549b0 [NMDeviceVlan], connection=0x555555b451e0, parent=0x0, error=0x7fffffffd700) at devices/nm-device.c:1783
#6 0x0000555555688601 in system_create_virtual_device (self=0x555555af51c0 [NMManager], connection=0x555555b451e0) at nm-manager.c:1120
#7 0x000055555568894e in connection_changed (settings=0x555555ae8220 [NMSettings], connection=0x555555b451e0, manager=0x555555af51c0 [NMManager]) at nm-manager.c:1172
#8 0x0000555555693448 in nm_manager_start (self=0x555555af51c0 [NMManager], error=0x7fffffffda30) at nm-manager.c:4466
#9 0x00005555555d166f in main (argc=1, argv=0x7fffffffdba8) at main.c:454
(gdb)
Fixes: 332994f1b1