Platform had it's own scheme for reporting errors: NMPlatformError.
Before, NMPlatformError indicated success via zero, negative integer
values are numbers from <errno.h>, and positive integer values are
platform specific codes. This changes now according to nm-error:
success is still zero. Negative values indicate a failure, where the
numeric value is either from <errno.h> or one of our error codes.
The meaning of positive values depends on the functions. Most functions
can only report an error reason (negative) and success (zero). For such
functions, positive values should never be returned (but the caller
should anticipate them).
For some functions, positive values could mean additional information
(but still success). That depends.
This is also what systemd does, except that systemd only returns
(negative) integers from <errno.h>, while we merge our own error codes
into the range of <errno.h>.
The advantage is to get rid of one way how to signal errors. The other
advantage is, that these error codes are compatible with all other
nm-errno values. For example, previously negative values indicated error
codes from <errno.h>, but it did not entail error codes from netlink.
In the past, the headers "linux/if.h" and "net/if.h" were incompatible.
That means, we can either include one or the other, but not both.
This is fixed in the meantime, however the issue still exists when
building against older kernel/glibc.
That means, including one of these headers from a header file
is problematic. In particular if it's a header like "nm-platform.h",
which itself is dragged in by many other headers.
Avoid that by not including these headers from "platform.h", but instead
from the source files where needed (or possibly from less popular header
files).
Currently there is no problem. However, this allows an unknowing user to
include <net/if.h> at the same time with "nm-platform.h", which is easy
to get wrong.
Add a helper function nm_device_parent_find_for_connection() to
unify implementations of setting the parent in update_connection().
There is some change in behavior, in particular for nm-device-vlan.c,
which no longer compares the link information from platform. But
update_connection() is anyway a questionable concept, only used
for external assumed connection (which itself, is questionable). Meaning,
update_connection() is a hack not science, and it's not at all clear
what the correct behavior is.
Also, note how vlan's implementation differs from all others. Why?
Should we always resort to also check the information from platform?
Either way, one of the two approaches should be used consistently and
nm_device_parent_find_for_connection() opts to not consult platform
cache.
Note the special error codes NM_UTILS_ERROR_CONNECTION_AVAILABLE_*.
This will be used to determine, whether the profile is fundamentally
incompatible with the device, or whether just some other properties
mismatch. That information will be importand during a plain `nmcli
connection up`, where NetworkManager searches all devices for a device
to activate. If no device is found (and multiple errors happened),
we want to show the error that is most likely relevant for the user.
Also note, how NMDevice's check_connection_compatible() uses the new
class field "device_class->connection_type_check_compatible" to simplify
checks for compatible profiles.
The error reason is still unused.
It seems to me the NM_DEVICE_CLASS_DECLARE_TYPES() macro confuses more
than helping. Let's explicitly initialize the two fields, albeit with
another helper macro NM_DEVICE_DEFINE_LINK_TYPES() to get the list of
link-types right.
For consistency, also leave nop-lines like
device_class->connection_type_supported = NULL;
device_class->link_types = NM_DEVICE_DEFINE_LINK_TYPES ();
because all NMDevice class init methods should have this same
boiler plate code and to make it explicit that this is intended.
And there are only 3 occurences where this actually comes into play.
NMSettings exposes a cached list of all connection. We don't need
to clone it. Note that this is not save against concurrent modification,
meaning, add/remove of connections in NMSettings will invalidate the
list.
However, it wasn't save against that previously either, because
altough we cloned the container (GSList), we didn't take an additional
reference to the elements.
This is purely a performance optimization, we don't need to clone the
list. Also, since the original list is of type "NMConnection *const*",
use that type insistently, instead of dependent API requiring GSList.
IMO, GSList is anyway not a very nice API for many use cases because
it requires an additional slice allocation for each element. It's
slower, and often less convenient to use.
Previously, we used the generated GDBusInterfaceSkeleton types and glued
them via the NMExportedObject base class to our NM types. We also used
GDBusObjectManagerServer.
Don't do that anymore. The resulting code was more complicated despite (or
because?) using generated classes. It was hard to understand, complex, had
ordering-issues, and had a runtime and memory overhead.
This patch refactors this entirely and uses the lower layer API GDBusConnection
directly. It replaces the generated code, GDBusInterfaceSkeleton, and
GDBusObjectManagerServer. All this is now done by NMDbusObject and NMDBusManager
and static descriptor instances of type GDBusInterfaceInfo.
This adds a net plus of more then 1300 lines of hand written code. I claim
that this implementation is easier to understand. Note that previously we
also required extensive and complex glue code to bind our objects to the
generated skeleton objects. Instead, now glue our objects directly to
GDBusConnection. The result is more immediate and gets rid of layers of
code in between.
Now that the D-Bus glue us more under our control, we can address issus and
bottlenecks better, instead of adding code to bend the generated skeletons
to our needs.
Note that the current implementation now only supports one D-Bus connection.
That was effectively the case already, although there were places (and still are)
where the code pretends it could also support connections from a private socket.
We dropped private socket support mainly because it was unused, untested and
buggy, but also because GDBusObjectManagerServer could not export the same
objects on multiple connections. Now, it would be rather straight forward to
fix that and re-introduce ObjectManager on each private connection. But this
commit doesn't do that yet, and the new code intentionally supports only one
D-Bus connection.
Also, the D-Bus startup was simplified. There is no retry, either nm_dbus_manager_start()
succeeds, or it detects the initrd case. In the initrd case, bus manager never tries to
connect to D-Bus. Since the initrd scenario is not yet used/tested, this is good enough
for the moment. It could be easily extended later, for example with polling whether the
system bus appears (like was done previously). Also, restart of D-Bus daemon isn't
supported either -- just like before.
Note how NMDBusManager now implements the ObjectManager D-Bus interface
directly.
Also, this fixes race issues in the server, by no longer delaying
PropertiesChanged signals. NMExportedObject would collect changed
properties and send the signal out in idle_emit_properties_changed()
on idle. This messes up the ordering of change events w.r.t. other
signals and events on the bus. Note that not only NMExportedObject
messed up the ordering. Also the generated code would hook into
notify() and process change events in and idle handle, exhibiting the
same ordering issue too.
No longer do that. PropertiesChanged signals will be sent right away
by hooking into dispatch_properties_changed(). This means, changing
a property in quick succession will no longer be combined and is
guaranteed to emit signals for each individual state. Quite possibly
we emit now more PropertiesChanged signals then before.
However, we are now able to group a set of changes by using standard
g_object_freeze_notify()/g_object_thaw_notify(). We probably should
make more use of that.
Also, now that our signals are all handled in the right order, we
might find places where we still emit them in the wrong order. But that
is then due to the order in which our GObjects emit signals, not due
to an ill behavior of the D-Bus glue. Possibly we need to identify
such ordering issues and fix them.
Numbers (for contrib/rpm --without debug on x86_64):
- the patch changes the code size of NetworkManager by
- 2809360 bytes
+ 2537528 bytes (-9.7%)
- Runtime measurements are harder because there is a large variance
during testing. In other words, the numbers are not reproducible.
Currently, the implementation performs no caching of GVariants at all,
but it would be rather simple to add it, if that turns out to be
useful.
Anyway, without strong claim, it seems that the new form tends to
perform slightly better. That would be no surprise.
$ time (for i in {1..1000}; do nmcli >/dev/null || break; echo -n .; done)
- real 1m39.355s
+ real 1m37.432s
$ time (for i in {1..2000}; do busctl call org.freedesktop.NetworkManager /org/freedesktop org.freedesktop.DBus.ObjectManager GetManagedObjects > /dev/null || break; echo -n .; done)
- real 0m26.843s
+ real 0m25.281s
- Regarding RSS size, just looking at the processes in similar
conditions, doesn't give a large difference. On my system they
consume about 19MB RSS. It seems that the new version has a
slightly smaller RSS size.
- 19356 RSS
+ 18660 RSS
The change doesn't really make a difference. I thought it would, so I
did it. But turns out (as the code correctly assumes), while the
notifications are frozen, it's OK to leave the property still in an
inconsistent state while emitting the notify signal.
Still, it feels slightly more correct this way, so keep the change.
At startup the manager tries to create virtual devices without a
specific order and spits warnings when a device can't be realized
because the parent device is not yet created. These failures are not
something the user should worry about because the creation will be
retried when the parent appears.
A better approach is to return an error code from the device's
create_and_realize() telling that it failed because the parent doesn't
exist. In this way, the manager knows that the device isn't ready and
can avoid printing warning messages.
Change the output of nm_platform_error_to_string() to print the numeric value.
Also, accept a string buffer instead of using an alloca() allocated buffer.
There is still a macro to provide the previous functionality, but it
was ill-suited to call from inside a loop.
Reduce the use of NM_PLATFORM_GET / nm_platform_get() to get
the platform singleton instance.
For one, this is a step towards supporting namespaces, where we need
to use different NMNetns/NMPlatform instances depending on in which
namespace the device lives.
Also, we should reduce our use of singletons. They are difficult to
coordinate on shutdown. Instead there should be a clear order of
dependencies, expressed by owning a reference to those singelton
instances. We already own a reference to the platform singelton,
so use it and avoid NM_PLATFORM_GET.
(cherry picked from commit 94d9ee129d)
The state-change of a device has a reason argument, which is mostly for information
only.
There are many places in code that are the source of a state-reason.
Mostly these are calls to:
- nm_device_state_changed()
- nm_device_queue_state()
- nm_device_queue_recheck_available()
- nm_device_set_unmanaged_by_*()
- nm_device_master_release_one_slave()
- nm_device_ip_method_failed()
- nm_modem_emit_prepare_result()
- nm_modem_emit_ppp_failed()
- nm_manager_deactivate_connection()
- NM_SET_OUT (out_failure_reason, NM_DEVICE_STATE_REASON_*);
However, there are a few places in code that look at the reason
to decide how to proceed. I think this is a bad pattern, because
cause and effect are decoupled and it gets hard to understand where
a certain reason is set and what consequences that has.
Add a nop-function nm_device_state_reason_check() to mark all uses
of the device state reason that derive decisions from it. That is,
highlight the "effect" part.
This argument is only relevant when the NMActStageReturn argument
indicates NM_ACT_STAGE_RETURN_FAILURE. In all other cases it is ignored.
Rename the argument to make the meaning clearer. The argument is passed
through several layers of code, it isn't obvious that this argument only
matters for the failure case. Also, the distinct name makes it easier
to distinguish from other uses of the "reason" name.
While at it, do some drive-by cleanup:
- use g_return_*() instead of g_assert() to have a more graceful
assertion.
- functions like dhcp4_start() don't need to return a failure reason.
Most callers don't care, and the caller who does can determine the
proper reason.
- allow omitting the out-argument via NM_SET_OUT().
Instead of overwriting ip4_config_pre_commit(), add a new function
get_mtu().
This also adds a default value in case there is no user-configuration.
This will allow us later to reset a default MTU based on the device
type.
Most implementations of realize_start_notify() do the same
for link_changed().
Let NMDevice's base implementation of realize_start_notify() call
link_changed() -- which by default does notthing. This allows subclasses
to only overwrite link_changed().
This makes it easier to install the files with proper names.
Also, it makes the makefile rules slightly simpler.
Lastly, the documentation is now generated into docs/api, which makes it
possible to get rid of the awkward relative file names in docbook.
Keep the include paths clean and separate. We use directories to group source
files together. That makes sense (I guess), but then we should use this
grouping also when including files. Thus require to #include files with their
path relative to "src/".
Also, we build various artifacts from the "src/" tree. Instead of having
individual CFLAGS for each artifact in Makefile.am, the CFLAGS should be
unified. Previously, the CFLAGS for each artifact differ and are inconsistent
in which paths they add to the search path. Fix the inconsistency by just
don't add the paths at all.
An interface would make sense to allow the actual device-factory to inherit
from another type.
However, glib interfaces make code much harder to follow and less
efficient. The device factory shall be a very simple type with meta data
about supported device types and the ability to create device instances.
There is no need to make this an interface implementation, instead just
let the factories inherit from NM_TYPE_DEVICE_FACTORY directly.
- use _NM_GET_PRIVATE() and _NM_GET_PRIVATE_PTR() everywhere.
- reorder statements, to have GObject related functions (init, dispose,
constructed) at the bottom of each file and in a consistent order w.r.t.
each other.
- unify whitespaces in signal and properties declarations.
- use NM_GOBJECT_PROPERTIES_DEFINE() and _notify()
- drop unused signal slots in class structures
- drop unused header files for device factories
This was added by commit 4de8851eca, probably
by copying from NMDeviceVlan. It's not clear why a netlink request to
set the device IFF_UP would fail, or why that warrants a retry.
Instead of letting different subclasses call reset in their
virtual deactivate() function, do it in the parent class.
This works nicely, because the parent know whether the MAC
address is currently modified.
When a user want to explicitly spoof the MAC address, a failure
to do so should fail activation. For one, failing to do so may
be a security problem. In any case, if user asks to configure the
interface in a certain way and we fail to do so that shall result
in a failure to activate.
Extend the "ethernet.cloned-mac-address" and "wifi.cloned-mac-address"
settings. Instead of specifying an explicit MAC address, the additional
special values "permanent", "preserve", "random", "random-bia", "stable" and
"stable-bia" are supported.
"permanent" means to use the permanent hardware address. Previously that
was the default if no explict cloned-mac-address was set. The default is
thus still "permanent", but it can be overwritten by global
configuration.
"preserve" means not to configure the MAC address when activating the
device. That was actually the default behavior before introducing MAC
address handling with commit 1b49f941a6.
"random" and "random-bia" use a randomized MAC address for each
connection. "stable" and "stable-bia" use a generated, stable
address based on some token. The "bia" suffix says to generate a
burned-in address. The stable method by default uses as token the
connection UUID, but the token can be explicitly choosen via
"stable:<TOKEN>" and "stable-bia:<TOKEN>".
On a D-Bus level, the "cloned-mac-address" is a bytestring and thus
cannot express the new forms. It is replaced by the new
"assigned-mac-address" field. For the GObject property, libnm's API,
nmcli, keyfile, etc. the old name "cloned-mac-address" is still used.
Deprecating the old field seems more complicated then just extending
the use of the existing "cloned-mac-address" field, although the name
doesn't match well with the extended meaning.
There is some overlap with the "wifi.mac-address-randomization" setting.
https://bugzilla.gnome.org/show_bug.cgi?id=705545https://bugzilla.gnome.org/show_bug.cgi?id=708820https://bugzilla.gnome.org/show_bug.cgi?id=758301
VLAN and MACVLAN devices consider an ethernet.mac-address setting
to find the parent device. This setting shall be the permanent MAC
address of the device, not the current.
When the entire NMSettingWired setting is missing, it should be treated
exactly the same as each property having the default/unset value.
Otherwise, adding a NMSettingWired setting only to set (say) MTU,
would result in different behavior. Although effectively the
"cloned-mac-address" shall be in both cases the same.
Instead of accessing the singleton getter nm_settings_get(), obtain
the settings instance from the device instance itself via
nm_device_get_settings().
The device creation can be attempted if the name can be determined. It
alone is doesn't mean that there's a parent device -- the name could
just have been hardcoded in the connection.
NetworkManager[21519]: nm_device_get_ifindex: assertion 'NM_IS_DEVICE (self)' failed
Program received signal SIGTRAP, Trace/breakpoint trap.
g_logv (log_domain=0x5555557fb2e5 "NetworkManager", log_level=G_LOG_LEVEL_CRITICAL, format=<optimized out>, args=args@entry=0x7fffffffd3d0) at gmessages.c:1046
1046 g_private_set (&g_log_depth, GUINT_TO_POINTER (depth));
(gdb) bt
#0 0x00007ffff4ec88c3 in g_logv (log_domain=0x5555557fb2e5 "NetworkManager", log_level=G_LOG_LEVEL_CRITICAL, format=<optimized out>, args=args@entry=0x7fffffffd3d0) at gmessages.c:1046
#1 0x00007ffff4ec8a3f in g_log (log_domain=<optimized out>, log_level=<optimized out>, format=<optimized out>) at gmessages.c:1079
#2 0x00005555555d2090 in nm_device_get_ifindex (self=0x0) at devices/nm-device.c:562
#3 0x00005555555ef77a in nm_device_supports_vlans (self=0x0) at devices/nm-device.c:9865
#4 0x00005555555bf2f9 in create_and_realize (device=0x555555c549b0 [NMDeviceVlan], connection=0x555555b451e0, parent=0x0, out_plink=0x7fffffffd5f8, error=0x7fffffffd700) at devices/nm-device-vlan.c:225
#5 0x00005555555d5757 in nm_device_create_and_realize (self=0x555555c549b0 [NMDeviceVlan], connection=0x555555b451e0, parent=0x0, error=0x7fffffffd700) at devices/nm-device.c:1783
#6 0x0000555555688601 in system_create_virtual_device (self=0x555555af51c0 [NMManager], connection=0x555555b451e0) at nm-manager.c:1120
#7 0x000055555568894e in connection_changed (settings=0x555555ae8220 [NMSettings], connection=0x555555b451e0, manager=0x555555af51c0 [NMManager]) at nm-manager.c:1172
#8 0x0000555555693448 in nm_manager_start (self=0x555555af51c0 [NMManager], error=0x7fffffffda30) at nm-manager.c:4466
#9 0x00005555555d166f in main (argc=1, argv=0x7fffffffdba8) at main.c:454
(gdb)
Fixes: 332994f1b1
Don't rely on what's already on the device. It could be that the MAC address
set on the device is not meaningful -- the NM crashed while two devices were
teamed together and now they have the same hardware address and now it's
impossible to bond them with mode=5.
- All internal source files (except "examples", which are not internal)
should include "config.h" first. As also all internal source
files should include "nm-default.h", let "config.h" be included
by "nm-default.h" and include "nm-default.h" as first in every
source file.
We already wanted to include "nm-default.h" before other headers
because it might contains some fixes (like "nm-glib.h" compatibility)
that is required first.
- After including "nm-default.h", we optinally allow for including the
corresponding header file for the source file at hand. The idea
is to ensure that each header file is self contained.
- Don't include "config.h" or "nm-default.h" in any header file
(except "nm-sd-adapt.h"). Public headers anyway must not include
these headers, and internal headers are never included after
"nm-default.h", as of the first previous point.
- Include all internal headers with quotes instead of angle brackets.
In practice it doesn't matter, because in our public headers we must
include other headers with angle brackets. As we use our public
headers also to compile our interal source files, effectively the
result must be the same. Still do it for consistency.
- Except for <config.h> itself. Include it with angle brackets as suggested by
https://www.gnu.org/software/autoconf/manual/autoconf.html#Configuration-Headers