Note the special error codes NM_UTILS_ERROR_CONNECTION_AVAILABLE_*.
This will be used to determine, whether the profile is fundamentally
incompatible with the device, or whether just some other properties
mismatch. That information will be importand during a plain `nmcli
connection up`, where NetworkManager searches all devices for a device
to activate. If no device is found (and multiple errors happened),
we want to show the error that is most likely relevant for the user.
Also note, how NMDevice's check_connection_compatible() uses the new
class field "device_class->connection_type_check_compatible" to simplify
checks for compatible profiles.
The error reason is still unused.
It seems to me the NM_DEVICE_CLASS_DECLARE_TYPES() macro confuses more
than helping. Let's explicitly initialize the two fields, albeit with
another helper macro NM_DEVICE_DEFINE_LINK_TYPES() to get the list of
link-types right.
For consistency, also leave nop-lines like
device_class->connection_type_supported = NULL;
device_class->link_types = NM_DEVICE_DEFINE_LINK_TYPES ();
because all NMDevice class init methods should have this same
boiler plate code and to make it explicit that this is intended.
And there are only 3 occurences where this actually comes into play.
The majority of device implementations name their parent-class variable
"device_class". That also makes more sense as it is more consistant.
E.g. "parent" sounds like it's the direct parent, but that is not
the crucial point here. The crucial point at this place, is that we
access the NMDeviceClass typed pointer. Rename.
We commonly don't use the glib typedefs for char/short/int/long,
but their C types directly.
$ git grep '\<g\(char\|short\|int\|long\|float\|double\)\>' | wc -l
587
$ git grep '\<\(char\|short\|int\|long\|float\|double\)\>' | wc -l
21114
One could argue that using the glib typedefs is preferable in
public API (of our glib based libnm library) or where it clearly
is related to glib, like during
g_object_set (obj, PROPERTY, (gint) value, NULL);
However, that argument does not seem strong, because in practice we don't
follow that argument today, and seldomly use the glib typedefs.
Also, the style guide for this would be hard to formalize, because
"using them where clearly related to a glib" is a very loose suggestion.
Also note that glib typedefs will always just be typedefs of the
underlying C types. There is no danger of glib changing the meaning
of these typedefs (because that would be a major API break of glib).
A simple style guide is instead: don't use these typedefs.
No manual actions, I only ran the bash script:
FILES=($(git ls-files '*.[hc]'))
sed -i \
-e 's/\<g\(char\|short\|int\|long\|float\|double\)\>\( [^ ]\)/\1\2/g' \
-e 's/\<g\(char\|short\|int\|long\|float\|double\)\> /\1 /g' \
-e 's/\<g\(char\|short\|int\|long\|float\|double\)\>/\1/g' \
"${FILES[@]}"
NMSettings exposes a cached list of all connection. We don't need
to clone it. Note that this is not save against concurrent modification,
meaning, add/remove of connections in NMSettings will invalidate the
list.
However, it wasn't save against that previously either, because
altough we cloned the container (GSList), we didn't take an additional
reference to the elements.
This is purely a performance optimization, we don't need to clone the
list. Also, since the original list is of type "NMConnection *const*",
use that type insistently, instead of dependent API requiring GSList.
IMO, GSList is anyway not a very nice API for many use cases because
it requires an additional slice allocation for each element. It's
slower, and often less convenient to use.
Previously, we used the generated GDBusInterfaceSkeleton types and glued
them via the NMExportedObject base class to our NM types. We also used
GDBusObjectManagerServer.
Don't do that anymore. The resulting code was more complicated despite (or
because?) using generated classes. It was hard to understand, complex, had
ordering-issues, and had a runtime and memory overhead.
This patch refactors this entirely and uses the lower layer API GDBusConnection
directly. It replaces the generated code, GDBusInterfaceSkeleton, and
GDBusObjectManagerServer. All this is now done by NMDbusObject and NMDBusManager
and static descriptor instances of type GDBusInterfaceInfo.
This adds a net plus of more then 1300 lines of hand written code. I claim
that this implementation is easier to understand. Note that previously we
also required extensive and complex glue code to bind our objects to the
generated skeleton objects. Instead, now glue our objects directly to
GDBusConnection. The result is more immediate and gets rid of layers of
code in between.
Now that the D-Bus glue us more under our control, we can address issus and
bottlenecks better, instead of adding code to bend the generated skeletons
to our needs.
Note that the current implementation now only supports one D-Bus connection.
That was effectively the case already, although there were places (and still are)
where the code pretends it could also support connections from a private socket.
We dropped private socket support mainly because it was unused, untested and
buggy, but also because GDBusObjectManagerServer could not export the same
objects on multiple connections. Now, it would be rather straight forward to
fix that and re-introduce ObjectManager on each private connection. But this
commit doesn't do that yet, and the new code intentionally supports only one
D-Bus connection.
Also, the D-Bus startup was simplified. There is no retry, either nm_dbus_manager_start()
succeeds, or it detects the initrd case. In the initrd case, bus manager never tries to
connect to D-Bus. Since the initrd scenario is not yet used/tested, this is good enough
for the moment. It could be easily extended later, for example with polling whether the
system bus appears (like was done previously). Also, restart of D-Bus daemon isn't
supported either -- just like before.
Note how NMDBusManager now implements the ObjectManager D-Bus interface
directly.
Also, this fixes race issues in the server, by no longer delaying
PropertiesChanged signals. NMExportedObject would collect changed
properties and send the signal out in idle_emit_properties_changed()
on idle. This messes up the ordering of change events w.r.t. other
signals and events on the bus. Note that not only NMExportedObject
messed up the ordering. Also the generated code would hook into
notify() and process change events in and idle handle, exhibiting the
same ordering issue too.
No longer do that. PropertiesChanged signals will be sent right away
by hooking into dispatch_properties_changed(). This means, changing
a property in quick succession will no longer be combined and is
guaranteed to emit signals for each individual state. Quite possibly
we emit now more PropertiesChanged signals then before.
However, we are now able to group a set of changes by using standard
g_object_freeze_notify()/g_object_thaw_notify(). We probably should
make more use of that.
Also, now that our signals are all handled in the right order, we
might find places where we still emit them in the wrong order. But that
is then due to the order in which our GObjects emit signals, not due
to an ill behavior of the D-Bus glue. Possibly we need to identify
such ordering issues and fix them.
Numbers (for contrib/rpm --without debug on x86_64):
- the patch changes the code size of NetworkManager by
- 2809360 bytes
+ 2537528 bytes (-9.7%)
- Runtime measurements are harder because there is a large variance
during testing. In other words, the numbers are not reproducible.
Currently, the implementation performs no caching of GVariants at all,
but it would be rather simple to add it, if that turns out to be
useful.
Anyway, without strong claim, it seems that the new form tends to
perform slightly better. That would be no surprise.
$ time (for i in {1..1000}; do nmcli >/dev/null || break; echo -n .; done)
- real 1m39.355s
+ real 1m37.432s
$ time (for i in {1..2000}; do busctl call org.freedesktop.NetworkManager /org/freedesktop org.freedesktop.DBus.ObjectManager GetManagedObjects > /dev/null || break; echo -n .; done)
- real 0m26.843s
+ real 0m25.281s
- Regarding RSS size, just looking at the processes in similar
conditions, doesn't give a large difference. On my system they
consume about 19MB RSS. It seems that the new version has a
slightly smaller RSS size.
- 19356 RSS
+ 18660 RSS
We also unconditionally use them with autotools.
Also, the detection for have_version_script does
not seem correct to me. At least, it didn't work
with clang.
Some targets are missing dependencies on some generated sources in
the meson port. These makes the build to fail due to missing source
files on a highly parallelized build.
These dependencies have been resolved by taking advantage of meson's
internal dependencies which can be used to pass source files,
include directories, libraries and compiler flags.
One of such internal dependencies called `core_dep` was already in
use. However, in order to avoid any confusion with another new
internal dependency called `nm_core_dep`, which is used to include
directories and source files from the `libnm-core` directory, the
`core_dep` dependency has been renamed to `nm_dep`.
These changes have allowed minimizing the build details which are
inherited by using those dependencies. The parallelized build has
also been improved.
teamd uses a PID file to guarantee a single instance is running for
each device. If we spawn a new teamd process without waiting the
termination of the existing one, the new process can fail:
<debug> [1486191713.2530] kill child process 'teamd' (2676): wait for process to terminate after sending SIGTERM (15) (send SIGKILL in 2000 milliseconds)...
...
<debug> [1486191713.2539] device[0x7f737f5d7c40] (team1): running: /usr/bin/teamd -o -n -U -D -N -t team1 -c {"runner": {"name": "activebackup"}} -gg
Using team device "team1".
Using PID file "/var/run/teamd/team1.pid"
This program is not intended to be run as root.
Daemon already running on PID 2676.
Failed: File exists
To avoid this, keep track that a kill is in progress and postpone the
start of teamd.
https://bugzilla.redhat.com/show_bug.cgi?id=1415641
Change the output of nm_platform_error_to_string() to print the numeric value.
Also, accept a string buffer instead of using an alloca() allocated buffer.
There is still a macro to provide the previous functionality, but it
was ill-suited to call from inside a loop.
Previously, master device types like bridge, bond, and team
would overwrite is_available() and check_connection_available()
and always return TRUE.
The device already expresses via nm_device_is_master() that it
is of a master kind. Refactor the code, so, instead of having these
device types overwrite is_available() and check_connection_available(),
let the parents implementation react on nm_device_is_master().
There is no change in behavior at all. Instead, the knowledge how to
treat a master device moves from the device implementation to the
parent class.
Teamd is not happy about them and would fail anyway. Worse even, if we
json_loads() such a JSON, which is precisely what happens when we inject the
"hwaddr" key, we turn bad JSON into a good one in a lossy matter. Not good.
https://bugzilla.redhat.com/show_bug.cgi?id=1455130
Reduce the use of NM_PLATFORM_GET / nm_platform_get() to get
the platform singleton instance.
For one, this is a step towards supporting namespaces, where we need
to use different NMNetns/NMPlatform instances depending on in which
namespace the device lives.
Also, we should reduce our use of singletons. They are difficult to
coordinate on shutdown. Instead there should be a clear order of
dependencies, expressed by owning a reference to those singelton
instances. We already own a reference to the platform singelton,
so use it and avoid NM_PLATFORM_GET.
(cherry picked from commit 94d9ee129d)
When deciding whether to touch a device we sometimes look at whether
the active connection is external/assumed. In many cases however,
there is no active connection around (e.g. while moving the device
from state unmanaged to disconnected before assuming).
So in most cases we instead look at the device-state-reason to decide
whether to touch the interface (see nm_device_state_reason_check()).
Often it's desirable to have no state and passing data as function
arguments. However, the state reason has to be passed along several hops
(e.g. a queued state change). Or a change to a master/slave can affect
the slave/master, where we pass on the state reason. Or an intermediate
event might invalidate a previous state reason. Passing the state
whether to touch a device or not as a state-reason is cumbersome
and limited.
Instead, the device should be aware of whats going on. Add a
sys-iface-state with:
- SYS_IFACE_STATE_EXTERNAL: meaning, NM should not touch it
- SYS_IFACE_STATE_ASSUME: meaning, NM is gracefully taking over
- SYS_IFACE_STATE_MANAGED: meaning, the device is managed by NM
- SYS_IFACE_STATE_REMOVED: the device no longer exists
This replaces most checks of nm_device_state_reason_check() and
nm_active_connection_get_activation_type() by instead looking at
the sys-iface-state of the device.
This patch probably has still issues, but the previous behavior was
not very clear either. We will need to identify those issues in future
tests and tweak the behavior. At least, now there is one flag that
describes how to behave.
nm_device_uses_assumed_connection() basically called
nm_active_connection_get_assumed() on the device.
Rename those functions to be closer to the activation-type
flags.
The concepts of "assume", "external", and "assume_or_external"
will make sense with the following commits.
The only reliable way of setting a MAC address for the team is through
the "hwaddr" property in the configuration passed to teamd. In order
to rewrite the configuration we need Jansson support; since it is
already a requirement for teamd, let the team plugin depend on it.
This argument is only relevant when the NMActStageReturn argument
indicates NM_ACT_STAGE_RETURN_FAILURE. In all other cases it is ignored.
Rename the argument to make the meaning clearer. The argument is passed
through several layers of code, it isn't obvious that this argument only
matters for the failure case. Also, the distinct name makes it easier
to distinguish from other uses of the "reason" name.
While at it, do some drive-by cleanup:
- use g_return_*() instead of g_assert() to have a more graceful
assertion.
- functions like dhcp4_start() don't need to return a failure reason.
Most callers don't care, and the caller who does can determine the
proper reason.
- allow omitting the out-argument via NM_SET_OUT().
Instead of overwriting ip4_config_pre_commit(), add a new function
get_mtu().
This also adds a default value in case there is no user-configuration.
This will allow us later to reset a default MTU based on the device
type.
This makes it easier to install the files with proper names.
Also, it makes the makefile rules slightly simpler.
Lastly, the documentation is now generated into docs/api, which makes it
possible to get rid of the awkward relative file names in docbook.
Keep the include paths clean and separate. We use directories to group source
files together. That makes sense (I guess), but then we should use this
grouping also when including files. Thus require to #include files with their
path relative to "src/".
Also, we build various artifacts from the "src/" tree. Instead of having
individual CFLAGS for each artifact in Makefile.am, the CFLAGS should be
unified. Previously, the CFLAGS for each artifact differ and are inconsistent
in which paths they add to the search path. Fix the inconsistency by just
don't add the paths at all.
With systemd < 219, restarting the journald service closes the stdout
and stderr streams associated with services.
The NM process has SIGPIPE ignored, but g_spawn_sync()/g_spawn_async()
re-enable it and so any child executed with those functions will
terminate by default if it tries to log anything to stdout/stderr.
The teamd instance launched by NM is affected by this problem since it
writes debug messages before actually ignoring SIGPIPE.
To fix this, use the @child_setup callback of g_spawn() to ignore
again SIGPIPE in the child process.
https://bugzilla.redhat.com/show_bug.cgi?id=1393853
We only needed proper glib enum types for having properties
and signal arguments. These got all converted to plain int,
so no longer generate such an enum type.
Had to rename "nm-enum-types.h" because it works badly with
"libnm/nm-enum-types.h". Maybe I could fix that differently,
but duplicate names is anyway error prone.
Note that "nm-core-enum-types.h" is already taken too, so
"nm-src-enum-types.h" it is.
We should enable tests by default, probably we even should drop
the configure flags to enable tests and just always build them.
Anyway, at this point there is no use in guarding check-local
with a check for ENABLE_TESTS. A user who does't want to run
the tests, should just not call `make check`.
An interface would make sense to allow the actual device-factory to inherit
from another type.
However, glib interfaces make code much harder to follow and less
efficient. The device factory shall be a very simple type with meta data
about supported device types and the ability to create device instances.
There is no need to make this an interface implementation, instead just
let the factories inherit from NM_TYPE_DEVICE_FACTORY directly.
First of all, G_LOG_DOMAIN only matters when using g_log() directly.
Inside core, we always want to log via nm-logging. Every call to a
g_log() is a bug in the first place (like a failed assertion that logs
a g_critical() during g_return_if_fail()).
So, for all practic purposes, the logging domain is not used.
For nm-logging, the G_LOG_DOMAIN has no effect. Unless we find a proper
use of this domain, G_LOG_DOMAIN should not differ from what the rest of
core.
- use _NM_GET_PRIVATE() and _NM_GET_PRIVATE_PTR() everywhere.
- reorder statements, to have GObject related functions (init, dispose,
constructed) at the bottom of each file and in a consistent order w.r.t.
each other.
- unify whitespaces in signal and properties declarations.
- use NM_GOBJECT_PROPERTIES_DEFINE() and _notify()
- drop unused signal slots in class structures
- drop unused header files for device factories
- don't include "nm-default.h" in header files. Every source file must
include as first header "nm-default.h", thus our headers get the
default include already implicitly.
- we don't support compiling NetworkManager itself with a C++ compiler. Remove
G_BEGIN_DECLS/G_END_DECLS from internal headers. We do however support
users of libnm to use C++, thus they stay in public headers.
(cherry picked from commit f19aff8909)
Unfortunately teamd doesn't have an asynchronous way to notify a
change in the actual configuration, so when a port is enslaved or
released we wait some time for the changes to take effect and read the
configuration again.
https://bugzilla.redhat.com/show_bug.cgi?id=1310435
Request the actual configuration when reading it from teamd. The
actual configuration, differently from the normal one, doesn't contain
non-active team ports.
If the read of teamd configuration failed (possibly due to a timeout),
fail the connection immediately where possible instead of letting it
continue and risk to block again at the next read.
https://bugzilla.redhat.com/show_bug.cgi?id=1257237
When a device is activated any existing teamd instance is killed. But
since commit 28274495d6 ("device/team: always try to connect to
teamd in update_connection()") the disappearing of the D-Bus name
owner always triggers an automatic restart of the instance in
teamd_dbus_vanished() if the name was previously owned. This new
instance conflicts with the instance we're about to start.
Instead, restore the previous behavior of restarting teamd only if
there is an activation in progress and use @tdc as a flag. This also
means that update_connection() should not modify the value of @tdc.
Fixes: 28274495d6