It's enough that all code paths in impl_ppp_manager_set_ifindex() log exactly
one message. Also, give all messages the same prefix, so that it's clear where
they come from.
(cherry picked from commit 2a45c32e8c)
In src/ppp/nm-pppd-plugin.c, it seems that pppd can invoke
phasechange(PHASE_RUNNING:) multiple times. Hence, the plugin
calls SetIfindex multiple times too. In nm-ppp-manager.c, we
want to make sure that the ifindex does not change after it
was set once. However, calling SetIfindex with the same ifindex
is not something worth warning. Just log a debug message and nothing.
Maybe the plugin should remember that it already set the ifindex,
and avoid multiple D-Bus calls. But it's unclear that that is desired.
For now, just downgrade the warning.
(cherry picked from commit 4a4439835d)
When unplugging an USB 3G modem device, pppd does not exit correctly and
we have the following traces:
Sep 10 07:58:24.616465 ModemManager[1158]: <info> (tty/ttyUSB0): released by device '/sys/devices/pci0000:00/0000:00:1c.0/0000:01:00.0/usb4/4-1'
Sep 10 07:58:24.620314 pppd[2292]: Modem hangup
Sep 10 07:58:24.621368 ModemManager[1158]: <info> (tty/ttyUSB1): released by device '/sys/devices/pci0000:00/0000:00:1c.0/0000:01:00.0/usb4/4-1'
Sep 10 07:58:24.621835 ModemManager[1158]: <warn> (ttyUSB1): could not re-acquire serial port lock: (5) Input/output error
Sep 10 07:58:24.621358 NetworkManager[1871]: <debug> ppp-manager: set-ifindex 4
Sep 10 07:58:24.621369 NetworkManager[1871]: <warn> ppp-manager: can't change the ifindex from 4 to 4
Sep 10 07:58:24.623982 NetworkManager[1871]: <info> device (ttyUSB0): state change: activated -> unmanaged (reason 'removed', sys-iface-state: 'removed')
Sep 10 07:58:24.624411 NetworkManager[1871]: <debug> kill child process 'pppd' (2292): wait for process to terminate after sending SIGTERM (15) (send SIGKILL in 1500 milliseconds)...
Sep 10 07:58:24.624440 NetworkManager[1871]: <debug> modem-broadband[ttyUSB0]: notifying ModemManager about the modem disconnection
Sep 10 07:58:24.626591 NetworkManager[1871]: <debug> modem-broadband[ttyUSB0]: notifying ModemManager about the modem disconnection
Sep 10 07:58:24.681016 NetworkManager[1871]: <warn> modem-broadband[ttyUSB0]: failed to disconnect modem: GDBus.Error:org.freedesktop.DBus.Error.UnknownMethod: No such interface 'org.freedesktop.ModemManager1.Modem.Simple' on object at path /org/freedesktop/ModemManager1/Modem/0
Sep 10 07:58:26.126817 NetworkManager[1871]: <debug> kill child process 'pppd' (2292): process not terminated after 1502368 usec. Sending SIGKILL signal
Sep 10 07:58:26.128121 NetworkManager[1871]: <info> device (ppp0): state change: disconnected -> unmanaged (reason 'unmanaged', sys-iface-state: 'removed')
Sep 10 07:58:26.135571 NetworkManager[1871]: <debug> kill child process 'pppd' (2292): terminated by signal 9 (1511158 usec elapsed)
This is due to nm-ppp-plugin waiting on SetIfIndex call until timeout,
which is longer than termination process timeout.
Calling g_dbus_method_invocation_return_value() on error fixes this.
Fixes: dd98ada33fhttps://mail.gnome.org/archives/networkmanager-list/2018-September/msg00010.html
(cherry picked from commit e66e4d0e71)
1) the command line gets shorter. I frequently run `make V=1` to see
the command line arguments for the compiler, and there is a lot
of noise.
2) define each of these variables at one place. This makes it easy
to verify that for all compilation units, a particular
define has the same value. Previously that was not obvious or
even not the case (see commit e5d1a71396
and commit d63cf1ef2f).
The point is to avoid redundancy.
3) not all compilation units need all defines. In fact, most modules
would only need a few of these defines. We aimed to pass the necessary
minium of defines to each compilation unit, but that was non-obvious
to get right and often we set a define that wasn't used. See for example
"src_settings_plugins_ibft_cppflags" which needlessly had "-DSYSCONFDIR".
This question is now entirely avoided by just defining all variables in
a header. We don't care to find the minimum, because every component
gets anyway all defines from the header.
4) this also avoids the situation, where a module that previously did
not use a particular define gets modified to require it. Previously,
that would have required to identify the missing define, and add
it to the CFLAGS of the complation unit. Since every compilation
now includes "config-extra.h", all defines are available everywhere.
5) the fact that each define is now available in all compilation units
could be perceived as a downside. But it isn't, because these defines
should have a unique name and one specific value. Defining the same
name with different values, or refer to the same value by different
names is a bug, not a desirable feature. Since these defines should
be unique accross the entire tree, there is no problem in providing
them to every compilation unit.
6) the reason why we generate "config-extra.h" this way, instead of using
AC_DEFINE() in configure.ac, is due to the particular handling of
autoconf for directory variables. See [1].
With meson, it would be trivial to put them into "config.h.meson".
While that is not easy with autoconf, the "config-extra.h" workaround
seems still preferable to me.
[1] https://www.gnu.org/software/autoconf/manual/autoconf-2.63/html_node/Installation-Directory-Variables.html
We commonly don't use the glib typedefs for char/short/int/long,
but their C types directly.
$ git grep '\<g\(char\|short\|int\|long\|float\|double\)\>' | wc -l
587
$ git grep '\<\(char\|short\|int\|long\|float\|double\)\>' | wc -l
21114
One could argue that using the glib typedefs is preferable in
public API (of our glib based libnm library) or where it clearly
is related to glib, like during
g_object_set (obj, PROPERTY, (gint) value, NULL);
However, that argument does not seem strong, because in practice we don't
follow that argument today, and seldomly use the glib typedefs.
Also, the style guide for this would be hard to formalize, because
"using them where clearly related to a glib" is a very loose suggestion.
Also note that glib typedefs will always just be typedefs of the
underlying C types. There is no danger of glib changing the meaning
of these typedefs (because that would be a major API break of glib).
A simple style guide is instead: don't use these typedefs.
No manual actions, I only ran the bash script:
FILES=($(git ls-files '*.[hc]'))
sed -i \
-e 's/\<g\(char\|short\|int\|long\|float\|double\)\>\( [^ ]\)/\1\2/g' \
-e 's/\<g\(char\|short\|int\|long\|float\|double\)\> /\1 /g' \
-e 's/\<g\(char\|short\|int\|long\|float\|double\)\>/\1/g' \
"${FILES[@]}"
nm_ppp_manager_stop() wants to ensure that the pppd process is really
gone. For that it uses nm_utils_kill_child_async() to first send
SIGTERM, and sending SIGKILL after a timeout.
Later, we want to fix shutdown of NetworkManager to iterate the mainloop
during shutdown, so that such operations are still handled. However, we
can only delay shutdown for a certain time. After a timeout (NM_SHUTDOWN_TIMEOUT_MS
plus NM_SHUTDOWN_TIMEOUT_MS_GRACE) we really have to give up and
terminate.
That means, the right amount of time between sending SIGTERM and SIGKILL
is exactly NM_SHUTDOWN_TIMEOUT_MS. Hopefully that is of course
sufficient in the first place. If not, send SIGKILL afterwards, and give
a bit more time (NM_SHUTDOWN_TIMEOUT_MS_GRACE) to reap the child.
And if all this time is still not enough, something is really odd and we
abort waiting, with a warning in the logfile.
Since we don't properly handle shutdown yet, the description above is
not really true. But with this patch, we fix it from point of view of
NMPPPManager.
Previously, there were two functions nm_ppp_manager_stop_sync() and
nm_ppp_manager_stop_async().
However, stop-sync() would still kill the process asynchronously (with a
2 seconds timeout before sending SIGKILL).
On the other hand, stop-async() did pretty much the same thing as
sync-code, except also using the GAsyncResult.
Merge the two functions. Stopping the instance for the most part can be
done entirely synchrnous. The only thing that is asynchronous, is
to wait for the process to terminate. For that, add a new callback
argument to nm_ppp_manager_stop(). This replaces the GAsyncResult
pattern.
Also, always ensure that NetworkManager runs the mainloop at least as
long until the process really terminated. Currently we don't get that
right, and during shutdown we just stop iterating the mainloop. However,
fix this from point of view of NMPPPManager and register a wait-object,
that later will correctly delay shutdown.
Also, NMDeviceWwan cared to wait (asynchronously) until pppd really
terminated. Keep that functionality. nm_ppp_manager_stop() returns
a handle that can be used to cancel the asynchrounous request and invoke
the callback right away. However note, that even when cancelling the
request, the wait-object that prevents shutdown of NetworkManager is
kept around, so that we can be sure to properly clean up.
We usually structure our code in a (pseudo) object oriented way.
It makes sense to call the variable for the target object "self",
it is more familiar and usually done.
- add callback arguments to _ppp_kill(). Although most callers don't
care, it makes it more obvious that this kills the process
asynchronously.
- the call to nm_utils_kill_child_async() is complicated, with many
arguments. Only call it from one place, and re-use the simpler wrapper
function _ppp_kill() everywhere.
I dislike the static hash table to cache the integer counter for
numbered paths. Let's instead cache the counter at the class instance
itself -- since the class contains the information how the export
path should be exported.
However, we cannot use a plain integer field inside the class structure,
because the class is copied between derived classes. For example,
NMDeviceEthernet and NMDeviceBridge both get a copy of the NMDeviceClass
instance. Hence, the class doesn't contain the counter directly, but
a pointer to one counter that can be shared between sibling classes.
Previously, we used the generated GDBusInterfaceSkeleton types and glued
them via the NMExportedObject base class to our NM types. We also used
GDBusObjectManagerServer.
Don't do that anymore. The resulting code was more complicated despite (or
because?) using generated classes. It was hard to understand, complex, had
ordering-issues, and had a runtime and memory overhead.
This patch refactors this entirely and uses the lower layer API GDBusConnection
directly. It replaces the generated code, GDBusInterfaceSkeleton, and
GDBusObjectManagerServer. All this is now done by NMDbusObject and NMDBusManager
and static descriptor instances of type GDBusInterfaceInfo.
This adds a net plus of more then 1300 lines of hand written code. I claim
that this implementation is easier to understand. Note that previously we
also required extensive and complex glue code to bind our objects to the
generated skeleton objects. Instead, now glue our objects directly to
GDBusConnection. The result is more immediate and gets rid of layers of
code in between.
Now that the D-Bus glue us more under our control, we can address issus and
bottlenecks better, instead of adding code to bend the generated skeletons
to our needs.
Note that the current implementation now only supports one D-Bus connection.
That was effectively the case already, although there were places (and still are)
where the code pretends it could also support connections from a private socket.
We dropped private socket support mainly because it was unused, untested and
buggy, but also because GDBusObjectManagerServer could not export the same
objects on multiple connections. Now, it would be rather straight forward to
fix that and re-introduce ObjectManager on each private connection. But this
commit doesn't do that yet, and the new code intentionally supports only one
D-Bus connection.
Also, the D-Bus startup was simplified. There is no retry, either nm_dbus_manager_start()
succeeds, or it detects the initrd case. In the initrd case, bus manager never tries to
connect to D-Bus. Since the initrd scenario is not yet used/tested, this is good enough
for the moment. It could be easily extended later, for example with polling whether the
system bus appears (like was done previously). Also, restart of D-Bus daemon isn't
supported either -- just like before.
Note how NMDBusManager now implements the ObjectManager D-Bus interface
directly.
Also, this fixes race issues in the server, by no longer delaying
PropertiesChanged signals. NMExportedObject would collect changed
properties and send the signal out in idle_emit_properties_changed()
on idle. This messes up the ordering of change events w.r.t. other
signals and events on the bus. Note that not only NMExportedObject
messed up the ordering. Also the generated code would hook into
notify() and process change events in and idle handle, exhibiting the
same ordering issue too.
No longer do that. PropertiesChanged signals will be sent right away
by hooking into dispatch_properties_changed(). This means, changing
a property in quick succession will no longer be combined and is
guaranteed to emit signals for each individual state. Quite possibly
we emit now more PropertiesChanged signals then before.
However, we are now able to group a set of changes by using standard
g_object_freeze_notify()/g_object_thaw_notify(). We probably should
make more use of that.
Also, now that our signals are all handled in the right order, we
might find places where we still emit them in the wrong order. But that
is then due to the order in which our GObjects emit signals, not due
to an ill behavior of the D-Bus glue. Possibly we need to identify
such ordering issues and fix them.
Numbers (for contrib/rpm --without debug on x86_64):
- the patch changes the code size of NetworkManager by
- 2809360 bytes
+ 2537528 bytes (-9.7%)
- Runtime measurements are harder because there is a large variance
during testing. In other words, the numbers are not reproducible.
Currently, the implementation performs no caching of GVariants at all,
but it would be rather simple to add it, if that turns out to be
useful.
Anyway, without strong claim, it seems that the new form tends to
perform slightly better. That would be no surprise.
$ time (for i in {1..1000}; do nmcli >/dev/null || break; echo -n .; done)
- real 1m39.355s
+ real 1m37.432s
$ time (for i in {1..2000}; do busctl call org.freedesktop.NetworkManager /org/freedesktop org.freedesktop.DBus.ObjectManager GetManagedObjects > /dev/null || break; echo -n .; done)
- real 0m26.843s
+ real 0m25.281s
- Regarding RSS size, just looking at the processes in similar
conditions, doesn't give a large difference. On my system they
consume about 19MB RSS. It seems that the new version has a
slightly smaller RSS size.
- 19356 RSS
+ 18660 RSS
When NM knows of the ifindex/name of the new PPP interface (through
the SetIfindex() call), it renames it. This can race with the pppd
daemon, which issues ioctl() using the interface name cached in the
global 'ifname' variable:
...
NetworkManager[27213]: <debug> [1515427406.0036] ppp-manager: set-ifindex 71
pppd[27801]: sent [CCP ConfRej id=0x1 <deflate 15> <deflate(old#) 15> <bsd v1 15>]
NetworkManager[27213]: <debug> [1515427406.0036] platform: link: setting 'ppp5' (71) name dsl-ppp
pppd[27801]: sent [IPCP ConfAck id=0x2 <addr 3.1.1.1>]
pppd[27801]: ioctl(SIOCSIFADDR): No such device (line 2473)
pppd[27801]: Interface configuration failed
pppd[27801]: Couldn't get PPP statistics: No such device
...
Fortunately the variable is exposed to plugins and so we can turn the
SetIfindex() D-Bus call into a synchronous one and then update the
value of the 'ifname' global variable with the new interface name
assigned by NM.
If IPV6CP terminates before IPCP, pppd enters the RUNNING phase and we
start IP configuration without having an IP interface set, which
triggers assertions.
Instead, add a SetIfindex() D-Bus method that gets called by the
plugin when pppd becomes RUNNING. The method sets the IP ifindex of
the device and starts IP configuration.
https://bugzilla.redhat.com/show_bug.cgi?id=1515829
We also unconditionally use them with autotools.
Also, the detection for have_version_script does
not seem correct to me. At least, it didn't work
with clang.
The strings holding the names used for libraries have also been
moved to different variables. This way they would be less error
as these variables can be reused easily and any typing error
would be quickly detected.
Some targets are missing dependencies on some generated sources in
the meson port. These makes the build to fail due to missing source
files on a highly parallelized build.
These dependencies have been resolved by taking advantage of meson's
internal dependencies which can be used to pass source files,
include directories, libraries and compiler flags.
One of such internal dependencies called `core_dep` was already in
use. However, in order to avoid any confusion with another new
internal dependency called `nm_core_dep`, which is used to include
directories and source files from the `libnm-core` directory, the
`core_dep` dependency has been renamed to `nm_dep`.
These changes have allowed minimizing the build details which are
inherited by using those dependencies. The parallelized build has
also been improved.
Note that:
- we compile some source files multiple times. Most notably those
under "shared/".
- we include a default header "shared/nm-default.h" in every source
file. This header is supposed to setup a common environment by defining
and including parts that are commonly used. As we always include the
same header, the header must behave differently depending
one whether the compilation is for libnm-core, NetworkManager or
libnm-glib. E.g. it must include <glib/gi18n.h> or <glib/gi18n-lib.h>
depending on whether we compile a library or an application.
For that, the source files need the NETWORKMANAGER_COMPILATION #define
to behave accordingly.
Extend the define to be composed of flags. These flags are all named
NM_NETWORKMANAGER_COMPILATION_WITH_*, they indicate which part of the
build are available. E.g. when building libnm-core.la itself, then
WITH_LIBNM_CORE, WITH_LIBNM_CORE_INTERNAL, and WITH_LIBNM_CORE_PRIVATE
are available. When building NetworkManager, WITH_LIBNM_CORE_PRIVATE
is not available but the internal parts are still accessible. When
building nmcli, only WITH_LIBNM_CORE (the public part) is available.
This granularily controls the build.
Source files for enum types are generated by passing segments of the
source code of the files to the `glib-mkenums` command.
This patch removes those parameters where source code is used from
meson build files by moving those segmeents to template files.
https://mail.gnome.org/archives/networkmanager-list/2017-December/msg00057.html
Instead of having 3 properties @gateway, @never_default and @has_gateway
on NMIP4Config/NMIP6Config that determine the default-route, track the
default-route as a regular route.
The gateway setting is the configuration knob for the default-route.
Since an NMIP4Config/NMIP6Config instance only has one gateway property,
it cannot track more then one default-routes (see related bug rh#1445417).
Especially with policy routing, it might be interesting to configure a
default-route in multiple tables.
Also, later it might be interesting to allow adding default-routes as
regular static routes in a connection, so that the user can configure additional
route parameters for the default-route or add default-routes in multiple tables.
With this patch, default-routes now have a rt_source property according to their
origin.
Also, the previous commits of this branch broke handling of the
default-route :) . That should be working now again.
We can't tell pppd to create an interface with a given name, so we use
the name generated by kernel and rename the interface afterwards. A
race condition can happen during the rename: NM receives the interface
name from pppd, but in the meantime the interface could be deleted and
another one with that name could appear. In this case we would rename
the wrong interface.
Using a changing unit index, we ensure that interfaces created by NM
don't race with each others. There is still the chance to race with
externally-created interfaces, but I guess this is not easily solvable
since the pppd plugin does not expose the ifindex.
When the specified unit is already in use, the kernel selects another
one.
Since commit 22edeb5b69 ("core: track addresses for
NMIP4Config/NMIP6Config via NMDedupMultiIndex"), addresses can be
added to a IP config only after the ifindex has been set.
Fixes: 22edeb5b69
NMIP4Config, NMIP6Config, and NMPlatform shall share one
NMDedupMultiIndex instance.
For that, pass an NMDedupMultiIndex instance to NMPlatform and NMNetns.
NMNetns than passes it on to NMDevice, NMDhcpClient, NMIP4Config and NMIP6Config.
So currently NMNetns is the access point to the shared NMDedupMultiIndex
instance, and it gets it from it's NMPlatform instance.
The NMDedupMultiIndex instance is really a singleton, we don't want
multiple instances of it. However, for testing, instead of adding a
singleton instance, pass the instance explicitly around.
- don't use assert but be more graceful with g_return_if_fail().
- in case of failure, don't log a debug message after the warning.
One message is sufficient, drop "pppd pid %d cleaned up".
- print GPid type as long long.
- increase log level to warning. pppd dying unexpectedly warrants a
warning.
ppp_exit_code() does too much or too little. Either it should log
about all reasons why pppd exited, including signals, or it should
just do the status to string conversion. Split it.
Let's explicitly unexports on dispose(). Probably that already
happened, because NMExportedObject asserts that it is unexported
during !quitting.
During quitting, we probably don't tear down the manager.
Anyway, we should always unexport.
NM always asks pppd to run IPV6CP which will complete if the modem supports
IPv6. If the user doesn't want IPv6 then NM just ignores the result. But
if the host has disabled IPv6, then pppd will fail to complete the connection
because pppd tries to assign the Link-Local address to the pppX interface,
and if IPv6 is disabled that fails and terminates the PPP session.
So only request IPV6CP when the user wants IPv6 on the connection; if they
have disabled IPv6 on their host then they can simply set ipv6.method=ignore.
https://mail.gnome.org/archives/networkmanager-list/2017-March/msg00047.html
Previously, we would require a @self argument and the @call_id in
nm_act_request_cancel_secrets(), although the @call_id already has
a pointer to @self.
In principle that is not necessary, but it makes the API a bit
more robust as you need to care about the lifetime of the @req
as well.
However it is a bit inconvenient, because it requires that caller to
track both the activation request and the call-id.
Now, allow nm_act_request_get_secrets() to instruct the call-id to
take an additional reference to @self. Later on, we would allow to omit
the argument during cancelling. We only allow this, if the call-id
takes a reference to @self.