Find a file
Thomas Haller ce0e898fb4 libnm: refactor caching of D-Bus objects in NMClient
No longer use GDBusObjectMangaerClient and gdbus-codegen generated classes
for the NMClient cache. Instead, use GDBusConnection directly and a
custom implementation (NMLDBusObject) for caching D-Bus' ObjectManager
data.

CHANGES
-------

- This is a complete rework. I think the previous implementation was
difficult to understand. There were unfixed bugs and nobody understood
the code well enough to fix them. Maybe somebody out there understood the
code, but I certainly did not. At least nobody provided patches to fix those
issues. I do believe that this implementation is more straightforward and
easier to understand. It removes a lot of layers of code. Whether this claim
of simplicity is true, each reader must decide for himself/herself. Note
that it is still fairly complex.

- There was a lingering performance issue with large number of D-Bus
objects. The patch tries hard that the implementation scales well. Of
course, when we cache N objects that have N-to-M references to other,
we still are fundamentally O(N*M) for runtime and memory consumption (with
M being the number of references between objects). But each part should behave
efficiently and well.

- Play well with GMainContext. libnm code (NMClient) is generally not
thread safe. However, it should work to use multiple instances in
parallel, as long as each access to a NMClient is through the caller's
GMainContext. This follows glib's style and effectively allows to use NMClient
in a multi threaded scenario. This implies to stick to a main context
upon construction and ensure that callbacks are only invoked when
iterating that context. Also, NMClient itself shall never iterate the
caller's context. This also means, libnm must never use g_idle_add() or
g_timeout_add(), as those enqueue sources in the g_main_context_default()
context.

- Get ordering of messages right. All events are consistently enqueued
in a GMainContext and processed strictly in order. For example,
previously "nm-object.c" tried to combine signals and emit them on an
idle handler. That is wrong, signals must be emitted in the right order
and when they happen. Note that when using GInitable's synchronous initialization
to initialize the NMClient instance, NMClient internally still operates fully
asynchronously. In that case NMClient has an internal main context.

- NMClient takes over most of the functionality. When using D-Bus'
ObjectManager interface, one needs to handle basically the entire state
of the D-Bus interface. That cannot be separated well into distinct
parts, and even if you try, you just end up having closely related code
in different source files. Spreading related code does not make it
easier to understand, on the contrary. That means, NMClient is
inherently complex as it contains most of the logic. I think that is
not avoidable, but it's not as bad as it sounds.

- NMClient processes D-Bus messages and state changes in separate steps.
First NMClient unpacks the message (e.g. _dbus_handle_properties_changed()) and
keeps track of the changed data. Then we update the GObject instances
(_dbus_handle_obj_changed_dbus()) without emitting any signals yet. Finally,
we emit all signals and notifications that were collected
(_dbus_handle_changes_commit()). Note that for example during the initial
GetManagedObjects() reply, NMClient receive a large amount of state at once.
But we first apply all the changes to our GObject instances before
emitting any signals. The result is that signals are always emitted in a moment
when the cache is consistent. The unavoidable downside is that when you receive
a property changed signal, possibly many other properties changed
already and more signals are about to be emitted.

- NMDeviceWifi no longer modifies the content of the cache from client side
during poke_wireless_devices_with_rf_status(). The content of the cache
should be determined by D-Bus alone and follow what NetworkManager
service exposes. Local modifications should be avoided.

- This aims to bring no API/ABI change, though it does of course bring
various subtle changes in behavior. Those should be all for the better, but the
goal is not to break any existing clients. This does change internal
(albeit externally visible) API, like dropping NM_OBJECT_DBUS_OBJECT_MANAGER
property and NMObject no longer implementing GInitableIface and GAsyncInitableIface.

- Some uses of gdbus-codegen classes remain in NMVpnPluginOld, NMVpnServicePlugin
and NMSecretAgentOld. These are independent of NMClient/NMObject and
should be reworked separately.

- While we no longer use generated classes from gdbus-codegen, we don't
need more glue code than before. Also before we constructed NMPropertiesInfo and
a had large amount of code to propagate properties from NMDBus* to NMObject.
That got completely reworked, but did not fundamentally change. You still need
about the same effort to create the NMLDBusMetaIface. Not using
generated bindings did not make anything worse (which tells about the
usefulness of generated code, at least in the way it was used).

- NMLDBusMetaIface and other meta data is static and immutable. This
avoids copying them around. Also, macros like NML_DBUS_META_PROPERTY_INIT_U()
have compile time checks to ensure the property types matches. It's pretty hard
to misuse them because it won't compile.

- The meta data now explicitly encodes the expected D-Bus types and
makes sure never to accept wrong data. That would only matter when the
server (accidentally or intentionally) exposes unexpected types on
D-Bus. I don't think that was previously ensured in all cases.
For example, demarshal_generic() only cared about the GObject property
type, it didn't know the expected D-Bus type.

- Previously GDBusObjectManager would sometimes emit warnings (g_log()). Those
probably indicated real bugs. In any case, it prevented us from running CI
with G_DEBUG=fatal-warnings, because there would be just too many
unrelated crashes. Now we log debug messages that can be enabled with
"LIBNM_CLIENT_DEBUG=trace". Some of these messages can also be turned
into g_warning()/g_critical() by setting LIBNM_CLIENT_DEBUG=warning,error.
Together with G_DEBUG=fatal-warnings, this turns them into assertions.
Note that such "assertion failures" might also happen because of a server
bug (or change). Thus these are not common assertions that indicate a bug
in libnm and are thus not armed unless explicitly requested. In our CI we
should now always run with LIBNM_CLIENT_DEBUG=warning,error and
G_DEBUG=fatal-warnings and to catch bugs. Note that currently
NetworkManager has bugs in this regard, so enabling this will result in
assertion failures. That should be fixed first.

- Note that this changes the order in which we emit "notify:devices" and
"device-added" signals. I think it makes the most sense to emit first
"device-removed", then "notify:devices", and finally "device-added"
signals.
This changes behavior for commit 52ae28f6e5 ('libnm: queue
added/removed signals and suppress uninitialized notifications'),
but I don't think that users should actually rely on the order. Still,
the new order makes the most sense to me.

- In NetworkManager, profiles can be invisible to the user by setting
"connection.permissions". Such profiles would be hidden by NMClient's
nm_client_get_connections() and their "connection-added"/"connection-removed"
signals.
Note that NMActiveConnection's nm_active_connection_get_connection()
and NMDevice's nm_device_get_available_connections() still exposes such
hidden NMRemoteConnection instances. This behavior was preserved.

NUMBERS
-------

I compared 3 versions of libnm.

  [1] 962297f908, current tip of nm-1-20 branch
  [2] 4fad8c7c64, current master, immediate parent of this patch
  [3] this patch

All tests were done on Fedora 31, x86_64, gcc 9.2.1-1.fc31.
The libraries were build with

  $ ./contrib/fedora/rpm/build_clean.sh -g -w test -W debug

Note that RPM build already stripped the library.

---

N1) File size of libnm.so.0.1.0 in bytes. There currently seems to be a issue
  on Fedora 31 generating wrong ELF notes. Usually, libnm is smaller but
  in these tests it had large (and bogus) ELF notes. Anyway, the point
  is to show the relative sizes, so it doesn't matter).

  [1] 4075552 (102.7%)
  [2] 3969624 (100.0%)
  [3] 3705208 ( 93.3%)

---

N2) `size /usr/lib64/libnm.so.0.1.0`:

          text             data              bss                dec               hex   filename
  [1]  1314569 (102.0%)   69980 ( 94.8%)   10632 ( 80.4%)   1395181 (101.4%)   1549ed   /usr/lib64/libnm.so.0.1.0
  [2]  1288410 (100.0%)   73796 (100.0%)   13224 (100.0%)   1375430 (100.0%)   14fcc6   /usr/lib64/libnm.so.0.1.0
  [3]  1229066 ( 95.4%)   65248 ( 88.4%)   13400 (101.3%)   1307714 ( 95.1%)   13f442   /usr/lib64/libnm.so.0.1.0

---

N3) Performance test with test-client.py. With checkout of [2], run

```
prepare_checkout() {
    rm -rf /tmp/nm-test && \
    git checkout -B test 4fad8c7c64 && \
    git clean -fdx && \
    ./autogen.sh --prefix=/tmp/nm-test && \
    make -j 5 install && \
    make -j 5 check-local-clients-tests-test-client
}
prepare_test() {
    NM_TEST_REGENERATE=1 NM_TEST_CLIENT_BUILDDIR="/data/src/NetworkManager" NM_TEST_CLIENT_NMCLI_PATH=/usr/bin/nmcli python3 ./clients/tests/test-client.py -v
}
do_test() {
  for i in {1..10}; do
      NM_TEST_CLIENT_BUILDDIR="/data/src/NetworkManager" NM_TEST_CLIENT_NMCLI_PATH=/usr/bin/nmcli python3 ./clients/tests/test-client.py -v || return -1
  done
  echo "done!"
}
prepare_checkout
prepare_test
time do_test
```

  [1]  real 2m14.497s (101.3%)     user 5m26.651s (100.3%)     sys  1m40.453s (101.4%)
  [2]  real 2m12.800s (100.0%)     user 5m25.619s (100.0%)     sys  1m39.065s (100.0%)
  [3]  real 1m54.915s ( 86.5%)     user 4m18.585s ( 79.4%)     sys  1m32.066s ( 92.9%)

---

N4) Performance. Run NetworkManager from build [2] and setup a large number
of profiles (551 profiles and 515 devices, mostly unrealized). This
setup is already at the edge of what NetworkManager currently can
handle. Of course, that is a different issue. Here we just check how
long plain `nmcli` takes on the system.

```
do_cleanup() {
    for UUID in $(nmcli -g NAME,UUID connection show | sed -n 's/^xx-c-.*:\([^:]\+\)$/\1/p'); do
        nmcli connection delete uuid "$UUID"
    done
    for DEVICE in $(nmcli -g DEVICE device status | grep '^xx-i-'); do
        nmcli device delete "$DEVICE"
    done
}

do_setup() {
    do_cleanup
    for i in {1..30}; do
        nmcli connection add type bond autoconnect no con-name xx-c-bond-$i ifname xx-i-bond-$i ipv4.method disabled ipv6.method ignore
        for j in $(seq $i 30); do
            nmcli connection add type vlan autoconnect no con-name xx-c-vlan-$i-$j vlan.id $j ifname xx-i-vlan-$i-$j vlan.parent xx-i-bond-$i  ipv4.method disabled ipv6.method ignore
        done
    done
    systemctl restart NetworkManager.service
    sleep 5
}

do_test() {
    perf stat -r 50 -B nmcli 1>/dev/null
}

do_test
```

  [1]

   Performance counter stats for 'nmcli' (50 runs):

              456.33 msec task-clock:u              #    1.093 CPUs utilized            ( +-  0.44% )
                   0      context-switches:u        #    0.000 K/sec
                   0      cpu-migrations:u          #    0.000 K/sec
               5,900      page-faults:u             #    0.013 M/sec                    ( +-  0.02% )
       1,408,675,453      cycles:u                  #    3.087 GHz                      ( +-  0.48% )
       1,594,741,060      instructions:u            #    1.13  insn per cycle           ( +-  0.02% )
         368,744,018      branches:u                #  808.061 M/sec                    ( +-  0.02% )
           4,566,058      branch-misses:u           #    1.24% of all branches          ( +-  0.76% )

             0.41761 +- 0.00282 seconds time elapsed  ( +-  0.68% )

  [2]

   Performance counter stats for 'nmcli' (50 runs):

              477.99 msec task-clock:u              #    1.088 CPUs utilized            ( +-  0.36% )
                   0      context-switches:u        #    0.000 K/sec
                   0      cpu-migrations:u          #    0.000 K/sec
               5,948      page-faults:u             #    0.012 M/sec                    ( +-  0.03% )
       1,471,133,482      cycles:u                  #    3.078 GHz                      ( +-  0.36% )
       1,655,275,369      instructions:u            #    1.13  insn per cycle           ( +-  0.02% )
         382,595,152      branches:u                #  800.433 M/sec                    ( +-  0.02% )
           4,746,070      branch-misses:u           #    1.24% of all branches          ( +-  0.49% )

             0.43923 +- 0.00242 seconds time elapsed  ( +-  0.55% )

  [3]

   Performance counter stats for 'nmcli' (50 runs):

              352.36 msec task-clock:u              #    1.027 CPUs utilized            ( +-  0.32% )
                   0      context-switches:u        #    0.000 K/sec
                   0      cpu-migrations:u          #    0.000 K/sec
               4,790      page-faults:u             #    0.014 M/sec                    ( +-  0.26% )
       1,092,341,186      cycles:u                  #    3.100 GHz                      ( +-  0.26% )
       1,209,045,283      instructions:u            #    1.11  insn per cycle           ( +-  0.02% )
         281,708,462      branches:u                #  799.499 M/sec                    ( +-  0.01% )
           3,101,031      branch-misses:u           #    1.10% of all branches          ( +-  0.61% )

             0.34296 +- 0.00120 seconds time elapsed  ( +-  0.35% )

---

N5) same setup as N4), but run `PAGER= /bin/time -v nmcli`:

  [1]

        Command being timed: "nmcli"
        User time (seconds): 0.42
        System time (seconds): 0.04
        Percent of CPU this job got: 107%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.43
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 34456
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 6128
        Voluntary context switches: 1298
        Involuntary context switches: 1106
        Swaps: 0
        File system inputs: 0
        File system outputs: 0
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

  [2]
        Command being timed: "nmcli"
        User time (seconds): 0.44
        System time (seconds): 0.04
        Percent of CPU this job got: 108%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.44
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 34452
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 6169
        Voluntary context switches: 1849
        Involuntary context switches: 142
        Swaps: 0
        File system inputs: 0
        File system outputs: 0
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

  [3]

        Command being timed: "nmcli"
        User time (seconds): 0.32
        System time (seconds): 0.02
        Percent of CPU this job got: 102%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.34
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 29196
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 5059
        Voluntary context switches: 919
        Involuntary context switches: 685
        Swaps: 0
        File system inputs: 0
        File system outputs: 0
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

---

N6) same setup as N4), but run `nmcli monitor` and look at `ps aux` for
  the RSS size.

      USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
  [1] me     1492900 21.0  0.2 461348 33248 pts/10   Sl+  15:02   0:00 nmcli monitor
  [2] me     1490721  5.0  0.2 461496 33548 pts/10   Sl+  15:00   0:00 nmcli monitor
  [3] me     1495801 16.5  0.1 459476 28692 pts/10   Sl+  15:04   0:00 nmcli monitor
2019-11-25 15:08:00 +01:00
clients all: add device carrier flag 2019-11-22 10:18:27 +01:00
contrib contrib/rpm: avoid warning in specfile about tokens after %endif 2019-11-23 17:01:41 +01:00
data build/meson: cleanup configuration_data() for paths 2019-11-22 15:59:31 +01:00
dispatcher build/meson: cleanup configuration_data() for paths 2019-11-22 15:59:31 +01:00
docs libnm: refactor caching of D-Bus objects in NMClient 2019-11-25 15:08:00 +01:00
examples libnm: export interface flags 2019-11-22 10:18:26 +01:00
introspection introspection: deprecate Carrier properties 2019-11-22 10:18:27 +01:00
libnm libnm: refactor caching of D-Bus objects in NMClient 2019-11-25 15:08:00 +01:00
libnm-core shared: add nm_utils_g_main_context_create_integrate_source() for integrating a GMainContext in another 2019-11-25 12:58:33 +01:00
m4 all: drop emacs file variables from source files 2019-06-11 10:04:00 +02:00
man build/meson: cleanup configuration_data() for paths 2019-11-22 15:59:31 +01:00
po libnm: refactor caching of D-Bus objects in NMClient 2019-11-25 15:08:00 +01:00
shared libnm: refactor caching of D-Bus objects in NMClient 2019-11-25 15:08:00 +01:00
src sd: cleanup integrating systemd's event loop with GMainContext 2019-11-25 12:58:33 +01:00
tools build/meson: cleanup "meson-post-install.sh" 2019-11-22 16:07:02 +01:00
vapi all: goodbye libnm-glib 2019-04-16 15:52:27 +02:00
.dir-locals.el misc: add toplevel .dir-locals file that tells Emacs to show trailing whitespace 2013-03-08 15:15:28 +01:00
.gitignore bluetooth/tests: add "nm-bt-test helper" program for manual testing of bluetooth code 2019-09-22 16:05:50 +02:00
.gitlab-ci.yml gitlab-ci: run tests on extra distributions only manually 2019-11-22 13:46:22 +01:00
.mailmap mailmap: update user 2018-10-01 12:02:55 +02:00
.travis.yml build/debian: install mobile-broadband-provider-info 2019-09-11 14:54:34 +02:00
AUTHORS misc: update maintainers and authors 2016-04-21 13:39:03 -05:00
autogen.sh all: goodbye libnm-glib 2019-04-16 15:52:27 +02:00
ChangeLog all: point git references to the GitLab instance 2018-08-27 11:36:56 +02:00
config-extra.h.meson build: remove duplicate and unused RUNDIR define 2019-05-17 21:24:18 +02:00
config-extra.h.mk build: regenerate config-extra.h if configure was re-run with different arguments 2019-09-25 15:55:37 +02:00
config.h.meson build: add PPPD_PATH to config.h.meson 2019-11-01 07:25:26 +01:00
configure.ac release: bump version to 1.21.3-dev 2019-11-03 09:01:30 +01:00
CONTRIBUTING CONTRIBUTING: update comment about requiring LGPL-2.1+ license instead of LGPL-2.0+ 2019-10-01 07:50:52 +02:00
COPYING COPYING: make sure we ship the relevant license texts 2019-09-10 11:10:52 +02:00
COPYING.GFDL COPYING: make sure we ship the relevant license texts 2019-09-10 11:10:52 +02:00
COPYING.LGPL COPYING: make sure we ship the relevant license texts 2019-09-10 11:10:52 +02:00
linker-script-binary.ver iface-helper/build: add linker version script 2016-10-13 21:33:33 +02:00
linker-script-devices.ver devices/build: use one linker-script-devices.ver for all device plugins 2016-10-13 21:36:06 +02:00
linker-script-settings.ver settings/build: add linker version script for settings plugins 2016-10-13 21:33:33 +02:00
MAINTAINERS misc: update maintainers and authors 2016-04-21 13:39:03 -05:00
Makefile.am libnm: refactor caching of D-Bus objects in NMClient 2019-11-25 15:08:00 +01:00
Makefile.examples examples: add examples/python/gi/nm-update2.py example script 2019-07-25 22:02:00 +02:00
Makefile.glib all: drop emacs file variables from source files 2019-06-11 10:04:00 +02:00
Makefile.vapigen build: fix make always re-making vapigen target 2016-10-21 18:46:03 +02:00
meson.build build/meson: cleanup "meson-post-install.sh" 2019-11-22 16:07:02 +01:00
meson_options.txt dhcp: add nettools dhcp4 client 2019-07-05 11:04:32 +02:00
NetworkManager.pc.in build: update NetworkManager.pc 2013-01-29 16:17:30 -05:00
NEWS libnm: refactor caching of D-Bus objects in NMClient 2019-11-25 15:08:00 +01:00
README all: drop empty first line from sources 2019-06-11 10:15:06 +02:00
TODO all: say Wi-Fi instead of "wifi" or "WiFi" 2018-11-29 17:53:35 +01:00
valgrind.suppressions all: goodbye libnm-glib 2019-04-16 15:52:27 +02:00

******************
NetworkManager core daemon has moved to gitlab.freedesktop.org!

git clone https://gitlab.freedesktop.org/NetworkManager/NetworkManager.git
******************


Networking that Just Works
--------------------------

NetworkManager attempts to keep an active network connection available at all
times.  The point of NetworkManager is to make networking configuration and
setup as painless and automatic as possible.  NetworkManager is intended to
replace default route, replace other routes, set IP addresses, and in general
configure networking as NM sees fit (with the possibility of manual override as
necessary).  In effect, the goal of NetworkManager is to make networking Just
Work with a minimum of user hassle, but still allow customization and a high
level of manual network control.  If you have special needs, we'd like to hear
about them, but understand that NetworkManager is not intended for every
use-case.

NetworkManager will attempt to keep every network device in the system up and
active, as long as the device is available for use (has a cable plugged in,
the killswitch isn't turned on, etc).  Network connections can be set to
'autoconnect', meaning that NetworkManager will make that connection active
whenever it and the hardware is available.

"Settings services" store lists of user- or administrator-defined "connections",
which contain all the settings and parameters required to connect to a specific
network.  NetworkManager will _never_ activate a connection that is not in this
list, or that the user has not directed NetworkManager to connect to.


How it works:

The NetworkManager daemon runs as a privileged service (since it must access
and control hardware), but provides a D-Bus interface on the system bus to
allow for fine-grained control of networking.  NetworkManager does not store
connections or settings, it is only the mechanism by which those connections
are selected and activated.

To store pre-defined network connections, two separate services, the "system
settings service" and the "user settings service" store connection information
and provide these to NetworkManager, also via D-Bus.  Each settings service
can determine how and where it persistently stores the connection information;
for example, the GNOME applet stores its configuration in GConf, and the system
settings service stores its config in distro-specific formats, or in a distro-
agnostic format, depending on user/administrator preference.

A variety of other system services are used by NetworkManager to provide
network functionality: wpa_supplicant for wireless connections and 802.1x
wired connections, pppd for PPP and mobile broadband connections, DHCP clients
for dynamic IP addressing, dnsmasq for proxy nameserver and DHCP server
functionality for internet connection sharing, and avahi-autoipd for IPv4
link-local addresses.  Most communication with these daemons occurs, again,
via D-Bus.


Why doesn't my network Just Work?

Driver problems are the #1 cause of why NetworkManager sometimes fails to
connect to wireless networks.  Often, the driver simply doesn't behave in a
consistent manner, or is just plain buggy.  NetworkManager supports _only_
those drivers that are shipped with the upstream Linux kernel, because only
those drivers can be easily fixed and debugged.  ndiswrapper, vendor binary
drivers, or other out-of-tree drivers may or may not work well with
NetworkManager, precisely because they have not been vetted and improved by the
open-source community, and because problems in these drivers usually cannot
be fixed.

Sometimes, command-line tools like 'iwconfig' will work, but NetworkManager will
fail.  This is again often due to buggy drivers, because these drivers simply
aren't expecting the dynamic requests that NetworkManager and wpa_supplicant
make.  Driver bugs should be filed in the bug tracker of the distribution being
run, since often distributions customize their kernel and drivers.

Sometimes, it really is NetworkManager's fault.  If you think that's
the case, please file a bug at:

https://gitlab.freedesktop.org/NetworkManager/NetworkManager/issues

Attaching NetworkManager debug logs from the journal (or wherever your
distribution directs syslog's 'daemon' facility output, as
/var/log/messages or /var/log/daemon.log) is often very helpful, and
(if you can get) a working wpa_supplicant config file helps
enormously.  See the logging section of file
contrib/fedora/rpm/NetworkManager.conf for how to enable debug logging
in NetworkManager.