linklocal6_complete() had only one caller. The caller would check
whether the conditions for linklocal6_complete() are satisfied, and
then call it. Note that linklocal6_complete() would again assert
that these conditions hold. Don't do this. Just move the check
inside linklocal6_complete(), and rename to linklocal6_check_complete().
Also, linklocal6_complete() was called by update_ip_config(),
which was called by nm_device_capture_initial_config() and
queued_ip6_config_change().
It doesn't make sense to call linklocal6_complete() during
nm_device_capture_initial_config(). Move the call to
queued_ip6_config_change().
Likewise, in ndisc_ra_timeout() we also want to include tentative
addresses. Looking into priv->ip6_config to determine whether
an other IP configuration is active, is anyway odd, and likely
a bug.
Instead have one particular nm_ip6_config_get_address_first_nontentative() function,
make it more extendable. Now, we pass a match-type argument, which can control which
element to search.
This patch has no change in behavior, but it already makes clear, that
nm_ip6_config_get_address_first_nontentative() was buggy, because it would
also return addresses that failed DAD.
Don't do "if (var == FALSE)" for boolean variables.
Also, make booleans in NMDevicePrivate structure bitfields
and reorder the fields beside other bitfields. This allows
a tighter packing of the structure.
addrconf6_start_with_link_ready() cannot fail. Hence, don' return a boolean
success value.
linklocal6_start() can only either POSTPONE or succeed right away. Don't return
a NMActStageReturn value, TRUE/FALSE is enough.
This simplifies the callers that don't have to check for values that never
come.
connections_cached_list stays only valid until we remove/add connections
to NMSettings. Using the list without cloning requires to be aware of that.
When clearing the list, invalidate all pointers, in the hope that a following
use-after-free will blow up with an assertion.
We only do this in elevated assertion mode. It's not to prevent any bugs,
it's to better notice it.
NMSettings exposes a cached list of all connection. We don't need
to clone it. Note that this is not save against concurrent modification,
meaning, add/remove of connections in NMSettings will invalidate the
list.
However, it wasn't save against that previously either, because
altough we cloned the container (GSList), we didn't take an additional
reference to the elements.
This is purely a performance optimization, we don't need to clone the
list. Also, since the original list is of type "NMConnection *const*",
use that type insistently, instead of dependent API requiring GSList.
IMO, GSList is anyway not a very nice API for many use cases because
it requires an additional slice allocation for each element. It's
slower, and often less convenient to use.
We create the all_connections list in impl_manager_add_and_activate_connection().
With this list we either call nm_utils_complete_generic() directly,
or pass it to nm_device_complete_connection(). The latter also ends
up only calling nm_utils_complete_generic() with this argument.
nm_utils_complete_generic() doesn't care about the order in which
the connections are returned. Which also becomes apparent, because
we first sorted the list, and then reverted the order with
g_slist_prepend().
Do not sort the list, hence we also don't need to clone it.
nm_modem_complete_connection() cannot just return FALSE in case
the modem doesn't overwrite complete_connection(). It must set
the error variable.
This leads to a crash when calling AddAndActivate for Ofono type
modem. It does not affect the ModemManager implementation
NMModemBroadband, because that one implements the method.
Kernel recently got support for exposing TUN/TAP information on netlink
[1], [2], [3]. Add support for it to the platform cache.
The advantage of using netlink is that querying sysctl bypasses the
order of events of the netlink socket. It is out of sync and racy. For
example, platform cache might still think that a tun device exists, but
a subsequent lookup at sysfs might fail because the device was deleted
in the meantime. Another point is, that we don't get change
notifications via sysctl and that it requires various extra syscalls
to read the device information. If the tun information is present on
netlink, put it into the cache. This bypasses checking sysctl while
we keep looking at sysctl for backward compatibility until we require
support from kernel.
Notes:
- we had two link types NM_LINK_TYPE_TAP and NM_LINK_TYPE_TUN. This
deviates from the model of how kernel treats TUN/TAP devices, which
makes it more complicated. The link type of a NMPlatformLink instance
should match what kernel thinks about the device. Point in case,
when parsing RTM_NETLINK messages, we very early need to determine
the link type (_linktype_get_type()). However, to determine the
type of a TUN/TAP at that point, we need to look into nested
netlink attributes which in turn depend on the type (IFLA_INFO_KIND
and IFLA_INFO_DATA), or even worse, we would need to look into
sysctl for older kernel vesions. Now, the TUN/TAP type is a property
of the link type NM_LINK_TYPE_TUN, instead of determining two
different link types.
- various parts of the API (both kernel's sysctl vs. netlink) and
NMDeviceTun vs. NMSettingTun disagree whether the PI is positive
(NM_SETTING_TUN_PI, IFLA_TUN_PI, NMPlatformLnkTun.pi) or inverted
(NM_DEVICE_TUN_NO_PI, IFF_NO_PI). There is no consistent way,
but prefer the positive form for internal API at NMPlatformLnkTun.pi.
- previously NMDeviceTun.mode could not change after initializing
the object. Allow for that to happen, because forcing some properties
that are reported by kernel to not change is wrong, in case they
might change. Of course, in practice kernel doesn't allow the device
to ever change its type, but the type property of the NMDeviceTun
should not make that assumption, because, if it actually changes, what
would it mean?
- note that as of now, new netlink API is not yet merged to mainline Linus
tree. Shortcut _parse_lnk_tun() to not accidentally use unstable API
for now.
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1277457
[2] https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=1ec010e705934c8acbe7dbf31afc81e60e3d828b
[3] https://git.kernel.org/pub/scm/network/iproute2/iproute2-next.git/commit/?id=118eda77d6602616bc523a17ee45171e879d1818https://bugzilla.redhat.com/show_bug.cgi?id=1547213https://github.com/NetworkManager/NetworkManager/pull/77
The wireless mode property can be unset (NULL), in which case it
assumed to be equivalent to "infrastructure". If we convert an unset
mode to infrastructure, the connection will change on write,
triggering errors like:
settings-connection[...]: write: successfully updated (ifcfg-rh: persist (null)), connection was modified in the process
device (wlp4s0): Activation: (wifi) access point 'test1' has security, but secrets are required.
device (wlp4s0): state change: config -> need-auth (reason 'none', sys-iface-state: 'managed')
device (wlp4s0): The connection was modified since activation
device (wlp4s0): state change: need-auth -> failed (reason 'no-secrets', sys-iface-state: 'managed')
To fix this, remove the MODE key when the mode is unset so that the
property is read back exactly as it was. Note that initscripts need
the MODE set, but in most cases there are other issues that make Wi-Fi
connection written by NM not compatible with initscripts.
https://bugzilla.redhat.com/show_bug.cgi?id=1549972
With g_clear_pointer(pptr, g_free), pptr is cast to a non-const pointer,
and hence there is no compiler warning about calling g_free() in a const
pointer. However, it still feels ugly to pass a const pointer to
g_clear_pointer(). We should either add an explicity cast, or just
make the pointer non-const.
I guess part of the problem is that C's "const char *" means that the
string itself is immutable, but also that it cannot be freed. We most
often want a different semantic: the string itself is immutable after
being initialized once, but the memory itself can and will be freed.
Such a notion of immutable C strings cannot be expressed.
For that, just remove the "const" from the declarations. Although we
don't want to modify the (content of the) string, it's still more a
mutable string.
Only in _vt_cmd_obj_copy_lnk_vlan(), add an explicity cast but keep the
type as const. The reason is, that we really want that NMPObject
instances are immutable (in the sense that they don't modify while
existing), but that doesn't mean the memory cannot be freed.
The g_clear_pointer() macro already contains a cast to GDestroyNotify. No
need to do it ourself. In fact, with the cast, this only works with the
particular g_clear_pointer() implementation, that first assigns the
destroy function to a local variable.
See-also: https://bugzilla.gnome.org/show_bug.cgi?id=674634#c52
We already do conditional build with "#if WITH_CONCHECK".
Get rid of the conditional in the makefile and instead do
conditional compilating inside the source file "nm-connectivity.c".
The advantage is, now if you want to know which parts are build,
you only need to grep for the WITH_CONCHECK preprocessor define
instead of also caring about the conditional in Makefile.am and
meson.build.
It doesn't change the fact of conditional compilation. But it
consistently uses one mechanism to achieve it.
nm_connectivity_state_to_string() is entirely independent of the GObject implementation
of NMConnectivity. Move it to the beginning of the source file. It will be useful next
because we will *always* build "nm-connectivity.c" source file, but disable various
parts with #if. Hence, move the part that should always be build to the top.
We use a private D-Bus socket for example for DHCP clients to report back
at unix:path=/var/run/NetworkManager/private-dhcp.
By default, gdbus will enable the authentication mechanisms EXTERNAL
and DBUS_COOKIE_SHA1. However, DBUS_COOKIE_SHA1 requires a /root/.dbus-keyrings
directory, which is not available to NetworkManager as it is started with
ProtectHome=read-only. And writing to /root would be a bad idea anyway.
This leads to a warning
NetworkManager[10962]: Error adding entry to keyring: Error creating directory “/root/.dbus-keyrings”: Read-only file system
Disable all but the EXTERNAL mechanism.
See-also: https://dbus.freedesktop.org/doc/dbus-specification.html#auth-mechanismshttps://bugzilla.gnome.org/show_bug.cgi?id=793116https://github.com/NetworkManager/NetworkManager/pull/79
The documentation for the ipv4.dhcp-client-id property says:
If the property is not a hex string it is considered as a
non-hardware-address client ID and the 'type' field is set to 0.
However, currently we set the client-id without the leading zero byte
in the dhclient configuration and thus dhclient sends the first string
character as type and the remainder as client-id content. Looking
through git history, the dhclient plugin has always behaved this way
even if the intent was clearly that string client-id had to be zero
padded (this is evident by looking at
nm_dhcp_utils_client_id_string_to_bytes()). The internal plugin
instead sends the correct client-id with zero type.
Change the dhclient plugin to honor the documented behavior and add
the leading zero byte when the client-id is a string.
This commit introduces a change in behavior for users that have
dhcp=dhclient and have a plain string (not hexadecimal) set in
ipv4.dhcp-client-id, as NM will send a different client-id possibly
changing the IP address returned by the server.
https://bugzilla.gnome.org/show_bug.cgi?id=793957
The test tries to do IPv4 DAD. That necessarily involves waiting
for a timeout. Since the NMArpingManager spawns arping processes,
the precise timings depend on the load of the machine and may be
large in some cases.
Usually, our test would run fast to successful completion.
However, sometimes, it can take several hundered milliseconds.
Instead of increasing the timeout to a large value (which would
needlessly extend the run time of our tests in the common cases),
try first with a reasonably short timeout. A timeout which commonly
results in success. If the test with the short timeout fails, just
try again with an excessively large timeout.
This saves about 400 msec for the common case, but extends the
races that we saw where not even 250 msec of wait time were
sufficient.
Previously we would kill the client when the lease expired and we
restarted it 3 times at 2 minutes intervals before failing the
connection. If the client is killed after it received a NACK from the
server, it doesn't have the chance to delete the lease file and the
next time it is started it will request the same lease again.
Also, the previous restart logic is a bit convoluted.
Since clients already know how to deal with NACKs, let them continue
for a grace period after the expiry. When the grace period ends, we
fail the method and this can either fail the whole connection or keep
it active depending on the may-fail configuration.
https://bugzilla.gnome.org/show_bug.cgi?id=783391
I dislike the static hash table to cache the integer counter for
numbered paths. Let's instead cache the counter at the class instance
itself -- since the class contains the information how the export
path should be exported.
However, we cannot use a plain integer field inside the class structure,
because the class is copied between derived classes. For example,
NMDeviceEthernet and NMDeviceBridge both get a copy of the NMDeviceClass
instance. Hence, the class doesn't contain the counter directly, but
a pointer to one counter that can be shared between sibling classes.
Previously, we used the generated GDBusInterfaceSkeleton types and glued
them via the NMExportedObject base class to our NM types. We also used
GDBusObjectManagerServer.
Don't do that anymore. The resulting code was more complicated despite (or
because?) using generated classes. It was hard to understand, complex, had
ordering-issues, and had a runtime and memory overhead.
This patch refactors this entirely and uses the lower layer API GDBusConnection
directly. It replaces the generated code, GDBusInterfaceSkeleton, and
GDBusObjectManagerServer. All this is now done by NMDbusObject and NMDBusManager
and static descriptor instances of type GDBusInterfaceInfo.
This adds a net plus of more then 1300 lines of hand written code. I claim
that this implementation is easier to understand. Note that previously we
also required extensive and complex glue code to bind our objects to the
generated skeleton objects. Instead, now glue our objects directly to
GDBusConnection. The result is more immediate and gets rid of layers of
code in between.
Now that the D-Bus glue us more under our control, we can address issus and
bottlenecks better, instead of adding code to bend the generated skeletons
to our needs.
Note that the current implementation now only supports one D-Bus connection.
That was effectively the case already, although there were places (and still are)
where the code pretends it could also support connections from a private socket.
We dropped private socket support mainly because it was unused, untested and
buggy, but also because GDBusObjectManagerServer could not export the same
objects on multiple connections. Now, it would be rather straight forward to
fix that and re-introduce ObjectManager on each private connection. But this
commit doesn't do that yet, and the new code intentionally supports only one
D-Bus connection.
Also, the D-Bus startup was simplified. There is no retry, either nm_dbus_manager_start()
succeeds, or it detects the initrd case. In the initrd case, bus manager never tries to
connect to D-Bus. Since the initrd scenario is not yet used/tested, this is good enough
for the moment. It could be easily extended later, for example with polling whether the
system bus appears (like was done previously). Also, restart of D-Bus daemon isn't
supported either -- just like before.
Note how NMDBusManager now implements the ObjectManager D-Bus interface
directly.
Also, this fixes race issues in the server, by no longer delaying
PropertiesChanged signals. NMExportedObject would collect changed
properties and send the signal out in idle_emit_properties_changed()
on idle. This messes up the ordering of change events w.r.t. other
signals and events on the bus. Note that not only NMExportedObject
messed up the ordering. Also the generated code would hook into
notify() and process change events in and idle handle, exhibiting the
same ordering issue too.
No longer do that. PropertiesChanged signals will be sent right away
by hooking into dispatch_properties_changed(). This means, changing
a property in quick succession will no longer be combined and is
guaranteed to emit signals for each individual state. Quite possibly
we emit now more PropertiesChanged signals then before.
However, we are now able to group a set of changes by using standard
g_object_freeze_notify()/g_object_thaw_notify(). We probably should
make more use of that.
Also, now that our signals are all handled in the right order, we
might find places where we still emit them in the wrong order. But that
is then due to the order in which our GObjects emit signals, not due
to an ill behavior of the D-Bus glue. Possibly we need to identify
such ordering issues and fix them.
Numbers (for contrib/rpm --without debug on x86_64):
- the patch changes the code size of NetworkManager by
- 2809360 bytes
+ 2537528 bytes (-9.7%)
- Runtime measurements are harder because there is a large variance
during testing. In other words, the numbers are not reproducible.
Currently, the implementation performs no caching of GVariants at all,
but it would be rather simple to add it, if that turns out to be
useful.
Anyway, without strong claim, it seems that the new form tends to
perform slightly better. That would be no surprise.
$ time (for i in {1..1000}; do nmcli >/dev/null || break; echo -n .; done)
- real 1m39.355s
+ real 1m37.432s
$ time (for i in {1..2000}; do busctl call org.freedesktop.NetworkManager /org/freedesktop org.freedesktop.DBus.ObjectManager GetManagedObjects > /dev/null || break; echo -n .; done)
- real 0m26.843s
+ real 0m25.281s
- Regarding RSS size, just looking at the processes in similar
conditions, doesn't give a large difference. On my system they
consume about 19MB RSS. It seems that the new version has a
slightly smaller RSS size.
- 19356 RSS
+ 18660 RSS
The next commit will completely rework NMBusManager and replace
NMExportedObject by a new type NMDBusObject.
Originally, NMDBusObject was added along NMExportedObject to ease
the rework and have compilable, intermediate stages of refactoring. Now,
I think the new name is better, because NMDBusObject is very strongly related
to the bus manager and the old name NMExportedObject didn't make that
clear.
I also slighly prefer the name NMDBusObject over NMBusObject, hence
for consistancy, also rename NMBusManager to NMDBusManager.
This commit only renames the file for a nicer diff in the next commit.
It does not actually update the type name in sources. That will be done
later.
The change doesn't really make a difference. I thought it would, so I
did it. But turns out (as the code correctly assumes), while the
notifications are frozen, it's OK to leave the property still in an
inconsistent state while emitting the notify signal.
Still, it feels slightly more correct this way, so keep the change.
The notify() signal is not emitted while the object properties are
blocked via g_object_freeze_notify(). That makes is unsuitable to
emit a notification for "peer" property whenver the device's "parent"
property changes. Because especially with freeze/thaw, we want to emit
both signals in the same batch, not first emit change signals for "parent",
and then in a second run the signals for "peer".
Use the existing parent_changed_notify() virtual function instead.
The generated code is really just a thin wrapper around direct
GDBusProxy calls. GDBusProxy is reasonably convenient to use directly,
drop this wrapper.
We also don't use a generated wrapper for other cases where
NetworkManager acts as a D-Bus client. There is no reason to
do it in this case.
While the nmdbus_*() functions that we were using are small wrappers,
we also created a NMDBusSecretAgent instance, and hence several other
functions and symbols are used as well. It's unnecessary.
This saves 8552 bytes for NetworkManager binary (2817944 vs. 2809392
bytes for contrib/rpm on x86_64).
On exit during NMManager's dispose(), we must fist remove active connections
via active_connection_remove(), before clearing the volatile-connection-list.
Otheriwise, while deleting the active connection, we schedule a idle action
to delete the volatile connection on idle, but at that point the dispose()
already cleaned up the idle list.
==3150== 72 (24 direct, 48 indirect) bytes in 1 blocks are definitely lost in loss record 3,411 of 6,079
==3150== at 0x4C2FB6B: malloc (vg_replace_malloc.c:299)
==3150== by 0x6AB7358: g_malloc (gmem.c:94)
==3150== by 0x6ACEF35: g_slice_alloc (gslice.c:1025)
==3150== by 0x1686B1: connection_flags_changed (nm-manager.c:1823)
==3150== by 0x661F73C: g_closure_invoke (gclosure.c:804)
==3150== by 0x66324DD: signal_emit_unlocked_R (gsignal.c:3635)
==3150== by 0x663AD04: g_signal_emit_valist (gsignal.c:3391)
==3150== by 0x663B66E: g_signal_emit (gsignal.c:3447)
==3150== by 0x2EC753: connection_flags_changed (nm-settings.c:824)
==3150== by 0x661F73C: g_closure_invoke (gclosure.c:804)
==3150== by 0x66324DD: signal_emit_unlocked_R (gsignal.c:3635)
==3150== by 0x663AD04: g_signal_emit_valist (gsignal.c:3391)
==3150== by 0x663B66E: g_signal_emit (gsignal.c:3447)
==3150== by 0x6623C03: g_object_dispatch_properties_changed (gobject.c:1080)
==3150== by 0x1DFD47: dispatch_properties_changed (nm-dbus-object.c:237)
==3150== by 0x6626178: g_object_notify_by_spec_internal (gobject.c:1173)
==3150== by 0x6626178: g_object_notify_by_pspec (gobject.c:1283)
==3150== by 0x2E7377: _notify (nm-settings-connection.c:53)
==3150== by 0x2E7377: nm_settings_connection_set_flags_full (nm-settings-connection.c:2346)
==3150== by 0x2E744D: nm_settings_connection_set_flags (nm-settings-connection.c:2316)
==3150== by 0x2E7466: set_visible (nm-settings-connection.c:316)
==3150== by 0x2E7774: nm_settings_connection_delete (nm-settings-connection.c:795)
==3150== by 0x1665A8: _delete_volatile_connection_do (nm-manager.c:598)
==3150== by 0x1668F4: active_connection_remove (nm-manager.c:625)
==3150== by 0x16ABA7: dispose (nm-manager.c:6726)
==3150== by 0x6624607: g_object_unref (gobject.c:3293)
==3150== by 0x1D779B: _nm_singleton_instance_destroy (nm-core-utils.c:138)
==3150== by 0x4011332: _dl_fini (in /usr/lib64/ld-2.26.so)
==3150== by 0x815FB57: __run_exit_handlers (in /usr/lib64/libc-2.26.so)
==3150== by 0x815FBA9: exit (in /usr/lib64/libc-2.26.so)
==3150== by 0x1383C7: main (main.c:467)
NMTST_ASSERT_PLATFORM_NETNS_CURRENT() already checks that the current namespace
is correct. Remove the duplicate assertion.
Also, NMP_CACHE_OPS_UNCHANGED is numerically identical to NM_PLATFORM_SIGNAL_NONE.
Use it in the assertion.
Return the extended ack message from the WaitForNlResponse delayed
action so that the caller can print a detailed reason with the
appropriate logging level.
From v4.12 the kernel appends some attributes to netlink acks
containing a textual description of the error and other fields (see
commit [1]). Parse those attributes and print the error message.
Examples:
platform-linux: netlink: recvmsg: error message from kernel: Network is unreachable (101) "Nexthop has invalid gateway" for request 12
platform-linux: netlink: recvmsg: error message from kernel: Invalid argument (22) "Local address cannot be multicast" for request 21
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2d4bc93368f5a0ddb57c8c885cdad9c9b7a10ed5
NAP connections are a bit special, in that they also have a [bridge]
setting, but their connection.type is "bluetooth".
The canonical way to check whether a bluetooth connection is of NAP type
is by calling _nm_connection_get_setting_bluetooth_for_nap().
So, instead of checking for bluetooth.type "pan" or "dun", check the
opposite and whether the connection is of NAP type. In practice it's the
same, but let'check for NAP consistently via get_setting_bluetooth_for_nap().
Bluetooth tethering using DUN or PANU is a common way to expose a
metered 3G or 4G connection from a phone to a laptop. We deliberately
ignore NAP connections, which is where we’re sharing internet from the
laptop to another device.
We could also set GUESS_YES for WiMAX connections, but NetworkManager
doesn’t support them any more. Add a comment about that.
Signed-off-by: Philip Withnall <withnall@endlessm.com>
https://bugzilla.gnome.org/show_bug.cgi?id=794120
The condition was obviosly inverted, blocking autoconnect when
it should not, and not blocking it when it should.
[thaller@redhat.com: modified original patch and rewrite commit message]
Fixes: e2c8ef45achttps://bugzilla.gnome.org/show_bug.cgi?id=794014