Previously, if we passed ra_timeout 0 to NMNDisc, then it would
calculate the effective timeout based on the router-solicitations
and the router-solicitation-interval.
The caller may want to know the used timeout, to also run its own timers
with the same timeout. Hence, it cannot leave this automatism internal
to NMNDisc.
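As an illustration only, the kind of computation the caller now has to do
itself could look like the sketch below. The function name and the formula
(number of solicitations times the interval, saturated) are assumptions made
for this example, not the code in NMNDisc.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not NetworkManager API. A timeout of 0 means
 * "derive it from the solicitation settings"; anything else is used
 * as-is. All values are in seconds. */
static uint32_t
ra_timeout_resolve (uint32_t ra_timeout,
                    uint32_t router_solicitations,
                    uint32_t router_solicitation_interval)
{
    uint64_t total;

    if (ra_timeout != 0)
        return ra_timeout;

    assert (router_solicitation_interval > 0);

    /* Assumed formula: the time needed to send all solicitations,
     * saturated to the range of the return type. */
    total = (uint64_t) router_solicitations * router_solicitation_interval;
    return total > UINT32_MAX ? UINT32_MAX : (uint32_t) total;
}
```

With the value resolved by the caller, the same number can be passed to
NMNDisc and used for the caller's own timer.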
It is solely computed from the lease information (the GHashTable).
No need to pass it along as a separate argument in NM_DHCP_CLIENT_SIGNAL_STATE_CHANGED,
especially since it only applies to IPv6.
When using VRF devices we must pre-generate dependent local
routes in the VRF's table, otherwise they will be incorrectly added
to the local table instead.
https://bugzilla.redhat.com/show_bug.cgi?id=1857133
Fixes: a199cd2a7d ('core: add dependent local routes configured by kernel')
g_clear_pointer() would always cast the destroy notify function
pointer to GDestroyNotify. That means, it lost some type safety, like
GPtrArray *ptr_arr = ...
g_clear_pointer (&ptr_arr, g_array_unref);
Since glib 2.58 ([1]), g_clear_pointer() is also more type safe. But
this is not used by NetworkManager, because we don't set
GLIB_VERSION_MIN_REQUIRED to 2.58.
[1] f9a9902aac
We have had nm_clear_pointer() to avoid this issue for a long time (pre
1.12.0). Possibly we should redefine in our source tree g_clear_pointer()
as nm_clear_pointer(). However, I don't like to patch glib functions
with our own variant. Arguably, we do patch g_clear_error() in
such a manner. But there the point is to make the function inlinable.
Also, nm_clear_pointer() returns a boolean that indicates whether
anything was cleared. That is sometimes useful. I think we should
just consistently use nm_clear_pointer() instead, which always does
the preferable thing.
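To illustrate the two properties mentioned above (no cast of the destroy
function, and a boolean result), here is a minimal sketch of such a macro.
It is not the nm_clear_pointer() definition from the source tree; the name
my_clear_pointer() and the Foo type are made up, and the macro relies on the
GNU C extensions __typeof__ and statement expressions (gcc, clang).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical macro. The destroy function is called with the pointer's
 * real type and is never cast, so passing a mismatching function is
 * diagnosed by the compiler. Evaluates to true if something was cleared. */
#define my_clear_pointer(pp, destroy)          \
    ({                                         \
        __typeof__ (*(pp)) *_pp = (pp);        \
        __typeof__ (*_pp)   _p  = *_pp;        \
        bool _cleared = false;                 \
                                               \
        if (_p) {                              \
            *_pp = NULL;                       \
            (destroy) (_p);                    \
            _cleared = true;                   \
        }                                      \
        _cleared;                              \
    })

/* Example type with its own free function, for demonstration only. */
typedef struct {
    int unused;
} Foo;

static int foo_free_count;

static void
foo_free (Foo *foo)
{
    assert (foo);
    foo_free_count++;
    free (foo);
}
```

Calling my_clear_pointer (&foo, foo_free) on a "Foo *foo" frees it once and
sets it to NULL; a second call is a no-op that evaluates to false.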
Replace:
sed 's/\<g_clear_pointer *(\([^;]*\), *\([a-z_A-Z0-9]\+\) *)/nm_clear_pointer (\1, \2)/g' $(git grep -l g_clear_pointer) -i
I think it's preferable to use nm_clear_g_free() instead of
g_clear_pointer(, g_free). The reasons are not very strong,
but I think it is overall preferable to have a shorthand for this
frequently used functionality.
sed 's/\<g_clear_pointer *(\([^;]*\), *\(g_free\) *)/nm_clear_g_free (\1)/g' $(git grep -l g_clear_pointer) -i
We will add a property NM_NDISC_RA_TIMEOUT for which this name is better
suited. The problem is really that our convention for the defines of
object properties and signals has no prefix to indicate whether it's a
property or a signal.
Rename.
Currently the DHCP client reports the BOUND state not only when the
lease is obtained initially but also when it is renewed. Having a
different state for the renewal will be used by NMDevice in the next
patch to determine whether the lease needs to be accept()ed or not.
With accept_ra set to 1, the kernel sends its own router solicitation
messages and parses the advertisements. This duplicates what NM
already does in userspace and has unwanted consequences like [1] and
[2].
The only reason why accept_ra was re-enabled in the past was to apply
RA parameters like ReachableTime and RetransTimer [3]; but now NM
supports them and so accept_ra can be turned off again.
Also, note that previously the option was set in
addrconf6_start_with_link_ready(), and so this was done only when the
method was 'auto'. Instead, now we clear it for all methods except
'ignore'.
[1] https://mail.gnome.org/archives/networkmanager-list/2019-June/msg00027.html
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1734470
[3] https://bugzilla.redhat.com/show_bug.cgi?id=1068673
IPv6 router advertisement messages contain the following parameters
(RFC 4861):
- Reachable time: 32-bit unsigned integer. The time, in
milliseconds, that a node assumes a neighbor is reachable after
having received a reachability confirmation. Used by the Neighbor
Unreachability Detection algorithm. A value of zero means
unspecified (by this router).
- Retrans Timer: 32-bit unsigned integer. The time, in milliseconds,
between retransmitted Neighbor Solicitation messages. Used by
address resolution and the Neighbor Unreachability Detection
algorithm. A value of zero means unspecified (by this router).
Currently NM ignores them; however, since it leaves accept_ra=1, the
kernel parses RAs and applies those parameters for us [1].
In the next commit kernel handling of RAs will be disabled, so let NM
set those neighbor-related parameters.
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/ipv6/ndisc.c?h=v5.2#n1353
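The "zero means unspecified" rule quoted above can be sketched as follows.
The struct and function names are hypothetical and only model the decision;
they are not the NetworkManager or kernel interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the two RA fields (RFC 4861), in msec. */
typedef struct {
    uint32_t reachable_time_ms;
    uint32_t retrans_timer_ms;
} RaNeighParams;

/* Hypothetical stand-in for the per-interface neighbor settings. */
typedef struct {
    uint32_t reachable_time_ms;
    uint32_t retrans_time_ms;
} NeighSettings;

/* Applies the RA values to the settings. A value of zero means
 * "unspecified by this router" and leaves the current setting alone.
 * Returns true if any setting changed. */
static bool
neigh_settings_apply_ra (NeighSettings *settings, const RaNeighParams *ra)
{
    bool changed = false;

    assert (settings && ra);

    if (ra->reachable_time_ms != 0
        && settings->reachable_time_ms != ra->reachable_time_ms) {
        settings->reachable_time_ms = ra->reachable_time_ms;
        changed = true;
    }
    if (ra->retrans_timer_ms != 0
        && settings->retrans_time_ms != ra->retrans_timer_ms) {
        settings->retrans_time_ms = ra->retrans_timer_ms;
        changed = true;
    }
    return changed;
}
```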
We have the "internal" DHCP plugin. That's our preferred plugin,
and eventually we may drop all other plugins.
Currently, the "internal" plugin is based on code from systemd-networkd
and implemented in "src/dhcp/nm-dhcp-systemd.c". As this code is forked
we eventually want to switch to nettools' n-dhcp4 library (for IPv4).
For that reason we already have "src/dhcp/nm-dhcp-nettools.c".
Note that "nettools" can be configured as a DHCP plugin, but this configuration
is only experimental and for testing. There is never supposed to be a
"nettools" plugin, but eventually the "internal" plugin will switch
implementation.
We don't want to replace the systemd-based implementation right away. Not until
we are sure that nettools works well. For that reason we keep them
both in parallel for a while.
This commit makes the "systemd" DHCP plugin explicitly configurable
in NetworkManager.conf. Like "nettools" this is an undocumented option,
only for testing.
If you choose "internal" (the default), you get one of the
implementations (currently the "systemd" one). But by selecting
"systemd" or "nettools" explicitly, you can select the exact plugin.
Drop nm_platform_link_get_address_as_bytes() and introduce
nmp_link_address_get_as_bytes() so that it becomes possible to obtain
also the broadcast address without an additional lookup of the link.
This is inspired by the existing systemd integration, with a few differences:
* This parses the WPAD option, which systemd requested, but did not use.
* We hook into the DAD handling, only making use of the configured address
once DAD has completed successfully, and declining the lease if it fails.
There are still many areas of possible improvement. In particular, we need
to ensure the parsing of all options is compliant, as n-dhcp4 treats all
options as opaque, unlike sd-dhcp4. We probably also need to look at how
to handle failures and retries (in particular if we decline a lease).
We need to query the current MTU at client startup, as well as the hardware
broadcast address. Both these are provided by the kernel over netlink, so
it should simply be a matter of hooking that up with NM's netlink layer.
Contribution under LGPL2.0+, in addition to stated licenses.
We no longer add these. If you use Emacs, configure it yourself.
Also, due to our "smart-tab" usage the editor anyway does a subpar
job handling our tabs. However, on the upside every user can choose
whatever tab-width he/she prefers. If "smart-tabs" are used properly
(like we do), every tab-width will work.
No manual changes, just ran commands:
F=($(git grep -l -e '-\*-'))
sed '1 { /\/\* *-\*- *[mM]ode.*\*\/$/d }' -i "${F[@]}"
sed '1,4 { /^\(#\|--\|dnl\) *-\*- [mM]ode/d }' -i "${F[@]}"
Check remaining lines with:
git grep -e '-\*-'
The ultimate purpose of this is to clean up our files and eventually use
SPDX license identifiers. For that, first get rid of the boilerplate lines.
From the files under "shared/nm-utils" we build an internal library
that provides glib-based helper utilities.
Move the files of that basic library to a new subdirectory
"shared/nm-glib-aux" and rename the helper library "libnm-utils-base.la"
to "libnm-glib-aux.la".
Reasons:
- the name "utils" is overused in our code-base. Everything's an
"utils". Give this thing a more distinct name.
- there were additional files under "shared/nm-utils", which are not
part of this internal library "libnm-utils-base.la". All the files
that are part of this library should be together in the same
directory, but files that are not, should not be there.
- the new name should better convey what this library is and what it isn't:
it's a set of utilities and helper functions that extend glib with
functionality that we commonly need.
There are still some files left under "shared/nm-utils". They have less
of a unifying purpose to be in their own directory, so I leave them there
for now. But at least they are separate from "shared/nm-glib-aux",
which has a very clear purpose.
(cherry picked from commit 80db06f768)
Next we will need to detect more kernel features. First refactor the
handling of these to require fewer code changes and be more efficient.
A plain nm_platform_kernel_support_get() only requires accessing an
array in the common case.
The other important change is that the function no longer requires a
NMPlatform instance. This allows us to check kernel support from
anywhere. The only thing is that we require kernel support to be
initialized before calling this function. That means, an NMPlatform
instance must have detected support before.
(cherry picked from commit ee269b318e)
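The idea of a lookup that needs no instance and only reads an array can be
sketched like this. All identifiers below are invented for the example and
do not match the nm_platform_kernel_support_*() implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical list of detectable kernel features. */
typedef enum {
    KSUPPORT_FEATURE_A,
    KSUPPORT_FEATURE_B,
    KSUPPORT_NUM,
} KernelSupportType;

/* One slot per feature: 0 = not yet detected, 1 = supported, -1 = not
 * supported. Static storage, so no platform instance is needed to read it. */
static int kernel_support_vals[KSUPPORT_NUM];

/* Called once by whoever detects the feature (in the commit message,
 * an NMPlatform instance). */
static void
kernel_support_init (KernelSupportType type, bool supported)
{
    assert (type >= 0 && type < KSUPPORT_NUM);
    kernel_support_vals[type] = supported ? 1 : -1;
}

/* The common case is a single array access. Detection must have
 * happened before, which is what the assertion checks. */
static bool
kernel_support_get (KernelSupportType type)
{
    assert (type >= 0 && type < KSUPPORT_NUM);
    assert (kernel_support_vals[type] != 0);
    return kernel_support_vals[type] > 0;
}
```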
Instead of having two functions nm_logging_set_syslog_identifier()
and nm_logging_set_prefix(), merge them.
They must both be called at the earliest point and together. No point
in giving them the appearance that they could be called any time.
Now that we have other helper functions on platform for setting
IP configuration sysctls, rename the function to set the hop-limit
to match the pattern.
We need a mode that:
* doesn't leave processes behind
* doesn't force an internal dhclient
* doesn't auto-generate default connections
* doesn't write out files into libdir, only /run
The original configure-and-quit mode doesn't really fit the initrd use. But
it's probably not a good idea to just change its behavior.
We first iterate over addresses that might have failed IPv6 DAD and
update the state in NMNDisc.
However, while we do that, don't yet invoke the changed signal.
Otherwise, we will invoke it multiple times (in case multiple addresses
failed). Instead, keep track of whether something changed, and handle
it once a bit later.
(cherry picked from commit f312620276)
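The pattern of collecting changes first and signalling once can be sketched
as below. The Addr type and the functions are stand-ins for this example and
not the NMNDisc code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical address entry. */
typedef struct {
    bool dad_failed; /* reported as failed by the platform */
    bool active;     /* still tracked as usable */
} Addr;

static int changed_signal_count;

/* Stand-in for emitting the "changed" signal. */
static void
emit_changed (void)
{
    changed_signal_count++;
}

/* Updates all addresses that failed DAD and emits the signal at most
 * once, no matter how many addresses were affected. */
static void
handle_dad_failed (Addr *addrs, size_t n)
{
    bool changed = false;
    size_t i;

    assert (n == 0 || addrs);

    for (i = 0; i < n; i++) {
        if (addrs[i].dad_failed && addrs[i].active) {
            addrs[i].active = false;
            changed = true; /* remember, but do not signal yet */
        }
    }

    if (changed)
        emit_changed ();
}
```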
During shutdown, we will need to still iterate the main loop
to do a coordinated shutdown. Currently we do not, and we just
exit, leaving a lot of objects hanging.
If we are going to fix that, we need to tell NMDBusManager during
shutdown to reject all future operations.
Note that property getters and the "GetManagedObjects" call are not
blocked. They continue to work.
Certainly for some operations, we want to allow them to be called even
during shutdown. However, these have to opt-in.
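A minimal sketch of such an opt-in check follows. The types and the flag
name are assumptions for illustration; they are not the NMDBusManager data
structures.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical description of a D-Bus method. */
typedef struct {
    const char *name;
    bool allow_during_shutdown; /* must be set explicitly to opt in */
} MethodInfo;

/* Hypothetical bus manager state. */
typedef struct {
    bool shutting_down;
} BusManager;

/* Decides whether a method call may be dispatched. During shutdown,
 * everything is rejected unless the method opted in. */
static bool
bus_manager_may_dispatch (const BusManager *manager, const MethodInfo *method)
{
    assert (manager && method);

    if (!manager->shutting_down)
        return true;
    return method->allow_during_shutdown;
}
```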
This also fixes an ugliness, where nm_dbus_manager_start() would
get the set-property-handler and the @manager as user-data. However,
NMDBusManager will always outlive NMManager, hence, after NMManager
is destroyed, the user-data would be a dangling pointer. Currently
that is not an issue, because
- we always leak NMManager
- we don't run the mainloop during shutdown
This was called via
...
- manager:recheck_assume_connection()
- manager:get_existing_connection()
- nm_device_capture_initial_config()
- update_ext_ip_config(initial=TRUE)
and would parse resolv.conf, and try to fill the device IP config
with nameservers and dns-options.
But why? It would only have effect if NM was started with
nm_dns_manager_get_resolv_conf_explicit(), but is that really sensible?
And it would only take effect on devices that have a default route.
And for what is this information even used?
Let's not do it this way. If we need this information for assuming or
external sys-iface mode, then it should be explicitly loaded at the
appropriate moment.
For now, drop it and see what breaks. Then we can fix it properly. If
it even matters.
Previously, we used the generated GDBusInterfaceSkeleton types and glued
them via the NMExportedObject base class to our NM types. We also used
GDBusObjectManagerServer.
Don't do that anymore. The resulting code was more complicated despite (or
because?) using generated classes. It was hard to understand, complex, had
ordering-issues, and had a runtime and memory overhead.
This patch refactors this entirely and uses the lower layer API GDBusConnection
directly. It replaces the generated code, GDBusInterfaceSkeleton, and
GDBusObjectManagerServer. All this is now done by NMDBusObject and NMDBusManager
and static descriptor instances of type GDBusInterfaceInfo.
This adds a net plus of more than 1300 lines of hand written code. I claim
that this implementation is easier to understand. Note that previously we
also required extensive and complex glue code to bind our objects to the
generated skeleton objects. Instead, we now glue our objects directly to
GDBusConnection. The result is more immediate and gets rid of layers of
code in between.
Now that the D-Bus glue is more under our control, we can address issues and
bottlenecks better, instead of adding code to bend the generated skeletons
to our needs.
Note that the current implementation now only supports one D-Bus connection.
That was effectively the case already, although there were places (and still are)
where the code pretends it could also support connections from a private socket.
We dropped private socket support mainly because it was unused, untested and
buggy, but also because GDBusObjectManagerServer could not export the same
objects on multiple connections. Now, it would be rather straightforward to
fix that and re-introduce ObjectManager on each private connection. But this
commit doesn't do that yet, and the new code intentionally supports only one
D-Bus connection.
Also, the D-Bus startup was simplified. There is no retry, either nm_dbus_manager_start()
succeeds, or it detects the initrd case. In the initrd case, bus manager never tries to
connect to D-Bus. Since the initrd scenario is not yet used/tested, this is good enough
for the moment. It could be easily extended later, for example with polling whether the
system bus appears (like was done previously). Also, restart of D-Bus daemon isn't
supported either -- just like before.
Note how NMDBusManager now implements the ObjectManager D-Bus interface
directly.
Also, this fixes race issues in the server, by no longer delaying
PropertiesChanged signals. NMExportedObject would collect changed
properties and send the signal out in idle_emit_properties_changed()
on idle. This messes up the ordering of change events w.r.t. other
signals and events on the bus. Note that not only NMExportedObject
messed up the ordering. Also the generated code would hook into
notify() and process change events in an idle handler, exhibiting the
same ordering issue too.
No longer do that. PropertiesChanged signals will be sent right away
by hooking into dispatch_properties_changed(). This means, changing
a property in quick succession will no longer be combined and is
guaranteed to emit signals for each individual state. Quite possibly
we now emit more PropertiesChanged signals than before.
However, we are now able to group a set of changes by using standard
g_object_freeze_notify()/g_object_thaw_notify(). We probably should
make more use of that.
Also, now that our signals are all handled in the right order, we
might find places where we still emit them in the wrong order. But that
is then due to the order in which our GObjects emit signals, not due
to an ill behavior of the D-Bus glue. Possibly we need to identify
such ordering issues and fix them.
Numbers (for contrib/rpm --without debug on x86_64):
- the patch changes the code size of NetworkManager by
- 2809360 bytes
+ 2537528 bytes (-9.7%)
- Runtime measurements are harder because there is a large variance
during testing. In other words, the numbers are not reproducible.
Currently, the implementation performs no caching of GVariants at all,
but it would be rather simple to add it, if that turns out to be
useful.
Anyway, without strong claim, it seems that the new form tends to
perform slightly better. That would be no surprise.
$ time (for i in {1..1000}; do nmcli >/dev/null || break; echo -n .; done)
- real 1m39.355s
+ real 1m37.432s
$ time (for i in {1..2000}; do busctl call org.freedesktop.NetworkManager /org/freedesktop org.freedesktop.DBus.ObjectManager GetManagedObjects > /dev/null || break; echo -n .; done)
- real 0m26.843s
+ real 0m25.281s
- Regarding RSS size, just looking at the processes in similar
conditions, doesn't give a large difference. On my system they
consume about 19MB RSS. It seems that the new version has a
slightly smaller RSS size.
- 19356 RSS
+ 18660 RSS
The next commit will completely rework NMBusManager and replace
NMExportedObject by a new type NMDBusObject.
Originally, NMDBusObject was added along NMExportedObject to ease
the rework and have compilable, intermediate stages of refactoring. Now,
I think the new name is better, because NMDBusObject is very strongly related
to the bus manager and the old name NMExportedObject didn't make that
clear.
I also slightly prefer the name NMDBusObject over NMBusObject, hence
for consistency, also rename NMBusManager to NMDBusManager.
This commit only renames the file for a nicer diff in the next commit.
It does not actually update the type name in sources. That will be done
later.
Platform invokes change events while reading netlink events. However,
platform code is not re-entrant and calling into platform again is not
allowed (aside from operations that do not process the netlink socket, like
lookup of the platform cache).
That basically means, we have to always process events in an idle
handler. That is not too strong a limitation, because we anyway don't
know the call context in which the platform event is emitted and we
should avoid unguarded recursive calls into platform.
Otherwise, we hit an assertion/crash in nm-iface-helper:
1 raise()
2 abort()
3 g_assertion_message()
4 g_assertion_message_expr()
5 do_delete_object()
6 ip6_address_delete()
>>> 7 nm_platform_ip6_address_delete()
8 nm_platform_ip6_address_sync()
9 nm_ip6_config_commit()
10 ndisc_config_changed()
11 ffi_call_unix64()
12 ffi_call()
13 g_cclosure_marshal_generic_va()
14 _g_closure_invoke_va()
15 g_signal_emit_valist()
16 g_signal_emit()
>>> 17 nm_ndisc_dad_failed()
18 ffi_call_unix64()
19 ffi_call()
20 g_cclosure_marshal_generic()
21 g_closure_invoke()
22 signal_emit_unlocked_R()
23 g_signal_emit_valist()
24 g_signal_emit()
>>> 25 nm_platform_cache_update_emit_signal()
26 event_handler_recvmsgs()
27 event_handler_read_netlink()
28 delayed_action_handle_one()
29 delayed_action_handle_all()
30 do_delete_object()
31 ip6_address_delete()
32 nm_platform_ip6_address_delete()
33 nm_platform_ip6_address_sync()
>>> 34 nm_ip6_config_commit()
35 ndisc_config_changed()
36 ffi_call_unix64()
37 ffi_call()
38 g_cclosure_marshal_generic_va()
39 _g_closure_invoke_va()
40 g_signal_emit_valist()
41 g_signal_emit()
42 check_timestamps()
43 receive_ra()
44 ndp_call_eventfd_handler()
45 ndp_callall_eventfd_handler()
46 event_ready()
47 g_main_context_dispatch()
48 g_main_context_iterate.isra.22()
49 g_main_loop_run()
>>> 50 main()
NMPlatform already has a check to assert against recursive calls
in delayed_action_handle_all():
g_return_val_if_fail (priv->delayed_action.is_handling == 0, FALSE);
priv->delayed_action.is_handling++;
...
priv->delayed_action.is_handling--;
Fixes: f85728ecff
https://bugzilla.redhat.com/show_bug.cgi?id=1546656
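For illustration, the difference between calling back into the platform from
inside an event and deferring the work can be modelled with the sketch below.
Everything here is a simplified stand-in: the Platform and Listener types,
the is_handling counter and listener_idle() are assumptions for the example,
and a real idle handler would be scheduled on the main loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct Platform Platform;
typedef void (*EventFunc) (Platform *platform, void *user_data);

struct Platform {
    int is_handling; /* > 0 while events are being processed */
};

/* Processes one simulated event and refuses to run recursively. */
static bool
platform_process_event (Platform *platform, EventFunc on_event, void *user_data)
{
    if (platform->is_handling != 0)
        return false;
    platform->is_handling++;
    if (on_event)
        on_event (platform, user_data);
    platform->is_handling--;
    return true;
}

typedef struct {
    bool commit_pending;
    int commits_ok;
    int commits_rejected;
} Listener;

/* An operation that needs to process platform events itself. */
static void
listener_commit (Platform *platform, Listener *listener)
{
    if (platform_process_event (platform, NULL, NULL))
        listener->commits_ok++;
    else
        listener->commits_rejected++;
}

/* Wrong: commits directly from inside the event. */
static void
on_event_direct (Platform *platform, void *user_data)
{
    listener_commit (platform, user_data);
}

/* Right: only remembers that there is work to do. */
static void
on_event_deferred (Platform *platform, void *user_data)
{
    Listener *listener = user_data;

    (void) platform;
    listener->commit_pending = true;
}

/* Stand-in for the idle handler, run after event processing finished. */
static void
listener_idle (Platform *platform, Listener *listener)
{
    assert (platform->is_handling == 0);
    if (listener->commit_pending) {
        listener->commit_pending = false;
        listener_commit (platform, listener);
    }
}
```

In this model the direct handler has its commit rejected by the guard, while
the deferred handler commits successfully once listener_idle() runs.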
Convert the string representation of ipv4.dhcp-client-id property already in
NMDevice to a GBytes. Next, we will support more client ID modes, and we
will need the NMDevice context to generate the client id.
GByteArray is a mutable array of bytes. For every practical purpose, the hwaddr
property of NMDhcpClient is an immutable sequence of bytes. Thus, make it a
GBytes.
and nm_utils_ip6_property_path(). The API with static buffers
looks a bit nicer. But I think they are dangerous, because
we tend to pass the buffer down several layers of the stack, and
it's not immediately clear, that we don't overwrite the static
buffer again (which we probably did not, but it's hard to verify
that there is no bug there).
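The hazard can be shown with a small sketch that contrasts the two API
styles. The function names are hypothetical and are not the
nm_utils_*_property_path() functions; only the sysctl directory layout under
/proc/sys/net/ipv6/conf is taken as given.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical variant with a static buffer: convenient, but every
 * call overwrites the result of the previous one. */
static const char *
property_path_static (const char *ifname, const char *property)
{
    static char buf[128];

    snprintf (buf, sizeof (buf), "/proc/sys/net/ipv6/conf/%s/%s", ifname, property);
    return buf;
}

/* Hypothetical variant with a caller-provided buffer: the lifetime of
 * the result is obvious at the call site. */
static const char *
property_path_buf (char *buf, size_t len, const char *ifname, const char *property)
{
    int n;

    n = snprintf (buf, len, "/proc/sys/net/ipv6/conf/%s/%s", ifname, property);
    assert (n >= 0 && (size_t) n < len);
    return buf;
}
```

Two results of property_path_static() alias the same storage, so the first
one silently changes when the second is requested. With property_path_buf()
each caller owns its buffer.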
For routes and the default-route from NDisc, set the router preference
RTA_PREF.
Also, previously, we would only configure one IPv6 default-route. That by itself
was not really a problem, as long as NetworkManager would always make sure that
it configured the route to the ~best~ router.
Actually, NM should have done that already. It keeps the list of gateways
sorted, and prefers them according to their preference. But maybe
it didn't, so we have bug rh#1445417 (??).
Change that by configuring a default-route for all gateways, with
appropriate router preference. In case the kernel doesn't support RTA_PREF
yet, only configure all routes that share the same maximum preference.
https://bugzilla.redhat.com/show_bug.cgi?id=1445417
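The selection rule can be sketched as follows. The Gateway type, the integer
preference scale (higher is better) and the function name are assumptions
for this example, not the NetworkManager code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical gateway entry learned from a router advertisement. */
typedef struct {
    const char *address;
    int preference; /* assumed scale: a higher value is preferred */
} Gateway;

/* Marks in selected[] the gateways that get a default-route and returns
 * their number. With RTA_PREF support every gateway is selected, since
 * the kernel can rank the routes itself. Without it, only the gateways
 * sharing the maximum preference are selected. */
static size_t
gateways_select (const Gateway *gateways,
                 size_t n,
                 bool kernel_has_rta_pref,
                 bool *selected)
{
    size_t i;
    size_t count = 0;
    int max_pref = 0;

    assert (n == 0 || (gateways && selected));

    for (i = 0; i < n; i++) {
        if (i == 0 || gateways[i].preference > max_pref)
            max_pref = gateways[i].preference;
    }

    for (i = 0; i < n; i++) {
        selected[i] = kernel_has_rta_pref
                      || gateways[i].preference == max_pref;
        if (selected[i])
            count++;
    }
    return count;
}
```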
Instead of having 3 properties @gateway, @never_default and @has_gateway
on NMIP4Config/NMIP6Config that determine the default-route, track the
default-route as a regular route.
The gateway setting is the configuration knob for the default-route.
Since an NMIP4Config/NMIP6Config instance only has one gateway property,
it cannot track more than one default-route (see related bug rh#1445417).
Especially with policy routing, it might be interesting to configure a
default-route in multiple tables.
Also, later it might be interesting to allow adding default-routes as
regular static routes in a connection, so that the user can configure additional
route parameters for the default-route or add default-routes in multiple tables.
With this patch, default-routes now have a rt_source property according to their
origin.
Also, the previous commits of this branch broke handling of the
default-route :) . That should be working now again.