Specify a reason when creating active connections. The reason will be
used in the next commit to tell whether slaves must be reconnected or
not if a master has autoconnect-slaves=yes.
Previously, we used the generated GDBusInterfaceSkeleton types and glued
them via the NMExportedObject base class to our NM types. We also used
GDBusObjectManagerServer.
Don't do that anymore. The resulting code was more complicated despite (or
because?) using generated classes. It was hard to understand, complex, had
ordering-issues, and had a runtime and memory overhead.
This patch refactors this entirely and uses the lower layer API GDBusConnection
directly. It replaces the generated code, GDBusInterfaceSkeleton, and
GDBusObjectManagerServer. All this is now done by NMDbusObject and NMDBusManager
and static descriptor instances of type GDBusInterfaceInfo.
This adds a net plus of more then 1300 lines of hand written code. I claim
that this implementation is easier to understand. Note that previously we
also required extensive and complex glue code to bind our objects to the
generated skeleton objects. Instead, now glue our objects directly to
GDBusConnection. The result is more immediate and gets rid of layers of
code in between.
Now that the D-Bus glue us more under our control, we can address issus and
bottlenecks better, instead of adding code to bend the generated skeletons
to our needs.
Note that the current implementation now only supports one D-Bus connection.
That was effectively the case already, although there were places (and still are)
where the code pretends it could also support connections from a private socket.
We dropped private socket support mainly because it was unused, untested and
buggy, but also because GDBusObjectManagerServer could not export the same
objects on multiple connections. Now, it would be rather straight forward to
fix that and re-introduce ObjectManager on each private connection. But this
commit doesn't do that yet, and the new code intentionally supports only one
D-Bus connection.
Also, the D-Bus startup was simplified. There is no retry, either nm_dbus_manager_start()
succeeds, or it detects the initrd case. In the initrd case, bus manager never tries to
connect to D-Bus. Since the initrd scenario is not yet used/tested, this is good enough
for the moment. It could be easily extended later, for example with polling whether the
system bus appears (like was done previously). Also, restart of D-Bus daemon isn't
supported either -- just like before.
Note how NMDBusManager now implements the ObjectManager D-Bus interface
directly.
Also, this fixes race issues in the server, by no longer delaying
PropertiesChanged signals. NMExportedObject would collect changed
properties and send the signal out in idle_emit_properties_changed()
on idle. This messes up the ordering of change events w.r.t. other
signals and events on the bus. Note that not only NMExportedObject
messed up the ordering. Also the generated code would hook into
notify() and process change events in and idle handle, exhibiting the
same ordering issue too.
No longer do that. PropertiesChanged signals will be sent right away
by hooking into dispatch_properties_changed(). This means, changing
a property in quick succession will no longer be combined and is
guaranteed to emit signals for each individual state. Quite possibly
we emit now more PropertiesChanged signals then before.
However, we are now able to group a set of changes by using standard
g_object_freeze_notify()/g_object_thaw_notify(). We probably should
make more use of that.
Also, now that our signals are all handled in the right order, we
might find places where we still emit them in the wrong order. But that
is then due to the order in which our GObjects emit signals, not due
to an ill behavior of the D-Bus glue. Possibly we need to identify
such ordering issues and fix them.
Numbers (for contrib/rpm --without debug on x86_64):
- the patch changes the code size of NetworkManager by
- 2809360 bytes
+ 2537528 bytes (-9.7%)
- Runtime measurements are harder because there is a large variance
during testing. In other words, the numbers are not reproducible.
Currently, the implementation performs no caching of GVariants at all,
but it would be rather simple to add it, if that turns out to be
useful.
Anyway, without strong claim, it seems that the new form tends to
perform slightly better. That would be no surprise.
$ time (for i in {1..1000}; do nmcli >/dev/null || break; echo -n .; done)
- real 1m39.355s
+ real 1m37.432s
$ time (for i in {1..2000}; do busctl call org.freedesktop.NetworkManager /org/freedesktop org.freedesktop.DBus.ObjectManager GetManagedObjects > /dev/null || break; echo -n .; done)
- real 0m26.843s
+ real 0m25.281s
- Regarding RSS size, just looking at the processes in similar
conditions, doesn't give a large difference. On my system they
consume about 19MB RSS. It seems that the new version has a
slightly smaller RSS size.
- 19356 RSS
+ 18660 RSS
The connection.mdns setting is a per-connection setting,
so one might expect that one activated device can only have
one MDNS setting at a time.
However, with certain VPN plugins (those that don't have their
own IP interface, like libreswan), the VPN configuration is merged
into the configuration of the device. So, in this case, there
might be multiple settings for one device that must be merged.
We already have a mechanism for that. It's NMIP4Config. Let NMIP4Config
track this piece of information. Although, stricitly speaking this
is not tied to IPv4, the alternative would be to introduce a new
object to track such data, which would be a tremendous effort
and more complicated then this.
Luckily, NMDnsManager and NMDnsPlugin are already equipped to
handle multiple NMIPConfig instances per device (IPv4 vs. IPv6,
and Device vs. VPN).
Also make "connection.mdns" configurable via global defaults in
NetworkManager.conf.
After commit 5c299454b4 ("core: rework tracking of
gateway/default-route in ip-config") NM set a default route for VPNs
only based on the "never-default" option reported by the plugin. It
should also consider the connection setting.
Fixes: 5c299454b4https://bugzilla.redhat.com/show_bug.cgi?id=1505886
Replace the usage of g_str_hash() with our own nm_str_hash().
GLib's g_str_hash() uses djb2 hashing function, just like we
do at the moment. The only difference is, that we use a diffrent
seed value.
Note, that we initialize the hash seed with random data (by calling
getrandom() or reading /dev/urandom). That is a change compared to
before.
This change of the hashing function and accessing the random pool
might be undesired for libnm/libnm-core. Hence, the change is not
done there as it possibly changes behavior for public API. Maybe
we should do that later though.
At this point, there isn't much of a change. This patch becomes
interesting, if we decide to use a different hashing algorithm.
Instead of having 3 properties @gateway, @never_default and @has_gateway
on NMIP4Config/NMIP6Config that determine the default-route, track the
default-route as a regular route.
The gateway setting is the configuration knob for the default-route.
Since an NMIP4Config/NMIP6Config instance only has one gateway property,
it cannot track more then one default-routes (see related bug rh#1445417).
Especially with policy routing, it might be interesting to configure a
default-route in multiple tables.
Also, later it might be interesting to allow adding default-routes as
regular static routes in a connection, so that the user can configure additional
route parameters for the default-route or add default-routes in multiple tables.
With this patch, default-routes now have a rt_source property according to their
origin.
Also, the previous commits of this branch broke handling of the
default-route :) . That should be working now again.
The MSS is only set for VPN connections (by merging it in the respective
NMIP4Config/NMIP6Config).
It is also only used when setting the MSS of the default route.
Don't track that property in NMIP4Config/NMIP6Config, instead, set the
mss of the route directly in nm_vpn_connection_ip4_config_get() and
nm_vpn_connection_ip6_config_get().
There is a potential change in behavior here: NMDevice also consisdered
the MSS for the default route, but that would only be set if the MSS
gets merged from an vpn-ip-config. Which at most is the case for
iterface-less VPN types (libreswan). But even in that case, it doesn't
seem right to me to use the VPN's MSS for the device's default-route.
We added "ipv4.route-table-sync" and "ipv6.route-table-sync" to not change
behavior for users that configured policy routing outside of NetworkManager,
for example, via a dispatcher script. Users had to explicitly opt-in
for NetworkManager to fully manage all routing tables.
These settings were awkward. Replace them with new settings "ipv4.route-table"
and "ipv6.route-table". Note that this commit breaks API/ABI on the unstable
development branch by removing recently added API.
As before, a connection will have no route-table set by default. This
has the meaning that policy-routing is not enabled and only the main table
will be fully synced. Once the user sets a table, we recognize that and
NetworkManager manages all routing tables.
The new route-table setting has other important uses: analog to
"ipv4.route-metric", it is the default that applies to all routes.
Currently it only works for static routes, not DHCP, SLAAC,
default-route, etc. That will be implemented later.
For static routes, each route still can explicitly set a table, and
overwrite the per-connection setting in "ipv4.route-table" and
"ipv6.route-table".
Before commit 6698bf58bb, we would rely on
kernel to add the device-route for manual IPv6 routes. We broke that and now
kernel would still add the device-route, however nm_platform_ip_route_sync()
would delete it immediately after.
That is because previously nm_platform_ip_route_sync() would ignore routes
with rtm_protocol RTPRO_KERNEL. Now, it will sync and delete those too.
Fix that by adding the device-route like we do it for IPv4. This also
fixes an actual issue where the automatically added route always had
route-metric 256. Instead, we now use the metric from ipv6.route-metric
setting.
Fixes: 6698bf58bb
Kernel does not allow to add IPv6 routes with "src", as long as the
corresponding address is still tentative (related bug rh#1457196).
The workaround for this is cumbersome. First, when we fail to add such a
route with "pref_src", we guess that it happend due to this issue. In
that case, nm_ip6_config_commit() returns the list of routes that could
not be added for the moment (but hopefully can be added later).
We track this list in NMDevice, and keep trying to merge the routes
back into ip6_config. In order to not try indefinitely, keep track of a
timestamp when we tried to add this route for the first time.
Another uglyness is that pending tentative routes don't explicitly block
activation. In practice they may do, because for these routes we also have
an IPv6 address that is still doing DAD, so the IP configuration is
still pending due to that.
https://bugzilla.redhat.com/show_bug.cgi?id=1452684
For IPv6, we create device routes when processing the RA and add it to
NMIP6Config like any other route. For IPv4 we didn't do that. Instead
we created the list of device routes during nm_ip4_config_commit() and
passed it to nm_platform_ip_route_sync().
Remove NMDefaultRouteManager. Instead, add the default-route to the
NMIP4Config/NMIP6Config instance.
This basically reverts commit e8824f6a52.
We added NMDefaultRouteManager because we used the corresponding to `ip
route replace` when configuring routes. That would replace default-routes
on other interfaces so we needed a central manager to coordinate routes.
Now, we use the corresponding of `ip route append` to configure routes,
and each interface can configure routes indepdentently.
In NMDevice, when creating the default-route, ignore @auto_method for
external devices. We shall not touch these devices.
Especially the code in NMPolicy regarding selection of the best-device
seems wrong. It probably needs further adjustments in the future.
Especially get_best_ip_config() should be replaced, because this
distinction VPN vs. devices seems wrong to me.
Thereby, remove the @ignore_never_default argument. It was added by
commit bb75026004, I don't think it's
needed anymore.
This brings another change. Now that we track default-routes in
NMIP4Config/NMIP6Config, they are also exposed on D-Bus like regular
routes. I think that makes sense, but it is a change in behavior, as
previously such routes were not exposed there.
Until recently, we would only consier the IP config of the parent device
to determine the route to the external VPN gateway. We changed that, to
additionally improve the guess by letting kernel resolve the route.
Now, drop checking the parent's config entirely. The only thing that
matters is the here and now runtime configuraion on the parent device.
And for that we ask kernel to resolve the route.
Previously, we would try to resolve the route in general (unrestricted
to a certain ifindex), and reject it the result wasn't on the parent
device.
Now, use the oif argument, and resolve the route only on the parent device.
The problem is that kernel would pretend that the destination is onlink, if
there is no route to it. Hence, hack around that by only accepting an onlink
route, if the VPN gateway itself is site-local. Yes, there are scenarios where
this will still lead to a wrong guess. See related bug rh#1489343 for kernel.
In practice, it shouldn't matter much, because NM may frequently
reapply the IP config. Hence, it anyway must cope with the fact that
IP config from a previous iteration is already applied on the VPN device,
before applying it to the parent device.
Anyway, it makes a bit more sense to apply it first the the parent device.
When creating the NMIP4Config/NMIP6Config instance, we must always use the right
ifindex. That is the ifindex, on which we want to apply the config. It also means,
that for device-based VPNs (those with priv->ip_ifindex set, like OpenVPN), the
parent's config must have the ip-ifindex of the parent device. Not of the
VPN's device.
One effect of this bug is that in add_ip4_vpn_gateway_route() we resolve
the route to the external gateway and only accept it if it's on the
parent device. But since the ifindex of the config was wrong, we would accept
route on the wrong interface.
https://bugzilla.gnome.org/show_bug.cgi?id=787370
When activating a route, we commonly need to add a route to the
external VPN gateway, so that the (encrypted) data is not sent over
the VPN itself.
Currently, our VPN connections are rather strongly tied to their
parent device. Maybe the shouldn't be, as VPN may happily support
changing the route from one device/IP address to another.
Anyway, our previous way to determine the gateway for the route
was not great. Instead, ask kernel how to reach the gateway via
(something like) `ip route get`. If kernel would route the packets
to the gateway via our parent device, we take that gateway.
If not, we keep our previous heuristic (which is probably wrong
in this case).
Previously, we would add exclusive routes via netlink message flags
NLM_F_CREATE | NLM_F_REPLACE for RTM_NEWROUTE. Similar to `ip route replace`.
Using that form of RTM_NEWROUTE message, we could only add a certain
route with a certain network/plen,metric triple once. That was already
hugely inconvenient, because
- when configuring routes, multiple (managed) interfaces may get
conflicting routes (multihoming). Only one of the routes can be actually
configured using `ip route replace`, so we need to track routes that are
currently shadowed.
- when configuring routes, we might replace externally configured
routes on unmanaged interfaces. We should not interfere with such
routes.
That was worked around by having NMRouteManager (and NMDefaultRouteManager).
NMRouteManager would keep a list of the routes which NetworkManager would like
to configure, even if momentarily being unable to do so due to conflicting routes.
This worked mostly well but was complicated. It involved bumping metrics to
avoid conflicts for device routes, as we might require them for gateway routes.
Drop that now. Instead, use the corresponding of `ip route append` to configure
routes. This allows NetworkManager to confiure (almost) all routes that we care.
Especially, it can configure all routes on a managed interface, without
replacing/interfering with routes on other interfaces. Hence, NMRouteManager
becomes obsolete.
It practice it is a bit more complicated because:
- when adding an IPv4 address, kernel will automatically create a device route
for the subnet. We should avoid that by using the IFA_F_NOPREFIXROUTE flag for
IPv4 addresses (still to-do). But as kernel may not support that flag for IPv4
addresses yet (and we don't require such a kernel yet), we still need functionality
similar to nm_route_manager_ip4_route_register_device_route_purge_list().
This functionality is now handled via nm_platform_ip4_dev_route_blacklist_set().
- trying to configure an IPv6 route with a source address will be rejected
by kernel as long as the address is tentative (see related bug rh#1457196).
Preferably, NMDevice would keep the list of routes which should be configured,
while kernel would have the list of what actually is configured. There is a
feed-back loop where both affect each other (for example, when externally deleting
a route, NMDevice must forget about it too). Previously, NMRouteManager would have
the task of remembering all routes which we currently want to configure, but cannot
due to conflicting routes.
We get rid of that, because now we configure non-exclusive routes. We however still
will need to remember IPv6 routes with a source address, that currently cannot be
configured yet. Hence, we will need to keep track of routes that
currently cannot be configured, but later may be.
That is still not done yet, as NMRouteManager didn't handle this
correctly either.
Rename to nm_platform_ip_address_flush(), it's more consistent with naming
for other platform functions.
Also, pass an address family argument. Sometimes I feel an option makes it clearer
what the function does. Otherwise, from the name it's not clear which address
families are affected. As an API, it feels more correct to me.
We soon also get a nm_platform_ip_route_flush() function, which will
look similar.
Kernel requires that the host part of a route (based on network/plen)
is zero. Routes with non-zero host part don't really exist.
In settings (NMIPRoute), we don't enforce that. Hence we must ensure
that we don't let such invalid routes into NMIP4Config/NMIP6Config.
Also at other places where we obtain routes from untrusted sources,
we must sanitize them first.
Also add an assertion to catch such bugs.
Reasons:
- it adds an O(1) lookup index for accessing NMIPxConfig's addresses.
Hence, operations like merge/intersect have now runtime O(n) instead
of O(n^2).
Arguably, we expect low numbers of addresses in general. For low
numbers, the O(n^2) doesn't matter and quite likely in those cases
the previous implementation was just fine -- maybe even faster.
But the simple case works fine either way. It's important to scale
well in the exceptional case.
- the tracked objects can be shared between the various NMPI4Config,
NMIP6Config instances with NMPlatform and everybody else.
- the NMPObject can be treated generically, meaning it enables code to
handle both IPv4 and IPv6, or addresses and routes. See for example
_nm_ip_config_add_obj().
- I want core to evolve to somewhere where we don't keep copies of
NMPlatformIP4Address, et al. instances. Instead they shall all be
shared. I hope this will reduce memory consumption (although tracking a
reference consumes some memory too). Also, it shortcuts nmp_object_equal()
when comparing the same object. Calling nmp_object_equal() on the
identical objects would be a common case after the hash function
pre-evaluates equality.
Eventually, every NMPlatformIP4Route, NMPlatformIP6Route,
NMPlatformIP4Address and NMPlatformIP6Address should be shared
an deduplicated via the global NMDedupMultiIndex instance.
As first proof of concept, refactor NMIP4Config to track
IPv4 routes via the shared multi_idx. There is later potential
for improvement, when we pass (deduplicated) NMPObject instances
around instead of plain NMPlatformIP4Route, which needs still
a lot of comparing and cloning.
NMIP4Config, NMIP6Config, and NMPlatform shall share one
NMDedupMultiIndex instance.
For that, pass an NMDedupMultiIndex instance to NMPlatform and NMNetns.
NMNetns than passes it on to NMDevice, NMDhcpClient, NMIP4Config and NMIP6Config.
So currently NMNetns is the access point to the shared NMDedupMultiIndex
instance, and it gets it from it's NMPlatform instance.
The NMDedupMultiIndex instance is really a singleton, we don't want
multiple instances of it. However, for testing, instead of adding a
singleton instance, pass the instance explicitly around.
nm_pacrunner_manager_remove() required a "tag" argument. It was a
bug for callers trying to remove a configuration for a non-existing
tag.
That effectively means, the caller must keep track of whether a certain
"tag" is pending. The caller also must remember the tag -- a tag that he
must choose uniquely in the first place.
Turn that around and have nm_pacrunner_manager_send() return a (non
NULL) call-id. This call-id may later be used to remove the
configuration.
Apparently, previously the tracking of the "tag" was not always correct
and we hit the assertion in nm_pacrunner_manager_remove().
https://bugzilla.redhat.com/show_bug.cgi?id=1444374
(cherry picked from commit b04a9c90eb)
Reduce the use of NM_PLATFORM_GET / nm_platform_get() to get
the platform singleton instance.
For one, this is a step towards supporting namespaces, where we need
to use different NMNetns/NMPlatform instances depending on in which
namespace the device lives.
Also, we should reduce our use of singletons. They are difficult to
coordinate on shutdown. Instead there should be a clear order of
dependencies, expressed by owning a reference to those singelton
instances. We already own a reference to the platform singelton,
so use it and avoid NM_PLATFORM_GET.
(cherry picked from commit 94d9ee129d)
This also ensures that we own a reference to the
NMPlatform, NMRouteManager and NMDefaultRouteManager
instances. See bug rh#1440089 where we might access
the singleton getter after destroing the singleton
instance of NMRouteManager. This is prevented by
keeping a reference to those instances -- indirectly
via the netns instance.
Later, we may add support for multiple namespaces. Then it might
make sense to swap the NMNetns instance of a device when moving
the device between namespaces.
Also, drop the use of singelton instances.
https://bugzilla.redhat.com/show_bug.cgi?id=1440089
(cherry picked from commit c48a19b7c6)
Got an assertion due to priv-proxy unset.
NMDevice:
- _platform_link_cb_idle()
- nm_device_unrealize() [NMDeviceTun]
- nm_device_state_changed()
- _set_state_full()
NMVpnConnection:
- _set_vpn_state()
- call_plugin_disconnect()
It seam to me, that can only happen if the NMVpnConnection never
completed on_proxy_acquired() and is still in preparing state when
being disconnected.
Avoid that be checking whether we have a proxy.
https://bugzilla.redhat.com/show_bug.cgi?id=1442064
(cherry picked from commit bc1d1c9df4)
Don't try to remove the configuration if we haven't added it in the
first place, for example when the connection gets deactivated before
it completes or for slave connections without IP configuration.
Fixes: 3ad89223d0
(cherry picked from commit 3cada7722d)
Fix some issues in nm-pacrunner-manager.c:
- when adding a configuration through nm_pacrunner_manager_send(), we
kept an association between the interface name and the pacrunner
configuration object path, so that the configuration for that
interface could be removed later. Unfortunately not all
configurations have an interface associated, so we need a more
generic way to identify configurations. Introduce a new @tag
argument that serves as key to match configurations
- the interface name of the last pushed configuration was stored in
the manager private config and reused later; this could cause
issues when there are multiple outstanding D-Bus calls. The
interface is not needed anymore after the previous point.
- remove() didn't actually remove the configuration from the list
(cherry picked from commit 3ad89223d0)
It's utterly useless: the textual version of the reason if logged only if
the plugin fails; but the plugin failure already logs the plugin state
change reason which is directly translated to the connection one.
It includes a reason code that makes it possible for the clients to be
more reasonable about error messages.
The reason code is essentially copied from the VPN, plus three more
reasons that were useful for non-VPN connections.