Until now, NetworkManager's platform cache for routes used the quadruple
network/plen,metric,ifindex for equaliy. That is not kernel's
understanding of how routes behave. For example, with `ip route append`
you can add two IPv4 routes that only differ by their gateway. To
the previous form of platform cache, these two routes would wrongly
look identical, as the cache could not contain both routes. This also
easily leads to cache-inconsistencies.
Now that we have NM_PLATFORM_IP_ROUTE_CMP_TYPE_ID, fix the route's
compare operator to match kernel's.
Well, not entirely. Kernel understands more properties for routes then
NetworkManager. Some of these properties may also be part of the ID according
to kernel. To NetworkManager such routes would still look identical as
they only differ in a property that is not understood. This can still
cause cache-inconsistencies. The only fix here is to add support for
all these properties in NetworkManager as well. However, it's less serious,
because with this commit we support several of the more important properties.
See also the related bug rh#1337855 for kernel.
Another difficulty is that `ip route replace` and `ip route change`
changes an existing route. The replaced route has the same
NM_PLATFORM_IP_ROUTE_CMP_TYPE_WEAK_ID, but differ in the actual
NM_PLATFORM_IP_ROUTE_CMP_TYPE_ID:
# ip -d -4 route show dev v
# ip monitor route &
# ip route add 192.168.5.0/24 dev v
192.168.5.0/24 dev v scope link
# ip route change 192.168.5.0/24 dev v scope 10
192.168.5.0/24 dev v scope 10
# ip -d -4 route show dev v
unicast 192.168.5.0/24 proto boot scope 10
Note that we only got one RTM_NEWROUTE message, although from NMPCache's
point of view, a new route (with a particular ID) was added and another
route (with a different ID) was deleted. The cumbersome workaround is,
to keep an ordered list of the routes, and figure out which route was
replaced in response to an RTM_NEWROUTE. In absence of bugs, this should
work fine. However, as we only rely on events, we might wrongly
introduce a cache-inconsistancy as well. See the related bug rh#1337860.
Also drop nm_platform_ip4_route_get() and the like. The ID of routes
is complex, so it makes little sense to look up a route directly.
Via the flags of the RTM_NEWROUTE netlink message, kernel and iproute2
support various variants to add a route.
- ip route add
- ip route change
- ip route replace
- ip route prepend
- ip route append
- ip route test
Previously, our nm_platform_ip4_route_add() function was basically
`ip route replace`. In the future, we should rather user `ip route
append` instead.
Anyway, expose the netlink message flags in the API. This allows to
use the various forms, and makes it also more apparent to the user that
they even exist.
- kernel ignores rtm_tos for IPv6 routes. While iproute2 accepts it,
let libnm reject TOS attribute for routes as well.
- move the tos field from NMPlatformIPRoute to NMPlatformIP4Route.
- the tos field is part of the weak-id of an IPv4 route. Meaning,
`ip route add` can add routes that only differ by their TOS.
There are various notions of how to compare routes. Collect them all
in nm_platform_ip4_route_cmp(), nm_platform_ip4_route_hash(),
nm_platform_ip6_route_cmp(), and nm_platform_ip6_route_hash().
This way, we have them side-by-side, which makes the differences more
discoverable.
It is valid to set "lock" for a property with numeric value 0.
# ip route append 192.168.7.0/24 dev bond0 window lock 0
# ip route append 192.168.7.0/24 dev bond0 window 0
# ip route append 192.168.7.0/24 dev bond0 window lock 10
# ip route append 192.168.7.0/24 dev bond0 window 10
# ip -4 -d route show dev bond0
unicast 192.168.7.0/24 proto boot scope link linkdown
unicast 192.168.7.0/24 proto boot scope link linkdown
unicast 192.168.7.0/24 proto boot scope link linkdown window lock 10
unicast 192.168.7.0/24 proto boot scope link linkdown window 10
Our to-string methods should accurately print the content of
the routes. Note that iproute2 fails to do so too.
The mss (advmss, RTA_METRICS.RTAX_ADVMSS) is in a way part of
the ID for IPv4 routes. That is, you can add multiple IPv4 routes, that
only differ by mss.
On the other hand, that is not the case for IPv6. Two IPv6 routes
that only differ by mss are considered the same.
Another issue is, that you cannot selectively delete an IPv4 route based
on the mss:
ip netns del x
ip netns add x
IP() {
ip netns exec x ip "$@"
}
IP link add type veth
IP link set veth0 name v
IP link set veth1 up
IP link set v up
IP route append 192.168.7.0/24 dev v advmss 6
IP route append 192.168.7.0/24 dev v advmss 7
IP -d route show dev v
IP route delete 192.168.7.0/24 dev v advmss 7
IP -d route show dev v
It seems for deleting routes, kernel ignores mss (which doesn't really
matter for IPv6, but does so for IPv4).
Refactor _nl_msg_new_route() to obtain the route scope (rtm_scope)
from the NMPObject, instead of a separate argument.
That way, when deleting an IPv4 route, we don't pick the first route
that matches (RT_SCOPE_NOWHERE), but use the actual scope of the route
that we want to delete. That matters, if there are more then one
otherwise identical routes that only differ by their scope.
For kernel, the scope of IPv6 routes is always global
(RT_SCOPE_UNIVERSE).
Also, during ip4_route_add() initialize the intermediate @obj to have
the values as we expect them after adding the route. That is necessary
to use it in _nl_msg_new_route(). But also nicer for consistency.
Also, move the scope_inv field in NMPlatformIP4Route to let the other
in_addr_t fields life side by side.
_nl_msg_new_route() should not get extra arguments, but instead
use all parameters from the NMPObject argument. This will allow
during nm_platform_ip_route_delete() to pick the exact route
that should be deleted.
Also, in ip4_route_add()/ip6_route_add(), keep the stack-allocated
@obj object consistent with what we expect to add. That is, set
the rt_source argument to the value of what the route will have
after kernel adds it. That might be necessary, because
do_add_addrroute() searches the cache for @obj.
Routes are complicated.
`ip route add` and `ip route append` behaves differently with respect to
determine whether an existing route is idential or not.
Extend the cmp() and hash() functions to have a compare type, that
covers the different semantics.
Contrary to addresses, routes have no ID. When deleting a route,
you cannot just specify certain properties like network/plen,metric.
Well, actually you can specify only certain properties, but then kernel
will treat unspecified properties as wildcard and delete the first matching
route. That is not something we want, because we need to be in control which
exact route shall be deleted.
Also, rtm_tos *must* match. Even if we like the wildcard behavior,
we would need to pass TOS to nm_platform_ip4_route_delete() to be
able to delete routes with non-zero TOS. So, while certain properties
may be omitted, some must not. See how test_ip4_route_options() was
broken.
For NetworkManager it only makes ever sense to call delete on a route,
if the route is already fully known. Which means, we only delete routes
that we have already in the platform cache (otherwise, how would we know
that there is something to delete). Because of that, no longer have separate
IPv4 and IPv6 functions. Instead, have nm_platform_ip_route_delete() which
accepts a full NMPObject from the platform cache.
The code in core doesn't jet make use of this new functionality. It will
in the future.
At least, it fixes deleting routes with differing TOS.
and refactor NMFakePlatform to also track links via NMPCache.
For one, now NMFakePlatform also tests NMPCache, increasing the
coverage of what we care about.
Also, all our NMPlatform implementations now use NMPObject and NMPCache.
That means, we can expose those as part of the public API. Which is
great, because callers can keep a reference to the NMPObject object
and make use of generic functions like nmp_object_to_string().
And move some code from NMLinuxPlatform to NMPlatform, where it belongs.
The advantage is that we reuse (and test!) the NMPCache implementation for
tracking addresses.
Also, we now always expose proper NMPObjects from both linux and fake
platform.
For example,
obj = NMP_OBJECT_UP_CAST (nm_platform_ip4_address_get (...));
will work as expected. Also, the caller is now by NMPlatform API
allowed to take and keep a reference to the returned objects.
by moving the core functionality to "nm-dedup-multi.c".
As the ref-counting mechanism now is part of "nm-dedup-multi.c",
this works better and is reusable outside of platform.
NMPlatform's cache should be directly accessible to the users,
at least the NMPLookup part and the fact that the cache contains
ref-counted, immutable NMPObjects.
This allows users to inspect the cache with zero overhead. Meaning,
they can obtain an NMDedupMultiHeadEntry and iterate the objects
themself. It also means, the are free to take and keep references
of the NMPObject instances (of course, without modifying them!).
NMFakePlatform will use the very same cache. The fake platform should
only differ when modifying the objects.
Another reason why this makes sense is because NMFakePlatform is for one
a test-stup but also tests behavior of platform itself. Using a separate
internal implementation for the caching is a pointless excecise, because
only the real NMPCache's implementation really matters for production.
So, either NMFakePlatform behaves idential, or it is buggy. Reuse it.
Port fake platform's tracking of routes to NMPCache and move duplicate
code from NMLinuxPlatform to the base class.
This commit only ports IP routes, eventually also addresses and links
should be tracked via the NMPCache instance.
We want to expose the NMPLookup and NMDedupMultiHeadEntry to the users
of NMPlatform, so that they can iterate the cache directly.
That means, NMPCache becames an integral part of NMPlatform's API
and must also be implemented by NMFakePlatform.
We want to move the multi_idx from NMLinuxPlatform to NMPlatform,
so that it can be used by NMFakePlatform as well. For that, we need
to know whether NMPlatform will use udev or not. Add a constrctor
property.
Rework platform object cache to use NMDedupMultiIndex.
Already previously, NMPCache used NMMultiIndex and had thus
O(1) for most operations. What is new is:
- Contrary to NMMultiIndex, NMDedupMultiIndex preserves the order of
the cached items. That is crucial to handle routes properly as kernel
will replace the first matching route based on network/plen/metric
properties. See related bug rh#1337855.
Without tracking the order of routes as they are exposed
by kernel, we cannot properly maintain the route cache.
- All NMPObject instances are now treated immutable, refcounted
and get de-duplicated via NMDedupMultiIndex. This allows
to have a global NMDedupMultiIndex that can be shared with
NMIP4Config and NMRouteManager. It also allows to share the
objects themselves.
Immutable objects are so much nicer. We can get rid of the
update pre-hook callback, which was required previously because
we would mutate the object inplace. Now, we can just update
the cache, and compare obj_old and obj_new after the fact.
- NMMultiIndex was treated as an internal of NMPCache. On the other
hand, NMDedupMultiIndex exposes NMDedupMultiHeadEntry, which is
basically an object that allows to iterate over all related
objects. That means, we can now lookup objects in the cache
and give the NMDedupMultiHeadEntry instance to the caller,
which then can iterate the list on it's own -- without need
for copying anything.
Currently, at various places we still create copies of lookup
results. That can be improved later.
The ability to share NMPObject instances should enable us to
significantly improve performance and scale with large number
of routes.
Of course there is a memory overhead of having an index for each list
entry. Each NMPObject may also require an NMDedupMultiEntry,
NMDedupMultiHeadEntry, and NMDedupMultiBox item, which are tracked
in a GHashTable. Optimally, one NMDedupMultiHeadEntry is the head
for multiple objects, and NMDedupMultiBox is able to deduplicate several
NMPObjects, so that there is a net saving.
Also, each object type has several indexes of type NMPCacheIdType.
So, worst case an NMPlatformIP4Route in the platform cache is tracked
by 8 NMPCacheIdType indexes, for each we require a NMDedupMultiEntry,
plus the shared NMDedupMultiHeadEntry. The NMDedupMultiBox instance
is shared between the 8 indexes (and possibly other).
NMIP4Config, NMIP6Config, and NMPlatform shall share one
NMDedupMultiIndex instance.
For that, pass an NMDedupMultiIndex instance to NMPlatform and NMNetns.
NMNetns than passes it on to NMDevice, NMDhcpClient, NMIP4Config and NMIP6Config.
So currently NMNetns is the access point to the shared NMDedupMultiIndex
instance, and it gets it from it's NMPlatform instance.
The NMDedupMultiIndex instance is really a singleton, we don't want
multiple instances of it. However, for testing, instead of adding a
singleton instance, pass the instance explicitly around.
Commits 39d0559d9a ("platform: sort links by name instead of
ifindex") and 529a0a1a7f ("manager: sort slaves to be autoconnected
by device name") changed the order of activation of slaves. Introduce
a system-wide configuration property to preserve the old behavior.
https://bugzilla.redhat.com/show_bug.cgi?id=1452585
Arguably, we currently only have one instance of NMPlatform,
NMRouteManager, NMDefaultRouteManager -- the one owned by the
NMNetns singleton.
Hence, all these instances we create with "log-with-ptr" set explicitly
to false.
In the future we want to support namespaces, and it will be be common to
have multiple instances. For that we have "log-with-ptr" so we are able
to disambiguiate the logging.
Change the default to TRUE because it makes more sense. It has currently
no effect as the default is never used.
(cherry picked from commit 41148caba8)