We often want to cascade hashing, meaning, to combine the
outcome of various hash functions in a larger hash.
Instead of having each hash function return a guint hash value,
accept a hash state argument. This saves the overhead of initializing
and completing the intermediate hash states.
It also avoids loosing entropy when we reduce the larger hash state
into the intermediate guint hash value.
Whenever we call a platform operation that reads or writes the netlink
socket, there is the possibility that the cache gets updated, as we
receive netlink events.
It is thus racy, if nm_platform_ip_route_sync() *first* adds routes, and
then obtains a list of routes to delete. The correct approach is to
determine which routes to delete first (and keep it in a list
@routes_prune), and pass that list down to nm_platform_ip_route_sync().
Arguably, this doesn't yet solve every race. For example, NMDevice
calls update_ext_ip_config() during ip4_config_merge_and_apply().
That is good, as it resyncs with platform. However, before calling
nm_ip4_config_commit() it calls other platform operations, like
_commit_mtu(). So, the race is still there.
Kernel does not allow to add a route with table 0 (RT_TABLE_UNSPEC). It
effectively is an alias for the main table. We must consider that when
comparing routes sementically.
Kernel does not allow to add IPv6 routes with "src", as long as the
corresponding address is still tentative (related bug rh#1457196).
The workaround for this is cumbersome. First, when we fail to add such a
route with "pref_src", we guess that it happend due to this issue. In
that case, nm_ip6_config_commit() returns the list of routes that could
not be added for the moment (but hopefully can be added later).
We track this list in NMDevice, and keep trying to merge the routes
back into ip6_config. In order to not try indefinitely, keep track of a
timestamp when we tried to add this route for the first time.
Another uglyness is that pending tentative routes don't explicitly block
activation. In practice they may do, because for these routes we also have
an IPv6 address that is still doing DAD, so the IP configuration is
still pending due to that.
https://bugzilla.redhat.com/show_bug.cgi?id=1452684
Let's not treat those routes special. I think this was originally done, because
we relied on kernel to add the IPv4 device route, so we would ignore RTPROT_KERNEL
routes and not delete them.
We want to track them for various reasons:
- for consistency, there is nothing special except that they might be
added by kernel.
- we expose the routes of NMIP4Config/NMIP6Config on D-Bus. That should
include also routes such as device routes. Note, this commit changes
that we now expose device routes on D-Bus too.
For kernel, route ID compare identical according to NM_PLATFORM_IP_ROUTE_CMP_TYPE_ID.
Well, mostly. In practice, NM ignores several route properties that
kernel considers part of the ID too. This leaves the possibility that
kernel allows addition of two routes that compare idential for
NetworkManager.
Anyway, NMIP4Config/NMIP6Config should use the same equality as platform
cache. Otherwise, there is the odd situation that ip-config merges routes
that are treated as different by kernel.
For IP addresses the ID operator already corresponded to what kernel
does. There is no change for addresses.
Note that NMSettingIPConfig also uses a different algorithm for
comparing routes. But that doesn't really matter here, it it differed
before too.
Change the output of nm_platform_error_to_string() to print the numeric value.
Also, accept a string buffer instead of using an alloca() allocated buffer.
There is still a macro to provide the previous functionality, but it
was ill-suited to call from inside a loop.
Let nm_platform_ip_route_add() and friends return an NMPlatformError
failure reason.
Also, do_add_addrroute() did not return the response from kernel.
Instead, it determined success/failure based on the presence of the
object in the cache. That is racy and does not allow to give a failure
reason from kernel.
Instead, determine success solely based on the netlink reply from
kernel. The received errno shall be authorative, there is no need
to second guess the response.
There is a problem that netlink is not a reliable protocol. In case
of receive buffer overflow, the response is lost and we don't know
whether the command succeeded (it likely did). It's unclear how to fix
that, but for now just return "unspecified" error. We probably avoid
that already by having a huge buffer size.
Also, downgrade the error message to <warn> level. <error> is really
for bugs only.
Inspired from iproute2. As such, don't use libnl3's "struct nl_msg", but
add _nl_addattr_l() and use a stack-allocated "struct nlmsghdr". With
this, we are closer to the raw netlink API. It really is simple enough.
The complicated part of the patch is that we re-use the existing netlink
socket for events. Hence, we must process the socket via our common
event_handler_recvmsgs(). That also means, that we get the netlink
response a few layers down the stack and have to return the result
via DelayedActionWaitForNlResponseData.
Previously, we would add exclusive routes via netlink message flags
NLM_F_CREATE | NLM_F_REPLACE for RTM_NEWROUTE. Similar to `ip route replace`.
Using that form of RTM_NEWROUTE message, we could only add a certain
route with a certain network/plen,metric triple once. That was already
hugely inconvenient, because
- when configuring routes, multiple (managed) interfaces may get
conflicting routes (multihoming). Only one of the routes can be actually
configured using `ip route replace`, so we need to track routes that are
currently shadowed.
- when configuring routes, we might replace externally configured
routes on unmanaged interfaces. We should not interfere with such
routes.
That was worked around by having NMRouteManager (and NMDefaultRouteManager).
NMRouteManager would keep a list of the routes which NetworkManager would like
to configure, even if momentarily being unable to do so due to conflicting routes.
This worked mostly well but was complicated. It involved bumping metrics to
avoid conflicts for device routes, as we might require them for gateway routes.
Drop that now. Instead, use the corresponding of `ip route append` to configure
routes. This allows NetworkManager to confiure (almost) all routes that we care.
Especially, it can configure all routes on a managed interface, without
replacing/interfering with routes on other interfaces. Hence, NMRouteManager
becomes obsolete.
It practice it is a bit more complicated because:
- when adding an IPv4 address, kernel will automatically create a device route
for the subnet. We should avoid that by using the IFA_F_NOPREFIXROUTE flag for
IPv4 addresses (still to-do). But as kernel may not support that flag for IPv4
addresses yet (and we don't require such a kernel yet), we still need functionality
similar to nm_route_manager_ip4_route_register_device_route_purge_list().
This functionality is now handled via nm_platform_ip4_dev_route_blacklist_set().
- trying to configure an IPv6 route with a source address will be rejected
by kernel as long as the address is tentative (see related bug rh#1457196).
Preferably, NMDevice would keep the list of routes which should be configured,
while kernel would have the list of what actually is configured. There is a
feed-back loop where both affect each other (for example, when externally deleting
a route, NMDevice must forget about it too). Previously, NMRouteManager would have
the task of remembering all routes which we currently want to configure, but cannot
due to conflicting routes.
We get rid of that, because now we configure non-exclusive routes. We however still
will need to remember IPv6 routes with a source address, that currently cannot be
configured yet. Hence, we will need to keep track of routes that
currently cannot be configured, but later may be.
That is still not done yet, as NMRouteManager didn't handle this
correctly either.
Rename to nm_platform_ip_address_flush(), it's more consistent with naming
for other platform functions.
Also, pass an address family argument. Sometimes I feel an option makes it clearer
what the function does. Otherwise, from the name it's not clear which address
families are affected. As an API, it feels more correct to me.
We soon also get a nm_platform_ip_route_flush() function, which will
look similar.
Until now, NetworkManager's platform cache for routes used the quadruple
network/plen,metric,ifindex for equaliy. That is not kernel's
understanding of how routes behave. For example, with `ip route append`
you can add two IPv4 routes that only differ by their gateway. To
the previous form of platform cache, these two routes would wrongly
look identical, as the cache could not contain both routes. This also
easily leads to cache-inconsistencies.
Now that we have NM_PLATFORM_IP_ROUTE_CMP_TYPE_ID, fix the route's
compare operator to match kernel's.
Well, not entirely. Kernel understands more properties for routes then
NetworkManager. Some of these properties may also be part of the ID according
to kernel. To NetworkManager such routes would still look identical as
they only differ in a property that is not understood. This can still
cause cache-inconsistencies. The only fix here is to add support for
all these properties in NetworkManager as well. However, it's less serious,
because with this commit we support several of the more important properties.
See also the related bug rh#1337855 for kernel.
Another difficulty is that `ip route replace` and `ip route change`
changes an existing route. The replaced route has the same
NM_PLATFORM_IP_ROUTE_CMP_TYPE_WEAK_ID, but differ in the actual
NM_PLATFORM_IP_ROUTE_CMP_TYPE_ID:
# ip -d -4 route show dev v
# ip monitor route &
# ip route add 192.168.5.0/24 dev v
192.168.5.0/24 dev v scope link
# ip route change 192.168.5.0/24 dev v scope 10
192.168.5.0/24 dev v scope 10
# ip -d -4 route show dev v
unicast 192.168.5.0/24 proto boot scope 10
Note that we only got one RTM_NEWROUTE message, although from NMPCache's
point of view, a new route (with a particular ID) was added and another
route (with a different ID) was deleted. The cumbersome workaround is,
to keep an ordered list of the routes, and figure out which route was
replaced in response to an RTM_NEWROUTE. In absence of bugs, this should
work fine. However, as we only rely on events, we might wrongly
introduce a cache-inconsistancy as well. See the related bug rh#1337860.
Also drop nm_platform_ip4_route_get() and the like. The ID of routes
is complex, so it makes little sense to look up a route directly.
Via the flags of the RTM_NEWROUTE netlink message, kernel and iproute2
support various variants to add a route.
- ip route add
- ip route change
- ip route replace
- ip route prepend
- ip route append
- ip route test
Previously, our nm_platform_ip4_route_add() function was basically
`ip route replace`. In the future, we should rather user `ip route
append` instead.
Anyway, expose the netlink message flags in the API. This allows to
use the various forms, and makes it also more apparent to the user that
they even exist.
- kernel ignores rtm_tos for IPv6 routes. While iproute2 accepts it,
let libnm reject TOS attribute for routes as well.
- move the tos field from NMPlatformIPRoute to NMPlatformIP4Route.
- the tos field is part of the weak-id of an IPv4 route. Meaning,
`ip route add` can add routes that only differ by their TOS.
There are various notions of how to compare routes. Collect them all
in nm_platform_ip4_route_cmp(), nm_platform_ip4_route_hash(),
nm_platform_ip6_route_cmp(), and nm_platform_ip6_route_hash().
This way, we have them side-by-side, which makes the differences more
discoverable.
The mss (advmss, RTA_METRICS.RTAX_ADVMSS) is in a way part of
the ID for IPv4 routes. That is, you can add multiple IPv4 routes, that
only differ by mss.
On the other hand, that is not the case for IPv6. Two IPv6 routes
that only differ by mss are considered the same.
Another issue is, that you cannot selectively delete an IPv4 route based
on the mss:
ip netns del x
ip netns add x
IP() {
ip netns exec x ip "$@"
}
IP link add type veth
IP link set veth0 name v
IP link set veth1 up
IP link set v up
IP route append 192.168.7.0/24 dev v advmss 6
IP route append 192.168.7.0/24 dev v advmss 7
IP -d route show dev v
IP route delete 192.168.7.0/24 dev v advmss 7
IP -d route show dev v
It seems for deleting routes, kernel ignores mss (which doesn't really
matter for IPv6, but does so for IPv4).
Refactor _nl_msg_new_route() to obtain the route scope (rtm_scope)
from the NMPObject, instead of a separate argument.
That way, when deleting an IPv4 route, we don't pick the first route
that matches (RT_SCOPE_NOWHERE), but use the actual scope of the route
that we want to delete. That matters, if there are more then one
otherwise identical routes that only differ by their scope.
For kernel, the scope of IPv6 routes is always global
(RT_SCOPE_UNIVERSE).
Also, during ip4_route_add() initialize the intermediate @obj to have
the values as we expect them after adding the route. That is necessary
to use it in _nl_msg_new_route(). But also nicer for consistency.
Also, move the scope_inv field in NMPlatformIP4Route to let the other
in_addr_t fields life side by side.
_nl_msg_new_route() should not get extra arguments, but instead
use all parameters from the NMPObject argument. This will allow
during nm_platform_ip_route_delete() to pick the exact route
that should be deleted.
Also, in ip4_route_add()/ip6_route_add(), keep the stack-allocated
@obj object consistent with what we expect to add. That is, set
the rt_source argument to the value of what the route will have
after kernel adds it. That might be necessary, because
do_add_addrroute() searches the cache for @obj.
Routes are complicated.
`ip route add` and `ip route append` behaves differently with respect to
determine whether an existing route is idential or not.
Extend the cmp() and hash() functions to have a compare type, that
covers the different semantics.
Contrary to addresses, routes have no ID. When deleting a route,
you cannot just specify certain properties like network/plen,metric.
Well, actually you can specify only certain properties, but then kernel
will treat unspecified properties as wildcard and delete the first matching
route. That is not something we want, because we need to be in control which
exact route shall be deleted.
Also, rtm_tos *must* match. Even if we like the wildcard behavior,
we would need to pass TOS to nm_platform_ip4_route_delete() to be
able to delete routes with non-zero TOS. So, while certain properties
may be omitted, some must not. See how test_ip4_route_options() was
broken.
For NetworkManager it only makes ever sense to call delete on a route,
if the route is already fully known. Which means, we only delete routes
that we have already in the platform cache (otherwise, how would we know
that there is something to delete). Because of that, no longer have separate
IPv4 and IPv6 functions. Instead, have nm_platform_ip_route_delete() which
accepts a full NMPObject from the platform cache.
The code in core doesn't jet make use of this new functionality. It will
in the future.
At least, it fixes deleting routes with differing TOS.
Reasons:
- it adds an O(1) lookup index for accessing NMIPxConfig's addresses.
Hence, operations like merge/intersect have now runtime O(n) instead
of O(n^2).
Arguably, we expect low numbers of addresses in general. For low
numbers, the O(n^2) doesn't matter and quite likely in those cases
the previous implementation was just fine -- maybe even faster.
But the simple case works fine either way. It's important to scale
well in the exceptional case.
- the tracked objects can be shared between the various NMPI4Config,
NMIP6Config instances with NMPlatform and everybody else.
- the NMPObject can be treated generically, meaning it enables code to
handle both IPv4 and IPv6, or addresses and routes. See for example
_nm_ip_config_add_obj().
- I want core to evolve to somewhere where we don't keep copies of
NMPlatformIP4Address, et al. instances. Instead they shall all be
shared. I hope this will reduce memory consumption (although tracking a
reference consumes some memory too). Also, it shortcuts nmp_object_equal()
when comparing the same object. Calling nmp_object_equal() on the
identical objects would be a common case after the hash function
pre-evaluates equality.
and refactor NMFakePlatform to also track links via NMPCache.
For one, now NMFakePlatform also tests NMPCache, increasing the
coverage of what we care about.
Also, all our NMPlatform implementations now use NMPObject and NMPCache.
That means, we can expose those as part of the public API. Which is
great, because callers can keep a reference to the NMPObject object
and make use of generic functions like nmp_object_to_string().
And move some code from NMLinuxPlatform to NMPlatform, where it belongs.
The advantage is that we reuse (and test!) the NMPCache implementation for
tracking addresses.
Also, we now always expose proper NMPObjects from both linux and fake
platform.
For example,
obj = NMP_OBJECT_UP_CAST (nm_platform_ip4_address_get (...));
will work as expected. Also, the caller is now by NMPlatform API
allowed to take and keep a reference to the returned objects.
NMPlatform's cache should be directly accessible to the users,
at least the NMPLookup part and the fact that the cache contains
ref-counted, immutable NMPObjects.
This allows users to inspect the cache with zero overhead. Meaning,
they can obtain an NMDedupMultiHeadEntry and iterate the objects
themself. It also means, the are free to take and keep references
of the NMPObject instances (of course, without modifying them!).
NMFakePlatform will use the very same cache. The fake platform should
only differ when modifying the objects.
Another reason why this makes sense is because NMFakePlatform is for one
a test-stup but also tests behavior of platform itself. Using a separate
internal implementation for the caching is a pointless excecise, because
only the real NMPCache's implementation really matters for production.
So, either NMFakePlatform behaves idential, or it is buggy. Reuse it.
Port fake platform's tracking of routes to NMPCache and move duplicate
code from NMLinuxPlatform to the base class.
This commit only ports IP routes, eventually also addresses and links
should be tracked via the NMPCache instance.
We want to move the multi_idx from NMLinuxPlatform to NMPlatform,
so that it can be used by NMFakePlatform as well. For that, we need
to know whether NMPlatform will use udev or not. Add a constrctor
property.