Rework to use nm_platform_ip_route_sync() broke to fail
activation when we were unable to configure a route.
Fix it. As before, we only do this for routes that
are configured manually by the user. Invalid routes from
DHCP do not break activation.
Also, improve logging to give a hint what's wrong.
Change the output of nm_platform_error_to_string() to print the numeric value.
Also, accept a string buffer instead of using an alloca() allocated buffer.
There is still a macro to provide the previous functionality, but it
was ill-suited to call from inside a loop.
Let nm_platform_ip_route_add() and friends return an NMPlatformError
failure reason.
Also, do_add_addrroute() did not return the response from kernel.
Instead, it determined success/failure based on the presence of the
object in the cache. That is racy and does not allow to give a failure
reason from kernel.
Instead, determine success solely based on the netlink reply from
kernel. The received errno shall be authorative, there is no need
to second guess the response.
There is a problem that netlink is not a reliable protocol. In case
of receive buffer overflow, the response is lost and we don't know
whether the command succeeded (it likely did). It's unclear how to fix
that, but for now just return "unspecified" error. We probably avoid
that already by having a huge buffer size.
Also, downgrade the error message to <warn> level. <error> is really
for bugs only.
Inspired from iproute2. As such, don't use libnl3's "struct nl_msg", but
add _nl_addattr_l() and use a stack-allocated "struct nlmsghdr". With
this, we are closer to the raw netlink API. It really is simple enough.
The complicated part of the patch is that we re-use the existing netlink
socket for events. Hence, we must process the socket via our common
event_handler_recvmsgs(). That also means, that we get the netlink
response a few layers down the stack and have to return the result
via DelayedActionWaitForNlResponseData.
For IPv6 addresses we use IFA_F_NOPREFIXROUTE for a long time.
If we detect that kernel does not support the flag (for IPv6), we
add addresses as /128 to prevent kernel from adding an onlink route.
We add IPv6 device routes explicitly, whenever needed according
to the onlink RA flag.
For IPv4, we also don't want the route added by kernel. The reason is
that is has an undesired metric of zero. However, usually we want the route
to have a different metric. The complicated part is that kernel does
not add the route immediately but sometimes later. For that we have
nm_platform_ip4_dev_route_blacklist_set() (previously that was
nm_route_manager_ip4_route_register_device_route_purge_list()). It
watches the interface and when a registered device route shows up,
it deletes it.
The better solution is to use the IFA_F_NOPREFIXROUTE flag to prevent
the creation of the route in the first place. It was added for IPv4 to
kernel in commit 7b1311807f3d3eb8bef3ccc53127838b3bea3771, October 2015.
Contrary to IPv6, we cannot (easily) detect whether kernel supports IFA_F_NOPREFIXROUTE
for IPv4 routes. Hence keep nm_platform_ip4_dev_route_blacklist_set() for older
kernels.
- cache the result in NMPlatformPrivate. No need to call the virtual
function every time. The result is not ever going to change.
- if we are unable to detect support, assume support. Those features
were added quite a while ago to kernel, we should default to "support".
Note, that we detect support based on the presence of the absence of
certain netlink flags. That means, we will still detect no support.
The only moment when we actually use the fallback value, is when we
didn't encounter an RTM_NEWADDR or AF_INET6-IFLA_AF_SPEC message yet,
which would be very unusual, because we fill the cache initially and
usually will have some addresses there.
- for no strong reason, track "undetected" as numerical value zero,
and "support"/"no-support" as 1/-1. We already did that previously for
_support_user_ipv6ll, so this just unifies the implementations.
The minor reason is that this puts @_support_user_ipv6ll to the BSS
section and allows us to omit initializing priv->check_support_user_ipv6ll_cached
in platforms constructor.
- detect _support_kernel_extended_ifa_flags also based on IPv4
RTM_NEWADDR messages. Originally, extended flags were added for IPv6,
and later to IPv4 as well. Once we see an IPv4 message with IFA_FLAGS,
we know we have support.
Previously, we would add exclusive routes via netlink message flags
NLM_F_CREATE | NLM_F_REPLACE for RTM_NEWROUTE. Similar to `ip route replace`.
Using that form of RTM_NEWROUTE message, we could only add a certain
route with a certain network/plen,metric triple once. That was already
hugely inconvenient, because
- when configuring routes, multiple (managed) interfaces may get
conflicting routes (multihoming). Only one of the routes can be actually
configured using `ip route replace`, so we need to track routes that are
currently shadowed.
- when configuring routes, we might replace externally configured
routes on unmanaged interfaces. We should not interfere with such
routes.
That was worked around by having NMRouteManager (and NMDefaultRouteManager).
NMRouteManager would keep a list of the routes which NetworkManager would like
to configure, even if momentarily being unable to do so due to conflicting routes.
This worked mostly well but was complicated. It involved bumping metrics to
avoid conflicts for device routes, as we might require them for gateway routes.
Drop that now. Instead, use the corresponding of `ip route append` to configure
routes. This allows NetworkManager to confiure (almost) all routes that we care.
Especially, it can configure all routes on a managed interface, without
replacing/interfering with routes on other interfaces. Hence, NMRouteManager
becomes obsolete.
It practice it is a bit more complicated because:
- when adding an IPv4 address, kernel will automatically create a device route
for the subnet. We should avoid that by using the IFA_F_NOPREFIXROUTE flag for
IPv4 addresses (still to-do). But as kernel may not support that flag for IPv4
addresses yet (and we don't require such a kernel yet), we still need functionality
similar to nm_route_manager_ip4_route_register_device_route_purge_list().
This functionality is now handled via nm_platform_ip4_dev_route_blacklist_set().
- trying to configure an IPv6 route with a source address will be rejected
by kernel as long as the address is tentative (see related bug rh#1457196).
Preferably, NMDevice would keep the list of routes which should be configured,
while kernel would have the list of what actually is configured. There is a
feed-back loop where both affect each other (for example, when externally deleting
a route, NMDevice must forget about it too). Previously, NMRouteManager would have
the task of remembering all routes which we currently want to configure, but cannot
due to conflicting routes.
We get rid of that, because now we configure non-exclusive routes. We however still
will need to remember IPv6 routes with a source address, that currently cannot be
configured yet. Hence, we will need to keep track of routes that
currently cannot be configured, but later may be.
That is still not done yet, as NMRouteManager didn't handle this
correctly either.
Kernel does not allow to add an IPv4 route with rt_scope RT_SCOPE_NOWHERE
(255). It would fail with EINVAL.
While adding a route, we coerce/normalize the scope in
nm_platform_ip_route_normalize(). However, that should only be
done, if the scope is not explicitly set already. Otherwise,
leave it unchanged.
nm_platform_ip_route_normalize() is related to the compare functions.
Several compare modes do a fuzzy comparison, and they should compare
equal as if they would be normalized. Hence, we must do the same
normalization there.
One pecularity in NetworkManager is that we track scope as it's
inverse. The reason is to have a default value of zero meaning
RT_SCOPE_NOWHERE. Hence "scope_inv".
Rename to nm_platform_ip_address_flush(), it's more consistent with naming
for other platform functions.
Also, pass an address family argument. Sometimes I feel an option makes it clearer
what the function does. Otherwise, from the name it's not clear which address
families are affected. As an API, it feels more correct to me.
We soon also get a nm_platform_ip_route_flush() function, which will
look similar.
In several cases, does the route compare function a fuzzy match, to get
the result as what would happen if you add that route to kernel.
The rt_source enum contains some NetworkManager specific values which
are mapped to a certain rtm_protocol value. Especially, when adding
a route to kernel, the resulting value will be coerced (and end up being
different).
We must take this coercion into account.
Until now, NetworkManager's platform cache for routes used the quadruple
network/plen,metric,ifindex for equaliy. That is not kernel's
understanding of how routes behave. For example, with `ip route append`
you can add two IPv4 routes that only differ by their gateway. To
the previous form of platform cache, these two routes would wrongly
look identical, as the cache could not contain both routes. This also
easily leads to cache-inconsistencies.
Now that we have NM_PLATFORM_IP_ROUTE_CMP_TYPE_ID, fix the route's
compare operator to match kernel's.
Well, not entirely. Kernel understands more properties for routes then
NetworkManager. Some of these properties may also be part of the ID according
to kernel. To NetworkManager such routes would still look identical as
they only differ in a property that is not understood. This can still
cause cache-inconsistencies. The only fix here is to add support for
all these properties in NetworkManager as well. However, it's less serious,
because with this commit we support several of the more important properties.
See also the related bug rh#1337855 for kernel.
Another difficulty is that `ip route replace` and `ip route change`
changes an existing route. The replaced route has the same
NM_PLATFORM_IP_ROUTE_CMP_TYPE_WEAK_ID, but differ in the actual
NM_PLATFORM_IP_ROUTE_CMP_TYPE_ID:
# ip -d -4 route show dev v
# ip monitor route &
# ip route add 192.168.5.0/24 dev v
192.168.5.0/24 dev v scope link
# ip route change 192.168.5.0/24 dev v scope 10
192.168.5.0/24 dev v scope 10
# ip -d -4 route show dev v
unicast 192.168.5.0/24 proto boot scope 10
Note that we only got one RTM_NEWROUTE message, although from NMPCache's
point of view, a new route (with a particular ID) was added and another
route (with a different ID) was deleted. The cumbersome workaround is,
to keep an ordered list of the routes, and figure out which route was
replaced in response to an RTM_NEWROUTE. In absence of bugs, this should
work fine. However, as we only rely on events, we might wrongly
introduce a cache-inconsistancy as well. See the related bug rh#1337860.
Also drop nm_platform_ip4_route_get() and the like. The ID of routes
is complex, so it makes little sense to look up a route directly.
Via the flags of the RTM_NEWROUTE netlink message, kernel and iproute2
support various variants to add a route.
- ip route add
- ip route change
- ip route replace
- ip route prepend
- ip route append
- ip route test
Previously, our nm_platform_ip4_route_add() function was basically
`ip route replace`. In the future, we should rather user `ip route
append` instead.
Anyway, expose the netlink message flags in the API. This allows to
use the various forms, and makes it also more apparent to the user that
they even exist.
- kernel ignores rtm_tos for IPv6 routes. While iproute2 accepts it,
let libnm reject TOS attribute for routes as well.
- move the tos field from NMPlatformIPRoute to NMPlatformIP4Route.
- the tos field is part of the weak-id of an IPv4 route. Meaning,
`ip route add` can add routes that only differ by their TOS.
There are various notions of how to compare routes. Collect them all
in nm_platform_ip4_route_cmp(), nm_platform_ip4_route_hash(),
nm_platform_ip6_route_cmp(), and nm_platform_ip6_route_hash().
This way, we have them side-by-side, which makes the differences more
discoverable.
It is valid to set "lock" for a property with numeric value 0.
# ip route append 192.168.7.0/24 dev bond0 window lock 0
# ip route append 192.168.7.0/24 dev bond0 window 0
# ip route append 192.168.7.0/24 dev bond0 window lock 10
# ip route append 192.168.7.0/24 dev bond0 window 10
# ip -4 -d route show dev bond0
unicast 192.168.7.0/24 proto boot scope link linkdown
unicast 192.168.7.0/24 proto boot scope link linkdown
unicast 192.168.7.0/24 proto boot scope link linkdown window lock 10
unicast 192.168.7.0/24 proto boot scope link linkdown window 10
Our to-string methods should accurately print the content of
the routes. Note that iproute2 fails to do so too.
The mss (advmss, RTA_METRICS.RTAX_ADVMSS) is in a way part of
the ID for IPv4 routes. That is, you can add multiple IPv4 routes, that
only differ by mss.
On the other hand, that is not the case for IPv6. Two IPv6 routes
that only differ by mss are considered the same.
Another issue is, that you cannot selectively delete an IPv4 route based
on the mss:
ip netns del x
ip netns add x
IP() {
ip netns exec x ip "$@"
}
IP link add type veth
IP link set veth0 name v
IP link set veth1 up
IP link set v up
IP route append 192.168.7.0/24 dev v advmss 6
IP route append 192.168.7.0/24 dev v advmss 7
IP -d route show dev v
IP route delete 192.168.7.0/24 dev v advmss 7
IP -d route show dev v
It seems for deleting routes, kernel ignores mss (which doesn't really
matter for IPv6, but does so for IPv4).
Refactor _nl_msg_new_route() to obtain the route scope (rtm_scope)
from the NMPObject, instead of a separate argument.
That way, when deleting an IPv4 route, we don't pick the first route
that matches (RT_SCOPE_NOWHERE), but use the actual scope of the route
that we want to delete. That matters, if there are more then one
otherwise identical routes that only differ by their scope.
For kernel, the scope of IPv6 routes is always global
(RT_SCOPE_UNIVERSE).
Also, during ip4_route_add() initialize the intermediate @obj to have
the values as we expect them after adding the route. That is necessary
to use it in _nl_msg_new_route(). But also nicer for consistency.
Also, move the scope_inv field in NMPlatformIP4Route to let the other
in_addr_t fields life side by side.
_nl_msg_new_route() should not get extra arguments, but instead
use all parameters from the NMPObject argument. This will allow
during nm_platform_ip_route_delete() to pick the exact route
that should be deleted.
Also, in ip4_route_add()/ip6_route_add(), keep the stack-allocated
@obj object consistent with what we expect to add. That is, set
the rt_source argument to the value of what the route will have
after kernel adds it. That might be necessary, because
do_add_addrroute() searches the cache for @obj.
Routes are complicated.
`ip route add` and `ip route append` behaves differently with respect to
determine whether an existing route is idential or not.
Extend the cmp() and hash() functions to have a compare type, that
covers the different semantics.
Contrary to addresses, routes have no ID. When deleting a route,
you cannot just specify certain properties like network/plen,metric.
Well, actually you can specify only certain properties, but then kernel
will treat unspecified properties as wildcard and delete the first matching
route. That is not something we want, because we need to be in control which
exact route shall be deleted.
Also, rtm_tos *must* match. Even if we like the wildcard behavior,
we would need to pass TOS to nm_platform_ip4_route_delete() to be
able to delete routes with non-zero TOS. So, while certain properties
may be omitted, some must not. See how test_ip4_route_options() was
broken.
For NetworkManager it only makes ever sense to call delete on a route,
if the route is already fully known. Which means, we only delete routes
that we have already in the platform cache (otherwise, how would we know
that there is something to delete). Because of that, no longer have separate
IPv4 and IPv6 functions. Instead, have nm_platform_ip_route_delete() which
accepts a full NMPObject from the platform cache.
The code in core doesn't jet make use of this new functionality. It will
in the future.
At least, it fixes deleting routes with differing TOS.
and refactor NMFakePlatform to also track links via NMPCache.
For one, now NMFakePlatform also tests NMPCache, increasing the
coverage of what we care about.
Also, all our NMPlatform implementations now use NMPObject and NMPCache.
That means, we can expose those as part of the public API. Which is
great, because callers can keep a reference to the NMPObject object
and make use of generic functions like nmp_object_to_string().
And move some code from NMLinuxPlatform to NMPlatform, where it belongs.
The advantage is that we reuse (and test!) the NMPCache implementation for
tracking addresses.
Also, we now always expose proper NMPObjects from both linux and fake
platform.
For example,
obj = NMP_OBJECT_UP_CAST (nm_platform_ip4_address_get (...));
will work as expected. Also, the caller is now by NMPlatform API
allowed to take and keep a reference to the returned objects.