Currently, there is a directy one to one relation between
- DELAYED_ACTION_TYPE_REFRESH_ALL_*
- REFRESH_ALL_TYPE_*
- NMP_OBJECT_TYPE_*
For IP addresses, routes and routing policy rules, when we request
a refresh-all (NLM_F_DUMP), we want to specify the address family.
For addresses and routes that is currently solved by having two
sets of NMPObject types, for each address family one.
I think that is cumbersome because the implementations of both address
families are quite similar. By implementing both families as different
object types, we have a lot of duplicate code and it's hard to see where
the families actually differ. It would be better to have only one NMPObject
type, but then when we "refresh-all" such types, we still want to be able
to dump all (AF_UNSPEC) or only a particular address family (AF_INET, AF_INET6).
Decouple REFRESH_ALL_TYPE_* from NMP_OBJECT_TYPE_* to make that
possible.
This allows the compiler to see that this function does nothing for %NULL.
That is not so unusual, as we use nm_auto_nmpobj to free objects. Often
the compiler can see that these pointers are %NULL.
Until now, all implemented NMPObject types have an ifindex field (from
links, addresses, routes, qdisc to tfilter).
The NMPObject structure contains a union of all available types, that
makes it easier to down-case from an NMPObject pointer to the actual
content.
The "object" field of NMPObject of type NMPlatformObject is the lowest
common denominator.
We will add NMPlatformRoutingRules (for policy routing rules). That type
won't have an ifindex field.
Hence, drop the "ifindex" field from NMPlatformObject type. But also add
a new type NMPlatformObjWithIfindex, that can represent all types that
have an ifindex.
We already have NMPLookup to express a set of objects that can be looked
up via the multi-index.
Change the behavior of nmp_cache_dirty_set_all(), not to accept an
object-type. Instead, make it generally useful and accept a lookup
instance that is used to filter the elements that should be set as
dirty.
We still don't use getnameinfo(). This is used for logging,
where we want to see a string representation that is as close
as possible to the actual bytes (to spot differences). It should
not be obfuscated by a libc function out of our control.
Also fix the notation for the IPv6 scope ID to use the common '%'
character.
Add cmp/hash functions that correctly honor the well known fields, instead
of doing memcmp/memcpy of the entire sockaddr structure.
Also, move the set function to nm_sock_addr_union_cpy() and
nm_sock_addr_union_cpy_untrusted(). This also gets it right
to ensure all bytes of the union are initialized (to zero).
- previously, parsing wireguard genl data resulted in memory corruption:
- _wireguard_update_from_allowedips_nla() takes pointers to
allowedip = &g_array_index (buf->allowedips, NMWireGuardAllowedIP, buf->allowedips->len - 1);
but resizing the GArray will invalidate this pointer. This happens
when there are multiple allowed-ips to parse.
- there was some confusion who owned the allowedips pointers.
_wireguard_peers_cpy() and _vt_cmd_obj_dispose_lnk_wireguard()
assumed each peer owned their own chunk, but _wireguard_get_link_properties()
would not duplicate the memory properly.
- rework memory handling for allowed_ips. Now, the NMPObjectLnkWireGuard
keeps a pointer _allowed_ips_buf. This buffer contains the instances for
all peers.
The parsing of the netlink message is the complicated part, because
we don't know upfront how many peers/allowed-ips we receive. During
construction, the tracking of peers/allowed-ips is complicated,
via a CList/GArray. At the end of that, we prettify the data
representation and put everything into two buffers. That is more
efficient and simpler for user afterwards. This moves complexity
to the way how the object is created, vs. how it is used later.
- ensure that we nm_explicit_bzero() private-key and preshared-key. However,
that only works to a certain point, because our netlink library does not
ensure that no data is leaked.
- don't use a "struct sockaddr" union for the peer's endpoint. Instead,
use a combintation of endpoint_family, endpoint_port, and
endpoint_addr.
- a lot of refactoring.
Add support for a new wireguard link type to the platform code. For now
this only covers querying existing links via genetlink and parsing them
into platform objects.
Kernel does not all allow to configure a route via a gateway, if the
gateway is not directly reachable.
For non-manually added routes (e.g. from DHCP), we ignore them as a
server configuration errors. For manually added routes, we try to work
around them.
Note that if the user adds a manual route that references a gateway,
maybe he should be required to also add a matching onlink route for
the gateway (or an address that results in a device-route), otherwise
the configuration could be considered invalid. That was however not
done historically, and also, it seems a rather unhelpful behavior.
NetworkManage should just make it work, not not assume anything is
wrong with the configuration. Similarly, for IPv4, the user could
configure the route as onlink, however, that still requires extra
configuration of which the user might not be aware.
This would apply for example, when a connection has method=auto,
and would obtain the routes automatically. It seems sensible to
allow the user to add a route via the gateway, if he ~knows~ that
this particular network will provide such a configuration via DHCP.
In the past however, we tried not to automatically add a device route,
but instead see whether we will get a suitable route via DHCP. If we
wouldn't get such a route, we would however fail the connection.
However, this is really very hard to get right.
We call ip_config_merge_and_apply() possibly before receiving automatic
IP configuration (commit 7070d17ced, "device: reset
@con_ip6_config on failure before RA"). In this case, we could not yet
configure the route. Instead, we also cannot fail (yet), because we should
wait whether we will receive a route that makes this configuration
feasable.
That is hard to get right. How long should we wait? If we get a DHCP lease
and still cannot add the route, should we fail the IP configuration or wait
longer for another lease? Worse, if we decide to fail the IP configuration,
it might not fail the entire activation. Instead, we would only mark the
current address family as failed. If we later get a DHCP lease, should we
retry to add the route again? -- probably yes. If we still fail, we would
need to keep the IP configuration in failed state, regardless that DHCP
succeeded. Part of the problem is, that we are bad at tracking the
failed state per IP method. So, if manual configuration fails but DHCP
succeeds, we get the state wrong. That should be fixed separately, but it
just shows how hard it is to have this route that we currently cannot
add, and wanting to wait for something that might never come, but still
fail at some point.
Instead, if we cannot add a route due to a missing onlink gateway,
just retry and add the /32 or /128 direct route ourself.
Note that for IPv6 routes that have a "src" address which is still
TENTATIVE, we also cannot currently add the route and retry later.
However, that is fundamentally different, because:
- the configuration here is correct, it's only that the address
didn't yet pass IPv6 DAD and kernel is being unhelpful (rh#1457196).
- we only have to wait a few seconds for DAD to complete or fail.
So, it's easy to implement this sensibly.
Kernel recently got support for exposing TUN/TAP information on netlink
[1], [2], [3]. Add support for it to the platform cache.
The advantage of using netlink is that querying sysctl bypasses the
order of events of the netlink socket. It is out of sync and racy. For
example, platform cache might still think that a tun device exists, but
a subsequent lookup at sysfs might fail because the device was deleted
in the meantime. Another point is, that we don't get change
notifications via sysctl and that it requires various extra syscalls
to read the device information. If the tun information is present on
netlink, put it into the cache. This bypasses checking sysctl while
we keep looking at sysctl for backward compatibility until we require
support from kernel.
Notes:
- we had two link types NM_LINK_TYPE_TAP and NM_LINK_TYPE_TUN. This
deviates from the model of how kernel treats TUN/TAP devices, which
makes it more complicated. The link type of a NMPlatformLink instance
should match what kernel thinks about the device. Point in case,
when parsing RTM_NETLINK messages, we very early need to determine
the link type (_linktype_get_type()). However, to determine the
type of a TUN/TAP at that point, we need to look into nested
netlink attributes which in turn depend on the type (IFLA_INFO_KIND
and IFLA_INFO_DATA), or even worse, we would need to look into
sysctl for older kernel vesions. Now, the TUN/TAP type is a property
of the link type NM_LINK_TYPE_TUN, instead of determining two
different link types.
- various parts of the API (both kernel's sysctl vs. netlink) and
NMDeviceTun vs. NMSettingTun disagree whether the PI is positive
(NM_SETTING_TUN_PI, IFLA_TUN_PI, NMPlatformLnkTun.pi) or inverted
(NM_DEVICE_TUN_NO_PI, IFF_NO_PI). There is no consistent way,
but prefer the positive form for internal API at NMPlatformLnkTun.pi.
- previously NMDeviceTun.mode could not change after initializing
the object. Allow for that to happen, because forcing some properties
that are reported by kernel to not change is wrong, in case they
might change. Of course, in practice kernel doesn't allow the device
to ever change its type, but the type property of the NMDeviceTun
should not make that assumption, because, if it actually changes, what
would it mean?
- note that as of now, new netlink API is not yet merged to mainline Linus
tree. Shortcut _parse_lnk_tun() to not accidentally use unstable API
for now.
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1277457
[2] https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=1ec010e705934c8acbe7dbf31afc81e60e3d828b
[3] https://git.kernel.org/pub/scm/network/iproute2/iproute2-next.git/commit/?id=118eda77d6602616bc523a17ee45171e879d1818https://bugzilla.redhat.com/show_bug.cgi?id=1547213https://github.com/NetworkManager/NetworkManager/pull/77
In device_ipx_changed() we remember the addresses for which it appears
that DAD failed. Later, on an idle handler, we process them during
queued_ip6_config_change().
Note that nm_plaform_ip6_address_sync() might very well decide to remove
some or all addresses and re-add them immidiately later. It might do so,
to get the address priority/ordering right. At that point, we already
emit platform signals that the device disappeared, and track them in
dad6_failed_addrs.
Hence, later during queued_ip6_config_change() we must check again
whether the address is really not there and not still doing DAD.
Otherwise, we wrongly claim that DAD failed and remove the address,
generate a new one, and the same issue might happen again.
This makes its prototype compatible with GDestroyNotify so that GCC 8.0
won't warn.
The return value is not used anywhere and the unref() functions typically
don't return any.
We need to pass more alias-types. Instead of having numbered
versions, use variadic number of macro arguments.
Also, fix build failure with old compiler:
In file included from src/nm-ip6-config.c:24:
./src/nm-ip6-config.h:44:29: error: controlling expression type 'typeof (ipconf_iter->current->obj)' (aka 'const void *const') not compatible with any generic association type
*out_address = has_next ? NMP_OBJECT_CAST_IP6_ADDRESS (ipconf_iter->current->obj) : NULL;
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Fixes: b1810d7a68
We often want to cascade hashing, meaning, to combine the
outcome of various hash functions in a larger hash.
Instead of having each hash function return a guint hash value,
accept a hash state argument. This saves the overhead of initializing
and completing the intermediate hash states.
It also avoids loosing entropy when we reduce the larger hash state
into the intermediate guint hash value.
CC src/devices/src_libNetworkManager_la-nm-device.lo
In file included from src/devices/nm-device.c:45:0:
src/devices/nm-device.c: In function ‘_v4_has_shadowed_routes_detect’:
./src/platform/nmp-object.h:400:54: error: ‘o’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
_obj ? &_NM_CONSTCAST (NMPObject, _obj)->ip4_route : NULL; \
^
src/devices/nm-device.c:2774:19: note: ‘o’ was declared here
const NMPObject *o;
^
nmp_lookup_init_route_visible() was originally named this way, to only return routes
that are nmp_object_is_visible(). However, all routes are visible (as long as they are
nmp_object_is_alive()). Hence, this is a historic misnomer.
Also, passing @only_default FALSE is identical to the
nmp_lookup_init_addrroute() lookup.
So, rename the function to indicate it is a lookup for default routes
only. Also, get rid of the unsupported ifindex argument for which there
is no index.
Until now, NetworkManager's platform cache for routes used the quadruple
network/plen,metric,ifindex for equaliy. That is not kernel's
understanding of how routes behave. For example, with `ip route append`
you can add two IPv4 routes that only differ by their gateway. To
the previous form of platform cache, these two routes would wrongly
look identical, as the cache could not contain both routes. This also
easily leads to cache-inconsistencies.
Now that we have NM_PLATFORM_IP_ROUTE_CMP_TYPE_ID, fix the route's
compare operator to match kernel's.
Well, not entirely. Kernel understands more properties for routes then
NetworkManager. Some of these properties may also be part of the ID according
to kernel. To NetworkManager such routes would still look identical as
they only differ in a property that is not understood. This can still
cause cache-inconsistencies. The only fix here is to add support for
all these properties in NetworkManager as well. However, it's less serious,
because with this commit we support several of the more important properties.
See also the related bug rh#1337855 for kernel.
Another difficulty is that `ip route replace` and `ip route change`
changes an existing route. The replaced route has the same
NM_PLATFORM_IP_ROUTE_CMP_TYPE_WEAK_ID, but differ in the actual
NM_PLATFORM_IP_ROUTE_CMP_TYPE_ID:
# ip -d -4 route show dev v
# ip monitor route &
# ip route add 192.168.5.0/24 dev v
192.168.5.0/24 dev v scope link
# ip route change 192.168.5.0/24 dev v scope 10
192.168.5.0/24 dev v scope 10
# ip -d -4 route show dev v
unicast 192.168.5.0/24 proto boot scope 10
Note that we only got one RTM_NEWROUTE message, although from NMPCache's
point of view, a new route (with a particular ID) was added and another
route (with a different ID) was deleted. The cumbersome workaround is,
to keep an ordered list of the routes, and figure out which route was
replaced in response to an RTM_NEWROUTE. In absence of bugs, this should
work fine. However, as we only rely on events, we might wrongly
introduce a cache-inconsistancy as well. See the related bug rh#1337860.
Also drop nm_platform_ip4_route_get() and the like. The ID of routes
is complex, so it makes little sense to look up a route directly.
NMPCache can preserve the order of the objects. Until now, the order
was however arbitrary. Soon we will require to preserve the order of
routes.
During a dump, force appending new objects at the end. That ensures,
correct ordering during the dump.
Note that we track objects in several distrinct indexes. Those partition the
set of all objects. Outside a dump when receiving events about new objects (e.g.
RTM_NEWROUTE), it is very unclear at which place the new object should be sorted.
It is especially unclear, as an object might move from one partition (of
an index) to another.
In general, a deterministic order will only be useful in one particular
instance: the NMP_CACHE_ID_TYPE_ROUTES_BY_DESTINATION index for routes.
In this case, we will ensure a particular order of the routes.
We already had a nmp_object_id_equal() function. Generally, an equal() function
is more useful then a cmp() function.
However, implementing a cmp() function is about the same effort then implementing
an equal() function. Also, an equal function can be trivially implemented based on
a compare function, but not the other way around.
That means, it is little extra effort to have both an equal() function
and a cmp() function. Add nmp_object_id_cmp(). If only to be
consistent with other code, which also provides both.
Reasons:
- it adds an O(1) lookup index for accessing NMIPxConfig's addresses.
Hence, operations like merge/intersect have now runtime O(n) instead
of O(n^2).
Arguably, we expect low numbers of addresses in general. For low
numbers, the O(n^2) doesn't matter and quite likely in those cases
the previous implementation was just fine -- maybe even faster.
But the simple case works fine either way. It's important to scale
well in the exceptional case.
- the tracked objects can be shared between the various NMPI4Config,
NMIP6Config instances with NMPlatform and everybody else.
- the NMPObject can be treated generically, meaning it enables code to
handle both IPv4 and IPv6, or addresses and routes. See for example
_nm_ip_config_add_obj().
- I want core to evolve to somewhere where we don't keep copies of
NMPlatformIP4Address, et al. instances. Instead they shall all be
shared. I hope this will reduce memory consumption (although tracking a
reference consumes some memory too). Also, it shortcuts nmp_object_equal()
when comparing the same object. Calling nmp_object_equal() on the
identical objects would be a common case after the hash function
pre-evaluates equality.
and refactor NMFakePlatform to also track links via NMPCache.
For one, now NMFakePlatform also tests NMPCache, increasing the
coverage of what we care about.
Also, all our NMPlatform implementations now use NMPObject and NMPCache.
That means, we can expose those as part of the public API. Which is
great, because callers can keep a reference to the NMPObject object
and make use of generic functions like nmp_object_to_string().
And move some code from NMLinuxPlatform to NMPlatform, where it belongs.
The advantage is that we reuse (and test!) the NMPCache implementation for
tracking addresses.
Also, we now always expose proper NMPObjects from both linux and fake
platform.
For example,
obj = NMP_OBJECT_UP_CAST (nm_platform_ip4_address_get (...));
will work as expected. Also, the caller is now by NMPlatform API
allowed to take and keep a reference to the returned objects.
Routes and addresses don't implement cmd_obj_is_visible(),
hence they are always visible, and NMP_CACHE_ID_TYPE_OBJECT_TYPE_VISIBLE_ONLY
is identical to NMP_CACHE_ID_TYPE_OBJECT_TYPE.
Only link objects can be alive but invisible. Still, drop the index
for looking up visible links entirely. Let callers do the filtering,
if they care.
Maintaining an index is expensive.Not so much in term of runtime, but
in term of memory.
Drop some indexes, and require the caller to use a more broad index (and
filter out unwanted elements).
Dropped:
- can no longer lookup visible default-routes by ifindex.
If you care about default-routes, lookup all and search for the
desired ifindex. The overall number of default-routes is expected
to be small.
We drop NMP_CACHE_ID_TYPE_ROUTES_VISIBLE_BY_IFINDEX_WITH_DEFAULT
entirely.
- no longer have a separate index for non-default routes. We
expect that the most routes are non-default routes. So, don't
have an index without default-routes, instead let the caller
just lookup all routes, and reject default-routes themself.
We keep NMP_CACHE_ID_TYPE_ROUTES_VISIBLE_BY_DEFAULT, but it
now no longer tracks non-default routes.
This drops 1 out of 6 route indexes, and modifes another one, so
that we expect that there are almost no entires tracked by it.
Implement the reference counting of NMPObject as part of
NMDedupMultiObj and get rid of NMDedupMultiBox.
With this change, the NMPObject is aware in which NMDedupMultiIndex
instance it is tracked.
- this saves an additional GSlice allocation for the NMDedupMultiBox.
- it is immediately known, whether an NMPObject is tracked by a
certain NMDedupMultiIndex or not. This saves an additional hash
lookup.
- previously, when all idx-types cease to reference an NMDedupMultiObj
instance, it was removed. Now, a tracked objects stays in the
NMDedupMultiIndex until it's last reference is deleted. This possibly
extends the lifetime of the object and we may reuse it better.
- it is no longer possible to add one object to more then one
NMDedupMultiIndex instance. As we anyway want to have only one
instance to deduplicate the objects, this is fine.
- the ref-counting implementation is now part of NMDedupMultiObj.
Previously, NMDedupMultiIndex could also track objects that were
not ref-counted. Hoever, the object anyway *must* implement the
NMDedupMultiObj API, so this flexibility is unneeded and was not
used.
- a downside is, that NMPObject grows by one pointer size, even if
it isn't tracked in the NMDedupMultiIndex. But we really want to
put all objects into the index for sharing and deduplication. So
this downside should be acceptable. Still, code like
nmp_object_stackinit*() needs to handle a larger object.