Without ifindex, adding the direct route to gateway fails:
platform: route-sync: failure to add IPv6 route: fd02::/64 via fd01::1 dev 1635 metric 101 mss 0 rt-src user: No route to host (113); try adding direct route to gateway fd01::1/128 via :: metric 101 mss 0 rt-src user
platform: route: append IPv6 route: fd01::1/128 via :: metric 101 mss 0 rt-src user
platform-linux: delayed-action: schedule wait-for-nl-response (seq 269, timeout in 0.199999195, response-type 0)
platform-linux: delayed-action: handle wait-for-nl-response (any)
platform-linux: netlink: recvmsg: new message NLMSG_ERROR, flags 0, seq 269
platform-linux: netlink: recvmsg: error message from kernel: No such device (19) for request 269
Fixes: c9f89cafdf
Since commit 78ed0a4a23 (device: add IPv6 link local address via
merge-and-apply) we also handle IPv6 link
local addresses like regular addresses. That is, we also add them during
merge-and-apply and sync them via nm_platform_ip6_address_sync().
ip6-address-sync loops over the platform addresses, to find which
addresses shall be deleted, and which shall be deleted in order to
fix the address order/priority. At that point, we must not ignore
link-local addresses anymore, but handle them too.
Otherwise, during each resync the link local addresses are present, and
platform-sync thinks that the address order is wrong. That wrongly
leads to removing most addresses and re-adding them.
Fixes: 78ed0a4a23
For completeness, extend the API to support non-persistent
device. That requires that nm_platform_link_tun_add()
returns the file descriptor.
While NetworkManager doesn't create such devices itself,
it recognizes the IFLA_TUN_PERSIST / IFF_PERSIST flag.
Since ip-tuntap (obviously) cannot create such devices,
we cannot add a test for how non-persistent devices look
in the platform cache. Well, we could instead add them
with ioctl directly, but instead, just extend the platform
API to allow for that.
Also, use the function from test-lldp.c to (optionally) use
nm_platform_link_tun_add() to create the tap device.
Switch from "pi on|off" to optionally printing "pi" to indicate
whether the flag is set. That follows ip-tuntap syntax and is
more familiar:
$ ip tuntap help
Usage: ip tuntap { add | del | show | list | lst | help } [ dev PHYS_DEV ]
[ mode { tun | tap } ] [ user USER ] [ group GROUP ]
[ one_queue ] [ pi ] [ vnet_hdr ] [ multi_queue ] [ name NAME ]
Where: USER := { STRING | NUMBER }
GROUP := { STRING | NUMBER }
Also, print the "persist" flag.
Kernel does not allow configuring a route via a gateway, if the
gateway is not directly reachable.
For non-manually added routes (e.g. from DHCP), we ignore such failures
as server configuration errors. For manually added routes, we try to work
around them.
Note that if the user adds a manual route that references a gateway,
maybe he should be required to also add a matching onlink route for
the gateway (or an address that results in a device-route), otherwise
the configuration could be considered invalid. That was however not
done historically, and also, it seems a rather unhelpful behavior.
NetworkManager should just make it work, and not assume anything is
wrong with the configuration. Similarly, for IPv4, the user could
configure the route as onlink, however, that still requires extra
configuration of which the user might not be aware.
This would apply for example, when a connection has method=auto,
and would obtain the routes automatically. It seems sensible to
allow the user to add a route via the gateway, if he ~knows~ that
this particular network will provide such a configuration via DHCP.
In the past however, we tried not to automatically add a device route,
but instead see whether we will get a suitable route via DHCP. If we
wouldn't get such a route, we would however fail the connection.
However, this is really very hard to get right.
We call ip_config_merge_and_apply() possibly before receiving automatic
IP configuration (commit 7070d17ced, "device: reset
@con_ip6_config on failure before RA"). In this case, we could not yet
configure the route. However, we also cannot fail (yet), because we should
wait to see whether we will receive a route that makes this configuration
feasible.
That is hard to get right. How long should we wait? If we get a DHCP lease
and still cannot add the route, should we fail the IP configuration or wait
longer for another lease? Worse, if we decide to fail the IP configuration,
it might not fail the entire activation. Instead, we would only mark the
current address family as failed. If we later get a DHCP lease, should we
retry to add the route again? -- probably yes. If we still fail, we would
need to keep the IP configuration in failed state, regardless that DHCP
succeeded. Part of the problem is, that we are bad at tracking the
failed state per IP method. So, if manual configuration fails but DHCP
succeeds, we get the state wrong. That should be fixed separately, but it
just shows how hard it is to have this route that we currently cannot
add, and wanting to wait for something that might never come, but still
fail at some point.
Instead, if we cannot add a route due to a missing onlink gateway,
just retry and add the /32 or /128 direct route ourselves.
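The retry can be sketched as follows. This is only a minimal illustration, not the platform code: struct route, route_add_fn, struct fake_kernel and fake_route_add() are hypothetical stand-ins, and the fake "kernel" merely mimics the two errors seen in the log (EHOSTUNREACH for an unreachable gateway, ENODEV for a route without ifindex).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified route description; not an NMPlatform type. */
struct route {
    const char *dest;    /* e.g. "fd02::" */
    int         plen;    /* prefix length */
    const char *gateway; /* NULL for a direct route */
    int         ifindex;
};

/* Hypothetical callback that adds one route and returns 0 or an errno. */
typedef int (*route_add_fn)(void *user_data, const struct route *route);

/* Try to add the route. If the gateway is not reachable, add a direct
 * host route (/32 or /128) to the gateway on the same ifindex, then
 * retry the original route once. */
int
route_add_with_gateway_fallback(route_add_fn add, void *user_data,
                                const struct route *route, int host_plen)
{
    struct route direct;
    int err;

    assert(add && route);

    err = add(user_data, route);
    if (err != EHOSTUNREACH || !route->gateway)
        return err;

    direct = (struct route) {
        .dest    = route->gateway,
        .plen    = host_plen,
        .gateway = NULL,
        .ifindex = route->ifindex, /* without ifindex: ENODEV */
    };
    err = add(user_data, &direct);
    if (err != 0)
        return err;

    return add(user_data, route);
}

/* Fake kernel for illustration only. */
struct fake_kernel {
    bool gateway_reachable;
    int  n_calls;
};

int
fake_route_add(void *user_data, const struct route *route)
{
    struct fake_kernel *k = user_data;

    k->n_calls++;
    if (route->ifindex <= 0)
        return ENODEV;
    if (!route->gateway) {
        /* a direct route makes the gateway reachable */
        k->gateway_reachable = true;
        return 0;
    }
    return k->gateway_reachable ? 0 : EHOSTUNREACH;
}
```

For IPv4 the caller would pass host_plen 32, for IPv6 128.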
Note that for IPv6 routes that have a "src" address which is still
TENTATIVE, we also cannot currently add the route and retry later.
However, that is fundamentally different, because:
- the configuration here is correct, it's only that the address
didn't yet pass IPv6 DAD and kernel is being unhelpful (rh#1457196).
- we only have to wait a few seconds for DAD to complete or fail.
So, it's easy to implement this sensibly.
Move handling non-NM_IP_CONFIG_SOURCE_USER routes first. These are
routes that were added manually by the user in the connection.
Note that there is no change in behavior, because of how
_err_inval_due_to_ipv6_tentative_pref_src() would only accept
user routes already.
Kernel recently got support for exposing TUN/TAP information on netlink
[1], [2], [3]. Add support for it to the platform cache.
The advantage of using netlink is that querying sysctl bypasses the
order of events of the netlink socket. It is out of sync and racy. For
example, platform cache might still think that a tun device exists, but
a subsequent lookup at sysfs might fail because the device was deleted
in the meantime. Another point is, that we don't get change
notifications via sysctl and that it requires various extra syscalls
to read the device information. If the tun information is present on
netlink, put it into the cache. This bypasses checking sysctl, while
we keep falling back to sysctl for backward compatibility until we can
require support from kernel.
Notes:
- we had two link types NM_LINK_TYPE_TAP and NM_LINK_TYPE_TUN. This
deviates from the model of how kernel treats TUN/TAP devices, which
makes it more complicated. The link type of a NMPlatformLink instance
should match what kernel thinks about the device. Case in point,
when parsing RTM_NEWLINK messages, we very early need to determine
the link type (_linktype_get_type()). However, to determine the
type of a TUN/TAP at that point, we need to look into nested
netlink attributes which in turn depend on the type (IFLA_INFO_KIND
and IFLA_INFO_DATA), or even worse, we would need to look into
sysctl for older kernel versions. Now, the TUN/TAP type is a property
of the link type NM_LINK_TYPE_TUN, instead of determining two
different link types.
- various parts of the API (both kernel's sysctl vs. netlink) and
NMDeviceTun vs. NMSettingTun disagree whether the PI is positive
(NM_SETTING_TUN_PI, IFLA_TUN_PI, NMPlatformLnkTun.pi) or inverted
(NM_DEVICE_TUN_NO_PI, IFF_NO_PI). There is no consistent way,
but prefer the positive form for internal API at NMPlatformLnkTun.pi.
- previously NMDeviceTun.mode could not change after initializing
the object. Allow for that to happen, because forcing some properties
that are reported by kernel to not change is wrong, in case they
might change. Of course, in practice kernel doesn't allow the device
to ever change its type, but the type property of the NMDeviceTun
should not make that assumption, because, if it actually changes, what
would it mean?
- note that as of now, the new netlink API is not yet merged to Linus'
mainline tree. Shortcut _parse_lnk_tun() to not accidentally use unstable
API for now.
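As an illustration of the notes above, here is a minimal sketch of how one link type with a TUN/TAP mode property and a positive "pi" field could be derived from the kernel's flags. struct ex_lnk_tun and ex_lnk_tun_from_flags() are hypothetical and not the NMPlatformLnkTun definition; the EX_IFF_* constants are assumed to mirror the values in <linux/if_tun.h> and are redefined so the sketch builds anywhere.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed to mirror IFF_TUN, IFF_TAP, IFF_PERSIST and IFF_NO_PI
 * from <linux/if_tun.h>. */
#define EX_IFF_TUN     0x0001u
#define EX_IFF_TAP     0x0002u
#define EX_IFF_PERSIST 0x0800u
#define EX_IFF_NO_PI   0x1000u

/* Hypothetical stand-in for the per-link TUN/TAP data. */
struct ex_lnk_tun {
    unsigned type;    /* EX_IFF_TUN or EX_IFF_TAP: a property, not a link type */
    bool     pi;      /* positive form: true if packet info is present */
    bool     persist;
};

/* Returns false if the flags name neither exactly TUN nor exactly TAP. */
bool
ex_lnk_tun_from_flags(unsigned flags, struct ex_lnk_tun *out)
{
    unsigned type = flags & (EX_IFF_TUN | EX_IFF_TAP);

    assert(out);

    if (type != EX_IFF_TUN && type != EX_IFF_TAP)
        return false;

    out->type    = type;
    out->pi      = !(flags & EX_IFF_NO_PI); /* invert the kernel's NO_PI */
    out->persist = !!(flags & EX_IFF_PERSIST);
    return true;
}
```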
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1277457
[2] https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=1ec010e705934c8acbe7dbf31afc81e60e3d828b
[3] https://git.kernel.org/pub/scm/network/iproute2/iproute2-next.git/commit/?id=118eda77d6602616bc523a17ee45171e879d1818
https://bugzilla.redhat.com/show_bug.cgi?id=1547213
https://github.com/NetworkManager/NetworkManager/pull/77
NMTST_ASSERT_PLATFORM_NETNS_CURRENT() already checks that the current namespace
is correct. Remove the duplicate assertion.
Also, NMP_CACHE_OPS_UNCHANGED is numerically identical to NM_PLATFORM_SIGNAL_NONE.
Use it in the assertion.
Although IFA_F_TEMPORARY is numerically equal to IFA_F_SECONDARY,
their meaning is different. One applies to IPv6 temporary addresses,
and the other to IPv4 secondary addresses.
During _addr_array_clean_expired() we want to ignore and clear
IPv6 temporary addresses, but not IPv4 secondary addresses.
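A minimal sketch of the distinction, assuming the check is keyed on the address family. ex_addr_is_ip6_temporary() is a hypothetical helper, and the EX_IFA_F_* constants are assumed to mirror <linux/if_addr.h>, where both flags share the value 0x01.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/socket.h>

/* Assumed to mirror IFA_F_SECONDARY and IFA_F_TEMPORARY. */
#define EX_IFA_F_SECONDARY 0x01u
#define EX_IFA_F_TEMPORARY EX_IFA_F_SECONDARY

/* The bit alone cannot tell the two meanings apart; only together
 * with the address family does it identify an IPv6 temporary address. */
bool
ex_addr_is_ip6_temporary(int family, unsigned flags)
{
    return family == AF_INET6 && (flags & EX_IFA_F_TEMPORARY) != 0;
}
```

An IPv4 address carrying the same bit is a secondary address and is left alone by such a check.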
Fixes: f2c4720bca
While the numerical values of IFA_F_SECONDARY and IFA_F_TEMPORARY
are identical, their meaning is not.
IFA_F_SECONDARY is only relevant for IPv4 addresses, while
IFA_F_TEMPORARY is only relevant for IPv6 addresses.
IFA_F_TEMPORARY is automatically set by kernel for the addresses
that it generates as part of IFA_F_MANAGETEMPADDR. It cannot be
actively set by user-space.
IFA_F_SECONDARY is automatically set by kernel depending on the order
in which the addresses for the same subnet are added.
This essentially reverts 8b4f11927 (core: avoid IFA_F_TEMPORARY alias for
IFA_F_SECONDARY).
We want to add addresses in a particular order so that source address
selection works.
Note that @known_addresses contains the desired addresses in order of
least-important first, while @plat_addresses contains them in opposite
order. Previously, this inverted order was not considered, and we
essentially ended up removing and re-adding all addresses every time.
Fix that. While at it, get rid of the O(n^2) runtime complexity, and
make it O(n) by iterating both lists simultaneously.
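The simultaneous iteration can be sketched like this. It is a simplified model, not the platform code: addresses are plain integers, the platform list is assumed to contain only known addresses (unknown ones were already removed), and new addresses are assumed to end up at the top of the platform list. ex_count_to_readd() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* known[]: desired addresses, least important first.
 * plat[]:  addresses as found on the platform, in the opposite order.
 *
 * Walk known[] forward and plat[] backward at the same time. The bottom
 * part of plat[] that matches is already in the right order. Returns how
 * many entries at the top of plat[] are out of order and would have to
 * be removed and re-added. Runs in O(n). */
size_t
ex_count_to_readd(const int *known, size_t n_known,
                  const int *plat, size_t n_plat)
{
    size_t i = 0;      /* index into known[], moving forward */
    size_t j = n_plat; /* one past the index into plat[], moving backward */

    assert(known || n_known == 0);
    assert(plat || n_plat == 0);

    while (i < n_known && j > 0 && known[i] == plat[j - 1]) {
        i++;
        j--;
    }
    return j;
}
```

If the result is 0, nothing needs to be removed, and any remaining entries of known[] only have to be added.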
Temporary addresses (RFC4941) are not handled by NetworkManager directly, but by
kernel. If they are in the @known_addresses list, clear them out early.
They shall be ignored.
Often, we want an API where an input argument is read-only and not modified
by the function call. Not modifying input arguments is a good
convention.
However, in this case there are only two callers, and both clearly do
not care whether the @known_addresses array will be modified.
Clear out addresses that are already expired and enforce that there are
no duplicate addresses. Basically, use @known_addresses for bookkeeping
which addresses are to be ignored.
We do a pre-run that constructs an index of all addresses and drops
addresses that are already expired.
Move this code to a separate function, it will be reused for IPv6.
Also, note that nm_platform_ip4_address_sync() has only 2 callers. Both
callers make sure to not pass duplicate known addresses, because the
addresses also come from a cache. Make that a requirement and assert
against unique addresses. If we would allow duplicate addresses, we would
have to handle them in a defined way (like, dropping the ones with lower
priority). That would be more complicated, and since no caller is
supposed to provide duplicate addresses, don't bother but assert.
nm_utils_lifetime_get() already has so many arguments.
Essentially, the function returned %TRUE if and only if the
lifetime was greater than zero.
Combine the return value and the output argument for the lifetime.
It also matches better the function name: to get the lifetime.
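The shape of such an API might look like the sketch below. ex_lifetime_remaining() and EX_LIFETIME_PERMANENT are hypothetical and do not reproduce the actual signature of nm_utils_lifetime_get(); the point is only that a return value of zero replaces the separate boolean result.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical marker for "never expires". */
#define EX_LIFETIME_PERMANENT UINT32_MAX

/* Returns the remaining lifetime in seconds at time @now.
 * Zero means "already expired", so no extra boolean is needed. */
uint32_t
ex_lifetime_remaining(uint32_t timestamp, uint32_t lifetime, uint32_t now)
{
    uint64_t expiry;
    uint64_t remaining;

    if (lifetime == EX_LIFETIME_PERMANENT)
        return EX_LIFETIME_PERMANENT;

    expiry = (uint64_t) timestamp + lifetime;
    if (expiry <= now)
        return 0;

    remaining = expiry - now;
    /* keep finite lifetimes distinct from the permanent marker */
    if (remaining >= EX_LIFETIME_PERMANENT)
        return EX_LIFETIME_PERMANENT - 1;
    return (uint32_t) remaining;
}
```

A caller then writes "if (ex_lifetime_remaining(...) == 0)" instead of checking a boolean and reading an output argument.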
Add a function that allows to re-request all objects of a certain type.
Usually, the cache is supposed to keep itself in a consistent state and
this function is not useful.
It is however useful during testing and debugging to explicitly reload
an object type.
If you ever think you need this function in non-testing code, then
something else is probably wrong with the cache implementation.
Print the "tentative" flag last. Most other flags have more the character of
a user-configured attribute, while "tentative" reflects the current state of the address.
Previously, we would log
secondary,tentative
and
tentative,mngtmpaddr,noprefixroute
Print the "tentative" flag last. This way, the flag that commonly
will flip by kernel's decision, is consistently printed last.
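One way to get a stable order is a name table with "tentative" as its last entry. The sketch below is illustrative only: ex_addr_flags_to_string() is a hypothetical helper that covers just the four flags mentioned here, and the EX_F_* values are assumed to mirror the IFA_F_* constants in <linux/if_addr.h>.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Assumed to mirror IFA_F_SECONDARY, IFA_F_TENTATIVE,
 * IFA_F_MANAGETEMPADDR and IFA_F_NOPREFIXROUTE. */
#define EX_F_SECONDARY      0x01u
#define EX_F_TENTATIVE      0x40u
#define EX_F_MANAGETEMPADDR 0x100u
#define EX_F_NOPREFIXROUTE  0x200u

/* The table order is the print order; "tentative" is deliberately last. */
static const struct {
    unsigned    flag;
    const char *name;
} ex_flag_names[] = {
    { EX_F_SECONDARY,      "secondary" },
    { EX_F_MANAGETEMPADDR, "mngtmpaddr" },
    { EX_F_NOPREFIXROUTE,  "noprefixroute" },
    { EX_F_TENTATIVE,      "tentative" },
};

/* Writes a comma separated list of flag names to @buf and returns @buf.
 * Stops early if the buffer is too small. */
const char *
ex_addr_flags_to_string(unsigned flags, char *buf, size_t len)
{
    size_t used = 0;
    size_t i;

    assert(buf && len > 0);
    buf[0] = '\0';

    for (i = 0; i < sizeof(ex_flag_names) / sizeof(ex_flag_names[0]); i++) {
        int n;

        if (!(flags & ex_flag_names[i].flag))
            continue;
        n = snprintf(buf + used, len - used, "%s%s",
                     used > 0 ? "," : "", ex_flag_names[i].name);
        if (n < 0 || (size_t) n >= len - used)
            break; /* output was truncated */
        used += (size_t) n;
    }
    return buf;
}
```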
The @keep_link_local logic was wrong: when set to TRUE we must not
delete addresses and when set to FALSE we must delete addresses only
if they are unknown.
Also, ignore link-local addresses when comparing positions.
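Read as a predicate, the corrected rule could look like the sketch below. ex_should_delete() is a hypothetical helper, and the reading that non-link-local addresses are deleted exactly when they are unknown is an assumption based on the usual sync behavior, not something taken from the code.

```c
#include <assert.h>
#include <stdbool.h>

/* Decide whether a platform address should be deleted during sync.
 *
 * - keep_link_local == true:  link-local addresses are never deleted.
 * - keep_link_local == false: link-local addresses are treated like any
 *   other address and deleted only if they are unknown. */
bool
ex_should_delete(bool is_link_local, bool keep_link_local, bool is_known)
{
    if (is_link_local && keep_link_local)
        return false;
    return !is_known;
}
```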
Fixes: 19d6d54b6f
nm_platform_ip6_address_sync() must take care not only of adding
missing addresses and removing unknown addresses, but also of the
order in which they are added. The order is important because it
determines which address is preferred by kernel.
Since we can only add addresses at the top of the list, in order to
change the position of an address we must first remove it and then
re-add it in the right position.
@kind might be NULL. There are 3 forms of the hash-update functions for
string: str(), str0(), and strarr().
- str0() is when the string might be NULL.
- str() does not allow the string to be NULL
- strarr() is like str(), except it adds a G_STATIC_ASSERT()
that the argument is a C array.
The reason why a difference between str() and str0() exists is
that str0() hashes NULL differently from "" or any other string.
This has an overhead, because it effectively must hash another bit
of information that tells whether a string was passed or not.
The reason is that hashing a tuple of two strings should always
yield a different hash value, even for "aa",""; "a","a"; "","aa",
where naive concatenation would yield identical hash values in all
three cases.
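The effect can be demonstrated with a small stand-in hash. The sketch below uses 32-bit FNV-1a purely for illustration; it is not the hash that NetworkManager uses, and all ex_* names are hypothetical. Here str() hashes the bytes including the NUL terminator, and str0() first hashes one extra byte that tells whether a string is present.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in hash state (32-bit FNV-1a), for illustration only. */
struct ex_hash_state {
    uint32_t h;
};

void
ex_hash_init(struct ex_hash_state *st)
{
    st->h = 2166136261u;
}

void
ex_hash_update(struct ex_hash_state *st, const void *data, size_t len)
{
    const unsigned char *p = data;
    size_t i;

    for (i = 0; i < len; i++) {
        st->h ^= p[i];
        st->h *= 16777619u;
    }
}

/* str(): the string must not be NULL. Including the NUL terminator
 * keeps the boundary between consecutive strings in the hash. */
void
ex_hash_update_str(struct ex_hash_state *st, const char *s)
{
    assert(s);
    ex_hash_update(st, s, strlen(s) + 1);
}

/* str0(): one extra byte distinguishes NULL from "". */
void
ex_hash_update_str0(struct ex_hash_state *st, const char *s)
{
    unsigned char present = (s != NULL);

    ex_hash_update(st, &present, 1);
    if (s)
        ex_hash_update_str(st, s);
}

uint32_t
ex_hash_pair(const char *a, const char *b)
{
    struct ex_hash_state st;

    ex_hash_init(&st);
    ex_hash_update_str(&st, a);
    ex_hash_update_str(&st, b);
    return st.h;
}

/* Naive variant: plain concatenation, the boundary is lost. */
uint32_t
ex_hash_pair_naive(const char *a, const char *b)
{
    struct ex_hash_state st;

    ex_hash_init(&st);
    ex_hash_update(&st, a, strlen(a));
    ex_hash_update(&st, b, strlen(b));
    return st.h;
}

uint32_t
ex_hash_str0(const char *s)
{
    struct ex_hash_state st;

    ex_hash_init(&st);
    ex_hash_update_str0(&st, s);
    return st.h;
}
```

With the naive variant all three tuples hash the same two bytes "aa", while the variants that keep the terminator and the presence byte see different byte streams.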
Fixes: e75fc8279b
It only makes sense to call delete() with NMPObjects that
we obtained from the platform cache. Otherwise, if we didn't
get it from the cache in the first place, we wouldn't know
what to delete.
Hence, the input argument is (almost) always an NMPObject
in the first place. That is different from add(), where
we might create a new specific NMPlatform* instance on the
stack. For add() it makes slightly more sense to have different
functions depending on the type. For delete(), it doesn't.
We also do this for libnm, where it causes visible changes
in behavior. But if somebody would rely on the hashing implementation
for hash tables, it would be seriously flawed.
GHashTable optimizes a NULL equality function to use direct pointer
comparison. That saves the overhead of calling g_direct_equal().
This is also documented behavior for g_hash_table_new().
While at it, also don't pass g_direct_hash() but use the default
of %NULL. The behavior is the same, but consistently don't use
g_direct_hash().
The "onlink" flag for IPv4 routes is part of the route ID.
Consider it in nm_platform_ip4_route_cmp().
Also, allow configuring the flag when adding a route.
Note that for IPv6, the onlink flag is still ignored.
Pretty much like kernel does.
and nm_utils_ip6_property_path(). The API with static buffers
looks a bit nicer. But I think they are dangerous, because
we tend to pass the buffer down several layers of the stack, and
it's not immediately clear, that we don't overwrite the static
buffer again (which we probably did not, but it's hard to verify
that there is no bug there).
Setting the MTU fails under regular conditions, for example when
setting the MTU of a master larger than the MTU of the slaves.
Logging a warning is too alarming.
We don't need this extra distinguisher. It makes no sense to ever
compare two routes with a different compare-type.
Also, the number of fields that is hashed already differs between each
compare type. If we have a good hashing algorithm, this already suffices
that the hash value looks largely different.
We often want to cascade hashing, meaning, to combine the
outcome of various hash functions in a larger hash.
Instead of having each hash function return a guint hash value,
accept a hash state argument. This saves the overhead of initializing
and completing the intermediate hash states.
It also avoids losing entropy when we reduce the larger hash state
into the intermediate guint hash value.
By using a macro, we don't cast all the types to guint. Instead,
we use their native types directly. Hence, we don't need
nm_hash_update_uint64() nor nm_hash_update_ptr().
Also, for types smaller than guint like char, we save hashing
the all zero bytes.
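A minimal sketch of the cascading pattern, again with 32-bit FNV-1a as a stand-in and with hypothetical ex_* names and structs. Each hash function takes the state as an argument instead of returning an intermediate value, and a macro hashes each field with its native size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in hash state (32-bit FNV-1a), for illustration only. */
struct ex_hash_state {
    uint32_t h;
};

void
ex_hash_init(struct ex_hash_state *st)
{
    st->h = 2166136261u;
}

void
ex_hash_update(struct ex_hash_state *st, const void *data, size_t len)
{
    const unsigned char *p = data;
    size_t i;

    for (i = 0; i < len; i++) {
        st->h ^= p[i];
        st->h *= 16777619u;
    }
}

/* Hash an lvalue with its native size: a uint8_t contributes one byte,
 * not a zero-extended guint-sized value. */
#define EX_HASH_UPDATE_VAL(st, lvalue) \
    ex_hash_update((st), &(lvalue), sizeof(lvalue))

/* Hypothetical types for the example. */
struct ex_addr {
    uint32_t address;
    uint8_t  plen;
};

struct ex_route {
    struct ex_addr dest;
    uint32_t       metric;
};

/* Updates the caller's state; no intermediate hash value is produced. */
void
ex_addr_hash_update(const struct ex_addr *a, struct ex_hash_state *st)
{
    EX_HASH_UPDATE_VAL(st, a->address);
    EX_HASH_UPDATE_VAL(st, a->plen);
}

/* Cascades into the nested object's hash function with the same state. */
void
ex_route_hash_update(const struct ex_route *r, struct ex_hash_state *st)
{
    ex_addr_hash_update(&r->dest, st);
    EX_HASH_UPDATE_VAL(st, r->metric);
}

/* Only the outermost caller initializes and finalizes the state. */
uint32_t
ex_route_hash(const struct ex_route *r)
{
    struct ex_hash_state st;

    ex_hash_init(&st);
    ex_route_hash_update(r, &st);
    return st.h;
}
```

Hashing the fields one by one also avoids feeding struct padding bytes into the hash.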