Instead of using libnl-route-3 library to serialize netlink messages,
construct the netlink messages ourselves.
This has several advantages:
- Creating the netlink message ourself is actually more straight
forward then having an intermediate layer between NM and the kernel.
Now it is immediately clear, how a platform request translates to
a netlink/kernel request.
You can look at the kernel sources how a certain netlink attribute
behaves, and then it's immediately clear how to set that (and vice
versa).
- Older libnl versions might have bugs or missing features for which
we needed to workaround (often by offering a reduced/broken/untested
functionality). Now we can get rid or workaround like _nl_has_capability(),
check_support_libnl_extended_ifa_flags(), HAVE_LIBNL_INET6_TOKEN.
Another example is a libnl bug when setting vlan ingress map which
isn't even yet fixed in libnl upstream.
- We no longer need libnl-route-3 at all and can drop that runtime
requirement, saving some 400k.
Constructing the messages ourselves also gives better performance
because we don't have to create the intermediate libnl object.
- In the future we will add more link-type support which is easier
to support by basing directly on the plain kernel/netlink API,
instead of requiring also libnl3 to expose this functionality.
E.g. adding macvtap support: we already parsed macvtap properties
ourselves because of missing libnl support. To *add* macvtap
support, we also would have to do it ourself (or extend libnl).
Having a static string buffer for convenience is useful not only
for platform. Define the string buffer in NetworkManagerUtils.h,
so that all to-string functions can reuse *one* buffer.
Of course, this has the potential danger, that different
to-string method might reuse the same buffer. Hence, low-level
library functions are adviced to use their own buffer, because
an upper level might already use the global buffer for another
string.
The peer-address (IFA_ADDRESS) can also be all-zero (0.0.0.0).
That is distinct from an usual address without explicit peer-address,
which implicitly has the same peer and local address.
Previously, we treated an all-zero peer_address as having peer and
local address equal. This is especially grave, because the peer is part
of the primary key for an IPv4 address. So we not only get a property of
the address wrong, but we wrongly consider two different addresses as
one and the same.
To properly handle these addresses, we always must explicitly set the peer.
We already have nm_platform_tun_get_properties(). Rename the function
as they both sidestep the platform cache to lookup some link-specific
properties.
For recent kernels, the peer-ifindex of veths is reported as
parent (IFA_LINK). Prefer that over the ethtool lookup.
For one, this avoids the extra ethtool call which has the
downside of sidestepping the platform cache. Also, looking
up the peer-ifindex in ethtool does not report whether the
peer lifes in another netns (NM_PLATFORM_LINK_OTHER_NETNS).
Only use ethtool as fallback for older kernels.
Constructing the libnl3 object only to parse the message
is wasteful. It involves several memory allocations, thread
synchronization and parsing fields that we don't care about.
But moreover, older libnl version might not support all the
fields we are interested in, hence we have workarounds like
_nl_link_parse_info_data(). Certain features might not fully
work unless libnl supports it too (although kernel would).
As we already parse the data ourselves sometimes, just go
all they way and don't construct the intermediate libnl object.
This will allow us to drop the _nl_link_parse_info_data() workarounds
in next commits. That is important, because _nl_link_parse_info_data()
sidesteps our platform cache and is not in sync with the cache (not to
mention the extra work to explicitly refetch the data on every lookup).
Also, it gets us 60% on the way to no longer needing 'libnl-route-3.so'
at all and eventually drop requiring the library.
This function only accesses sysctl function to retrieve the tun-properties.
sysctl is already defined in the base class and equally inherited by linux
and fake platform. Move the implementation there.
Arguably, it is more convenient to use the static buffer as
it saves typing.
But having such a low-level function use a static buffer also
limits the way how to use it. As it was, you could not avoid
using the static buffer.
E.g. you cannot do:
char buf[100];
_LOGD ("nmp-object: %s; platform-link: %s",
nmp_object_to_string (nmpobj, buf, sizeof(buf)),
nm_platform_link_to_string (link));
This will fail for non-obvious reasons because both
to-string functions end up using the same static buffer.
Also change the to-string implementations to accept NULL
as valid and return it as "(null)".
https://bugzilla.gnome.org/show_bug.cgi?id=756427
Kernel allows to add the same IPv4 address that only differs by
peer-address (IFL_ADDRESS):
$ ip link add dummy type dummy
$ ip address add 1.1.1.1 peer 1.1.1.3/24 dev dummy
$ ip address add 1.1.1.1 peer 1.1.1.4/24 dev dummy
RTNETLINK answers: File exists
$ ip address add 1.1.1.1 peer 1.1.2.3/24 dev dummy
$ ip address show dev dummy
2: dummy@NONE: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default
link/ether 52:58:a7:1e:e8:93 brd ff:ff:ff:ff:ff:ff
inet 1.1.1.1 peer 1.1.1.3/24 scope global dummy
valid_lft forever preferred_lft forever
inet 1.1.1.1 peer 1.1.2.3/24 scope global dummy
valid_lft forever preferred_lft forever
We must also consider peer-address, otherwise platform will treat
two different addresses as one and the same.
https://bugzilla.gnome.org/show_bug.cgi?id=756356
The peer-address seems less important then the prefix-length.
Also, nm_platform_ip4_address_delete() has the peer-address
argument as last.
Soon ip4_address_get() also receives a peer-address argument,
so get the order right first.
The parent of a link (IFLA_LINK) can be in another network namespace and
thus invisible to NM.
This requires the netlink attribute IFLA_LINK_NETNSID which is supported
by recent versions of kernel and libnl.
In this case, set the parent field to NM_PLATFORM_LINK_OTHER_NETNS
and properly handle this special case.
Rather than randomly including one or more of <glib.h>,
<glib-object.h>, and <gio/gio.h> everywhere (and forgetting to include
"nm-glib-compat.h" most of the time), rename nm-glib-compat.h to
nm-glib.h, include <gio/gio.h> from there, and then change all .c
files in NM to include "nm-glib.h" rather than including the glib
headers directly.
(Public headers files still have to include the real glib headers,
since nm-glib.h isn't installed...)
Also, remove glib includes from header files that are already
including a base object header file (which must itself already include
the glib headers).
When adding an IPv4 address, kernel will also add a device-route.
We don't want that route because it has the wrong metric. Instead,
we add our own route (with a different metric) and remove the
kernel-added one.
This could be avoided if kernel would support an IPv4 address flag
IFA_F_NOPREFIXROUTE like it does for IPv6 (see related bug rh#1221311).
One important thing is, that we want don't want to manage the
device-route on assumed devices. Note that this is correct behavior
if "assumed" means "do-not-touch".
If "assumed" means "seamlessly-takeover", then this is wrong.
Imagine we get a new DHCP address. In this case, we would not manage
the device-route on the assumed device. This cannot be fixed without
splitting unmanaged/assumed with related bug bgo 746440.
This is no regression as we would also not manage device-routes
for assumed devices previously.
We also don't want to remove the device-route if the user added
it externally. Note that here we behave wrongly too, because we
don't record externally added kernel routes in update_ip_config().
This still needs fixing.
Let IPv4 device-routes also be managed by NMRouteManager. NMRouteManager
has a list of all routes and can properly add, remove, and restore
the device route as needed.
One problem is, that the device-route does not get added immediately
with the address. It only appears some time later. This is solved
by NMRouteManager watching platform and if a matchin device-route shows up
within a short time after configuring addresses, remove it.
If the route appears after the short timeout, assume they were added for
other reasons (e.g. by the user) and don't remove them.
https://bugzilla.gnome.org/show_bug.cgi?id=751264https://bugzilla.redhat.com/show_bug.cgi?id=1211287
Change nm_platform_link_get() to return the cached NMPlatformLink
instance. Now what all our implementations (fake and linux) have such a
cache internal object, let's just expose it directly.
Note that the lifetime of the exposed link object is possibly quite
short. A caller must copy the returned value if he intends to preserve
it for later.
Also add nm_platform_link_get_by_ifname() and modify nm_platform_link_get_by_address()
to return the instance.
Certain functions, such as nm_platform_link_get_name(),
nm_platform_link_get_ifindex(), etc. are solely implemented based
on looking at the returned NMPlatformLink object. No longer implement
them as virtual functions but instead implement them in the base class
(nm-platform.c).
This removes code and eliminates the redundancy of the exposed
NMPlatformLink instance and the nm_platform_link_get_*() accessors.
Thereby also fix a bug in NMFakePlatform that tracked the link address
in a separate "address" field, instead of using "link.addr". That was
a case where the redundancy actually led to a bug in fake platform.
Also remove some stub implementations in NMFakePlatform that just
bail out. Instead allow for a missing virtual functions and perform
the "default" action in the accessor.
An example for that is nm_platform_link_get_permanent_address().
For NMPlatform instances we had an error reporting mechanism
which stores the last error reason in a private field. Later we
would check it via nm_platform_get_error().
Remove this. It was not used much, and it is not a great way
to report errors.
One problem is that at the point where the error happens, you don't
know whether anybody cares about an error code. So, you add code to set
the error reason because somebody *might* need it (but in realitiy, almost
no caller cares).
Also, we tested this functionality which is hardly used in non-testing code.
While this was a burden to maintain in the tests, it was likely still buggy
because there were no real use-cases, beside the tests.
Then, sometimes platform functions call each other which might overwrite the
error reason. So, every function must be cautious to preserve/set
the error reason according to it's own meaning. This can involve storing
the error code, calling another function, and restoring it afterwards.
This is harder to get right compared to a "return-error-code" pattern, where
every function manages its error code independently.
It is better to return the error reason whenever due. For that we already
have our common glib patterns
(1) gboolean fcn (...);
(2) gboolean fcn (..., GError **error);
In few cases, we need more details then a #gboolean, but don't want
to bother constructing a #GError. Then we should do instead:
(3) NMPlatformError fcn (...);
Later remove nm_platform_get_error() and signal errors via return
error codes.
Also, fix nm_platform_infiniband_partition_add() and
nm_platform_vlan_add() to check the type of the existing link
and fail with WRONG_TYPE otherwise.
- rename "NONE" to "SUCCESS", what it really is.
- change the to-string result not to contain spaces
and being closer the name of the enum value.
- add new error reasons "UNSPECIFIED" and "BUG".
- remove the code comments around the enum definition.
They add no further description about why this error
happens and only paraphrase the name of the enum.
- reserve negative integers for 'errno'. This is neat
because if we get a system error we can pass on the
underlying errno as cause.
The @udi field is not a static string, so any user of a NMPlatformLink
instance must make sure not to use the field beyond the lifetime of the
NMPlatformLink instance.
As we pass on the platform link instance during platform changed events,
this is hard to ensure for the subscriber of the signal -- because a
call back into platform could invalidate/modify the object.
Just not expose this field as part of the link instance. The few callers
who actually needed it should instead call nm_platform_get_uid(). With
that, the lifetime of the returned 'const char *' pointer is clearly
defined.
Add a construct-only property NM_PLATFORM_REGISTER_SINGLETON to NMPlatform.
When set to TRUE, the constructor will self-register to nm_platform_setup().
The reason for this is that the _LOG() macro in NMLinuxPlatform logs the
self pointer if the instance is not the singleton instance.
During construction, we already have many log lines due to initialization
of the instance. These lines all end up qualified with the self pointer.
By earlier self-registering, printing the pointer value is omitted.
Yes, this patch is really just to prettify logging.
NMPObject is a simple "object" implemenation around NMPlatformObject.
They are ref-counted and have a class-pointer. Several basic functions
like equality, hash, to-string are implemented.
NMPCache is can be used to store the NMPObject. Objects are indexed
via their primary id, but there is also multi-lookup via NMCacheId
and NMMultiIndex.
Part of the implementation is inside "nm-linux-platform.c",
because it depends on utility functions from there.