We have two hooks to modify setup of the platform singleton:
nm_linux_platform_setup() and the virtual setup() function.
On the other hand, nm_platform_setup() limits us by accepting
only a GType, instead of a prepeared platform instance.
Make the nm_platform_setup() method more flexible, so that we can
later drop the setup() hook.
(cherry picked from commit a50d77d952)
setup() can be used to initialize a NMPlatform instance that is
registered as singleton via nm_platform_setup(). It should not
be used to initialize the object.
Prior to c6529a9d74, this change was
not possible because constructed() will call back into nm_platform_*()
functions, without having the singleton instance setup.
(cherry picked from commit dc9b25f161)
Most nm_platform_*() functions operate on the platform
singleton nm_platform_get(). That made sense because the
NMPlatform instance was mainly to hook fake platform for
testing.
While the implicit argument saved some typing, I think explicit is
better. Especially, because NMPlatform could become a more usable
object then just a hook for testing.
With this change, NMPlatform instances can be used individually, not
only as a singleton instance.
Before this change, the constructor of NMLinuxPlatform could not
call any nm_platform_*() functions because the singleton was not
yet initialized. We could only instantiate an incomplete instance,
register it via nm_platform_setup(), and then complete initialization
via singleton->setup().
With this change, we can create and fully initialize NMPlatform instances
before/without setting them up them as singleton.
Also, currently there is no clear distinction between functions
that operate on the NMPlatform instance, and functions that can
be used stand-alone (e.g. nm_platform_ip4_address_to_string()).
The latter can not be mocked for testing. With this change, the
distinction becomes obvious. That is also useful because it becomes
clearer which functions make use of the platform cache and which not.
Inside nm-linux-platform.c, continue the pattern that the
self instance is named @platform. That makes sense because
its type is NMPlatform, and not NMLinuxPlatform what we
would expect from a paramter named @self.
This is a major diff that causes some pain when rebasing. Try
to rebase to the parent commit of this commit as a first step.
Then rebase on top of this commit using merge-strategy "ours".
(cherry picked from commit c6529a9d74)
The function names in linux-platform should get better prefixes
indicating whether they are related to libnl or nm objects.
Add a prefix _nlo_ for functions that operate on libnl objects.
(cherry picked from commit 7c5d361c66)
Reorder some functions in nm-platform, so that we first have independent
libnl wrappers/utils, then NMPlatform type definition, and then the
rest.
(cherry picked from commit 850af91f22)
udev doesn't know about the device yet when NM creates it internally.
NetworkManager[9275]: <info> (team0): carrier is OFF
NetworkManager[9275]: <info> (team0): new Team device (driver: 'team' ifindex: 16)
(NetworkManager:9275): GUdev-CRITICAL **: g_udev_device_get_property: assertion 'G_UDEV_IS_DEVICE (device)' failed
NetworkManager[9275]: <info> (team0): exported as /org/freedesktop/NetworkManager/Devices/5
(cherry picked from commit 8173f0f9e8)
Some out of tree drivers add Ethernet devices that are supposed to be managed
by other their tooling, e.g. VirtualBox or VMWare.
Rather than hardcoding their drivers (at least VirtualBox doesn't even set a
"driver" property in sysfs) or hardcoding a logic that identifies such devices
let's just add a possibility to blacklist them in udev. This makes it possible
for whoever who ships such a driver to ship rules that prevent NetworkManager
from managing the device itself.
Furthermore it makes it possible for the user with special needs leverage the
flexibility of udev rules to override the defaults. In the end the user can
decide to let NetworkManager manage default-unmanaged interfaces such as VEth
or turn on default-unmanaged for devices on a particular bus.
An udev rule for VirtualBox would look like this:
SUBSYSTEM=="net", ENV{INTERFACE}=="vboxnet[0-9]*", ENV{NM_UNMANAGED}="1"
(cherry picked from commit 85ee1f4a9c)
There's no udev running in containers, it only starts if /sys is writable. If a
hardware device is added to the container's namespace NM would not announce it.
This also removes the software link special case -- the software links will now
wait for udev initialization (in case udev is there) as well. There's no reason
to treat them differently anymore. This makes it possible to use udev properties
of the software links.
https://bugzilla.gnome.org/show_bug.cgi?id=740526
(cherry picked from commit 4a05869557)
The identifying properties of a route are (in libnl)
.o_id_attrs = (ROUTE_ATTR_FAMILY | ROUTE_ATTR_TOS |
ROUTE_ATTR_TABLE | ROUTE_ATTR_DST |
ROUTE_ATTR_PRIO),
NM ignores routes other then in table RT_TABLE_MAIN and considers
only the tuple 'family,network/plen,metric' as identifying for a route.
We must also ignore routes with TOS non-zero as we cannot
handle those, i.e. we cannot distinguish between them.
(cherry picked from commit af2c0ef771)
This is a well known issue that we cannot convert some libnl
objects to NMPlatformObject. The to-string function for libnl
objects is only used for debug logging. No need to assert.
(cherry picked from commit 8f080747c6)
==1345== Invalid read of size 1
==1345== at 0x827DC15: vfprintf (vfprintf.c:1642)
==1345== by 0x8345D04: __vasprintf_chk (vasprintf_chk.c:66)
==1345== by 0x7F882DB: vasprintf (stdio2.h:210)
==1345== by 0x7F882DB: g_vasprintf (gprintf.c:316)
==1345== by 0x7F6319C: g_strdup_vprintf (gstrfuncs.c:507)
==1345== by 0x7F63258: g_strdup_printf (gstrfuncs.c:533)
==1345== by 0x472833: nm_platform_link_to_string (nm-platform.c:2337)
==1345== by 0x472A05: log_link (nm-platform.c:2754)
==1345== by 0x9DC5D5F: ffi_call_unix64 (unix64.S:76)
==1345== by 0x9DC57D0: ffi_call (ffi64.c:525)
==1345== by 0x7CBA553: g_cclosure_marshal_generic (gclosure.c:1448)
==1345== by 0x7CB9D34: g_closure_invoke (gclosure.c:768)
==1345== by 0x7CCB34B: signal_emit_unlocked_R (gsignal.c:3483)
==1345== Address 0xa91b5a0 is 0 bytes inside a block of size 5 free'd
==1345== at 0x4C2ACE9: free (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==1345== by 0x68E7D6D: link_free_data (link.c:223)
==1345== by 0x6D47B1F: nl_object_free (object.c:186)
==1345== by 0x46C31C: put_nl_object (nm-linux-platform.c:222)
==1345== by 0x46C31C: link_change (nm-linux-platform.c:2354)
==1345== by 0x46C87F: link_set_user_ipv6ll_enabled (nm-linux-platform.c:2583)
==1345== by 0x4476C4: set_nm_ipv6ll (nm-device.c:4418)
==1345== by 0x4476C4: ip6_managed_setup (nm-device.c:7515)
==1345== by 0x453F12: _set_state_full (nm-device.c:7665)
==1345== by 0x4B6609: add_device (nm-manager.c:1885)
==1345== by 0x4B6880: system_create_virtual_device (nm-manager.c:1126)
==1345== by 0x4B6B40: system_create_virtual_devices (nm-manager.c:1163)
==1345== by 0x4B6E00: platform_link_added (nm-manager.c:2213)
==1345== by 0x4B6E00: platform_link_cb (nm-manager.c:2228)
==1345== by 0x9DC5D5F: ffi_call_unix64 (unix64.S:76)
(cherry picked from commit f93f0e0b15)
Testing WWAN connections through a Nokia Series 40 phone, addresses of family
AF_PHONET end up triggering an assert() in object_has_ifindex(), just because
object_type_from_nl_object() only handles AF_INET and AF_INET6 address.
In order to avoid this kind of problems, we'll try to make sure that the object
caches kept by NM only store known object types.
(fixup by dcbw to use cached passed to cache_remove_unknown())
https://bugzilla.gnome.org/show_bug.cgi?id=742928
Connect: ppp0 <--> /dev/ttyACM0
nm-pppd-plugin-Message: nm-ppp-plugin: (nm_phasechange): status 5 / phase 'establish'
NetworkManager[27434]: <info> (ppp0): new Generic device (driver: 'unknown' ifindex: 12)
NetworkManager[27434]: <info> (ppp0): exported as /org/freedesktop/NetworkManager/Devices/4
[Thread 0x7ffff1ecf700 (LWP 27439) exited]
NetworkManager[27434]: <info> (ttyACM0): device state change: ip-config -> deactivating (reason 'user-requested') [70 110 39]
Terminating on signal 15
nm-pppd-plugin-Message: nm-ppp-plugin: (nm_phasechange): status 10 / phase 'terminate'
**
NetworkManager:ERROR:platform/nm-linux-platform.c:1534:object_has_ifindex: code should not be reached
Program received signal SIGABRT, Aborted.
0x00007ffff4692a97 in raise () from /usr/lib/libc.so.6
(gdb) bt
#0 0x00007ffff4692a97 in raise () from /usr/lib/libc.so.6
#1 0x00007ffff4693e6a in abort () from /usr/lib/libc.so.6
#2 0x00007ffff4c8d7f5 in g_assertion_message () from /usr/lib/libglib-2.0.so.0
#3 0x00007ffff4c8d88a in g_assertion_message_expr () from /usr/lib/libglib-2.0.so.0
#4 0x0000000000472b91 in object_has_ifindex (object=0x8a8320, ifindex=12) at platform/nm-linux-platform.c:1534
#5 0x0000000000472bec in check_cache_items (platform=0x7fe8a0, cache=0x7fda30, ifindex=12) at platform/nm-linux-platform.c:1549
#6 0x0000000000472de3 in announce_object (platform=0x7fe8a0, object=0x8a8c30, change_type=NM_PLATFORM_SIGNAL_REMOVED, reason=NM_PLATFORM_REASON_EXTERNAL) at platform/nm-linux-platform.c:1617
#7 0x0000000000473dd2 in event_notification (msg=0x8a7970, user_data=0x7fe8a0) at platform/nm-linux-platform.c:1992
#8 0x00007ffff5ee14de in nl_recvmsgs_report () from /usr/lib/libnl-3.so.200
#9 0x00007ffff5ee1849 in nl_recvmsgs () from /usr/lib/libnl-3.so.200
#10 0x00000000004794df in event_handler (channel=0x7fc930, io_condition=G_IO_IN, user_data=0x7fe8a0) at platform/nm-linux-platform.c:4152
#11 0x00007ffff4c6791d in g_main_context_dispatch () from /usr/lib/libglib-2.0.so.0
#12 0x00007ffff4c67cf8 in ?? () from /usr/lib/libglib-2.0.so.0
#13 0x00007ffff4c68022 in g_main_loop_run () from /usr/lib/libglib-2.0.so.0
#14 0x00000000004477ee in main (argc=1, argv=0x7fffffffeaa8) at main.c:447
(gdb) fr 4
#4 0x0000000000472b91 in object_has_ifindex (object=0x8a8320, ifindex=12) at platform/nm-linux-platform.c:1534
1534 g_assert_not_reached ();
(cherry picked from commit bf7865e859)
The address might be zero-size, and therefore nl_addr_get_binary_addr()
returns a pointer to a zero-size array. We don't want to read past the
end of that array. Since zero-size addresses really mean an address
of all zeros, just make that happen.
As an additional optimization, if the prefix length is zero, the whole
address is host bits and should be cleared.
==30286== Invalid read of size 4
==30286== at 0x478090: clear_host_address (nm-linux-platform.c:3786)
==30286== by 0x4784D4: route_search_cache (nm-linux-platform.c:3883)
==30286== by 0x4785A1: refresh_route (nm-linux-platform.c:3901)
==30286== by 0x4787B6: ip4_route_delete (nm-linux-platform.c:3978)
==30286== by 0x47F674: nm_platform_ip4_route_delete (nm-platform.c:1980)
==30286== by 0x4B279D: _v4_platform_route_delete_default (nm-default-route-manager.c:1122)
==30286== by 0x4AEF03: _platform_route_sync_flush (nm-default-route-manager.c:320)
==30286== by 0x4B043E: _resync_all (nm-default-route-manager.c:574)
==30286== by 0x4B0CA7: _entry_at_idx_remove (nm-default-route-manager.c:631)
==30286== by 0x4B1A66: _ipx_update_default_route (nm-default-route-manager.c:806)
==30286== by 0x4B1A9C: nm_default_route_manager_ip4_update_default_route (nm-default-route-manager.c:813)
==30286== by 0x45C3BC: _cleanup_generic_post (nm-device.c:7143)
==30286== Address 0xee33514 is 0 bytes after a block of size 20 alloc'd
==30286== at 0x4C2C080: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==30286== by 0x6B2B0B1: nl_addr_alloc (in /usr/lib/libnl-3.so.200.20.0)
==30286== by 0x6B2B0E3: nl_addr_build (in /usr/lib/libnl-3.so.200.20.0)
==30286== by 0x6B2B181: nl_addr_clone (in /usr/lib/libnl-3.so.200.20.0)
==30286== by 0x66DB0D7: ??? (in /usr/lib/libnl-route-3.so.200.20.0)
==30286== by 0x6B33CE6: nl_object_clone (in /usr/lib/libnl-3.so.200.20.0)
==30286== by 0x6B2D303: nl_cache_add (in /usr/lib/libnl-3.so.200.20.0)
==30286== by 0x472E55: refresh_object (nm-linux-platform.c:1735)
==30286== by 0x473137: add_object (nm-linux-platform.c:1795)
==30286== by 0x478373: ip4_route_add (nm-linux-platform.c:3846)
==30286== by 0x47F375: nm_platform_ip4_route_add (nm-platform.c:1939)
==30286== by 0x4AEC06: _platform_route_sync_add (nm-default-route-manager.c:254)
https://bugzilla.gnome.org/show_bug.cgi?id=742937
(cherry picked from commit d2871089a8)
refresh_object() raised a spurious change event for the route we
are about to delete. Suppress that by adding an internal reason flag.
Fixes: 41e6c4fac1
(cherry picked from commit 96c099de09)
Deleting routes with metric 0 might end up deleting other
routes with a different metric.
Workaround this in platform to only delete a route with
metric 0 if such a route can be found prior to deletion.
Don't only look into the cache (which might be out of date).
Instead refetch the route we are about to delete to be sure.
There is still a race that we might end up deleting the wrong
route.
https://bugzilla.gnome.org/show_bug.cgi?id=741871https://bugzilla.redhat.com/show_bug.cgi?id=1172780
(cherry picked from commit 41e6c4fac1)
This match-any behavior ignoring metric is nowhere used. And even if we
would need such a behavior, using 0 is wrong because IPv4 routes can
have a metric of zero.
(cherry picked from commit 2cb3c7e8a0)
Handling a route with metric 0 effectively means
a metric of 1024 (user default). Adjust the add(),
delete() and exist() functions to consider routes
with metric 0 as 1024.
(cherry picked from commit 06e4eee0ce)
platform/nm-linux-platform.c: In function 'setup':
platform/nm-linux-platform.c:4364:2: error: 'object' undeclared (first use in this function)
object = nl_cache_get_first (priv->link_cache);
^
Fixes 2b8060b9b3
This reverts commit efd09845c4.
It turns out that the socket space might not be the only buffer that may get
too full. 128K ought to be enough for it and we should resynchronize with the
kernel now if needed.
Kernel can return ENOBUFS in variety of reasons. If that happens, we know we've
lost events and should pick up kernel state.
Simple reproducer that triggers an ENOBUFS condition no matter how big our
netlink socket buffer is:
ip link add bridge0 type bridge
for i in seq $(0 1023); do ip link add dummy$i type dummy; \
ip link set dummy$i master bridge0; done
ip link del bridge0
We assume that in nm_nl_cache_search() and correctly set that in
get_kernel_object(), but we rtnl_link_alloc_cache() can initialize the cache
with devices of other families.
The consequence is that we don't notify when the bridge changes to IFF_UP as we
fail to match and remove the old downed object from the cache:
nm_device_bring_up(): [0xf506c0] (bridge0): bringing up device.
nm_platform_link_set_up(): link: setting up 'bridge0' (12)
link_change_flags(): link: change 12: flags set 'up' (1)
get_kernel_object(): get_kernel_object for link: bridge0 (12, family 7)
log_link(): signal: link added: 12: bridge0 <UP> mtu 1500 bridge driver 'bridge' udi '/sys/devices/virtual/net/bridge0'
get_kernel_object(): get_kernel_object for link: bridge0 (12, family 7)
log_link(): signal: link changed: 12: bridge0 <UP> mtu 1500 bridge driver 'bridge' udi '/sys/devices/virtual/net/bridge0'
log_link(): signal: link changed: 12: bridge0 <UP> mtu 1500 bridge driver 'bridge' udi '/sys/devices/virtual/net/bridge0'
(bridge0): device not up after timeout!
(bridge0): preparing device
Since f32075d2fc, we remove the kernel
added IPv4 device route, and re-add it with appropriate metric.
This could potentially replace existing, conflicting routes. Be more
careful and only take any action when we don't have a conflicting
route and when we add the address for the first time.
The motivation for this was libreswan which might install a VPN route
for a subnet that we also have configured on an interface. But the route
conflict could happen easily for other reasons, for example if you
configure a conflicting route manually.
Don't replace the device route if we have any indication that
a conflict could arise.
https://bugzilla.gnome.org/show_bug.cgi?id=723178
The upstream kernel added module aliases for nl80211 in commit
fb4e156886ce6e8309e912d8b370d192330d19d3, so querying nl80211
now auto-loads the module. Previously NM was doing this to
determine whether an ethernet-like device was a Wi-Fi device
that supported nl80211, but this leads to the nl80211 loading
on platforms that will never have or use Wi-Fi.
Since every nl80211-capable device will already have
DEVTYPE=wlan set (from /sys/class/net/wlan0/uevent), we can use
that as an indicator that the ethernet-like device is WiFi
instead of asking nl80211.
https://bugzilla.gnome.org/show_bug.cgi?id=740131
config.h should be included from every .c file, and it should be
included before any other include. Fix that.
(As a side effect of how I did this, this also changes us to
consistently use "config.h" rather than <config.h>. To the extent that
it matters [which is not much], quotes are more correct anyway, since
we're talking about a file in our own build tree, not a system
include.)
Add a new enum NMPlatformGetRouteMode. This extends the existing
functions nm_platform_ip4_route_get_all() and nm_platform_ip6_route_get_all()
to return default routes only.
Signed-off-by: Thomas Haller <thaller@redhat.com>
Kernel, netlink an NMPlatformRoute treat route metrics as
uint32. Fix several places to use the exact type.
Signed-off-by: Thomas Haller <thaller@redhat.com>