Commit graph

14832 commits

Author SHA1 Message Date
Beniamino Galvani
8a06d66d0e
l3cfg: fix nm_l3cfg_commit_type_register() to set commit_type and add debug logging
Co-authored-by: Thomas Haller <thaller@redhat.com>
2021-09-28 22:18:13 +02:00
Thomas Haller
f3def63fed
std-aux: add nm_clear_fd() helper 2021-09-28 15:58:29 +02:00
Thomas Haller
81ed762d46
glib-aux/trivial: update code comment for nm_g_idle_add() 2021-09-28 15:58:28 +02:00
Thomas Haller
78c716cfaf
l3cfg: track "never-default" in NML3Cfg 2021-09-28 13:20:28 +02:00
Thomas Haller
01a09d1194
libnm: minor cleanup of NM_TERNARY_FROM_OPTION_BOOL()/NM_TERNARY_TO_OPTION_BOOL() 2021-09-28 13:19:37 +02:00
Thomas Haller
37047aba36
std-aux: add nm_assert_addr_family_or_unspec() and nm_utils_addr_family_other() helpers 2021-09-28 12:56:39 +02:00
Thomas Haller
c9a833c910
l3cfg: drop nm_l3cfg_property_emit_register() API
The idea was that NMIPConfig would register itself with the property (like "address-data")
and then NML3Cfg would emit the property changed notification.

However, we can already achieve that via the regular notification, in particular
by listening to NM_L3_CONFIG_NOTIFY_TYPE_PLATFORM_CHANGE_ON_IDLE notification.

Also, NML3Cfg does not really understand the details when the property should
be emitted. For example, many routes are not exposed via the "route-data" property,
and changes to those should not trigger a notification.

Drop the unused API.
2021-09-27 10:09:50 +02:00
Thomas Haller
b8ab6837df
glib-aux: add nm_g_variant_equal() helper 2021-09-27 10:07:43 +02:00
Thomas Haller
e47dd2ee22
l3cfg: configure dependent routes when creating combined config 2021-09-27 07:55:32 +02:00
Thomas Haller
ed1536c890
l3cfg: fix assertion failure in _l3_hook_add_obj_cb()
With nm_l3cfg_get_combined_l3cd(), we can get the committed or
the combined (next) l3cd. The committed one is easy, it's cached already.

However, the combined needs to be computed first, if there were any
changes. For that we call _l3cfg_update_combined_config(), which then
also calls nm_l3_config_data_merge().

But in non-commit mode, _l3cfg_update_combined_config() doesn't call
_l3_acd_data_add_all(), so in _l3_hook_add_obj_cb() the ACD data may
not be as expected. This could previously hit an assertion.
2021-09-25 10:10:17 +02:00
Thomas Haller
d68fa91199
platform/tests: fix assertion failure in NMTstpAcdDefender
Seems we can get a DOWN event during unit tests. I don't really
understand why, but let's ignore it.

  [...]
  #4  0x000055e365777786 in _l3_acd_nacd_event (fd=<optimized out>, condition=<optimized out>, user_data=0x55e367566270) at src/core/platform/tests/test-common.c:2703
  #5  0x00007f4399c224cf in g_main_dispatch (context=0x55e36755fce0) at ../glib/gmain.c:3337
  #6  g_main_context_dispatch (context=0x55e36755fce0) at ../glib/gmain.c:4055
  #7  0x00007f4399c764f8 in g_main_context_iterate.constprop.0 (context=context@entry=0x55e36755fce0, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>)
      at ../glib/gmain.c:4131
  #8  0x00007f4399c1fc03 in g_main_context_iteration (context=0x55e36755fce0, context@entry=0x0, may_block=may_block@entry=1) at ../glib/gmain.c:4196
  #9  0x000055e365770719 in test_l3_ipv6ll (test_data=<optimized out>) at src/core/tests/test-l3cfg.c:1024
2021-09-25 10:10:17 +02:00
Thomas Haller
a446e490f9
l3cfg: mediate commit_type with nm_l3cfg_commit_on_idle_schedule()
We have nm_l3cfg_commit(), however that is synchronous and triggers an
avalanche of side effects. So it should be avoided if a component is
not aware of the current circumstances in which it gets called (most of them).

The alternative is nm_l3cfg_commit_on_idle_schedule(), but previously
that only supported the auto type.

Two changes:

- add a commit_type parameter to nm_l3cfg_commit_on_idle_schedule().
  This allows explicitly selecting a type for the next commit.
  Previously, if the caller wanted for example to trigger a reapply
  once, they had to register a handle, trigger the commit and unregister
  the handle again. This basically allows specifying an ad-hoc commit
  type that is only used once.

- if an explicit commit type is requested, then still always combine
  it with auto. That means, we always use the "maximum" of what is
  requested and what is registered.
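
The "maximum" rule from the second point could be sketched roughly as follows. The enum, its ordering, and the function name are hypothetical and only assume that commit types can be ordered from weakest to strongest; they are not taken from the NetworkManager sources.

```c
/* Hypothetical commit types, ordered from weakest to strongest.
 * The names and the ordering are assumptions for illustration only. */
typedef enum {
    COMMIT_TYPE_NONE = 0,
    COMMIT_TYPE_ASSUME,
    COMMIT_TYPE_UPDATE,
    COMMIT_TYPE_REAPPLY,
} CommitType;

/* Combine the explicitly requested type with the strongest type that is
 * currently registered (the "auto" type): the stronger of the two wins. */
static CommitType
commit_type_effective(CommitType requested, CommitType registered_max)
{
    return requested > registered_max ? requested : registered_max;
}
```

With this ordering, a one-off request for a reapply wins over a registered update, while a weaker request never downgrades what is already registered.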
2021-09-24 21:48:39 +02:00
Thomas Haller
ddfd1e8ddf
device: minor cleanup in reapply_cb() 2021-09-24 18:03:05 +02:00
Yu Watanabe
7fba0f7cb2
libsystemd-network: disable event sources before unref them
This also (is supposed to) fix an assertion failure in ipv4acd
when receiving an ARP packet in an unexpected state.

See-also: https://github.com/systemd/systemd/issues/20825
See-also: https://github.com/systemd/systemd/pull/20826
eb2f750242

https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/issues/807
2021-09-24 16:52:54 +02:00
Thomas Haller
70a077e4d8
l3cfg: add nm_l3cfg_check_ready() helper 2021-09-23 21:17:43 +02:00
Thomas Haller
145950d0c2
l3cfg/trivial: add code comment about NM_L3_CONFIG_NOTIFY_TYPE_PLATFORM_CHANGE 2021-09-23 11:06:28 +02:00
Thomas Haller
a1b6896b14
trivial: update fixme comment
The proper tag is "l3cfg" not "next". Currently "next" branch and l3cfg rework
is the same, but in the future we might have other "next" branches, while "l3cfg"
is the tag to indicate this effort.
2021-09-23 10:55:12 +02:00
Thomas Haller
e8a63b0ab5
device: use proper define NM_SETTING_IP6_CONFIG_METHOD_SHARED in addrconf6_start() 2021-09-23 10:31:59 +02:00
Beniamino Galvani
57c1982867
libnm: add required-timeout backported symbol from 1.30.8
The nm_setting_ip_config_get_required_timeout() symbol was introduced
in libnm 1.32.4 and then backported to 1.30.8.

Export it also with version @libnm_1_30_8; this allows a program built
against libnm 1.30.8 to keep working with later versions of the
library.
2021-09-23 09:01:10 +02:00
Thomas Haller
642c160e59
core: use nm_clear_g_free() instead of g_clear_pointer() 2021-09-22 17:25:31 +02:00
Thomas Haller
d6ab4d24cf
l3cfg/trivial: rename parameter in nm_l3cfg_remove_config() 2021-09-21 09:12:23 +02:00
Thomas Haller
79d4b67681
l3cfg: add nm_l3_config_data_lookup_address_4() helper 2021-09-21 09:11:54 +02:00
Thomas Haller
b1ad3f1ba5
l3cfg: return any IPv4/IPv6 route from nm_l3_config_data_get_best_default_route() 2021-09-21 08:36:14 +02:00
Thomas Haller
abeedad315
initrd: reword warnings when parsing "rd.ethtool=" option
"Impossible to set rd.ethtool options: invalid format" is not very
clear. Try to explain what is invalid about the format (the interface
name is missing).

"Invalid value for rd.ethtool.autoneg, rd.ethtool.autoneg was not set"
is also confusing. The message gets printed if the autoneg value was
specified on the command line, so "was not set" seems wrong. Maybe the
message meant that the profile value is left at the default (FALSE),
but that isn't very clear.

Reword.
2021-09-21 08:29:47 +02:00
Thomas Haller
2c3f967d1a
initrd: only reset autoneg/speed if specified
The idea of positional arguments is that they might be extended in the
future. That means, there might be an option "rd.ethtool=eth0:::foo".

Also, if multiple "rd.ethtool=eth0" options are specified on the command
line, then the autoneg/speed settings should only be set if present.
That means

  "rd.ethtool=eth0:on:100 rd.ethtool=eth0:::foo"

should work as expected and first set autoneg/speed options, but the
second argument only sets "foo" (without resetting autoneg/speed).
2021-09-21 08:29:47 +02:00
Thomas Haller
b7b275dead
initrd: warn about disabling autoneg without setting speed
To NetworkManager, "autoneg=FALSE && speed=0" means to not configure
these options and to leave whatever was configured previously.
That is also the default.

Explicitly configuring "rd.ethtool=eth0:off:0" is thus likely a misconfiguration,
because it tells NetworkManager to not configure the interface.

Note that the user can still configure that via "rd.ethtool=eth0::", that
is, by omitting all parameters. That is a valid configuration and causes
no warning. The reason to support this silently is so that we can
add more positional arguments in the future that the user can set
without changing autoneg/speed.
2021-09-21 08:29:47 +02:00
Thomas Haller
44e484a6aa
initrd: refactor parsing of "rd.ethtool=" to accept zero positional arguments
The point of positional arguments is that you can omit them, and that
should be treated as the parameter being set to the default.

So, don't treat "rd.ethtool=eth0" (or "rd.ethtool=eth0:") special.
Just continue the parsing and take all following positional arguments
as unset.
2021-09-21 08:29:46 +02:00
Thomas Haller
0e384b0170
initrd: refactor parsing of "rd.ethtool=" to not return after autoneg
Don't return early from parsing "autoneg" if there are no additional
arguments.

The behavior should be exactly the same, whether a positional
argument is missing, empty, or set to the default.

That is,

  - "rd.ethtool=eth0:on"
  - "rd.ethtool=eth0:on:"
  - "rd.ethtool=eth0:on::"
  - "rd.ethtool=eth0:on:0:"

should all evaluate to the same thing.

That was already the case in practice, but that was hard to see.
So don't treat missing positional arguments special and don't return
early. Parse all parameters regardless.

The change is visible when parsing "rd.ethtool=eth0:off:100 rd.ethtool=eth0:on".
Autoneg and speed really belong together, so when we parse the second
argument, we should reset the speed too -- even if it's not present.
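
A minimal sketch of this parsing style, assuming the value after "rd.ethtool=" is a ':'-separated list of interface name, autoneg ("on"/"off") and speed. The struct, the function names and the error handling are hypothetical simplifications and not the code from nm-initrd-generator (which warns about invalid values instead of failing).

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    char     iface[32];
    bool     autoneg; /* default: false */
    unsigned speed;   /* default: 0 */
} EthtoolOpts;

/* Return the next ':'-separated field of *p (possibly empty) and advance *p.
 * Once the input is exhausted, *p is NULL and every further field is "". */
static const char *
next_field(char **p)
{
    char *field = *p;
    char *sep;

    if (!field)
        return "";
    sep = strchr(field, ':');
    if (sep) {
        *sep = '\0';
        *p = sep + 1;
    } else {
        *p = NULL;
    }
    return field;
}

/* Parse a value like "eth0:on:100". Every positional field is always
 * looked at; a missing field and an empty field both yield the default. */
static bool
ethtool_parse(const char *value, EthtoolOpts *out)
{
    char        buf[128];
    char       *p = buf;
    const char *s;

    if (strlen(value) >= sizeof(buf))
        return false;
    strcpy(buf, value);
    memset(out, 0, sizeof(*out));

    s = next_field(&p);
    if (!s[0] || strlen(s) >= sizeof(out->iface))
        return false; /* the interface name is mandatory */
    strcpy(out->iface, s);

    s = next_field(&p);
    if (strcmp(s, "on") == 0)
        out->autoneg = true;
    else if (s[0] && strcmp(s, "off") != 0)
        return false; /* simplification: reject unknown autoneg values */

    s = next_field(&p);
    if (s[0])
        out->speed = (unsigned) strtoul(s, NULL, 10);

    return true;
}
```

Because there is no early return after "autoneg", the four spellings listed above all end up with the same autoneg and speed values.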
2021-09-21 08:29:46 +02:00
Thomas Haller
44b8c71ed5
initrd/tests: add more tests for "rd.ethtool" (test_rd_ethtool()) 2021-09-21 08:29:46 +02:00
Thomas Haller
f88a608050
initrd/tests: refactor tests for "rd.ethtool" (test_rd_ethtool())
It tests the same input as before (except, dropping the duplicate test
for "rd.ethtool=eth0:on:100:bogus").
2021-09-21 08:29:39 +02:00
Thomas Haller
a8866095dd
core/ndisc: move nm_lndp_ndisc_get_sysctl() to "nm-ndisc.[ch]"
NMNDisc has two implementations: lndp and fake. Fake only exists as a
stub for unit tests, otherwise there is no purpose to it. Also, we won't
ever add another implementation beside lndp. If lndp is not suitable, it
would be replaced, but not accompanied by a second implementation.

As such, nm_lndp_ndisc_get_sysctl() has no purpose to be in
"nm-lndp-ndisc.c". This split does not exist to abstract "nm-ndisc.c"
from NMPlatform. It exists to make it easier to test.
2021-09-20 13:59:10 +02:00
Beniamino Galvani
d50c4eba9e
core: disable tc cache by default
We no longer use tc objects from the platform cache; disable caching
by default.

The only exception where the cache is needed is in tc tests, as we
look into the platform there to check that objects look as expected.
2021-09-20 13:27:16 +02:00
Beniamino Galvani
864e4e6369
platform: allow disabling caching of tc objects
Introduce a construct-only property for platform objects to enable or
disable the caching of tc objects. When disabled, the netlink socket
doesn't receive netlink events for tc objects, and objects are never
added to the cache. This commit doesn't change behavior yet.
2021-09-20 13:27:16 +02:00
Beniamino Galvani
c896973deb
platform: drop test-tc-fake
It doesn't seem useful to have a copy of the test which does nothing.
2021-09-20 13:27:15 +02:00
Beniamino Galvani
e0691f9528
device: ensure tc_commit() is called only once per activation
Stage2 can be called multiple times. Ensure that tc_commit() is only
called the first time. This is important now that tc synchronization
requires clearing all qdiscs and recreating them.
2021-09-20 13:27:15 +02:00
Beniamino Galvani
3981bff2a0
core: rework tc sync functions
Update nm_platform_qdisc_sync() and nm_platform_tfilter_sync() to
avoid looking into the platform cache, so that we are no longer required
to keep tc and qdiscs in the cache.

There is no API in kernel to retrieve tc objects only for a specific
interface, so NM had to receive all tc events, even for unmanaged
interfaces.  This could cause high CPU usage in some scenarios with
many objects.

Instead, try to delete root qdiscs and filters and then add the known
ones.

Also, combine the two functions together since they are related. In
particular, removing all qdiscs also removes all attached filters.
2021-09-20 13:27:15 +02:00
Beniamino Galvani
d9b2e9d7ea
platform: add methods to delete tc qdiscs and tfilters
Introduce two platform methods to delete tc qdiscs and filters by
ifindex and parent.
2021-09-20 13:27:15 +02:00
Beniamino Galvani
8003ca68f7
platform: preserve IPv6 multicast route added by kernel
Kernels < 5.11 add a route like:

  unicast ff00::/8 dev $IFACE proto boot scope global metric 256 pref medium

to allow sending and receiving IPv6 multicast traffic. Ensure it's not
removed when we do a route sync in mode ALL.

In kernel 5.11 there were commits:

  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ceed9038b2783d14e0422bdc6fd04f70580efb4c
  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a826b04303a40d52439aa141035fca5654ccaccd

After those the route looks like

  multicast ff00::/8 dev $IFACE proto kernel metric 256 pref medium

As NM ignores routes with rtm_type multicast, the code in this commit
is not needed on newer kernels.

https://bugzilla.redhat.com/show_bug.cgi?id=2004212
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/984
2021-09-20 10:27:38 +02:00
Thomas Haller
0d4840c484
l3cfg: fix assertion in NM_IS_L3_CONFIG_DATA() to allow NULL 2021-09-16 20:54:49 +02:00
Thomas Haller
882168c728
l3cfg: accept %NULL argument to nm_l3_config_data_seal()
There is always a trade-off between the convenience of allowing %NULL (and
doing nothing) and strictly requiring the caller to check that the argument
is not %NULL. In this case, it's more convenient to accept NULL than to
require the callers to check for it.
2021-09-16 20:54:49 +02:00
Thomas Haller
fe80b2d1ec
cloud-setup: use suppress_prefixlength rule to honor non-default-routes in the main table
Background
==========

Imagine you run a container on your machine. Then the routing table
might look like:

    default via 10.0.10.1 dev eth0 proto dhcp metric 100
    10.0.10.0/28 dev eth0 proto kernel scope link src 10.0.10.5 metric 100
    [...]
    10.42.0.0/24 via 10.42.0.0 dev flannel.1 onlink
    10.42.1.2 dev cali02ad7e68ce1 scope link
    10.42.1.3 dev cali8fcecf5aaff scope link
    10.42.2.0/24 via 10.42.2.0 dev flannel.1 onlink
    10.42.3.0/24 via 10.42.3.0 dev flannel.1 onlink

That is, there are other interfaces with subnets and specific routes.

If nm-cloud-setup now configures rules:

    0:  from all lookup local
    30400:  from 10.0.10.5 lookup 30400
    32766:  from all lookup main
    32767:  from all lookup default

and

    default via 10.0.10.1 dev eth0 table 30400 proto static metric 10
    10.0.10.1 dev eth0 table 30400 proto static scope link metric 10

then these other subnets will also be reached via the default route.

This container example is just one case where this is a problem. In
general, if you have specific routes on another interface, then the
default route in the 30400+ table will interfere badly.

The idea of nm-cloud-setup is to automatically configure the network for
secondary IP addresses. When the user has special requirements, then
they should disable nm-cloud-setup and configure whatever they want.
But the container use case is popular and important. It is not something
where the user actively configures the network. This case needs to work better,
out of the box. In general, nm-cloud-setup should work better with the
existing network configuration.

Change
======

Add new routing tables 30200+ with the individual subnets of the
interface:

    10.0.10.0/24 dev eth0 table 30200 proto static metric 10
    [...]
    default via 10.0.10.1 dev eth0 table 30400 proto static metric 10
    10.0.10.1 dev eth0 table 30400 proto static scope link metric 10

Also add more important routing rules with priority 30200+, which select
these tables based on the source address:

    30200:  from 10.0.10.5 lookup 30200

These will do source based routing for the subnets on these
interfaces.

Then, add a rule with priority 30350

    30350:  lookup main suppress_prefixlength 0

which processes the routes from the main table, but ignores the default
routes. 30350 was chosen, because it's in between the rules 30200+ and
30400+, leaving a range for the user to configure their own rules.

Then, as before, the rules 30400+ again look at the corresponding 30400+
table, to find a default route.

Finally, process the main table again, this time honoring the default
route. That is for packets that have a different source address.

This change means that the source based routing is used for the
subnets that are configured on the interface and for the default route.
Whereas, if there are any more specific routes in the main table, they will
be preferred over the default route.
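
The effect of "suppress_prefixlength 0" can be illustrated with a toy lookup. This is only a sketch of the semantics under simplifying assumptions (IPv4 only, one table, a linear scan); the types, names and the sample table are hypothetical and modeled on the routes from the example above.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t    dest; /* network address, host byte order */
    int         plen; /* prefix length, 0 for a default route */
    const char *dev;
} Route;

#define IP4(a, b, c, d) \
    (((uint32_t) (a) << 24) | ((uint32_t) (b) << 16) | ((uint32_t) (c) << 8) | (uint32_t) (d))

/* A simplified "main" table, loosely following the example above. */
static const Route main_table[] = {
    {0, 0, "eth0"},                    /* default via 10.0.10.1 */
    {IP4(10, 0, 10, 0), 28, "eth0"},
    {IP4(10, 42, 0, 0), 24, "flannel.1"},
};

static int
route_matches(const Route *r, uint32_t addr)
{
    uint32_t mask = r->plen == 0 ? 0 : ~(uint32_t) 0 << (32 - r->plen);

    return (addr & mask) == (r->dest & mask);
}

/* Longest-prefix match in one table. If the best route has a prefix length
 * of at most suppress_plen, the result is discarded (NULL), so that the
 * caller moves on to the next rule. Pass -1 to suppress nothing. */
static const Route *
table_lookup(const Route *table, size_t n, uint32_t addr, int suppress_plen)
{
    const Route *best = NULL;
    size_t       i;

    for (i = 0; i < n; i++) {
        if (route_matches(&table[i], addr) && (!best || table[i].plen > best->plen))
            best = &table[i];
    }
    if (best && best->plen <= suppress_plen)
        return NULL;
    return best;
}
```

With a suppress value of 0, a destination such as 10.42.0.7 still resolves via the specific flannel.1 route, whereas a destination that only matches the default route yields no result, so processing falls through to the 30400+ rules.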

Apparently Amazon Linux solves this differently, by not configuring a
routing table for addresses on interface "eth0". That might be an
alternative, but it's not clear to me what is special about eth0 to
warrant this treatment. It also would imply that we somehow recognize
this primary interface. In practice that would be doable by selecting
the interface with "iface_idx" zero.

Instead choose this approach. This is remotely similar to what WireGuard does
for configuring the default route ([1]), however WireGuard uses fwmark to match
the packets instead of the source address.

[1] https://www.wireguard.com/netns/#improved-rule-based-routing
2021-09-16 17:30:25 +02:00
Thomas Haller
0978be5e43
cloud-setup: cleanup configuring addresses/routes/rules in _nmc_mangle_connection() 2021-09-16 15:51:03 +02:00
Thomas Haller
b68d694b78
cloud-setup: limit number of supported interfaces to avoid overlapping table numbers
The table number is chosen as 30400 + iface_idx. That is, the range is
limited and we shouldn't handle more than 100 devices. Add a check for
that and error out.
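
As a rough sketch of such a check: the base of 30400 and the limit of 100 come from the description above, while the macro and function names are made up for illustration.

```c
#include <stdbool.h>

#define TABLE_BASE 30400 /* first table number, as described above */
#define MAX_IFACES 100   /* supported range of iface_idx: 0..99 */

/* Compute the routing table number for an interface index. Returns false
 * if the index is outside the supported range, so the caller can error out
 * instead of using a table number that overlaps with another range. */
static bool
table_for_iface(int iface_idx, int *out_table)
{
    if (iface_idx < 0 || iface_idx >= MAX_IFACES)
        return false;
    *out_table = TABLE_BASE + iface_idx;
    return true;
}
```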
2021-09-16 15:51:03 +02:00
Thomas Haller
a95ea0eb29
cloud-setup: process iface-datas in sorted order
The routes/rules that are configured are independent of the
order in which we process the devices. That is, because they
use the "iface_idx" for cases where there is ambiguity.

Still, it feels nicer to always process them in a defined order.
2021-09-16 15:51:02 +02:00
Thomas Haller
1c5cb9d3c2
cloud-setup: track sorted list of NMCSProviderGetConfigIfaceData
Sorted by iface_idx. The iface_idx is probably something useful and
stable, provided by the provider. E.g. it's the order in which
interfaces are exposed on the meta data.
2021-09-16 15:51:02 +02:00
Thomas Haller
ec56fe60fb
cloud-setup: add "hwaddr" to NMCSProviderGetConfigIfaceData struct
get-config() gives a NMCSProviderGetConfigResult structure, and the
main part of data is the GHashTable of MAC addresses and
NMCSProviderGetConfigIfaceData instances.

Let NMCSProviderGetConfigIfaceData also have a reference to the MAC
address. This way, I'll be able to create a (sorted) list of interface
datas, that also contain the MAC address.
2021-09-16 15:51:02 +02:00
Thomas Haller
5f047968d7
cloud-setup: skip configuring policy routing if there is only one interface/address
nm-cloud-setup automatically configures the network. That may conflict
with what the user wants. In case the user configures some specific
setup, they are encouraged to disable nm-cloud-setup (and its
automatism).

Still, what we do by default matters, and should play well with the
user's expectations. Configuring policy routing and a higher priority
table (30400+) that hijacks the traffic can cause problems.

If the system only has one IPv4 address and one interface, then there
is no point in configuring policy routing at all. Detect that, and skip
the change in that case.

Note that of course we need to handle the case where previously multiple
IP addresses were configured and an update gives only one address. In
that case we need to clear the previously configured rules/routes. The
patch achieves this.
2021-09-16 15:51:02 +02:00
Thomas Haller
7969ae1a82
cloud-setup: count numbers of valid IPv4 addresses in get-config result
Will be used next.
2021-09-16 15:51:02 +02:00
Thomas Haller
a3cd66d3fa
cloud-setup: cache number of valid interfaces in get-config result
Now that we return a struct from get_config(), we can have system-wide
properties returned.

Let it count and cache the number of valid iface_datas.

Currently that is not yet used, but it will be.
2021-09-16 15:51:02 +02:00
Thomas Haller
323e182768
cloud-setup: return structure for get_config() result instead of generic hash table
Returning a struct seems easier to understand, because then the result
is typed.

Also, we might return additional results, which are system wide and not
per-interface.
2021-09-16 15:51:02 +02:00