Don't progress to the IP ready state until all objects are committed
to platform. Note that l3cfg has a 20 seconds timeout after which
unavailable objects are considered "definitely unavailable" and are
removed from the list.
Fixes-test: @ipv6_routes_with_src
https://bugzilla.redhat.com/show_bug.cgi?id=2043133
(cherry picked from commit f15b3f15a7)
l3cfg has a "temp_not_available" list of objects that couldn't be
added to platform, but can be added once some preconditions become
true (for example, a IPv6 route with a "src" attribute requires a
non-tentative src address to be present).
Retry to commit those objects once all addresses have completed
ACD/DAD.
(cherry picked from commit 9a090fdf7b)
nm_l3_config_data_get_nameservers() returns a pointer to "struct in6_addr". Not
a pointer to pointers.
#0 __memmove_avx_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:389
#1 0x00007f8060dd9109 in memcpy (__len=<optimized out>, __src=0xfd, __dest=<optimized out>) at /usr/include/bits/string_fortified.h:29
#2 g_array_append_vals (len=1, data=0xfd, farray=0x55dd69332130) at ../glib/garray.c:522
#3 g_array_append_vals (farray=0x55dd69332130, data=0xfd, len=1) at ../glib/garray.c:509
#4 0x000055dd68d2a27d in _garray_inaddr_add (p_arr=<optimized out>, addr_family=<optimized out>, addr=0xfd) at src/core/nm-l3-config-data.c:295
#5 0x000055dd68ef6510 in nm_l3_config_data_add_nameserver (nameserver=<optimized out>, addr_family=10, self=0x55dd6949f900) at src/core/nm-l3-config-data.c:1442
#6 nm_device_copy_ip6_dns_config (self=0x55dd693c4420, from_device=<optimized out>) at src/core/devices/nm-device.c:10468
#7 0x00007f8060f28aba in _g_closure_invoke_va (param_types=0x0, n_params=<optimized out>, args=0x7fffed43d610, instance=0x55dd693c4420, return_value=0x0, closure=0x55dd693cdb10)
at ../gobject/gclosure.c:893
#8 g_signal_emit_valist (instance=0x55dd693c4420, signal_id=<optimized out>, detail=0, var_args=var_args@entry=0x7fffed43d610) at ../gobject/gsignal.c:3406
#9 0x00007f8060f28c03 in g_signal_emit (instance=<optimized out>, signal_id=<optimized out>, detail=<optimized out>) at ../gobject/gsignal.c:3553
#10 0x000055dd68efd1fb in _dev_ipac6_start (self=0x55dd693c4420) at src/core/devices/nm-device.c:11348
#11 0x000055dd68efd698 in _dev_ipac6_start_continue (self=0x55dd693c4420) at src/core/devices/nm-device.c:11373
#12 _dev_ipll6_set_llstate (self=0x55dd693c4420, llstate=<optimized out>, lladdr=<optimized out>) at src/core/devices/nm-device.c:10576
#13 0x000055dd68e7915e in _emit_changed_on_idle_cb (user_data=user_data@entry=0x55dd6941ca50) at src/core/nm-l3-ipv6ll.c:221
#14 0x00007f8060e0639b in g_idle_dispatch (source=0x55dd693eea30, callback=0x55dd68e78fd0 <_emit_changed_on_idle_cb>, user_data=0x55dd6941ca50) at ../glib/gmain.c:5897
#15 0x00007f8060e0a05f in g_main_dispatch (context=0x55dd6922c800) at ../glib/gmain.c:3381
#16 g_main_context_dispatch (context=0x55dd6922c800) at ../glib/gmain.c:4099
#17 0x00007f8060e5f2a8 in g_main_context_iterate.constprop.0 (context=0x55dd6922c800, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>) at ../glib/gmain.c:4175
#18 0x00007f8060e09773 in g_main_loop_run (loop=0x55dd69211010) at ../glib/gmain.c:4373
#19 0x000055dd68d09c7b in main (argc=<optimized out>, argv=<optimized out>) at src/core/main.c:509
Fixes: 58287cbcc0 ('core: rework IP configuration in NetworkManager using layer 3 configuration')
(cherry picked from commit a2c8a3228b)
Add support for IPv6 multipath routes, by treating them as single-hop
routes. Otherwise, we can easily end up with an inconsistent platform
cache.
Background:
-----------
Routes are hard. We have NMPlatform which is a cache of netlink objects.
That means, we have a hash table and we cache objects based on some
identity (nmp_object_id_equal()). So those objects must have some immutable,
indistinguishable properties that determine whether an object is the
same or a different one.
For routes and routing rules, this identifying property is basically a subset
of the attributes (but not all!). That makes it very hard, because tomorrow
kernel could add an attribute that becomes part of the identity, and NetworkManager
wouldn't recognize it, resulting in cache inconsistency by wrongly
thinking two different routes are one and the same. Anyway.
The other point is that we rely on netlink events to maintain the cache.
So when we receive a RTM_NEWROUTE we add the object to the cache, and
delete it upon RTM_DELROUTE. When you do `ip route replace`, kernel
might replace a (different!) route, but only send one RTM_NEWROUTE message.
We handle that by somehow finding the route that was replaced/deleted. It's
ugly. Did I say, that routes are hard?
Also, for IPv4 routes, multipath attributes are just a part of the
routes identity. That is, you add two different routes that only differ
by their multipath list, and then kernel does as you would expect.
NetworkManager does not support IPv4 multihop routes and just ignores
them.
Also, a multipath route can have next hops on different interfaces,
which goes against our current assumption, that an NMPlatformIP4Route
has an interface (or no interface, in case of blackhole routes). That
makes it hard to meaningfully support IPv4 routes. But we probably don't
have to, because we can just pretend that such routes don't exist and
our cache stays consistent (at least, until somebody calls `ip route
replace` *sigh*).
Not so for IPv6. When you add (`ip route append`) an IPv6 route that is
identical to an existing route -- except their multipath attribute -- then it
behaves as if the existing route was modified and the result is the
merged route with more next-hops. Note that in this case kernel will
only send a RTM_NEWROUTE message with the full multipath list. If we
would treat the multipath list as part of the route's identity, this
would be as if kernel deleted one routes and created a different one (the
merged one), but only sending one notification. That's a bit similar to
what happens during `ip route replace`, but it would be nightmare to
find out which route was thereby replaced.
Likewise, when you delete a route, then kernel will "subtract" the
next-hop and sent a RTM_DELROUTE notification only about the next-hop that
was deleted. To handle that, you would have to find the full multihop
route, and replace it with the remainder after the subtraction.
NetworkManager so far ignored IPv6 routes with more than one next-hop, this
means you can start with one single-hop route (that NetworkManger sees
and has in the platform cache). Then you create a similar route (only
differing by the next-hop). Kernel will merge the routes, but not notify
NetworkManager that the single-hop route is not longer a single-hop
route. This can easily cause a cache inconsistency and subtle bugs. For
IPv6 we MUST handle multihop routes.
Kernels behavior makes little sense, if you expect that routes have an
immutable identity and want to get notifications about addition/removal.
We can however make sense by it by pretending that all IPv6 routes are
single-hop! With only the twist that a single RTM_NEWROUTE notification
might notify about multiple routes at the same time. This is what the
patch does.
The Patch
---------
Now one RTM_NEWROUTE message can contain multiple IPv6 routes
(NMPObject). That would mean that nmp_object_new_from_nl() needs to
return a list of objects. But it's not implemented that way. Instead,
we still call nmp_object_new_from_nl(), and the parsing code can
indicate that there is something more, indicating the caller to call
nmp_object_new_from_nl() again in a loop to fetch more objects.
In practice, I think all RTM_DELROUTE messages for IPv6 routes are
single-hop. Still, we implement it to handle also multi-hop messages the
same way.
Note that we just parse the netlink message again from scratch. The alternative
would be to parse the first object once, and then clone the object and
only update the next-hop. That would be more efficient, but probably
harder to understand/implement.
https://bugzilla.redhat.com/show_bug.cgi?id=1837254#c20
(cherry picked from commit dac12a8d61)
To parse the RTA_MULTIHOP message, "policy" is not right (which is used
to parse the overall message). Instead, we don't really have a special
policy that we should use.
This was not a severe issue, because the allocated buffer (with
G_N_ELEMENTS(policy) elements) was larger than need be. And apparently,
using the wrong policy also didn't cause us to reject important
messages.
(cherry picked from commit 997d72932d)
IPv4:
routes
A list of IPv4 destination addresses, prefix length, optional IPv4
next hop addresses, optional route metric, optional attribute. The
valid syntax is: "ip[/prefix] [next-hop] [metric]
[attribute=val]...[,ip[/prefix]...]". For example "192.0.2.0/24
10.1.1.1 77, 198.51.100.0/24".
Various attributes are supported:
• "cwnd" - an unsigned 32 bit integer.
• "initcwnd" - an unsigned 32 bit integer.
• "initrwnd" - an unsigned 32 bit integer.
• "lock-cwnd" - a boolean value.
• "lock-initcwnd" - a boolean value.
• "lock-initrwnd" - a boolean value.
• "lock-mtu" - a boolean value.
• "lock-window" - a boolean value.
• "mtu" - an unsigned 32 bit integer.
• "onlink" - a boolean value.
• "scope" - an unsigned 8 bit integer. IPv4 only.
• "src" - an IPv4 address.
• "table" - an unsigned 32 bit integer. The default depends on
ipv4.route-table.
• "tos" - an unsigned 8 bit integer. IPv4 only.
• "type" - one of unicast, local, blackhole, unavailable,
prohibit. The default is unicast.
• "window" - an unsigned 32 bit integer.
For details see also `man ip-route`.
Format: a comma separated list of routes
IPv6:
routes
A list of IPv6 destination addresses, prefix length, optional IPv6
next hop addresses, optional route metric, optional attribute. The
valid syntax is: "ip[/prefix] [next-hop] [metric]
[attribute=val]...[,ip[/prefix]...]".
Various attributes are supported:
• "cwnd" - an unsigned 32 bit integer.
• "from" - an IPv6 address with optional prefix. IPv6 only.
• "initcwnd" - an unsigned 32 bit integer.
• "initrwnd" - an unsigned 32 bit integer.
• "lock-cwnd" - a boolean value.
• "lock-initcwnd" - a boolean value.
• "lock-initrwnd" - a boolean value.
• "lock-mtu" - a boolean value.
• "lock-window" - a boolean value.
• "mtu" - an unsigned 32 bit integer.
• "onlink" - a boolean value.
• "src" - an IPv6 address.
• "table" - an unsigned 32 bit integer. The default depends on
ipv6.route-table.
• "type" - one of unicast, local, blackhole, unavailable,
prohibit. The default is unicast.
• "window" - an unsigned 32 bit integer.
For details see also `man ip-route`.
Format: a comma separated list of routes
(cherry picked from commit 7b1e9a5c3d)
I don't understand the code, but it mangles the XML.
There is no difference in the markup we have so far. But if you
have nested XML (like for description-docbook tag) there are cases
where this is wrong.
There is also no need to prettify anything. If you want pretty-formatted
XML, do it yourself, for example with
$ tidy --indent yes --indent-spaces 4 --indent-attributes yes --wrap-attributes yes --input-xml yes --output-xml yes src/libnm-client-impl/nm-property-infos-nmcli.xml
I think this was initially done, because we had the tool in perl, and
when migrating, we wanted to generate the exactly same output. And it
was the same output, and it was fine for the input we have. But with
different input, it's wrong. Drop it now.
(cherry picked from commit 35599b4349)
Certain route types (blackhole, unreachable, prohibit) are not tied to
an interface. They are thus global and we need to track them system wide
(or better: per network namespace). That is done by NMPRouteManager.
For the routing rules, it's NMDevice itself to track/untrack the rules.
That is done for historical reasons, at the time, NML3Cfg did not exit.
Now with NML3Cfg, it seems that also NML3Cfg should be the part that
handles nodev routes. One reason is that we want to move IP
functionality out of NMDevice. So callers (NMDevice) would just add
blackhole routes to the NML3ConfigData and let NML3Cfg handle them.
Still, to handle these routes is rather different from regular routes.
Normally, NML3Cfg tracks an object state (ObjStateData) for each address/route,
and it hooks into platform signals to update the os_plobj field. Those signals
are dispatched by NMNetns and are only per-ifindex. Hence, NML3Cfg
wouldn't be notified about those nodev routes. Consequently, there
os_plobj could not be (efficiently) maintained and there is no
ObjStateData for such routes.
Instead, all that NML3Cfg does is have the routes in the NML3ConfigData and
tell NMPRouteManager about them. Seems simple enough. The only question
is when should NMPRouteManager sync? For now, we sync when the
track/untracking brings any changes and during reapply. Which is
probably fine.
(cherry picked from commit 9ab53e561a)
Specifically, in nm_utils_ip_route_attribute_to_platform() and in
_l3_config_data_add_obj() handle such new route type. For the moment,
they cannot be stored in a valid NMSettingIPConfig, but later this will
be necessary.
(cherry picked from commit 6255e0dcac)
This will be required next, when we will have also routes without a
device. Split the generation of the route list out.
(cherry picked from commit e32bc6d248)
The general idea is that when we have entries tracked by the
route-manager, that we can mark them all as dirty. Then, calling the
"track" function will reset the dirty flag. Finally, there is a method
to delete all dirty entries.
As we can lookup an entry with O(1) (using dictionaries), we can
sync the list of tracked objects with O(n). We just need to track
all the ones we care about, and then delete those that were not touched
(that is, are still dirty).
Previously, we had to explicitly mark all entries as dirty. We can do
better. Just let nmp_route_manager_untrack_all() mark the survivors as
dirty right away. This way, we can save iterating the list once.
It also makes sense because the only purpose of the dirty flag is to
aid this prune mechanism with track/untrack-all. So, untrack-all can
just help out, and leave the remaining entries dirty, so that the next
track does the right thing.
(cherry picked from commit 9e90bb0817)
We now track up to three kinds of object types in NMPRouteManager.
There is only one place, where we need to iterate over all objects of
the same type (e.g. all ipv4-routes), and that is nmp_route_manager_sync().
Previously, we only had one GHashTable with all the object, and when
iterating we had to skip over them after checking the type. That has some
overhead, but OK.
The ugliness with iterating over a GHashTable is that the order is non
deterministic. We should have a defined order in which things happen. To
achieve that, track three different CList, one for each object type.
Also, I expect that to be slightly faster, as you only have to iterate
over the list you care about.
(cherry picked from commit f315ca9e84)
NM_HASH_OBFUSCATE_PTR() is some snake-oil to not log raw pointer values.
It obviously makes debugging harder.
But we don't need to generate differently obfuscated pointer values.
At least, let most users use the same obfuscation, so that the values
are comparable.
(cherry picked from commit 3e6c8d220a)
Let's just always allocate the hash tables. We will likely need them,
and three hash tables are relatively cheap.
(cherry picked from commit 5b3e96451b)
Routes of type blackhole, unreachable, prohibit don't have an
ifindex/device. They are thus in many ways similar to routing rules,
as they are global. We need a mediator to keep track which routes
to configure.
This will be very similar to what NMPRulesManager already does for
routing rules. Rename the API, so that it also can be used for routes.
Renaming the file will be done next, so that git's rename detection
doesn't get too confused.
(cherry picked from commit ea4f6d7994)
So far, certain NMObject types could not have an ifindex of zero. Hence,
nmp_lookup_init_object() took such an ifindex to mean lookup all objects
of that type.
Soon, we will support blackhole/unreachable/prohibit route types, which
have their ifindex set to zero. It is still useful to lookup those routes
types via nmp_lookup_init_object().
Change behaviour how to interpret the ifindex. Note that this also
affects various callers of nmp_lookup_init_object(). If somebody was
relying on the previous behavior, it would need fixing.
(cherry picked from commit d4ad9666bd)
_vt_cmd_obj_is_alive_ipx_route() is called by nmp_object_is_alive().
Non-alive objects are not put into the cache.
That certainly makes sense for RTM_F_CLONED routes, because they are
generated ad-hoc during the `ip route get` request.
Checking for the ifindex is not necessary. For one, some route types
(blackhole, unreachable, prohibit) don't have an ifindex. Also, the
purpose of _vt_cmd_obj_is_alive_ipx_route() is not to validate the
object. Just don't create objects without an ifindex, if you think the
route needs an ifindex. Checking here is not useful.
We also don't check that other fields like rt_source are valid, so there
is no need to do it for the ifindex either.
(cherry picked from commit 1123d3a5fb)
Currently, for NMPlatformIP[46]Route always has a gateway, even if it's
possibly set to 0.0.0.0/::. Not sure whether kernel has a further
distinction between no-gateway and all-zero gateway.
Anyway. For us, a gateway of 0.0.0.0/:: means the same as having no
gateway. We cannot differentiate the two (nor do we need to).
Don't print that in nm_platform_ip[46]_route_to_string().
Also, because we are going to add blackhole route types, which cannot
have a next-hop. But we do this change for all routes types, because
it makes sense in general (and also what `ip route show` prints).
(cherry picked from commit b58711f20d)
The variable with this purpose is usually called "IS_IPv4".
It's upper case, because usually this is a const variable, and because
it reminds of the NM_IS_IPv4(addr_family) macro. That letter case
is unusual, but it makes sense to me for the special purpose that this
variable has.
Anyway. The naming of this variable is a different point. Let's
use the variable name that is consistent and widely used.
(cherry picked from commit 8085c0121f)
_nm_ip_route_attribute_validate_all() validates all attributes together.
As such, it calls to nm_ip_route_attribute_validate(), which in turn
validates one attribute at a time.
Such full validation needs to check that (potentially conflicting)
attributes are valid together. Hence, _nm_ip_route_attribute_validate_all()
needs again peek into the attributes.
Refactor the code, so that we can extract the pieces that we need and
not need to parse them twice.
(cherry picked from commit 0413b1bf8a)
First of all, all of NMVariantAttributeSpec is internal API. We only
expose the typedef itself as public API, but not its fields nor
their meaning. So we can change things.
Change "str_type" to "type_detail", so that it can work for any kind of
attribute, not only for strings. Usually, we want to avoid special
cases and treat all attributes the same, based on their GVariant type.
But sometimes, it is necessary to do something special with an
attribute. This is what the "type_detail" encodes, but it's not only
relevant for strings.
(cherry picked from commit 6f277d8fa6)
Usually the normalization (canonicalize) and validation of the IP
address string both requires to parse the string. As we always do
validation first, we can use the parsed address and don't need to parse
it a second time.
(cherry picked from commit 00e4f21629)
Order the fields by their size, to minimize the alignment gaps.
I guess, that doesn't matter because the alignment of the heap
allocation is larger than what we can safe here. Still, there is
on reason to do it any other way.
Also, it's not possible via API to set family/prefix to values outside
their range, so an 8bit integer is always sufficient. And we don't want
that invariant to change. We don't ever want to allow the caller to set
values that are clearly invalid, and will assert against that early (g_return()).
Point is, we can do this and there is no danger of future problems.
And even if we will support larger values, it's all an implementation
detail anyway.
(cherry picked from commit 6208a1bb84)
`git bisect run` is peculiar about the exit code:
error: bisect run failed: exit code 134 from '...' is < 0 or >= 128
If we just "exec" the test, it usually will fail on an assert. That results
in SIGABRT or exit code 134. So out of the box that is annoying with
git-bisect.
Work around that and let the test wrapper always coerce any test failure
to exit code 1.
(cherry picked from commit f65747f6e9)
We made the choice, that NMPlatformIPRoute does not contain the actual
route table, instead it contains a "remapped" number: table_coerced.
That remapping done, so that the default (which we want semantically to
be 254, RT_TABLE_MAIN) is numerical zero so that struct initialization
doesn't you require to explicitly set the default.
Hence, we must always distinguish whether we have the real table number
or the "table_coerced", and you must convert back and forth between the
two.
Now, the parameter of nm_l3_config_data_merge() are real table numbers
(as also indicated by their name not having the term "coerced"). So
usually they are set to actually 254.
When we set the field of NMPlatformIPRoute, we must coerce it. This was
wrong, and we would see wrong table numbers in the log:
l3cfg[17b98e59a477b0f4,ifindex=2]: obj-state: track: [2a32eca99405767e, ip4-route, type unicast table 0 0.0.0.0/0 via ...
Fixes: b4aa35e72d ('l3cfg: extend nm_l3cfg_add_config() to accept default route table and metric')
(cherry picked from commit e23ebe9183)