Commit graph

753 commits

Author SHA1 Message Date
Lubomir Rintel
897977e960 ovs-port: avoid removing the OVSDB entry if we're shutting down
Since commit ecc73eb239 ('ovs-port: always remove the OVSDB entry on
slave release'), ovs port were removing the ovsdb entry upon being
un-enslaved, no matter what the reason for un-enslavement was. The idea
was to remove the stale ovsdb entry upon forcible device removal.

This cleanup is specific to OpenVSwitch, since for other device types,
the device master is the property of the slave and thus goes away along
with the device.

Turns out we're now removing the ovsdb entry even when the device
actually doesn't go away, but we're pretending it does because the
daemon is shutting down.

To add insult to injury, we generally end up removing one entry,
because the other ovsdb calls end up in a queue and don't get serviced
before the daemon shuts down. The result is a mess. (This patch
doesn't solve that -- if someone terminates the daemon with in-flight
ovsdb calls they're still out of luck).

Let's do the cleanup now only if the device was actually physically
removed.

Fixes-test: @NM_reboot_openvswitch_vlan_configuration
Fixes: ecc73eb239 ('ovs-port: always remove the OVSDB entry on slave release')

https://bugzilla.redhat.com/show_bug.cgi?id=2055665
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1117
2022-02-22 18:58:47 +01:00
Lubomir Rintel
a05de15414 ovsdb: register a shutdown objects for in-flight calls
Once the shutdown logic is in place, we don't want to shut down until
the OVSDB calls are serviced.

https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1118
2022-02-22 18:57:36 +01:00
Thomas Haller
dab2ee8ac5
all: suppress wrong gcc-12 warning "-Wdangling-pointer"
gcc-12.0.1-0.8.fc36 is annoying with false positives.
It's related to g_error() and its `for(;;) ;`.

For example:

    ../src/libnm-glib-aux/nm-shared-utils.c: In function 'nm_utils_parse_inaddr_bin_full':
    ../src/libnm-glib-aux/nm-shared-utils.c:1145:26: error: dangling pointer to 'error' may be used [-Werror=dangling-pointer=]
     1145 |                     error->message);
          |                          ^~
    /usr/include/glib-2.0/glib/gmessages.h:343:32: note: in definition of macro 'g_error'
      343 |                                __VA_ARGS__);         \
          |                                ^~~~~~~~~~~
    ../src/libnm-glib-aux/nm-shared-utils.c:1133:31: note: 'error' declared here
     1133 |         gs_free_error GError *error = NULL;
          |                               ^~~~~
    /usr/include/glib-2.0/glib/gmessages.h:341:25: error: dangling pointer to 'addrbin' may be used [-Werror=dangling-pointer=]
      341 |                         g_log (G_LOG_DOMAIN,         \
          |                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      342 |                                G_LOG_LEVEL_ERROR,    \
          |                                ~~~~~~~~~~~~~~~~~~~~~~~
      343 |                                __VA_ARGS__);         \
          |                                ~~~~~~~~~~~~
    ../src/libnm-glib-aux/nm-shared-utils.c:1141:13: note: in expansion of macro 'g_error'
     1141 |             g_error("unexpected assertion failure: could parse \"%s\" as %s, but not accepted by "
          |             ^~~~~~~
    ../src/libnm-glib-aux/nm-shared-utils.c:1112:14: note: 'addrbin' declared here
     1112 |     NMIPAddr addrbin;
          |              ^~~~~~~

I think the warning could potentially be useful and prevent real bugs.
So don't disable it altogether, but go through the effort to suppress it
at the places where it currently happens.

Note that NM_PRAGMA_WARNING_DISABLE_DANGLING_POINTER macro only expands
to suppressing the warning with __GNUC__ equal to 12. The purpose is to
only suppress the warning where we know we want to. Hopefully other gcc
versions don't have this problem.

I guess, we could also write a NM_COMPILER_WARNING() check in
"m4/compiler_options.m4", to disable the warning if we detect it. But
that seems too cumbersome.
2022-02-21 19:50:52 +01:00
Thomas Haller
f18bf17dea
wifi: cleanup ensure_hotspot_frequency()
wifi: choose a (stable) random channel for Wi-Fi hotspot

The channel depends on the SSID.

Based-on-patch-by: xiangnian <xiangnian@uniontech.com>

See-also: https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1054

https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1099
2022-02-21 16:03:24 +01:00
Lubomir Rintel
47ff99515f ppp-manager: give PPP more time to terminate
pppd is a delicate flower. On orderly shutdown, it likes to tell the
other side. This seems to take at least a second even when no real
network latency is at play, on busy systems 1.5 seconds easily ends up
being inadequate.

A violent shutdown is generally okay apart from that it can leave
garbage (port lock) behind and the other side potentially confused for a
while.

As it happens, this interacts badly with modemu.pl which is used for
testing: the pseudo terminal in PPP line discipline mode has no idea
that the remote disconnected and while ModemManager is learning that
something wrong the hard way (AT command timing out, because the remote
still expects to talk PPP), the test times out.

Let's increase the timeout to something more reasonable.

https://bugzilla.redhat.com/show_bug.cgi?id=2049596
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1103
2022-02-19 13:34:02 +01:00
Beniamino Galvani
413e522867 core: fix DNS configuration type for wireguard connections
The DNS configuration for a wireguard connection should be added with
type "VPN".

Fixes: 58287cbcc0 ('core: rework IP configuration in NetworkManager using layer 3 configuration')

https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1102
2022-02-16 22:49:28 +01:00
Beniamino Galvani
f15b3f15a7 device: delay IP ready state until all objects are committed
Don't progress to the IP ready state until all objects are committed
to platform. Note that l3cfg has a 20 seconds timeout after which
unavailable objects are considered "definitely unavailable" and are
removed from the list.

Fixes-test: @ipv6_routes_with_src
https://bugzilla.redhat.com/show_bug.cgi?id=2043133
2022-02-16 15:12:52 +01:00
Beniamino Galvani
9a090fdf7b core: do a commit after all addresses complete ACD/DAD
l3cfg has a "temp_not_available" list of objects that couldn't be
added to platform, but can be added once some preconditions become
true (for example, a IPv6 route with a "src" attribute requires a
non-tentative src address to be present).

Retry to commit those objects once all addresses have completed
ACD/DAD.
2022-02-16 15:12:52 +01:00
Thomas Haller
a2c8a3228b
device: fix crash for shared IPv6 method in nm_device_copy_ip6_dns_config()
nm_l3_config_data_get_nameservers() returns a pointer to "struct in6_addr". Not
a pointer to pointers.

  #0  __memmove_avx_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:389
  #1  0x00007f8060dd9109 in memcpy (__len=<optimized out>, __src=0xfd, __dest=<optimized out>) at /usr/include/bits/string_fortified.h:29
  #2  g_array_append_vals (len=1, data=0xfd, farray=0x55dd69332130) at ../glib/garray.c:522
  #3  g_array_append_vals (farray=0x55dd69332130, data=0xfd, len=1) at ../glib/garray.c:509
  #4  0x000055dd68d2a27d in _garray_inaddr_add (p_arr=<optimized out>, addr_family=<optimized out>, addr=0xfd) at src/core/nm-l3-config-data.c:295
  #5  0x000055dd68ef6510 in nm_l3_config_data_add_nameserver (nameserver=<optimized out>, addr_family=10, self=0x55dd6949f900) at src/core/nm-l3-config-data.c:1442
  #6  nm_device_copy_ip6_dns_config (self=0x55dd693c4420, from_device=<optimized out>) at src/core/devices/nm-device.c:10468
  #7  0x00007f8060f28aba in _g_closure_invoke_va (param_types=0x0, n_params=<optimized out>, args=0x7fffed43d610, instance=0x55dd693c4420, return_value=0x0, closure=0x55dd693cdb10)
      at ../gobject/gclosure.c:893
  #8  g_signal_emit_valist (instance=0x55dd693c4420, signal_id=<optimized out>, detail=0, var_args=var_args@entry=0x7fffed43d610) at ../gobject/gsignal.c:3406
  #9  0x00007f8060f28c03 in g_signal_emit (instance=<optimized out>, signal_id=<optimized out>, detail=<optimized out>) at ../gobject/gsignal.c:3553
  #10 0x000055dd68efd1fb in _dev_ipac6_start (self=0x55dd693c4420) at src/core/devices/nm-device.c:11348
  #11 0x000055dd68efd698 in _dev_ipac6_start_continue (self=0x55dd693c4420) at src/core/devices/nm-device.c:11373
  #12 _dev_ipll6_set_llstate (self=0x55dd693c4420, llstate=<optimized out>, lladdr=<optimized out>) at src/core/devices/nm-device.c:10576
  #13 0x000055dd68e7915e in _emit_changed_on_idle_cb (user_data=user_data@entry=0x55dd6941ca50) at src/core/nm-l3-ipv6ll.c:221
  #14 0x00007f8060e0639b in g_idle_dispatch (source=0x55dd693eea30, callback=0x55dd68e78fd0 <_emit_changed_on_idle_cb>, user_data=0x55dd6941ca50) at ../glib/gmain.c:5897
  #15 0x00007f8060e0a05f in g_main_dispatch (context=0x55dd6922c800) at ../glib/gmain.c:3381
  #16 g_main_context_dispatch (context=0x55dd6922c800) at ../glib/gmain.c:4099
  #17 0x00007f8060e5f2a8 in g_main_context_iterate.constprop.0 (context=0x55dd6922c800, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>) at ../glib/gmain.c:4175
  #18 0x00007f8060e09773 in g_main_loop_run (loop=0x55dd69211010) at ../glib/gmain.c:4373
  #19 0x000055dd68d09c7b in main (argc=<optimized out>, argv=<optimized out>) at src/core/main.c:509

Fixes: 58287cbcc0 ('core: rework IP configuration in NetworkManager using layer 3 configuration')
2022-02-16 10:32:51 +01:00
Thomas Haller
dc64cff166
core/tests: make test_machine_id_read() more robust against the test system
test_machine_id_read() is a flawed unit test, as it reads the machine-id
of the machine where it's running. That means the test depends on the
test machine, which is obviously a problem.

If you had no /etc/machine-id but a /var/lib/dbus/machine-id, then previously
the test would fail. If the file exists, assume we are able to read a
valid machine-id.

On test systems that have a bogus /etc/machine-id or /var/lib/dbus/machine-id,
the test would still fail. Just don't do that.
2022-02-10 19:06:33 +01:00
Thomas Haller
eaa0b533da
core/trivial: add code comment to _set_hostname() 2022-02-10 18:24:07 +01:00
Fernando Fernandez Mancera
d904f37022 nm-l3cfg: fix check on timestamp for assuming probing is good
https://bugzilla.redhat.com/show_bug.cgi?id=2028751

Fixes: db0d84f13a (“l3cfg: fix handling "instance-reset" ACD event”)
2022-02-10 10:51:57 +01:00
Thomas Haller
9ab53e561a
core/l3cfg: let NML3Cfg handle nodev (blackhole) routes
Certain route types (blackhole, unreachable, prohibit) are not tied to
an interface. They are thus global and we need to track them system wide
(or better: per network namespace). That is done by NMPRouteManager.

For the routing rules, it's NMDevice itself to track/untrack the rules.
That is done for historical reasons, at the time, NML3Cfg did not exit.
Now with NML3Cfg, it seems that also NML3Cfg should be the part that
handles nodev routes. One reason is that we want to move IP
functionality out of NMDevice. So callers (NMDevice) would just add
blackhole routes to the NML3ConfigData and let NML3Cfg handle them.

Still, to handle these routes is rather different from regular routes.
Normally, NML3Cfg tracks an object state (ObjStateData) for each address/route,
and it hooks into platform signals to update the os_plobj field. Those signals
are dispatched by NMNetns and are only per-ifindex. Hence, NML3Cfg
wouldn't be notified about those nodev routes. Consequently, there
os_plobj could not be (efficiently) maintained and there is no
ObjStateData for such routes.

Instead, all that NML3Cfg does is have the routes in the NML3ConfigData and
tell NMPRouteManager about them. Seems simple enough. The only question
is when should NMPRouteManager sync? For now, we sync when the
track/untracking brings any changes and during reapply. Which is
probably fine.
2022-02-09 19:13:05 +01:00
Thomas Haller
6255e0dcac
core: handle blackhole/unreachable/prohibit route types in core
Specifically, in nm_utils_ip_route_attribute_to_platform() and in
_l3_config_data_add_obj() handle such new route type. For the moment,
they cannot be stored in a valid NMSettingIPConfig, but later this will
be necessary.
2022-02-09 19:13:05 +01:00
Thomas Haller
e32bc6d248
core/l3cfg: rework generating list of routes in _l3_commit_one()
This will be required next, when we will have also routes without a
device. Split the generation of the route list out.
2022-02-09 19:13:05 +01:00
Thomas Haller
9e90bb0817
platform: improve way to prune dirty route-manager entries
The general idea is that when we have entries tracked by the
route-manager, that we can mark them all as dirty. Then, calling the
"track" function will reset the dirty flag. Finally, there is a method
to delete all dirty entries.

As we can lookup an entry with O(1) (using dictionaries), we can
sync the list of tracked objects with O(n). We just need to track
all the ones we care about, and then delete those that were not touched
(that is, are still dirty).

Previously, we had to explicitly mark all entries as dirty. We can do
better. Just let nmp_route_manager_untrack_all() mark the survivors as
dirty right away. This way, we can save iterating the list once.

It also makes sense because the only purpose of the dirty flag is to
aid this prune mechanism with track/untrack-all. So, untrack-all can
just help out, and leave the remaining entries dirty, so that the next
track does the right thing.
2022-02-09 19:13:05 +01:00
Thomas Haller
7c27c63bec
platform: extend NMPRouteManager to work for routes 2022-02-09 19:13:04 +01:00
Thomas Haller
3996933c57
platform: rename "nmp-route-manager.h" to "nmp-rules-manager.h" 2022-02-09 19:13:03 +01:00
Thomas Haller
ea4f6d7994
platform: rename NMPRulesManager API to NMPRouteManager
Routes of type blackhole, unreachable, prohibit don't have an
ifindex/device. They are thus in many ways similar to routing rules,
as they are global. We need a mediator to keep track which routes
to configure.

This will be very similar to what NMPRulesManager already does for
routing rules. Rename the API, so that it also can be used for routes.

Renaming the file will be done next, so that git's rename detection
doesn't get too confused.
2022-02-09 19:13:03 +01:00
Thomas Haller
92f51c6b43
platform: add support for blackhole,unreachable,prohibit route type 2022-02-09 19:13:03 +01:00
Thomas Haller
7ad14b86f8
platform: add nm_platform_route_type_is_nodev() helper 2022-02-09 19:13:03 +01:00
Thomas Haller
d4ad9666bd
platform: don't treat ifindex zero special in nmp_lookup_init_object()
So far, certain NMObject types could not have an ifindex of zero. Hence,
nmp_lookup_init_object() took such an ifindex to mean lookup all objects
of that type.

Soon, we will support blackhole/unreachable/prohibit route types, which
have their ifindex set to zero. It is still useful to lookup those routes
types via nmp_lookup_init_object().

Change behaviour how to interpret the ifindex. Note that this also
affects various callers of nmp_lookup_init_object(). If somebody was
relying on the previous behavior, it would need fixing.
2022-02-09 19:13:03 +01:00
Thomas Haller
596d1645e8
core: use IS_IPv4 variable in nm_utils_ip_route_attribute_to_platform()
It's what we do at many other places. Consistency.
2022-02-09 19:13:03 +01:00
Thomas Haller
26e9deb831
core/rfkill: cleanup arguments for "rfkill-changed" signal
The signal parameters are G_TYPE_UINT. We should not assume that our
enums are a compatible integer type.

In practice of course they always were. Let's just be clear about when
we have an integer and when we have an enum.
2022-02-08 18:58:54 +01:00
Thomas Haller
aa15ec6090
core/rfkill: use nm_streq() in "nm-rfkill-manager.c" 2022-02-08 18:58:54 +01:00
Thomas Haller
e3e17f0b4b
core/rfkill: use CList to track Rfkill killswitches
GSList is almost always the wrong choice. We can just embed
the next pointers in the structure using the intrusive CList.
2022-02-08 18:58:53 +01:00
Thomas Haller
bc7a387072
core/rfkill: cleanup initializing Killswitch struct
- use slice allocator
- use designated initializers
- first determine all parameters in killswitch_new() before
  setting ks.
- drop unnecessary memset().
2022-02-08 18:58:53 +01:00
Thomas Haller
96abd25c72
core/rfkill: don't assert for valid string in rfkill_type_to_enum()
"str" comes from udev. This is not a trusted component, and we must not assert
for valid input.
2022-02-08 18:58:53 +01:00
Thomas Haller
a19ca08bcf
core/rfkill: rename rfkill_state_to_desc() to nm_rfkill_state_to_string() 2022-02-08 18:58:53 +01:00
Thomas Haller
6b001b0685
core/rfkill: add nm_rfkill_type_to_string() helper 2022-02-08 18:58:53 +01:00
Thomas Haller
3b7d861eef
core/rfkill: use "nm_auto_close" in _rfkill_update_system() to close FD 2022-02-08 18:58:53 +01:00
Thomas Haller
467b21afa7
core/rfkill: rename Rfkill related functions in "nm-manager.c"
Naming is important. Especially when we have a 8k LOC monster, that
manages everything.

Rename things related to Rfkill to give them a common prefix.

Also, move code around so it's beside each other.
2022-02-08 18:58:53 +01:00
Thomas Haller
858a0ee47f
core/rfkill: move immutable type description of Rfkill to global field
RadioState contained both the mutable user-data and meta-data about the
radio state. The latter is immutable. The parts that cannot change, should
be separate from those that can change. Move them to a separate array.

Of course, we only have on NMManager instance, so having these fields embedded
in NMManagerPrivate did not waste memory. This change is done, because immutable
fields should be const. In this case, they are now const global data, which
is even protected by the linker from accidental mutation.
2022-02-08 18:58:53 +01:00
Thomas Haller
454992ed85
core/rfkill: add "nm" prefix to RfKillState and RfKillType enums
Names in header files should have an "nm" prefix. We do that pretty
consistently. Fix the offenders RfKillState and RfKillType.

Also, rename the RfKillState enums to follow the type name. For example,
NM_RFKILL_STATE_SOFT_BLOCKED instead of RFKILL_SOFT_BLOCKED.

Also, when we camel-case a typedef (NMRfKillState) we would want that
the lower-case names use underscore between the words. So it should be
`nm_rf_kill_state_to_string()`. But that looks awkward. So the right solution
here is to also rename "RfKill" to "Rfkill". That make is consistent
with the spelling of the existing `NMRfkillManager` type and the
`nm-rfkill-manager.h` file.
2022-02-08 18:58:53 +01:00
Thomas Haller
99ceb205e9
core/rfkill: cleanup rfkill handling in NMManager
- use "const RadioState" where possible

- use "bool" bitfields in RadioState (boolean flags in structs
  should be `bool` types for consistency) and order fields by
  alignment.

- break lines for variable declaration in manager_rfkill_update_one_type().

- return (and nm_assert()) from update_rstate_from_rfkill(). By
  not adding a default case, compiler would warn if we forget to
  handle an enum value. We can easily do that, by just returning,
  and let the "default" case be handled by nm_assert_not_reached()
  -- which unlike g_warn_if_reached() compiles to nothing in
  release build.

- add nm_assert() that `priv->radio_states[rtype]` is not out of range.

- use designated initializers for priv->radio_states[].
2022-02-08 18:58:52 +01:00
Thomas Haller
165224b485
core/rfkill: move rfkill_type property to NMDeviceClass
GObject Properties are flexible and powerful. In practice, NMDevicePrivate.rfkill_type
was only set once via the construct-only property NM_DEVICE_RFKILL_TYPE.
Which in turn was always set to a well-known value, only depending on the device
type.

We don't need this flexibility. The rfkill-type only depends on the
device type and doesn't change. Replace the property by a field in
NMDeviceClass.

For one, construct properties have an overhead, that the property setter is
called whenever we construct a NMDevice. But the real reason for this
change, is that a property give a notion as this could change during the
lifetime of a NMDevice (which it in fact did not, being construct-only).
Or that the type depends on something more complex, when instead it only
depends on the device type. A non-mutated class property is simpler,
because it's clear that it does not depend on the device instance,
only on the type/class.

Also, `git grep -w rfkill_type` now nicely shows the (few) references to
this variable and its easier to understand.
2022-02-08 18:58:52 +01:00
Beniamino Galvani
02106df3be device: fix required-timeout evaluation
Once the required-timeout expires, we should evaluate whether the
*other* address family is ready.

Fixes: 58287cbcc0 ('core: rework IP configuration in NetworkManager using layer 3 configuration')
Fixes-test: @dracut_NM_NFS_root_nfs_ip_dhcp_dhcp6_slow_ip6

https://bugzilla.redhat.com/show_bug.cgi?id=2051904
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1090
2022-02-08 18:12:07 +01:00
Thomas Haller
a2c4f071e4
all: drop /*<skip>*/ annotations for enums
We don't run glib-mkenums for certain sources like "core" and
"libnm-glib-aux".

These annotations have no effect. Drop them.
They also mess with the automated formatting.
2022-02-08 11:14:01 +01:00
Thomas Haller
e23ebe9183
l3cfg: fix setting default route table during nm_l3_config_data_merge()
We made the choice, that NMPlatformIPRoute does not contain the actual
route table, instead it contains a "remapped" number: table_coerced.
That remapping done, so that the default (which we want semantically to
be 254, RT_TABLE_MAIN) is numerical zero so that struct initialization
doesn't you require to explicitly set the default.

Hence, we must always distinguish whether we have the real table number
or the "table_coerced", and you must convert back and forth between the
two.

Now, the parameter of nm_l3_config_data_merge() are real table numbers
(as also indicated by their name not having the term "coerced"). So
usually they are set to actually 254.

When we set the field of NMPlatformIPRoute, we must coerce it. This was
wrong, and we would see wrong table numbers in the log:

  l3cfg[17b98e59a477b0f4,ifindex=2]: obj-state: track: [2a32eca99405767e, ip4-route, type unicast table 0 0.0.0.0/0 via ...

Fixes: b4aa35e72d ('l3cfg: extend nm_l3cfg_add_config() to accept default route table and metric')
2022-02-07 16:08:28 +01:00
Lubomir Rintel
c8ba636d0e dhcp-options: change "dhcp6_ntp_server" to plural form
The option can be present multiple times.

https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1085
2022-02-04 15:54:15 +01:00
Thomas Haller
3399e19df8
device: update metered status when getting DHCP lease
The metered status can depend on the DHCP lease, as we accept the
ANDROID_METERED vendor option. That means, on a DHCP update we need
to re-evaluate the metered flag.

This fixes a potential race, where IPv6 might succeed first and
activation completes (with GUESS_NO metered flag). A subsequent
DHCPv4 update requires to re-evaluate that decision.

Fixes-test: @connection_metered_guess_yes

https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1080
2022-02-03 18:46:38 +01:00
Lubomir Rintel
ecc73eb239 ovs-port: always remove the OVSDB entry on slave release
When the link is externally removed, the OVSDB entry will be left
behind. That's not cool -- we need to remove it.

https://bugzilla.redhat.com/show_bug.cgi?id=1935026
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1084
2022-02-03 13:38:21 +01:00
Beniamino Galvani
d72a292005 device: fix assuming connections when platform-init arrives late
When the NM_UNMANAGED_PLATFORM_INIT flag is cleared last in
device_link_changed(), a recheck-assume is scheduled and then the
device goes immediately to UNAVAILABLE. During the state transition,
addresses and routes are removed from the interface. Then,
recheck-assume finds that the device can be assumed but it's too late
since the device was already deconfigured.

This is a problem as the whole point of assuming a device is to
activate a connection while leaving the device untouched.

In the NMCI "dracut_NM_vlan_over_bridge and dracut_NM_vlan_over_bond"
test, NM in real root tries to assume a vlan device that was activated
in initrd. When the interface gets deconfigured in UNAVAILABLE, the
connection to the NFS server breaks and the rootfs becomes
inaccessible.

The fix to this problem is to delay state transitions in
device_link_changed() to a idle handler, so that recheck-assume can
run before.

Fixes-test: @dracut_NM_vlan_over_bridge
Fixes-test: @dracut_NM_vlan_over_bond

https://bugzilla.redhat.com/show_bug.cgi?id=2047302
2022-02-03 09:17:20 +01:00
Beniamino Galvani
b2e1bd4436 device: remove unused if branch in device_link_changed()
nm_device_set_unmanaged_by_user_settings() does nothing when the
device is unmanaged by platform-init. Remove the if branch to make
this more explicit.
2022-02-03 09:16:39 +01:00
Thomas Haller
db0d84f13a
l3cfg: fix handling "instance-reset" ACD event
The ACD state handling is unfortunately very complicated. That is, because
we obviously need to track state about how ACD is going (the acd_data, and
in particular NML3AcdAddrState). Then there are various things that can
happen, which are the AcdStateChangeMode enums. All these state-changes
come together in one function: _l3_acd_data_state_change(), which is
therefore complicated (I don't think that it would become simpler by
spreading this code out to different functions, on the contrary).
Anyway.

So, what happens when we need to reset the n-acd instance? For example,
because the MAC address of the link changed or some error. I guess, we
need to restart probing.

Previously, I think this was not handled properly. We already tried to
fix this several times, the last was commit b331606386 ('l3cfg: on
n-acd instance-reset clear also ready ACD state'). There is still an
issue ([1]).

The bug [1] is, that we are in state NM_L3_ACD_ADDR_STATE_READY, during
 ACD_STATE_CHANGE_MODE_TIMEOUT event. That leads to an assertion
failure.

  #5  0x00007f23be74698f in g_assertion_message_expr (domain=0x5629aca70359 "nm", file=0x5629aca62aab "src/core/nm-l3cfg.c", line=2395, func=0x5629acb26b30 <__func__.72.lto_priv.4> "_l3_acd_data_state_change", expr=<optimized out>) at ../glib/gtestutils.c:3091
  #6  0x00005629ac937e46 in _l3_acd_data_state_change (self=0x5629add69790, acd_data=0x5629add8d520, state_change_mode=ACD_STATE_CHANGE_MODE_TIMEOUT, sender_addr=0x0, p_now_msec=0x7ffded506460) at src/core/nm-l3cfg.c:2395
  #7  0x00005629ac939f4d in _l3_acd_data_timeout_cb (user_data=user_data@entry=0x5629add8d520) at src/core/nm-l3cfg.c:1933
  #8  0x00007f23be71c5a1 in g_timeout_dispatch (source=0x5629addd7a80, callback=0x5629ac939ee0 <_l3_acd_data_timeout_cb>, user_data=0x5629add8d520) at ../glib/gmain.c:4889
  #9  0x00007f23be71bd4f in g_main_dispatch (context=0x5629adc6da00) at ../glib/gmain.c:3337
  #10 g_main_context_dispatch (context=0x5629adc6da00) at ../glib/gmain.c:4055

That can only happen, (I think) when we scheduled the timeout
during an earlier ACD_STATE_CHANGE_MODE_INSTANCE_RESET event. Meaning,
we need to handle instance-reset better.

Instead, during instance-reset, switch always back to state PROBING, and
let the timeout figure it out.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=2047788
2022-02-02 11:08:31 +01:00
Thomas Haller
d7a951b947
l3cfg: add comment to _acd_data_collect_tracks_data() about linear search 2022-02-02 11:08:30 +01:00
Thomas Haller
f00980e5ca
l3cfg: add assertion for scheduling timeout in _l3_acd_data_timeout_schedule()
The first point is that ACD timeout is strongly tied to the current state. That
is (somewhat) obvious, because _l3_acd_data_state_set_full() will clear any pending
timeout. So you can only schedule a timeout *after* setting the state,
and setting the next state, will clear the timeout.

Likewise, note that l3_acd_data_state_change() for the event
ACD_STATE_CHANGE_MODE_TIMEOUT asserts that it is only called in the few
states where that is expected. See rhbz#2047788 where that assertion
gets hit.

The first point means that we must only schedule a timer when we are
also in a state that supports that. Add an assertion for that at the
point when scheduling the timeout. The assert at this point is useful,
because it catches the moment when we do the wrong thing (instead of
getting the assertion later during the timeout, when we no longer know
where the error happened).

See-also: https://bugzilla.redhat.com/show_bug.cgi?id=2047788
2022-02-02 11:08:30 +01:00
Thomas Haller
6012853010
l3cfg: use nm_g_timeout_add_source() helpers in "nm-l3cfg.c"
It's shorter, and brevity lets us focus on the important things in the
code.
2022-02-02 11:08:30 +01:00
Thomas Haller
88e7ea20c3
l3cfg: only constructing log message when needed in _l3_acd_data_state_set_full() 2022-02-02 11:08:30 +01:00
Thomas Haller
5cbf666279
core: increase l3cfg merge priority for VPN config
Now that higher priorities numbers really mean more important,
it seems that the VPN configuration should be rather important.

Bump the number, also in relation to NMDevice's L3ConfigDataType.

It might not matter too much, because usually the VPN tunnel device does
not have NDevice to add other l3cds and those from VPN might be alone.
Except, maybe with routing VPN (libreswan) that is different. Dunno.
2022-02-01 19:22:02 +01:00