NetworkManager/src/core/dhcp
Beniamino Galvani 24461954d0 dhcp: reset IPv6 DAD flag on lease update
If the client was waiting for IPv6 DAD to complete and the lease was
updated or lost, `wait_ipv6_dad` needs to be cleared; otherwise, at
the next platform change the client will try to evaluate the DAD state
with a different or no lease. In particular if there is no lease the
client will try to decline it because there are no valid addresses,
leading to an assertion failure:

 ../src/core/dhcp/nm-dhcp-client.c:997:_dhcp_client_decline: assertion failed: (l3cd)

Backtrace:

  __GI_raise ()
  __GI_abort ()
  g_assertion_message ()
  g_assertion_message_expr ()
  _dhcp_client_decline (self=0x1af13b0, l3cd=0x0, error_message=0x8e25e1 "DAD failed", error=0x7ffec2c45cb0) at ../src/core/dhcp/nm-dhcp-client.c:997
  l3_cfg_notify_cb (l3cfg=0x1bc47f0, notify_data=0x7ffec2c46c60, self=0x1af13b0) at ../src/core/dhcp/nm-dhcp-client.c:1190
  g_closure_invoke ()
  g_signal_emit_valist ()
  g_signal_emit ()
  _nm_l3cfg_emit_signal_notify () at ../src/core/nm-l3cfg.c:629
  _nm_l3cfg_notify_platform_change_on_idle () at ../src/core/nm-l3cfg.c:1390
  _platform_signal_on_idle_cb () at ../src/core/nm-netns.c:411
  g_idle_dispatch ()

Fixes: 393bc628ff ('dhcp: wait DAD completion for DHCPv6 addresses')

https://bugzilla.redhat.com/show_bug.cgi?id=2179890
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1594
2023-04-06 15:56:59 +02:00
tests dhcp: fix test for out-of-tree build 2023-01-11 10:54:01 +01:00
meson.build build: remove shared/ directory 2021-02-24 12:49:13 +01:00
nm-dhcp-client-logging.h format: reformat source tree with clang-format 13.0 2021-11-29 09:31:09 +00:00
nm-dhcp-client.c dhcp: reset IPv6 DAD flag on lease update 2023-04-06 15:56:59 +02:00
nm-dhcp-client.h dhcp: support prefix delegation hint 2023-04-03 16:04:55 +02:00
nm-dhcp-dhclient-utils.c dhcp/dhclient: fix honoring "ipv6.dhcp-duid" when explicitly set 2022-12-19 11:29:19 +01:00
nm-dhcp-dhclient-utils.h dhcp/dhclient: fix honoring "ipv6.dhcp-duid" when explicitly set 2022-12-19 11:29:19 +01:00
nm-dhcp-dhclient.c dhcp: support prefix delegation hint 2023-04-03 16:04:55 +02:00
nm-dhcp-dhcpcanon.c all: use _NM_G_TYPE_CHECK_INSTANCE_CAST() for internal uses 2022-12-16 10:55:03 +01:00
nm-dhcp-dhcpcd.c all: use _NM_G_TYPE_CHECK_INSTANCE_CAST() for internal uses 2022-12-16 10:55:03 +01:00
nm-dhcp-helper-api.h all: move "src/" directory to "src/core/" 2021-02-08 09:56:41 +01:00
nm-dhcp-helper.c log,dhcp: avoid deprecated GTimeVal API and use g_get_real_time() 2023-03-21 10:21:28 +01:00
nm-dhcp-listener.c dhcp/dhclient: implement accept/decline (ACD) for dhclient plugin 2022-05-31 18:32:36 +02:00
nm-dhcp-listener.h all: use _NM_G_TYPE_CHECK_INSTANCE_CAST() for internal uses 2022-12-16 10:55:03 +01:00
nm-dhcp-manager.c dhcp: add and use _NMLOG() macro for "nm-dhcp-manager.c" 2022-05-31 18:32:35 +02:00
nm-dhcp-manager.h all: use _NM_G_TYPE_CHECK_INSTANCE_CAST() for internal uses 2022-12-16 10:55:03 +01:00
nm-dhcp-nettools.c dhcp: add "static_key" argument to nm_dhcp_option_add_option() etc. 2023-02-21 09:13:09 +01:00
nm-dhcp-options.c dhcp: export the prefix delegation 2023-04-03 16:04:55 +02:00
nm-dhcp-options.h dhcp: export the prefix delegation 2023-04-03 16:04:55 +02:00
nm-dhcp-systemd.c dhcp: export the prefix delegation 2023-04-03 16:04:55 +02:00
nm-dhcp-utils.c clang-format: reformat code with clang-format 15.0.4-1.fc37 2022-11-23 09:17:21 +01:00
nm-dhcp-utils.h dhcp: log messages about invalid DHCP options 2022-05-16 09:49:06 +02:00
README.next.md core: rework IP configuration in NetworkManager using layer 3 configuration 2021-11-18 16:21:29 +01:00

NMDhcpClient

Using NMDhcpClient still requires a lot of logic in NMDevice. The main goal is to simplify NMDevice, so NMDhcpClient must become more complicated to provide a simpler (but robust) API.

NMDevice has basically two timeouts (talking about IPv4, but it applies similarly to IPv6): ipv4.dhcp-timeout and ipv4.required-timeout. They control how long NMDevice is willing to try before failing the activation altogether. Note that with ipv4.may-fail=yes, we may very well never want to fail the activation entirely, regardless of how DHCP is doing. In that case we want to stay up, but also keep retrying to see whether we can get a lease and recover.

Currently, if NMDhcpClient signals a failure, then it's basically up to NMDevice to schedule and retry. That is complicated, and we should move the complexity out of NMDevice.

NMDhcpClient should have a simpler API:

  • nm_dhcp_manager_start_ip[46](): creates (and starts) an NMDhcpClient instance. The difference is that this function tries really hard not to fail to create an NMDhcpClient. There is no explicit start(), but note that the instance must not emit any signals before the next maincontext iteration. That is, it will only call back the user after a timeout/idle or some other IO event, which happens during a future iteration of the maincontext.

  • nm_dhcp_client_stop(): when NMDevice is done with the NMDhcpClient instance, it will stop it and throw it away. This method exists because NMDhcpClient is a GObject and ref-counted. Thus, we don't want to rely on the last unref to stop the instance, but have an explicit stop. After stop, the instance is defunct and won't emit any signals anymore. The class does not need to support restarting a stopped instance. If NMDevice wants to restart DHCP, it should create a new instance. NMDevice would only want to do that if the parameters change, hence a new instance is in order (and there is no need for the complexity of restart in NMDhcpClient).

  • as is already the case now, NMDhcpClient is not very configurable. You provide most (all) parameters during nm_dhcp_manager_start_ip[46](), and then it keeps running until stop.

  • NMDhcpClient exposes a simple state to the user:

    1. "no lease, but good". When starting, there is no lease, but we are optimistic to get one. This is the initial state, but we can also get back to this state after we had a lease (which might expire).

    2. "has a lease". Here there is no need to distinguish whether the current lease was the first we received, or whether this was an update. In this state, the instance has a lease and we are good.

    3. "no lease, but bad". NMDhcpClient tries really hard, and "bad" does not mean that it gave up. It will keep retrying, it's just that there is little hope of getting a new lease. This happens, for example, when you try to run DHCP on a layer 3 link (WireGuard). There is little hope to succeed, but NMDhcpClient (theoretically) will retry and may recover from this. Another example is when we fail to start dhclient because it's not installed. In that case, we are not optimistic to recover, however NMDhcpDhclient will retry (with backoff timeout) and might still recover from this. For most cases, NMDevice will treat the no-lease cases the same, but in case of "bad" it might give up earlier.

When a lease expires, that does not necessarily mean that we are now in a bad state. It might mean that the DHCP server is temporarily down, but we might recover from that easily. "bad" really means that something is wrong on our side which prevents us from getting a lease. Also, imagine dhclient dies (we would try to restart, but assume that fails too), but we still have a valid lease. Then NMDhcpClient should possibly still pretend all is good and that we still have a lease until it expires. It may be that we can recover before that happens. The point of all of this is to hide errors as much as possible and recover automatically. NMDevice will decide to tear down if we didn't get a lease after ipv4.dhcp-timeout. That's the main criterion, and it might not even distinguish between "no lease, but good" and "no lease, but bad".
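The three states and the "hide errors while the lease is still valid" rule can be illustrated with a small state machine. This is only a minimal sketch: all `Sketch*` / `sketch_*` names are hypothetical and are not part of NetworkManager's API, and the exact set of events is an assumption made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for illustration only, not NetworkManager's API. */
typedef enum {
    SKETCH_DHCP_STATE_NO_LEASE_GOOD, /* no lease, but optimistic (initial state) */
    SKETCH_DHCP_STATE_HAS_LEASE,     /* a lease is held (first one or an update) */
    SKETCH_DHCP_STATE_NO_LEASE_BAD,  /* no lease and a local problem; still retrying */
} SketchDhcpState;

typedef enum {
    SKETCH_DHCP_EVENT_LEASE_RECEIVED,  /* new or updated lease */
    SKETCH_DHCP_EVENT_LEASE_EXPIRED,   /* e.g. the server is temporarily down */
    SKETCH_DHCP_EVENT_LOCAL_FAILURE,   /* e.g. dhclient is missing or died */
    SKETCH_DHCP_EVENT_LOCAL_RECOVERED, /* the local problem went away */
} SketchDhcpEvent;

typedef struct {
    SketchDhcpState state;
    bool            local_failure; /* remembered, but hidden while a lease is valid */
} SketchDhcpClient;

static void
sketch_dhcp_client_init(SketchDhcpClient *c)
{
    c->state         = SKETCH_DHCP_STATE_NO_LEASE_GOOD;
    c->local_failure = false;
}

/* Applies one event. Returns true if the user-visible state changed. */
static bool
sketch_dhcp_client_handle(SketchDhcpClient *c, SketchDhcpEvent event)
{
    SketchDhcpState old = c->state;

    switch (event) {
    case SKETCH_DHCP_EVENT_LEASE_RECEIVED:
        /* Assumption: getting a lease implies the local side works again. */
        c->local_failure = false;
        c->state         = SKETCH_DHCP_STATE_HAS_LEASE;
        break;
    case SKETCH_DHCP_EVENT_LEASE_EXPIRED:
        /* Expiry alone is not "bad"; only a pending local problem makes it so. */
        c->state = c->local_failure ? SKETCH_DHCP_STATE_NO_LEASE_BAD
                                    : SKETCH_DHCP_STATE_NO_LEASE_GOOD;
        break;
    case SKETCH_DHCP_EVENT_LOCAL_FAILURE:
        c->local_failure = true;
        /* While the lease is still valid, keep pretending all is good. */
        if (c->state != SKETCH_DHCP_STATE_HAS_LEASE)
            c->state = SKETCH_DHCP_STATE_NO_LEASE_BAD;
        break;
    case SKETCH_DHCP_EVENT_LOCAL_RECOVERED:
        c->local_failure = false;
        if (c->state == SKETCH_DHCP_STATE_NO_LEASE_BAD)
            c->state = SKETCH_DHCP_STATE_NO_LEASE_GOOD;
        break;
    }
    return c->state != old;
}
```

In this sketch a local failure that happens while a lease is held is not visible to the caller. It only surfaces as "no lease, but bad" if the lease expires before the client recovers.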

  • NMDhcpClient will also take care of the ipv4.dhcp-timeout grace period. That timeout is provided during start, and starts ticking whenever there is no lease. When it expires, a timeout signal gets emitted. That's it. This is independent from the 3 states above, and only saves NMDevice from scheduling this timer itself. This is the NM_DHCP_CLIENT_NOTIFY_TYPE_NO_LEASE_TIMEOUT notification.

  • for nettools, nm_dhcp_client_can_accept() indicates that when we receive a lease, we need to accept/decline it first. In that case, NMDevice optionally does ACD first, then configures the IP address and calls nm_dhcp_client_accept(). In case of an ACD conflict, it will call nm_dhcp_client_decline() (which ideally causes NMDhcpClient to get a different lease). With this, the above state "has a lease" actually has three flavors: "has a lease but not yet ACD probed", "has a lease and accepted", and "has a lease but declined" (but NM_DHCP_CLIENT_SIGNAL_STATE_CHANGED gets only emitted when we get the lease, not when we accept/decline it). With dhclient, when we receive a lease, it means "has a lease and accepted" right away.

  • for IPv6 prefix delegation, there is also needed_prefixes and NM_DHCP_CLIENT_NOTIFY_TYPE_PREFIX_DELEGATED. Currently needed_prefixes needs to be specified during start (which simplifies things). Maybe needed_prefixes should be changeable at runtime. Otherwise, whether we have prefixes is similar to whether we have a lease, and the simple 3 states apply.
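The no-lease grace period from the list above (the timer that ticks whenever there is no lease) can be modelled without a main loop by passing the current time explicitly. This is a rough sketch under stated assumptions: the `SketchNoLeaseTimer` type and `sketch_timer_*` functions are hypothetical, and firing only once per no-lease period is an assumption rather than documented behavior.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sketch; names are illustrative, not NetworkManager's API. */
typedef struct {
    int64_t timeout_msec;  /* grace period given at start (cf. ipv4.dhcp-timeout) */
    int64_t deadline_msec; /* absolute deadline, or -1 while the timer is not armed */
    bool    has_lease;
} SketchNoLeaseTimer;

/* At start there is no lease yet, so the timer begins ticking immediately. */
static void
sketch_timer_start(SketchNoLeaseTimer *t, int64_t timeout_msec, int64_t now_msec)
{
    t->timeout_msec  = timeout_msec;
    t->has_lease     = false;
    t->deadline_msec = now_msec + timeout_msec;
}

/* Called when a lease is obtained or lost. Gaining a lease disarms the
 * timer; losing it re-arms the timer with a fresh grace period. */
static void
sketch_timer_set_lease(SketchNoLeaseTimer *t, bool has_lease, int64_t now_msec)
{
    if (t->has_lease == has_lease)
        return; /* a lease update or a repeated loss does not restart the timer */
    t->has_lease     = has_lease;
    t->deadline_msec = has_lease ? -1 : now_msec + t->timeout_msec;
}

/* Returns true when the grace period has elapsed without a lease.
 * Assumption: the notification fires once per no-lease period. */
static bool
sketch_timer_poll(SketchNoLeaseTimer *t, int64_t now_msec)
{
    if (t->deadline_msec < 0 || now_msec < t->deadline_msec)
        return false;
    t->deadline_msec = -1; /* one-shot until a lease is gained and lost again */
    return true;
}
```

In a GLib-based implementation the polling would presumably be replaced by a timeout source on the main context. The explicit `now_msec` argument is used here only to keep the sketch self-contained and easy to test.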

When NetworkManager quits, it may want to leave the interface up. In that case, we still always want to stop the DHCP client, but possibly without deconfiguring the interface. I don't think that this concerns NMDhcpClient, because NMDhcpClient only provides the lease information and NMDevice is responsible for configuring it.