This is an interface to the Checkpoint/Restore functionality that has
been available for quite some time. It runs a command with a checkpoint
taken and rolls back unless success is confirmed before the checkpoint
times out:
$ nmcli dev checkpoint eth0 -- nmcli dev dis eth0
Device 'eth0' successfully disconnected.
Type "Yes" to commit the changes: No
Checkpoint was removed.
The details about how it's used are documented in nmcli(1) and
nmcli-examples(7).
When the input ends, we indeed eventually want to shut down.
Nevertheless, it might be that we terminated the input *because* we're
already shutting down and want to do our cleanup. Let's not take the
shortcut to nmc_exit() in case the main loop is no longer running.
This doesn't affect existing uses of nmc_readline(), but will be useful
in a future patch.
This makes get_device_list() return an array of NMDevices with a
reference taken and a destroy notifier that unhooks disconnect_state_cb,
so that it can replace the GSList that serves the same purpose in the
disconnect/delete commands.
Suggested-by: Thomas Haller <thaller@redhat.com>
A pointer array is slightly more efficient here, since we don't really
need the ability to insert elements in the middle. In fact, we'd prefer
if we could just add to the end, so that we'd spare some callers from a
need to do a g_slist_reverse().
Even though that alone is a good reason to use a GPtrArray instead of a
GSList, I'm doing this so that I can use the returned value as-is in a
call to nm_client_checkpoint_create() in a future patch.
Don't consider "--" a device name. Instead, treat it as a signal to stop
reading the device list.
If a caller expects nothing beyond the device names, it now has to
check that no arguments are left over.
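A minimal sketch of the "--" rule, in Python rather than nmcli's C.
split_device_args() is a hypothetical helper, not get_device_list(); it
only models "stop reading device names at the first --" and hands the
leftovers back to the caller:

```python
def split_device_args(argv):
    """Split argv into (device_names, remaining_args) at the first "--".

    Hypothetical helper that only models the rule described above.
    """
    devices = []
    for i, arg in enumerate(argv):
        if arg == "--":
            # "--" is a terminator, not a device name; everything after
            # it is left for the caller.
            return devices, argv[i + 1:]
        devices.append(arg)
    return devices, []


devices, rest = split_device_args(
    ["eth0", "eth1", "--", "nmcli", "dev", "dis", "eth0"])
print(devices, rest)

# A caller that accepts nothing but device names has to check for leftovers.
devices, rest = split_device_args(["eth0", "eth1"])
if rest:
    raise SystemExit("unexpected extra arguments: %r" % (rest,))
```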
Prior to this patch, get_device_list() would give the caller no clue
about how many arguments it consumed. That was okay -- it would always
process all arguments until the end, so no callers would really care.
In a further patch, I'd like to allow termination of the device name
list (with a "--" argument), so that it will be possible to specify
further arguments.
Let's change the prototype of this routine to take pointers to argc/argv,
so that it will be possible to adjust them.
When we're deactivating an externally created device that has a master
because we're activating a connection on it, actually remove the device
from the master. Otherwise unpleasant things happen:
active-connection[0x55ed7ba78400]: constructed (NMActRequest, version-id 4, type managed)
device[0a458361f9fed8f5] (dummy0): sys-iface-state: external -> managed
device[0a458361f9fed8f5] (dummy0): queue activation request waiting for currently active connection to disconnect
device (dummy0): disconnecting for new activation request.
device (dummy0): state change: activated -> deactivating (reason 'new-activation', sys-iface-state: 'managed')
device (br0): master: release one slave 0a458361f9fed8f5/dummy0 (enslaved)(no-config)
Note the "no-config" above. We set priv->master = NULL, but didn't
communicate the change to the platform. I believe this is not good.
This patch changes that.
device (br0): bridge port dummy0 was detached
device (dummy0): released from master device br0
active-connection[0x55ed7ba782e0]: set state deactivating (was activated)
device (dummy0): ip4: set state none (was done, reason: ip-state-clear)
device (dummy0): ip6: set state none (was done, reason: ip-state-clear)
device (dummy0): state change: deactivating -> disconnected (reason 'new-activation', sys-iface-state: 'managed')
platform: (dummy0) emit signal link-changed changed: 102: dummy0
<NOARP,UP,LOWER_UP;broadcast,noarp,up,running,lowerup> mtu 1500 master 101 arp 1 dummy* init
addrgenmode none addr EA:8D:DD:DF:1F:B7 brd FF:FF:FF:FF:FF:FF driver dummy rx:0,0 tx:39,4746
Now the platform sent us a new link; the "master" property is still set.
device[0a458361f9fed8f5] (dummy0): queued link change for ifindex 102
device[0a458361f9fed8f5] (dummy0): deactivating device (reason 'new-activation') [60]
device (dummy0): ip: set (combined) state none (was done, reason: ip-state-clear)
config: device-state: write #102 (/run/NetworkManager/devices/102); managed=managed, perm-hw-addr-fake=EA:8D:DD:DF:1F:B7, route-metric-default=0-0
active-connection[0x55ed7ba782e0]: set state deactivated (was deactivating)
active-connection[0x55ed7ba782e0]: check-master-ready: already signalled (state deactivated, master 0x55ed7ba781c0 is in state activated)
device (dummy0): Activation: starting connection 'dummy1' (ec6fca51-84e6-4a5b-a297-f602252c9f69)
device[0a458361f9fed8f5] (dummy0): activation-stage: schedule activate_stage1_device_prepare
l3cfg[ae290b5c1f585d6c,ifindex=102]: emit signal (platform-change-on-idle, obj-type-flags=0x2a)
device (br0): master: add one slave 0a458361f9fed8f5/dummy0
Amidst the new activation we're processing the netlink message we got.
We set priv->master back, effectively nullifying the release above. Sad.
device (dummy0): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'managed')
device[0a458361f9fed8f5] (dummy0): add_pending_action (2): 'in-state-change'
active-connection[0x55ed7ba78400]: set state activating (was unknown)
manager: NetworkManager state is now CONNECTING
active-connection[0x55ed7ba78400]: check-master-ready: not signalling (state activating, no master)
device[8fff58d61c7686ce] (br0): slave dummy0 state change 30 (disconnected) -> 40 (prepare)
device[0a458361f9fed8f5] (dummy0): remove_pending_action (1): 'in-state-change'
device (br0): master: release one slave 0a458361f9fed8f5/dummy0 (not enslaved) (force-configure)
platform: (dummy0) link: releasing 102 from master 'br0' (101)
device (br0): detached bridge port dummy0
Now things go south. The stage1 cleans the device up, removing it from
the master, and the device itself decides it should deactivate
because it lost its master, regardless of the fact that it should not
have one and that it's in fact an unwanted carryover from the previous
activation. I believe this is also wrong.
device[0a458361f9fed8f5] (dummy0): Activation: connection 'dummy1' master deactivated
device (dummy0): ip4: set state none (was pending, reason: ip-state-clear)
device (dummy0): ip6: set state none (was pending, reason: ip-state-clear)
device[0a458361f9fed8f5] (dummy0): add_pending_action (2): 'queued-state-change-deactivating'
device[0a458361f9fed8f5] (dummy0): queue-state[deactivating, reason:connection-assumed, id:298]: queue state change
device[0a458361f9fed8f5] (dummy0): activation-stage: synchronously invoke activate_stage2_device_config
device (dummy0): state change: prepare -> config (reason 'none', sys-iface-state: 'managed')
Now things are really weird. We synchronously go to config, effectively
overriding the queued deactivation. We've really messed up.
Sometimes weird things happen.
Let dummy0 be an externally created device that has a master. We decide
to activate a connection that has no master on it:
active-connection[0x55ed7ba78400]: constructed (NMActRequest, version-id 4, type managed)
device[0a458361f9fed8f5] (dummy0): sys-iface-state: external -> managed
device[0a458361f9fed8f5] (dummy0): queue activation request waiting for currently active connection to disconnect
device (dummy0): disconnecting for new activation request.
device (dummy0): state change: activated -> deactivating (reason 'new-activation', sys-iface-state: 'managed')
device (br0): master: release one slave 0a458361f9fed8f5/dummy0 (enslaved)(no-config)
Note the "no-config" above. We set priv->master = NULL, but didn't
communicate the change to the platform. I believe this is not good.
device (br0): bridge port dummy0 was detached
device (dummy0): released from master device br0
active-connection[0x55ed7ba782e0]: set state deactivating (was activated)
device (dummy0): ip4: set state none (was done, reason: ip-state-clear)
device (dummy0): ip6: set state none (was done, reason: ip-state-clear)
device (dummy0): state change: deactivating -> disconnected (reason 'new-activation', sys-iface-state: 'managed')
platform: (dummy0) emit signal link-changed changed: 102: dummy0
<NOARP,UP,LOWER_UP;broadcast,noarp,up,running,lowerup> mtu 1500 master 101 arp 1 dummy* init
addrgenmode none addr EA:8D:DD:DF:1F:B7 brd FF:FF:FF:FF:FF:FF driver dummy rx:0,0 tx:39,4746
Now the platform sent us a new link; the "master" property is still set.
device[0a458361f9fed8f5] (dummy0): queued link change for ifindex 102
device[0a458361f9fed8f5] (dummy0): deactivating device (reason 'new-activation') [60]
device (dummy0): ip: set (combined) state none (was done, reason: ip-state-clear)
config: device-state: write #102 (/run/NetworkManager/devices/102); managed=managed, perm-hw-addr-fake=EA:8D:DD:DF:1F:B7, route-metric-default=0-0
active-connection[0x55ed7ba782e0]: set state deactivated (was deactivating)
active-connection[0x55ed7ba782e0]: check-master-ready: already signalled (state deactivated, master 0x55ed7ba781c0 is in state activated)
device (dummy0): Activation: starting connection 'dummy1' (ec6fca51-84e6-4a5b-a297-f602252c9f69)
device[0a458361f9fed8f5] (dummy0): activation-stage: schedule activate_stage1_device_prepare
l3cfg[ae290b5c1f585d6c,ifindex=102]: emit signal (platform-change-on-idle, obj-type-flags=0x2a)
device (br0): master: add one slave 0a458361f9fed8f5/dummy0
Amidst the new activation we're processing the netlink message we got.
We set priv->master back, effectively nullifying the release above.
device (dummy0): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'managed')
device[0a458361f9fed8f5] (dummy0): add_pending_action (2): 'in-state-change'
active-connection[0x55ed7ba78400]: set state activating (was unknown)
manager: NetworkManager state is now CONNECTING
active-connection[0x55ed7ba78400]: check-master-ready: not signalling (state activating, no master)
device[8fff58d61c7686ce] (br0): slave dummy0 state change 30 (disconnected) -> 40 (prepare)
device[0a458361f9fed8f5] (dummy0): remove_pending_action (1): 'in-state-change'
device (br0): master: release one slave 0a458361f9fed8f5/dummy0 (not enslaved) (force-configure)
platform: (dummy0) link: releasing 102 from master 'br0' (101)
device (br0): detached bridge port dummy0
Now stage1 cleans the device up, removing it from the master.
device[0a458361f9fed8f5] (dummy0): Activation: connection 'dummy1' master deactivated
device (dummy0): ip4: set state none (was pending, reason: ip-state-clear)
device (dummy0): ip6: set state none (was pending, reason: ip-state-clear)
device[0a458361f9fed8f5] (dummy0): add_pending_action (2): 'queued-state-change-deactivating'
We decide to deal with this by enqueuing a deactivation. That is not
great -- we shouldn't even have had this master!
This patch takes the deactivation path only if we were willingly
enslaved to the master in question.
The @bond_mode_8023ad test has been seen failing, with a log like this:
<debug> [...3.0484] device[...] (eth1): Activation: connection 'bond0.0' master deactivated
<debug> [...3.0484] device[...] (eth1): add_pending_action (2): 'queued-state-change-deactivating'
<debug> [...3.0484] device[...] (eth1): queue-state[deactivating, reason:new-activation, id:709]: queue state change
What happened is that eth1 has been activating. It was already enslaved
to a bond and was in an ip-config state when the bond was removed.
A change to "deactivating" state has been enqueued. But then this
happened:
<trace> [...3.0942] device[...] (eth1): ip4: check-state: state done => done, is_failed=0, is_pending=0,
is_started=0 temp_na=0, may-fail-4=1, may-fail-6=1; disabled4; manualip4=done; ignore6 manualip6=done
<trace> [...3.0942] device[...] (eth1): ip: check-state: (combined) state pending => done
<debug> [...3.0943] device[...] (eth1): ip: set (combined) state done (was pending, reason: check-ip-state)
<info> [...3.0943] device (eth1): state change: ip-config -> ip-check (reason 'none', sys-iface-state: 'managed')
<debug> [...3.0943] device[...] (eth1): add_pending_action (3): 'in-state-change'
<debug> [...3.0943] device[...] (eth1): queue-state[deactivating, reason:new-activation, id:709]: clear queued state change
The IP config succeeded and the queued "deactivating" change was
overridden by the IP4 check result, prompting a change to "ip-check".
With the master still missing. Not good.
Let's terminate the attempts to check the IP state when we cancel the
activation, so that it doesn't override the enqueued state change.
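The race can be sketched with a toy model in Python. All names here are
hypothetical and are not NMDevice internals; the sketch only assumes what
the log shows, namely that an immediate state change clears a queued one:

```python
class Device:
    """Toy model; names are hypothetical, not NMDevice internals."""

    def __init__(self):
        self.state = "ip-config"
        self.queued_state = None
        self.ip_check_pending = True

    def queue_state(self, state):
        self.queued_state = state

    def set_state(self, state):
        # An immediate state change clears any queued one, as in the log.
        self.state = state
        self.queued_state = None

    def cancel_activation(self):
        # Terminate the IP state check so it cannot fire later, then
        # queue the deactivation.
        self.ip_check_pending = False
        self.queue_state("deactivating")

    def on_ip_check_done(self):
        if not self.ip_check_pending:
            return  # check was terminated; leave the queued change alone
        self.ip_check_pending = False
        self.set_state("ip-check")

    def run_queued(self):
        if self.queued_state is not None:
            self.set_state(self.queued_state)


dev = Device()
dev.cancel_activation()   # master went away
dev.on_ip_check_done()    # late IP check result is now ignored
dev.run_queued()
print(dev.state)
```

Without the flag reset in cancel_activation(), on_ip_check_done() would
move the toy device to "ip-check" and drop the queued change, which is
the sequence seen in the failing test.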
Fixes-test: @bond_mode_8023ad
https://bugzilla.redhat.com/show_bug.cgi?id=2080928
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1245
pppd also tries to configure addresses by itself through some
ioctls. If, between those calls, we remove an address that was added,
pppd fails and quits.
To avoid this race condition, don't remove addresses while IPCP and
IPV6CP are running. Once pppd sends an IP configuration, it has
finished configuring the interface and we can proceed normally.
https://bugzilla.redhat.com/show_bug.cgi?id=2085382
Currently we call nm_device_update_dynamic_ip_setup() in
carrier_changed() every time the carrier goes up again and the device
is activating, to kick a restart of DHCP.
Since we process link events in an idle handler, it can happen that the
handler is called only once for different events; in particular
device_link_changed() might be called once for a link-down/link-up
sequence.
carrier_changed() is "level-triggered" - it cares only about the
current carrier state. nm_device_update_dynamic_ip_setup() should
instead be "edge-triggered" - invoked every time the link goes from
down to up. We have a mechanism for that in device_link_changed(), use
it.
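The difference can be illustrated with a small Python sketch. The
`up_count` counter is an assumption made for illustration; the text above
only says that device_link_changed() has a mechanism for detecting the
down-to-up edge, not how it works:

```python
class Link:
    """Toy link; `up_count` is a hypothetical count of down->up edges."""

    def __init__(self):
        self.carrier = True
        self.up_count = 0

    def set_carrier(self, carrier):
        if carrier and not self.carrier:
            self.up_count += 1
        self.carrier = carrier


class Device:
    def __init__(self, link):
        self.link = link
        self.seen_carrier = link.carrier
        self.seen_up_count = link.up_count
        self.dhcp_restarts = {"level": 0, "edge": 0}

    def on_idle(self):
        """Runs once for however many link events were queued."""
        # Level-triggered: compares only the carrier state before and after.
        if self.link.carrier and not self.seen_carrier:
            self.dhcp_restarts["level"] += 1
        # Edge-triggered: notices that a down->up transition happened at all.
        if self.link.up_count != self.seen_up_count:
            self.dhcp_restarts["edge"] += 1
        self.seen_carrier = self.link.carrier
        self.seen_up_count = self.link.up_count


link = Link()
dev = Device(link)
link.set_carrier(False)  # link-down ...
link.set_carrier(True)   # ... and link-up arrive before the handler runs
dev.on_idle()
print(dev.dhcp_restarts)
```

The level check sees "up" before and after and does nothing, while the
edge check still notices the bounce and would restart DHCP once.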
Fixes-test: @ipv4_spurious_leftover_route
https://bugzilla.redhat.com/show_bug.cgi?id=2079406
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1250
IPv6 DNS servers received on a ppp interface were being ignored because
their priority was not set.
Fix this by using the default priority in
impl_ppp_manager_set_ip6_config(), as was done for ip4_config in
b2e559fab2 ("core: initialize l3cd dns-priority for ppp and wwan").
Fixes: 58287cbcc0 ('core: rework IP configuration in NetworkManager using layer 3 configuration')
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/issues/1022
Yes, we log the timestamps for every log message anyway. So one could
always calculate the offset. However, when you read a logfile, it can be
cumbersome to stop looking at where you currently are to find the
start/end of a call. For convenience, log the duration explicitly.
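As a generic illustration in Python (timed_call() and the message format
are hypothetical and not the actual NetworkManager log lines), logging the
duration at the end of a call could look like this:

```python
import logging
import time

log = logging.getLogger("example")


def timed_call(name, func, *args):
    """Call func(*args) and log how long it took (hypothetical helper)."""
    start = time.monotonic()
    log.debug("%s: start", name)
    try:
        return func(*args)
    finally:
        # The reader no longer has to subtract two timestamps by hand.
        elapsed = time.monotonic() - start
        log.debug("%s: done (took %.3f sec)", name, elapsed)
```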
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1251
- add code comments explaining some things.
- for the NM_CMP_FIELD*() variants, have a corresponding NM_CMP_DIRECT*()
macro and use it (aside from the "memcmp" variants, which don't translate
directly).
l3cd instances must be removed from the old l3cfg before calling
_cleanup_ip_pre(). Otherwise, _cleanup_ip_pre() unregisters them from
the device, and later _dev_l3_register_l3cds(self, l3cfg_old, FALSE,
FALSE) does nothing because the device doesn't have any l3cd.
Previously the l3cds would linger in the l3cfg, keeping a reference to
it and causing a memory leak; the leak was not detected by valgrind
because the l3cfg was still referenced by the NMNetns.
Fixes: 58287cbcc0 ('core: rework IP configuration in NetworkManager using layer 3 configuration')
Fixes-test: @stable_mem_consumption2
https://bugzilla.redhat.com/show_bug.cgi?id=2083453
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1252
7db7dc4bab53 probe: merge branch 'th/decline-fixes'
bb61737788dd probe: fix internal state after declining lease
c5d0f38ab7a9 probe: maintain the probe's lease list in "n-dhcp4-c-probe.c"
48bf2788336e probe: return error when calling accept/decline/select in unexpected state
git-subtree-dir: src/n-dhcp4
git-subtree-split: 7db7dc4bab5312218135464d8550a86845ca6fdd
On python2 the following error is raised:
`LookupError: unknown encoding: unicode`
Seems like `unicode` is a correct encoding in Python 3 but not 2.
Fix:
1. Change encoding to `utf-8`
2. Pass the output path string instead of opening the file and passing
the opened file object. Python 2 and 3 might need different file
modes; passing just the path lets ElementTree select the appropriate
file mode.
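A minimal sketch of both points. The element names and the output file
name are made up for illustration; the real script builds its tree from
GIR data:

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Hypothetical document, for illustration only.
root = ET.Element("nm-setting-docs")
ET.SubElement(root, "setting", name="connection")

out_path = os.path.join(tempfile.mkdtemp(), "settings-docs.xml")

# Pass the path, not an open file object, and name a real codec.
# "utf-8" is accepted by both Python 2 and Python 3.
ET.ElementTree(root).write(out_path, encoding="utf-8")

print(ET.parse(out_path).getroot().tag)
```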
Fixes: f00e90923c ('tools: Use ElementTree to write XML in generate-docs-nm-settings-docs-gir.py')
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1249
Currently, for all tests we have python3 installed. So effectively,
even on CentOS 7, we would test the build with python3 only.
The package build on CentOS7/epel7 however uses python2. This happens
for example for our copr builds.
Also test that configuration in gitlab-ci.
This was working for the internal plugin in the past, but was broken by
the l3cfg rework in 1.36. Re-add it. Now it also works with dhclient. For
other plugins, it's not really working, because we can't decline.
Now NMDhcpClient does ACD (using NML3Cfg) and abstracts that from
the caller (NMDevice).
It is complicated, because there is state involved: we need to remember
the current ACD state and handle a multitude of events. Getting this
right is non-trivial.
What we want is that if ACD fails, we decline the lease (and don't use
it).
https://bugzilla.redhat.com/show_bug.cgi?id=1713380
dhclient itself doesn't do ACD. However, it expects the dhclient-script
to exit with non-zero status, which causes dhclient to send a DECLINE.
`man dhclient-script`:
BOUND:
Before actually configuring the address, dhclient-script should
somehow ARP for it and exit with a nonzero status if it receives a
reply. In this case, the client will send a DHCPDECLINE message to
the server and acquire a different address. This may also be done in
the RENEW, REBIND, or REBOOT states, but is not required, and indeed may
not be desirable.
See also Fedora's dhclient-script ([1]).
https://gitlab.isc.org/isc-projects/dhcp/-/issues/67#note_9722633226f2d76/client/dhclient.c (L1652)
[1] a8f6fd046f/f/dhclient-script (_878)
https://bugzilla.redhat.com/show_bug.cgi?id=1713380
- assign the result of NM_DHCP_CLIENT_GET_CLASS() to a local variable.
It feels nicer to only call the macro once. Of course, the macro
expands to plain pointer dereferences, so there is little difference
in terms of executed code.
- handle the default case with no virtual function first.
It's pretty pointless to log
<trace> [1653389116.6288] dhcp4 (br0): client event 7
<debug> [1653389116.6288] dhcp4 (br0): received OFFER of 192.168.121.110 from 192.168.121.1
where the obscure event #7 is only telling you that we are going
to log something. Handle logging events first.
In general, drop the "client event %d" message and make sure that all
code paths log something (useful), so we can see in the log that the
event was reached.
When we accept/decline a lease, that only works if we are in state
GRANTED. The n-dhcp4 API also requires us to provide the exact lease
that was announced to us earlier.
As such, we need to make sure that we don't accept/decline in the wrong
state. That means keeping track of what we are doing more carefully.
The functions _dhcp_client_accept()/_dhcp_client_decline() now take
an l3cd argument, the one that we announced earlier. And we check that it
still matches.
They are no longer used from outside; NMDhcpClient fully handles this.
Make them static and internal.
Also, decline is currently unused. It will be used soon, with ACD
support.
Previously, during decline we would clear probe->current_lease,
but leave the state at GRANTED.
That is a wrong state, and can easily lead to a crash later.
For example, on the next timeout we will end up at
n_dhcp4_client_dispatch_timer(), where probe->current_lease gets
accessed unconditionally:
case N_DHCP4_CLIENT_PROBE_STATE_GRANTED:
if (ns_now >= probe->current_lease->lifetime) {
Instead, return to INIT state and schedule a timer, as suggested
by RFC 2131, section 3.1, 5) ([1]).
[1] https://datatracker.ietf.org/doc/html/rfc2131#section-3.1
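The invariant can be sketched with a toy probe in Python; this is not the
n-dhcp4 structure. The ten-second delay is the minimum that RFC 2131,
section 3.1, step 5 recommends; whether n-dhcp4 uses exactly this value
is an assumption of the sketch:

```python
import enum


class State(enum.Enum):
    INIT = 1
    GRANTED = 2


# RFC 2131 3.1 (5): the client SHOULD wait at least ten seconds before
# restarting. Using exactly this value here is an assumption.
RESTART_DELAY_SEC = 10


class Probe:
    """Toy probe, not the n-dhcp4 structure."""

    def __init__(self):
        self.state = State.INIT
        self.current_lease = None
        self.timer_deadline = None

    def grant(self, lease):
        self.state = State.GRANTED
        self.current_lease = lease

    def decline(self, now):
        self.current_lease = None
        # Leave GRANTED together with the lease, and schedule the restart.
        self.state = State.INIT
        self.timer_deadline = now + RESTART_DELAY_SEC

    def dispatch_timer(self, now):
        if self.state is State.GRANTED:
            # Safe: GRANTED always implies a current lease.
            return now >= self.current_lease["lifetime"]
        return False


probe = Probe()
probe.grant({"lifetime": 100})
probe.decline(now=5)
print(probe.state, probe.timer_deadline, probe.dispatch_timer(now=200))
```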
The lease list and the probe's state are strongly related. That is
evidenced by the fact that sometimes we check the state and then
access probe->current_lease without further checking.
The code in "n-dhcp4-c-probe.c" (select_lease, accept, decline) already
changes and maintains the state; it should also maintain the lease list.
Move the code.
The caller is supposed to call accept/decline/select with the lease that
was just announced. Calling it in the wrong state or with the wrong
lease is a user error.
Return an error when called in the wrong state, so that the user
notices they did something wrong.
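A minimal Python sketch of such a check. The error name, the string
states and the treatment of a wrong lease as the same error are all
assumptions for illustration; the text above does not name the error
code that n-dhcp4 returns:

```python
import enum


class Err(enum.Enum):
    OK = 0
    INVALID_STATE = 1  # hypothetical; the source does not name the error


class Probe:
    """Toy probe; states and names are illustrative only."""

    def __init__(self):
        self.state = "INIT"
        self.announced_lease = None

    def announce(self, lease):
        self.state = "GRANTED"
        self.announced_lease = lease

    def accept(self, lease):
        # Refuse instead of silently acting on a stale or foreign lease.
        if self.state != "GRANTED" or lease is not self.announced_lease:
            return Err.INVALID_STATE
        self.state = "BOUND"
        return Err.OK


probe = Probe()
lease = object()
too_early = probe.accept(lease)   # nothing was announced yet
probe.announce(lease)
in_time = probe.accept(lease)     # the announced lease, in GRANTED
twice = probe.accept(lease)       # no longer in GRANTED
print(too_early, in_time, twice)
```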