The general idea is that when we have entries tracked by the
route-manager, that we can mark them all as dirty. Then, calling the
"track" function will reset the dirty flag. Finally, there is a method
to delete all dirty entries.
As we can lookup an entry with O(1) (using dictionaries), we can
sync the list of tracked objects with O(n). We just need to track
all the ones we care about, and then delete those that were not touched
(that is, are still dirty).
Previously, we had to explicitly mark all entries as dirty. We can do
better. Just let nmp_route_manager_untrack_all() mark the survivors as
dirty right away. This way, we can save iterating the list once.
It also makes sense because the only purpose of the dirty flag is to
aid this prune mechanism with track/untrack-all. So, untrack-all can
just help out, and leave the remaining entries dirty, so that the next
track does the right thing.
Routes of type blackhole, unreachable, prohibit don't have an
ifindex/device. They are thus in many ways similar to routing rules,
as they are global. We need a mediator to keep track which routes
to configure.
This will be very similar to what NMPRulesManager already does for
routing rules. Rename the API, so that it also can be used for routes.
Renaming the file will be done next, so that git's rename detection
doesn't get too confused.
So far, certain NMObject types could not have an ifindex of zero. Hence,
nmp_lookup_init_object() took such an ifindex to mean lookup all objects
of that type.
Soon, we will support blackhole/unreachable/prohibit route types, which
have their ifindex set to zero. It is still useful to lookup those routes
types via nmp_lookup_init_object().
Change behaviour how to interpret the ifindex. Note that this also
affects various callers of nmp_lookup_init_object(). If somebody was
relying on the previous behavior, it would need fixing.
The signal parameters are G_TYPE_UINT. We should not assume that our
enums are a compatible integer type.
In practice of course they always were. Let's just be clear about when
we have an integer and when we have an enum.
- use slice allocator
- use designated initializers
- first determine all parameters in killswitch_new() before
setting ks.
- drop unnecessary memset().
Naming is important. Especially when we have a 8k LOC monster, that
manages everything.
Rename things related to Rfkill to give them a common prefix.
Also, move code around so it's beside each other.
RadioState contained both the mutable user-data and meta-data about the
radio state. The latter is immutable. The parts that cannot change, should
be separate from those that can change. Move them to a separate array.
Of course, we only have on NMManager instance, so having these fields embedded
in NMManagerPrivate did not waste memory. This change is done, because immutable
fields should be const. In this case, they are now const global data, which
is even protected by the linker from accidental mutation.
Names in header files should have an "nm" prefix. We do that pretty
consistently. Fix the offenders RfKillState and RfKillType.
Also, rename the RfKillState enums to follow the type name. For example,
NM_RFKILL_STATE_SOFT_BLOCKED instead of RFKILL_SOFT_BLOCKED.
Also, when we camel-case a typedef (NMRfKillState) we would want that
the lower-case names use underscore between the words. So it should be
`nm_rf_kill_state_to_string()`. But that looks awkward. So the right solution
here is to also rename "RfKill" to "Rfkill". That make is consistent
with the spelling of the existing `NMRfkillManager` type and the
`nm-rfkill-manager.h` file.
- use "const RadioState" where possible
- use "bool" bitfields in RadioState (boolean flags in structs
should be `bool` types for consistency) and order fields by
alignment.
- break lines for variable declaration in manager_rfkill_update_one_type().
- return (and nm_assert()) from update_rstate_from_rfkill(). By
not adding a default case, compiler would warn if we forget to
handle an enum value. We can easily do that, by just returning,
and let the "default" case be handled by nm_assert_not_reached()
-- which unlike g_warn_if_reached() compiles to nothing in
release build.
- add nm_assert() that `priv->radio_states[rtype]` is not out of range.
- use designated initializers for priv->radio_states[].
GObject Properties are flexible and powerful. In practice, NMDevicePrivate.rfkill_type
was only set once via the construct-only property NM_DEVICE_RFKILL_TYPE.
Which in turn was always set to a well-known value, only depending on the device
type.
We don't need this flexibility. The rfkill-type only depends on the
device type and doesn't change. Replace the property by a field in
NMDeviceClass.
For one, construct properties have an overhead, that the property setter is
called whenever we construct a NMDevice. But the real reason for this
change, is that a property give a notion as this could change during the
lifetime of a NMDevice (which it in fact did not, being construct-only).
Or that the type depends on something more complex, when instead it only
depends on the device type. A non-mutated class property is simpler,
because it's clear that it does not depend on the device instance,
only on the type/class.
Also, `git grep -w rfkill_type` now nicely shows the (few) references to
this variable and its easier to understand.
We don't run glib-mkenums for certain sources like "core" and
"libnm-glib-aux".
These annotations have no effect. Drop them.
They also mess with the automated formatting.
We made the choice, that NMPlatformIPRoute does not contain the actual
route table, instead it contains a "remapped" number: table_coerced.
That remapping done, so that the default (which we want semantically to
be 254, RT_TABLE_MAIN) is numerical zero so that struct initialization
doesn't you require to explicitly set the default.
Hence, we must always distinguish whether we have the real table number
or the "table_coerced", and you must convert back and forth between the
two.
Now, the parameter of nm_l3_config_data_merge() are real table numbers
(as also indicated by their name not having the term "coerced"). So
usually they are set to actually 254.
When we set the field of NMPlatformIPRoute, we must coerce it. This was
wrong, and we would see wrong table numbers in the log:
l3cfg[17b98e59a477b0f4,ifindex=2]: obj-state: track: [2a32eca99405767e, ip4-route, type unicast table 0 0.0.0.0/0 via ...
Fixes: b4aa35e72d ('l3cfg: extend nm_l3cfg_add_config() to accept default route table and metric')
The metered status can depend on the DHCP lease, as we accept the
ANDROID_METERED vendor option. That means, on a DHCP update we need
to re-evaluate the metered flag.
This fixes a potential race, where IPv6 might succeed first and
activation completes (with GUESS_NO metered flag). A subsequent
DHCPv4 update requires to re-evaluate that decision.
Fixes-test: @connection_metered_guess_yes
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1080
When the NM_UNMANAGED_PLATFORM_INIT flag is cleared last in
device_link_changed(), a recheck-assume is scheduled and then the
device goes immediately to UNAVAILABLE. During the state transition,
addresses and routes are removed from the interface. Then,
recheck-assume finds that the device can be assumed but it's too late
since the device was already deconfigured.
This is a problem as the whole point of assuming a device is to
activate a connection while leaving the device untouched.
In the NMCI "dracut_NM_vlan_over_bridge and dracut_NM_vlan_over_bond"
test, NM in real root tries to assume a vlan device that was activated
in initrd. When the interface gets deconfigured in UNAVAILABLE, the
connection to the NFS server breaks and the rootfs becomes
inaccessible.
The fix to this problem is to delay state transitions in
device_link_changed() to a idle handler, so that recheck-assume can
run before.
Fixes-test: @dracut_NM_vlan_over_bridge
Fixes-test: @dracut_NM_vlan_over_bond
https://bugzilla.redhat.com/show_bug.cgi?id=2047302
nm_device_set_unmanaged_by_user_settings() does nothing when the
device is unmanaged by platform-init. Remove the if branch to make
this more explicit.
The ACD state handling is unfortunately very complicated. That is, because
we obviously need to track state about how ACD is going (the acd_data, and
in particular NML3AcdAddrState). Then there are various things that can
happen, which are the AcdStateChangeMode enums. All these state-changes
come together in one function: _l3_acd_data_state_change(), which is
therefore complicated (I don't think that it would become simpler by
spreading this code out to different functions, on the contrary).
Anyway.
So, what happens when we need to reset the n-acd instance? For example,
because the MAC address of the link changed or some error. I guess, we
need to restart probing.
Previously, I think this was not handled properly. We already tried to
fix this several times, the last was commit b331606386 ('l3cfg: on
n-acd instance-reset clear also ready ACD state'). There is still an
issue ([1]).
The bug [1] is, that we are in state NM_L3_ACD_ADDR_STATE_READY, during
ACD_STATE_CHANGE_MODE_TIMEOUT event. That leads to an assertion
failure.
#5 0x00007f23be74698f in g_assertion_message_expr (domain=0x5629aca70359 "nm", file=0x5629aca62aab "src/core/nm-l3cfg.c", line=2395, func=0x5629acb26b30 <__func__.72.lto_priv.4> "_l3_acd_data_state_change", expr=<optimized out>) at ../glib/gtestutils.c:3091
#6 0x00005629ac937e46 in _l3_acd_data_state_change (self=0x5629add69790, acd_data=0x5629add8d520, state_change_mode=ACD_STATE_CHANGE_MODE_TIMEOUT, sender_addr=0x0, p_now_msec=0x7ffded506460) at src/core/nm-l3cfg.c:2395
#7 0x00005629ac939f4d in _l3_acd_data_timeout_cb (user_data=user_data@entry=0x5629add8d520) at src/core/nm-l3cfg.c:1933
#8 0x00007f23be71c5a1 in g_timeout_dispatch (source=0x5629addd7a80, callback=0x5629ac939ee0 <_l3_acd_data_timeout_cb>, user_data=0x5629add8d520) at ../glib/gmain.c:4889
#9 0x00007f23be71bd4f in g_main_dispatch (context=0x5629adc6da00) at ../glib/gmain.c:3337
#10 g_main_context_dispatch (context=0x5629adc6da00) at ../glib/gmain.c:4055
That can only happen, (I think) when we scheduled the timeout
during an earlier ACD_STATE_CHANGE_MODE_INSTANCE_RESET event. Meaning,
we need to handle instance-reset better.
Instead, during instance-reset, switch always back to state PROBING, and
let the timeout figure it out.
[1] https://bugzilla.redhat.com/show_bug.cgi?id=2047788
The first point is that ACD timeout is strongly tied to the current state. That
is (somewhat) obvious, because _l3_acd_data_state_set_full() will clear any pending
timeout. So you can only schedule a timeout *after* setting the state,
and setting the next state, will clear the timeout.
Likewise, note that l3_acd_data_state_change() for the event
ACD_STATE_CHANGE_MODE_TIMEOUT asserts that it is only called in the few
states where that is expected. See rhbz#2047788 where that assertion
gets hit.
The first point means that we must only schedule a timer when we are
also in a state that supports that. Add an assertion for that at the
point when scheduling the timeout. The assert at this point is useful,
because it catches the moment when we do the wrong thing (instead of
getting the assertion later during the timeout, when we no longer know
where the error happened).
See-also: https://bugzilla.redhat.com/show_bug.cgi?id=2047788
Now that higher priorities numbers really mean more important,
it seems that the VPN configuration should be rather important.
Bump the number, also in relation to NMDevice's L3ConfigDataType.
It might not matter too much, because usually the VPN tunnel device does
not have NDevice to add other l3cds and those from VPN might be alone.
Except, maybe with routing VPN (libreswan) that is different. Dunno.
NM_L3CFG_CONFIG_PRIORITY_IPV6LL and L3_CONFIG_DATA_TYPE_LL_6 are
somewhat related. They probably should be numerically identical.
Now, L3_CONFIG_DATA_TYPE_LL_6 cannot be zero (because
L3_CONFIG_DATA_TYPE_LL_4 is zero and L3ConfigDataType values must be
distinct). Therefore, also bump NM_L3CFG_CONFIG_PRIORITY_IPV6LL to 1.
Of course, NM_L3CFG_CONFIG_PRIORITY_IPV6LL is not obviously more important
than NM_L3CFG_CONFIG_PRIORITY_IPV4LL. But to be consistent with
L3_CONFIG_DATA_TYPE_LL_6 is probably more important here.
There was a disagreement whether higher priority numbers mean more/less
important. NMDevice and callers of nm_l3cfg_add_config() assumed that
higher numbers are more important, while _l3_config_datas_get_sorted_cmp()
did the inverse.
There is no obvious right or wrong. We only need to agree. It seems
slightly more sensible to keep the caller's interpretation.
This commit does not seem correct. The enum was moved with the declared
intent to make manual IP configuration preferred. But the code comment
in L3ConfigDataType states that higher numbers are more important.
Also, all the other values are intentionally ordered so that more
important method have higher numbers. For example, LL_4 < DHCP_4 < SHARED_4 <
DEV_4 (in increasing priority). While it's often not clear whether to
prefer one method over the other, or what the actual effect of (LL_4 < DHCP_4)
is, the numbers were chosen intentionally.
Only moving MANUALIP first, counters the relative order of the other values.
If there is the problem, that the code comment is wrong (and lower
numbers mean more important), then we would have to reverse all
enum values.
The real problem is that NML3Cfg's _l3_config_datas_get_sorted_cmp()
has the wrong order/understanding. So the real fix is there and will
be done next. That is, unless we would agree that _l3_config_datas_get_sorted_cmp()
is in the right, and prefer lower numbers -- in which case all values had to be
reversed.
This reverts commit af1903fe3f.
Works for the internal DHCP client only as sd-dhcp does the option
parsing for us.
The option 56 is not understood by dhclient so we would need to parse it
ourselves. Let's not do it for now, as the RFC seems to written in a
somewhat poor taste.
https://bugzilla.redhat.com/show_bug.cgi?id=2047415#c2
We have global data NMDhcpOption that describes the DHCP meta data.
There is a consistency check with NM_MORE_ASSERTS.
Improve the error message when the meta data is inconsistent to
help finding the bug.
When NetworkManager shuts down, it is not done properly. We cannot
ensure that all pending async operations are cancelled and therefore
there are possible device leaks. This means that not all the devices
will be disposed and finalized because a different component is handling
a reference to it.
Since the l3cfg refactor, this is affecting to ovs ports. When shutting
down sometimes there are ovs-interface or ovs-port devices that are not
being cleared correctly. When starting NetworkManager back, these
devices are not going to be created again because they already exist and
the existing compatible connections will be instruct to use the device.
But currently, ovsdb is only unrealizing a device after removal if the
state is UNMANAGED. This is wrong, because it will left an inconsistent
state in NetworkManager and the ovs-port/ovs-interface connection won't
be activated.
The interfaces removed by ovsdb must be unrealized if they are on
UNAVAILABLE state.
https://bugzilla.redhat.com/show_bug.cgi?id=2029937
The properties NM_DEVICE_MODEM_CAPABILITIES and
NM_DEVICE_MODEM_CURRENT_CAPABILITIES are uint, with a range from zero to
G_MAXUINT32.
nm_modem_get_capabilities() passes the "untrusted" flags from
ModemManager. Ensure that they fit into 32 bit, and don't cause an
assertion failure due to the range check.
Yes, in practice, on all platforms where we build (known to me), guint
is always the same as guint32. So this has little effect.
Also, cast to guint. Previously, this was just a C enum
NMDeviceModemCapabilities, which theoretically has implementation
defined sizes. Yes, in practice, they too are of size guint, and this
was not an actual bug. But be specific about converting between
different integer types.
Macros should (where possible and sensible) behave function-like.
That means for example, that they evaluate arguments only once, and
use parentheses around code that expands, so that unexpected uses
work correctly. The parentheses was missing.
Instead, just use the NM_FLAGS_ANY() macro which gets this right.