In systemd, it's common that a D-Bus activatable service references
`SystemdService=dbus-$BUSNAME.service` instead the real service name.
Together with an `[Install].Alias=dbus-$BUSNAME.service` directive,
this allowed to enable/disable D-Bus activation without uninstalling the
service altogether ([1]).
Currently, when we install the RPM then `nm-priv-helper.service` is not
enabled, consequently the alias is not created, and D-Bus activation
does not work. I guess, we should fix that by enabling the service in
the %post section or via a systemd preset? Dunno.
Anyway. It seems that nm-priv-helper.service is more of an
implementation detail of NetworkManager. It makes not sense for the user
to interact directly, or to enable/disable D-Bus activation (because
that is how it works).
So, drop the alias.
See-also: [1] https://docs.fedoraproject.org/en-US/packaging-guidelines/Systemd/#activation_dbus
"-" is not allowed as D-Bus path and interface name, and discouraged as
bus name. This cause nm-priv-helper to crash, because GDBus asserts the
the object path is valid.
Replace the '-' with '_'. This way, it's consistent with
"nm_dispatcher".
Fixes: d68ab6b8f0 ('nm-sudo: rename to nm-priv-helper')
The signal parameters are G_TYPE_UINT. We should not assume that our
enums are a compatible integer type.
In practice of course they always were. Let's just be clear about when
we have an integer and when we have an enum.
- use slice allocator
- use designated initializers
- first determine all parameters in killswitch_new() before
setting ks.
- drop unnecessary memset().
Naming is important. Especially when we have a 8k LOC monster, that
manages everything.
Rename things related to Rfkill to give them a common prefix.
Also, move code around so it's beside each other.
RadioState contained both the mutable user-data and meta-data about the
radio state. The latter is immutable. The parts that cannot change, should
be separate from those that can change. Move them to a separate array.
Of course, we only have on NMManager instance, so having these fields embedded
in NMManagerPrivate did not waste memory. This change is done, because immutable
fields should be const. In this case, they are now const global data, which
is even protected by the linker from accidental mutation.
Names in header files should have an "nm" prefix. We do that pretty
consistently. Fix the offenders RfKillState and RfKillType.
Also, rename the RfKillState enums to follow the type name. For example,
NM_RFKILL_STATE_SOFT_BLOCKED instead of RFKILL_SOFT_BLOCKED.
Also, when we camel-case a typedef (NMRfKillState) we would want that
the lower-case names use underscore between the words. So it should be
`nm_rf_kill_state_to_string()`. But that looks awkward. So the right solution
here is to also rename "RfKill" to "Rfkill". That make is consistent
with the spelling of the existing `NMRfkillManager` type and the
`nm-rfkill-manager.h` file.
- use "const RadioState" where possible
- use "bool" bitfields in RadioState (boolean flags in structs
should be `bool` types for consistency) and order fields by
alignment.
- break lines for variable declaration in manager_rfkill_update_one_type().
- return (and nm_assert()) from update_rstate_from_rfkill(). By
not adding a default case, compiler would warn if we forget to
handle an enum value. We can easily do that, by just returning,
and let the "default" case be handled by nm_assert_not_reached()
-- which unlike g_warn_if_reached() compiles to nothing in
release build.
- add nm_assert() that `priv->radio_states[rtype]` is not out of range.
- use designated initializers for priv->radio_states[].
GObject Properties are flexible and powerful. In practice, NMDevicePrivate.rfkill_type
was only set once via the construct-only property NM_DEVICE_RFKILL_TYPE.
Which in turn was always set to a well-known value, only depending on the device
type.
We don't need this flexibility. The rfkill-type only depends on the
device type and doesn't change. Replace the property by a field in
NMDeviceClass.
For one, construct properties have an overhead, that the property setter is
called whenever we construct a NMDevice. But the real reason for this
change, is that a property give a notion as this could change during the
lifetime of a NMDevice (which it in fact did not, being construct-only).
Or that the type depends on something more complex, when instead it only
depends on the device type. A non-mutated class property is simpler,
because it's clear that it does not depend on the device instance,
only on the type/class.
Also, `git grep -w rfkill_type` now nicely shows the (few) references to
this variable and its easier to understand.
The problem was that glib-mkenums requires all enum values on a separate
line. And clang-format would put all on the same line, unless the last
value has a trailing comma. Which is the better solution here.
We don't run glib-mkenums for certain sources like "core" and
"libnm-glib-aux".
These annotations have no effect. Drop them.
They also mess with the automated formatting.
We made the choice, that NMPlatformIPRoute does not contain the actual
route table, instead it contains a "remapped" number: table_coerced.
That remapping done, so that the default (which we want semantically to
be 254, RT_TABLE_MAIN) is numerical zero so that struct initialization
doesn't you require to explicitly set the default.
Hence, we must always distinguish whether we have the real table number
or the "table_coerced", and you must convert back and forth between the
two.
Now, the parameter of nm_l3_config_data_merge() are real table numbers
(as also indicated by their name not having the term "coerced"). So
usually they are set to actually 254.
When we set the field of NMPlatformIPRoute, we must coerce it. This was
wrong, and we would see wrong table numbers in the log:
l3cfg[17b98e59a477b0f4,ifindex=2]: obj-state: track: [2a32eca99405767e, ip4-route, type unicast table 0 0.0.0.0/0 via ...
Fixes: b4aa35e72d ('l3cfg: extend nm_l3cfg_add_config() to accept default route table and metric')
gcc-4.8.5-44.el7.x86_64 warns:
In file included from ./src/libnm-systemd-shared/src/basic/hashmap.h:10:0,
from ./src/libnm-systemd-shared/src/shared/dns-domain.h:10,
from src/libnm-systemd-shared/nm-sd-utils-shared.c:12:
./src/libnm-systemd-shared/src/basic/util.h: In function 'log2u64':
./src/libnm-systemd-shared/src/basic/util.h:30:20: error: first argument to '__builtin_choose_expr' not a constant
#define LOG2ULL(x) __builtin_choose_expr(__builtin_constant_p(x), CONST_LOG2ULL(x), NONCONST_LOG2ULL(x))
^
./src/libnm-systemd-shared/src/basic/util.h:34:16: note: in expansion of macro 'LOG2ULL'
return LOG2ULL(x);
^
./src/libnm-systemd-shared/src/basic/util.h: In function 'log2i':
./src/libnm-systemd-shared/src/basic/util.h:53:18: error: first argument to '__builtin_choose_expr' not a constant
#define LOG2U(x) __builtin_choose_expr(__builtin_constant_p(x), CONST_LOG2U(x), NONCONST_LOG2U(x))
^
./src/libnm-systemd-shared/src/basic/util.h:56:16: note: in expansion of macro 'LOG2U'
return LOG2U(x);
^
./src/libnm-systemd-shared/src/basic/util.h: In function 'log2u':
./src/libnm-systemd-shared/src/basic/util.h:53:18: error: first argument to '__builtin_choose_expr' not a constant
#define LOG2U(x) __builtin_choose_expr(__builtin_constant_p(x), CONST_LOG2U(x), NONCONST_LOG2U(x))
^
./src/libnm-systemd-shared/src/basic/util.h:60:16: note: in expansion of macro 'LOG2U'
return LOG2U(x);
^
gcc-4.8.5-44.el7.x86_64 warns:
In file included from ./src/libnm-systemd-shared/src/basic/hashmap.h:10:0,
from ./src/libnm-systemd-shared/src/shared/dns-domain.h:10,
from src/libnm-systemd-shared/nm-sd-utils-shared.c:12:
./src/libnm-systemd-shared/src/basic/util.h: In function 'log2u64':
./src/libnm-systemd-shared/src/basic/util.h:30:20: error: first argument to '__builtin_choose_expr' not a constant
#define LOG2ULL(x) __builtin_choose_expr(__builtin_constant_p(x), CONST_LOG2ULL(x), NONCONST_LOG2ULL(x))
^
./src/libnm-systemd-shared/src/basic/util.h:34:16: note: in expansion of macro 'LOG2ULL'
return LOG2ULL(x);
^
./src/libnm-systemd-shared/src/basic/util.h: In function 'log2i':
./src/libnm-systemd-shared/src/basic/util.h:53:18: error: first argument to '__builtin_choose_expr' not a constant
#define LOG2U(x) __builtin_choose_expr(__builtin_constant_p(x), CONST_LOG2U(x), NONCONST_LOG2U(x))
^
./src/libnm-systemd-shared/src/basic/util.h:56:16: note: in expansion of macro 'LOG2U'
return LOG2U(x);
^
./src/libnm-systemd-shared/src/basic/util.h: In function 'log2u':
./src/libnm-systemd-shared/src/basic/util.h:53:18: error: first argument to '__builtin_choose_expr' not a constant
#define LOG2U(x) __builtin_choose_expr(__builtin_constant_p(x), CONST_LOG2U(x), NONCONST_LOG2U(x))
^
./src/libnm-systemd-shared/src/basic/util.h:60:16: note: in expansion of macro 'LOG2U'
return LOG2U(x);
^
The metered status can depend on the DHCP lease, as we accept the
ANDROID_METERED vendor option. That means, on a DHCP update we need
to re-evaluate the metered flag.
This fixes a potential race, where IPv6 might succeed first and
activation completes (with GUESS_NO metered flag). A subsequent
DHCPv4 update requires to re-evaluate that decision.
Fixes-test: @connection_metered_guess_yes
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1080
We always call nl_recv() in a loop. If it would be necessary to clear
the variable, then it would need to happen inside the loop. But it's
not necessary.
Instead of allocating a receive buffer for each nl_recv() call, re-use a
pre-allocated buffer.
The buffer is part of NMPlatform and kept around. As we would not have more
than one NMPlatform instance per netns, we waste a limited amount of
memory.
The buffer gets initialized with 32k, which should be large enough for
any rtnetlink message that we might receive. As before, if the buffer
would be too small, then the first large message would be lost (as we don't
peek). But then we would allocate a larger buffer, resync the platform cache
and recover.
Add parameter to accept a pre-allocated buffer for nl_recv(). In
practice, rtnetlink messages are not larger than 32k, so we can always
pre-allocate it and avoid the need to heap allocate it.
In the past, nl_recv() was libnl3 code and thus used malloc()/realloc() to
allocate the buffer. That meant, we should free the buffer with libc's free()
function instead of glib's g_free(). That is what nm_auto_free is for.
Nowadays, nl_recv() is forked and glib-ified, and uses the glib wrappers to
allocate the buffer. Thus the buffer should also be freed with g_free()
(and gs_free).
In practice there is no difference, because glib's allocation directly
uses libc's malloc/free. This is purely a matter of style.
When the NM_UNMANAGED_PLATFORM_INIT flag is cleared last in
device_link_changed(), a recheck-assume is scheduled and then the
device goes immediately to UNAVAILABLE. During the state transition,
addresses and routes are removed from the interface. Then,
recheck-assume finds that the device can be assumed but it's too late
since the device was already deconfigured.
This is a problem as the whole point of assuming a device is to
activate a connection while leaving the device untouched.
In the NMCI "dracut_NM_vlan_over_bridge and dracut_NM_vlan_over_bond"
test, NM in real root tries to assume a vlan device that was activated
in initrd. When the interface gets deconfigured in UNAVAILABLE, the
connection to the NFS server breaks and the rootfs becomes
inaccessible.
The fix to this problem is to delay state transitions in
device_link_changed() to a idle handler, so that recheck-assume can
run before.
Fixes-test: @dracut_NM_vlan_over_bridge
Fixes-test: @dracut_NM_vlan_over_bond
https://bugzilla.redhat.com/show_bug.cgi?id=2047302
nm_device_set_unmanaged_by_user_settings() does nothing when the
device is unmanaged by platform-init. Remove the if branch to make
this more explicit.
The ACD state handling is unfortunately very complicated. That is, because
we obviously need to track state about how ACD is going (the acd_data, and
in particular NML3AcdAddrState). Then there are various things that can
happen, which are the AcdStateChangeMode enums. All these state-changes
come together in one function: _l3_acd_data_state_change(), which is
therefore complicated (I don't think that it would become simpler by
spreading this code out to different functions, on the contrary).
Anyway.
So, what happens when we need to reset the n-acd instance? For example,
because the MAC address of the link changed or some error. I guess, we
need to restart probing.
Previously, I think this was not handled properly. We already tried to
fix this several times, the last was commit b331606386 ('l3cfg: on
n-acd instance-reset clear also ready ACD state'). There is still an
issue ([1]).
The bug [1] is, that we are in state NM_L3_ACD_ADDR_STATE_READY, during
ACD_STATE_CHANGE_MODE_TIMEOUT event. That leads to an assertion
failure.
#5 0x00007f23be74698f in g_assertion_message_expr (domain=0x5629aca70359 "nm", file=0x5629aca62aab "src/core/nm-l3cfg.c", line=2395, func=0x5629acb26b30 <__func__.72.lto_priv.4> "_l3_acd_data_state_change", expr=<optimized out>) at ../glib/gtestutils.c:3091
#6 0x00005629ac937e46 in _l3_acd_data_state_change (self=0x5629add69790, acd_data=0x5629add8d520, state_change_mode=ACD_STATE_CHANGE_MODE_TIMEOUT, sender_addr=0x0, p_now_msec=0x7ffded506460) at src/core/nm-l3cfg.c:2395
#7 0x00005629ac939f4d in _l3_acd_data_timeout_cb (user_data=user_data@entry=0x5629add8d520) at src/core/nm-l3cfg.c:1933
#8 0x00007f23be71c5a1 in g_timeout_dispatch (source=0x5629addd7a80, callback=0x5629ac939ee0 <_l3_acd_data_timeout_cb>, user_data=0x5629add8d520) at ../glib/gmain.c:4889
#9 0x00007f23be71bd4f in g_main_dispatch (context=0x5629adc6da00) at ../glib/gmain.c:3337
#10 g_main_context_dispatch (context=0x5629adc6da00) at ../glib/gmain.c:4055
That can only happen, (I think) when we scheduled the timeout
during an earlier ACD_STATE_CHANGE_MODE_INSTANCE_RESET event. Meaning,
we need to handle instance-reset better.
Instead, during instance-reset, switch always back to state PROBING, and
let the timeout figure it out.
[1] https://bugzilla.redhat.com/show_bug.cgi?id=2047788
The first point is that ACD timeout is strongly tied to the current state. That
is (somewhat) obvious, because _l3_acd_data_state_set_full() will clear any pending
timeout. So you can only schedule a timeout *after* setting the state,
and setting the next state, will clear the timeout.
Likewise, note that l3_acd_data_state_change() for the event
ACD_STATE_CHANGE_MODE_TIMEOUT asserts that it is only called in the few
states where that is expected. See rhbz#2047788 where that assertion
gets hit.
The first point means that we must only schedule a timer when we are
also in a state that supports that. Add an assertion for that at the
point when scheduling the timeout. The assert at this point is useful,
because it catches the moment when we do the wrong thing (instead of
getting the assertion later during the timeout, when we no longer know
where the error happened).
See-also: https://bugzilla.redhat.com/show_bug.cgi?id=2047788