Run:
./contrib/scripts/nm-code-format.sh -i
./contrib/scripts/nm-code-format.sh -i
Yes, it needs to run twice because the first run doesn't yet produce the
final result.
Signed-off-by: Antonio Cardace <acardace@redhat.com>
Several macros are used to define function. They had a "_STATIC" variant,
to define the function as static.
I think those macros should not try to abstract entirely what they do.
They should not accept the function scope as argument (or have two
variants per scope). This also because it might make sense to add
additional __attribute__(()) to the function. That only works, if
the macro does not pretend to *not* define a plain function.
Instead, embrace what the function does and let the users place the
function scope as they see fit.
This also follows what is already done with
static NM_CACHED_QUARK_FCN ("autoconnect-root", autoconnect_root_quark)
curl_multi_setopt() accepts CURLMOPT_* options, not CURLOPT_*
ones. Found by GCC 10:
clients/cloud-setup/nm-http-client.c:700:38: error: implicit conversion from ‘enum <anonymous>’ to ‘CURLMoption’ [-Werror=enum-conversion]
700 | curl_multi_setopt (priv->mhandle, CURLOPT_VERBOSE, 1);
Fixes: 69f048bf0c ('cloud-setup: add tool for automatic IP configuration in cloud')
The abbreviations "ns" and "ms" seem not very clear to me. Spell them
out to nsec/msec. Also, in parts we already used the longer abbreviations,
so it wasn't consistent.
I guess, if you write portable applications, then GIOChannel makes a lot of sense.
But we know that this is on Linux. We don't need to pretend that we
cannot poll on the file descriptor directly.
Curl documents about CURLMOPT_TIMERFUNCTION:
The timer_callback will only be called when the timeout expire time is
changed.
That means, we should not cancel the timeout when it happend, but
only when the callback is called again (or during cleanup).
See-also: https://curl.haxx.se/libcurl/c/CURLMOPT_TIMERFUNCTION.html
The platform is used to detect whether to skip the connectivity check right away.
It should be an optional argument, so one could avoid this pre-check.
If the interface has no carrier, no addresses or no routes there is no
point in starting a connectivity check on it because it will fail.
Moreover, doing the check on a device without routes causes the
addition of a negative entry in the ARP table for each of the
addresses associated with the connectivity check host; this can lead
to poor network performances.
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/issues/181
We no longer add these. If you use Emacs, configure it yourself.
Also, due to our "smart-tab" usage the editor anyway does a subpar
job handling our tabs. However, on the upside every user can choose
whatever tab-width he/she prefers. If "smart-tabs" are used properly
(like we do), every tab-width will work.
No manual changes, just ran commands:
F=($(git grep -l -e '-\*-'))
sed '1 { /\/\* *-\*- *[mM]ode.*\*\/$/d }' -i "${F[@]}"
sed '1,4 { /^\(#\|--\|dnl\) *-\*- [mM]ode/d }' -i "${F[@]}"
Check remaining lines with:
git grep -e '-\*-'
The ultimate purpose of this is to cleanup our files and eventually use
SPDX license identifiers. For that, first get rid of the boilerplate lines.
We will use the D-Bus connection of our NMDBusManager singleton more.
Use a macro.
- it's shorter to type and it's one distinct word.
- the name indicates what this is: the main D-Bus connection singleton.
By searching for this name we can find all users that care about using
this singleton.
Note that various components (NMFirewallManager, NMAuthManager,
NMConnectivity, etc.pp) all request their own GDBusConnection from
glib's G_BUS_TYPE_SYSTEM singleton.
In the future, let them instead use the D-Bus connection that
NMDBusManager already has.
- NMDBusManager also uses g_bus_get(G_BUS_TYPE_SYSTEM), so in practice this
is just the same GDBusConnection instance.
- if it would not be the same GDBusConnection instance, it would
be more correct/logical to use the one that NMDBusManager uses.
- NMDBusManager already aquired the GDBusConnection synchronously
and it's ready for use. On the other hand, g_bus_get()/g_bus_get_sync()
has the notion that getting the singleton cannot be done without
waiting/blocking. So at least it involves locking or even dispatching
the async reply on D-Bus.
- in "configure-and-quit=initrd" we really don't have D-Bus available.
NMDBusManager should control whether the other components use D-Bus
or not. For example, NMFirewallManager should not ask glib for a
G_BUS_TYPE_SYSTEM singleton only to later find out that it doesn't work.
So if these components would reuse NMDBusManager's GDBusConnection,
then it must have the connection also in regular "configure-and-quit=true"
mode. In this case, we are in late boot and want do connectivity
checking and talk to firewalld.
Every (failed) attempt to D-Bus activate a service results in log-messages
from dbus-daemon. It must be avoided to spam the logs that way.
Let connectivity check not only ask whether systemd-resolved is enabled
(and NetworkManager would like to push information there), but also
whether it looks like the service is actually available. That is,
either it has a name-owner or it's not blocked from starting.
The previous workaround was to configure main.systemd-resolved=no
in NetworkManager.conf. But that requires explict configuration.
We don't need to remember (and compare) all the bytes that we received.
We can just compare them right away, and remember how many good bytes
we received.
Since we only compare that the HTTP response starts with the expected
response, we need to handle the empty expected response specially
(because, every response has "" as prefix).
So now if connectivity.response is set to "" (empty) we accept:
- HTTP status code 204. We ignore and accept any extra data that we
might receive.
- HTTP status code 200 and an empty (or no) body.
If the URI does not specify a port, we always assumed "80". That is
wrong for https. Arguably, https is discouraged for connectivity checking,
but we still shouldn't break it.
Fixes: 9664f284a1
The settings of NMConnectivity can change any time, by reloading the
configuration.
When reloading the configration, we don't want to interrupt or cancel
the pending reuqests, they should just complete with the old settings with
which they started. Note, that NMDevice is smart enough, that when a
newer request completes earlier, it invalidates all older, still pending
requests.
Anyway, that means, we cannot rely on the value to stay alive. Fix that,
by adding adding a new ref-counted struct for these parameters.
Fixes: 2cec94bacc
During a config change notification, we determine a "changed" value,
to know whether things significantly changed.
Also, we want to log a warning about invalid configuration,
only when the config actually changed. Previously, when the URI was
invalid, on every reload (SIGHUP) we would log an error message,
even if the configuration did not change. There is no need to
warn multiple times about the same thing.
Keep track of the original URI in priv->uri. Whenever that changed,
we know the user reconfigured something. But also, now the URI might
be set to an invalid value. That means, we need to remember whether
the URI is valid.
Also, log a warning if we fail to parse the host and port. Already
before, such an URI was considered invalid and we would effectively
not to connectivity checking.
- use cleanup attribute except explicit free/unref.
- check that the result has the expected address family.
Arguably, it should always, unless there is a bug in systemd-resolved.
By that reasoning, we also wouldn't have to check the address length
either.
- don't use strndup() for values that are later freed by g_free().
We should always agree whether to malloc/free or g_malloc/g_free.
- don't use strcasecmp(). Always use the locale independent g_ascii_strcasecmp()
instead.
- use nm_utils_inet_ntop() instead of inet_ntop(). It's our preferred
wrapper, which as a stricter semantic (for example, it cannot fail
and it's input arguments are stricter defined).
- use nm_clear_g_free() instead of g_clear_pointer().
If the user disabled systemd-resolved, two things seem apparent:
- the user does not want us to use systemd-resolved
- NetworkManager is not pushing the DNS configuration to
systemd-resoved.
It seems to me, we should not consult systemd-resolved in that case.
While at it, replace "AF_INET" with "IPv". The connectivity check
logging line is already much too long. Save a few characters. Also,
I think the meaning of "AF_INET" is less clear (to a novice user) than
"IPv4".
This is dead ugly and works around what probably is a libcurl (7.59) bug:
It shares the resolver cache among easy handles in a multi handle when
it shouldn't:
When we create two easy handles, one for IPv4 and one for IPv6 in a
single mhandle:
curl_easy_setopt (ehandle6, CURLOPT_RESOLVE,
"fedoraproject.org:2605:bc80:3010:600:dead:beef:cafe:feda"
"fedoraproject.org:2604:1580:fe00:0:dead:beef:cafe:fed1")
curl_easy_setopt (ehandle4, CURLOPT_RESOLVE,
"fedoraproject.org:8.43.85.67"
"fedoraproject.org:152.19.134.198")
curl_multi_add_handle (mhandle, ehandle6);
curl_multi_add_handle (mhandle, ehandle4);
Both end up connecting to the same (either v4 or v6) address. None of
CURLOPT_FORBID_REUSE(3), CURLOPT_FRESH_CONNECT(3),
CURLOPT_DNS_USE_GLOBAL_CACHE(3) and CURLOPT_DNS_CACHE_TIMEOUT(3) make
any difference. Nor does CURLOPT_DNS_LOCAL_IP[46].
Nothing changes practically, as the NMDevice still starts this with
AF_UNSPEC. That is going to change in the following commit.
The ugly part:
priv->concheck_x[0] in few places. I believe we shouldn't be using union
aliasing here, and instead of indexing the v4/v6 arrays by a boolean it
should be an enum. I'm not fixing it here, but I eventually plan to if
this gets an ACK.
This allows us to use the correct DNS server for the particular interface
independent of what the system resolver is configured to use.
The ugly part:
Unfortunately, it is not all that easy. The libc's libresolv.so API does
not provide means for influencing neither interface nor name servers used
for DNS resolving.
Curl can also be compiled with c-ares resolver backend that does provide
the necessary functionality, but it requires and extra library and the
Linux distributions don't seem to enable it. (Fedora doesn't, which is a
good sign we don't have an option of relying on it.)
systemd-resolved does provide everything we need. If we take care to
keep its congfiguration up to date, we can use it to do the resolving on
a particular interface with that interface's DNS configuration. Great!
There's one more problem: Curl doesn't provide callbacks for resolving
host names. It doesn't, however, allow us to pass in the pre-resolved
hostnames in form of an CURLOPT_RESOLVE(3) option. This means we have to
parse the host name out of the URL ourselves. Fair enough I guess...
libcurl does not allow removing easy-handles from within a curl
callback.
That was already partly avoided for one handle alone. That is, when
a handle completed inside a libcurl callback, it would only invoke the
callback, but not yet delete it. However, that is not enough, because
from within a callback another handle can be cancelled, leading to
the removal of (the other) handle and a crash:
==24572== at 0x40319AB: free (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==24572== by 0x52DDAE5: Curl_close (url.c:392)
==24572== by 0x52EC02C: curl_easy_cleanup (easy.c:825)
==24572== by 0x5FDCD2: cb_data_free (nm-connectivity.c:215)
==24572== by 0x5FF6DE: nm_connectivity_check_cancel (nm-connectivity.c:585)
==24572== by 0x55F7F9: concheck_handle_complete (nm-device.c:2601)
==24572== by 0x574C12: concheck_cb (nm-device.c:2725)
==24572== by 0x5FD887: cb_data_invoke_callback (nm-connectivity.c:167)
==24572== by 0x5FD959: easy_header_cb (nm-connectivity.c:435)
==24572== by 0x52D73CB: chop_write (sendf.c:612)
==24572== by 0x52D73CB: Curl_client_write (sendf.c:668)
==24572== by 0x52D54ED: Curl_http_readwrite_headers (http.c:3904)
==24572== by 0x52E9EA7: readwrite_data (transfer.c:548)
==24572== by 0x52E9EA7: Curl_readwrite (transfer.c:1161)
==24572== by 0x52F4193: multi_runsingle (multi.c:1915)
==24572== by 0x52F5531: multi_socket (multi.c:2607)
==24572== by 0x52F5804: curl_multi_socket_action (multi.c:2771)
Fix that, by never invoking any callbacks when we are inside a libcurl
callback. Instead, the handle is marked for completion and queued. Later,
we complete all queue handles separately.
While at it, drop the @error argument from NMConnectivityCheckCallback.
It was only used to signal cancellation. Let's instead signal that via
status NM_CONNECTIVITY_CANCELLED.
https://bugzilla.gnome.org/show_bug.cgi?id=797136https://bugs.launchpad.net/ubuntu/+source/network-manager/+bug/1792745https://bugzilla.opensuse.org/show_bug.cgi?id=1107197https://github.com/NetworkManager/NetworkManager/pull/207
Fixes: d8a31794c8
It seems, curl_multi_socket_action() can fail with
connectivity check failed: 4
where "4" means CURLM_INTERNAL_ERROR.
When that happens, it also seems that the file descriptor may still have data
to read, so the glib IO callback _con_curl_socketevent_cb() will be called in
an endless loop. Thereby, keeping the CPU busy with doing nothing (useful).
Workaround by disabling polling on the file descriptor when something
goes wrong.
Note that optimally we would cancel the affected connectivity-check
right away. However, due to the design of libcurl's API, from within
_con_curl_socketevent_cb() we don't know which connectivity-checks
are affected by a failure on this file descriptor. So, all we can do
is avoid polling on the (possibly) broken file descriptor. Note that
we anyway always schedule a timeout of last resort for each check. Even
if something goes very wrong, we will fail the check within 15 seconds.
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=903996
On non-Windows, libcurl's "curl_socket_t" type is just a typedef for
int. We rely on that, because we use it as file descriptor.
Add a compile time check to ensure that.
Since this is "C" there are not namespaces and libraries commonly choose
a particular name prefix for their symbols.
In case of libcurl, that is "curl_".
We should avoid using the same name prefix, and choose something distinct.
We commonly don't use the glib typedefs for char/short/int/long,
but their C types directly.
$ git grep '\<g\(char\|short\|int\|long\|float\|double\)\>' | wc -l
587
$ git grep '\<\(char\|short\|int\|long\|float\|double\)\>' | wc -l
21114
One could argue that using the glib typedefs is preferable in
public API (of our glib based libnm library) or where it clearly
is related to glib, like during
g_object_set (obj, PROPERTY, (gint) value, NULL);
However, that argument does not seem strong, because in practice we don't
follow that argument today, and seldomly use the glib typedefs.
Also, the style guide for this would be hard to formalize, because
"using them where clearly related to a glib" is a very loose suggestion.
Also note that glib typedefs will always just be typedefs of the
underlying C types. There is no danger of glib changing the meaning
of these typedefs (because that would be a major API break of glib).
A simple style guide is instead: don't use these typedefs.
No manual actions, I only ran the bash script:
FILES=($(git ls-files '*.[hc]'))
sed -i \
-e 's/\<g\(char\|short\|int\|long\|float\|double\)\>\( [^ ]\)/\1\2/g' \
-e 's/\<g\(char\|short\|int\|long\|float\|double\)\> /\1 /g' \
-e 's/\<g\(char\|short\|int\|long\|float\|double\)\>/\1/g' \
"${FILES[@]}"
It might happen, that connectivitiy is lost only for a moment and
returns soon after. Based on that assumption, when we loose connectivity
we want to have a probe interval where we check for returning
connectivity more frequently.
For that, we handle tracking of the timeouts per-device.
The intervall shall start with 1 seconds, and double the interval time until
the full interval is reached. Actually, due to the implementation, it's unlikely
that we already perform the second check 1 second later. That is because commonly
the first check returns before the one second timeout is reached and bumps the
interval to 2 seconds right away.
Also, we go through extra lengths so that manual connectivity check
delay the periodic checks. By being more smart about that, we can reduce
the number of connectivity checks, but still keeping the promise to
check at least within the requested interval.
The complexity of book keeping the timeouts is remarkable. But I think
it is worth the effort and we should try hard to
- have a connectivity state as accurate as possible. Clearly,
connectivity checking means that we probing, so being more intelligent
about timeout and backoff timers can result in a better connectivity
state. The connectivity state is important because we use it for
the default-route penaly and the GUI indicates bad connectivity.
- be intelligent about avoiding redundant connectivity checks. While
we want to check often to get an accurate connectivity state, we
also want to minimize the number of HTTP requests, in case the
connectivity is established and suppossedly stable.
Also, perform connectivity checks in every state of the device.
Even if a device is disconnected, it still might have connectivity,
for example if the user externally adds an IP address on an unmanaged
device.
https://bugzilla.gnome.org/show_bug.cgi?id=792240
The main issue is that `nmcli networking connectivity check` uses
nm_client_check_connectivity(), which has a timeout of 25 seconds.
Using a timeout of 30 seconds server side, means that if the requests
don't complete in time, the client side will time out and abort
with a failure. That is not right.
Fix that by using a shorter timeout server side. 20 seconds is still
plenty for a small HTTP request. If the network takes longer than that,
it's fair to call that LIMITED connectivity.