The allocated buffes are not known to be written. It is unnecessary to
clear them.
If the user writes sensitive data to those locations, without using
the NMStrBuf API, then it is up to the user to bzero the memory
accordingly.
When we have a buffer that we want to grow exponentially with
nm_utils_get_next_realloc_size(), then there are certain buffer
sizes that are better suited.
For example, if you have an empty NMStrBuf (len == 0), and you
want to allocate roughly one kilobyte, then 1024 is a bad choice,
because nm_utils_get_next_realloc_size() will give you 2024 bytes.
NM_UTILS_GET_NEXT_REALLOC_SIZE_1000 might be better in this case.
NM_MORE_ASSERTS 0 means that more assertions are disabled.
NM_MORE_ASSERT_ONCE() should never be triggered when more
assertions are disabled altogether. It is thus not allowed
to called "if (NM_MORE_ASSERT_ONCE (0))", because that code
would always be enabled.
If you have a LIST with 7 elements, and you lookup a value that
is not in the (sorted) list and would lie before the first element,
the binary search will dig down to imin=0, imid=0, imax=0 and
strcmp will give positive cmp value (indicating that the searched
value is sorted before).
Then, we would do "imax = imid - 1;", which wrapped to G_MAXUINT,
and the following "if (G_UNLIKELY (imin > imax))" would not hit,
resulting in an out of bound access next.
The easy fix is to not used unsigned integers.
The binary search was adapted from nm_utils_array_find_binary_search()
and nm_utils_ptrarray_find_binary_search(), which already used signed
integers to avoid this problem.
Fixes: 17d9b852c8 ('shared: explicitly implement binary search in NM_UTILS_STRING_TABLE_LOOKUP_DEFINE*()')
Add flags to explicitly escape leading or trailing spaces. Note
that we were already escaping trailing spaces.
This will be used later when supporting backslash escapes for
option parameters for nmcli (vpn.data).
In the next commit, GString will be replaced by NMStrBuf. Then, we will
pre-allocate a string buffer with 16 bytes, and measure the performance
difference. To have it comparable, adjust the pre-allocation size also
with GString.
nm_utils_buf_utf8safe_unescape() is almost the same as g_strcompress(),
with the only difference is that if the string contains NUL escapes "\000",
it will be handled correctly.
In other words, g_strcompress() and nm_utils_str_utf8safe_unescape() can only
unescape values, that contain no NUL escapes. That's why we added our
own binary unescape function.
As we already have our g_strcompress() variant, use it. It just gives it more
testing and usage. Also, we have full control over it's behavior. For example,
g_strcompress() issues a g_warning() when encountering a trailing '\\'. I
think this makes it unsuitable to unescape untrusted data. Either the function
should fail, or just make the best of it. Currently, our implementation
does the latter.
Our own implementation of a string buffer like GString.
Advantages (in decreasing relevance):
- Since we are in control, we can easily let it nm_explicit_bzero()
the memory. The regular GString API cannot be used in such a case.
While nm_explicit_bzero() may or may not be of questionable benefit,
the problem is that if the underlying API counteracts the aim of
clearing memory, it gets impossible. As API like NMStrBuf supports
it, clearing memory is a easy as enable the right flag.
This would for example be useful for example when we read passwords
from a file or file descriptor (e.g. try_spawn_vpn_auth_helper()).
- We have API like
nmp_object_to_string (const NMPObject *obj,
NMPObjectToStringMode to_string_mode,
char *buf,
gsize buf_size);
which accept a fixed size output buffer. This has the problem of
how choosing the right sized buffer. With NMStrBuf such API could
be instead
nmp_object_to_string (const NMPObject *obj,
NMPObjectToStringMode to_string_mode,
NMStrBuf *buf);
which can automatically grow (using heap allocation). It would be
easy to extend NMStrBuf to use a fixed buffer or limiting the
maximum string length. The point is, that the to-string API wouldn't
have to change. Depending on the NMStrBuf passed in, you can fill
an unbounded heap allocated string, a heap allocated string up to
a fixed length, or a static string of fixed length. NMStrBuf currently
only implements the unbounded heap allocate string case, but it would
be simple to extend.
Note that we already have API like nm_utils_strbuf_*() to fill a buffer
of fixed size. GString is not useable for that (efficiently), hence
this API exists. NMStrBuf could be easily extended to replace this API
without usability or performance penalty. So, while this adds one new
API, it could replace other APIs.
- GString always requires a heap allocation for the container. In by far
most of the cases where we use GString, we use it to simply construct
a string dynamically. There is zero use for this overhead. If one
really needs a heap allocated buffer, NMStrBuf can easily embedded
in a malloc'ed memory and boxed that way.
- GString API supports inserting and removing range. We almost never
make use of that. We only require append-only, which is simple to
implement.
- GString needs to NUL terminate the buffer on every append. It
has unnecessary overhead for allowing a usage of where intermediate
buffer contents are valid strings too. That is not the case with
NMStrBuf: the API requires the user to call nm_str_buf_get_str() or
nm_str_buf_finalize(). In most cases, you would only access the string
once at the end, and not while constructing it.
- GString always grows the buffer size by doubling it. I don't think
that is optimal. I don't think there is one optimal approach for how
to grow the buffer, it depends on the usage patterns. However, trying
to make an optimal choice here makes a difference. QT also thinks so,
and I adopted their approach in nm_utils_get_next_realloc_size().
When growing a buffer by appending a previously unknown number
of elements, the often preferable strategy is growing it exponentially,
so that the amortized runtime and re-allocation costs scale linearly.
GString just always increases the buffer length to the next power of
two. That works.
I think there is value in trying to find an optimal next size. Because
while it doesn't matter in terms of asymptotic behavior, in practice
a better choice should make a difference. This is inspired by what QT
does ([1]), to take more care when growing the buffers:
- QString allocates 4 characters at a time until it reaches size 20.
- From 20 to 4084, it advances by doubling the size each time. More
precisely, it advances to the next power of two, minus 12. (Some memory
allocators perform worst when requested exact powers of two, because
they use a few bytes per block for book-keeping.)
- From 4084 on, it advances by blocks of 2048 characters (4096 bytes).
This makes sense because modern operating systems don't copy the entire
data when reallocating a buffer; the physical memory pages are simply
reordered, and only the data on the first and last pages actually needs
to be copied.
Note that a QT is talking about 12 characters, so we use 24 bytes
head room.
[1] https://doc.qt.io/qt-5/containers.html#growth-strategies
- add more code comments
- refactor the code flow in _get_hash_key_init() to follow a simpler
code path.
- use c_siphash_hash() instead of 3 separate steps.
- Drop "?: static_seed" from nm_hash_static(). It's not useful, because
the only _get_hash_key() for which _get_hash_key()^static_seed is zero
is ~static_seed. That means, only one value of all the static seeds
can result in zero here. At that point, we can just coerce that value
to 3679500967u directly.
Sometimes these function may set errno to unexpected values like EAGAIN.
This causes confusion. Avoid that by using our own wrappers that retry
in that case. For example, in rhbz#1797915 we have failures like:
errno = 0;
v = g_ascii_strtoll ("10", 0, &end);
if (errno != 0)
g_assert_not_reached ();
as g_ascii_strtoll() would return 10, but also set errno to EAGAIN.
Work around that by using wrapper functions that retry. This certainly
should be fixed in glib (or glibc), but the issues are severe enough to
warrant a workaround.
Note that our workarounds are very defensive. We only retry 2 times, if
we get an unexpected errno value. This is in the hope to recover from
a spurious EAGAIN. It won't recover from other errors.
https://bugzilla.redhat.com/show_bug.cgi?id=1797915
NMTST_SWAP() used memcpy() for copying the value, while NM_SWAP() uses
a temporary variable with typeof(). I think the latter is preferable.
Also, the macro is essentially doing the same thing.
NM_ASCII_SPACES contains the ASCII characters according to
g_ascii_isspace().
Add NM_ASCII_WHITESPACES which differs from NM_ASCII_SPACES by not
including "\f". In some cases, that character shall be excluded.
For example, this is what systemd uses as "WHITESPACE" define at
various places.
Also, reorder the spaces string so that plain space comes first. It is
expected that ' ' is much more frequently than newlines or tabs. While
the order here shouldn't matter, it seems preferably to order frequent
characters in front.
I think it's preferable to use nm_clear_g_free() instead of
g_clear_pointer(, g_free). The reasons are not very strong,
but I think it is overall preferable to have a shorthand for this
frequently used functionality.
sed 's/\<g_clear_pointer *(\([^;]*\), *\(g_free\) *)/nm_clear_g_free (\1)/g' $(git grep -l g_clear_pointer) -i
Move the assertion for valid LIST first. It only checks static data,
and regardless of the entry_cmd, it should be done first.
Fixes: f4d12f7b59 ('shared: add NM_UTILS_STRING_TABLE_LOOKUP_STRUCT_DEFINE() macro for lookup of structs')