Currently, when the cache is full, it deletes just enough items to
make room for the new item being stored. This hasn't been too much
of a problem in practice, but for workloads like running piglit,
where we have thousands of unique shaders and all threads are
utilised, we end up with a pretty big bottleneck.
With this change, rather than brute-forcing our way to having
enough room for the new item, we instead grab 10% of the least
recently used items in the random directory we chose and delete
them all. This should only be around 0.04% of total cache items,
but should hopefully relieve some of the pressure on system calls
like fstatat().
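A minimal sketch of the batch selection described above, assuming the
entries of the chosen directory have already been collected into an
array together with their access times. The struct, the function names
and the rounding of the batch up to at least one entry are illustrative
assumptions, not the actual disk cache code. For scale: if the cache is
spread over 256 subdirectories (an assumption here), 10% of one
directory is roughly 0.1 / 256, i.e. about 0.04% of all items.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-file record gathered while scanning one directory. */
struct lru_entry {
   long atime;    /* last access time, in seconds */
   size_t size;   /* file size in bytes */
};

/* Order entries from least to most recently used. */
static int cmp_lru_entry(const void *a, const void *b)
{
   const struct lru_entry *x = a, *y = b;
   return (x->atime > y->atime) - (x->atime < y->atime);
}

/* Sort the entries so the oldest come first and return how many of them
 * form the 10% batch to delete. Rounding up to one entry for small
 * directories is an assumption of this sketch.
 */
static size_t pick_lru_batch(struct lru_entry *entries, size_t count)
{
   if (count == 0)
      return 0;

   qsort(entries, count, sizeof(*entries), cmp_lru_entry);

   size_t batch = count / 10;
   return batch ? batch : 1;
}
```

The caller would then unlink entries[0] through entries[batch - 1] and
subtract their sizes from the cache total in one pass.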
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11523>
The statement (void*)debug_name has no use when FreeBSD is defined.
Removed it to fix compiler warnings.
Signed-off-by: Eleni Maria Stea <elene.mst@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11203>
On Linux ENODATA is defined, but on BSD and MacOSX ENOATTR is used
instead. Defined ENODATA to be ENOATTR when the system is not Linux.
v2: Replaced ENODATA and ENOATTR with -EFAULT, which exists everywhere,
and added a comment (Ian Romanick)
Signed-off-by: Eleni Maria Stea <elene.mst@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11203>
Apparently the quantization math isn't cheap.
This further reduces overhead by 2% for drawoverhead/8 textures.
The improvement is measured by looking at the sysprof percentage delta and
multiplying by 2 (because we have the frontend and gallium threads with
equal overhead, so the benefit is doubled compared to 1 thread).
Both per-sampler and per-unit lod bias values are quantized.
The difference in behavior is that both values are quantized separately
and then added up, instead of first added up and then quantized.
The worst case error is +- 1/256 in the reduced precision, i.e. off by one
in a fixed-point representation, which should be fine.
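The difference between the two orders of operation can be sketched as
follows. This assumes a fixed-point format with 8 fractional bits (one
step is 1/256) and round-to-nearest; the function names and the rounding
mode are assumptions for illustration, not the driver code.

```c
/* Assumed fixed-point format: 8 fractional bits, so one step is 1/256. */
#define LOD_FRAC_BITS 8

/* Round a float LOD bias to the nearest fixed-point step. */
static int quantize_lod_bias(float bias)
{
   float scaled = bias * (float)(1 << LOD_FRAC_BITS);
   return (int)(scaled + (scaled >= 0.0f ? 0.5f : -0.5f));
}

/* New behaviour: quantize each bias separately, then add. */
static int lod_bias_quantize_then_add(float sampler_bias, float unit_bias)
{
   return quantize_lod_bias(sampler_bias) + quantize_lod_bias(unit_bias);
}

/* Old behaviour: add the floats, then quantize the sum. */
static int lod_bias_add_then_quantize(float sampler_bias, float unit_bias)
{
   return quantize_lod_bias(sampler_bias + unit_bias);
}
```

With two biases of 0.002 each, the first function yields 2 steps and the
second yields 1 step, which is the off-by-one case mentioned above. For
values that are exact multiples of 1/256 both orders agree.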
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11339>
MSVC's qsort_s behaves similarly to qsort_r. Unfortunately, qsort_s's
compare function takes the "context"/"args" as its first argument. BSD's
qsort_r has a different argument order than GNU's qsort_r. Finally, C11
added a qsort_s which looks like GNU's qsort_r.
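To make the argument orders concrete: GNU's qsort_r and C11's qsort_s
call the comparator as cmp(a, b, ctx), while MSVC's qsort_s and the
historical BSD qsort_r call it as cmp(ctx, a, b). A small adapter can
bridge the two. The sketch below is illustrative only; the type and
function names are hypothetical and it does not call any platform sort
function.

```c
#include <stddef.h>

/* GNU qsort_r / C11 qsort_s style: context is the LAST argument. */
typedef int (*cmp_ctx_last_fn)(const void *a, const void *b, void *ctx);

/* MSVC qsort_s / historical BSD qsort_r style: context comes FIRST. */
typedef int (*cmp_ctx_first_fn)(void *ctx, const void *a, const void *b);

/* Bundles a context-last comparator with its real context so that it
 * can be passed through an API that expects a context-first comparator.
 */
struct cmp_adapter {
   cmp_ctx_last_fn cmp;
   void *ctx;
};

/* Context-first trampoline: unpack the adapter and reorder arguments. */
static int ctx_first_trampoline(void *ctx, const void *a, const void *b)
{
   const struct cmp_adapter *adapter = ctx;
   return adapter->cmp(a, b, adapter->ctx);
}

/* Example context-last comparator: the context selects the direction. */
static int cmp_int_dir(const void *a, const void *b, void *ctx)
{
   int dir = *(const int *)ctx;   /* +1 ascending, -1 descending */
   int x = *(const int *)a, y = *(const int *)b;
   return dir * ((x > y) - (x < y));
}
```

On a context-first platform, the wrapper would pass ctx_first_trampoline
and a pointer to a struct cmp_adapter in place of the caller's
comparator and context.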
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10989>
Android and MSVC don't have qsort_r() so let's provide a util wrapper
that uses the old qsort and thread-local storage. We use C++ for this
because thread_local is built into C++11 and we can't rely on C11
everywhere.
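The thread-local fallback can be sketched like this. Note the swap: the
commit uses C++ thread_local, whereas this sketch uses C11 _Thread_local
purely to keep the example in C. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

typedef int (*cmp_r_fn)(const void *a, const void *b, void *arg);

/* Per-thread slots that smuggle the comparator and its argument past
 * plain qsort(), whose comparator takes no context.
 */
static _Thread_local cmp_r_fn tls_cmp;
static _Thread_local void *tls_arg;

static int tls_trampoline(const void *a, const void *b)
{
   return tls_cmp(a, b, tls_arg);
}

static void sketch_qsort_r(void *base, size_t nmemb, size_t size,
                           cmp_r_fn cmp, void *arg)
{
   /* Save and restore the slots so a comparator that itself sorts
    * (a nested call) does not clobber the outer call's state.
    */
   cmp_r_fn old_cmp = tls_cmp;
   void *old_arg = tls_arg;

   tls_cmp = cmp;
   tls_arg = arg;
   qsort(base, nmemb, size, tls_trampoline);
   tls_cmp = old_cmp;
   tls_arg = old_arg;
}

/* Example comparator: sort indices by the key table passed as arg. */
static int cmp_index_by_key(const void *a, const void *b, void *arg)
{
   const int *keys = arg;
   int x = keys[*(const int *)a], y = keys[*(const int *)b];
   return (x > y) - (x < y);
}
```

Because the slots are per thread, two threads can sort concurrently
without sharing state, which is the reason thread-local storage is
needed rather than a plain global.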
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10989>
This thing is entirely opt-in wrt caring about it when writing to
a file anyway. Since we also lock the two at the same time and they
have a 1-1 relation we can just lock one of the two files. Saves
some syscalls.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11485>
This avoids all locks for reads and takes the lock only while actually
writing.
This is enabled by doing two things:
1) Reading the index incrementally. This way we get new entries
written by other processes and do not write duplicate entries.
2) Taking the lock only during writes, and applying the incremental
read while holding the lock so we always append to the actual end of the file.
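The two steps above can be sketched as follows, assuming a toy index
file made of fixed-size 8-byte keys. That file layout, the struct and
the function names are hypothetical and much simpler than the real
fossilize database format; flock(), pread() and pwrite() are the only
system interfaces used.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/file.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_KEYS 64

/* One process's view of the shared index file. */
struct db_view {
   int fd;
   off_t parsed;              /* file offset up to which we have read */
   uint64_t keys[MAX_KEYS];   /* keys seen so far */
   size_t num_keys;
};

/* Step 1: incremental read. Pick up entries appended since last time. */
static void read_new_entries(struct db_view *v)
{
   uint64_t key;
   while (v->num_keys < MAX_KEYS &&
          pread(v->fd, &key, sizeof(key), v->parsed) == (ssize_t)sizeof(key)) {
      v->keys[v->num_keys++] = key;
      v->parsed += sizeof(key);
   }
}

/* Lock-free lookup: refresh the view, then search it. */
static bool has_key(struct db_view *v, uint64_t key)
{
   read_new_entries(v);
   for (size_t i = 0; i < v->num_keys; i++) {
      if (v->keys[i] == key)
         return true;
   }
   return false;
}

/* Step 2: lock only for the write, and re-read under the lock so that
 * v->parsed is the true end of file and duplicates are skipped.
 */
static bool write_key(struct db_view *v, uint64_t key)
{
   bool written = false;

   if (flock(v->fd, LOCK_EX) != 0)
      return false;

   if (!has_key(v, key) && v->num_keys < MAX_KEYS &&
       pwrite(v->fd, &key, sizeof(key), v->parsed) == (ssize_t)sizeof(key)) {
      v->keys[v->num_keys++] = key;
      v->parsed += sizeof(key);
      written = true;
   }

   flock(v->fd, LOCK_UN);
   return written;
}
```

A second view opened on the same file would see the first view's entry
on its next has_key() or write_key() call, so it neither duplicates the
entry nor overwrites it.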
Fixes: eca6bb9540 ("util/fossilize_db: add basic fossilize db util to read/write shader caches")
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11485>
Often disassemblers and things in our drivers want to be able to
incrementally printf together a line, but that gets in the way of
Android's logging that wants to see a whole line all at once. Make a
little wrapper to do the ralloc_asprintf_rewrite_tail() and flushing lines
as they appear.
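A rough sketch of such a wrapper, assuming a fixed-size buffer and a
caller-supplied callback in place of ralloc and the Android logger. All
names here are hypothetical; the point is only that fragments accumulate
until a newline shows up and complete lines are handed over whole.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

struct line_logger {
   char buf[256];
   size_t len;                                   /* bytes used in buf */
   void (*emit)(const char *line, void *data);   /* gets whole lines */
   void *data;
};

static void line_logger_printf(struct line_logger *l, const char *fmt, ...)
{
   size_t avail = sizeof(l->buf) - l->len - 1;
   va_list ap;

   /* Append the new fragment after what is already buffered. */
   va_start(ap, fmt);
   int n = vsnprintf(l->buf + l->len, sizeof(l->buf) - l->len, fmt, ap);
   va_end(ap);
   if (n < 0)
      return;
   l->len += (size_t)n < avail ? (size_t)n : avail;   /* may truncate */

   /* Flush every complete line, keeping any trailing partial line. */
   char *nl;
   while ((nl = memchr(l->buf, '\n', l->len)) != NULL) {
      size_t consumed = (size_t)(nl - l->buf) + 1;
      *nl = '\0';
      l->emit(l->buf, l->data);
      memmove(l->buf, nl + 1, l->len - consumed);
      l->len -= consumed;
      l->buf[l->len] = '\0';
   }

   /* A full buffer with no newline is flushed as-is to make progress. */
   if (l->len == sizeof(l->buf) - 1) {
      l->emit(l->buf, l->data);
      l->len = 0;
      l->buf[0] = '\0';
   }
}

/* Example sink: count lines and remember the most recent one. */
struct captured_lines {
   int count;
   char last[256];
};

static void capture_line(const char *line, void *data)
{
   struct captured_lines *c = data;
   c->count++;
   snprintf(c->last, sizeof(c->last), "%s", line);
}
```

A disassembler could then call line_logger_printf() once per operand and
the sink would still only ever see finished lines.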
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9262>
Instead of spawning 4 threads when the cache is created,
spawn 1 and let u_queue grow the number of threads if
needed.
I wrote this patch because when running piglit's quick_shader
profile I had lots of samples in disk cache threads - mostly
in native_queued_spin_lock_slowpath kernel function.
Since these tests shouldn't really stress the cache, I assumed
it was caused only by thread creations.
After writing the patch and redoing the measurement, I got an
improvement but I still saw more hits in the same function for
the shader_runner:$disk0 thread, so something was wrong.
After digging more, I found out that my shader cache index was
corrupted: the on-disk size was 29MB but the index reported it
was way more than 1GB. So each disk cache thread was spending
a lot of time trying to evict files. Given that my cache had
a really low count of files, the LRU method based on randomly
generating subfolder names failed, so evicting was very slow.
Now that my cache index is fixed, the disk cache threads are
mostly idle but I still think it makes sense to grow the
number of threads instead of spawning 4 at the program start.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11296>
This flag allows creating a single thread initially, while setting
max_threads to the requested thread count.
If the queue is full and num_threads is lower than max_threads,
we spawn a new thread to help process the queue faster.
This avoids creating N threads at queue creation time.
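The growth condition can be sketched as a small predicate. The struct
below is a simplified stand-in with assumed field names, not the real
u_queue structure, and no thread is actually created here.

```c
#include <stdbool.h>

/* Simplified stand-in for the queue state (field names are assumed). */
struct queue_state {
   unsigned num_queued;    /* jobs currently waiting */
   unsigned max_jobs;      /* queue capacity */
   unsigned num_threads;   /* worker threads currently running */
   unsigned max_threads;   /* requested thread count */
};

/* Called when adding a job: grow only if the queue is full and we have
 * not yet reached the requested number of threads.
 */
static bool queue_should_spawn_thread(const struct queue_state *q)
{
   return q->num_queued == q->max_jobs && q->num_threads < q->max_threads;
}

/* Initial state with the flag set: one thread now, more on demand. */
static struct queue_state queue_init_scalable(unsigned max_jobs,
                                              unsigned requested_threads)
{
   struct queue_state q = {
      .num_queued = 0,
      .max_jobs = max_jobs,
      .num_threads = 1,
      .max_threads = requested_threads,
   };
   return q;
}
```

A lightly loaded queue therefore never pays for more than one thread,
while a saturated one ramps up to the requested count.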
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11296>
This better enables object-specific (e.g., context) queues where the owner
of the queue will always be needed and various pointers will be passed in
for tasks.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11312>
A similar path can be used on at least FreeBSD using cpuset_getaffinity.
This is how Ninja determines the number of available CPUs on that
platform. See the GetProcessorCount function in util.cc:
https://github.com/ninja-build/ninja/blob/master/src/util.cc
v2: Fix counting the number of available CPUs. The CPU_COUNT API does
not work the way I thought it did. :face_palm: Noticed by Marek.
Reviewed-by: Adam Jackson <ajax@redhat.com> [v1]
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> [v1]
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11228>
This prevents problems when some CPUs are offline. In a four CPU
system, if CPUs 1 and 2 are offline, the cache topology code would
only examine CPUs 0 and 1... giving incorrect information.
The types are changed to int16_t so that the offset of num_L3_caches
does not change. This triggered a STATIC_ASSERT failure:
STATIC_ASSERT(offsetof(struct util_cpu_caps_t, num_L3_caches) == 5 * sizeof(uint32_t));
I'm assuming there's some assembly code or something that depends on
this offset, and I don't feel like messing with it.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11228>
In the current code, this prevents a very unlikely corner case. More
importantly, it should prevent the next commit from breaking the
universe.
Imagine a system with 64 CPUs configured, but the first 32 CPUs are offline.
_SC_NPROCESSORS_ONLN will return 32. All of the surrounding code will
interpret this as meaning CPUs 0 through 31, but all of those CPUs are
offline. Nothing good can happen then.
The problem cases require systems with more than 32 CPUs because
util_cpu_caps.num_cpu_mask_bits is always rounded up to a multiple of
32.
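The rounding explains the 32-CPU threshold. A small sketch, with
hypothetical helper names, of how a CPU count maps to mask bits:

```c
#include <stdbool.h>

/* Round a CPU count up to the next multiple of 32 mask bits. */
static unsigned cpu_mask_bits(unsigned num_cpus)
{
   return (num_cpus + 31u) & ~31u;
}

/* Can a given CPU index be represented in a mask of this many bits? */
static bool cpu_fits_in_mask(unsigned cpu, unsigned mask_bits)
{
   return cpu < mask_bits;
}
```

On any system with at most 32 CPUs the result is 32 bits whichever
count is used, so every CPU index fits. In the 64-CPU example, sizing
the mask from a count of 32 gives 32 bits, which covers only CPUs 0
through 31 and leaves out the online CPUs 32 through 63; sizing it from
64 gives 64 bits.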
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11228>
memset operates in bytes, and there are 8 bits in a byte. This is a
very easy-to-miss typo. :(
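For illustration, setting every bit of a CPU mask whose size is known in
bits needs a bits-to-bytes conversion for memset. The helper name is
hypothetical and the exact expression fixed by this commit is not quoted
here; the sketch assumes the bit count is a multiple of 32.

```c
#include <stdint.h>
#include <string.h>

/* Set all bits of a CPU mask stored as 32-bit words.
 * num_mask_bits is assumed to be a multiple of 32.
 */
static void cpu_mask_set_all(uint32_t *mask, unsigned num_mask_bits)
{
   /* memset counts bytes, so convert bits to bytes by dividing by 8. */
   memset(mask, 0xff, num_mask_bits / 8);
}
```

Dividing by the wrong constant either leaves part of the mask unset or
writes past the end of it, and both mistakes compile cleanly.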
Fixes: 9758b1d416 ("util: add util_set_thread_affinity helpers including Windows support")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11228>
Debugging fd mix-ups (i.e. where, possibly via close()ing the original
fd, etc., you end up with something that is a valid fd but not a valid
*fence* fd) can be difficult. Fortunately we can use the FILE_INFO
ioctl, which will return an error if the fd is not a fence fd.
For Android, we instead use the libsync API, which does a similar thing
on modern kernels, but has a fallback path for older Android kernels.
Note that the FILE_INFO ioctl has existed upstream since at least prior
to destaging of sync_file.
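A minimal Linux-only sketch of such a check, using SYNC_IOC_FILE_INFO
and struct sync_file_info from <linux/sync_file.h>. The helper name is
hypothetical, and this is not necessarily how the commit structures the
check.

```c
#include <stdbool.h>
#include <linux/sync_file.h>
#include <sys/ioctl.h>

/* Returns true if fd refers to a sync_file (fence) fd.
 * With num_fences left at 0 the kernel only fills in the summary
 * fields, so no per-fence array has to be supplied.
 */
static bool fd_is_fence(int fd)
{
   struct sync_file_info info = {0};

   /* Any other kind of fd (pipe, regular file, closed fd) fails here. */
   return ioctl(fd, SYNC_IOC_FILE_INFO, &info) == 0;
}
```

A debug build could assert on fd_is_fence() wherever a fence fd is
imported or waited on, so a mixed-up fd is caught at the point of use
rather than much later.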
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11202>
Now that we have an idea of how many regs the conflicting allocation uses,
we can just skip to the next one and save repeated tests to find the same
conflicting neighbor again.
shadowrun-returns shader-db time on skl -1.62821% +/- 1.58079% (n=679),
now there's no statistically significant change from the start of the series
(n=420)
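The skip-ahead idea can be sketched on a toy register file. The
alloc_end array is a hypothetical bookkeeping structure invented for
this example, not the allocator's real data structure: for each
register it stores one past the last register of the allocation
covering it, or 0 if the register is free.

```c
/* Find the first run of `size` free registers, or -1 if none exists.
 * alloc_end[r] is 0 when r is free, otherwise one past the end of the
 * allocation that covers r (so it is always greater than r).
 */
static int find_contiguous_regs(const unsigned *alloc_end,
                                unsigned num_regs, unsigned size)
{
   unsigned r = 0;

   while (r + size <= num_regs) {
      unsigned i;

      for (i = 0; i < size; i++) {
         if (alloc_end[r + i])
            break;
      }
      if (i == size)
         return (int)r;

      /* Jump past the whole conflicting allocation instead of retrying
       * r + 1 and running into the same neighbour again.
       */
      r = alloc_end[r + i];
   }
   return -1;
}
```

With registers 1 to 3 and register 5 taken out of 8, a request for two
registers tests candidates 0, 4 and 6 only, rather than every start
position.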
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9437>
In the fully general case of register classes, to expose an allocation
class of unaligned 2-contiguous-regs allocations, for example, you'd have
your base individual regs (128 on intel), and another set of 127 regs that
each conflicted with the corresponding pair of the base regs. Single-reg
nodes would allocate in the 128, and double-reg nodes would allocate in
the 127 and the user would remap from the 127 down to the base regs with
some irritating table.
If you need many different contiguous allocation sizes (16 is a pretty
common number across drivers), your number of regs explodes, wasting
memory and making the q computation expensive at startup.
If all the user has is contiguous-reg classes, we can easily compute the q
value up front (as found in the intel driver and nouveau, for example),
and we only have to change a couple of places in the conflict-checking
logic so the contiguous-reg classes can use the base registers.
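The arithmetic behind these paragraphs can be sketched as follows. The
function names are hypothetical, and the overlap bound is a geometric
argument about contiguous ranges, not necessarily the exact expression
used for q in the commit (which may, for instance, clamp the value to
the number of registers in the class).

```c
/* Number of virtual registers needed to expose every unaligned run of
 * contig_len registers over num_base_regs base registers.
 */
static unsigned num_virtual_regs(unsigned num_base_regs, unsigned contig_len)
{
   if (contig_len == 0 || contig_len > num_base_regs)
      return 0;
   return num_base_regs - contig_len + 1;
}

/* Total virtual registers when every size from 1 to max_len is needed. */
static unsigned total_virtual_regs(unsigned num_base_regs, unsigned max_len)
{
   unsigned total = 0;
   for (unsigned len = 1; len <= max_len; len++)
      total += num_virtual_regs(num_base_regs, len);
   return total;
}

/* Upper bound on how many runs of length len_b can overlap one run of
 * length len_c: a run of len_b overlaps if it starts anywhere from
 * len_b - 1 registers before the other run up to its last register.
 */
static unsigned max_overlapping_runs(unsigned len_b, unsigned len_c)
{
   return len_b + len_c - 1;
}
```

For 128 base registers this gives 127 pair registers, and 1928 registers
in total for sizes 1 through 16, which is the explosion described above.
The overlap bound needs only the two lengths, which is why it can be
computed up front without enumerating conflicts.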
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9437>
This was bumped in 7e584a70c4 ("gallium: increase table size for fast
log/pow functions") presumably to fix conformance of tgsi_exec, but we
don't need that much accuracy in the only place it's used in the tree any
more: softpipe texture sampling.
softpipe glmark2 -b texture:texture-filter=linear FPS +0.335748% +/-
0.220111% (n=20)
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11173>
It's disabled due to non-conformance with no configuration knob to turn it
on, and if you care about swrast performance you're on llvmpipe anyway.
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11173>
DXVK 1.8.1 marks position as always invariant but it's disabled for
SotTR because it introduces rendering issues on NV. The DX12 version
also likely needs that.
Fixes a similar foliage issue initially found with the native version.
Cc: 21.1 mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11006>
No more error-prone encoding of swizzles in the .csv for non-planar
formats!
No change to generated u_format_table.c
Acked-by: Adam Jackson <ajax@redhat.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10505>
Once you read enough of them, there's an obvious pattern that we can just
write a little code for instead of making every dev write it out each time.
Acked-by: Adam Jackson <ajax@redhat.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10505>
Just check against the CSV (which has its codegen now tested with
u_format_test in CI) for now, so we know that our computed channels are
correct.
Acked-by: Adam Jackson <ajax@redhat.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10505>
It notably didn't fit the pattern of RGB5_A1_UNORM, and violated the
general pattern for bitmask format BE channels (channels are ordered
right-to-left in the BE columns in the CSV due to the parser walking them
in that order for historical reasons).
Acked-by: Adam Jackson <ajax@redhat.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10505>
These tests passed for LE, and the BE channel ordering specified obviously
didn't fit the pattern of the other BE formats (channels are listed
right-to-left in the BE columns for historical reasons).
Note that we can't write pure-integer format tests in u_format_tests.c
currently.
Acked-by: Adam Jackson <ajax@redhat.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10505>
Z32_FLOAT_S8X24_UINT and X32_S8X24_UINT are in fact the only non-bitmask
formats that have BE swizzles specified, but sorting out those two is
harder.
Acked-by: Adam Jackson <ajax@redhat.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10505>
I wanted to do the next set of BE changes here where I have Format's helper
functions available.
No changes in generated u_format_table.c.
Acked-by: Adam Jackson <ajax@redhat.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10505>