On Mali, GPU timestamp cycle counts are mapped to the arch counter, and
so advance at the same rate as CNTVCT (with a fixed offset). The kernel
applies gradual NTP adjustments to CLOCK_BOOTTIME by modifying the rate
of the cycle->ns conversion slightly from the nominal frequency of the
clock, which causes it to drift from the GPU clock's ns values (which
just use the nominal frequency). On a rock5b, I measured this drift in
the 25-30µs/s range.
Perfetto's clock synchronization applies a fixed offset between each
clock snapshot, and so does not handle clocks with significantly
different rates and infrequent snapshots well. For panvk, we emit
snapshots once per second, and so the drift results in an error of
~25µs right before the next snapshot. This is significant for measuring
the latency of CPU<->GPU operations, and shows up as a sawtooth pattern
on the measured latency distribution over time.
CLOCK_MONOTONIC_RAW does not have the NTP adjustment, and so the only
source of drift is error in the shift/mult approximation that the kernel
uses for cycle->ns. This error is very small, and so by emitting CPU
trace events against CLOCK_MONOTONIC_RAW instead of CLOCK_BOOTTIME, we
can get much more accurate synchronization.
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34390>
Everything is currently using CLOCK_BOOTTIME, which is perfetto's
default, and matches the previous behavior. On some hardware, different
clocks may be better synchronized with the gpu clock.
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34390>
DRM syncobjs always let you wait repeatedly on them, so we can set the
flag in the core instead of having each driver override it once they try
to enable the emulated timeline semaphores.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36563>
util_perfetto_init() was called in some places, util_cpu_trace_init()
was called in other places, and some places used tracing without ever
calling either of them
util_cpu_trace_init() is now guaranteed to be called:
* on gallium screen create
* on VK instance create
thus no driver/frontend/etc should ever need to call this manually
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36628>
Stage and access masks often go together. Representing the two
with a single value lets us have neater code.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36176>
Currently the only thing this function ever does is clear AFBC
metadata when transitioning away from UNDEFINED.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36176>
Make get_cs_deps accumulate the dependencies. This is useful when
implementing layout transitions. Layout transitions away from
UNDEFINED are implemented with a compute dispatch sandwiched
in-between src and dst, specified in the image barrier. When
implementing device event wait and the event dependency needs a
layout transition, we can defer emitting the barrier that syncs
layout transition dispatches with dst until after we have emitted
all dispatches, avoiding sync between the said transition
dispatches.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36176>
This reorder things a bit, ensure we attempt more agressive vectorisation,
attempt to optimize cf and more.
Inspiration from NAK's optimize_nir function.
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Acked-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36629>
We can end up with conversion instructions to the same type of integer
with nir_lower_bool_to_bitsize so let's make
bifrost_nir_lower_algebraic_late handle those cases.
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36629>
We can benifit from it, let's allow it when we aren't forced to follow
robustness2.
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36629>
Embrace modernity and consistency between alu and vectorization passes.
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36629>
Unless the command we're tracing produces the data we want to capture,
there is no need to wait for its timestamp write.
Currently, there is no such case, so make it the responsibility of the
caller instead of implicitly adding the wait.
Also remove and unnecessary wait for the LS scoreboard after the copy
buffer store.
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36519>
The buffer is only an IUB if it's within the size of the resource entry.
Otherwise, it might just be a buffer that landed just after the
descriptor allocation.
Fixes: fb38f10240 ("panvk: Handle IUBs in decoder")
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36519>
This can be removed now that we do compute shader minmax. Fixes failed
mmap errors on PPSSPP and some other vulkan applications:
MESA: error: mmap(..., size=4194304, prot=2, flags=0x1) failed: Invalid argument
Fixes: e25064c026 ("panvk: Use indirect path for indexed draw on JM")
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36695>
That pass tries to remove blocks that are not needed:
blocks with only 1 predecessor and 1 successor containing no
instruction or only 1 branch instruction.
Also remove unnecessary branch at the end of block with only 1
successor which is the next block
Run this pass at the end of the compiler flow once all optimisations
have been applied.
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36021>
Use bi_make_vec_to to allow to use only 1 MKVEC.v2i8 when possible.
Also add support for all swizzles instead of only mono-byte ones,
using bi_swizzle_to_byte_channels.
Update assert in bi_byte.
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35643>
When making a vector of 4 elements, try to make it with only 1
instruction instead of 2 at the moment, if the last 2 elements respect
the pattern supported by MKVEC.v2i8
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35643>
If a meta operation (like a blit or clear) happens while occlusion
queries are active, we temporarily disable the query. Unfortunately
the code for this did not clear out the `syncobj` field. In rare
combinations of circumstances this could cause an attempt to issue
a write back of the occlusion query values, and since we've zeroed
the `ptr` field it writes to a NULL value, causing a bus fault and
device lost error.
Fixes: 61534faf4e ("panvk: Wire occlusion queries to internals")
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36525>
While the shaderCoreBuiltins feature is only supported on v9+, the
extension itself provides some useful physical device properties and is
supported on all GPUs.
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36019>
It's not only for GL, change to a generic name.
Use command:
find . -type f -not -path '*/.git/*' -exec sed -i 's/\bgl_shader_stage\b/mesa_shader_stage/g' {} +
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Yonggang Luo <luoyonggang@gmail.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36569>
This parallelize min max index search and avoid running that logic per
layer.
This should speed up indexed draw a bit.
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Olivia Lee <olivia.lee@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35724>
Now that we know that indirect draw works, we can switch to the new
indexed draw codepath.
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Olivia Lee <olivia.lee@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35724>
On JM hardware, we need to allocate a buffer depending on vertex count.
As a result, for indirect and indexed draw we allocate a large buffer with
alloc on fault set.
The size of that buffer is calculated assuming a max of 2 millions vertices
and 18 attributes per vertex (16 user attributes, 2 specials)
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Olivia Lee <olivia.lee@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35724>
Most of cmd_draw logic could be shared, let's move what we can.
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Olivia Lee <olivia.lee@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35724>
We are going to integrate the helper CL shader.
This shader requires certain extra infos that we need to provide, this
patch adds logic to allocate and fill those infos.
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Olivia Lee <olivia.lee@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35724>
For indirect draws, we need to depends on some previous job in the chain
so we use the grid info to pass this.
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Olivia Lee <olivia.lee@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35724>