panvk/perfetto: improve clock synchronization using CLOCK_MONOTONIC_RAW

On Mali, GPU timestamp cycle counts are mapped to the arch counter, so they
advance at the same rate as CNTVCT (with a fixed offset). The kernel applies
gradual NTP adjustments to CLOCK_BOOTTIME by slightly changing the rate of
its cycle->ns conversion away from the clock's nominal frequency. This makes
CLOCK_BOOTTIME drift relative to the GPU clock's ns values, which are
computed using only the nominal frequency. On a rock5b, I measured this
drift at 25-30µs/s.
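As a rough illustration of the rate mismatch, here is a simplified C++ model. It assumes a 24 MHz counter (a common Arm arch timer frequency, not taken from this commit) and treats the NTP correction as a constant parts-per-million scale on the nominal conversion; the kernel actually applies it gradually by adjusting its `mult` factor. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumed for illustration only: a 24 MHz arch counter.
constexpr int64_t kCounterHz = 24'000'000;

// GPU-side conversion: uses only the nominal counter frequency.
// Split into whole seconds plus remainder so the product cannot overflow.
int64_t nominal_ns(int64_t ticks) {
  assert(ticks >= 0);
  return ticks / kCounterHz * 1'000'000'000 +
         ticks % kCounterHz * 1'000'000'000 / kCounterHz;
}

// Simplified CLOCK_BOOTTIME model: the same ticks, but with the conversion
// rate corrected by `ppm` parts per million (hypothetical constant rate).
int64_t ntp_adjusted_ns(int64_t ticks, int64_t ppm) {
  const int64_t ns = nominal_ns(ticks);
  return ns + ns * ppm / 1'000'000;
}
```

With `ppm = 27`, one second of ticks (24,000,000) converts to 27,000 ns more under the adjusted rate than under the nominal one, a drift of 27µs/s. In general, 1 ppm of rate difference is 1µs of drift per second.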

Perfetto's clock synchronization applies a fixed offset between consecutive
clock snapshots, so it does not handle clocks with significantly different
rates and infrequent snapshots well. For panvk, we emit snapshots once per
second, so the drift produces an error of ~25µs just before each new
snapshot. This is significant when measuring the latency of CPU<->GPU
operations, and it shows up as a sawtooth pattern in the measured latency
distribution over time.
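A minimal model of why offset-only synchronization produces a sawtooth, assuming snapshots at exact multiples of the period and a constant drift rate between the two clocks. This is not Perfetto's implementation, and `sync_error_ns` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>

// Error (in ns) when a GPU timestamp is translated to CPU time using only the
// offset measured at the most recent snapshot. Snapshots are assumed to occur
// at gpu_ns = 0, period_ns, 2 * period_ns, ...; the CPU clock is assumed to
// run faster than the GPU clock by drift_ppm parts per million.
int64_t sync_error_ns(int64_t gpu_ns, int64_t period_ns, int64_t drift_ppm) {
  assert(gpu_ns >= 0 && period_ns > 0);
  // Time elapsed since the last snapshot (measured on the GPU clock; the
  // difference from true elapsed time is negligible at ppm-level drift).
  const int64_t since_snapshot = gpu_ns % period_ns;
  // The offset was exact at the snapshot; since then the clocks have diverged
  // by drift_ppm * elapsed / 1e6.
  return since_snapshot * drift_ppm / 1'000'000;
}
```

With a 1 s period and 25 ppm of drift, the error ramps from 0 to just under 25µs across each period and drops back to 0 at every snapshot, which is the sawtooth described above.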

CLOCK_MONOTONIC_RAW does not have the NTP adjustment, so the only remaining
source of drift is error in the shift/mult approximation that the kernel
uses for cycle->ns. This error is very small, so by emitting CPU trace
events against CLOCK_MONOTONIC_RAW instead of CLOCK_BOOTTIME, we get much
more accurate synchronization.
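To put "very small" in numbers, the sketch below computes a kernel-style `mult`/`shift` pair for an assumed 24 MHz counter with `shift = 24` and converts cycle counts as `(cycles * mult) >> shift`. The parameters are illustrative and computed here, not read from a running kernel; the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumed counter frequency and shift, for illustration only.
constexpr uint64_t kCounterHz = 24'000'000;
constexpr uint32_t kShift = 24;

// mult = round(1e9 * 2^shift / freq), so that ns ~= (cycles * mult) >> shift.
constexpr uint64_t compute_mult(uint64_t hz, uint32_t shift) {
  return ((1'000'000'000ull << shift) + hz / 2) / hz;
}

constexpr uint64_t kMult = compute_mult(kCounterHz, kShift);

// Fixed-point cycle->ns conversion. The product must fit in 64 bits, so
// only bounded cycle deltas can be converted this way.
uint64_t cycles_to_ns(uint64_t cycles) {
  assert(cycles <= UINT64_MAX / kMult);
  return (cycles * kMult) >> kShift;
}
```

Rounding `mult` to an integer (699050667 here) leaves an error of about 0.48 ns per second: 1000 s of cycles converts to 1,000,000,000,476 ns. That is roughly 50,000 times smaller than the 25-30µs/s NTP drift.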

Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34390>
Olivia Lee 2025-04-04 21:02:04 -07:00 committed by Marge Bot
parent 78d3b9cd0a
commit e278a89fdd


@@ -100,8 +100,8 @@ emit_clock_snapshot_packet(struct panvk_device *dev,
    const struct panvk_utrace_perfetto *utp = &dev->utrace.utp;
    const uint64_t gpu_ns = get_gpu_time_ns(dev);
    const uint32_t cpu_clock_id =
-      perfetto::protos::pbzero::BUILTIN_CLOCK_BOOTTIME;
-   const uint64_t cpu_ns = perfetto::base::GetBootTimeNs().count();
+      perfetto::protos::pbzero::BUILTIN_CLOCK_MONOTONIC_RAW;
+   const uint64_t cpu_ns = perfetto::base::GetWallTimeRawNs().count();
    MesaRenderpassDataSource<PanVKRenderpassDataSource, PanVKRenderpassTraits>::
       EmitClockSync(ctx, cpu_ns, gpu_ns, cpu_clock_id, utp->gpu_clock_id);
@@ -306,6 +306,12 @@ panvk_utrace_perfetto_init(struct panvk_device *dev, uint32_t queue_count)
    for (uint32_t i = 0; i < ARRAY_SIZE(utp->stage_iids); i++)
       utp->stage_iids[i] = next_iid++;
+   /* Mali GPU timestamps map to the system arch counter. CLOCK_MONOTONIC_RAW
+    * is therefore better for synchronization with the GPU timestamps than the
+    * default CLOCK_BOOTTIME, which drifts from the arch counter's rate
+    * slightly due to NTP adjustment. */
+   util_perfetto_set_default_clock(CLOCK_MONOTONIC_RAW);
    static once_flag register_ds_once = ONCE_FLAG_INIT;
    call_once(&register_ds_once, register_data_source);
 }
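For experimenting outside Perfetto, a plain-POSIX sketch of reading the two CPU clocks involved in this change (CLOCK_BOOTTIME and CLOCK_MONOTONIC_RAW, the clock IDs the diff switches between) and estimating their relative rate in ppm. It is an illustration, not part of the commit; the helper names are hypothetical, the clock IDs are Linux-specific, and a short sampling interval gives only a noisy estimate. At 25-30µs/s, the drift described above corresponds to 25-30 ppm.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <time.h>

// Read a POSIX clock in nanoseconds.
static int64_t read_clock_ns(clockid_t id) {
  struct timespec ts;
  const int ret = clock_gettime(id, &ts);
  assert(ret == 0);
  (void)ret;  // silence unused-variable warnings when NDEBUG is set
  return int64_t(ts.tv_sec) * 1'000'000'000 + ts.tv_nsec;
}

// Estimate how much faster CLOCK_BOOTTIME runs than CLOCK_MONOTONIC_RAW,
// in parts per million, over roughly interval_ms milliseconds.
double boottime_vs_raw_ppm(int interval_ms) {
  // Read the clocks in the same order at both ends so that the read latency
  // between the two calls roughly cancels out.
  const int64_t raw0 = read_clock_ns(CLOCK_MONOTONIC_RAW);
  const int64_t boot0 = read_clock_ns(CLOCK_BOOTTIME);

  struct timespec req = {interval_ms / 1000, (interval_ms % 1000) * 1'000'000L};
  // Resume the sleep with the remaining time if a signal interrupts it.
  while (nanosleep(&req, &req) == -1 && errno == EINTR) {
  }

  const int64_t raw1 = read_clock_ns(CLOCK_MONOTONIC_RAW);
  const int64_t boot1 = read_clock_ns(CLOCK_BOOTTIME);

  const double d_raw = double(raw1 - raw0);
  const double d_boot = double(boot1 - boot0);
  return (d_boot - d_raw) / d_raw * 1e6;
}
```

A longer interval (several seconds or more) averages out the read jitter; on a system that is not running NTP, the result should stay close to 0 ppm.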