intel/dev: fix timebase_scale ticks-to-ns precision loss across 2^32

Android CTS CtsGpuProfilingDataTest#testProfilingDataProducersAvailable
intermittently fails with "Render stages reported before their
VkQueueSubmit events". Root cause is in the Perfetto clock correlation:
render-stage timestamps go through intel_device_info_timebase_scale()
while VkQueueSubmit packets use BOOTTIME directly, so any drift in the
scaler shows up as render stages preceding their submits.

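intel_device_info_timebase_scale() scales the upper and lower halves
of the raw timestamp separately and recombines them, but silently
drops the remainder of the upper-half division. When the frequency
doesn't evenly divide 1e9, that remainder is generally nonzero and the
result comes out short by roughly remainder * 2^32 / frequency ns.
The shortfall grows by a fixed step with each wrap past 2^32 (resetting
whenever the accumulated remainder wraps modulo the frequency), which
shows up as a step in Perfetto's GPU-vs-BOOTTIME snapshot offset.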
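To see the size of the error, here is a standalone sketch of the removed
computation. The parameter `freq` stands in for
devinfo->timestamp_frequency, and `scale_split_old()` and `dropped_ns()`
are hypothetical names used only for illustration; `dropped_ns()`
computes what the discarded upper-half remainder was worth in ns.

```c
#include <stdint.h>

/* Hypothetical standalone copy of the removed lines; `freq` stands in
 * for devinfo->timestamp_frequency. Each half is divided on its own and
 * the remainder of the upper-half division is discarded. */
static uint64_t
scale_split_old(uint64_t gpu_timestamp, uint64_t freq)
{
   uint64_t upper_ts = gpu_timestamp >> 32;
   uint64_t lower_ts = gpu_timestamp & 0xffffffff;
   uint64_t upper_scaled_ts = upper_ts * 1000000000ull / freq;
   uint64_t lower_scaled_ts = lower_ts * 1000000000ull / freq;
   return (upper_scaled_ts << 32) + lower_scaled_ts;
}

/* Hypothetical helper: the value of the discarded remainder, in ns.
 * When lower_ts == 0 this is exactly the shortfall of scale_split_old();
 * otherwise the true shortfall can be 1 ns larger because the two
 * halves are floored separately. Assumes freq <= 2^32 so the shift
 * cannot overflow. */
static uint64_t
dropped_ns(uint64_t gpu_timestamp, uint64_t freq)
{
   uint64_t upper_ts = gpu_timestamp >> 32;
   uint64_t upper_remainder = (upper_ts * 1000000000ull) % freq;
   return (upper_remainder << 32) / freq;
}
```

At an example frequency of 19.2 MHz, 1e9 % 19200000 = 1600000, so
dropped_ns() is 357913941 ns (2^32 / 12, about 358 ms) at the first
wrap and 715827882 ns at the second. The shortfall keeps growing by
that step every ~224 s of GPU time and returns to zero at every 12th
wrap, when u * 1e9 becomes a multiple of the frequency again.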

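Carry the upper-half remainder, shifted left by 32, into the
lower-half numerator before dividing, so the result is exactly
floor(gpu_timestamp * 1e9 / timestamp_frequency). All intermediates
still fit in uint64_t for any timestamp_frequency up to 2^32 - 1e9 Hz
(about 3.29 GHz).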
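The identity behind the carry, written out with t = gpu_timestamp,
f = timestamp_frequency, u and l the upper and lower 32-bit halves, and
q and r the quotient and remainder of the upper-half division (symbols
introduced here only for the derivation; q corresponds to
upper_scaled_ts and r to upper_remainder):

```latex
t = u \cdot 2^{32} + \ell, \quad 0 \le \ell < 2^{32}, \qquad
u \cdot 10^{9} = q f + r, \quad 0 \le r < f

\frac{t \cdot 10^{9}}{f}
  = \frac{(q f + r)\, 2^{32} + \ell \cdot 10^{9}}{f}
  = q \cdot 2^{32} + \frac{r \cdot 2^{32} + \ell \cdot 10^{9}}{f}

\left\lfloor \frac{t \cdot 10^{9}}{f} \right\rfloor
  = q \cdot 2^{32}
  + \left\lfloor \frac{r \cdot 2^{32} + \ell \cdot 10^{9}}{f} \right\rfloor

r \cdot 2^{32} + \ell \cdot 10^{9}
  \le (f - 1)\, 2^{32} + (2^{32} - 1)\, 10^{9}
  < (f + 10^{9})\, 2^{32} \le 2^{64}
  \quad \text{when } f \le 2^{32} - 10^{9}
```

Because q * 2^32 is an integer it can be pulled out of the floor, so
splitting loses nothing once r is carried. The old code used
floor(l * 1e9 / f) as the second term, which is short by
floor(r * 2^32 / f) or that plus one.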

Cc: mesa-stable
Signed-off-by: Nemallapudi, Jaikrishna <nemallapudi.jaikrishna@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41630>
Nemallapudi, Jaikrishna 2026-05-15 07:34:49 +00:00 committed by Marge Bot
parent 0a6816d860
commit e47ed60ee6

@@ -171,8 +171,11 @@ intel_device_info_timebase_scale(const struct intel_device_info *devinfo,
    /* Try to avoid going over the 64bits when doing the scaling */
    uint64_t upper_ts = gpu_timestamp >> 32;
    uint64_t lower_ts = gpu_timestamp & 0xffffffff;
-   uint64_t upper_scaled_ts = upper_ts * 1000000000ull / devinfo->timestamp_frequency;
-   uint64_t lower_scaled_ts = lower_ts * 1000000000ull / devinfo->timestamp_frequency;
+   uint64_t upper_num = upper_ts * 1000000000ull;
+   uint64_t upper_scaled_ts = upper_num / devinfo->timestamp_frequency;
+   uint64_t upper_remainder = upper_num % devinfo->timestamp_frequency;
+   uint64_t lower_scaled_ts = ((upper_remainder << 32) + lower_ts * 1000000000ull) /
+                              devinfo->timestamp_frequency;
    return (upper_scaled_ts << 32) + lower_scaled_ts;
 }
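One way to check the new arithmetic is to compare it against a 128-bit
reference. This sketch copies the added lines into a standalone
function, with a hypothetical `freq` parameter in place of
devinfo->timestamp_frequency, and uses `unsigned __int128` (a GCC/Clang
extension, not standard C) only as an oracle. The function names,
sample timestamps, and any frequencies passed in are illustrative
choices, not taken from the MR.

```c
#include <stdint.h>

/* Hypothetical standalone copy of the added lines. */
static uint64_t
scale_split_fixed(uint64_t gpu_timestamp, uint64_t freq)
{
   uint64_t upper_ts = gpu_timestamp >> 32;
   uint64_t lower_ts = gpu_timestamp & 0xffffffff;
   uint64_t upper_num = upper_ts * 1000000000ull;
   uint64_t upper_scaled_ts = upper_num / freq;
   uint64_t upper_remainder = upper_num % freq;
   /* Carry the upper-half remainder into the lower-half numerator. */
   uint64_t lower_scaled_ts =
      ((upper_remainder << 32) + lower_ts * 1000000000ull) / freq;
   return (upper_scaled_ts << 32) + lower_scaled_ts;
}

/* Reference: floor(gpu_timestamp * 1e9 / freq) computed in 128 bits. */
static uint64_t
scale_reference(uint64_t gpu_timestamp, uint64_t freq)
{
   unsigned __int128 num = (unsigned __int128)gpu_timestamp * 1000000000u;
   return (uint64_t)(num / freq);
}

/* Returns 1 if the split computation matches the reference for every
 * sample timestamp at this frequency, 0 otherwise. */
static int
check_frequency(uint64_t freq)
{
   static const uint64_t samples[] = {
      0, 1, 0xffffffffull, 1ull << 32, (1ull << 32) + 1,
      0x123456789abcull, (1ull << 40) - 1, 0xffffffffffffull,
   };
   for (unsigned i = 0; i < sizeof(samples) / sizeof(samples[0]); i++) {
      if (scale_split_fixed(samples[i], freq) !=
          scale_reference(samples[i], freq))
         return 0;
   }
   return 1;
}
```

For example, check_frequency(19200000) should return 1, and
scale_split_fixed(1ull << 32, 19200000) gives 223696213333, the exact
floor of 2^32 * 1e9 / 19.2e6, where the removed code produced
223338299392.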