Format draw and draw_indexed Perfetto events with their vertex count.
For draw_indirect and draw_indexed_indirect, include the draw count
when indirect tracing is enabled (MESA_GPU_TRACES=indirects), otherwise
fall back to the static name.
Signed-off-by: Michael Cheng <michael.cheng@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41374>
Format compute events as compute(x,y,z) using the end-payload group
dimensions. Trailing dimensions that equal 1 are omitted to keep labels
concise — e.g. compute(128,1,1) becomes compute(128).
For compute_indirect, the dispatch dimensions are not known at command
record time since they live in GPU memory as a VkDispatchIndirectCommand.
The u_trace framework reads them back at trace flush time via the
is_indirect mechanism: the GPU address is recorded alongside the
tracepoint, and u_trace copies the pointed-to struct into indirect_data
once the GPU has finished. The same trailing-1 trimming is applied when
indirect tracing is enabled (MESA_GPU_TRACES=indirects); otherwise the
event falls back to the static "compute_indirect" name.
Signed-off-by: Michael Cheng <michael.cheng@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41374>
Add a separate end_event_dyn() that takes a std::string by value for
dynamic event names. The [=] lambda capture deep-copies the string into
the closure, avoiding a dangling pointer when the Trace() continuation
runs after the caller's stack frame is gone.
The existing end_event() with const char* remains for string literals
and long-lived pointers (e.g. payload->str), where no copy is needed.
CREATE_DUAL_EVENT_CALLBACK_DYN formats the event name via snprintf and
passes the result as a std::string to end_event_dyn(). Follow-up patches
will use this macro to label events with runtime dimensions.
Signed-off-by: Michael Cheng <michael.cheng@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41374>
Promote DBG() failure paths in enumerate_sysfs_metrics() to mesa_logw()
so users see why OA metrics are unavailable without needing INTEL_DEBUG.
Also log in PPS when no OA queries are available after initialization.
Signed-off-by: Michael Cheng <michael.cheng@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40898>
When observation_paranoid (xe) or perf_stream_paranoid (i915) prevents
unprivileged access to OA metrics, the existing code silently returns no
OA queries. PPS then fails with just a segfault.
This patch adds INTEL_PERF_FEATURE_OA_BLOCKED_BY_POLICY to
intel_perf_features, set by both KMD backends when the paranoid sysctl
exists but lacks sufficent privilage. PPS checks this flag immediately
after initialising intel_perf and returns an error before attempting
metric-set selection.
Signed-off-by: Michael Cheng <michael.cheng@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40898>
Previously, we only checked if the hardware duration was greater
than the requested sample period by 1000 ns. This can lead the
hardware duration to be rejected and use the next cycle, which
is double the size of the current duration.
At larger requested sample size, this can mean getting a hardware
duration of 1.7 ms for a requested sample period of 1 ms.
To fix this, we'll scale the check so that it uses 67% of the
requested sample period as the reject threshold. This way, if the
hardware duration is below 67%, it's guaranteed to be within
100%-133% of the requested sample period on the next hardware interval.
Signed-off-by: Casey Bowman <casey.g.bowman@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40735>
The previous commit enable different command buffers to program the
same 3DSTATE_BINDING_TABLE_POOL_ALLOC instruction even though they
allocated different chunks of binding tables.
Now we can just predicate this programming and skip the stalling,
flushing & invalidation.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39527>
We skip the stall emission for STATE_BASE_ADDRESS since this one can
be skipped on Gfx12.5+ and instead add a new sba tracepoint that has
valid timestamps.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 0147908a89 ("anv: predicate emission of STATE_BASE_ADDRESS")
Reviewed-by: Casey Bowman <casey.g.bowman@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38256>
pps initializes perf counter multiple times, once from
GpuDataSource::register_data_source and once from
GpuDataSource::OnSetup. This is fine, except we should replace
failing assert with skip on second call.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38224>
os_get_option() is a wrapper for getenv() that checks properties in
Android. It should be a no-op for other OS but will allow full use of
env vars in Android.
The environment variable names are automatically renamed by
os_get_option() and the order of precedence thus becomes:
1. getenv (non-Android)
2. debug.mesa.* (Android)
3. vendor.mesa.* (Android)
4. mesa.* (Android, as a fallback for older versions)
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37587>
When CPU clock is the same with the authoritative trace clock (normally
default to CLOCK_BOOTTIME), perfetto drops the non-monotonic snapshots
to ensure validity of the global source clock in the resolution graph.
When they are different, the clocks are marked invalid and the rest of
the clock syncs will fail during trace processing.
There's no central daemon emitting consistent snapshots for
synchronization between CPU and GPU clocks on behalf of renderstages and
counters producers. The sequence-scoped clock (64 <= ID < 128) is unique
per producer + writer pair within the tracing session. So we can use
sequence-scoped clock for gpu clock whenever applicable, and fallback to
use global clock for dynamic minor allocated >= 192.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37425>
In short, perfetto doesn't require the initial clock snapshot to be
earlier than the timestamp to be converted. So we don't have to do
complex handling for it.
With this change:
- renderstage event requires clock sync, so we'd only emit clock
snapshots on the traceq thread that handles the callbacks
- drops redundant sync_timestamp calls as well as sync_gpu_ts tracking
- no need to reset next_clock_sync_ns when tracing is disabled, since a
snapshot is always emitted right after the initial interned data emit
upon tracing start
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37425>
The object name is part of the VkDebugUtilsObjectName event messages.
When the trace buffer is full and the ring buffer fill policy is chosen,
the debug obj events can be overwritten (lost), which is why we need the
RefreshSetDebugUtilsObjectNameEXT.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37425>
The Xe ioctl DRM_XE_DEVICE_QUERY_ENGINE_CYCLES provides accurate
timestamps correlated between the CPU and GPU. However, it is slow and
impacts performance while collecting Perfetto traces.
Instead, use Perfetto's GetBootTimeNs() to track when to emit the
BUILTIN_CLOCK_BOOTTIME clock sync event so it only occurs every 1
second. This reduces the impact of recording gpu.renderstages from
-8% to -4%.
More concretely, FPS measurements when tracing Unity BoatAttack demo on
an Intel ADL device:
* gpu.renderstages disabled: 48.044293667
* gpu.renderstages enabled: 38.119778333 (-20.66%)
* gpu.renderstages enabeled + this fix: 42.641818333 (-11.24%)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37095>
Everything is currently using CLOCK_BOOTTIME, which is perfetto's
default, and matches the previous behavior. On some hardware, different
clocks may be better synchronized with the gpu clock.
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34390>
util_perfetto_init() was called in some places, util_cpu_trace_init()
was called in other places, and some places used tracing without ever
calling either of them
util_cpu_trace_init() is now guaranteed to be called:
* on gallium screen create
* on VK instance create
thus no driver/frontend/etc should ever need to call this manually
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36628>
This means the perfetto UI will have a non-zero/NULL handle/name in the UI
on these renderstages. Unfortunately, intel/ds is outside of vulkan so
unless we pull in anv headers, we can't just pass in the anv_cmd_buffer.
This also means it would be much more painful to pass the cmd buffer to
the rest of the events, so they'll still have unset command buffers.
Still, being able to see the name of the command buffer in at least one of
the events should be useful once that's glued together.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22350>
A few Android tools are based on/assume the datasource names
gpu.renderstages and gpu.counters. It is less effort to align with that
naming for Android builds than to chase down those tools and fix them,
not to mention account for new tools that may be created in the future.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34330>
This patch exposes shader hashes (computes and draws) to Perfetto and
utrace. By including these hashes in traces, developers can correlate
compute and draw calls with their assoicated ASM dumps when analyzing
the traces.
To achieve this, intel_tracepoint.py has been reworked to preprocess
tracepoint arguments dynamically. Any argument containing "hash" in its
variable name is now forrmated as hexadecimal before being passed to the
tracepoint definition.
Signed-off-by: Michael <michael.cheng@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32708>
When the runtime is going to potentially emit no dispatch, we need to
have a way to capture a timestamp. Add a new flag for this to tell
whether we don't have a HW instruction to capture the timestamp and
rely on MI_STORE_REGISTER_MEM instead.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: de00fe3f66 ("anv: add BVH building tracking through u_trace")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12382
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32835>
Buffer with indirect args wasn't passed to the function which
adds extra event args. Since function definition depends on the
common code, the definition is moved to a single place.
Fixes: 0a17035b5c
("u_trace: add support for indirect data")
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31090>
Allows a driver to declare indirect arguments for its tracepoints and
pass an address. u_trace will request a copy of the data which should
be implemented on the command processor.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Co-Authored-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Reviewed-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29944>
We're about to add indirect arguments, having a better way to describe
arguments (as capture/storage) will be useful.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29944>
Remove extra reporting of hdc_flush when viewing a Perfetto trace for
anv.
Signed-off-by: Michael Cheng <michael.cheng@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30312>
Only place that still has i915_drm.h includes in common code is
intel_perf_query.c.
This are the last i915_drm.h includes in headers in common code \o/.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29312>