The previous commit enable different command buffers to program the
same 3DSTATE_BINDING_TABLE_POOL_ALLOC instruction even though they
allocated different chunks of binding tables.
Now we can just predicate this programming and skip the stalling,
flushing & invalidation.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39527>
We skip the stall emission for STATE_BASE_ADDRESS since this one can
be skipped on Gfx12.5+ and instead add a new sba tracepoint that has
valid timestamps.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 0147908a89 ("anv: predicate emission of STATE_BASE_ADDRESS")
Reviewed-by: Casey Bowman <casey.g.bowman@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38256>
In short, perfetto doesn't require the initial clock snapshot to be
earlier than the timestamp to be converted. So we don't have to do
complex handling for it.
With this change:
- renderstage event requires clock sync, so we'd only emit clock
snapshots on the traceq thread that handles the callbacks
- drops redundant sync_timestamp calls as well as sync_gpu_ts tracking
- no need to reset next_clock_sync_ns when tracing is disabled, since a
snapshot is always emitted right after the initial interned data emit
upon tracing start
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37425>
The object name is part of the VkDebugUtilsObjectName event messages.
When the trace buffer is full and the ring buffer fill policy is chosen,
the debug obj events can be overwritten (lost), which is why we need the
RefreshSetDebugUtilsObjectNameEXT.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37425>
The Xe ioctl DRM_XE_DEVICE_QUERY_ENGINE_CYCLES provides accurate
timestamps correlated between the CPU and GPU. However, it is slow and
impacts performance while collecting Perfetto traces.
Instead, use Perfetto's GetBootTimeNs() to track when to emit the
BUILTIN_CLOCK_BOOTTIME clock sync event so it only occurs every 1
second. This reduces the impact of recording gpu.renderstages from
-8% to -4%.
More concretely, FPS measurements when tracing Unity BoatAttack demo on
an Intel ADL device:
* gpu.renderstages disabled: 48.044293667
* gpu.renderstages enabled: 38.119778333 (-20.66%)
* gpu.renderstages enabeled + this fix: 42.641818333 (-11.24%)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37095>
Everything is currently using CLOCK_BOOTTIME, which is perfetto's
default, and matches the previous behavior. On some hardware, different
clocks may be better synchronized with the gpu clock.
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34390>
util_perfetto_init() was called in some places, util_cpu_trace_init()
was called in other places, and some places used tracing without ever
calling either of them
util_cpu_trace_init() is now guaranteed to be called:
* on gallium screen create
* on VK instance create
thus no driver/frontend/etc should ever need to call this manually
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36628>
A few Android tools are based on/assume the datasource names
gpu.renderstages and gpu.counters. It is less effort to align with that
naming for Android builds than to chase down those tools and fix them,
not to mention account for new tools that may be created in the future.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34330>
Buffer with indirect args wasn't passed to the function which
adds extra event args. Since function definition depends on the
common code, the definition is moved to a single place.
Fixes: 0a17035b5c
("u_trace: add support for indirect data")
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31090>
Allows a driver to declare indirect arguments for its tracepoints and
pass an address. u_trace will request a copy of the data which should
be implemented on the command processor.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Co-Authored-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Reviewed-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29944>
Remove extra reporting of hdc_flush when viewing a Perfetto trace for
anv.
Signed-off-by: Michael Cheng <michael.cheng@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30312>
Otherwise u_trace has to think that each submission is a frame,
and that's not great if we want to gather statistics on per real
frame basis.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29220>
Add up to four reasons per flush to perfetto flushes. PC reasons
will help debuggers understand why flushes were required, and
perhaps provide hints as to how they can be avoided.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27400>
Previously all items on a timeline row would have the same name. This
change uses the tracepoint names to put into the timeline instead.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Felix DeGrood <felix.j.degrood@intel.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25730>
Useful to figure out what's the tracepoint name you're implementing.
We'll use this in the intel perfetto integration glue to index into an
array of perfetto iid.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Felix DeGrood <felix.j.degrood@intel.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25730>
This will let other drivers use the same way of presenting annotations
without duplicating the whole hash table thing.
Reviewed-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22157>
This is way more horrifying than I hoped -- I can't figure out a way to
have the method be on TraceContext, so it's a static method of the
datasource, but then you have to name the templated types over and over.
You have to pass in a TraceContext because intel emits the clock sync
packet within a Trace(), and perfetto just silently corrupts the trace if
you Trace() in a Trace().
Reviewed-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22157>
Deduplicates some code from intel/tu/freedreno, and will be a common place
to put other shared code.
The downside I can see is this logging:
[013.129] tu_perfetto.cc:122 Tracing started
[013.129] intel_driver_ds.cc:133 Tracing started
("oh, huh, apparently data sources for both drivers are registered? wild")
becomes:
[142.906] erfetto_renderpass.h:50 Tracing started
[142.907] erfetto_renderpass.h:50 Tracing started
("huh, why is my driver's data source being started twice?").
Unfortunately we can't easily get a string for the data source type due to
not having rtti.
Reviewed-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22157>
u_vector_add() don't keep the returned pointers valid.
After the initial size allocated in u_vector_init() is reached it will
allocate a bigger buffer and copy data from older buffer to the new
one and free the old buffer, making all the previous pointers returned
by u_vector_add() invalid and crashing the application when trying to
access it.
This is reproduced when running
dEQP-VK.synchronization.signal_order.timeline_semaphore.* in DG2 SKUs
that has 4 CCS engines, INTEL_COMPUTE_CLASS=1 is set and of course
perfetto build is enabled.
To fix this issue here I'm moving the storage/allocation of
struct intel_ds_queue to struct anv_queue/iris_batch and using
struct list_head to maintain a chain of intel_ds_queue of the
intel_ds_device.
This allows us to append or remove queues dynamically in future if
necessary.
Fixes: e760c5b37b ("anv: add perfetto source")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20977>
Here adding kmd_type parameter to
intel_gem_read_render_timestamp(), intel_gem_can_render_on_fd() and
intel_gem_supports_protected_context().
Those 3 functions will have Xe implementations, the other functions
in intel_gem.h will not be called by Xe code paths so not adding
kernel_driver_type to it.
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20773>