For the derived counters, generate functions that read the required
hardware counters, then compute and return the result.
The computations use doubles, as in libGPUCounters. That library also
performs all computations in floating point, and we want to match the
output of other tools using it.
The equation implementations are deduplicated because some counters
have changed their equation over time, but not in every generation.
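The deduplication described above can be sketched roughly as follows.
This is a minimal illustration, not the script's actual code: the
function name, the generation names and the counter names in the
equations are all hypothetical.

```python
from collections import defaultdict


def deduplicate_equations(equations_by_generation):
    """Group generations that share an identical equation for one counter.

    `equations_by_generation` maps a (hypothetical) generation name to the
    equation string used for the counter on that generation. The result
    maps each distinct equation to the generations using it, so only one
    implementation has to be emitted per distinct equation.
    """
    groups = defaultdict(list)
    for generation, equation in equations_by_generation.items():
        # Normalise whitespace so cosmetic differences do not defeat dedup.
        key = " ".join(equation.split())
        groups[key].append(generation)
    return dict(groups)


# Hypothetical input: the equation changed once, between gen_b and gen_c.
example = {
    "gen_a": "(FRAG_ACTIVE / GPU_ACTIVE) * 100",
    "gen_b": "(FRAG_ACTIVE /  GPU_ACTIVE) * 100",
    "gen_c": "(FRAG_QUEUE_ACTIVE / GPU_ACTIVE) * 100",
}
implementations = deduplicate_equations(example)
```

Here three generations end up sharing two implementations.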
libGPUCounters (1) contains all required information to generate the
counter definitions used in mesa for Bifrost+ architectures.
This script gathers the required information from the xml definitions in
libGPUCounters and outputs pan/perf xmls.
It also already includes support for derived counters, meaning counters
which are computed from other counters that are actually produced by HW.
For those, we recursively resolve the variables in the equation until
only HW counters and configuration values are left. It makes sense to do
this here already since the data structures make it a simple addition,
and the codegen then doesn't need to handle it later at compile time.
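The recursive resolution could look roughly like the sketch below. It
assumes equations are plain strings and that the set of "leaf" names
(HW counters and configuration values) is known. The function name and
all variable names in the example are hypothetical, not taken from
libGPUCounters.

```python
import re

# Matches identifiers but not numeric literals such as 100 or 1e5.
IDENT = re.compile(r"\b[A-Za-z_]\w*\b")


def resolve_equation(equation, derived, leaves, _seen=()):
    """Expand derived variables until only leaf names remain.

    `derived` maps a derived variable name to its equation string.
    `leaves` is the set of names that must not be expanded further.
    """
    def substitute(match):
        name = match.group(0)
        if name in leaves:
            return name
        if name in _seen:
            raise ValueError(f"cyclic definition involving {name}")
        if name not in derived:
            raise KeyError(f"unknown variable {name}")
        inner = resolve_equation(derived[name], derived, leaves, _seen + (name,))
        # Parenthesise so operator precedence survives the substitution.
        return f"({inner})"

    return IDENT.sub(substitute, equation)


# Hypothetical definitions: UTIL depends on TOTAL, which is itself derived.
derived = {"UTIL": "ACTIVE / TOTAL", "TOTAL": "ACTIVE + IDLE"}
leaves = {"ACTIVE", "IDLE"}
resolved = resolve_equation("UTIL * 100", derived, leaves)
```

The resolved string only references leaf names, so a later code
generation step can emit it without any further lookups.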
Derived counters that require MALI_CONFIG_TIME_SPAN are skipped for now.
libGPUCounters also does not generate the equations for those and it
makes hooking up the derived counters in pan simpler when we don't have
to estimate the duration of a sample in some way.
1) https://github.com/ARM-software/libGPUCounters
_raw no longer takes the "raw" offsets as arguments, so the name doesn't
make sense anymore. The old non-"_raw" function actually sums over
blocks instead of just reading the passed counter.
This commit fixes the naming of those two functions so they match what
the functions do.
v2:
- Lower the counter_period_ns to match the minimum value in code
Signed-off-by: Lukas Zapolskas <lukas.zapolskas@arm.com>
Suggested-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Instead of summing counters from shader cores, and outputting only
the counters from the first l2 slice, use the memory layout provided
from the kmod to output individual counters for each (category, block,
counter) combination.
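To illustrate the difference between the old summed output and the new
per-block output, here is a small sketch. The sample data structure and
the category and counter names are made up for the example and do not
reflect the real pan_perf types.

```python
def sum_over_blocks(samples, category, counter):
    """Old behaviour (sketch): one value per counter, summed over blocks."""
    return sum(block[counter] for block in samples[category])


def per_block_counters(samples):
    """New behaviour (sketch): one value per (category, block, counter)."""
    result = {}
    for category, blocks in samples.items():
        for block_index, block in enumerate(blocks):
            for counter, value in block.items():
                result[(category, block_index, counter)] = value
    return result


# Hypothetical sample: two shader cores and two L2 slices.
samples = {
    "shader_core": [{"EXEC": 10}, {"EXEC": 30}],
    "l2": [{"READ": 5}, {"READ": 7}],
}
individual = per_block_counters(samples)
```

With the per-block form, the second L2 slice is no longer dropped and
the two shader cores can be told apart.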
Co-Authored-by: Lukas Zapolskas <lukas.zapolskas@arm.com>
This makes it so that everything that uses the pan_perf C lib is
hidden inside PanfrostPerf instead of being used directly from the
pps driver.
Co-Authored-by: Lukas Zapolskas <lukas.zapolskas@arm.com>
Add the panthor performance counter uAPI, added in v5 of the patch
series "Add performance counters with manual sampling mode",
based on the drm-misc-next kernel, base commit
96c85e428ebaeacd2c640eba075479ab92072ccd
v2:
- the series is now based on the v5 of the kernel patch
The Perfetto spec supports several units that map directly onto Mali
performance counters, but these units are not yet expressed in the data
source.
Signed-off-by: Lukas Zapolskas <lukas.zapolskas@arm.com>
Put PanfrostDevice into its own file to keep pan_pps_perf.cpp focused on
the panfrost specific producer implementation.
Signed-off-by: Lukas Zapolskas <lukas.zapolskas@arm.com>
Add manually created Mali-Gx10 counter definitions.
v2:
- Added the architecture major field.
v3:
- Swap the order of the shader core and memsys blocks.
v4:
- G710 -> Gx10, to indicate that all GPUs in this generation are
supported
Signed-off-by: Lukas Zapolskas <lukas.zapolskas@arm.com>
Co-developed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Using the enum definitions prevents the category indices from getting
out of sync with the block types specified in the XML.
Signed-off-by: Lukas Zapolskas <lukas.zapolskas@arm.com>
The Mali-Gx10 series (G710, G610 and G510) introduces one new category
of counters, which needs to be accounted for in the setup code. Adding
this into an enum ensures relevant structs are updated automatically.
v2:
- Modified generator script to use the enum
Signed-off-by: Lukas Zapolskas <lukas.zapolskas@arm.com>
The source files generated from counter XML files should now contain
a copyright corresponding to the year of generation.
Signed-off-by: Lukas Zapolskas <lukas.zapolskas@arm.com>
Starting from the Mali Gx10 series, some hardware counters may indicate
the number of interrupts occurring during the sampling period.
Signed-off-by: Lukas Zapolskas <lukas.zapolskas@arm.com>
The kernel module is responsible for starting/stopping the counter
collection. It decides the layout of the counters in memory.
This commit adds an API to reflect this. The counter collection
can be started and stopped through the kmod. Counters are dumped into
a buffer that is also provided by the kmod, so that later, for panthor,
the buffer can be an mmapped BO. This also allows for a larger buffer
that holds multiple samples internally, with the data pointer pointing
at the most recent one.
The memory layout of whatever the data pointer points to can be
queried so that the counters can be extracted from it without
going through the kmod vtable.
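As a rough illustration of extracting counters from a buffer using a
queried layout, consider the sketch below. The layout representation (a
mapping from (category, block) to a byte offset) and the assumption of
32-bit little-endian counter values are made up for the example. The
real layout is whatever the kmod reports.

```python
import struct

COUNTER_SIZE = 4  # assumed: each counter is a 32-bit little-endian value


def read_counter(buffer, layout, category, block, counter_index):
    """Read one counter directly from the sample buffer.

    `layout` maps (category, block) to the byte offset of that block's
    first counter within `buffer` (hypothetical representation).
    """
    offset = layout[(category, block)] + COUNTER_SIZE * counter_index
    (value,) = struct.unpack_from("<I", buffer, offset)
    return value


# Hypothetical buffer: two shader core blocks with four counters each.
sample_buffer = struct.pack("<8I", *range(10, 18))
layout = {("shader_core", 0): 0, ("shader_core", 1): 16}
value = read_counter(sample_buffer, layout, "shader_core", 1, 2)
```

Once the layout is known, reading a counter is plain offset
arithmetic on the data pointer.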
Fixes the following build errors:
../src/amd/vulkan/radv_rra.c:1369:43: error: use of GNU empty initializer extension [-Werror,-Wgnu-empty-initializer]
struct radv_bvh_stats_gfx12 stats = {};
^
../src/amd/vulkan/radv_rra.c:1376:45: error: use of GNU empty initializer extension [-Werror,-Wgnu-empty-initializer]
struct radv_bvh_stats_gfx10_3 stats = {};
^
2 errors generated.
Fixes: 8c10eab1 ("radv: Add an option for dumping BVH stats")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41011>
Fixes the following build errors:
../src/amd/vulkan/radv_shader.c:3460:42: error: use of GNU empty initializer extension [-Werror,-Wgnu-empty-initializer]
struct radv_shader_debug_info debug = {};
^
1 error generated.
../src/amd/vulkan/radv_shader_args.c:975:43: error: use of GNU empty initializer extension [-Werror,-Wgnu-empty-initializer]
struct user_sgpr_info user_sgpr_info = {};
^
1 error generated.
Fixes: 480a94fb ("radv: Gather debug info about shader args")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41011>
With autotune allocating counters low-to-high, a conflict with
PERFORMANCE_QUERY_KHR will happen if any CP-based counters are
used. This is a temporary workaround which simply makes the first
two CP counters unavailable for performance queries.
Cc: mesa-stable
Signed-off-by: Dhruv Mark Collins <mark@igalia.com>
Assisted-by: OpenAI Codex (GPT-5.4)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40949>
This is more consistent with the newly established pattern of the
UMD allocating all locally used performance counters low-to-high
instead of the prior high-to-low order.
Cc: mesa-stable
Signed-off-by: Dhruv Mark Collins <mark@igalia.com>
Assisted-by: OpenAI Codex (GPT-5.4)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40949>
The UMD will be switching to allocating counters from low-to-high,
so to reduce the chance of conflicts with this new policy the PPS
driver now allocates the other way around. Additionally, this
future-proofs it for the MSM-DRM uAPI for performance counters, which
will similarly allocate from high-to-low.
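The reasoning behind allocating from opposite ends can be sketched as
follows. The function names and the counter count are hypothetical.
The point is only that two users drawing from opposite ends of the same
group do not collide until the group is actually exhausted.

```python
def allocate_low_to_high(num_counters, count):
    """Pick `count` counter indices starting from index 0."""
    if count > num_counters:
        raise ValueError("not enough counters in this group")
    return list(range(count))


def allocate_high_to_low(num_counters, count):
    """Pick `count` counter indices starting from the last index."""
    if count > num_counters:
        raise ValueError("not enough counters in this group")
    return list(range(num_counters - 1, num_counters - 1 - count, -1))


# Hypothetical group with 8 counters shared by two users.
umd_counters = allocate_low_to_high(8, 3)
pps_counters = allocate_high_to_low(8, 3)
overlap = set(umd_counters) & set(pps_counters)
```

In this example the two sets are disjoint. They would only overlap if
the combined demand exceeded the 8 counters available.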
Cc: mesa-stable
Signed-off-by: Dhruv Mark Collins <mark@igalia.com>
Assisted-by: OpenAI Codex (GPT-5.4)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40949>