Add the panthor performance counter uAPI, added in v5 of the patch
series "Add performance counters with manual sampling mode",
based on the drm-misc-next kernel, base commit
96c85e428ebaeacd2c640eba075479ab92072ccd
v2:
- The series is now based on v5 of the kernel patch
The Perfetto spec supports several units that map directly onto
Mali performance counters but are not yet expressed in the data
source.
Signed-off-by: Lukas Zapolskas <lukas.zapolskas@arm.com>
Put PanfrostDevice into its own file to keep pan_pps_perf.cpp focused on
the panfrost specific producer implementation.
Signed-off-by: Lukas Zapolskas <lukas.zapolskas@arm.com>
Add manually created Mali-Gx10 counter definitions.
v2:
- Added the architecture major field.
v3:
- Swap the order of the shader core and memsys blocks.
v4:
- G710 -> Gx10, to indicate that all GPUs in this generation are
supported
Signed-off-by: Lukas Zapolskas <lukas.zapolskas@arm.com>
Co-developed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Using the enum definitions prevents the category indices from getting
out of sync with the block types specified in the XML.
Signed-off-by: Lukas Zapolskas <lukas.zapolskas@arm.com>
The Mali-Gx10 series (G710, G610 and G510) introduces one new category of
counters which needs to be accounted for in the setup code. Adding this
into an enum ensures relevant structs are updated automatically.
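As an illustration of why the enum helps, here is a minimal sketch. All
names are hypothetical and the category list is only a stand-in for
whatever the XML actually defines. A trailing count entry sizes the
arrays, so adding a category grows every dependent struct.

```c
#include <stddef.h>

/* Hypothetical counter categories; the real ones come from the XML. */
enum example_counter_category {
   EXAMPLE_CATEGORY_FRONT_END,
   EXAMPLE_CATEGORY_TILER,
   EXAMPLE_CATEGORY_MEMSYS,
   EXAMPLE_CATEGORY_SHADER_CORE,
   EXAMPLE_CATEGORY_NEW_BLOCK, /* stand-in for the category added for Gx10 */
   EXAMPLE_CATEGORY_COUNT,     /* always last: number of categories */
};

/* Sized by the count entry, so a new category grows the array too. */
struct example_config {
   unsigned counters_per_category[EXAMPLE_CATEGORY_COUNT];
};

/* Number of per-category slots the struct currently provides. */
size_t
example_num_categories(void)
{
   const struct example_config cfg = {0};
   return sizeof(cfg.counters_per_category) /
          sizeof(cfg.counters_per_category[0]);
}
```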
v2:
- Modified generator script to use the enum
Signed-off-by: Lukas Zapolskas <lukas.zapolskas@arm.com>
The source files generated from counter XML files should now contain
a copyright corresponding to the year of generation.
Signed-off-by: Lukas Zapolskas <lukas.zapolskas@arm.com>
Starting from the Mali Gx10 series, some hardware counters may indicate
the number of interrupts occurring during the sampling period.
Signed-off-by: Lukas Zapolskas <lukas.zapolskas@arm.com>
The kernel module is responsible for starting and stopping counter
collection, and it decides the layout of the counters in memory.
This commit adds an API to reflect that. Counter collection
can be started and stopped through the kmod. Counters are dumped into
a buffer that is also provided by the kmod, so that for panthor
the buffer can later be an mmapped BO. This also allows for a larger
buffer that holds multiple samples internally, with the data pointer
pointing at the most recent one.
The memory layout of whatever the data pointer points to can be
queried so that the counters can be extracted from it without
going through the kmod vtable.
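The shape described above could be sketched roughly as follows. This is
not the actual kmod interface: every struct, field, and function name
here is hypothetical, and the backend is a fake that writes recognizable
values into a two-sample ring. It only shows the split between
start/stop/dump going through the ops table and counter reads going
through the queried layout and the data pointer.

```c
#include <stddef.h>
#include <stdint.h>

#define EXAMPLE_SAMPLE_DWORDS 4
#define EXAMPLE_RING_SAMPLES 2

/* Hypothetical layout description, queried once by the consumer. */
struct example_perf_layout {
   size_t block_count;
   size_t counters_per_block;
   size_t block_stride; /* bytes between consecutive blocks */
};

struct example_perf;

/* Hypothetical kmod operations for controlling collection. */
struct example_perf_ops {
   int (*start)(struct example_perf *perf);
   int (*stop)(struct example_perf *perf);
   int (*dump)(struct example_perf *perf); /* refreshes perf->data */
};

struct example_perf {
   const struct example_perf_ops *ops;
   struct example_perf_layout layout;
   /* Buffer owned by the backend; holds several samples internally. */
   uint32_t ring[EXAMPLE_RING_SAMPLES][EXAMPLE_SAMPLE_DWORDS];
   unsigned next_slot;
   uint32_t sample_id;
   int running;
   const uint32_t *data; /* most recent sample, NULL before the first dump */
};

static int fake_start(struct example_perf *perf) { perf->running = 1; return 0; }
static int fake_stop(struct example_perf *perf) { perf->running = 0; return 0; }

/* Fake backend: fills the next ring slot and points data at it. */
static int fake_dump(struct example_perf *perf)
{
   if (!perf->running)
      return -1;
   uint32_t *slot = perf->ring[perf->next_slot];
   perf->sample_id++;
   for (unsigned i = 0; i < EXAMPLE_SAMPLE_DWORDS; i++)
      slot[i] = perf->sample_id * 10 + i;
   perf->data = slot;
   perf->next_slot = (perf->next_slot + 1) % EXAMPLE_RING_SAMPLES;
   return 0;
}

static const struct example_perf_ops fake_ops = { fake_start, fake_stop, fake_dump };

void example_perf_init(struct example_perf *perf)
{
   *perf = (struct example_perf){
      .ops = &fake_ops,
      .layout = { .block_count = 2, .counters_per_block = 2,
                  .block_stride = 2 * sizeof(uint32_t) },
   };
}

/* Reads one counter using only the layout and the data pointer. */
uint32_t example_perf_read(const struct example_perf *perf,
                           size_t block, size_t counter)
{
   const uint8_t *base =
      (const uint8_t *)perf->data + block * perf->layout.block_stride;
   return ((const uint32_t *)base)[counter];
}
```

In this sketch the consumer never needs to know which ring slot is
current; it only follows the data pointer after each dump.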
Fixes the following build errors:
../src/amd/vulkan/radv_rra.c:1369:43: error: use of GNU empty initializer extension [-Werror,-Wgnu-empty-initializer]
struct radv_bvh_stats_gfx12 stats = {};
^
../src/amd/vulkan/radv_rra.c:1376:45: error: use of GNU empty initializer extension [-Werror,-Wgnu-empty-initializer]
struct radv_bvh_stats_gfx10_3 stats = {};
^
2 errors generated.
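For reference, an empty initializer list is only valid in C starting
with C23; earlier standards require at least one initializer. A minimal
sketch of the portable spelling, assuming the fix is the usual `{0}`
form (the struct below is hypothetical, not the radv one):

```c
/* Hypothetical stand-in for a stats struct. */
struct example_bvh_stats {
   unsigned node_count;
   unsigned leaf_count;
};

/* Returns 1 when a {0}-initialized struct has all members zeroed. */
int example_stats_is_zeroed(void)
{
   /* {0} is accepted by every C standard; the remaining members are
    * zero-initialized implicitly. */
   struct example_bvh_stats stats = {0};
   return stats.node_count == 0 && stats.leaf_count == 0;
}
```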
Fixes: 8c10eab1 ("radv: Add an option for dumping BVH stats")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41011>
Fixes the following build errors:
../src/amd/vulkan/radv_shader.c:3460:42: error: use of GNU empty initializer extension [-Werror,-Wgnu-empty-initializer]
struct radv_shader_debug_info debug = {};
^
1 error generated.
../src/amd/vulkan/radv_shader_args.c:975:43: error: use of GNU empty initializer extension [-Werror,-Wgnu-empty-initializer]
struct user_sgpr_info user_sgpr_info = {};
^
1 error generated.
Fixes: 480a94fb ("radv: Gather debug info about shader args")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41011>
With autotune allocating counters low-to-high, a conflict with
PERFORMANCE_QUERY_KHR will happen if any CP-based counters are
used. This is a temporary workaround that simply prevents the first
two CP counters from being used for performance queries.
Cc: mesa-stable
Signed-off-by: Dhruv Mark Collins <mark@igalia.com>
Assisted-by: OpenAI Codex (GPT-5.4)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40949>
This is more consistent with the newly established pattern of the
UMD allocating all locally used performance counters low-to-high
instead of the prior high-to-low order.
Cc: mesa-stable
Signed-off-by: Dhruv Mark Collins <mark@igalia.com>
Assisted-by: OpenAI Codex (GPT-5.4)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40949>
The UMD will be switching to allocating counters from low-to-high,
so to reduce the chance of conflict with this new policy the PPS
driver now allocates the other way around. Additionally, this
future-proofs it for the MSM-DRM uAPI for performance counters, which
will similarly allocate from high-to-low.
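The idea can be modelled with a small sketch. It is a simplification
under stated assumptions: the two users are shown sharing one piece of
bookkeeping, which the real UMD and PPS driver do not, and all names and
the group size are hypothetical. It only illustrates that allocating
from opposite ends delays any collision until the group is full.

```c
#define EXAMPLE_NUM_COUNTERS 8

/* Hypothetical bookkeeping for one counter group. */
struct example_counter_group {
   unsigned low_used;  /* slots taken from index 0 upward */
   unsigned high_used; /* slots taken from the top downward */
};

/* Low-to-high allocation; returns the slot, or -1 if the ends would meet. */
int example_alloc_low(struct example_counter_group *g)
{
   if (g->low_used + g->high_used >= EXAMPLE_NUM_COUNTERS)
      return -1;
   return (int)g->low_used++;
}

/* High-to-low allocation; returns the slot, or -1 if the ends would meet. */
int example_alloc_high(struct example_counter_group *g)
{
   if (g->low_used + g->high_used >= EXAMPLE_NUM_COUNTERS)
      return -1;
   g->high_used++;
   return (int)(EXAMPLE_NUM_COUNTERS - g->high_used);
}
```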
Cc: mesa-stable
Signed-off-by: Dhruv Mark Collins <mark@igalia.com>
Assisted-by: OpenAI Codex (GPT-5.4)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40949>
When preemption optimization is supported but the necessary CP
counters are missing, device initialization fails. This is
unnecessary, as support for it can simply be disabled instead, which
fails more gracefully. This also fixes A8XX, which doesn't have
performance counters hooked up yet.
Cc: mesa-stable
Signed-off-by: Dhruv Mark Collins <mark@igalia.com>
Assisted-by: OpenAI Codex (GPT-5.4)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40949>
Future kernel API for perfcounter management will likely be required for
a8xx and onwards. For a7xx and earlier, cmdstream-based selector and
counter register management is still supported.
Cc: mesa-stable
Signed-off-by: Zan Dobersek <zdobersek@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40949>
With the pass order shuffling, code like `(x & 0xf) + (x & 0xfffffff0)` gets
optimized to bitfield_select(0xF, x, x). But it would be much better to optimize
simply to x. nir_opt_algebraic would do that for us but we run this pass too
late for algebraic to save us from ourselves, so be smarter.
Observed on dEQP-GLES31.functional.compute.basic.image_atomic_op_local_size_8
with Jay, this saves an instruction there.
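The equivalence can be checked with a small standalone sketch. The
helper names are hypothetical, and the select semantics are assumed to
be a per-bit choice between two sources: `(mask & a) | (~mask & b)`.

```c
#include <stdint.h>

/* Per-bit select: take bits from a where mask is set, otherwise from b. */
uint32_t example_bitfield_select(uint32_t mask, uint32_t a, uint32_t b)
{
   return (a & mask) | (b & ~mask);
}

/* The original expression. The two masks are disjoint and together cover
 * all 32 bits, so the addition never carries and acts like a bitwise OR. */
uint32_t example_split_and_add(uint32_t x)
{
   return (x & 0xfu) + (x & 0xfffffff0u);
}
```

Selecting between x and x with any mask yields x, which is why the
select can be dropped entirely.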
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40956>
If ac_drm_device_initialize returns -EACCES for the fd passed in.
A render node file description can't have DRM master status, which means
AMDGPU_CTX_PRIORITY_HIGH can't work without CAP_SYS_NICE (which
generally only the root user has).
Fixes: 8f30e90fc1 ("winsys/amdgpu: Prefer render node FD for ac_drm_device_initialize")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40974>
When we map a BO unsynchronized, we tell Valgrind that the content is not
initialized yet.
But we forgot to mark it as defined when the map finishes, which leads to
several "Conditional jump or move depends on uninitialised value(s)"
warnings when using Valgrind.
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Acked-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40964>
Applications are required to set NonUniform if the resource is arrayed,
but with VK_DESCRIPTOR_MAPPING_SOURCE_HEAP_WITH_SHADER_RECORD_INDEX_EXT,
the resource is non-arrayed in the shader, so setting it is technically
not required. However, the offset can vary per lane, so
NonUniform is implicit.
Backport-to: 26.1
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40953>
"size" is the allocated size of the array, not the number of immediates
actually used. We could wind up returning a too-large constlen, larger
than 512, and since the binning variant uses the non-binning variant's
constlen as its max_const, we could make binning variants use c512.x and
crash when encoding.
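A rough sketch of the difference between capacity and used count. The
struct, field names, and the vec4 rounding are assumptions made for
illustration and do not mirror the actual ir3 data structures.

```c
#include <stdint.h>

/* Hypothetical growable immediate array. */
struct example_imm_array {
   uint32_t *values;
   unsigned size;  /* allocated capacity, in dwords */
   unsigned count; /* dwords actually used */
};

/* Round a dword count up to whole vec4 slots. */
unsigned example_vec4_slots(unsigned dwords)
{
   return (dwords + 3) / 4;
}

/* Const length in vec4 units, based on the immediates actually used. */
unsigned example_constlen(unsigned base_vec4, const struct example_imm_array *imm)
{
   return base_vec4 + example_vec4_slots(imm->count);
}
```

With a capacity of 64 dwords but only 5 in use, sizing by capacity would
add 16 vec4 slots where 2 are enough, which is how an otherwise valid
shader could be pushed past the limit.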
Fixes: 86f3c0c4c2 ("ir3: simplify constlen calculation")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40961>
We need to know the immediate count even after lowering, to compute the
overall const size. Previously we were using the capacity field, but
that's unreliable and won't be available once we switch to a real
dynamic array container instead of (poorly) reinventing one.
Fixes: 86f3c0c4c2 ("ir3: simplify constlen calculation")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40961>
For a simple copy, we can copy data one uvec4 (16 bytes) at a time.
In serialize/deserialize copy mode, we want to copy out the
instance leaf addresses, which are 8 bytes wide, so we need to step with
an 8-byte stride instead of 16 bytes.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40966>
GLES2/3 doesn't expose alpha test, user clip planes, or two-sided color.
These lowerings therefore shouldn't disable the one-variant fast path for GLES2/3 contexts.
Keep the lowering itself unchanged and only relax the shader_has_one_variant checks.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40905>
Replace CSO save/restore of atom-backed states with
st_context_invalidate_state() for both PBO upload and download
paths. Only stream outputs, render condition, and pause queries
(no atoms) still use CSO save/restore.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40592>
Replace CSO save/restore of atom-backed states with
st_context_invalidate_state(). Only stream outputs, render
condition, and pause queries (no atoms) still use CSO save/restore.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40592>
Replace CSO save/restore of atom-backed states with
st_context_invalidate_state(). The conditional DSA/blend
invalidation for stencil writes is preserved. Only stream
outputs (no atom) still use CSO save/restore.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40592>
Replace CSO save/restore of atom-backed states with
st_context_invalidate_state(). Only stream outputs and pause
queries (no atoms) still use CSO save/restore.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40592>
Replace CSO save/restore of atom-backed states with
st_context_invalidate_state(). Only stream outputs (no atom)
still use CSO save/restore.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40592>