The intel_perf_query path used for performance queries on GL was
passing a bogus "end" pointer to intel_perf_query_result_accumulate(),
causing it to accumulate garbage values. This was causing the values
of many performance counters to be corrupted.
The "end" pointer was incorrect because the current code was assuming
that different OA reports were located TOTAL_QUERY_DATA_SIZE bytes
apart, which is a hard-coded preprocessor define. However recent
(Gfx12+) hardware generations use a variable query size determined by
the query layout. Use the size derived from it instead, and remove
the stale define.
Fixes: 3c51325025 ("intel/perf: switch query code to use query layout")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15783>
Along with fixing the grammar, this allows it to be deduplicated since
the properly worded description exists in later generations' XMLs.
Cuts 96 B from iris_dri.so and libvulkan_intel.so.
text data bss dec hex filename
917613 0 0 917613 e006d meson-generated_.._intel_perf_metrics.c.o (before)
917511 0 0 917511 e0007 meson-generated_.._intel_perf_metrics.c.o (after)
text data bss dec hex filename
14131044 365708 210004 14706756 e06844 iris_dri.so (before)
14130948 365708 210004 14706660 e067e4 iris_dri.so (after)
text data bss dec hex filename
8124321 214264 22820 8361405 7f95bd libvulkan_intel.so (before)
8124225 214264 22820 8361309 7f955d libvulkan_intel.so (after)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15237>
The compiler does a good job of deduplicating strings already, but we
can eliminate the pointers to each string by combining the strings into
a single char array and storing only an index into that array.
The longest of the char arrays is the descriptions array, which is a
little over 45 KiB, so still under MSVC's 64 KiB string literal limit
[0]. Because the string length is under 64 KiB we can use uint16_t as
the index type, which roughly doubles our savings as compared to an int.
This cuts 77 KiB from iris_dri.so (0.5%) and libvulkan_intel.so (0.9%).
text data bss dec hex filename
926811 25920 0 952731 e899b meson-generated_.._intel_perf_metrics.c.o (before)
924401 0 0 924401 e1af1 meson-generated_.._intel_perf_metrics.c.o (after)
text data bss dec hex filename
14190852 391628 210004 14792484 e1b724 iris_dri.so (before)
14137732 365708 210004 14713444 e08264 iris_dri.so (after)
text data bss dec hex filename
8184097 240184 22820 8447101 80e47d libvulkan_intel.so (before)
8131009 214264 22820 8368093 7fafdd libvulkan_intel.so (after)
relinfo:
iris_dri.so (before): 17765 relocations, 17545 relative (98%), 452 PLT entries, 1 for local syms (0%), 0 users
iris_dri.so (after) : 15605 relocations, 15385 relative (98%), 452 PLT entries, 1 for local syms (0%), 0 users
libvulkan_intel.so (before): 10720 relocations, 6989 relative (65%), 355 PLT entries, 1 for local syms (0%), 0 users
libvulkan_intel.so (after) : 8560 relocations, 4829 relative (56%), 355 PLT entries, 1 for local syms (0%), 0 users
[0] https://docs.microsoft.com/en-us/cpp/cpp/string-and-character-literals-cpp?view=msvc-170&viewFallbackFrom=vs-2019
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15237>
intel_perf_query_counter contains fields for things we can't or don't
want to store in our static data (like runtime-determined max values) or
oa_read_counter function pointers which are dependent on the GPU gen and
would make deduplication very ineffective.
Cuts 16 KiB from iris_dri.so and libvulkan_intel.so.
text data bss dec hex filename
926811 43200 0 970011 ecd1b meson-generated_.._intel_perf_metrics.c.o (before)
926811 25920 0 952731 e899b meson-generated_.._intel_perf_metrics.c.o (after)
text data bss dec hex filename
14190852 408908 210004 14809764 e1faa4 iris_dri.so (before)
14190852 391628 210004 14792484 e1b724 iris_dri.so (after)
text data bss dec hex filename
8184097 257464 22820 8464381 8127fd libvulkan_intel.so (before)
8184097 240184 22820 8447101 80e47d libvulkan_intel.so (after)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15237>
And specifically mark it with ATTRIBUTE_NOINLINE. Otherwise it will be
inlined and actually slightly increase code size.
Cuts 505 KiB from iris_dri.so and libvulkan_intel.so.
text data bss dec hex filename
1538720 0 0 1538720 177aa0 meson-generated_.._intel_perf_metrics.c.o (before)
926811 43200 0 970011 ecd1b meson-generated_.._intel_perf_metrics.c.o (after)
text data bss dec hex filename
14751700 365708 210004 15327412 e9e0b4 iris_dri.so (before)
14190852 408908 210004 14809764 e1faa4 iris_dri.so (after)
text data bss dec hex filename
8744913 214264 22820 8981997 890ded libvulkan_intel.so (before)
8184097 257464 22820 8464381 8127fd libvulkan_intel.so (after)
Relocations increase because the counter initializations are moved from
code (in .text) to pointers (in .text) to .rodata, which require
relocations.
relinfo:
iris_dri.so (before): 15605 relocations, 15385 relative (98%), 452 PLT entries, 1 for local syms (0%), 0 users
iris_dri.so (after) : 17765 relocations, 17545 relative (98%), 452 PLT entries, 1 for local syms (0%), 0 users
libvulkan_intel.so (before): 8560 relocations, 4829 relative (56%), 355 PLT entries, 1 for local syms (0%), 0 users
libvulkan_intel.so (after) : 10720 relocations, 6989 relative (65%), 355 PLT entries, 1 for local syms (0%), 0 users
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15237>
No changes in resulting code (yes, seriously!). GCC constant propagates
the static const arrays into the code, yielding bit for bit identical
results. This will however enable further cleanups.
Before this patch, we emit 11916 different initializations of
intel_perf_query_counter. With this patch we emit an array of 539 and
initialize the intel_perf_query_counters in terms of those.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15237>
This cuts the compile time down for this file on my ryzen from
real 1m4.077s
to
real 0m30.827s
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14630>
On newer HW it will require more work than just reading a dword. It
could also vary depending on the report format.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Antonio Caggiano <antonio.caggiano@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13831>
INTEL_DEBUG is defined (since 4015e1876a) as:
#define INTEL_DEBUG __builtin_expect(intel_debug, 0)
which unfortunately chops off upper 32 bits from intel_debug
on platforms where sizeof(long) != sizeof(uint64_t) because
__builtin_expect is defined only for the long type.
Fix this by changing the definition of INTEL_DEBUG to be function-like
macro with "flags" argument. New definition returns 0 or 1 when
any of the flags match.
Most of the changes in this commit were generated using:
for c in `git grep INTEL_DEBUG | grep "&" | grep -v i915 | awk -F: '{print $1}' | sort | uniq`; do
perl -pi -e "s/INTEL_DEBUG & ([A-Z0-9a-z_]+)/INTEL_DBG(\1)/" $c
perl -pi -e "s/INTEL_DEBUG & (\([A-Z0-9_ |]+\))/INTEL_DBG\1/" $c
done
but it didn't handle all cases and required minor cleanups (like removal
of round brackets which were not needed anymore).
Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13334>
drm_i915_query_perf_config::data is an unsized array and declaring a
struct containing an unsized array that isn't at the end is a GNU
extension which trips up Android builds. Instead, stuff both into a
char array of the appropriate size. This emulates what you'd normally
do to allocate one of these with malloc only on the stack.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12308>
We're currently using the 32bit version which is dropping half the
bits of the 64bits values.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11607>
There is an effort to drop the "Gen" prefix from much of our codebase.
This just applies this to the metrics.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10930>
Adding a register, similar to what was done for RenderBasic.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10930>
The availability of those counters depends on the topology.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10930>
A bunch of performance counters rely on register snapshots on top of
the OA reports. Those are already conditional to the query mode in the
equations :
availability="true $QueryMode &&"
This change allows to disable counters that are only available with
additional register snapshots. This will be useful if you only want to
OA reports to extract performance counter values.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Antonio Caggiano <antonio.caggiano@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10216>
Allow libintel_perf to be included as a dependency from a C++ project by
wrapping some declaration within an extern "C" block, and then add a
function to allow direct reading of the OA stream.
v2: Don't expose internal helpers (Lionel)
Signed-off-by: Antonio Caggiano <antonio.caggiano@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10216>
One exception is src/amd/addrlib/, for which -Wimplicit-fallthrough is
explicitly disabled.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10220>
This patch renames functions, structures, enums etc. with "gen_"
prefix defined in common code.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9413>
Changes in this patch include:
- Rename all files in src/intel/common path
- Update the filenames used in source and build files
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9413>
This makes it possible to use a separate ralloc context, not gl context
itself which might not be allocated with ralloc.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8805>