Add optional OA performance counter collection around each execute()
call. Examples:
```
# List all profiles and counters, with descriptions.
$ executor --oa list
# Collect all counters from a profile.
$ executor --oa ComputeBasic file.lua
# Collect a comma-separated subset of counters from a profile.
$ executor --oa ComputeBasic:GpuTime,AvgGpuCoreFrequency file.lua
# ComputeBasic is the default profile, so bare counter names also work.
$ executor --oa GpuTime file.lua
```
The selected counters are printed to stdout after the script finishes,
or written to a file specified by --oa-csv FILENAME.
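As a rough illustration of the --oa argument forms shown above, the
following Python sketch splits a spec into a profile and a counter list.
It is not the executor's parser: parse_oa_spec, KNOWN_PROFILES and
DEFAULT_PROFILE are hypothetical names, the set of known profiles is
assumed, and the "list" mode is left out.

```python
# Hypothetical sketch, not the executor's actual argument parser.
DEFAULT_PROFILE = "ComputeBasic"
# Assumption: the real tool would learn the available profiles from the
# driver's metric sets; a fixed set stands in for that here.
KNOWN_PROFILES = {"ComputeBasic"}


def parse_oa_spec(spec, known_profiles=KNOWN_PROFILES):
    """Split an --oa argument into (profile, counters).

    An empty counter list means "collect every counter in the profile".
    """
    if ":" in spec:
        # PROFILE:COUNTER[,COUNTER...]
        profile, _, names = spec.partition(":")
        return profile, [n for n in names.split(",") if n]
    if spec in known_profiles:
        # A bare profile name selects all of its counters.
        return spec, []
    # Otherwise treat the spec as counter names in the default profile.
    return DEFAULT_PROFILE, [n for n in spec.split(",") if n]


print(parse_oa_spec("ComputeBasic:GpuTime,AvgGpuCoreFrequency"))
print(parse_oa_spec("GpuTime"))
```

In this sketch a bare name is checked against the known profiles first,
which is one way a profile name and a counter name could be told apart.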
Assisted-by: Pi coding agent (GPT-5.5)
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41610>
Add a tool that programs the hardware just enough to execute compute
shaders, then runs a script that can manipulate data and dispatch
the shaders (written in Xe assembly).
The goal is to have a tool to experiment directly with certain
assembly instructions and the shared units without having to
instrument the drivers.
To make writing assembly more convenient, a few macros (indicated by
the @-symbol) are expanded into full instructions.
For example, the script
```
local r = execute {
data={ [42] = 0x100 },
src=[[
@mov g1 42
@read g2 g1
@id g3
add(8) g4<1>UD g2<8,8,1>UD g3<8,8,1>UD { align1 @1 1Q };
@write g3 g4
@eot
]]
}
dump(r, 4)
```
produces
```
[0x00000000] 0x00000100 0x00000101 0x00000102 0x00000103
```
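One way to see where these values come from is to model the script
lane by lane. The Python sketch below is only a toy model: the macro
semantics (@read loads a dword from the data buffer, @id yields the
lane index, @write stores at a dword index) and the single shared
buffer are assumptions inferred from the example's output, not taken
from the tool's help text.

```python
# Toy model of the example script. Macro semantics are assumptions
# inferred from the example's output, not the tool's definition.
SIMD_WIDTH = 8  # add(8) operates on 8 lanes

mem = {42: 0x100}  # data={ [42] = 0x100 }, indexed by dword (assumed)

for lane in range(SIMD_WIDTH):
    g1 = 42              # @mov g1 42
    g2 = mem.get(g1, 0)  # @read g2 g1: load the dword at index g1
    g3 = lane            # @id g3: per-lane index (assumed)
    g4 = g2 + g3         # add(8) g4 g2 g3
    mem[g3] = g4         # @write g3 g4: store g4 at dword index g3

# dump(r, 4): show the first four dwords of the buffer (assumed meaning)
first_four = [mem[i] for i in range(4)]
print("[0x%08x] %s" % (0, " ".join("0x%08x" % v for v in first_four)))
```

Under these assumptions each lane adds its own index to the value read
from slot 42, so the first four dwords are 0x100 through 0x103.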
There's a help message inside the code that describes the script
environment and the macros for assembly sources.
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30062>