docs: Document RADV/ACO function calls
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29580>
Commit 902c7e9df8 (parent 27c0326391): adds docs/drivers/amd/function-calls.rst (new file, 173 additions, 0 deletions)

:orphan:

.. _aco-fn-calls:

Function call support in RADV/ACO
=================================

ACO supports function calls inside shaders: given a function signature and an ABI, shaders can call
an arbitrary function, even via only a function pointer (i.e. with an unknown function definition).

This function call support is useful for implementing ray tracing pipelines (by representing individual RT shaders
as callable functions), but it also has potential use cases in GPGPU/compute workloads.

This page documents the concepts involved in implementing function calls and gives an overview of the
implementation components.

Function call representation
----------------------------

In NIR, function calls are represented by a ``nir_call_instr``. The instruction takes a ``nir_function`` representing
the function being called, as well as SSA defs for each call parameter.

NIR can also represent "indirect calls", i.e. calls where the function being called is
unknown. Instead, the instruction takes an SSA def containing a function pointer to the callee. In this case, the
``nir_function`` only serves to provide information about the function signature, i.e. how many and which parameters
the function takes.

Call instructions do not have return values. Return values are represented by so-called "return parameters" instead.
Rather than SSA values, these parameters are derefs, and the return value is written into the deref when the callee
returns. Return parameters can double as input parameters, too: the callee can read the previous value of the deref
before (potentially) overwriting it with a new value.

ACO's representation of function calls follows this very closely. Calls are described by the ``p_call`` pseudo-instruction.
The operands to this instruction are a function pointer (i.e. the address of the callee), followed by the call
parameters. Return parameters are handled differently, though: while the initial value of the return parameter is passed
as an operand, the call instruction produces new definitions that refer to the SSA values of the return parameters after
the function call returns. The special NIR intrinsic ``load_return_param_amd`` can be used to access these
new definitions when lowering return parameter derefs to SSA form.
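
As a loose C++ analogy (not actual NIR or ACO code; all function names are hypothetical), the deref form of a return
parameter behaves like a reference parameter. The SSA form used by ``p_call`` is different: it passes the initial value
in as an operand and receives the value after the call as a separate result.

.. code-block:: cpp

   #include <cassert>

   // Deref form (as in NIR): the return parameter is passed by reference.
   // The callee may read its previous value before overwriting it.
   void callee_deref(int input, int &ret)
   {
      if (input > 0)
         ret = ret + input; // read the old value, then write a new one
      // Otherwise the return parameter keeps its previous value.
   }

   // SSA form (analogous to p_call): the initial value of the return
   // parameter is an operand, and the value after the call is a new,
   // separate result of the call.
   int callee_ssa(int input, int ret_initial)
   {
      int ret = ret_initial;
      callee_deref(input, ret);
      return ret;
   }

   int caller(int input)
   {
      int before = 10;
      int after = callee_ssa(input, before);
      assert(before == 10); // the operand value itself is not modified
      return after;
   }

For example, ``caller(5)`` yields 15. ``caller(-1)`` yields 10 because the callee leaves the return parameter unchanged.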

.. _div-calls:

Divergent calls
---------------

On CPUs, a call instruction only ever jumps to a single address. However, GPUs are SIMT, and the value of a function
pointer may be divergent, i.e. different threads may try to call different functions within the same call instruction.
AMD hardware executes each instruction for all threads in lockstep, so the different callees have to be executed one
after the other.

This is handled by RADV in ``radv_nir_lower_call_abi``. In addition to the (non-divergent) function pointer to jump to,
``radv_nir_lower_call_abi`` prepends another parameter representing the (potentially divergent) function pointer for all
lanes. For callable functions, ``radv_nir_lower_call_abi`` wraps the function body in a condition that verifies that the
current thread's (divergent) pointer matches the (non-divergent) pointer of the function that is currently executing.
This "masks off" all threads that wanted to jump to a different function than the one currently executing. At the very
end, ``radv_nir_lower_call_abi`` inserts code that decides whether to jump to the next callee or to return.
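
The loop below is a minimal C++ emulation of this scheme, not RADV's actual lowering. It assumes a small hypothetical
wave size. It picks the next non-divergent pointer from the first lane that has not yet executed its callee. That
selection rule is an assumption; this page does not specify how the next callee is chosen.

.. code-block:: cpp

   #include <array>
   #include <cassert>
   #include <cstddef>

   constexpr std::size_t kWaveSize = 8; // hypothetical, kept small for illustration

   using Lanes = std::array<int, kWaveSize>;
   using Callee = void (*)(Lanes &values, std::size_t lane);

   // Two example callees; each operates on the value of a single lane.
   void add_one(Lanes &values, std::size_t lane) { values[lane] += 1; }
   void times_two(Lanes &values, std::size_t lane) { values[lane] *= 2; }

   // Emulates one call instruction whose target differs per lane.
   // 'targets' holds each lane's (divergent) function pointer.
   // Returns how many callees were executed one after the other.
   unsigned divergent_call(const std::array<Callee, kWaveSize> &targets, Lanes &values)
   {
      std::array<bool, kWaveSize> done{};
      unsigned num_callees = 0;

      for (;;) {
         // Choose the non-divergent pointer to jump to: here, the target of
         // the first lane that has not run yet (an assumption).
         std::size_t first = kWaveSize;
         for (std::size_t i = 0; i < kWaveSize; ++i) {
            if (!done[i]) {
               first = i;
               break;
            }
         }
         if (first == kWaveSize)
            return num_callees; // every lane has run its callee: return

         const Callee uniform = targets[first];
         assert(uniform != nullptr);
         ++num_callees;

         // The wrapped callee body only runs for lanes whose divergent
         // pointer matches the non-divergent one; other lanes are masked off.
         for (std::size_t i = 0; i < kWaveSize; ++i) {
            if (!done[i] && targets[i] == uniform) {
               uniform(values, i);
               done[i] = true;
            }
         }
      }
   }

Consider the targets ``{add_one, times_two, add_one, add_one, times_two, add_one, add_one, add_one}``. The call then
executes two callees in sequence, and only the matching lanes are active in each. With a uniform pointer, only one
callee runs.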

.. _stack:

Stack
-----

Supporting arbitrary function calls also means supporting recursion, and recursive functions need a stack.
AMD hardware provides instructions for accessing a per-thread scratch memory area in VRAM, and ACO uses this per-thread
scratch memory to set up its stack.

The stack frame of a function consists of all scratch memory allocated for this function in NIR, as well as space to
spill VGPRs if that is required. ACO adds a stack pointer as a parameter to every function. This stack pointer is added
to the offset inside the scratch space for all scratch loads/stores, which makes sure they don't overwrite the stack
frames of caller functions.

ACO's call instructions take two stack-related operands: the current (caller) stack pointer and the caller's stack size.
When converting the call instruction to hardware instructions, ACO adds the caller's stack size to the stack pointer
for the duration of the call (and subtracts it again afterwards). This allows the caller to reuse the same stack pointer
after the call.
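
The following is a minimal C++ model of this stack pointer arithmetic. It assumes a word-addressed scratch array and a
hypothetical recursive ``factorial`` function. Its one-word frame keeps ``n`` live across the recursive call. ACO itself
emits real scratch instructions with byte offsets; the model only shows how frames stay disjoint.

.. code-block:: cpp

   #include <cassert>
   #include <vector>

   // Per-thread scratch memory, modeled as an array of words. Offsets and
   // frame sizes are in words here (a simplification).
   struct Scratch {
      std::vector<unsigned> words;

      // Every scratch access adds the function's stack pointer to the offset
      // within the function's own stack frame.
      unsigned &at(unsigned stack_ptr, unsigned offset)
      {
         assert(stack_ptr + offset < words.size());
         return words[stack_ptr + offset];
      }
   };

   // Hypothetical frame size of factorial(): one word that keeps 'n' live
   // across the recursive call.
   constexpr unsigned kFactorialFrameSize = 1;

   unsigned factorial(Scratch &scratch, unsigned stack_ptr, unsigned n)
   {
      scratch.at(stack_ptr, 0) = n; // store n in this function's frame
      if (n <= 1)
         return 1;

      // Around the call, add the caller's stack size to the stack pointer so
      // that the callee's frame starts right after the caller's frame...
      stack_ptr += kFactorialFrameSize;
      const unsigned result = factorial(scratch, stack_ptr, n - 1);
      // ...and subtract it again afterwards, so the same stack pointer can be
      // reused after the call.
      stack_ptr -= kFactorialFrameSize;

      // The callee did not overwrite this function's frame.
      return scratch.at(stack_ptr, 0) * result;
   }

Calling ``factorial(s, 0, 5)`` on a 16-word scratch area returns 120. Each recursion level stores its ``n`` at stack
pointer 0, 1, 2, and so on, without overwriting its caller's frame.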

Implicit/System Parameters
--------------------------

In addition to the parameters defined by the function signature, both RADV and ACO insert additional parameters while
lowering calls. This is an overview of which lowering passes add which parameters.

Parameters added by ``radv_nir_lower_call_abi`` (see :ref:`Divergent calls <div-calls>`):

- "Uniform"/non-divergent callee pointer
- Divergent function pointer

Parameters added by ACO (see :ref:`Stack <stack>`):

- Stack pointer (uniform)

ABI Definition
--------------

The ABI (Application Binary Interface) defines specifics about the interaction between the function caller and the
callee (e.g. assignment of registers to parameters or register preservation). In ACO, the primary purpose of the ABI is
to define which register ranges are "preserved" (i.e. never overwritten by the callee) or "clobbered" (i.e. potentially
overwritten by the callee).

The caller can use preserved register ranges to store temporaries that are live across a call, and the callee can use
clobbered register ranges to store its own temporaries. If the callee wants to use registers from a preserved range,
it needs to back up the value contained in the preserved register beforehand and restore it when it is done using
the register. Similarly, if there are not enough preserved registers for the caller to store all temporaries that are
live across the call, the caller needs to spill the excess temporaries to the stack.
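
This C++ sketch illustrates these rules and shows who saves what; it is not ACO code. It assumes a hypothetical
8-register file in which r0-r3 are preserved and r4-r7 are clobbered. A vector stands in for the stack.

.. code-block:: cpp

   #include <array>
   #include <cassert>
   #include <vector>

   // Hypothetical register file: r0-r3 are preserved, r4-r7 are clobbered.
   using RegFile = std::array<unsigned, 8>;

   // Callee: may overwrite clobbered registers freely, but must back up any
   // preserved register it uses and restore it before returning.
   void callee(RegFile &regs, std::vector<unsigned> &stack)
   {
      regs[5] = 0xdead;         // r5 is clobbered: no backup needed

      stack.push_back(regs[1]); // back up preserved r1
      regs[1] = 42;             // use r1 as a temporary
      regs[6] = regs[1] + 1;
      regs[1] = stack.back();   // restore r1 before returning
      stack.pop_back();
   }

   // Caller: a temporary in a preserved register survives the call as-is,
   // while a temporary in a clobbered register must be spilled and reloaded.
   unsigned caller(RegFile &regs, std::vector<unsigned> &stack)
   {
      regs[0] = 7;              // live across the call, in preserved r0
      regs[5] = 8;              // live across the call, in clobbered r5
      stack.push_back(regs[5]); // spill r5
      callee(regs, stack);
      assert(!stack.empty());
      regs[5] = stack.back();   // reload r5
      stack.pop_back();
      return regs[0] + regs[5];
   }

After ``caller`` returns, r1 holds its original value because the callee restored it. The result is 7 + 8 = 15, even
though the callee overwrote r5.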

ACO has to cater to different needs when defining ABIs. On one side, ray tracing traversal shaders need the caller to
free up the entire register file for them (ray traversal is a really hot loop, so we don't want to spill anything at
all). Apart from some parameters like the invocation ID, traversal shaders should be able to overwrite almost anything.
On the other side, RT traversal shaders should not be required to free up the register file when calling
any-hit/intersection shaders, as this would also cause spilling during traversal. GPGPU compute workloads could fall
anywhere between these extremes, so a middle-ground solution is desirable for them.

ACO's way of defining an ABI divides the register file into "blocks" (``struct aco::ABI::RegisterBlock``). Each block
consists of a fixed number of preserved and clobbered registers, and a boolean determining whether the preserved or
clobbered registers come first in the block. Preserved and clobbered register ranges are defined by
repeating these blocks until all registers are assigned. The last repetition is cut short if the register file runs out.
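
The expansion rule can be sketched in C++ as follows. The ``RegisterBlock`` and ``Range`` types here are hypothetical
stand-ins that describe a single register class (SGPRs or VGPRs) at a time. They are not the definitions from ACO's
source. Applied to the sizes in the examples below, ``expand`` reproduces the listed ranges.

.. code-block:: cpp

   #include <algorithm>
   #include <cassert>
   #include <vector>

   // Hypothetical stand-in for one register class (SGPRs or VGPRs) of a
   // register block; field names follow the examples below.
   struct RegisterBlock {
      unsigned clobbered_size;
      unsigned preserved_size;
      bool clobbered_first;
   };

   struct Range {
      unsigned first, last; // inclusive register indices
      bool preserved;
   };

   // Repeats 'block' until all 'num_regs' registers are assigned. The last
   // repetition is cut short when the register file runs out.
   std::vector<Range> expand(const RegisterBlock &block, unsigned num_regs)
   {
      assert(block.clobbered_size + block.preserved_size > 0);
      std::vector<Range> ranges;
      unsigned next = 0;

      auto emit = [&](unsigned size, bool preserved) {
         if (size == 0 || next >= num_regs)
            return;
         const unsigned last = std::min(next + size, num_regs) - 1;
         if (!ranges.empty() && ranges.back().preserved == preserved)
            ranges.back().last = last; // merge with an adjacent range of the same kind
         else
            ranges.push_back({next, last, preserved});
         next = last + 1;
      };

      while (next < num_regs) {
         if (block.clobbered_first) {
            emit(block.clobbered_size, false);
            emit(block.preserved_size, true);
         } else {
            emit(block.preserved_size, true);
            emit(block.clobbered_size, false);
         }
      }
      return ranges;
   }

For instance, ``expand({16, 16, false}, 108)`` returns seven SGPR ranges ending with s96-s107 (preserved).
``expand({256, 80, false}, 128)`` returns v0-v79 (preserved) and v80-v127 (clobbered).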

Some examples of preserved/clobbered register ranges using this approach::

   For all examples, there are 108 SGPRs (s0-s107) and 128 VGPRs (v0-v127) to assign.

   RegisterBlock:
      clobbered_size: {16 sgpr, 16 vgpr}
      preserved_size: {16 sgpr, 16 vgpr}
      clobbered_first: false
   results in:
      v0-v15: preserved
      v16-v31: clobbered
      v32-v47: preserved
      v48-v63: clobbered
      v64-v79: preserved
      v80-v95: clobbered
      v96-v111: preserved
      v112-v127: clobbered

      s0-s15: preserved
      s16-s31: clobbered
      s32-s47: preserved
      s48-s63: clobbered
      s64-s79: preserved
      s80-s95: clobbered
      s96-s107: preserved

   RegisterBlock:
      clobbered_size: {128 sgpr, 256 vgpr}
      preserved_size: {80 sgpr, 80 vgpr}
      clobbered_first: false
   results in:
      v0-v79: preserved
      v80-v127: clobbered

      s0-s79: preserved
      s80-s107: clobbered

An alternating preserved/clobbered pattern, as in the first example, is useful for generic compute workloads. The ratio
of preserved to clobbered registers stays roughly the same no matter how many registers the shaders use.

The second example, where the lower part of the register file is preserved and only some registers high up in the
register file are clobbered, is suitable for any-hit/intersection shaders: traversal shader temporaries can live in the
preserved part low in the register file.

This block assignment is optional. If no ``RegisterBlock`` is given, the ABI defines the entire register range as
clobbered by default, although parameters that are not marked as clobbered via ``ACO_NIR_PARAM_ATTRIB_DISCARDABLE``
continue to be preserved.

Parameter Register Assignment
-----------------------------

If a ``RegisterBlock`` defines preserved and clobbered ranges, parameters are assigned registers from either range
depending on ``ACO_NIR_PARAM_ATTRIB_DISCARDABLE``. Parameters marked as clobbered with this attribute are assigned a
register in a clobbered range; all other parameters are assigned a register in a preserved range. The order
of the parameters in the register file is not necessarily the same as in the function signature. They may get
reordered if that helps to fill gaps or to satisfy alignment requirements.

If there is no ``RegisterBlock``, registers are assigned based on alignment only.

If there is no more space for a parameter in any of its corresponding register ranges, it is moved to the stack.
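
One possible assignment strategy can be sketched in C++. Everything here is hypothetical: the ``Param`` and ``Range``
types, the decreasing-alignment ordering used to reduce gaps, and the first-fit placement are illustrative choices, not
ACO's implementation. The sketch does not model the case without a ``RegisterBlock``.

.. code-block:: cpp

   #include <algorithm>
   #include <cassert>
   #include <cstddef>
   #include <numeric>
   #include <optional>
   #include <vector>

   // Hypothetical parameter description: size and alignment in registers,
   // and whether ACO_NIR_PARAM_ATTRIB_DISCARDABLE is set.
   struct Param {
      unsigned size;
      unsigned align;
      bool discardable;
   };

   struct Range {
      unsigned first, last; // inclusive register indices
      bool preserved;
   };

   // Returns the first register of each parameter, or std::nullopt if the
   // parameter has to be passed on the stack.
   std::vector<std::optional<unsigned>>
   assign_params(const std::vector<Param> &params, const std::vector<Range> &ranges)
   {
      std::vector<std::optional<unsigned>> result(params.size());
      std::vector<unsigned> next_free; // first unused register of each range
      for (const Range &r : ranges)
         next_free.push_back(r.first);

      // Place parameters in order of decreasing alignment to reduce gaps;
      // this may reorder them relative to the function signature.
      std::vector<std::size_t> order(params.size());
      std::iota(order.begin(), order.end(), std::size_t{0});
      std::stable_sort(order.begin(), order.end(), [&](std::size_t a, std::size_t b) {
         return params[a].align > params[b].align;
      });

      for (std::size_t idx : order) {
         const Param &p = params[idx];
         assert(p.size > 0 && p.align > 0);
         for (std::size_t r = 0; r < ranges.size(); ++r) {
            // Discardable parameters go to clobbered ranges, others to preserved ones.
            if (ranges[r].preserved == p.discardable)
               continue;
            const unsigned reg = (next_free[r] + p.align - 1) / p.align * p.align;
            if (reg + p.size - 1 <= ranges[r].last) {
               result[idx] = reg;
               next_free[r] = reg + p.size;
               break;
            }
         }
         // If no range had space, result[idx] stays empty: the parameter goes to the stack.
      }
      return result;
   }

Consider preserved ranges r0-r3 and r8-r11 and a clobbered range r4-r7. With these ranges, a discardable
4-register-aligned parameter lands at r4. A 2-register-aligned parameter is placed at r0, ahead of a 1-register
parameter that comes first in the signature and ends up at r2. An 8-register parameter that fits nowhere is passed on
the stack.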