docs/nvk: Add some notes about mesh shading and ISBE layout

Signed-off-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27196>
Mary Guillemard 2026-03-11 16:24:07 +01:00 committed by Marge Bot
parent 145b8540e5
commit 8ec56cc0cc
2 changed files with 225 additions and 0 deletions


@ -0,0 +1,66 @@
ISBE Layout
=============================================
The ISBE is a piece of staging memory shared between the :abbr:`PE (Primitive Engine)`
and :abbr:`SM (Streaming Multiprocessor)` containing the attributes for
:abbr:`VTG (Vertex, Tessellation, Geometry)` stages.

In normal cases, shaders do not need to access it directly and use ``ALD`` or
``AST`` instead, but in some cases (like ``ALD`` vtx handling or mesh shaders)
direct access is necessary.

This document records the observations made about its layout.

The ISBE "space" is separated into multiple regions (map, patch, primitive and
attribute) and can be accessed with the ``ISBERD`` and ``ISBEWR`` instructions.

There are always two ISBE "spaces" present: one for inputs and one for outputs.
Input and output ISBE memory can be shared by setting bit 25 of the
:abbr:`SPH (Shader Program Header)`.
**NOTE**: Before Turing, only ``ISBERD`` was supported with the ``.MAP`` flag.
Map region
""""""""""
The Map region contains all the vertex indices for all primitives.
When used as output, it starts with the primitive count.

==================================================== ==================== ===================================================
Byte Range                                           Name                 Note
==================================================== ==================== ===================================================
0x0                                                  primitive_count      32-bit, only present as output ISBE
0x0?..(0x0? + primitive_count * $vtx_per_prim_count) primitive_indices[i] 8-bit, offset is 0x04 as output ISBE otherwise 0x00
==================================================== ==================== ===================================================
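
The offsets in this table can be sketched as a small helper. This is only an
illustration of the observations above, not driver code: the function name
``map_index_offset`` and its parameters are hypothetical, and it assumes the
8-bit indices are tightly packed, one primitive after another.

```python
def map_index_offset(prim, vtx, vtx_per_prim, is_output):
    """Byte offset of vertex index `vtx` of primitive `prim` in the Map region.

    Hypothetical helper: assumes indices are tightly packed in primitive order.
    """
    if not 0 <= vtx < vtx_per_prim:
        raise ValueError("vtx must be in [0, vtx_per_prim)")
    # The output Map region starts with the 32-bit primitive_count,
    # so the indices start at 0x04 instead of 0x00.
    base = 0x4 if is_output else 0x0
    # Each index is 8-bit, so it takes exactly one byte.
    return base + prim * vtx_per_prim + vtx


# Second index of the third triangle in an output Map region.
print(hex(map_index_offset(prim=2, vtx=1, vtx_per_prim=3, is_output=True)))
```

With three indices per primitive, the same primitive in an input Map region
would start 4 bytes earlier since no primitive count is present.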
Attribute region
""""""""""""""""
The Attribute region contains all attributes and is allocated based on the
:abbr:`SPH (Shader Program Header)` input/output mask and :doc:`mesh shader methods <mesh_shading_notes>`.
With the ``SKEW`` flag applied on ``ISBERD`` and ``ISBEWR``, the values of each attribute are
packed in groups of 32, forming "memory lines" of 128 bytes each.

Additionally, the order of the attribute packs is determined by the unique attribute id
(see ``nak_attr`` or the :abbr:`SPH (Shader Program Header)` header definitions for the values).

Finally, if more than 32 values are needed, the layout repeats itself.
Here is an example for 256 vertices being defined with ``ATTR_POINT_SIZE`` (0x6C) and
``ATTR_POSITION_X`` (0x70) active:

============ =========================
Byte Range   Name
============ =========================
0x000..0x080 ATTR_POINT_SIZE[0..31]
0x080..0x100 ATTR_POSITION_X[0..31]
0x100..0x180 ATTR_POINT_SIZE[32..63]
0x180..0x200 ATTR_POSITION_X[32..63]
0x200..0x280 ATTR_POINT_SIZE[64..95]
0x280..0x300 ATTR_POSITION_X[64..95]
...          ...
0x700..0x780 ATTR_POINT_SIZE[224..255]
0x780..0x800 ATTR_POSITION_X[224..255]
============ =========================
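
The skewed layout can be modeled with a short computation. This is a minimal
sketch under stated assumptions, not the hardware definition: the helper name
``skewed_attr_offset`` is hypothetical, attribute packs are assumed to be
ordered by ascending attribute id, and each value is assumed to be 4 bytes
(128 bytes / 32 values).

```python
LINE_SIZE = 128        # bytes per memory line
VALUES_PER_LINE = 32   # values of one attribute per memory line
VALUE_SIZE = LINE_SIZE // VALUES_PER_LINE  # 4 bytes per value (assumed)


def skewed_attr_offset(active_attrs, attr, index):
    """Byte offset of value `index` of attribute `attr` with SKEW applied.

    Hypothetical helper: assumes packs are sorted by ascending attribute id.
    """
    ordered = sorted(active_attrs)
    slot = ordered.index(attr)          # position of this attribute's pack
    block = index // VALUES_PER_LINE    # which repetition of the layout
    lane = index % VALUES_PER_LINE      # position inside the memory line
    line = block * len(ordered) + slot
    return line * LINE_SIZE + lane * VALUE_SIZE


ATTR_POINT_SIZE = 0x6C
ATTR_POSITION_X = 0x70
active = [ATTR_POINT_SIZE, ATTR_POSITION_X]

# Start of the last two memory lines of the 256-vertex example.
print(hex(skewed_attr_offset(active, ATTR_POINT_SIZE, 224)))
print(hex(skewed_attr_offset(active, ATTR_POSITION_X, 224)))
```

For the 256-vertex example, this places ``ATTR_POINT_SIZE[224]`` at 0x700 and
``ATTR_POSITION_X[224]`` at 0x780, and the whole region ends at 0x800.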


@ -0,0 +1,159 @@
Mesh Shading Notes
=============================================
Mesh shader support is present on Turing and later.
It reuses the 3D engine and the regular graphics stages.
Draw
""""
When performing a draw with a mesh shader bound, ``SET_DRAW_CONTROL_A`` needs to
be used while ignoring the base indices.
This can be done with the following commands:

.. code-block::

   NVC597_SET_VERTEX_ID_BASE
      .V = (0x0)
   NVC597_SET_DRAW_CONTROL_A
      .TOPOLOGY = POINTS
      .PRIMITIVE_ID = FIRST
      .INSTANCE_ID = FIRST
      .SPLIT_MODE = NORMAL_BEGIN_NORMAL_END
      .INSTANCE_ITERATE_ENABLE = FALSE
      .IGNORE_GLOBAL_BASE_VERTEX_INDEX = TRUE
      .IGNORE_GLOBAL_BASE_INSTANCE_INDEX = TRUE
   NVC597_DRAW_VERTEX_ARRAY_BEGIN_END_A
      .START = (0x0)
   NVC597_DRAW_VERTEX_ARRAY_BEGIN_END_B
      .COUNT = ($groupCount)

Where:

- ``$groupCount`` is the number of local workgroups to dispatch.

**NOTE**: The topology of ``SET_DRAW_CONTROL_A`` will be ignored and
``SET_MESH_SHADER_A`` topology will be used.
Stages
""""""
Before binding a task or mesh shader, ``SET_MESH_CONTROL`` needs to be set.
Shared Memory
^^^^^^^^^^^^^
Shared memory is allocated in the output :doc:`ISBE Attribute region <isbe_layout>` after all attributes.
Part of the shared memory can be made accessible to the next stage
as part of the input :doc:`ISBE Attribute region <isbe_layout>` using ``.OUTPUT_TO_M_S_LINES``.
As such, the task payload is the first part of the shared memory on the task stage.
Task
^^^^
The task shader is always bound as a vertex shader.
Additionally, ``SET_MESH_INIT_SHADER`` is used to set the number of local invocations
to use and the size used for shared memory.
This can be done with the following command:

::

   NVC597_SET_MESH_INIT_SHADER
      .THREAD_COUNT = ($thread_count)
      .LOCAL_BUFFER_LINES = ($local_buffer_lines)
      .OUTPUT_TO_M_S_LINES = ($output_to_m_s_lines)

Where:

- ``$thread_count`` is the number of local invocations (up to 32).
- ``$local_buffer_lines`` is the total number of shared memory lines (including task payload)
  to allocate in the output :doc:`ISBE Attribute region <isbe_layout>` after all attributes.
- ``$output_to_m_s_lines`` is the total number of shared memory lines that will be available
  to the next stage.

**NOTE**: The size of a memory line is 128 bytes.
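
Since both values are expressed in 128-byte memory lines, byte sizes need to be
rounded up. The following is a rough sketch of that conversion, not what NVK
actually does: the helper names are hypothetical, and it assumes the task
payload sits at the start of shared memory with the remaining shared memory
packed directly after it.

```python
LINE_SIZE = 128  # bytes per memory line


def bytes_to_lines(size):
    """Round a byte size up to a whole number of memory lines."""
    return (size + LINE_SIZE - 1) // LINE_SIZE


def task_init_shader_lines(payload_bytes, other_shared_bytes):
    """Return (local_buffer_lines, output_to_m_s_lines) for a task shader.

    Hypothetical helper: assumes the payload comes first and the rest of
    shared memory follows it without padding.
    """
    # Only the lines covering the task payload are exposed to the next stage.
    output_to_m_s_lines = bytes_to_lines(payload_bytes)
    # The total includes the task payload.
    local_buffer_lines = bytes_to_lines(payload_bytes + other_shared_bytes)
    return local_buffer_lines, output_to_m_s_lines


# 200 bytes of task payload plus 100 bytes of other shared memory.
print(task_init_shader_lines(200, 100))
```

Note that with this packing, the last line exposed to the next stage may also
contain shared memory that is not part of the payload.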
The workgroup_index is implemented using the ``VERTEX_ID`` read from the input :doc:`ISBE Attribute region <isbe_layout>` (with SKEW applied).
``EmitMeshTasksEXT`` is lowered to the equivalent pseudo code:

::

   void EmitMeshTasksEXT(uint x, uint y, uint z) {
      uint taskCount = x * y * z;

      ISBEWR.O.ATTR.32 [0x04], taskCount;
      ISBEWR.O.ATTR.32 [0x08], x;
      ISBEWR.O.ATTR.32 [0x0C], y;
      ISBEWR.O.ATTR.32 [0x10], z;
   }
Mesh
^^^^
The mesh shader is bound to the vertex stage if no task shader is present
or the tess control stage otherwise.
Attributes are stored in the output :doc:`ISBE Attribute region <isbe_layout>` **with SKEW applied**.
If any per primitive attributes are in use, they are stored after all per vertex
attributes and the geometry stage will be enabled in passthrough mode with only
its program header present.
For more details about the ISBE Attribute or Map layout, see the dedicated :doc:`ISBE <isbe_layout>` page.
Additionally, ``SET_MESH_SHADER_A`` and ``SET_MESH_SHADER_B`` are used
to set the number of local invocations to use, the size used for shared memory
and topology details.
This can be done with the following commands:

::

   NVC597_SET_MESH_SHADER_A
      .OUTPUT_TOPOLOGY = $topology
      .MAX_VERTEX = ($max_vertex)
      .MAX_PRIMITIVE = ($max_primitive)
   NVC597_SET_MESH_SHADER_B
      .SHARED_MEM_LINES = ($shared_mem_lines)
      .THREAD_COUNT = ($thread_count)

Where:

- ``$topology`` is the topology to use.
- ``$max_vertex`` is the max count of vertices used by the mesh shader.
- ``$max_primitive`` is the max count of primitives used by the mesh shader.
- ``$shared_mem_lines`` is the total number of shared memory lines to allocate
  in the output :doc:`ISBE Attribute region <isbe_layout>` after all attributes.
- ``$thread_count`` is the number of local invocations (up to 32).

**NOTE**: The size of a memory line is 128 bytes.
The workgroup_index is implemented in the following way:

- If a task shader is present, the value is read from the input :doc:`ISBE Attribute region <isbe_layout>` **without SKEW applied**.
- Otherwise, it is implemented using the ``VERTEX_ID`` read from the input :doc:`ISBE Attribute region <isbe_layout>` **with SKEW applied**.
``SetMeshOutputsEXT`` is lowered to the equivalent pseudo code:

::

   void SetMeshOutputsEXT(uint vertexCount, uint primitiveCount) {
      // vertexCount is unused
      ISBEWR.O.MAP.32 [0x3], primitiveCount;
   }

**NOTE**: This is effectively a write to offset 0x0 but the output :doc:`ISBE Map region <isbe_layout>` processes writes in reverse.

All primitive indices are stored as 8-bit indices starting at offset 0x4 in the output :doc:`ISBE Map region <isbe_layout>`.
Hardware limitations
""""""""""""""""""""
* Only up to 32 local invocations are supported.
* Because the shared memory is part of the :doc:`ISBE Attribute region <isbe_layout>`,
  no atomic operations are available on it.
* Task / mesh invocations need to be counted inside the shader.