From 8ec56cc0cc8a0f81904687296f8fd8bee49786bb Mon Sep 17 00:00:00 2001
From: Mary Guillemard
Date: Wed, 11 Mar 2026 16:24:07 +0100
Subject: [PATCH] docs/nvk: Add some notes about mesh shading and ISBE layout

Signed-off-by: Mary Guillemard
Reviewed-by: Mel Henning
Part-of:
---
 docs/drivers/nvk/isbe_layout.rst        |  66 ++++++++++
 docs/drivers/nvk/mesh_shading_notes.rst | 159 ++++++++++++++++++++++++
 2 files changed, 225 insertions(+)
 create mode 100644 docs/drivers/nvk/isbe_layout.rst
 create mode 100644 docs/drivers/nvk/mesh_shading_notes.rst

diff --git a/docs/drivers/nvk/isbe_layout.rst b/docs/drivers/nvk/isbe_layout.rst
new file mode 100644
index 00000000000..a4c80497b8d
--- /dev/null
+++ b/docs/drivers/nvk/isbe_layout.rst
@@ -0,0 +1,66 @@
+ISBE Layout
+=============================================
+
+The ISBE is a piece of staging memory shared between the :abbr:`PE (Primitive Engine)`
+and :abbr:`SM (Streaming Multiprocessor)` containing the attributes for
+:abbr:`VTG (Vertex, Tessellation, Geometry)` stages.
+
+In normal cases, shaders do not need to access it directly and use ``ALD`` or ``AST`` instead,
+but in some cases (like ``ALD`` vtx handling or mesh shaders) direct access to
+it is necessary.
+
+This document describes the observations made about its layout.
+
+The ISBE "space" is separated into multiple regions (map, patch, primitive and attribute)
+and can be accessed with the ``ISBERD`` and ``ISBEWR`` instructions.
+
+There are always two ISBE "spaces" present: one for inputs and one for outputs.
+Input and output ISBE memory can be shared by setting bit 25 of the
+:abbr:`SPH (Shader Program Header)`.
+
+**NOTE**: Before Turing, only ``ISBERD`` was supported with the ``.MAP`` flag.
+
+Map region
+""""""""""
+
+The Map region contains all the vertex indices for all primitives.
+
+When used as output, it starts with the primitive count.
+
+==================================================== ==================== ===================================================
+Byte Range                                           Name                 Note
+==================================================== ==================== ===================================================
+0x0                                                  primitive_count      32-bit, only present as output ISBE
+0x0?..(0x0? + primitive_count * $vtx_per_prim_count) primitive_indices[i] 8-bit, offset is 0x04 as output ISBE, else 0x00
+==================================================== ==================== ===================================================
+
+Attribute region
+""""""""""""""""
+
+The Attribute region contains all attributes and is allocated based on the
+:abbr:`SPH (Shader Program Header)` input/output mask and :doc:`mesh shader methods `.
+
+With the ``SKEW`` flag applied on ``ISBERD`` and ``ISBEWR``, the values of each attribute are packed
+in groups of 32, forming "memory lines" of 128 bytes each.
+
+Additionally, the order of the attribute packs is determined by the unique attribute id
+(see ``nak_attr`` or the :abbr:`SPH (Shader Program Header)` header definitions for the values).
+
+Finally, if more than 32 values are needed, the layout repeats itself.
+
+Here is an example for 256 vertices being defined with ``ATTR_POINT_SIZE`` (0x6C) and
+``ATTR_POSITION_X`` (0x70) active:
+
+==================================================== ========================
+Byte Range                                           Name
+==================================================== ========================
+0x000..0x080                                         ATTR_POINT_SIZE[0..31]
+0x080..0x100                                         ATTR_POSITION_X[0..31]
+0x100..0x180                                         ATTR_POINT_SIZE[32..63]
+0x180..0x200                                         ATTR_POSITION_X[32..63]
+0x200..0x280                                         ATTR_POINT_SIZE[64..95]
+0x280..0x300                                         ATTR_POSITION_X[64..95]
+...                                                  ...
+0x700..0x780                                         ATTR_POINT_SIZE[224..255]
+0x780..0x800                                         ATTR_POSITION_X[224..255]
+==================================================== ========================
diff --git a/docs/drivers/nvk/mesh_shading_notes.rst b/docs/drivers/nvk/mesh_shading_notes.rst
new file mode 100644
index 00000000000..fbdcc5266b5
--- /dev/null
+++ b/docs/drivers/nvk/mesh_shading_notes.rst
@@ -0,0 +1,159 @@
+Mesh Shading Notes
+=============================================
+
+Mesh shader support is present on Turing and later.
+It reuses the 3D engine and regular graphics stages.
+
+Draw
+""""
+
+When performing a draw with a mesh shader bound, ``SET_DRAW_CONTROL_A`` needs to
+be used while ignoring the base indices.
+
+This can be done with the following commands:
+::
+
+    NVC597_SET_VERTEX_ID_BASE
+        .V = (0x0)
+    NVC597_SET_DRAW_CONTROL_A
+        .TOPOLOGY = POINTS
+        .PRIMITIVE_ID = FIRST
+        .INSTANCE_ID = FIRST
+        .SPLIT_MODE = NORMAL_BEGIN_NORMAL_END
+        .INSTANCE_ITERATE_ENABLE = FALSE
+        .IGNORE_GLOBAL_BASE_VERTEX_INDEX = TRUE
+        .IGNORE_GLOBAL_BASE_INSTANCE_INDEX = TRUE
+    NVC597_DRAW_VERTEX_ARRAY_BEGIN_END_A
+        .START = (0x0)
+    NVC597_DRAW_VERTEX_ARRAY_BEGIN_END_B
+        .COUNT = ($groupCount)
+
+
+Where:
+
+- ``$groupCount`` is the number of local workgroups to dispatch.
+
+**NOTE**: The topology set in ``SET_DRAW_CONTROL_A`` is ignored; the
+``SET_MESH_SHADER_A`` topology is used instead.
+
+Stages
+""""""
+
+Before binding a task or mesh shader, ``SET_MESH_CONTROL`` needs to be set.
+
+Shared Memory
+^^^^^^^^^^^^^
+
+Shared memory is allocated in the output :doc:`ISBE Attribute region ` after all attributes.
+
+Part of the shared memory can be made accessible to the next stage
+as part of the input :doc:`ISBE Attribute region ` using ``.OUTPUT_TO_M_S_LINES``.
+
+As such, the task payload is the first part of the shared memory in the task stage.
+
+
+Task
+^^^^
+
+The task shader is always bound as a vertex shader.
+
+Additionally, ``SET_MESH_INIT_SHADER`` is used to set the number of local invocations
+to use and the size used for shared memory.
+
+This can be done with the following command:
+::
+
+    NVC597_SET_MESH_INIT_SHADER
+        .THREAD_COUNT = ($thread_count)
+        .LOCAL_BUFFER_LINES = ($local_buffer_lines)
+        .OUTPUT_TO_M_S_LINES = ($output_to_m_s_lines)
+
+Where:
+
+- ``$thread_count`` is the number of local invocations (up to 32).
+- ``$local_buffer_lines`` is the total number of shared memory lines (including task payload)
+  to allocate in the output :doc:`ISBE Attribute region ` after all attributes.
+- ``$output_to_m_s_lines`` is the total number of shared memory lines that will be available
+  to the next stage.
+
+**NOTE**: The size of a memory line is 128 bytes.
+
+The ``workgroup_index`` is implemented using the ``VERTEX_ID`` read from the input :doc:`ISBE Attribute region ` (with ``SKEW`` applied).
+
+``EmitMeshTasksEXT`` is lowered to the following equivalent pseudocode:
+::
+
+    void EmitMeshTasksEXT(uint x, uint y, uint z) {
+        uint taskCount = x * y * z;  // total number of mesh workgroups
+
+        ISBEWR.O.ATTR.32 [0x04], taskCount;
+        ISBEWR.O.ATTR.32 [0x08], x;
+        ISBEWR.O.ATTR.32 [0x0C], y;
+        ISBEWR.O.ATTR.32 [0x10], z;
+    }
+
+Mesh
+^^^^
+
+The mesh shader is bound to the vertex stage if no task shader is present
+or to the tessellation control stage otherwise.
+
+Attributes are stored in the output :doc:`ISBE Attribute region ` **with SKEW applied**.
+
+If any per-primitive attributes are in use, they are stored after all per-vertex
+attributes and the geometry stage will be enabled in passthrough mode with only
+its program header present.
+
+For more details about the ISBE Attribute or Map layout, see the dedicated :doc:`ISBE ` page.
+
+Additionally, ``SET_MESH_SHADER_A`` and ``SET_MESH_SHADER_B`` are used
+to set the number of local invocations to use, the size used for shared memory
+and topology details.
+
+This can be done with the following commands:
+::
+
+    NVC597_SET_MESH_SHADER_A
+        .OUTPUT_TOPOLOGY = $topology
+        .MAX_VERTEX = ($max_vertex)
+        .MAX_PRIMITIVE = ($max_primitive)
+    NVC597_SET_MESH_SHADER_B
+        .SHARED_MEM_LINES = ($shared_mem_lines)
+        .THREAD_COUNT = ($thread_count)
+
+Where:
+
+- ``$topology`` is the topology to use.
+- ``$max_vertex`` is the maximum number of vertices used by the mesh shader.
+- ``$max_primitive`` is the maximum number of primitives used by the mesh shader.
+- ``$shared_mem_lines`` is the total number of shared memory lines to allocate
+  in the output :doc:`ISBE Attribute region ` after all attributes.
+- ``$thread_count`` is the number of local invocations (up to 32).
+
+**NOTE**: The size of a memory line is 128 bytes.
+
+The ``workgroup_index`` is implemented in the following way:
+
+- If a task shader is present, the value is read from the input :doc:`ISBE Attribute region ` **without SKEW applied**.
+- Otherwise, it is implemented using the ``VERTEX_ID`` read from the input :doc:`ISBE Attribute region ` **with SKEW applied**.
+
+``SetMeshOutputsEXT`` is lowered to the following equivalent pseudocode:
+::
+
+    void SetMeshOutputsEXT(uint vertexCount, uint primitiveCount) {
+        // vertexCount is unused
+        ISBEWR.O.MAP.32 [0x3], primitiveCount;
+    }
+
+**NOTE**: This is effectively a write to offset 0x0, but the output :doc:`ISBE Map region ` processes writes in reverse.
+
+All primitive indices are stored as 8-bit indices starting at offset 0x4 in the output :doc:`ISBE Map region `.
+
+
+Hardware limitations
+""""""""""""""""""""
+
+* Only up to 32 local invocations are supported.
+* Because the shared memory is part of the :doc:`ISBE Attribute region `,
+  no atomics are available for it.
+* Task / mesh invocations need to be counted inside the shader.
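
A note on the layout arithmetic described in the two documents. The sketch below is not part of the patch and is not taken from NVK. It only illustrates, in Python, how a byte offset inside a SKEW-ed ISBE Attribute region and a shared memory line count could be derived from the rules stated in the text: 128-byte memory lines, 32 values per line, packs ordered by attribute id, and the layout repeating for every further group of 32 values. The function names, the ascending-id ordering, the 4-byte value size (128 / 32) and the rounding up to whole lines are assumptions made for illustration.

```python
LINE_BYTES = 128       # size of one memory line, as stated in the notes
VALUES_PER_LINE = 32   # values of one attribute packed into a line
VALUE_BYTES = LINE_BYTES // VALUES_PER_LINE  # assumption: 4 bytes per value


def skewed_attr_offset(active_attrs, attr, vertex):
    """Byte offset of `attr` for `vertex` in a SKEW-ed attribute region.

    Hypothetical helper. It assumes packs are ordered by ascending
    attribute id and that the set of lines repeats every 32 vertices.
    """
    if vertex < 0:
        raise ValueError("vertex must be non-negative")
    order = sorted(set(active_attrs))
    slot = order.index(attr)  # raises ValueError if attr is not active
    group, lane = divmod(vertex, VALUES_PER_LINE)
    line = group * len(order) + slot
    return line * LINE_BYTES + lane * VALUE_BYTES


def lines_for_bytes(size):
    """Number of 128-byte lines needed for `size` bytes.

    Assumption: sizes are rounded up to whole lines.
    """
    if size < 0:
        raise ValueError("size must be non-negative")
    return (size + LINE_BYTES - 1) // LINE_BYTES
```

For instance, `skewed_attr_offset([0x6C, 0x70], 0x70, 224)` evaluates to 0x780, the start of the last row of the example table in isbe_layout.rst. `lines_for_bytes` is the kind of conversion one might apply to a shared memory size in bytes before writing a `*_LINES` field; whether the driver rounds this way is not stated in the notes.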