mirror of
https://gitlab.freedesktop.org/mesa/mesa.git
synced 2026-06-10 09:58:18 +02:00
docs/nvk: Add some notes about mesh shading and ISBE layout
Signed-off-by: Mary Guillemard <mary@mary.zone> Reviewed-by: Mel Henning <mhenning@darkrefraction.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27196>
This commit is contained in:
parent
145b8540e5
commit
8ec56cc0cc
2 changed files with 225 additions and 0 deletions
66
docs/drivers/nvk/isbe_layout.rst
Normal file
66
docs/drivers/nvk/isbe_layout.rst
Normal file
|
|
@ -0,0 +1,66 @@
|
|||
ISBE Layout
|
||||
=============================================
|
||||
|
||||
The ISBE is a piece of staging memory shared between the :abbr:`PE (Primitive Engine)`
|
||||
and :abbr:`SM (Streaming Multiprocessor)` containing the attributes for
|
||||
:abbr:`VTG (Vertex, Tesselation, Geometry)` stages.
|
||||
|
||||
In normal cases, no shader needs to access it and instead use ``ALD`` or ``AST``
|
||||
but in some some cases (like ``ALD`` vtx handling or mesh shaders) direct access to
|
||||
it is necessary.
|
||||
|
||||
This document the observations made about the layout of it.
|
||||
|
||||
The ISBE "space" is separated in multiple regions (map, patch, primitive and attribute)
|
||||
and can be accessed with ``ISBERD`` and ``ISBEWR`` instructions.
|
||||
|
||||
There is always two ISBE "space" present: one for inputs and one for outputs.
|
||||
Input and output ISBE memory can be shared by setting bit 25 of the
|
||||
:abbr:`SPH (Shader Program Header)`.
|
||||
|
||||
**NOTE**: Before Turing, only ``ISBERD`` was supported with the ``.MAP`` flag.
|
||||
|
||||
Map region
|
||||
""""""""""
|
||||
|
||||
The Map region contains all the vertex indices for all primitives.
|
||||
|
||||
When used as output, it starts with the primitives count.
|
||||
|
||||
==================================================== ==================== ===================================================
|
||||
Byte Range Name Note
|
||||
==================================================== ==================== ===================================================
|
||||
0x0 primitive_count 32-bit, Only present as output ISBE
|
||||
0x0?..(0x0? + primitive_count * $vtx_per_prim_count) primitive_indices[i] 8-bit, Offset is 0x04 as output ISBE otherwise 0x00
|
||||
==================================================== ==================== ===================================================
|
||||
|
||||
Attribute region
|
||||
""""""""""""""""
|
||||
|
||||
The Attribute region contains all attributes and is allocated based on the
|
||||
:abbr:`SPH (Shader Program Header)` input/output mask and :doc:`mesh shader methods <mesh_shading_notes>`.
|
||||
|
||||
With the ``SKEW`` flag applied on ``ISBERD`` and ``ISBEWR``, each type of attribute is packed with
|
||||
32 values of the same type forming "memory lines" of 128 bytes each.
|
||||
|
||||
Additionally, the order of the attribute packs is determined by the unique attribute id.
|
||||
(see `nak_attr` or :abbr:`SPH (Shader Program Header)` headers definitions for the values)
|
||||
|
||||
Finally, if more than 32 values are needed, the layout repeat itself.
|
||||
|
||||
Here is an example for 256 vertices being defined with ``ATTR_POINT_SIZE`` (0x6C) and
|
||||
``ATTR_POSITION_X`` (0x70) active:
|
||||
|
||||
==================================================== ========================
|
||||
Byte Range Name
|
||||
==================================================== ========================
|
||||
0x000..0x080 ATTR_POINT_SIZE[0..31]
|
||||
0x000..0x100 ATTR_POSITION_X[0..31]
|
||||
0x180..0x200 ATTR_POINT_SIZE[32..63]
|
||||
0x200..0x280 ATTR_POSITION_X[32..63]
|
||||
0x280..0x300 ATTR_POINT_SIZE[63..95]
|
||||
0x300..0x380 ATTR_POSITION_X[63..95]
|
||||
... ...
|
||||
0x700..0x780 ATTR_POINT_SIZE[224..255]
|
||||
0x780..0x800 ATTR_POSITION_X[224..255]
|
||||
==================================================== ========================
|
||||
159
docs/drivers/nvk/mesh_shading_notes.rst
Normal file
159
docs/drivers/nvk/mesh_shading_notes.rst
Normal file
|
|
@ -0,0 +1,159 @@
|
|||
Mesh Shading Notes
|
||||
=============================================
|
||||
|
||||
Mesh shaders support is present on Turing and later.
|
||||
It reuses the 3D engine and regular graphics stages.
|
||||
|
||||
Draw
|
||||
""""
|
||||
|
||||
When performing a draw with a mesh shader bound, ``SET_DRAW_CONTROL_A`` needs to
|
||||
be used while ignoring the base indices.
|
||||
|
||||
This can be done with the following commands:
|
||||
.. code-block::
|
||||
|
||||
NVC597_SET_VERTEX_ID_BASE
|
||||
.V = (0x0)
|
||||
NVC597_SET_DRAW_CONTROL_A
|
||||
.TOPOLOGY = POINTS
|
||||
.PRIMITIVE_ID = FIRST
|
||||
.INSTANCE_ID = FIRST
|
||||
.SPLIT_MODE = NORMAL_BEGIN_NORMAL_END
|
||||
.INSTANCE_ITERATE_ENABLE = FALSE
|
||||
.IGNORE_GLOBAL_BASE_VERTEX_INDEX = TRUE
|
||||
.IGNORE_GLOBAL_BASE_INSTANCE_INDEX = TRUE
|
||||
NVC597_DRAW_VERTEX_ARRAY_BEGIN_END_A
|
||||
.START = (0x0)
|
||||
NVC597_DRAW_VERTEX_ARRAY_BEGIN_END_B
|
||||
.COUNT = ($groupCount)
|
||||
|
||||
|
||||
Where:
|
||||
|
||||
- ``$groupCount`` is the number of local workgroups to dispatch.
|
||||
|
||||
**NOTE**: The topology of ``SET_DRAW_CONTROL_A`` will be ignored and
|
||||
``SET_MESH_SHADER_A`` topology will be used.
|
||||
|
||||
Stages
|
||||
""""""
|
||||
|
||||
Before binding a task or mesh shader, ``SET_MESH_CONTROL`` needs to be set.
|
||||
|
||||
Shared Memory
|
||||
^^^^^^^^^^^^^
|
||||
|
||||
Shared memory is allocated in the output :doc:`ISBE Attribute region <isbe_layout>` after all attributes.
|
||||
|
||||
Part of the shared memory can be made accessible to the next stage
|
||||
as part of the input :doc:`ISBE Attribute region <isbe_layout>` using ``.OUTPUT_TO_M_S_LINES``.
|
||||
|
||||
As such, the task payload is the first part of the shared memory on the task stage.
|
||||
|
||||
|
||||
Task
|
||||
^^^^
|
||||
|
||||
The task shader is always bound as a vertex shader.
|
||||
|
||||
Additionally, ``SET_MESH_INIT_SHADER`` is used to set the number of local invocations
|
||||
to use and the size used for shared memory.
|
||||
|
||||
This can be done with the following command:
|
||||
::
|
||||
|
||||
NVC597_SET_MESH_INIT_SHADER
|
||||
.THREAD_COUNT = ($thread_count)
|
||||
.LOCAL_BUFFER_LINES = ($local_buffer_lines)
|
||||
.OUTPUT_TO_M_S_LINES = ($output_to_m_s_lines)
|
||||
|
||||
Where:
|
||||
|
||||
- ``$thread_count`` is the number of local invocations (up to 32).
|
||||
- ``$local_buffer_lines`` is the total amount of shared memory lines (including task payload)
|
||||
to allocate in the output :doc:`ISBE Attribute region <isbe_layout>` after all attributes.
|
||||
- ``$output_to_m_s_lines`` is the total amount of shared memory lines that will be available
|
||||
to the next stage.
|
||||
|
||||
**NOTE**: The size of a memory line is 128 bytes.
|
||||
|
||||
The workgroup_index is implemented using the ``VERTEX_ID`` read from the input :doc:`ISBE Attribute region <isbe_layout>` (with SKEW applied).
|
||||
|
||||
``EmitMeshTasksEXT`` is lowered to the equivalent pseudo code:
|
||||
::
|
||||
|
||||
void EmitMeshTasksEXT(uint x, uint y, uint z) {
|
||||
uint taskCount = x * y * z;
|
||||
|
||||
ISBEWR.O.ATTR.32 [0x04], taskCount;
|
||||
ISBEWR.O.ATTR.32 [0x08], x;
|
||||
ISBEWR.O.ATTR.32 [0x0C], y;
|
||||
ISBEWR.O.ATTR.32 [0x10], z;
|
||||
}
|
||||
|
||||
Mesh
|
||||
^^^^
|
||||
|
||||
The mesh shader is bound to the vertex stage if no task shader is present
|
||||
or the tess control stage otherwise.
|
||||
|
||||
Attributes are stored in the output :doc:`ISBE Attribute region <isbe_layout>` **with SKEW applied**.
|
||||
|
||||
If any per primitive attributes are in use, they are stored after all per vertex
|
||||
attributes and the geometry stage will be enabled in passthrough mode with only
|
||||
its program header present.
|
||||
|
||||
For more details about the ISBE Attribute or Map layout, see the dedicated :doc:`ISBE <isbe_layout>` page.
|
||||
|
||||
Additionally, ``SET_MESH_SHADER_A`` and ``SET_MESH_SHADER_B`` are used
|
||||
to set the number of local invocations to use, the size used for shared memory
|
||||
and topology details.
|
||||
|
||||
This can be done with the following command:
|
||||
::
|
||||
|
||||
NVC597_SET_MESH_SHADER_A
|
||||
.OUTPUT_TOPOLOGY = $topology
|
||||
.MAX_VERTEX = ($max_vertex)
|
||||
.MAX_PRIMITIVE = ($max_primitive)
|
||||
NVC597_SET_MESH_SHADER_B
|
||||
.SHARED_MEM_LINES = ($shared_mem_lines)
|
||||
.THREAD_COUNT = ($thread_count)
|
||||
|
||||
Where:
|
||||
|
||||
- ``$topology`` is the topology to use.
|
||||
- ``$max_vertex`` is the max count of vertices used by the mesh shader.
|
||||
- ``$max_primitive`` is the max count primitives used by the mesh shader.
|
||||
- ``$shared_mem_lines`` is the total amount of shared memory line to allocate
|
||||
in the output :doc:`ISBE Attribute region <isbe_layout>` after all attributes.
|
||||
- ``$thread_count`` is the number of local invocations (up to 32).
|
||||
|
||||
**NOTE**: The size of a memory line is 128 bytes.
|
||||
|
||||
The workgroup_index is implemented in the following way:
|
||||
|
||||
- If a task shader is present, the value is read from the input :doc:`ISBE Attribute region <isbe_layout>` **without SKEW applied**.
|
||||
- Otherwise, it is implemented using the ``VERTEX_ID`` read from the input :doc:`ISBE Attribute region <isbe_layout>` **with SKEW applied**.
|
||||
|
||||
``SetMeshOutputsEXT`` is lowered to the equivalent pseudo code:
|
||||
::
|
||||
|
||||
void SetMeshOutputsEXT(uint vertexCount, uint primitiveCount) {
|
||||
// vertexCount is unused
|
||||
ISBEWR.O.MAP.32 [0x3], primitiveCount;
|
||||
}
|
||||
|
||||
**NOTE**: This is effectively a write to offset 0x0 but the output :doc:`ISBE Map region <isbe_layout>` process writes in reverse.
|
||||
|
||||
All primitive indices are stored as 8-bit indices starting at offset 0x4 in the ouptut :doc:`ISBE Map region <isbe_layout>`.
|
||||
|
||||
|
||||
Hardware limitations
|
||||
""""""""""""""""""""
|
||||
|
||||
* Only up to 32 local invocations are supported.
|
||||
* The shared memory being part of the :doc:`ISBE Attribute region <isbe_layout>` makes it that
|
||||
we do not have any atomics for it.
|
||||
* Task / mesh invocations need to be counted inside the shader.
|
||||
Loading…
Add table
Reference in a new issue