External Hardware Documentation and Resources
=============================================

Information about hardware behavior comes from a mix of official and
reverse-engineered sources.

Command buffers
^^^^^^^^^^^^^^^

 * `NVIDIA open-gpu-doc repository`_ is official documentation from NVIDIA that
   has been released to the public. The majority of this documentation comes in
   the form of class headers which describe the class state registers.

 * `NVIDIA open-gpu-kernel-modules repository`_ is the open-source kernel mode
   driver that NVIDIA ships on Turing+ GPUs with GSP. The code here can provide
   examples of how to use some hardware features. If open-gpu-doc is missing a
   class header, sometimes there will be one here.

 * Reverse-engineered command names from `envytools`_ are available in Mesa
   under e.g. ``src/gallium/drivers/nouveau/nvc0/nvc0_3d.xml.h``. These are no
   longer updated; NVK uses the open-gpu-doc headers instead.

 * `envyhooks`_ is the modern way to dump command sequences from the proprietary
   driver.

 * ``nv_push_dump`` is part of Mesa and can disassemble command sequences (build
   with ``-D tools=nouveau``, then run ``src/nouveau/headers/nv_push_dump`` from
   the build dir).

 .. _NVIDIA open-gpu-doc repository: https://github.com/NVIDIA/open-gpu-doc
 .. _NVIDIA open-gpu-kernel-modules repository: https://github.com/NVIDIA/open-gpu-kernel-modules
 .. _envyhooks: https://gitlab.freedesktop.org/nouveau/envyhooks

Shader ISA
^^^^^^^^^^

 * `NVIDIA PTX documentation`_ is NVIDIA documentation for CUDA's
   intermediate representation. We don't use PTX directly, but this often has
   hints about how underlying hardware instructions work. For example, the PTX
   ``redux`` instruction is pretty much identical to the hardware instruction of
   the same name.

 * `CUDA Binary Utilities`_ is documentation for CUDA's disassembler,
   ``nvdisasm``. It includes a brief description of most hardware instructions.
   There's also an `older version`_ that covers older architectures (Kepler
   through Volta).

 * Kuter Dinel has reverse-engineered instruction encodings for the `Hopper
   ISA`_ and `Ada ISA`_, which are autogenerated from his `nv_isa_solver`_
   project.

 * `nv-shader-tools`_ has some additional tools for disassembling and fuzzing
   the hardware ISA.

 * Mel has dumped a
   `list of available instructions <list of avaiable instructions_>`_ and
   their opcodes on recent architectures by scraping nvdisasm error messages.

 * The `Volta whitepaper`_ section "Independent Thread Scheduling" has an
   overview of the control flow model used on Volta+ GPUs.

 * `Dissecting the NVidia Turing T4 GPU via Microbenchmarking`_ has
   reverse-engineered info about the Turing instruction encoding. See especially
   section "2.1 Control information" for an overview of compiler-inserted delays
   and waits on Maxwell and later.

 * `Analyzing Modern NVIDIA GPU cores`_ has additional reverse-engineered info
   about the semantics of compiler-inserted delays and waits.

 * `Control Flow Management in Modern GPUs`_ has more detail about control flow
   reconvergence on Volta+.

 * `maxas`_ has some reverse-engineered info on the Maxwell ISA.

 * `asfermi`_ has some reverse-engineered info on the older Fermi ISA.

 * Red Hat has some NDA'd documentation on instruction latencies from NVIDIA.
   Bother karolherbst or airlied on IRC if you're missing a latency class for an
   instruction on recent architectures.

 * Instruction behavior is tested using the hardware tests in
   ``src/nouveau/compiler/nak/hw_tests.rs`` and the corresponding ``Foldable``
   implementations in ``src/nouveau/compiler/nak/ir.rs`` (build with ``-D
   build-tests=true``, then run ``src/nouveau/compiler/nak hw_tests`` from the
   build dir).

 * NAK's instruction encodings are tested against nvdisasm using
   ``src/nouveau/compiler/nak/nvdisasm_tests.rs`` (build with ``-D
   build-tests=true``, then run ``src/nouveau/compiler/nak nvdisasm_tests`` from
   the build dir).

 * The old GL driver's compiler, under ``src/gallium/drivers/nouveau/codegen``,
   has some information. This is especially useful for graphics-only
   instructions, which are often not covered by other sources.

 * `Compiler explorer`_ is a convenient tool to see what assembly NVIDIA
   generates for a given CUDA program.

 .. _NVIDIA PTX documentation: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html
 .. _CUDA Binary Utilities: https://docs.nvidia.com/cuda/cuda-binary-utilities/index.html#instruction-set-reference
 .. _older version: https://docs.nvidia.com/cuda/archive/11.8.0/cuda-binary-utilities/index.html#instruction-set-ref
 .. _Hopper ISA: https://kuterdinel.com/nv_isa/
 .. _Ada ISA: https://kuterdinel.com/nv_isa_sm89/
 .. _nv_isa_solver: https://github.com/kuterd/nv_isa_solver
 .. _nv-shader-tools: https://gitlab.freedesktop.org/nouveau/nv-shader-tools
 .. _list of avaiable instructions: https://gitlab.freedesktop.org/mhenning/re/-/tree/main/opclass?ref_type=heads
 .. _Volta whitepaper: https://images.nvidia.com/content/volta-architecture/pdf/volta-architecture-whitepaper.pdf
 .. _Dissecting the NVidia Turing T4 GPU via Microbenchmarking: https://arxiv.org/pdf/1903.07486
 .. _Analyzing Modern NVIDIA GPU cores: https://arxiv.org/pdf/2503.20481
 .. _Control Flow Management in Modern GPUs: https://arxiv.org/pdf/2407.02944
 .. _maxas: https://github.com/NervanaSystems/maxas/wiki
 .. _asfermi: https://github.com/hyqneuron/asfermi/wiki
 .. _Compiler explorer: https://godbolt.org/z/1jrfhq5G7

Misc
^^^^

 * `envytools`_ has reverse-engineered documentation for Maxwell and earlier
   hardware.
 * The NVIDIA architecture whitepapers give a basic overview of what has changed
   between hardware revisions. See e.g. the `Blackwell whitepaper`_.
 * The NVIDIA architecture tuning guides often mention how details of a hardware
   generation have changed, often with information about the memory subsystem or
   occupancy. See e.g. the `Blackwell tuning guide`_.
 * `The Nouveau wiki's CodeNames page`_ is useful for mapping NVIDIA marketing
   names to engineering names.
 * `Matching CUDA arch and CUDA gencode for various NVIDIA architectures`_ has a
   useful table comparing SM versions to engineering names.

 .. _envytools: https://envytools.readthedocs.io/en/latest/hw/index.html
 .. _Blackwell whitepaper: https://images.nvidia.com/aem-dam/Solutions/geforce/blackwell/nvidia-rtx-blackwell-gpu-architecture.pdf
 .. _Blackwell tuning guide: https://docs.nvidia.com/cuda/blackwell-tuning-guide/index.html
 .. _The Nouveau wiki's CodeNames page: https://nouveau.freedesktop.org/CodeNames.html
 .. _Matching CUDA arch and CUDA gencode for various NVIDIA architectures: https://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/

..
   docs/nvk: Add a list of external hardware docs
   Reviewed-by: Emma Anholt <emma@anholt.net>
   Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37828>
   2025-10-10 17:45:56 -04:00