External Hardware Documentation and Resources
=============================================

Information about hardware behavior comes from a mix of official and
reverse-engineered sources.

Command buffers
^^^^^^^^^^^^^^^

* `NVIDIA open-gpu-doc repository`_ is official documentation from NVIDIA that
  has been released to the public. The majority of this documentation comes in
  the form of class headers which describe the class state registers.
* `NVIDIA open-gpu-kernel-modules repository`_ is the open-source kernel mode
  driver that NVIDIA ships on Turing+ GPUs with GSP. The code here can provide
  examples of how to use some hardware features. If open-gpu-doc is missing a
  class header, sometimes there will be one here.
* Reverse-engineered command names from `envytools`_ are available in Mesa
  under e.g. ``src/gallium/drivers/nouveau/nvc0/nvc0_3d.xml.h``. These are no
  longer updated. NVK instead uses the open-gpu-doc headers.
* `envyhooks`_ is the modern way to dump command sequences from the proprietary
  driver.
* ``nv_push_dump`` is part of Mesa and can disassemble command sequences (build
  with ``-D tools=nouveau``, run ``src/nouveau/headers/nv_push_dump`` from the
  build dir).

.. _NVIDIA open-gpu-doc repository: https://github.com/NVIDIA/open-gpu-doc
.. _NVIDIA open-gpu-kernel-modules repository: https://github.com/NVIDIA/open-gpu-kernel-modules
.. _envyhooks: https://gitlab.freedesktop.org/nouveau/envyhooks

Shader ISA
^^^^^^^^^^

* `NVIDIA PTX documentation`_ is NVIDIA documentation for CUDA's intermediate
  representation. We don't use PTX directly, but this often has hints about how
  underlying hardware instructions work. For example, the PTX ``redux``
  instruction is pretty much identical to the hardware instruction of the same
  name.
* `CUDA Binary Utilities`_ is documentation for CUDA's disassembler,
  ``nvdisasm``. It includes a brief description of most hardware instructions.
  There's also an `older version`_ that has older architectures (Kepler through
  Volta).
* Kuter Dinel has reverse-engineered instruction encodings for the
  `Hopper ISA`_ and `Ada ISA`_ which are autogenerated from his
  `nv_isa_solver`_ project.
* `nv-shader-tools`_ has some additional tools for disassembling and fuzzing
  the hardware ISA.
* Mel has dumped a `list of available instructions`_ and their opcodes on
  recent architectures by scraping nvdisasm error messages.
* The `Volta whitepaper`_ section "Independent Thread Scheduling" has an
  overview of the control flow model used on Volta+ GPUs.
* `Dissecting the NVidia Turing T4 GPU via Microbenchmarking`_ has
  reverse-engineered info about the Turing instruction encoding. See especially
  section "2.1 Control information" for an overview of compiler-inserted delays
  and waits on Maxwell and later.
* `Analyzing Modern NVIDIA GPU cores`_ has additional reverse-engineered info
  about the semantics of compiler-inserted delays and waits.
* `Control Flow Management in Modern GPUs`_ has more detail about control flow
  reconvergence on Volta+.
* `maxas`_ has some reverse-engineered info on the Maxwell ISA.
* `asfermi`_ has some reverse-engineered info on the older Fermi ISA.
* Red Hat has some NDA'd documentation on instruction latencies from NVIDIA.
  Bother karolherbst or airlied on IRC if you're missing a latency class for an
  instruction on recent architectures.
* Instruction behavior is tested using the hardware tests in
  ``src/nouveau/compiler/nak/hw_tests.rs`` and the corresponding ``Foldable``
  implementations in ``src/nouveau/compiler/nak/ir.rs`` (build with
  ``-D build-tests=true`` and run ``src/nouveau/compiler/nak hw_tests`` from
  the build dir).
* NAK's instruction encodings are tested against nvdisasm using
  ``src/nouveau/compiler/nak/nvdisasm_tests.rs`` (build with
  ``-D build-tests=true`` and run ``src/nouveau/compiler/nak nvdisasm_tests``
  from the build dir).
* The old GL driver's compiler, under ``src/gallium/drivers/nouveau/codegen``,
  has some information. This is especially useful for graphics-only
  instructions, which are often not covered by other sources.
* `Compiler Explorer`_ is a convenient tool to see what assembly NVIDIA
  generates for a given CUDA program.

.. _NVIDIA PTX documentation: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html
.. _CUDA Binary Utilities: https://docs.nvidia.com/cuda/cuda-binary-utilities/index.html#instruction-set-reference
.. _older version: https://docs.nvidia.com/cuda/archive/11.8.0/cuda-binary-utilities/index.html#instruction-set-ref
.. _Hopper ISA: https://kuterdinel.com/nv_isa/
.. _Ada ISA: https://kuterdinel.com/nv_isa_sm89/
.. _nv_isa_solver: https://github.com/kuterd/nv_isa_solver
.. _nv-shader-tools: https://gitlab.freedesktop.org/nouveau/nv-shader-tools
.. _list of available instructions: https://gitlab.freedesktop.org/mhenning/re/-/tree/main/opclass?ref_type=heads
.. _Volta whitepaper: https://images.nvidia.com/content/volta-architecture/pdf/volta-architecture-whitepaper.pdf
.. _Dissecting the NVidia Turing T4 GPU via Microbenchmarking: https://arxiv.org/pdf/1903.07486
.. _Analyzing Modern NVIDIA GPU cores: https://arxiv.org/pdf/2503.20481
.. _Control Flow Management in Modern GPUs: https://arxiv.org/pdf/2407.02944
.. _maxas: https://github.com/NervanaSystems/maxas/wiki
.. _asfermi: https://github.com/hyqneuron/asfermi/wiki
.. _Compiler Explorer: https://godbolt.org/z/1jrfhq5G7

Misc
^^^^

* `envytools`_ has reverse-engineered documentation for Maxwell and earlier
  hardware.
* The NVIDIA architecture whitepapers give a basic overview of what has changed
  between hardware revisions. See e.g. the `Blackwell whitepaper`_.
* The NVIDIA architecture tuning guides often mention how details of a hardware
  generation have changed, often with information about the memory subsystem or
  occupancy. See e.g. the `Blackwell tuning guide`_.
* `The Nouveau wiki's CodeNames page`_ is useful for mapping NVIDIA marketing
  names to engineering names.
* `Matching CUDA arch and CUDA gencode for various NVIDIA architectures`_ has a
  useful table comparing SM versions to engineering names.

.. _envytools: https://envytools.readthedocs.io/en/latest/hw/index.html
.. _Blackwell whitepaper: https://images.nvidia.com/aem-dam/Solutions/geforce/blackwell/nvidia-rtx-blackwell-gpu-architecture.pdf
.. _Blackwell tuning guide: https://docs.nvidia.com/cuda/blackwell-tuning-guide/index.html
.. _The Nouveau wiki's CodeNames page: https://nouveau.freedesktop.org/CodeNames.html
.. _Matching CUDA arch and CUDA gencode for various NVIDIA architectures: https://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/