diff --git a/docs/isl/ccs.rst b/docs/isl/ccs.rst new file mode 100644 index 00000000000..37797705cc9 --- /dev/null +++ b/docs/isl/ccs.rst @@ -0,0 +1,171 @@ +Single-sampled Color Compression +================================ + +Starting with Ivy Bridge, Intel graphics hardware provides a form of color +compression for single-sampled surfaces. In its initial form, this provided an +acceleration of render target clear operations that, in the common case, allows +you to avoid almost all of the bandwidth of a full-surface clear operation. On +Sky Lake, single-sampled color compression was extended to allow for the +compression of color values from actual rendering and not just the initial clear. +From here on, the older Ivy Bridge form of color compression will be called +"fast-clears" and the term "color compression" will be reserved for the more +powerful Sky Lake form. + +The documentation for Ivy Bridge through Broadwell overloads the term MCS to +refer both to the *multisample control surface* used for multisample +compression and the control surface used for fast-clears. In ISL, the +:cpp:enumerator:`isl_aux_usage::ISL_AUX_USAGE_MCS` enum always refers to +multisample color compression while the +:cpp:enumerator:`isl_aux_usage::ISL_AUX_USAGE_CCS_` enums always refer to +single-sampled color compression. Throughout this chapter and the rest of the +ISL documentation, we will use the term "color control surface", abbreviated +CCS, to denote the control surface used for both fast-clears and color +compression. While this is still an overloaded term, Ivy Bridge fast-clears +are much closer to Sky Lake color compression than they are to multisample +compression. + +CCS data +-------- + +Fast-clears and CCS are possibly the single most poorly documented aspect of +surface layout/setup for Intel graphics hardware (with HiZ coming in a neat +second). 
All the documentation really says is that you can use an MCS buffer on +single-sampled surfaces (we will call it the CCS in this case). It also +provides some documentation on how to program the hardware to perform clear +operations, but that's it. How big is this buffer? What does it contain? +Those questions are left as exercises for the reader. Almost everything we know +about the contents of the CCS is gleaned from reverse-engineering of the +hardware. The best bit of documentation we have ever had comes from the +display section of the Sky Lake PRM Vol 12 section on planes (p. 159): + + The Color Control Surface (CCS) contains the compression status of the + cache-line pairs. The compression state of the cache-line pair is + specified by 2 bits in the CCS. Each CCS cache-line represents an area + on the main surface of 16x16 sets of 128 byte Y-tiled cache-line-pairs. + CCS is always Y tiled. + +While this is technically for color compression and not fast-clears, it +provides a good bit of insight into how color compression and fast-clears +operate. Each cache-line pair in the main surface corresponds to 1 or 2 bits +in the CCS. The primary difference, as far as the current discussion is +concerned, is that fast-clears use only 1 bit per cache-line pair whereas color +compression uses 2 bits. + +What is a cache-line pair? Both the X and Y tiling formats are arranged as an +8x8 grid of cache lines. (See the :doc:`chapter on tiling <tiling>` for more +details.) In either case, a cache-line pair is a pair of cache lines whose +starting addresses differ by 512 bytes or 8 cache lines. This results in the +two cache lines being vertically adjacent when the main surface is X-tiled and +horizontally adjacent when the main surface is Y-tiled. For an X-tiled surface +this forms an area of 64B x 2 rows and for a Y-tiled surface this forms an area +of 32B x 4 rows. 
In either case, it is guaranteed that, regardless of surface +format, each 2x2 subspan coming out of a shader will land entirely within one +cache-line pair. + +What is the correspondence between bits and cache-line pairs? The best model I +(Jason) know of is to consider the CCS as having a 1-bit color format for +fast-clears and a 2-bit format for color compression and a special tiling +format. The CCS tiling formats operate on a 1- or 2-bit granularity rather than +the byte granularity of most tiling formats. + +The following table represents the bit-layouts that yield the CCS tiling format +on different hardware generations. Bits 0-11 correspond to the regular swizzle +of bytes within a 4KB page whereas the negative bits represent the address of +the particular 1- or 2-bit portion of a byte. (Note: The Haswell data was +gathered on a dual-channel system so bit-6 swizzling was enabled. It's unclear +how this affects the CCS layout.) + +============ ======== =========== =========== ====================== =========== =========== =========== =========== =========== =========== =========== =========== =========== =========== =========== =========== + Generation Tiling 11 10 9 8 7 6 5 4 3 2 1 0 -1 -2 -3 +============ ======== =========== =========== ====================== =========== =========== =========== =========== =========== =========== =========== =========== =========== =========== =========== =========== + Ivy Bridge X or Y :math:`u_6` :math:`u_5` :math:`u_4` :math:`v_7` :math:`v_6` :math:`v_5` :math:`v_4` :math:`v_2` :math:`v_3` :math:`v_1` :math:`v_0` :math:`u_3` :math:`u_2` :math:`u_1` :math:`u_0` + Haswell X :math:`u_6` :math:`u_5` :math:`v_3 \oplus u_1` :math:`v_7` :math:`v_6` :math:`v_5` :math:`v_4` :math:`v_2` :math:`v_3` :math:`v_1` :math:`v_0` :math:`u_4` :math:`u_3` :math:`u_2` :math:`u_0` + Haswell Y :math:`u_6` :math:`u_5` :math:`v_2 \oplus u_1` :math:`v_7` :math:`v_6` :math:`v_5` :math:`v_4` :math:`v_2` :math:`v_3` :math:`v_1` :math:`v_0` 
:math:`u_4` :math:`u_3` :math:`u_2` :math:`u_0` + Broadwell X :math:`u_6` :math:`u_5` :math:`u_4` :math:`v_7` :math:`v_6` :math:`v_5` :math:`v_4` :math:`u_3` :math:`v_3` :math:`u_2` :math:`u_1` :math:`u_0` :math:`v_2` :math:`v_1` :math:`v_0` + Broadwell Y :math:`u_6` :math:`u_5` :math:`u_4` :math:`v_7` :math:`v_6` :math:`v_5` :math:`v_4` :math:`v_2` :math:`v_3` :math:`u_3` :math:`u_2` :math:`u_1` :math:`v_1` :math:`v_0` :math:`u_0` + Sky Lake Y :math:`u_6` :math:`u_5` :math:`u_4` :math:`v_6` :math:`v_5` :math:`v_4` :math:`v_3` :math:`v_2` :math:`v_1` :math:`u_3` :math:`u_2` :math:`u_1` :math:`v_0` :math:`u_0` +============ ======== =========== =========== ====================== =========== =========== =========== =========== =========== =========== =========== =========== =========== =========== =========== =========== + +CCS surface layout +------------------ + +Starting with Broadwell, fast-clears and color compression can be used on +mipmapped and array surfaces. When considered from a higher level, the CCS is +laid out like any other surface. The Broadwell and Sky Lake PRMs describe +this as follows: + +Broadwell PRM Vol 7, "MCS Buffer for Render Target(s)" (p. 676): + + Mip-mapped and arrayed surfaces are supported with MCS buffer layout with + these alignments in the RT space: Horizontal Alignment = 256 and Vertical + Alignment = 128. + +Broadwell PRM Vol 2d, "RENDER_SURFACE_STATE" (p. 279): + + For non-multisampled render target's auxiliary surface, MCS, QPitch must be + computed with Horizontal Alignment = 256 and Surface Vertical Alignment = + 128. These alignments are only for MCS buffer and not for associated render + target. + +Sky Lake PRM Vol 7, "MCS Buffer for Render Target(s)" (p. 632): + + Mip-mapped and arrayed surfaces are supported with MCS buffer layout with + these alignments in the RT space: Horizontal Alignment = 128 and Vertical + Alignment = 64. + +Sky Lake PRM Vol. 2d, "RENDER_SURFACE_STATE" (p. 
435): + + For non-multisampled render target's CCS auxiliary surface, QPitch must be + computed with Horizontal Alignment = 128 and Surface Vertical Alignment + = 256. These alignments are only for CCS buffer and not for associated + render target. + +Empirical evidence seems to confirm this. On Sky Lake, the vertical alignment +is always one cache line. The horizontal alignment, however, varies by main +surface format: 1 cache line for 32bpp, 2 for 64bpp and 4 cache lines for +128bpp formats. This nicely corresponds to the alignment of 128x64 pixels in +the primary color surface. The second PRM citation about Sky Lake CCS above +gives a vertical alignment of 256 rather than 64. With a little +experimentation, this additional alignment appears to only apply to QPitch and +not to the miplevels within a slice. + +On Broadwell, each miplevel in the CCS is aligned to a cache-line pair +boundary: horizontal when the primary surface is X-tiled and vertical when +Y-tiled. For a 32bpp format, this works out to an alignment of 256x128 main +surface pixels regardless of X or Y tiling. On Sky Lake, the alignment is +a single cache line which works out to an alignment of 128x64 main surface +pixels. + +TODO: More than just 32bpp formats on Broadwell! + +Once armed with the above alignment information, we can lay out the CCS surface +itself. The way ISL does CCS layout calculations is by a very careful and +subtle application of its normal surface layout code. + +Above, we described the CCS data layout as a mapping of address bits. In +ISL, this is represented by :cpp:enumerator:`isl_tiling::ISL_TILING_CCS`. The +logical and physical tile dimensions of this tiling correspond to the above mapping. + +We also have special :cpp:enum:`isl_format` enums for CCS. These formats are 1 +bit-per-pixel on Ivy Bridge through Broadwell and 2 bits-per-pixel on Sky Lake +and above to correspond to the 1 and 2-bit values represented in the CCS data. 
+They have a block size (similar to a block compressed format such as BC or +ASTC) which says what area (in surface elements) in the main surface is covered +by a single CCS element (1 or 2-bit). Because this depends on the main surface +tiling and format, we have several different CCS formats. + +Once the appropriate :cpp:enum:`isl_format` has been selected, computing the +size and layout of a CCS surface is as simple as passing the same surface +creation parameters to :cpp:func:`isl_surf_init_s` as were used to create the +primary surface, only with :cpp:enumerator:`isl_tiling::ISL_TILING_CCS` and the +correct CCS format. This not only results in a correctly sized surface but +most other ISL helpers for things such as computing offsets into surfaces work +correctly as well. + +CCS on Tigerlake and above +-------------------------- + +Starting with Tigerlake, CCS is no longer done via a surface and, instead, the +term CCS gets overloaded once again (gotta love it!) to now refer to a form of +universal compression which can be applied to almost any surface. Nothing in +this chapter applies to any hardware with a graphics IP version 12 or above. diff --git a/docs/isl/index.rst b/docs/isl/index.rst index 2d1714a5259..d91508d6689 100644 --- a/docs/isl/index.rst +++ b/docs/isl/index.rst @@ -12,6 +12,7 @@ Chery. units formats tiling + ccs The core representation of a surface in ISL is :cpp:struct:`isl_surf`.