diff --git a/docs/isl/index.rst b/docs/isl/index.rst index 74dbcda171d..2d1714a5259 100644 --- a/docs/isl/index.rst +++ b/docs/isl/index.rst @@ -11,6 +11,7 @@ Chery. units formats + tiling The core representation of a surface in ISL is :cpp:struct:`isl_surf`. diff --git a/docs/isl/tiling-basic.svg b/docs/isl/tiling-basic.svg new file mode 100644 index 00000000000..ebe4b6287dd --- /dev/null +++ b/docs/isl/tiling-basic.svg @@ -0,0 +1,512 @@ + + + + + + + + + + + + image/svg+xml + + + + + + + + + + + + + + + 0 + 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10 + 11 + 12 + 13 + 14 + 15 + + + + + + + + + 0 + 1 + 2 + 128 + 129 + 256 + 257 + 258 + 130 + ... + ... + ... + ... + ... + ... + + + + diff --git a/docs/isl/tiling.rst b/docs/isl/tiling.rst new file mode 100644 index 00000000000..462ce87754a --- /dev/null +++ b/docs/isl/tiling.rst @@ -0,0 +1,288 @@ +Tiling +====== + +The naive view of an image in memory is that the pixels are stored one after +another in memory usually in an X-major order. An image that is arranged in +this way is called "linear". Linear images, while easy to reason about, can +have very bad cache locality. Graphics operations tend to act on pixels that +are close together in 2-D euclidean space. If you move one pixel to the right +or left in a linear image, you only move a few bytes to one side or the other +in memory. However, if you move one pixel up or down you can end up kilobytes +or even megabytes away. + +Tiling (sometimes referred to as swizzling) is a method of re-arranging the +pixels of a surface so that pixels which are close in 2-D euclidean space are +likely to be close in memory. + +Basics +------ + +The basic idea of a tiled image is that the image is first divided into +two-dimensional blocks or tiles. Each tile takes up a chunk of contiguous +memory and the tiles are arranged like pixels in linear surface. This is best +demonstrated with a specific example. Suppose we have a RGBA8888 X-tiled +surface on Intel graphics. Then the surface is divided into 128x8 pixel tiles +each of which is 4KB of memory. Within each tile, the pixels are laid out like +a 128x8 linear image. The tiles themselves are laid out row-major in memory +like giant pixels. This means that, as long as you don't leave your 128x8 +tile, you can move in both dimensions without leaving the same 4K page in +memory. + +.. image:: tiling-basic.svg + :alt: Example of an X-tiled image + +You can, however do even better than this. Suppose that same image is, +instead, Y-tiled. Then the surface is divided into 32x32 pixel tiles each of +which is 4KB of memory. Within a tile, each 64B cache line corresponds to 4x4 +pixel region of the image (you can think of it as a tile within a tile). This +means that very small deviations don't even leave the cache line. This added +bit of pixel shuffling is known to have a substantial performance impact in +most real-world applications. + +Intel graphics has several different tiling formats that we'll discuss in +detail in later sections. The most commonly used as of the writing of this +chapter is Y-tiling. In all tiling formats the basic principal is the same: +The image is divided into tiles of a particular size and, within those tiles, +the data is re-arranged (or swizzled) based on a particular pattern. A tile +size will always be specified in bytes by rows and the actual X-dimension of +the tile in elements depends on the size of the element in bytes. + +Bit-6 Swizzling +^^^^^^^^^^^^^^^ + +On some hardware, there is an additional address swizzle that is applied on top +of the tiling format. Whether or not swizzling is enabled depends on the memory +configuration of the system. In general, systems with dual-channel RAM have +swizzling enabled and single-channel do not. Supposedly, this swizzling allows +for better balancing between the two memory channels and increases performance. +Because it depends on the memory configuration which may change from one boot +to the next, it requires a run-time check. + +The best documentation for bit-6 swizzling can be found in the Haswell PRM Vol. +5 "Memory Views" in the section entitled "Address Swizzling for Tiled-Y +Surfaces". + +ISL Representation +------------------ + +The structure of any given tiling format is represented by ISL using the +:cpp:enum:`isl_tiling` enum and the :cpp:struct:`isl_tile_info` structure: + +.. doxygenenum:: isl_tiling + +.. doxygenfunction:: isl_tiling_get_info + +.. doxygenstruct:: isl_tile_info + :members: + +The `isl_tile_info` structure has two different sizes for a tile: a logical +size in surface elements and a physical size in bytes. In order to determine +the proper logical size, the bits-per-block of the underlying format has to be +passed into `isl_tiling_get_info`. The proper way to compute the size of an +image in bytes given a width and height in elements is as follows: + +.. code-block:: c + + uint32_t width_tl = DIV_ROUND_UP(width_el * (format_bpb / tile_info.format_bpb), + tile_info.logical_extent_el.w); + uint32_t height_tl = DIV_ROUND_UP(height_el, tile_info.logical_extent_el.h); + uint32_t row_pitch = width_tl * tile_info.phys_extent_el.w; + uint32_t size = height_tl * tile_info.phys_extent_el.h * row_pitch; + +It is very important to note that there is no direct conversion between +:cpp:member:`isl_tile_info::logical_extent_el` and +:cpp:member:`isl_tile_info::phys_extent_B`. It is tempting to assume that the +logical and physical heights are the same and simply divide the width of +:cpp:member:`isl_tile_info::phys_extent_B` by the size of the format (which is +what the PRM does) to get :cpp:member:`isl_tile_info::logical_extent_el` but +this is not at all correct. Some tiling formats have logical and physical +heights that differ and so no such calculation will work in general. The +easiest case study for this is W-tiling. From the Sky Lake PRM Vol. 2d, +"RENDER_SURFACE_STATE" (p. 427): + + If the surface is a stencil buffer (and thus has Tile Mode set to + TILEMODE_WMAJOR), the pitch must be set to 2x the value computed based on + width, as the stencil buffer is stored with two rows interleaved. + +What does this mean? Why are we multiplying the pitch by two? What does it +mean that "the stencil buffer is stored with two rows interleaved"? The +explanation for all these questions is that a W-tile (which is only used for +stencil) has a logical size of 64el x 64el but a physical size of 128B +x 32rows. In memory, a W-tile has the same footprint as a Y-tile (128B +x 32rows) but every pair of rows in the stencil buffer is interleaved into +a single row of bytes yielding a two-dimensional area of 64el x 64el. You can +consider this as its own tiling format or as a modification of Y-tiling. The +interpretation in the PRMs vary by hardware generation; on Sandy Bridge they +simply said it was Y-tiled but by Sky Lake there is almost no mention of +Y-tiling in connection with stencil buffers and they are always W-tiled. This +mismatch between logical and physical tile sizes are also relevant for +hierarchical depth buffers as well as single-channel MCS and CCS buffers. + +X-tiling +-------- + +The simplest tiling format available on Intel graphics (which has been +available since gen4) is X-tiling. An X-tile is 512B x 8rows and, within the +tile, the data is arranged in an X-major linear fashion. You can also look at +X-tiling as being an 8x8 cache line grid where the cache lines are arranged +X-major as follows: + +===== ===== ===== ===== ===== ===== ===== ===== +===== ===== ===== ===== ===== ===== ===== ===== +0x000 0x040 0x080 0x0c0 0x100 0x140 0x180 0x1c0 +0x200 0x240 0x280 0x2c0 0x300 0x340 0x380 0x3c0 +0x400 0x440 0x480 0x4c0 0x500 0x540 0x580 0x5c0 +0x600 0x640 0x680 0x6c0 0x700 0x740 0x780 0x7c0 +0x800 0x840 0x880 0x8c0 0x900 0x940 0x980 0x9c0 +0xa00 0xa40 0xa80 0xac0 0xb00 0xb40 0xb80 0xbc0 +0xc00 0xc40 0xc80 0xcc0 0xd00 0xd40 0xd80 0xdc0 +0xe00 0xe40 0xe80 0xec0 0xf00 0xf40 0xf80 0xfc0 +===== ===== ===== ===== ===== ===== ===== ===== + +Each cache line represents a piece of a single row of pixels within the image. +The memory locations of two vertically adjacent pixels within the same X-tile +always differs by 512B or 8 cache lines. + +As mentioned above, X-tiling is slower than Y-tiling (though still faster than +linear). However, until Sky Lake, the display scan-out hardware could only do +X-tiling so we have historically used X-tiling for all window-system buffers +(because X or a Wayland compositor may want to put it in a plane). + +Bit-6 Swizzling +^^^^^^^^^^^^^^^ + +When bit-6 swizzling is enabled, bits 9 and 10 are XOR'd in with bit 6 of the +tiled address: + +.. code-block:: c + + addr[6] ^= addr[9] ^ addr[10]; + +Y-tiling +-------- + +The Y-tiling format, also available since gen4, is substantially different from +X-tiling and performs much better in practice. Each Y-tile is an 8x8 grid of cache lines arranged Y-major as follows: + +===== ===== ===== ===== ===== ===== ===== ===== +===== ===== ===== ===== ===== ===== ===== ===== +0x000 0x200 0x400 0x600 0x800 0xa00 0xc00 0xe00 +0x040 0x240 0x440 0x640 0x840 0xa40 0xc40 0xe40 +0x080 0x280 0x480 0x680 0x880 0xa80 0xc80 0xe80 +0x0c0 0x2c0 0x4c0 0x6c0 0x8c0 0xac0 0xcc0 0xec0 +0x100 0x300 0x500 0x700 0x900 0xb00 0xd00 0xf00 +0x140 0x340 0x540 0x740 0x940 0xb40 0xd40 0xf40 +0x180 0x380 0x580 0x780 0x980 0xb80 0xd80 0xf80 +0x1c0 0x3c0 0x5c0 0x7c0 0x9c0 0xbc0 0xdc0 0xfc0 +===== ===== ===== ===== ===== ===== ===== ===== + +Each 64B cache line within the tile is laid out as 4 rows of 16B each: + +==== ==== ==== ==== ==== ==== ==== ==== ==== ==== ==== ==== ==== ==== ==== ==== +==== ==== ==== ==== ==== ==== ==== ==== ==== ==== ==== ==== ==== ==== ==== ==== +0x00 0x01 0x02 0x03 0x04 0x05 0x06 0x07 0x08 0x09 0x0a 0x0b 0x0c 0x0d 0x0e 0x0f +0x10 0x11 0x12 0x13 0x14 0x15 0x16 0x17 0x18 0x19 0x1a 0x1b 0x1c 0x1d 0x1e 0x1f +0x20 0x21 0x22 0x23 0x24 0x25 0x26 0x27 0x28 0x29 0x2a 0x2b 0x2c 0x2d 0x2e 0x2f +0x30 0x31 0x32 0x33 0x34 0x35 0x36 0x37 0x38 0x39 0x3a 0x3b 0x3c 0x3d 0x3e 0x3f +==== ==== ==== ==== ==== ==== ==== ==== ==== ==== ==== ==== ==== ==== ==== ==== + +Y-tiling is widely regarded as being substantially faster than X-tiling so it +is generally preferred. However, prior to Sky Lake, Y-tiling was not available +for scanout so X tiling was used for any sort of window-system buffers. +Starting with Sky Lake, we can scan out from Y-tiled buffers. + +Bit-6 Swizzling +^^^^^^^^^^^^^^^ + +When bit-6 swizzling is enabled, bit 9 is XOR'd in with bit 6 of the tiled +address: + +.. code-block:: c + + addr[6] ^= addr[9]; + +W-tiling +-------- + +W-tiling is a new tiling format added on Sandy Bridge for use in stencil +buffers. W-tiling is similar to Y-tiling in that it's arranged as an 8x8 +Y-major grid of cache lines. The bytes within each cache line are arranged as +follows: + +==== ==== ==== ==== ==== ==== ==== ==== +==== ==== ==== ==== ==== ==== ==== ==== +0x00 0x01 0x04 0x05 0x10 0x11 0x14 0x15 +0x02 0x03 0x06 0x07 0x12 0x13 0x16 0x17 +0x08 0x09 0x0c 0x0d 0x18 0x19 0x1c 0x1d +0x0a 0x0b 0x0e 0x0f 0x1a 0x1b 0x1e 0x1f +0x20 0x21 0x24 0x25 0x30 0x31 0x34 0x35 +0x22 0x23 0x26 0x27 0x32 0x33 0x36 0x37 +0x28 0x29 0x2c 0x2d 0x38 0x39 0x3c 0x3d +0x2a 0x2b 0x2e 0x2f 0x3a 0x3b 0x3e 0x3f +==== ==== ==== ==== ==== ==== ==== ==== + +While W-tiling has been required for stencil all the way back to Sandy Bridge, +the docs are somewhat confused as to whether stencil buffers are W or Y-tiled. +This seems to stem from the fact that the hardware seems to implement W-tiling +as a sort of modified Y-tiling. One example of this is the somewhat odd +requirement that W-tiled buffers have their pitch multiplied by 2. From the +Sky Lake PRM Vol. 2d, "RENDER_SURFACE_STATE" (p. 427): + + If the surface is a stencil buffer (and thus has Tile Mode set to + TILEMODE_WMAJOR), the pitch must be set to 2x the value computed based on + width, as the stencil buffer is stored with two rows interleaved. + +The last phrase holds the key here: "the stencil buffer is stored with two rows +interleaved". More accurately, a W-tiled buffer can be viewed as a Y-tiled +buffer with each set of 4 W-tiled lines interleaved to form 2 Y-tiled lines. In +ISL, we represent a W-tile as a tiling with a logical dimension of 64el x 64el +but a physical size of 128B x 32rows. This cleanly takes care of the pitch +issue above and seems to nicely model the hardware. + +Tiling as a bit pattern +----------------------- + +There is one more important angle on tiling that should be discussed before we +finish. Every tiling can be described by three things: + + 1. A logical width and height in elements + 2. A physical width in bytes and height in rows + 3. A mapping from logical elements to physical bytes within the tile + +We have spent a good deal of time on the first two because this is what you +really need for doing surface layout calculations. However, there are cases in +which the map from logical to physical elements is critical. One example is +W-tiling where we have code to do W-tiled encoding and decoding in the shader +for doing stencil blits because the hardware does not allow us to render to +W-tiled surfaces. + +There are many ways to mathematically describe the mapping from logical +elements to physical bytes. In the PRMs they give a very complicated set of +formulas involving lots of multiplication, modulus, and sums that show you how +to compute the mapping. With a little creativity, you can easily reduce those +to a set of bit shifts and ORs. By far the simplest formulation, however, is +as a mapping from the bits of the texture coordinates to bits in the address. +Suppose that :math:`(u, v)` is location of a 1-byte element within a tile. If +you represent :math:`u` as :math:`u_n u_{n-1} \cdots u_2 u_1 u_0` where +:math:`u_0` is the LSB and :math:`u_n` is the MSB of :math:`u` and similarly +:math:`v = v_m v_{m-1} \cdots v_2 v_1 v_0`, then the bits of the address within +the tile are given by the table below: + +=========================================== =========== =========== =========== =========== =========== =========== =========== =========== =========== =========== =========== =========== + Tiling 11 10 9 8 7 6 5 4 3 2 1 0 +=========================================== =========== =========== =========== =========== =========== =========== =========== =========== =========== =========== =========== =========== +:cpp:enumerator:`isl_tiling::ISL_TILING_X` :math:`v_3` :math:`v_2` :math:`v_1` :math:`v_0` :math:`u_7` :math:`u_6` :math:`u_5` :math:`u_4` :math:`u_3` :math:`u_2` :math:`u_1` :math:`u_0` +:cpp:enumerator:`isl_tiling::ISL_TILING_Y0` :math:`u_6` :math:`u_5` :math:`u_4` :math:`v_4` :math:`v_3` :math:`v_2` :math:`v_1` :math:`v_0` :math:`u_3` :math:`u_2` :math:`u_1` :math:`u_0` +:cpp:enumerator:`isl_tiling::ISL_TILING_W` :math:`u_5` :math:`u_4` :math:`u_3` :math:`v_5` :math:`v_4` :math:`v_3` :math:`v_2` :math:`u_2` :math:`v_1` :math:`u_1` :math:`v_0` :math:`u_0` +=========================================== =========== =========== =========== =========== =========== =========== =========== =========== =========== =========== =========== =========== + +Constructing the mapping this way makes a lot of sense when you think about +hardware. It may seem complex on paper but "simple" things such as addition +are relatively expensive in hardware while interleaving bits in a well-defined +pattern is practically free. For a format that has more than one byte per +element, you simply chop bits off the bottom of the pattern, hard-code them to +0, and adjust bit indices as needed. For a 128-bit format, for instance, the +Y-tiled pattern becomes u_2 u_1 u_0 v_4 v_3 v_2 v_1 v_0. The Sky Lake PRM +Vol. 5 in the section "2D Surfaces" contains an expanded version of the above +table (which we will not repeat here) that also includes the bit patterns for +the Ys and Yf tiling formats. diff --git a/src/intel/isl/isl.c b/src/intel/isl/isl.c index 9c45798cb7b..f60cc554430 100644 --- a/src/intel/isl/isl.c +++ b/src/intel/isl/isl.c @@ -306,9 +306,12 @@ isl_device_get_sample_counts(struct isl_device *dev) } /** + * Returns an isl_tile_info representation of the given isl_tiling when + * combined with a format of the given size. + * * @param[out] info is written only on success */ -static void +void isl_tiling_get_info(enum isl_tiling tiling, uint32_t format_bpb, struct isl_tile_info *tile_info) diff --git a/src/intel/isl/isl.h b/src/intel/isl/isl.h index 2a03bc847dc..3ae038f1fe0 100644 --- a/src/intel/isl/isl.h +++ b/src/intel/isl/isl.h @@ -558,18 +558,22 @@ enum isl_txc { }; /** - * @brief Hardware tile mode + * Describes the memory tiling of a surface * - * WARNING: These values differ from the hardware enum values, which are - * unstable across hardware generations. + * This differs from the HW enum values used to represent tiling. The bits + * used by hardware have varried significantly over the years from the + * "Tile Walk" bit on old pre-Broadwell parts to the "Tile Mode" enum on + * Broadwell to the combination of "Tile Mode" and "Tiled Resource Mode" on + * Skylake. This enum represents them all in a consistent manner and in one + * place. * * Note that legacy Y tiling is ISL_TILING_Y0 instead of ISL_TILING_Y, to * clearly distinguish it from Yf and Ys. */ enum isl_tiling { - ISL_TILING_LINEAR = 0, - ISL_TILING_W, - ISL_TILING_X, + ISL_TILING_LINEAR = 0, /**< Linear, or no tiling */ + ISL_TILING_W, /**< W tiling */ + ISL_TILING_X, /**< X tiling */ ISL_TILING_Y0, /**< Legacy Y tiling */ ISL_TILING_Yf, /**< Standard 4K tiling. The 'f' means "four". */ ISL_TILING_Ys, /**< Standard 64K tiling. The 's' means "sixty-four". */ @@ -1244,9 +1248,11 @@ struct isl_format_layout { }; struct isl_tile_info { + /** Tiling represented by this isl_tile_info */ enum isl_tiling tiling; - /* The size (in bits per block) of a single surface element + /** + * The size (in bits per block) of a single surface element * * For surfaces with power-of-two formats, this is the same as * isl_format_layout::bpb. For non-power-of-two formats it may be smaller. @@ -1265,7 +1271,8 @@ struct isl_tile_info { */ uint32_t format_bpb; - /** The logical size of the tile in units of format_bpb size elements + /** + * The logical size of the tile in units of format_bpb size elements * * This field determines how a given surface is cut up into tiles. It is * used to compute the size of a surface in tiles and can be used to @@ -1275,7 +1282,8 @@ struct isl_tile_info { */ struct isl_extent4d logical_extent_el; - /** The physical size of the tile in bytes and rows of bytes + /** + * The physical size of the tile in bytes and rows of bytes * * This field determines how the tiles of a surface are physically layed * out in memory. The logical and physical tile extent are frequently the @@ -1877,6 +1885,11 @@ bool isl_has_matching_typed_storage_image_format(const struct intel_device_info *devinfo, enum isl_format fmt); +void +isl_tiling_get_info(enum isl_tiling tiling, + uint32_t format_bpb, + struct isl_tile_info *tile_info); + static inline enum isl_tiling isl_tiling_flag_to_enum(isl_tiling_flags_t flag) {