Fragment Density Map
====================

``VK_EXT_fragment_density_map`` is an extension intended to allow
users to render parts of the screen at a lower resolution. It is designed to be
implemented on tiled rendering GPU architectures such as Adreno, and the
intention is that it is implemented by rendering some of the tiles at a lower
resolution and scaling them up when resolving to system memory or when sampling
the resulting image. This inherently means that it is "all or nothing," that
is, it must be enabled or disabled for the entire render pass. While the idea is
simple, the implementation in turnip is subtle, with many interactions with
other features. This page attempts to document the main principles behind the
implementation.

Coordinate Space Soup
---------------------

In order to render a tile at lower resolution, we have to override the user's
viewport and scissor for each tile depending on the scaling factor provided by
the user. This becomes complicated fast, so let's start by defining a few
coordinate spaces that we'll have to work with.

Framebuffer space
^^^^^^^^^^^^^^^^^

This is the space of the final rendered image. From the user's perspective
everything is specified in this space, and fragments created by the rasterizer
appear to be larger than 1 pixel. But this is not what actually happens in the
hardware; it is a fiction created by the driver. The other spaces below are
what the hardware actually "sees".

GMEM Space
^^^^^^^^^^

This space exists whenever tiled rendering/GMEM is used, even without FDM. It
is the space used to access GMEM, with the origin at the upper left of the
tile. Whenever GMEM is accessed, the hardware automatically transforms
rendering space into GMEM space using the various ``*_WINDOW_OFFSET``
registers. The origin of this space will be called :math:`b_{cs}`, the common
bin start, for reasons that are explained below. When using FDM, coordinates in
this space must be multiplied by the scaling factor :math:`s` derived from the
fragment density map, or equivalently divided by the fragment area (as defined
by the Vulkan specification), with the origin still at the upper left of the
tile. For example, if :math:`s_x = 1/2`, then the bin is half as wide as it
would've been without FDM and all x coordinates in this space must be divided
by 2.

Rendering space
^^^^^^^^^^^^^^^

This is the space in which the hardware rasterizer operates and produces
fragments. Normally this is the same as framebuffer space, but with FDM it is
not. We transform the viewport and scissor from framebuffer space to
rendering space by patching them per-tile in the driver. Then, when we
resolve the tile, we scale it back to the correct resolution by
blitting from the rendering space source to the framebuffer space destination.

The correct transform from framebuffer space to rendering space has to shrink
the coordinates by :math:`s` while mapping the original bin start in
framebuffer space :math:`b_s` to :math:`b_{cs}`. Since :math:`b_{cs}` is
entirely defined by the driver when programming ``*_WINDOW_OFFSET``, one
tempting way to do this is to just multiply by :math:`s` and define
:math:`b_{cs} = b_s * s`. It turns out, however, that this doesn't work. A key
requirement is to handle cases where the same scene is rendered in multiple
different views at the same time using ``VK_KHR_multiview``, as in VR
use-cases. In this case we want :math:`s` to vary per view, but :math:`b_{cs}`
is always the same for every view because there is only one
``*_WINDOW_OFFSET`` register for all layers (hence the name).

We follow the blob by leaving :math:`b_{cs}` the same regardless of whether FDM
is enabled or not. This means that normally :math:`b_s = b_{cs}`, although this
is not the case if ``VK_EXT_fragment_density_map_offset`` is in use and the
bins are shifted per-view. Since the coordinates need to be scaled by :math:`s`,
we know that the transform needs to look like :math:`x' = s * x + o`, where
only the offset :math:`o` is free. Plugging in the constraint that :math:`b_s`
maps to :math:`b_{cs}`, we get that :math:`b_{cs} = s * b_s + o` or
:math:`o = b_{cs} - s * b_s`. This is the function computed by
``tu_fdm_per_bin_offset`` and used to calculate the transform for the viewport,
scissor, and ``gl_FragCoord``. One critical thing is that the offset must be an
integer, or in other words the framebuffer space bin start :math:`b_s` must be
a multiple of :math:`1 / s`. This is a natural constraint anyway, because
otherwise the bin would start in the middle of a fragment, which isn't
possible to handle correctly.

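As a rough illustration of the offset calculation described above, the
following Python sketch evaluates :math:`o = b_{cs} - s * b_s` for a single
axis and rejects bin starts that would give a fractional offset. It is not the
code of ``tu_fdm_per_bin_offset``; the names ``per_bin_offset`` and
``to_rendering_space`` and the use of ``Fraction`` for :math:`s` are
assumptions made for this example.

.. code-block:: python

   from fractions import Fraction


   def per_bin_offset(b_cs, b_s, s):
       """Return o such that x' = s * x + o maps b_s to b_cs (one axis).

       Hypothetical helper for illustration only.
       """
       o = b_cs - s * b_s
       if o.denominator != 1:
           # b_s is not a multiple of 1/s, so the bin would start in the
           # middle of a fragment.
           raise ValueError("bin start must be a multiple of 1/s")
       return int(o)


   def to_rendering_space(x, s, o):
       """Apply the framebuffer-to-rendering-space transform to x."""
       return s * x + o


   half = Fraction(1, 2)
   o = per_bin_offset(96, 96, half)

With :math:`b_s = b_{cs} = 96` and :math:`s = 1/2` the sketch gives
:math:`o = 48`, so the bin start 96 maps to itself, while a bin start of 97 is
rejected because it is not a multiple of :math:`1 / s`.
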
Viewport and Scissor Patching
-----------------------------

In order to have :math:`s` differ per view, we have to be able to override the
viewport per view. That is, we need to transform the viewport for each view
differently. If there is only one viewport, then we duplicate the user's
viewport for each view and transform it using the :math:`b_s` and :math:`s` for
that view, and we set a "per-view viewport" bit to select the viewport per view
instead of using the default viewport 0. When
``VK_VALVE_fragment_density_map_layered`` is in use, we instead have to insert
shader code to achieve the same thing.

If the user specifies multiple viewports but they are per-view because
``VK_QCOM_multiview_per_view_viewport`` is enabled, then we can just set the
per-view viewport bit and transform each user viewport individually by the
corresponding scale. But if the user explicitly writes ``gl_ViewportIndex``,
then there is nothing we can do and we have to make :math:`s` the same for all
views by conservatively taking the minimum. Then we apply :math:`s` to all of
the user-specified viewports.

Because the bin size is now per-view, the usual mechanism of
``*_WINDOW_SCISSOR`` for clipping fragments outside the bin doesn't work.
Instead, the driver needs to intersect the transformed user-specified scissor
with the transformed rendering-space bin coordinates, effectively replacing
``*_WINDOW_SCISSOR``.

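The scissor intersection described above can be sketched in Python as follows.
This is only an illustration under stated assumptions, not the driver's code:
rectangles are ``(x0, y0, x1, y1)`` tuples, the helper names are hypothetical,
and rounding the transformed rectangle outward is a choice made for this
example.

.. code-block:: python

   from fractions import Fraction
   from math import ceil, floor


   def transform_rect(rect, s, o):
       """Map a rect from framebuffer space to rendering space per axis.

       Rounding outward is an assumption of this sketch.
       """
       x0, y0, x1, y1 = rect
       return (floor(s[0] * x0 + o[0]), floor(s[1] * y0 + o[1]),
               ceil(s[0] * x1 + o[0]), ceil(s[1] * y1 + o[1]))


   def intersect(a, b):
       """Intersect two rects, returning an empty rect if they are disjoint."""
       x0, y0 = max(a[0], b[0]), max(a[1], b[1])
       x1, y1 = min(a[2], b[2]), min(a[3], b[3])
       if x1 <= x0 or y1 <= y0:
           return (0, 0, 0, 0)
       return (x0, y0, x1, y1)


   def patched_scissor(user_scissor, bin_rect, s, o):
       """Clip the transformed user scissor to the transformed bin."""
       return intersect(transform_rect(user_scissor, s, o),
                        transform_rect(bin_rect, s, o))


   half = Fraction(1, 2)
   example = patched_scissor((0, 0, 150, 50), (96, 0, 192, 96),
                             (half, half), (48, 0))

Here the bin covers ``(96, 0)`` to ``(192, 96)`` in framebuffer space, which
becomes ``(96, 0)`` to ``(144, 48)`` in rendering space, and the user scissor
is clipped to ``(96, 0, 123, 25)``.
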
Fragment density map offset
---------------------------

In order to "properly" implement ``VK_EXT_fragment_density_map_offset``, we
need to add an extra row/column of bins at the end and then shift the binning
grid up and to the left by an offset :math:`b_o`. This offset is based on the
user's offset but has the opposite sign, i.e. when shifting the FDM to the left
we have to shift the binning grid to the right, and once the user's offset
becomes large enough we "wrap around" and shift over the scaling factor
:math:`s` to the next bin. This has to happen per-view. In turnip the function
that computes :math:`b_o` is called ``tu_bin_offset``. Each tile then gets an
offset start :math:`b_s = b_{cs} - b_o`, except for tiles in the first
row/column, which only shrink in height/width respectively.

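To make the shifted grid concrete, here is a minimal Python sketch for one
axis. It takes :math:`b_o` as an input rather than reproducing
``tu_bin_offset``, and it assumes :math:`0 \le b_o` and that :math:`b_o` is
smaller than the bin size. The function name and the returned tuple layout are
hypothetical.

.. code-block:: python

   def shifted_bin_spans(fb_size, bin_size, b_o):
       """Return (b_cs, b_s, size) for each non-empty bin along one axis.

       Sketch only: assumes 0 <= b_o < bin_size.
       """
       assert 0 <= b_o < bin_size
       n = -(-fb_size // bin_size)  # number of bins in the unshifted grid
       spans = []
       for k in range(n + 1):  # one extra bin at the end
           b_cs = k * bin_size
           # The first bin keeps its start at 0 and only shrinks.
           b_s = max(b_cs - b_o, 0)
           end = min(b_cs + bin_size - b_o, fb_size)
           if end > b_s:
               spans.append((b_cs, b_s, end - b_s))
       return spans


   spans = shifted_bin_spans(288, 96, 32)

For a 288 pixel wide framebuffer with 96 pixel bins and :math:`b_o = 32`, the
first bin shrinks to 64 pixels, the next two start at 64 and 160, and the
extra bin covers the last 32 pixels. With :math:`b_o = 0` the extra bin is
empty and is dropped.
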
If we cannot make :math:`s` per-view, then we also cannot make :math:`b_s`
per-view and so we cannot shift the bins over. Therefore we fall back to only
shifting where :math:`s` is sampled from, which produces jittery and jarring
transitions when a bin suddenly changes resolution.

Bin merging
-----------

FDM shrinks the size of the bin in GMEM, which leaves much of GMEM unused.
a7xx mitigates this by introducing "bin merging".
If two tiles next to each other have the same scaling for each view, then we
combine them into one tile, as long as the combined size in rendering space
isn't larger than the original size of an unscaled bin in framebuffer space. We
can even merge larger groups of tiles. The only hardware feature needed for
this to work is the ability to merge the visibility streams for the tiles,
which was added on a7xx by a new bitmask in ``CP_SET_BIN_DATA5`` and variants.
Only bins within the same visibility stream/VSC pipe can be merged.

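The merging conditions listed above can be summarized as a small predicate.
The following Python sketch is an illustration only. The function name, its
parameters, and the representation of per-view scales as ``(s_x, s_y)`` pairs
are assumptions, and it does not model the ``CP_SET_BIN_DATA5`` bitmask.

.. code-block:: python

   from fractions import Fraction


   def can_merge(scales_a, scales_b, merged_size, bin_size, same_vsc_pipe):
       """Check whether two neighboring tiles may be merged.

       scales_a, scales_b: per-view lists of (s_x, s_y) for each tile.
       merged_size: (width, height) of the combined tile in framebuffer space.
       bin_size: (width, height) of one unscaled bin in framebuffer space.
       """
       if not same_vsc_pipe or scales_a != scales_b:
           return False
       # The combined tile, scaled into rendering space, must still fit in
       # the space of one unscaled bin for every view.
       return all(merged_size[0] * sx <= bin_size[0] and
                  merged_size[1] * sy <= bin_size[1]
                  for sx, sy in scales_a)


   half = Fraction(1, 2)
   ok = can_merge([(half, half)], [(half, half)], (192, 96), (96, 96), True)

Two side-by-side 96x96 bins at half resolution occupy 96x48 pixels in
rendering space, so they fit in one bin and ``ok`` is ``True``. The same pair
at full resolution would need 192 pixels of width and cannot be merged.
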
Hardware scaling registers and LRZ
----------------------------------

One disadvantage of FDM on a6xx is that low-resolution tiles cannot use
LRZ, because the LRZ hardware is not aware of the transform between framebuffer
space and rendering space and applies the framebuffer-space LRZ values to the
rendering-space fragments. In order to fix this, a740 adds new offset and scale
registers. The offset :math:`o'` is applied to fragment coordinates during
rasterization *after* LRZ, so that viewport, scissor, and LRZ are in a
new "LRZ space" while the other operations (resolves and unresolves, and
attachment writes) still happen in the rendering space which is now offset.
:math:`o'` is specified for each layer. The scale :math:`s` is the same as
before, and it is used to multiply the fragment area covered by each LRZ pixel.

Without ``VK_EXT_fragment_density_map_offset``, we can simply make LRZ space
equal to framebuffer space scaled down by :math:`s`. That is, we can set
:math:`o'` to what :math:`o` was before and then set :math:`o` to 0, only
scaling down the viewport but not shifting it and letting the hardware handle
the shift. Then LRZ pixels will be scaled up appropriately and everything will
work. However, this doesn't work if there is a bin offset :math:`b_o`. In order
to make binning work, we shift the viewport and scissor by :math:`b_o` when
binning. Unfortunately the offset registers do not have any effect when
binning, so rendering space and LRZ space have to be the same when binning, and
the visibility stream is generated from rendering space. This means that LRZ
space also has to be shifted over compared to framebuffer space, and the LRZ
buffer must be overallocated when FDM offset might be used with it (which is
signalled by ``VK_IMAGE_CREATE_FRAGMENT_DENSITY_MAP_OFFSET_BIT_EXT``) because
the LRZ image will be shifted by :math:`b_o`.

In order for LRZ to work, LRZ space when rendering must be equal to LRZ space
when binning scaled down by :math:`s`. The origin of LRZ space when binning is
:math:`-b_o`, and this must be mapped to 0. The transform from
framebuffer space to LRZ space is :math:`x' = x * s + o`, and the transform
from framebuffer space to rendering space is :math:`x'' = x * s + o + o'`.
We get that :math:`o + o' = b_{cs} - b_s * s`, similar to before, and
:math:`0 = -b_o * s + o` so that :math:`o = b_o * s` and finally
:math:`o' = b_{cs} - b_s * s - b_o * s`, or after rearranging
:math:`o' = b_{cs} - (b_s + b_o) * s`. For all tiles except those in the first
row or column, this simplifies to :math:`o' = b_{cs} - b_{cs} * s` because
:math:`b_{cs} = b_s + b_o`. For tiles in the first row or column, :math:`b_s`
and :math:`b_{cs}` are both 0 in one of the coordinates, so it becomes
:math:`o' = -b_o * s` in that coordinate. This isn't representable in hardware,
both because it is negative (which can be worked around by artificially
shifting :math:`b_{cs}`) and, more importantly, because it may not meet the
alignment requirement of the hardware register (currently 8 pixels).
We have to just disable LRZ in this case.
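The two offsets derived above can be checked numerically with a short Python
sketch for one axis. The function name and return layout are hypothetical, the
default alignment of 8 pixels is taken from the text, and the sketch simply
treats a negative :math:`o'` as unrepresentable instead of applying the
:math:`b_{cs}` shifting workaround.

.. code-block:: python

   from fractions import Fraction


   def lrz_offsets(b_cs, b_s, b_o, s, align=8):
       """Return (o, o_prime, representable) for one axis.

       o maps framebuffer space to LRZ space: x' = x * s + o.
       o_prime is the additional hardware offset into rendering space.
       """
       o = b_o * s
       o_prime = b_cs - (b_s + b_o) * s
       # Assumption of this sketch: no workaround for negative offsets.
       representable = o_prime >= 0 and o_prime % align == 0
       return o, o_prime, representable


   half = Fraction(1, 2)
   interior = lrz_offsets(96, 64, 32, half)
   first_column = lrz_offsets(0, 0, 32, half)

For an interior tile with :math:`b_{cs} = 96`, :math:`b_o = 32`,
:math:`b_s = 64` and :math:`s = 1/2`, this gives :math:`o = 16` and
:math:`o' = 48`, which is a multiple of 8. For a tile in the first column with
the same :math:`b_o` and :math:`s`, :math:`o' = -16`, so the sketch reports
that LRZ would have to be disabled.
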