panfrost: Choose hierarchy masks by vertex count

Currently, we always use a hierarchy mask with all levels enabled. While this is
efficient for geometry-heavy workloads like 3D games, it is wasteful for 2D
applications that draw very few vertices. For drawing just a few textured quads,
the overhead of small bin sizes outweighs any performance advantage, so it's a
bit slower. More problematically, small bin sizes require tremendous amounts of
memory for the polygon lists, consuming significant memory (~10MB) for even the
simplest of 2D blits.
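For a sense of scale, the sketch below counts how many bins a framebuffer needs at each hierarchy level. It is illustrative only and rests on assumptions not stated in this commit: the smallest bin is 16x16 (shift 4), bit i of an 8-bit mask enables bins of side 2^(4 + i), and polygon list memory grows with the total number of bins. The names bins_for_level, total_bins and SKETCH_MIN_TILE_SHIFT are hypothetical, not Mesa code.

```c
#include <assert.h>

/* Assumed smallest bin: 16x16 pixels, i.e. 1 << 4. */
#define SKETCH_MIN_TILE_SHIFT 4u

/* Number of square bins of side 2^log2_size needed to cover a
 * width x height framebuffer, rounding partial bins up. */
static unsigned
bins_for_level(unsigned width, unsigned height, unsigned log2_size)
{
   unsigned size = 1u << log2_size;
   unsigned cols = (width + size - 1) / size;
   unsigned rows = (height + size - 1) / size;
   return cols * rows;
}

/* Total bins over every level enabled in an 8-bit hierarchy mask, where
 * bit i enables bins of side 2^(SKETCH_MIN_TILE_SHIFT + i). */
static unsigned
total_bins(unsigned width, unsigned height, unsigned mask)
{
   unsigned total = 0;
   for (unsigned i = 0; i < 8; ++i) {
      if (mask & (1u << i))
         total += bins_for_level(width, height, SKETCH_MIN_TILE_SHIFT + i);
   }
   return total;
}
```

At 1920x1080 with all eight levels enabled (mask 0xFF), the 16x16 level alone contributes 8160 of the 10902 bins, while keeping only the four largest levels (mask 0xF0) needs 57 bins.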

To reduce our memory footprint, we need to choose our hierarchy masks more
carefully. In general, we want to allow small bin sizes for geometry-heavy
workloads but not for geometry-light workloads. We use the vertex count
estimated in the driver as a proxy for this, and a simple heuristic selects the
minimum bin size from that estimate. None of this is an exact science, and the
heuristic could probably be tuned. Nevertheless, the heuristic used (comparing
framebuffer size to vertex count) works well in practice, significantly reducing
the memory footprint of 2D applications like Firefox without hurting the
performance of 3D applications.
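The heuristic picks the bin size at which roughly k vertices fall into each bin, s = sqrt(k * W * H / V) for a W x H framebuffer and V vertices, as derived in the comment in the diff. Below is a minimal integer-only sketch of that step, assuming k = 4 as in the patch. logbase2_ceil is a hypothetical stand-in for Mesa's util_logbase2_ceil, log2_min_bin_size is an illustrative name, and the zero-vertex guard exists only in this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for util_logbase2_ceil(): smallest n with 2^n >= x (0 for
 * x <= 1). The 64-bit shift lets n reach 32 without overflow. */
static unsigned
logbase2_ceil(unsigned x)
{
   unsigned n = 0;
   while (((uint64_t)1 << n) < x)
      ++n;
   return n;
}

/* log2 of the minimum bin size such that about k vertices land in each bin:
 *    bin size = sqrt(k * width * height / vertex_count)
 * so log2(bin size) = log2(k * width * height / vertex_count) / 2, which is
 * why the integer version halves a ceil(log2) rather than taking a sqrt. */
static unsigned
log2_min_bin_size(unsigned width, unsigned height, unsigned vertex_count)
{
   const unsigned k = 4;

   /* Guard added in this sketch only, to avoid dividing by zero. */
   if (vertex_count == 0)
      vertex_count = 1;

   return logbase2_ceil((k * width * height) / vertex_count) / 2;
}
```

At 1920x1080, a single quad drawn as 6 vertices gives 10 (1024x1024 bins), while 100000 vertices give 3, which the patch then clamps to MIN_TILE_SHIFT if that is larger.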

I originally wrote this patch while diagnosing high memory footprints on my
Midgard laptop, which is why only Midgard is in scope here. On Bifrost and
Valhall, we have a similar hierarchy mask selection problem. It seems likely that
the same heuristic would work there too, but it's a different code path that I
have not integrated or tested. I'll leave it to an adventurous reader to bring
the memory footprint win there too.

(It's also possible the win is smaller on newer Malis than on Midgard, since Arm
claims they optimized the tiler data structures on the newer parts. There's
probably still some merit to the idea.)

On Mali-T860, glmark2 -bdesktop frametime decreased by 1.35% +/- 0.91% at 95%
confidence, a slight win for 2D workloads. There is no statistically significant
difference for glmark2 -bshading:shading=phong, since 3D workloads continue to
use the same hierarchy masks.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19482>
Alyssa Rosenzweig 2023-03-15 21:36:18 -04:00 committed by Marge Bot
parent 1887b26845
commit cd03392c7e


@@ -320,12 +320,6 @@ panfrost_choose_tile_size(unsigned width, unsigned height,
    return exp_w | (exp_h << 6);
 }
 
-/* In the future, a heuristic to choose a tiler hierarchy mask would go here.
- * At the moment, we just default to 0xFF, which enables all possible hierarchy
- * levels. Overall this yields good performance but presumably incurs a cost in
- * memory bandwidth / power consumption / etc, at least on smaller scenes that
- * don't really need all the smaller levels enabled */
-
 unsigned
 panfrost_choose_hierarchy_mask(unsigned width, unsigned height,
                                unsigned vertex_count, bool hierarchy)
@@ -338,7 +332,45 @@ panfrost_choose_hierarchy_mask(unsigned width, unsigned height,
    if (!hierarchy)
       return panfrost_choose_tile_size(width, height, vertex_count);
 
-   /* Otherwise, default everything on. TODO: Proper tests */
+   /* Heuristic: choose the largest minimum bin size such that there are an
+    * average of k vertices per bin at the lowest level. This is modeled as:
+    *
+    *    k = vertex_count / ((fb width / bin width) * (fb height / bin height))
+    *
+    * Bins are square, so solving for bin size = bin width = bin height:
+    *
+    *    bin size = sqrt(((k) (fb width) (fb height) / vertex count))
+    *
+    * k = 4 represents each bin as a QUAD. If the screen is completely tiled
+    * into nonoverlapping uniform power-of-two squares, then this heuristic sets
+    * the bin size to the quad size, which seems like an ok choice.
+    */
+   unsigned k = 4;
+   unsigned log2_min_bin_size =
+      util_logbase2_ceil((k * width * height) / vertex_count) / 2;
 
-   return 0xFF;
+   /* Do not use bins larger than the framebuffer. They will be empty. */
+   unsigned log2_max_bin_size = util_logbase2_ceil(MAX2(width, height));
+
+   /* For small framebuffers, use one big tile */
+   log2_min_bin_size = MIN2(log2_min_bin_size, log2_max_bin_size);
+
+   /* Clamp to valid bin sizes */
+   log2_min_bin_size = CLAMP(log2_min_bin_size, MIN_TILE_SHIFT, MAX_TILE_SHIFT);
+   log2_max_bin_size = CLAMP(log2_max_bin_size, MIN_TILE_SHIFT, MAX_TILE_SHIFT);
+
+   /* Bin indices are numbered from 0 started with MIN_TILE_SIZE */
+   unsigned min_bin_index = log2_min_bin_size - MIN_TILE_SHIFT;
+   unsigned max_bin_index = log2_max_bin_size - MIN_TILE_SHIFT;
+
+   /* Enable up to 8 bins starting from the heuristic selected minimum. 8
+    * is the implementation specific maximum in supported Midgard devices.
+    */
+   unsigned mask =
+      (BITFIELD_MASK(8) << min_bin_index) & BITFIELD_MASK(max_bin_index + 1);
+
+   assert(mask != 0 && "too few levels");
+   assert(util_bitcount(mask) <= 8 && "too many levels");
+
+   return mask;
 }
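To check the clamping and mask construction outside the driver, here is a standalone sketch of the second half of the new function. The macros are local stand-ins for Mesa's MIN2/MAX2/CLAMP/BITFIELD_MASK, the 16 to 4096 pixel bin range is an assumption about MIN_TILE_SHIFT/MAX_TILE_SHIFT, and sketch_hierarchy_mask is a hypothetical name that takes the heuristic's log2 minimum bin size as an input.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bin size range: 16x16 (shift 4) to 4096x4096 (shift 12). */
#define SKETCH_MIN_TILE_SHIFT 4u
#define SKETCH_MAX_TILE_SHIFT 12u

/* Local stand-ins for the Mesa util macros used in the patch. */
#define MIN2(a, b) ((a) < (b) ? (a) : (b))
#define MAX2(a, b) ((a) > (b) ? (a) : (b))
#define CLAMP(x, lo, hi) ((x) < (lo) ? (lo) : (x) > (hi) ? (hi) : (x))
#define BITFIELD_MASK(b) ((b) >= 32 ? 0xFFFFFFFFu : (1u << (b)) - 1u)

/* Smallest n with 2^n >= x (0 for x <= 1). */
static unsigned
logbase2_ceil(unsigned x)
{
   unsigned n = 0;
   while (((uint64_t)1 << n) < x)
      ++n;
   return n;
}

static unsigned
sketch_hierarchy_mask(unsigned width, unsigned height, unsigned log2_min)
{
   /* Bins larger than the framebuffer would be empty. */
   unsigned log2_max = logbase2_ceil(MAX2(width, height));

   /* Small framebuffers collapse to a single level. */
   log2_min = MIN2(log2_min, log2_max);

   log2_min = CLAMP(log2_min, SKETCH_MIN_TILE_SHIFT, SKETCH_MAX_TILE_SHIFT);
   log2_max = CLAMP(log2_max, SKETCH_MIN_TILE_SHIFT, SKETCH_MAX_TILE_SHIFT);

   /* Bit i enables bins of side 2^(SKETCH_MIN_TILE_SHIFT + i). */
   unsigned min_index = log2_min - SKETCH_MIN_TILE_SHIFT;
   unsigned max_index = log2_max - SKETCH_MIN_TILE_SHIFT;

   /* Up to 8 levels starting at min_index, none above max_index. */
   unsigned mask =
      (BITFIELD_MASK(8) << min_index) & BITFIELD_MASK(max_index + 1);

   assert(mask != 0 && "too few levels");
   return mask;
}
```

For 1920x1080, a minimum of 2^10 yields 0xC0 (only the 1024 and 2048 pixel levels), a minimum of 2^3 is clamped up and yields 0xFF (all eight levels, as before the patch), and a 64x64 framebuffer collapses to the single 64 pixel level, 0x4.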