mirror of
https://gitlab.freedesktop.org/mesa/mesa.git
synced 2026-05-21 00:18:09 +02:00
Pre-patch, anv_descriptor_pool used a free list for host allocations
that never merged adjacent free blocks. If the pool only allocated
fixed-size blocks, this would not be a problem. But the pool's
allocations are variable-sized, and in some workloads over half of the
pool's memory was consumed by unusable free blocks, needlessly inflating
the memory footprint.
Replacing the free list with util_vma_heap, which does merge adjacent
free blocks, fixes the memory explosion in the target workload.
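The effect of merging can be illustrated with a small, self-contained toy allocator. This is only a sketch under stated assumptions: `toy_heap`, `toy_hole`, `toy_demo` and the other `toy_*` names are hypothetical, and the code is neither the old anv free list nor the util_vma_heap implementation. It keeps free ranges sorted by offset and, when `merge` is set, coalesces a freed range with its neighbours.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_HOLES 16

/* Hypothetical toy types: a "hole" is a free range inside the pool. */
struct toy_hole { uint32_t offset, size; };

struct toy_heap {
   struct toy_hole holes[TOY_MAX_HOLES]; /* sorted by offset */
   size_t count;
   bool merge; /* false: never coalesce; true: coalesce on free */
};

static void
toy_heap_init(struct toy_heap *h, uint32_t size, bool merge)
{
   h->holes[0] = (struct toy_hole){ .offset = 0, .size = size };
   h->count = 1;
   h->merge = merge;
}

static void
toy_remove_hole(struct toy_heap *h, size_t i)
{
   for (size_t j = i + 1; j < h->count; j++)
      h->holes[j - 1] = h->holes[j];
   h->count--;
}

/* First-fit allocation; returns UINT32_MAX when no single hole fits. */
static uint32_t
toy_heap_alloc(struct toy_heap *h, uint32_t size)
{
   for (size_t i = 0; i < h->count; i++) {
      if (h->holes[i].size < size)
         continue;
      uint32_t offset = h->holes[i].offset;
      h->holes[i].offset += size;
      h->holes[i].size -= size;
      if (h->holes[i].size == 0)
         toy_remove_hole(h, i);
      return offset;
   }
   return UINT32_MAX;
}

static void
toy_heap_free(struct toy_heap *h, uint32_t offset, uint32_t size)
{
   /* Index of the first hole that starts at or after the freed range. */
   size_t i = 0;
   while (i < h->count && h->holes[i].offset < offset)
      i++;

   if (h->merge) {
      bool prev = i > 0 &&
                  h->holes[i - 1].offset + h->holes[i - 1].size == offset;
      bool next = i < h->count && offset + size == h->holes[i].offset;
      if (prev) {
         h->holes[i - 1].size += size;
         if (next) {
            /* The freed range bridges two holes: fold them into one. */
            h->holes[i - 1].size += h->holes[i].size;
            toy_remove_hole(h, i);
         }
         return;
      }
      if (next) {
         h->holes[i].offset = offset;
         h->holes[i].size += size;
         return;
      }
   }

   /* No merge possible (or merging disabled): insert a new hole. */
   assert(h->count < TOY_MAX_HOLES);
   for (size_t j = h->count; j > i; j--)
      h->holes[j] = h->holes[j - 1];
   h->holes[i] = (struct toy_hole){ .offset = offset, .size = size };
   h->count++;
}

/* Fill a 64-byte pool with four 16-byte blocks, free the first two,
 * then ask for one 32-byte block. */
static uint32_t
toy_demo(bool merge)
{
   struct toy_heap h;
   toy_heap_init(&h, 64, merge);
   uint32_t a = toy_heap_alloc(&h, 16);
   uint32_t b = toy_heap_alloc(&h, 16);
   (void)toy_heap_alloc(&h, 16);
   (void)toy_heap_alloc(&h, 16);
   toy_heap_free(&h, a, 16);
   toy_heap_free(&h, b, 16);
   return toy_heap_alloc(&h, 32);
}
```

In this toy, `toy_demo(false)` fails with `UINT32_MAX` even though 32 bytes are free, because the space is split into two 16-byte holes; `toy_demo(true)` succeeds at offset 0 because the two holes were merged on free.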
Disadvantages of util_vma_heap compared to the free list:
- The heap calls malloc() when a new hole is created.
- The heap calls free() when a hole disappears or is merged with an
adjacent hole.
- The Vulkan spec expects descriptor set creation/destruction to be
thread-local lockless in the common case. For workloads that
create/destroy with high frequency, malloc/free may cause overhead.
Profiling is needed.
Tested with a ChromeOS internal TensorFlow benchmark, provided by
package 'tensorflow', running with its OpenCL backend on clvk.
cmdline: benchmark_model --graph=mn2.tflite --use_gpu=true --min_secs=60
gpu: adl
memory footprint from start of benchmark:
before: init=132.691MB max=227.684MB
after: init=134.988MB max=134.988MB
Reported-by: Romaric Jodin <rjodin@google.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20289>
| Name | | |
|---|---|---|
| grl | ||
| layers | ||
| tests | ||
| anv_allocator.c | ||
| anv_android.c | ||
| anv_android.h | ||
| anv_android_stubs.c | ||
| anv_batch_chain.c | ||
| anv_blorp.c | ||
| anv_bo_sync.c | ||
| anv_cmd_buffer.c | ||
| anv_descriptor_set.c | ||
| anv_device.c | ||
| anv_formats.c | ||
| anv_gem.c | ||
| anv_gem_stubs.c | ||
| anv_genX.h | ||
| anv_image.c | ||
| anv_measure.c | ||
| anv_measure.h | ||
| anv_nir.h | ||
| anv_nir_apply_pipeline_layout.c | ||
| anv_nir_compute_push_layout.c | ||
| anv_nir_lower_multiview.c | ||
| anv_nir_lower_ubo_loads.c | ||
| anv_nir_lower_ycbcr_textures.c | ||
| anv_nir_push_descriptor_analysis.c | ||
| anv_perf.c | ||
| anv_pipeline.c | ||
| anv_pipeline_cache.c | ||
| anv_private.h | ||
| anv_queue.c | ||
| anv_util.c | ||
| anv_utrace.c | ||
| anv_wsi.c | ||
| genX_acceleration_structure.c | ||
| genX_blorp_exec.c | ||
| genX_cmd_buffer.c | ||
| genX_cmd_draw_helpers.h | ||
| genX_gpu_memcpy.c | ||
| genX_pipeline.c | ||
| genX_query.c | ||
| genX_state.c | ||
| gfx8_cmd_buffer.c | ||
| meson.build | ||
| TODO | ||