Payload size retrieval sums the 16 packed 6-bit sizes, which can
benefit greatly from SIMD. This commit adds an optimized version
using Arm A64 NEON intrinsics. Measured on a Rock 5B, it is about
2 times faster than the original.
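
For reference, the summation can be sketched in portable scalar C as
shown here. This is only an illustration, not the code from this
commit and not the NEON version: the helper name
sum_packed_6bit_sizes is hypothetical, the 16 fields are assumed to
be packed LSB-first into 12 contiguous bytes, and any special-case
size encodings are ignored.

```c
#include <stdint.h>

/* Hypothetical helper: sum 16 consecutive 6-bit fields.
 * Assumption: fields are packed LSB-first into 12 bytes (16 * 6 = 96 bits). */
static uint32_t
sum_packed_6bit_sizes(const uint8_t packed[12])
{
   uint32_t sum = 0;

   for (unsigned i = 0; i < 16; i++) {
      unsigned bit = i * 6;      /* bit offset of field i */
      unsigned byte = bit / 8;   /* byte holding the field's lowest bit */
      unsigned shift = bit % 8;  /* position of that bit within the byte */

      /* Load up to two bytes so a field straddling a byte boundary
       * is fully covered, without reading past the 12-byte array. */
      uint32_t window = packed[byte];
      if (byte + 1 < 12)
         window |= (uint32_t)packed[byte + 1] << 8;

      sum += (window >> shift) & 0x3f;
   }

   return sum;
}
```

A SIMD version replaces the 16 dependent extract-and-add steps with a
handful of vector operations, which is where a speedup would come from.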

Signed-off-by: Loïc Molinari <loic.molinari@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Acked-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35001>