The programming model closely matches that of NVIDIA's NVDLA.
Enough is implemented to run SSDLite MobileDet with roughly the same
performance as the blob (when running on a single NPU core).
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29698>