calc_blockdep always returned MAX_BLOCKDEP without checking if the
previous op writes to a buffer the current op reads from. This let
the NPU start reading before the previous write was done.
Add overlap check between previous OFM and current IFM so we set
blockdep to 0 when they share the same buffer.
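The overlap check can be sketched roughly as follows. This is only an illustration, not the driver code: struct buf_range, ranges_overlap(), sketch_calc_blockdep() and the MAX_BLOCKDEP value are all assumed names and values, and the sketch compares byte ranges within one region rather than whatever buffer identity the driver actually uses.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of where a feature map lives in memory. */
struct buf_range {
   unsigned region; /* memory region / buffer index */
   uint64_t start;  /* first byte */
   uint64_t size;   /* length in bytes */
};

#define MAX_BLOCKDEP 3 /* assumed value, for illustration only */

/* True when both ranges are in the same region and share at least one byte. */
static bool
ranges_overlap(const struct buf_range *a, const struct buf_range *b)
{
   if (a->region != b->region)
      return false;
   return a->start < b->start + b->size && b->start < a->start + a->size;
}

/* Returns 0 when the current IFM reads what the previous op wrote,
 * otherwise allows the maximum overlap between the two ops. */
static unsigned
sketch_calc_blockdep(const struct buf_range *prev_ofm,
                     const struct buf_range *cur_ifm)
{
   if (prev_ofm != NULL && ranges_overlap(prev_ofm, cur_ifm))
      return 0;
   return MAX_BLOCKDEP;
}
```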
Update ethos-imx93-fails.txt to remove the tests that now pass.
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39594>
Replace the two functions simplified_elementwise_add_sub_scale and
eltwise_emit_ofm_scaling with a single advanced_elementwise_add_sub_scale
that follows the ethos-u-vela naming. Remove the large block of
commented out Vela Python code.
No functional change.
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39594>
The upscale field was a bool which happened to work since true maps
to 1 which is NEAREST in the hardware. Change from bool to an enum
ethosu_upscale_mode so the intent is clear and we don't rely on the
bool-to-int mapping.
Also add a check in operation_supported so RESIZE only accepts 2x
upscaling since that's what the NPU can do with IFM_UPSCALE. Other
sizes fall back to CPU.
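A minimal sketch of the two ideas, under stated assumptions: only NEAREST == 1 comes from the text above, the other enumerator values and the sketch_ names are made up for illustration, and the real check in operation_supported may look at different fields.

```c
#include <stdbool.h>

/* Explicit modes instead of a bool. Only NEAREST == 1 is stated above;
 * the NONE and ZEROS values are assumptions. */
enum sketch_upscale_mode {
   SKETCH_UPSCALE_NONE = 0,
   SKETCH_UPSCALE_NEAREST = 1,
   SKETCH_UPSCALE_ZEROS = 2,
};

/* Accept a resize only when both dimensions are exactly doubled;
 * anything else would be left to the CPU fallback. */
static bool
sketch_resize_supported(unsigned in_w, unsigned in_h,
                        unsigned out_w, unsigned out_h)
{
   return out_w == 2 * in_w && out_h == 2 * in_h;
}
```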
Keep the original zero_points from tensors in RESIZE and STRIDED_SLICE
instead of forcing them to 0 since the requantization needs them.
Fixes the RESIZE_NEAREST_NEIGHBOR operations in EfficientDet-Lite
models that use BiFPN with 2x nearest neighbor upsampling.
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39594>
fill_weights subtracted a single zero_point from all weights which
did not handle models with per-channel zero_points. Use the
per-channel zero_point for each output channel when available.
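Roughly, the per-channel subtraction looks like the sketch below. The function name, the [out_channel][element] weight layout and the widening to int16_t are assumptions for illustration, not the actual fill_weights code.

```c
#include <stddef.h>
#include <stdint.h>

/* Subtract the zero point of each output channel from that channel's
 * weights. Falls back to zero_points[0] when only one value exists. */
static void
sketch_subtract_zero_points(const uint8_t *weights, int16_t *out,
                            size_t out_channels, size_t elems_per_channel,
                            const int32_t *zero_points, size_t num_zero_points)
{
   for (size_t oc = 0; oc < out_channels; oc++) {
      int32_t zp = num_zero_points > 1 ? zero_points[oc] : zero_points[0];

      for (size_t i = 0; i < elems_per_channel; i++) {
         size_t idx = oc * elems_per_channel + i;
         out[idx] = (int16_t)((int32_t)weights[idx] - zp);
      }
   }
}
```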
Also decouple the zero_points copy from the scales copy in the lower
pass so they are handled independently.
Suggested-by: Tomeu Vizoso <tomeu@tomeuvizoso.net>
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39594>
For those models with coefficients that have different quantization
parameters for each channel.
The NPU can handle per-channel scales as can be seen in
fill_scale_and_biases(), which already iterates per output channel.
Activation tensors (input/output) don't have per-channel quantization.
- Add scales/zero_points arrays to ethosu_kernel struct
- Copy per-channel scales from weight tensor in lower pass
- Use per-channel scale when computing conv_scale in coefs
- Allow per-channel quantization in operation_supported check
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39594>
The old code would assert when a model has multiple scales but only
one zero_point. This is common for symmetric quantization where all
channels share the same zero_point (typically 0).
Handle this by replicating the single zero_point for all channels
instead of crashing.
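The replication could be sketched like this. sketch_expand_zero_points() is a hypothetical helper, and returning NULL for a count mismatch is a choice made for the example, not necessarily what the driver does.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Build one zero point per channel. A single source value is replicated;
 * a full per-channel array is copied; any other count is rejected. */
static int32_t *
sketch_expand_zero_points(const int32_t *src, size_t num_src,
                          size_t num_channels)
{
   if (num_src != 1 && num_src != num_channels)
      return NULL;

   int32_t *dst = malloc(num_channels * sizeof(*dst));
   if (dst == NULL)
      return NULL;

   for (size_t i = 0; i < num_channels; i++)
      dst[i] = num_src == 1 ? src[0] : src[i];

   return dst;
}
```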
Fixes MoveNet models using per-channel quantization.
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Reviewed-by: Tomeu Vizoso <tomeu@tomeuvizoso.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39594>
This will surely lose performance in some cases, but it is a temporary fix
to align ourselves with how the Vulkan compiler works. We might be able
to use indirect varyings directly in the future, depending on how we
handle their memory layout.
Signed-off-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40391>
When the pipe_resource pointer returned by resource_create is NULL,
importing the handle into the underlying Vulkan driver is known to have
failed, and the handle import shouldn't continue.
Just return NULL in this case so that no further checks of pres being
non-NULL are needed.
This also fixes the renderonly code lacking a check for non-NULL pres,
and the conversion of pipe_resource to zink_resource in the renderonly
codepath is now gone because an already converted zink_resource is
available above.
Signed-off-by: Icenowy Zheng <zhengxingda@iscas.ac.cn>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40490>
This code path is usually used by lavapipe when importing dmabufs, not
for output.
The resulting size_required is then used to calculate the size
requirements for VkMemoryRequirements2 etc. Requiring a multiple of
LP_RASTER_BLOCK_SIZE (4) can eventually result in lavapipe rejecting
dmabuf imports.
An example is YUV420 at a resolution of 1680x1050 produced by GStreamer
1.28, e.g. from a screencast. In this case we currently compute a size
of 3235840, while other drivers like radv compute 3225600. The actual
size is 3227648, which satisfies the latter but not the former.
Removing the alignment brings lavapipe in line with other drivers.
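The two computed sizes can be reproduced with the sketch below. It assumes a three-plane layout with a 2048-byte luma stride and a 1024-byte chroma stride, and that the alignment was applied to the row count of each plane; neither is stated in this message, they are simply one layout that yields both quoted figures.

```c
#include <stdint.h>

/* Round v up to the next multiple of a. */
static uint64_t
sketch_align_up(uint64_t v, uint64_t a)
{
   return (v + a - 1) / a * a;
}

/* Size of a three-plane YUV420 image: one full-height luma plane and two
 * half-height chroma planes at half the luma stride. row_align is the
 * multiple each plane's row count is rounded up to (1 = no alignment). */
static uint64_t
sketch_yuv420_size(uint64_t y_stride, uint64_t height, uint64_t row_align)
{
   uint64_t c_stride = y_stride / 2;
   uint64_t c_height = (height + 1) / 2;

   return y_stride * sketch_align_up(height, row_align) +
          2 * c_stride * sketch_align_up(c_height, row_align);
}
```

With y_stride = 2048 and height = 1050, row_align = 4 gives 3235840 and row_align = 1 gives 3225600, so only the unaligned requirement fits within the 3227648 bytes actually provided.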
Cc: mesa-stable
Signed-off-by: Robert Mader <robert.mader@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40424>
The preprocessor symbol we want is `PAN_ARCH`, not `MALI_ARCH`.
Fixes: a21ee564e2 ("pan/bi: Make texel buffers use Attribute Buffers")
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40459>
We were computing some positions using `void*` rather than pointers to
the appropriate structures. This caused bad pointers, the effect of
which depended on the current memory environment -- tests related to
texel buffers could pass or not depending on what other tests had run
previously.
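A small sketch of this class of bug, using a made-up 32-byte struct sketch_desc rather than the real descriptor types. Standard C has no arithmetic on `void*`; with the GCC extension it advances one byte per step, which the char pointer below mimics.

```c
#include <stddef.h>
#include <stdint.h>

struct sketch_desc {
   uint32_t words[8]; /* hypothetical 32-byte descriptor */
};

/* Byte-wise stepping, as arithmetic on void * behaves under the GCC
 * extension: entry "index" ends up only index bytes past the base. */
static size_t
sketch_offset_bytewise(const struct sketch_desc *table, size_t index)
{
   const char *p = (const char *)table + index;
   return (size_t)(p - (const char *)table);
}

/* Typed stepping: entry "index" is index * sizeof(struct) past the base. */
static size_t
sketch_offset_typed(const struct sketch_desc *table, size_t index)
{
   const struct sketch_desc *p = table + index;
   return (size_t)((const char *)p - (const char *)table);
}
```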
Fixes: a21ee564e2 ("pan/bi: Make texel buffers use Attribute Buffers")
Signed-off-by: Eric R. Smith <eric.smith@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40459>
Allows a uniform name to be passed to force_explicit_uniform_loc_zero
allowing us to set that uniform to an explicit location of zero.
Cc: mesa-stable
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40448>
The runtime builds a final pipeline state with pointers to structures
coming from the associated pipelines libraries.
So far it has considered that the viewMask was part of a structure
together with the rest of the renderpass information. This information
can be specified in pre-raster, fragment & color-output state groups
and was assumed to be consistent for all 3. And the runtime
currently takes the pointer to the structure from the last pipeline
library (color output).
An upcoming spec/CTS update will clarify that the viewMask only needs to be
specified for pre-raster & fragment groups, making the value in the
color-output group untrustworthy.
This change creates a new state structure to hold the viewMask on its
own so it is only gathered from pre-raster & fragment groups.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> (radv)
Reviewed-by: Aitor Camacho <aitor@lunarg.com> (kosmickrisp)
Reviewed-by: Connor Abbott <cwabbott0@gmail.com> (turnip)
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> (v3dv)
Reviewed-by: Frank Binns <frank.binns@imgtec.com> (powervr)
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com> (panvk)
Royaled-yes-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com> (lavapipe)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39940>
There's no reason to adjust the matrix once unconditionally, and then
again conditionally after overwriting the matrix. If we just make the
second adjustment unconditional, we should be good.
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40175>
This is just cleaning up some naming, so we consistently refer to things
dealing with luma as "y" and things dealing with chroma as "c".
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40175>
DRM GEM mmap offsets go through drm_vma_offset_exact_lookup_locked()
which requires an exact match on the GEM offset. Passing a non-zero
bo_offset causes EINVAL because the kernel can't find the BO at the
shifted offset. Every caller already passes bo_offset=0 and maps the
full BO size, so drop those parameters and use bo->size directly.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40315>
If we are growing a CL, double the BO allocation size to reduce the
number of allocations in large command buffers. This showed significant
performance improvements in workloads with large CLs (e.g. WebGL
Aquarium with 5000 fishes). Also, this is the same strategy that v3dv
uses.
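To illustrate why doubling helps, the sketch below counts how many BOs are needed to hold a given number of CL bytes. It assumes each growth step adds a new BO, and sketch_count_bo_allocs() and the sizes used are hypothetical, not taken from the driver.

```c
#include <stdbool.h>
#include <stddef.h>

/* Count the allocations needed to hold "total" bytes when every new BO
 * is either the same size as the first one or twice the previous one.
 * first_size must be non-zero. */
static unsigned
sketch_count_bo_allocs(size_t total, size_t first_size, bool doubling)
{
   unsigned allocs = 0;
   size_t size = first_size;
   size_t capacity = 0;

   while (capacity < total) {
      capacity += size;
      allocs++;
      if (doubling)
         size *= 2;
   }

   return allocs;
}
```

For a 1 MiB CL starting from a 4 KiB BO, fixed-size growth needs 256 allocations while doubling needs 9.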
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40425>
Android requires VK_ANDROID_native_buffer v8+ to advertise KHR_swapchain
v69+ for supporting wsi image alias. On newer Android, the same is
required to advertise EXT_swapchain_maintenance1 to avoid a memory hit
upon swapchain creation/recreation. This change completes the support.
Reviewed-by: Lucas Fryzek <lfryzek@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40384>
Vertex shaders shorter than four instructions can hard-lock R3xx GPUs.
This seems to happen in combination with a small vertex count. This was
seen before, most notably with dummy shaders, but the earlier fix only
removed those dummy shaders, so some occurrences could still slip
through the cracks. Pad all vertex shaders to four instructions on R3xx.
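The padding step might look roughly like the sketch below. The function name, the flat uint32_t instruction array and the SKETCH_NOP encoding are placeholders, and padding with no-ops specifically is an assumption; the message only says shaders are padded to four instructions.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MIN_VS_INSTS 4
#define SKETCH_NOP 0u /* placeholder encoding, not the real R3xx NOP */

/* Append filler instructions until the shader has at least four, without
 * writing past "capacity" entries. Returns the new instruction count. */
static size_t
sketch_pad_vertex_shader(uint32_t *insts, size_t count, size_t capacity)
{
   while (count < SKETCH_MIN_VS_INSTS && count < capacity)
      insts[count++] = SKETCH_NOP;

   return count;
}
```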
Reviewed-by: Filip Gawin <filip@gawin.net>
Fixes: c6aa639ba9 ("r300: skip draws instead of using a dummy vertex shader")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/337
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40331>
When a Wayland compositor is run with kmsro, the render driver node
will be passed to the client instead of the display node.
Now that Zink can be used with KMSRO, we can no longer expect the
render driver node to have a known driver name.
Fall back to Zink instead of KMSRO for render nodes with unknown driver
names, because KMSRO on a non-KMS FD is meaningless.
Signed-off-by: Icenowy Zheng <uwu@icenowy.me>
Reviewed-By: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38810>