A reshape operation just changes the dimensions of a tensor, but doesn't
change the data at all. So we just point the OFM to the IFM data and
we're done.
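A minimal sketch of the idea, using a hypothetical `struct tensor` and
`lower_reshape()` rather than the driver's real types: the output tensor
keeps its own shape but aliases the input's storage, so nothing is copied.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tensor descriptor, for illustration only. */
struct tensor {
   unsigned shape[4];   /* N, H, W, C */
   uint8_t *data;       /* backing storage, not owned */
};

/* Number of elements described by the shape. */
static size_t
tensor_elements(const struct tensor *t)
{
   return (size_t)t->shape[0] * t->shape[1] * t->shape[2] * t->shape[3];
}

/* Lower a reshape: the OFM keeps its own shape but aliases the IFM data. */
static void
lower_reshape(const struct tensor *ifm, struct tensor *ofm)
{
   /* A reshape never changes the element count. */
   assert(tensor_elements(ifm) == tensor_elements(ofm));
   ofm->data = ifm->data;
}
```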
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39975>
The ethosu_lower_add() function can handle other element-wise operations
such as multiply, minimum, and maximum, so rename it in preparation to
add those operations.
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39975>
Add support for fully-connected convolution. FC convolution lowering is
nearly the same, so refactor the existing convolution code to support
both.
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39975>
For axis 1 concatenation, the OFM strides need to match the IFM strides.
Presumably axis -3 can also be supported, but there haven't been any
models with -3. Not sure what axis 2 would need either.
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39975>
Some of the tensor info is needed at various points during lowering.
Instead of storing the tensor index and looking it up every time, store
a pointer to the tensor struct.
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39975>
struct ethosu_operation has the same initialization in
multiple ops. More ops with the same duplication are about to be added.
Move this out to a common initializer function.
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39975>
The vela compiler defines shift as signed and some upcoming LUT code
allows for negative shifts, so make shift signed everywhere.
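As a rough illustration of why the type matters, here is a sketch with a
hypothetical `apply_shift()` helper. The name and the rounding behaviour
are assumptions, not code from the driver or from vela. A signed shift
lets a single field express scaling in both directions:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: apply a signed shift to a scaled value.
 * A positive shift scales down with round-to-nearest, a negative
 * shift scales up. An unsigned shift field could not represent
 * the negative case. */
static int64_t
apply_shift(int64_t value, int shift)
{
   if (shift > 0)
      return (value + ((int64_t)1 << (shift - 1))) >> shift;
   if (shift < 0)
      return value * ((int64_t)1 << -shift);
   return value;
}
```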
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39975>
Because of a previous refactor, pco_last_igrp was incorrectly changed to return
the first entry in a linked list instead of the last. Update pco_last_igrp to
return the last entry in a linked list.
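The difference is easy to see on a circular doubly linked list with a
sentinel head. The sketch below uses a simplified stand-in for such a
list and a hypothetical `struct igrp`; it is not the pco or util/list
code itself. The first entry follows the head, the last entry precedes it.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for a circular doubly linked list. */
struct list_head {
   struct list_head *prev, *next;
};

/* Hypothetical list element. */
struct igrp {
   struct list_head link;   /* must stay the first member for the casts below */
   int id;
};

static void
list_init(struct list_head *head)
{
   head->prev = head->next = head;
}

static void
list_add_tail(struct list_head *item, struct list_head *head)
{
   item->prev = head->prev;
   item->next = head;
   head->prev->next = item;
   head->prev = item;
}

/* First entry: the node after the sentinel head. */
static struct igrp *
first_igrp(struct list_head *head)
{
   return head->next == head ? NULL : (struct igrp *)head->next;
}

/* Last entry: the node before the sentinel head. */
static struct igrp *
last_igrp(struct list_head *head)
{
   return head->prev == head ? NULL : (struct igrp *)head->prev;
}
```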
The following CTS tests now pass:
dEQP-GLES3.functional.shaders.switch.conditional_fall_through_2_dynamic_fragment
dEQP-GLES3.functional.shaders.switch.conditional_fall_through_dynamic_fragment
dEQP-GLES3.functional.shaders.switch.conditional_fall_through_uniform_fragment
Fixes: 719ece42c0 ("pco: Switch back to util/list")
Signed-off-by: Duncan Brawley <duncan.brawley@imgtec.com>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41125>
The Vulkan runtime and panvk already handle unused attachments
correctly. Enable the extension and feature flags.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40920>
Just like the fix for nvk, just drop this in the GL driver as well.
Cc: mesa-stable
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41143>
The BTD unit keeps accumulating threads and dispatches the active
threads once the dispatch timeout counter is reached.
Dispatching too early presumably leaves the BTD unit short of full
occupancy, so pick half of the maximum counter value instead.
This patch also adds a drirc option for dispatch_timeout_counter and
adjusts the values internally to respect HW limits. The default is
currently 512 clocks and can be tuned per app.
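A minimal sketch of the value handling described above. Both function
names are hypothetical, and the hardware limits are passed in as
parameters because the real ones are not stated here:

```c
#include <assert.h>

/* Hypothetical helper: keep a user-requested timeout (in clocks)
 * within the range the hardware accepts. */
static unsigned
clamp_dispatch_timeout(unsigned requested, unsigned hw_min, unsigned hw_max)
{
   if (requested < hw_min)
      return hw_min;
   if (requested > hw_max)
      return hw_max;
   return requested;
}

/* Hypothetical default: half of the maximum counter value. */
static unsigned
default_dispatch_timeout(unsigned hw_max)
{
   return hw_max / 2;
}
```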
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40733>
Since the field is split across multiple fields, we have to write the
values manually; refer to Bspec 43851 for the exact values.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40733>
For future patches in Anv, we will be expanding the driconf enum to
have 19 entries, so extend the array from 5 to 20 to account for that.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40733>
The new tool has much better image diffing presentation (thanks to
Danilo's work on turnip's private trace CI), better performance, flake
checking within a single run, parallelized downloads along with replays,
system monitoring for replay debug (OOMs especially), and DXVK support
(I've added a few traces, but not most of the collection because I didn't
want to block on stabilizing this job with everything).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41115>
When crosvm crashes, the `exit_code` file might not exist or might
contain unexpected garbage (multi-line output or spaces).
Because $CROSVM_RET was unquoted in comparisons, this led to intermittent
"too many arguments" bash syntax errors, which masked the true failure.
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41124>
Based on 075d78115e ("panvk: implement deferred image creation"),
8aa2f1a94f ("panvk: add panvk_android_get_wsi_memory for AHB spec v8+"),
and 66bbd9eec8 ("panvk: implement AHB image deferred init and memory alloc").
Defer image initialization for both ANB alias images (gralloc v8+)
and AHB-backed images using vk_android_init_deferred_image() to
deep-copy the VkImageCreateInfo at vkCreateImage time.
For ANB alias images, tu_image_init() and tu_image_update_layout()
run at vkBindImageMemory2 time via tu_android_get_wsi_memory() when
the native buffer arrives.
For AHB images, tu_image_init() and tu_image_update_layout() run at
vkAllocateMemory time when the AHardwareBuffer handle is available
via dedicated allocation.
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40635>
Based on 752ea7f6df ("panvk: resolve ANB (pre spec v8)").
Replace the hardcoded memoryTypeIndex 0 and lseek-based allocationSize
with proper GetImageMemoryRequirements and GetMemoryFdPropertiesKHR
queries.
Also fix a missing error check on os_dupfd_cloexec().
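The kind of check involved can be sketched as follows. `dup_cloexec()`
is a stand-in built on plain fcntl(), and `import_fd()` is a hypothetical
caller; neither is the turnip code. The point is only that a failed
duplicate must be reported instead of being used as a valid fd.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Stand-in for a dup-with-CLOEXEC helper: returns a new fd, or -1 on failure. */
static int
dup_cloexec(int fd)
{
   return fcntl(fd, F_DUPFD_CLOEXEC, 0);
}

/* Hypothetical caller: duplicate the fd and fail cleanly if that is not possible. */
static int
import_fd(int fd, int *out_fd)
{
   int dup_fd = dup_cloexec(fd);
   if (dup_fd < 0)
      return -1;   /* propagate the failure rather than using an invalid fd */

   *out_fd = dup_fd;
   return 0;
}
```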
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40635>
Add a new helper based on panvk's implementation added in
075d78115e ("panvk: implement deferred image creation").
vk_android_init_deferred_image deep-copies and sanitizes a
VkImageCreateInfo chain for deferred Android native buffer (ANB)
alias image creation.
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40635>
Fixes assertion failure when MESA_SPIRV_DUMP_PATH is set for OpenCL
programs.
Signed-off-by: Eric Guo <eric.guo@nxp.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41097>
reduces hello world kernel 57 -> 44 inst on jay. why do we have two opcodes that
do literally the same thing? :/
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41085>
On Xe, we have this bit reversed. It's called Thread preemption Disable.
On Xe2+ (Bspec 56590), it's called Thread preemption with option
enabled/disabled.
AFAIK, we don't support mid-thread preemption. This patch sets the values
properly according to the Bspec.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41120>
Four linked D3D12 pipeline-validation problems with GLSL TCS on DXIL:
1) dxil_nir_kill_unused_outputs killed TCS outputs read back by the
patch-constant function after a barrier, zeroing the tess factors.
Keep shader_out locations with any intra-shader load_deref live
regardless of next_stage_read_mask.
2) is_dead_in_variable dropped TES padding placeholders (no local
uses) in nir_remove_dead_variables. Also honor
prev_stage_written_mask so padded TES inputs stay alive.
3) Preserving (1) leaves HS with outputs the DS doesn't declare,
breaking pipeline validation (e.g. piglit's barrier.shader_test).
Add dxil_nir_pad_tes_input_signature, called from both link paths,
to synthesize matching TES inputs (reusing each TCS output's type
so sig shape and stride match byte-for-byte) plus the tess-level
inputs -- subsuming the tess-level-only block previously in
dxil_spirv_nir_link. Scope the per-variable padding to TCS
outputs that TCS itself reads back via load_deref: outputs that
neither TES nor TCS consumes get killed from the HS signature,
so padding them into DS would make the DS input signature longer
than HS output and break validation for SSO pipelines whose TCS
declares unused per-patch writes (arb_separate_shader_objects/
mix-and-match-tcs-tes).
4) remove_hs_intrinsics rewrote load_output but not
load_per_vertex_output in HS main. With (1) keeping outputs alive,
GLSL reads of outputs in main whose result survives DCE (UAV
atomics, non-tess per-vertex output writes) left
LoadOutputControlPoint in the control-point function, which dxil.dll
rejects outside the PCF (CreatePipelineState then fails with
E_INVALIDARG). Treat load_per_vertex_output like load_output.
Validated on piglit arb_tessellation_shader/execution (WARP + DXC
1.8.2403): barrier now passes; the previously-crashing
tcs-output-unmatched and variable-indexing/tcs-output-array-* fail
gracefully matching baseline; isoline/isoline-no-tcs remain flakes
(pre-existing canary corruption, unrelated).
d3d12-quick_shader.txt drops barrier; d3d12-flakes.txt adds
isoline-no-tcs alongside isoline.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41028>
get_tessellator_output_primitive used to unconditionally invert CW<->CCW
on the assumption the input was GL-origin (lower-left). That was wrong
for any upper-left caller — including spirv_to_dxil, whose SPIR-V sources
(DXC, glslang) already align with D3D winding.
Make nir_to_dxil copy info.tess.ccw through and expect upper-left. The
d3d12 gallium driver (GL) flips before the conversion to preserve its
output. spirv_to_dxil and dozen (Vulkan, UPPER_LEFT default) are unchanged.
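The split of responsibilities can be sketched like this, for the
triangle case only. The enum and both function names are hypothetical,
not the real nir_to_dxil or d3d12 interfaces: the converter passes the
flag through unchanged, and only the lower-left caller flips it first.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical output primitive enum, triangles only. */
enum tess_output_prim { TRIANGLE_CW, TRIANGLE_CCW };

/* Converter: takes the flag as-is, assuming an upper-left origin. */
static enum tess_output_prim
tess_output_primitive(bool ccw)
{
   return ccw ? TRIANGLE_CCW : TRIANGLE_CW;
}

/* Lower-left (GL-style) caller: flip before handing the flag over. */
static enum tess_output_prim
gl_tess_output_primitive(bool ccw)
{
   return tess_output_primitive(!ccw);
}
```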
Assisted-by: Claude Opus 4.7 <noreply@anthropic.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41028>