For an atomic with a divergent address, generate a CFG that groups lanes
with the same address value together and emits a single atomic with fused
data covering the subgroup. Lanes with other address values perform a
default atomic.
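As a rough illustration of the grouping, here is a minimal CPU-side sketch
in C, not the compiler code itself. It assumes the atomic is an add (so
fusing the data means summing it), that addresses are indices into a small
array, and that the group is formed around the first active lane's address.
fused_atomic_add and SUBGROUP_SIZE are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SUBGROUP_SIZE 8 /* hypothetical subgroup width */

/* Single-threaded model of a subgroup performing atomic adds on `mem`.
 * Lanes whose address matches the first active lane's address are fused:
 * their data is summed and applied with one atomic. Every other active
 * lane performs its own atomic. Returns the number of atomics issued. */
unsigned
fused_atomic_add(uint32_t *mem, const uint32_t addr[SUBGROUP_SIZE],
                 const uint32_t data[SUBGROUP_SIZE], uint32_t active_mask)
{
   assert(mem != NULL);

   active_mask &= (1u << SUBGROUP_SIZE) - 1;
   if (active_mask == 0)
      return 0;

   /* Find the first active lane; its address defines the fused group. */
   unsigned first = 0;
   while (!(active_mask & (1u << first)))
      first++;

   unsigned atomics = 0;
   uint32_t fused = 0;
   for (unsigned lane = 0; lane < SUBGROUP_SIZE; lane++) {
      if (!(active_mask & (1u << lane)))
         continue;
      if (addr[lane] == addr[first]) {
         fused += data[lane];            /* accumulate into the fused value */
      } else {
         mem[addr[lane]] += data[lane];  /* default per-lane atomic */
         atomics++;
      }
   }

   mem[addr[first]] += fused;            /* one atomic for the whole group */
   return atomics + 1;
}
```

With six of eight lanes sharing one address, this model issues three
atomics instead of eight.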
Co-authored-by: Jhanani Thiagarajan <jhanani.thiagarajan@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40631>
The new version updates the default Mesa version to 26.1.0-devel.
This is used for booting the VM, after which point the drivers are
replaced by the ones built in the Mesa CI pipeline.
Fixes GPU faults with ANGLE on Turnip.
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40010>
Enable the virtio freedreno kernel mode driver in the debian-android
build. This will be used by Cuttlefish virtual machines.
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40010>
Setting the VK_DRIVER variable for lavapipe jobs simplifies the driver
replacement logic while keeping all existing paths working.
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40010>
Cuttlefish usually boots within 2-3 minutes, and this ensures logs are
saved if the boot process hangs or fails.
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40010>
Huge thanks to Laura and Doug for updating the EC and AP firmware, and
for switching the network adapter across all trogdor Chromebooks,
enabling them to boot Cuttlefish.
Also limit the concurrency to 6 for now.
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40010>
nir_builder_alu_instr_finish_and_insert initializes the def's bit_size
and num_components, so we should set them afterwards.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Fixes: c66967b5cb ("nir: add nir_opt_varyings, new pass optimizing and compacting varyings")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40651>
Thanks to Konstantin for pointing out that we really don't need atomics
here. We can use the IR offset to get the slot and keep stuffing the
instance address in it. The header already writes the instance count for us.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40734>
Now that all callers of ethosu_allocate_feature_map() are in ethosu_lower.c,
move it there too.
Reviewed-by: Tomeu Vizoso <tomeu@tomeuvizoso.net>
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40719>
The IFM and OFM were already allocated by the call to allocate_feature_maps()
in ethosu_lower_convolution().
Reviewed-by: Tomeu Vizoso <tomeu@tomeuvizoso.net>
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40719>
The U85 uses average mode for kernel sizes less than or equal to 8x8 and
sum mode for larger (in either dimension) kernel sizes. According to the
U85 TRM, the average and sum modes have the following constraints:
average - Average pooling up to 8x8, inbuilt scale only
sum - Sum or average pooling, per-channel, or global scale
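The kernel-size rule above can be sketched as follows. This is only an
illustration of the selection described in this message; the enum and
function names are hypothetical and not taken from the driver.

```c
#include <assert.h>

/* Hypothetical names for the two U85 pooling modes described above. */
enum pool_mode {
   POOL_MODE_AVERAGE, /* average pooling up to 8x8, inbuilt scale only */
   POOL_MODE_SUM,     /* sum or average pooling, per-channel or global scale */
};

/* Pick average mode when both kernel dimensions are at most 8,
 * otherwise fall back to sum mode. */
enum pool_mode
select_avg_pool_mode(unsigned kernel_w, unsigned kernel_h)
{
   assert(kernel_w > 0 && kernel_h > 0);

   if (kernel_w <= 8 && kernel_h <= 8)
      return POOL_MODE_AVERAGE;

   return POOL_MODE_SUM;
}
```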
Reviewed-by: Tomeu Vizoso <tomeu@tomeuvizoso.net>
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40719>
... as this can lead to a deadlock with the following sequence:
Time1: guest-thread-1: vkDestroyImageView() called
Time2: VkEncoder grabs seqno 1
Time3: guest-thread-2: vkQueueSubmit() called
Time4: ResourceTracker::on_vkQueueSubmitTemplate() locks
mLock for using `info_VkFence`
Time5: ResourceTracker::on_vkQueueSubmitTemplate() calls
enc->vkQueueWaitIdle()
Time6: VkEncoder grabs seqno 2
Time7: VkEncoder sends the vkQueueWaitIdle with seqno
2 via ASG to host
Time8: VkEncoder waits for the `VkResult` from the
host via `stream->read()`
Time9: guest-thread-1: VkEncoder calls sResourceTracker->destroyMapping()
->mapHandles_VkImageView((VkBuffer*)&buffer);
which calls
ResourceTracker::unregister_VkImageView()
ResourceTracker::unregister_VkImageView() tries to
lock mLock to erase the info struct
!!! DEADLOCKED HERE !!!
guest-thread-1 is stuck waiting on mLock (currently locked by
guest-thread-2) before it would `stream->flush();` to finish
sending the vkDestroyImageView() command to the host and potentially
ping its corresponding host-render-thread-1.
guest-thread-2 is stuck waiting on the result from host-render-thread-2
but host-render-thread-2 won't progress until host-render-thread-1
finishes seqno 1 which needs guest-thread-1 to finish sending/pinging.
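A generic way to break this kind of cycle is to copy what is needed while
holding the lock and drop the lock before making the blocking host call.
The following minimal C sketch shows that pattern only; it is not taken
from the gfxstream sources, and every name in it (tracker_lock standing in
for mLock, fence_state, blocking_host_call) is hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Stand-in for mLock and for the fence info it protects. */
pthread_mutex_t tracker_lock = PTHREAD_MUTEX_INITIALIZER;
int fence_state = 42;

/* Records whether the lock was free while the "host call" was running. */
bool lock_was_free_during_wait = false;

/* Stand-in for a blocking round trip to the host. It only probes the
 * lock so the sketch can show that nobody holds it at this point. */
static void
blocking_host_call(int fence_snapshot)
{
   (void)fence_snapshot;

   int ret = pthread_mutex_trylock(&tracker_lock);
   lock_was_free_during_wait = (ret == 0);
   if (ret == 0)
      pthread_mutex_unlock(&tracker_lock);
}

/* Snapshot the protected state under the lock, release the lock, and
 * only then block on the host. Other threads can take the lock while
 * this one waits. */
void
submit_without_holding_lock(void)
{
   assert(fence_state != 0);

   pthread_mutex_lock(&tracker_lock);
   int snapshot = fence_state;
   pthread_mutex_unlock(&tracker_lock);

   blocking_host_call(snapshot);
}
```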
Android equivalent change ag/39258728 for b/498964194
Test: cvd create --gpu_mode=gfxstream_guest_angle_host_swiftshader
open maps
pan/zoom/etc for a couple minutes
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40767>
Previously, we only checked if the hardware duration was greater
than the requested sample period by 1000 ns. This can cause the
hardware duration to be rejected in favor of the next cycle, which
is double the current duration.
At larger requested sample periods, this can mean getting a hardware
duration of 1.7 ms for a requested sample period of 1 ms.
To fix this, we'll scale the check so that it uses 67% of the
requested sample period as the reject threshold. This way, if the
hardware duration is below 67%, it's guaranteed to be within
100%-133% of the requested sample period on the next hardware interval.
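The scaled check can be sketched as below. This is a minimal model, not
the driver code: it assumes the available hardware durations form a
doubling sequence starting at base_ns and that candidates are tried in
increasing order. pick_hw_duration_ns and its parameters are hypothetical
names.

```c
#include <assert.h>
#include <stdint.h>

/* Return the first duration in the sequence base_ns, 2 * base_ns,
 * 4 * base_ns, ... that is at least 67% of requested_ns. Shorter
 * durations are rejected in favor of the next (doubled) one. */
uint64_t
pick_hw_duration_ns(uint64_t base_ns, uint64_t requested_ns)
{
   assert(base_ns > 0);
   assert(requested_ns <= UINT64_MAX / 100); /* keeps the math overflow-free */

   uint64_t threshold_ns = requested_ns * 67 / 100;
   uint64_t duration_ns = base_ns;

   while (duration_ns < threshold_ns)
      duration_ns *= 2;

   return duration_ns;
}
```

In this model, a requested period of 1 ms with candidate durations of
425 us, 850 us and 1.7 ms selects 850 us, because 850 us is above the
670 us threshold. A candidate of 600 us would be rejected and the doubled
1.2 ms used instead.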
Signed-off-by: Casey Bowman <casey.g.bowman@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40735>
Instead of generating special single source send in some cases, always
use the split send (called SENDS pre-Xe, and the only option in Xe).
Having a code path for single source was relevant for old Gfx versions,
but for Gfx9+ split send is always available.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40755>
- We won't be able to rely on u_trace_fini leaving u_trace in a
valid state, so u_trace_init should be called after it.
- There probably was a double-free of u_trace_submission_data.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40728>
The commit that introduced 9_9_9_E5 RB support mistakenly broke
fake-format blits (such as compressed formats, etc). Re-order the
logic to restore fake-format blits.
Fixes iova fault in manhattan. Not to mention inadvertently falling
off of the A2D path for a lot of blits.
Fixes: 9dc3410512 ("tu: Add support for VK_FORMAT_E5B9G9R9_UFLOAT_PACK32 color attachments")
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40754>
The external move region frame number was generated continuously, but
the current POC was reset based on IDR. Modify the validation logic and
log a warning in case of a mismatch.
Reviewed-by: Pohsiang (John) Hsu <pohhsu@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40756>
st_cb_bitmap appends a temporary bitmap sampler view to the sampler
view array passed to set_sampler_views().
1a5c660ef5 changed this path to only release the extra YUV views
returned by st_get_sampler_views(), but the temporary bitmap view is
created locally and is not part of extra_sampler_views. It therefore
stopped being released. Release the temporary bitmap sampler view
explicitly after drawing the bitmap quad.
Fixes: 1a5c660ef5 ("st/bitmap: only release YUV samplerviews")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40694>
Doesn't really help many shaders, but I've seen a couple that turn from
MUFU into F2F(MUFU.F16(F2F)). Though this might as well be a limitation
of related code, e.g. returning F32 from TEX instead of using TEX.F16.
Totals:
CodeSize: 8662337424 -> 8662336960 (-0.00%)
Static cycle count: 4718044491 -> 4718044554 (+0.00%); split: -0.00%, +0.00%
Totals from 7 (0.00% of 1163204) affected shaders:
CodeSize: 236480 -> 236016 (-0.20%)
Static cycle count: 2108061 -> 2108124 (+0.00%); split: -0.01%, +0.01%
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40392>
Instructions that take a F16 value can generally select which component to
read from. This lets us get rid of some PRMTs.
This also cleans up the partial support for it in F2F and streamlines
everything into a uniform model, as previously it wasn't wired up
generally and copy prop didn't always propagate the swizzle through.
This also makes it unnecessary to apply a Xx swizzle to scalar FP16
sources.
Totals from 907 (0.08% of 1163204) affected shaders:
CodeSize: 40856816 -> 40843408 (-0.03%); split: -0.03%, +0.00%
Static cycle count: 20898101 -> 20895619 (-0.01%); split: -0.01%, +0.00%
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40392>