Thanks to Konstantin for pointing out that we really don't need atomics
here. We can use the IR offset to get the slot and keep storing the
instance address in it. The header already writes the instance count for us.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40734>
Now that all callers of ethosu_allocate_feature_map() are in ethosu_lower.c,
move it there too.
Reviewed-by: Tomeu Vizoso <tomeu@tomeuvizoso.net>
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40719>
The IFM and OFM were already allocated by the call to allocate_feature_maps()
in ethosu_lower_convolution().
Reviewed-by: Tomeu Vizoso <tomeu@tomeuvizoso.net>
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40719>
The U85 uses average mode for kernel sizes less than or equal to 8x8 and
sum mode for larger (in either dimension) kernel sizes. According to the
U85 TRM, the average and sum modes have the following constraints:
average - Average pooling up to 8x8, inbuilt scale only
sum - Sum or average pooling, per-channel, or global scale
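A minimal sketch of the kernel-size rule described above. The enum and helper names are hypothetical stand-ins, not the ethosu driver's actual code:

```c
/* Hypothetical pooling modes mirroring the two U85 modes quoted above. */
enum sketch_pool_mode {
   SKETCH_POOL_AVERAGE, /* average pooling up to 8x8, inbuilt scale only */
   SKETCH_POOL_SUM,     /* sum or average pooling, per-channel or global scale */
};

/* Pick the U85 pooling mode for an average-pool kernel of w x h. */
static enum sketch_pool_mode
sketch_u85_avgpool_mode(unsigned kernel_w, unsigned kernel_h)
{
   /* Average mode only covers kernels up to 8x8; a kernel larger than
    * 8 in either dimension has to use sum mode instead. */
   if (kernel_w <= 8 && kernel_h <= 8)
      return SKETCH_POOL_AVERAGE;
   return SKETCH_POOL_SUM;
}
```

Checking each dimension separately matters: a 9x1 kernel has fewer than 64 taps but still exceeds the 8x8 limit.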
Reviewed-by: Tomeu Vizoso <tomeu@tomeuvizoso.net>
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40719>
... as this can lead to a deadlock with the following sequence:
Time1: guest-thread-1: vkDestroyImageView() called
Time2: VkEncoder grabs seqno 1
Time3: guest-thread-2: vkQueueSubmit() called
Time4: ResourceTracker::on_vkQueueSubmitTemplate() locks
mLock for using `info_VkFence`
Time5: ResourceTracker::on_vkQueueSubmitTemplate() calls
enc->vkQueueWaitIdle()
Time6: VkEncoder grabs seqno 2
Time7: VkEncoder sends the vkQueueWaitIdle with seqno
2 via ASG to host
Time8: VkEncoder waits for the `VkResult` from the
host via `stream->read()`
Time9: guest-thread-1: VkEncoder calls sResourceTracker->destroyMapping()
->mapHandles_VkImageView((VkBuffer*)&buffer);
which calls
ResourceTracker::unregister_VkImageView()
ResourceTracker::unregister_VkImageView() tries to
lock mLock to erase the info struct
!!! DEADLOCKED HERE !!!
guest-thread-1 is stuck waiting on mLock (currently locked by
guest-thread-2) before it can call `stream->flush();` to finish
sending the vkDestroyImageView() command to the host and potentially
ping its corresponding host-render-thread-1.
guest-thread-2 is stuck waiting on the result from host-render-thread-2
but host-render-thread-2 won't progress until host-render-thread-1
finishes seqno 1 which needs guest-thread-1 to finish sending/pinging.
Android equivalent change ag/39258728 for b/498964194
Test: cvd create --gpu_mode=gfxstream_guest_angle_host_swiftshader
open maps
pan/zoom/etc for a couple minutes
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40767>
Previously, we only checked if the hardware duration was greater
than the requested sample period by 1000 ns. This can cause the
hardware duration to be rejected in favor of the next cycle, which
is double the current duration.
At larger requested sample periods, this can mean getting a hardware
duration of 1.7 ms for a requested sample period of 1 ms.
To fix this, we'll scale the check so that it uses 67% of the
requested sample period as the reject threshold. This way, if the
hardware duration is below 67%, it's guaranteed to be within
100%-133% of the requested sample period on the next hardware interval.
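A minimal sketch contrasting the two reject checks. It assumes (hypothetically) that the available hardware periods are a base period doubled at each step, and it reads the old check as "reject while the duration falls more than 1000 ns short of the request". The function names and the 833 ns base are illustrative only, not the driver's actual code:

```c
#include <stdint.h>

/* Old check (as interpreted here): keep doubling while the hardware
 * duration is more than 1000 ns short of the requested period.
 * base_ns must be non-zero. */
static uint64_t
sketch_pick_period_old(uint64_t base_ns, uint64_t requested_ns)
{
   uint64_t d = base_ns;
   while (d + 1000 < requested_ns)
      d *= 2;
   return d;
}

/* New check: keep doubling only while the hardware duration is below
 * 67% of the requested period (integer math avoids floating point).
 * base_ns must be non-zero. */
static uint64_t
sketch_pick_period_new(uint64_t base_ns, uint64_t requested_ns)
{
   uint64_t d = base_ns;
   while (d * 100 < requested_ns * 67)
      d *= 2;
   return d;
}
```

With an 833 ns base and a 1 ms request, the old check returns 1705984 ns (about 1.7 ms), while the 67% threshold accepts 852992 ns (about 0.85 ms).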
Signed-off-by: Casey Bowman <casey.g.bowman@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40735>
Instead of generating a special single-source send in some cases, always
use the split send (called SENDS pre-Xe, and the only option on Xe).
Having a code path for single-source sends was relevant for old Gfx versions,
but on Gfx9+ split send is always available.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40755>
- We won't be able to rely on u_trace_fini leaving u_trace in a
valid state, so u_trace_init should be called after it.
- There probably was a double-free of u_trace_submission_data.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40728>
The commit that introduced 9_9_9_E5 RB support mistakenly broke
fake-format blits (such as compressed formats, etc). Re-order the
logic to restore fake-format blits.
Fixes an iova fault in manhattan, not to mention inadvertently falling
off the A2D path for a lot of blits.
Fixes: 9dc3410512 ("tu: Add support for VK_FORMAT_E5B9G9R9_UFLOAT_PACK32 color attachments")
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40754>
The external move-region frame number was generated continuously, whereas the current POC was reset at each IDR.
Modified the validation logic and added a warning log in case of a mismatch.
Reviewed-by: Pohsiang (John) Hsu <pohhsu@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40756>
st_cb_bitmap appends a temporary bitmap sampler view to the sampler
view array passed to set_sampler_views().
1a5c660ef5 changed this path to only release the extra YUV views
returned by st_get_sampler_views(), but the temporary bitmap view is
created locally and is not part of extra_sampler_views. It therefore
stopped being released. Release the temporary bitmap sampler view
explicitly after drawing the bitmap quad.
Fixes: 1a5c660ef5 ("st/bitmap: only release YUV samplerviews")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40694>
Doesn't really help many shaders, but I've seen a couple that turn from
MUFU into F2F(MUFU.F16(F2F)). This might just as well be a limitation
of related code, e.g. returning F32 from TEX instead of using TEX.F16.
Totals:
CodeSize: 8662337424 -> 8662336960 (-0.00%)
Static cycle count: 4718044491 -> 4718044554 (+0.00%); split: -0.00%, +0.00%
Totals from 7 (0.00% of 1163204) affected shaders:
CodeSize: 236480 -> 236016 (-0.20%)
Static cycle count: 2108061 -> 2108124 (+0.00%); split: -0.01%, +0.01%
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40392>
Instructions that take an F16 value can generally select which component to
read from. This lets us get rid of some PRMTs.
This also cleans up the partial support for it in F2F and streamlines
everything into a uniform model, as previously it wasn't wired up
generally and copy prop didn't always propagate the swizzle through.
This also makes it unnecessary to apply an Xx swizzle to scalar FP16
sources.
Totals from 907 (0.08% of 1163204) affected shaders:
CodeSize: 40856816 -> 40843408 (-0.03%); split: -0.03%, +0.00%
Static cycle count: 20898101 -> 20895619 (-0.01%); split: -0.01%, +0.00%
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40392>
From the OpenCL specification:
`CL_MEM_KERNEL_READ_AND_WRITE`: This flag is only used by
clGetSupportedImageFormats to query image formats that may be both
read from and written to by the same kernel instance. To create a
memory object that may be read from and written to use
CL_MEM_READ_WRITE.
If an application follows the instructions above, i.e. queries a list of
supported image formats using `CL_MEM_KERNEL_READ_AND_WRITE` as
input and then attempts to create an image using one of the supported
image formats by calling `clCreateImage` with
`CL_MEM_READ_WRITE`, the image creation call should
succeed. This instead fails on Mali devices with the error
`CL_IMAGE_FORMAT_NOT_SUPPORTED`.
Rusticl fails when validating the image format against its supported
flags. Formats that support `PIPE_BIND_SHADER_IMAGE` have their
supported flags set as `CL_MEM_WRITE_ONLY` and
`CL_MEM_KERNEL_READ_AND_WRITE`.
This changes the supported CL flags to be `CL_MEM_WRITE_ONLY` for
`PIPE_BIND_SHADER_IMAGE` and `CL_MEM_READ_WRITE |
CL_MEM_KERNEL_READ_AND_WRITE` for `PIPE_BIND_SAMPLER_VIEW |
PIPE_BIND_SHADER_IMAGE`.
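A minimal C sketch of the flag mapping and subset check described above. Rusticl itself is written in Rust, so this is not its code; the `SKETCH_*` constants are stand-ins with arbitrary bit values (the real ones live in CL/cl.h and Mesa's p_defines.h), and mapping sampler views alone to read-only is an assumption not stated in this message:

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in bit values for illustration only. */
#define SKETCH_BIND_SAMPLER_VIEW (1u << 0)
#define SKETCH_BIND_SHADER_IMAGE (1u << 1)

#define SKETCH_MEM_READ_WRITE            (1u << 0)
#define SKETCH_MEM_WRITE_ONLY            (1u << 1)
#define SKETCH_MEM_READ_ONLY             (1u << 2)
#define SKETCH_MEM_KERNEL_READ_AND_WRITE (1u << 3)

/* Supported CL flags for a format given its pipe bind flags. */
static uint32_t
sketch_supported_cl_flags(uint32_t bind)
{
   const uint32_t both = SKETCH_BIND_SAMPLER_VIEW | SKETCH_BIND_SHADER_IMAGE;
   uint32_t flags = 0;

   /* Assumption: sampler views alone allow read-only images. */
   if (bind & SKETCH_BIND_SAMPLER_VIEW)
      flags |= SKETCH_MEM_READ_ONLY;
   /* Shader images alone only allow write-only access. */
   if (bind & SKETCH_BIND_SHADER_IMAGE)
      flags |= SKETCH_MEM_WRITE_ONLY;
   /* Both bindings together allow reading and writing in one kernel. */
   if ((bind & both) == both)
      flags |= SKETCH_MEM_READ_WRITE | SKETCH_MEM_KERNEL_READ_AND_WRITE;
   return flags;
}

/* The format is accepted only if every requested bit is supported. */
static bool
sketch_format_supports(uint32_t bind, uint32_t requested)
{
   return (requested & ~sketch_supported_cl_flags(bind)) == 0;
}
```

With this mapping, a format reported for `CL_MEM_KERNEL_READ_AND_WRITE` also accepts `CL_MEM_READ_WRITE`, which is the consistency the failing application relies on.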
Fixes: 3386e142 ("rusticl: support read_write images")
Fixes OpenCL-CTS test: `test_image_streams` on Mali. Invocation:
```
test_image_streams write 1D CL_RGB CL_SIGNED_INT8
```
Signed-off-by: Ahmed Hesham <ahmed.hesham@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39692>
Before this, everything was in the giant bifrost_compile.c file; now
preprocess, optimize and postprocess live in their own "small"
bifrost_nir.c.
I also removed some dead functions and moved the passes closer to their
usage (e.g., passes only used in preprocess now sit just before
preprocess). Otherwise it's all the same code we had before.
Signed-off-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40717>
VCN 5_0_1 uses gfx9 address mode. This was also set in the previous
radeon_vcn_dec code.
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Reviewed-by: David Rosca <david.rosca@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40736>
This would read OOB and crash, because the data type is optional per the
SPIR-V spec.
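A minimal sketch of the general guard against this kind of OOB read: before reading an optional operand, compare its word index against the instruction's word count, which SPIR-V stores in the high 16 bits of the first word. The helper name and the `idx` parameter are hypothetical; this message does not say which operand index holds the data type:

```c
#include <stdbool.h>
#include <stdint.h>

/* Read the operand at word index `idx` (the opcode word is index 0)
 * only if the instruction actually contains it. */
static bool
sketch_read_optional_operand(const uint32_t *inst, unsigned idx, uint32_t *out)
{
   /* High 16 bits: total word count; low 16 bits: opcode. */
   unsigned word_count = inst[0] >> 16;

   if (idx >= word_count)
      return false; /* operand omitted: don't read past the instruction */

   *out = inst[idx];
   return true;
}
```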
Original patch by Faith Ekstrand <faith.ekstrand@collabora.com>.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40731>
When converting the index buffer from 4-byte to 2-byte indices, we use the
uploader for the job. Since commit b3133e250e we do an uploader alloc
ref, which releases the uploader buffer if there is not enough space,
creating a new one.
The problem happens when we still need this buffer because it is the one
containing the index buffer to convert. This happens, for instance, if
we need to convert the primitives because they are not supported (e.g.,
converting quads to triangles), as this is also done
using the uploader.
The solution is to ensure the uploader's buffer has an extra reference
so that, when released, it is not destroyed. This can easily be achieved by
first calling pipe_buffer_map_range(), which is required to access the
buffer anyway and increases its reference count.
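A minimal refcounting sketch of why the extra reference matters. It uses a hypothetical buffer type and helpers rather than gallium's pipe_resource or the real u_upload_mgr API, and a `destroyed` flag stands in for actually freeing the buffer:

```c
#include <stdbool.h>

/* Hypothetical refcounted buffer; `destroyed` models freeing it. */
struct sketch_buffer {
   int refcount;
   bool destroyed;
};

static void
sketch_ref(struct sketch_buffer *buf)
{
   buf->refcount++;
}

static void
sketch_unref(struct sketch_buffer *buf)
{
   if (--buf->refcount == 0)
      buf->destroyed = true;
}

/* Models the uploader dropping its current buffer when it runs out of
 * space and switches to a new one. */
static void
sketch_uploader_replace_buffer(struct sketch_buffer *old)
{
   sketch_unref(old);
}
```

If the uploader holds the only reference, replacing its buffer destroys the source indices mid-conversion. Taking a reference first (the role the map call plays in the fix) keeps the buffer alive until the caller drops it.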
This fixes `spec@!opengl 1.1@longprim`.
Fixes: b3133e250e ("gallium: add pipe_context::resource_release to eliminate buffer refcounting")
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40642>