We don't have v_permlane64_b32 yet, but we can still optimize it using
shared vgprs. Using the DPP16 row mask, we can even avoid writing exec.
With v0 input/output and v24/v25 as shared vgprs, this results in:
v_mov_b32_dpp v24, v0 quad_perm:[0,1,2,3] row_mask:0x3 bank_mask:0xf
v_mov_b32_dpp v25, v0 quad_perm:[0,1,2,3] row_mask:0xc bank_mask:0xf
v_mov_b32_dpp v0, v24 quad_perm:[0,1,2,3] row_mask:0xc bank_mask:0xf
v_mov_b32_dpp v0, v25 quad_perm:[0,1,2,3] row_mask:0x3 bank_mask:0xf
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36390>
This should avoid some SIMD stalls.
I think this special case was added to try to handle this case:
First Instruction: WMMA
Second Instruction: WMMA instruction with same VGPR of previous WMMA instruction’s Matrix D as Matrix C
Stall if the first and second instruction are not the same type of WMMA or use ABS/NEG on SRC2 of the second instruction
If I read it correctly, we shouldn't need a delay if the type is the same and no
modifier is used. That's kind of complex to handle, so leave it for now.
Not inserting any delays likely hurts more than this.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36328>
Retrieving memory requirement size and alignment via
anv_image_get_memory_requirements() return's 0 before surfaces are added
by resolve_anb_image() and will assert in align64() when align is 0:
Abort message: '../src/util/u_math.h:713: uint64_t align64(uint64_t, uint64_t): assertion "util_is_power_of_two_nonzero64(alignment)" failed'
Refactor out anv_image_bind_from_gralloc() into resolve_anb_image() so
the checks are performed after the surface is adds.
Resolving also requires API 29 so return VK_ERROR_EXTENSION_NOT_PRESENT
without it.
Fixes: 43cb986d9e ("anv/android: resolve ANB swapchain images on bind")
Signed-off-by: Juston Li <justonli@google.com>
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36060>
Introduce a scratch FBD that will be used in the event of IR. Also store
a subset of FBD words that are needed to construct the relevant IR FBD
in the scratch FBD memory.
This patch also increase the TILER_OOM_HANDLER_MAX_SIZE from 512 to 1024
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34733>
Also initialize allocator to NULL for tiler OOM handler and assert
if capacity is sufficient in the event that allocator is NULL
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34733>
There is nothing guaranteeing that the currently used sampler view
indexes will be contiguous, which means the resulting extra sampler
views created by st_get_sampler_views may not be placed at the end of
the resulting array. Therefore, the exact indexes of these views must
be passed to the caller for releasing instead of simply assuming that
they will always be placed at the end.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13363
Fixes: 73da0dcddc ("gallium: eliminate frontend refcounting from samplerviews")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36397>
This operation could be implemented in the TP cores, but this operation
tends to be added by convertors that export to TFLite from frameworks
with different channel order, and end up being no-ops.
Once we move to NIR for tensor operations, we can support this operation
and then remove it when we have an explicit transpose operation that is
negated by a consequent transpose operation.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34629>
This old code was needed to get the backend assembler to do the
right thing when emitting index and address registers, but sfn
is handling this now so we can drop this.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36101>
Mark VK_EXT_robustness2 as supported on pank/v10+, as it's been exposed
for a little while.
Fixes: ef91ad64d5 ("panvk/v10+: Advertise nullDescriptor support")
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36408>
There's been a few years since these flags existed, let's drop them.
Fixes: ea03d0652d ("panfrost: Remove PAN_MESA_DEBUG=deqp")
Fixes: 7c7c38b126 ("panfrost: Remove unused debug parameter")
Acked-by: Valentine Burley <valentine.burley@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36310>
We have an SSA register allocator now that should be a lot faster on
most (if not all) of these tests now. Let's re-enable them to gain more
CI coverage.
Acked-by: Valentine Burley <valentine.burley@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36310>
This change is inspired by 1021d6fe62 ("dri: deal
with ARGB1555")
This issue is now mostly fixed with
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36081
Anyway, the dri3_cpp_for_fourcc entry is still missing and should
be added.
This change is useful for instance with r600 which
can handle this format.
Note: this mode was generated at the "glx visuals" level
on r600 by default before the commit d709b42180.
This change was tested on r600 palm and cayman with X11
loaded with a version of mesa generating this very mode:
glx/glx-visuals-depth -pixmap: fail pass
glx/glx-visuals-stencil -pixmap: fail pass
Fixes: 00aa095d53 ("dri: Support 1555/4444 formats")
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34294>
When the rasterizer state is updated, we only need to update
the scissoring state if the rasterizer scissor state has changed.
This avoids re-sending the same scissor state any time the rasterizer
is changed.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36352>
We need to update the CFG_BITS packet if the early_fragment_test status
changed vs previous draw call. But we don't need to update it every
time the FS is changed, we only need to update it when disable_ez
value is different from previous FS.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36352>