I keep reaching for this helper but it doesn't exist. So I fixed that.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Eric Engestrom <eric@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36142>
I'm guessing the hardware needs to prefetch the whole sampler heap, so if we're
not gonna use it, let's omit it. I don't know if this helps, but it can't hurt.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36127>
Some Chromebooks have unreliable UART, so we fall back to SSH for them.
However, SSH setup adds a 10-15s overhead, so we now restrict its usage
to devices with the "depthcharge" boot method (i.e., Chromebooks).
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35936>
This is already being set as needed everywhere else, and would cause
issues in future work.
Use the relative `install/` path for `HWCI_TEST_SCRIPT` as that's
supported by both HW runners and FDo runners.
A separate MR will fix the `/install/` vs `install/` mess.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36145>
We were allocating a fixed number of temporary registers; this isn't
always enough, and in fact we should have calculated the number of
temporaries required.
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Fixes: 6c64ad934f ("panfrost: spill registers in SSA form")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36135>
Copy between memory and a depth/stencil image requires copying the depth
and stencil aspects in separate calls. For D32S8, this needs to be
special cased in order to handle (de)interleaving.
For image->image copies, deinterleaving is not supported. Aspects must
match between src and dest for non-planar images.
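
As a rough illustration of the (de)interleaving step, here is a minimal
sketch. It assumes an interleaved layout of 8 bytes per pixel (a 32-bit
float depth, then an 8-bit stencil, then 3 bytes of padding); that layout
and all function names are assumptions for illustration, not taken from
the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed interleaved layout: float depth, 1 byte stencil, 3 bytes padding. */
#define D32S8_PIXEL_SIZE 8

/* Extract only the depth aspect into tightly packed floats. */
static void
deinterleave_depth(float *dst, const uint8_t *src, size_t pixel_count)
{
   assert(dst != NULL && src != NULL);
   for (size_t i = 0; i < pixel_count; i++)
      memcpy(&dst[i], src + i * D32S8_PIXEL_SIZE, sizeof(float));
}

/* Extract only the stencil aspect into tightly packed bytes. */
static void
deinterleave_stencil(uint8_t *dst, const uint8_t *src, size_t pixel_count)
{
   assert(dst != NULL && src != NULL);
   for (size_t i = 0; i < pixel_count; i++)
      dst[i] = src[i * D32S8_PIXEL_SIZE + sizeof(float)];
}

/* Write packed depth into interleaved pixels, leaving stencil untouched. */
static void
interleave_depth(uint8_t *dst, const float *src, size_t pixel_count)
{
   assert(dst != NULL && src != NULL);
   for (size_t i = 0; i < pixel_count; i++)
      memcpy(dst + i * D32S8_PIXEL_SIZE, &src[i], sizeof(float));
}
```

Each aspect gets its own pass, which mirrors the requirement that depth
and stencil are copied in separate calls.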
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35910>
This is needed for VK_EXT_host_image_copy which, like the buffer<->image
copy commands, treats depth/stencil like separate image planes and
requires copying each separately.
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35910>
We don't need to use fixed-size pixel_t types and put the tiling loop in
a macro in order to get good codegen for this. When the fixed-size
types are replaced with memcpy/__builtin_assume_aligned, the compiler is
still able to generate multi-word load/store instructions. Without the fixed-size
types, the only advantage of putting this in a macro is to ensure the
code is specialized on size/is_store/shift, but we can get the same
specialization by making the functions ALWAYS_INLINE.
Measured performance in VK_EXT_host_image_copy benchmarks is unchanged,
and generated assembly looks effectively identical to the previous version.
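
The general pattern can be sketched as follows. This is a simplified
illustration for GCC/Clang, not the driver code: SKETCH_ALWAYS_INLINE
stands in for Mesa's ALWAYS_INLINE, the function names are hypothetical,
and the pixels are treated as contiguous rather than addressed through a
real tiling function.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for Mesa's ALWAYS_INLINE macro (GCC/Clang attribute). */
#define SKETCH_ALWAYS_INLINE inline __attribute__((always_inline))

/* Generic body. Because it is always inlined, pixel_size and is_store are
 * compile-time constants in each caller, so each memcpy has a constant
 * size and the is_store branch folds away. */
static SKETCH_ALWAYS_INLINE void
access_pixels(void *tiled, void *linear, unsigned count,
              unsigned pixel_size, bool is_store)
{
   uint8_t *t = tiled;
   uint8_t *l = linear;

   for (unsigned i = 0; i < count; i++) {
      if (is_store)
         memcpy(t + i * pixel_size, l + i * pixel_size, pixel_size);
      else
         memcpy(l + i * pixel_size, t + i * pixel_size, pixel_size);
   }
}

/* Specialization for 16-byte pixels, linear -> tiled. Callers must pass
 * 16-byte-aligned pointers; the builtin only tells the compiler so. */
void
store_pixels_16(void *tiled, void *linear, unsigned count)
{
   access_pixels(__builtin_assume_aligned(tiled, 16),
                 __builtin_assume_aligned(linear, 16), count, 16, true);
}

/* Specialization for 16-byte pixels, tiled -> linear. */
void
load_pixels_16(void *tiled, void *linear, unsigned count)
{
   access_pixels(__builtin_assume_aligned(tiled, 16),
                 __builtin_assume_aligned(linear, 16), count, 16, false);
}
```

Whether a given compiler emits multi-word loads/stores here depends on
the target and optimization level, so the generated assembly is worth
checking.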
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35910>
Since we don't have a CPU implementation of AFBC compression, host copy
is only implemented for u-interleaved tiling.
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35910>
This is needed for VK_EXT_host_image_copy.
Most other Mesa drivers use a similar approach to implement tiled->tiled
copy, with a few differences. They use a temp buffer sized for only one
tile, don't attempt to tile-align the copies in either the src or dest,
and they don't have the memcpy fast path. I measured performance of a
variety of implementations on a rock5b, and found:
- The fast path for when the copy region is tile-aligned is a 167%
improvement.
- Aligning the temp buffer chunks to src tiles is a 20% improvement.
- Using a 64k buffer instead of a tile-sized buffer is a 14%
improvement. This buffer size appears optimal in my benchmark;
smaller and larger buffers are both slower. Skipping the chunk
approach and just (de)tiling to a temp buffer that fits the whole
image (what NVK does) is also slower.
- I had no luck with attempts at a direct tiled->tiled copy algorithm
that didn't need a temp buffer. The fastest I got was ~1/4 the speed
of the temp buffer implementation.
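
The chunked temp-buffer approach described above can be sketched roughly
as below. This is a simplified model, not the driver implementation: it
assumes 1 byte per pixel, 16x16 tiles stored row-major (real
u-interleaved tiling orders pixels differently within a tile), and
hypothetical function names. It also omits the tile-aligned memcpy fast
path and the alignment of chunks to src tiles.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TILE_DIM 16            /* assumed tile width/height in pixels */
#define TEMP_SIZE (64 * 1024)  /* 64k linear staging buffer */

/* Byte offset of pixel (x, y) in the simplified tiled layout. */
static size_t
tiled_offset(unsigned x, unsigned y, unsigned width_tiles)
{
   size_t tile = (size_t)(y / TILE_DIM) * width_tiles + (x / TILE_DIM);
   return tile * TILE_DIM * TILE_DIM + (y % TILE_DIM) * TILE_DIM +
          (x % TILE_DIM);
}

/* Copy a width x height region between two tiled images by detiling a
 * chunk of rows into a linear temp buffer, then tiling it into dst. */
static void
copy_tiled_to_tiled(uint8_t *dst, unsigned dst_x, unsigned dst_y,
                    unsigned dst_width_tiles, const uint8_t *src,
                    unsigned src_x, unsigned src_y, unsigned src_width_tiles,
                    unsigned width, unsigned height)
{
   uint8_t temp[TEMP_SIZE];

   assert(width > 0 && width <= TEMP_SIZE);
   unsigned rows_per_chunk = TEMP_SIZE / width;

   for (unsigned y0 = 0; y0 < height; y0 += rows_per_chunk) {
      unsigned rows = height - y0;
      if (rows > rows_per_chunk)
         rows = rows_per_chunk;

      /* Detile: src tiled -> linear temp. */
      for (unsigned r = 0; r < rows; r++)
         for (unsigned x = 0; x < width; x++)
            temp[(size_t)r * width + x] =
               src[tiled_offset(src_x + x, src_y + y0 + r, src_width_tiles)];

      /* Tile: linear temp -> dst tiled. */
      for (unsigned r = 0; r < rows; r++)
         for (unsigned x = 0; x < width; x++)
            dst[tiled_offset(dst_x + x, dst_y + y0 + r, dst_width_tiles)] =
               temp[(size_t)r * width + x];
   }
}
```

Going through a linear intermediate lets src and dst regions sit at
different offsets within their tiles without a direct tiled->tiled
address mapping.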
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35910>
To support VK_EXT_host_image_copy for tiled images, we need to be
able to determine whether AFBC may be supported in
vkGetPhysicalDeviceImageFormatProperties2.
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35910>