We were allocating a fixed number of temporary registers; this isn't
always enough, and in fact we should have calculated the number of
temporaries required.
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Fixes: 6c64ad934f ("panfrost: spill registers in SSA form")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36135>
Copy between memory and a depth/stencil image requires copying the depth
and stencil aspects in separate calls. For D32S8, this needs to be
special cased in order to handle (de)interleaving.
For image->image copies, deinterleaving is not supported. Aspects must
match between src and dest for non-planar images.
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35910>
This is needed for VK_EXT_host_image_copy which, like the buffer<->image
copy commands, treats depth/stencil like separate image planes and
requires copying each separately.
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35910>
We don't need to use fixed-size pixel_t types and put the tiling loop in
a macro in order to get good codegen for this. Replacing the fixed-size
types with memcpy/__builtin_assume_aligned, the compiler is still able
to generate multi-word load/store instructions. Without the fixed-size
types, the only advantage of putting this in a macro is to ensure the
code is specialized on size/is_store/shift, but we can get the same
specialization by making the functions ALWAYS_INLINE.
Measured performance in VK_EXT_host_image_copy benchmraks is unchanged,
and generated assembly looks effectively identical to the previous version.
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35910>
Since we don't have a CPU implementation of AFBC compression, host copy
is only implemented for u-interleaved tiling.
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35910>
This is needed for VK_EXT_host_image_copy.
Most other mesa drivers use a similar approach to implement tiled->tiled
copy, with a few differences. They use a temp buffer sized for only one
tile, don't attempt to tile-align the copies in either the src or dest,
and they don't have the memcpy fast path. I measured performance of a
variety of implementations on a rock5b, and found:
- The fast path for when the copy region is tile-aligned is a 167%
improvement.
- Aligning the temp buffer chunks to src tiles is a 20% improvement.
- Using a 64k buffer instead of a tile-sized buffer is a 14%
improvement. This buffer size appears optimal in my benchmark,
smaller and larger buffers are both slower. Skipping the chunk
approach and just (de)tiling to a temp buffer that fits the whole
image (what NVK does) is also slower.
- I had no luck with attempts at a direct tiled->tiled copy algorithm
that didn't need a temp buffer. The fastest I got was ~1/4 the speed
of the temp buffer implementation.
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35910>
For supporting VK_EXT_host_image_copy for tiled images, we need to be
to determine whether AFBC may be supported in
vkGetPhysicalDeviceImageFormatProperties2.
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35910>
Depth/stencil and tiled images require some additional complexity, so
will be implemented in later commits.
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35910>
For VK_EXT_host_image_copy, we need to access image memory from the CPU
after mapping the BO. The existing base field in pan_image_plane doesn't
work for this because it's a GPU address and we don't have a mechanism
to recover the GPU base address of an image's BO to calculate the offset.
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35910>
Advertising SAMPLED_IMAGE_DEPTH_COMPARISON is a no-op for images that
don't have SAMPLED_IMAGE_BIT, but it's confusing and results in us
advertising a lot of formats that with only the
SAMPLE_IMAGE_DEPTH_COMPARISON feature that aren't usable for anything.
For R32_UINT and R32_SINT, the change is just a cleanup, because we
always support these for storage images.
Whe we implement VK_EXT_host_image_copy, advertising unusable formats
triggers failures in dEQP-VK.api.image_clearing.*, so it's convenient to
have features==0 for all unusable formats.
Fixes: 70b8056df1 ("panvk: Enable KHR_format_feature_flags2 and use them")
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35910>
All current callers check for MAP_FAILED, not NULL, and we are returning
MAP_FAILED already on the other error paths.
Fixes: d5f4f918f3 ("panfrost: clean up mmap-diagnostics")
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36122>
Fixes: c257bf5142 ("panvk: Conditionally register an host address when tracking user memory")
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36122>
Signed-off-by: Loïc Molinari <loic.molinari@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Acked-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35001>
This commit proposes an optimized version using Arm A32 NEON
intrinsics.
Signed-off-by: Loïc Molinari <loic.molinari@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Acked-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35001>
Payload size retrieval can greatly benefit from using SIMD to sum up
the 16 6-bit packed sizes. This commit proposes an optimized version
using Arm A64 NEON intrinsics. This was measured on a Rock 5B to be ~2
times faster than the original.
Signed-off-by: Loïc Molinari <loic.molinari@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Acked-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35001>
The AFBC-P payload layout is currently retrieved in 2 steps starting
with the payload sizes retrieval using a CS job on the GPU followed by
a CPU pass to set the payload offsets. This commit proposes to do both
steps on the CPU at once using a new utility function
pan_afbc_payload_layout_packed().
A new utility function pan_afbc_payload_uncompressed_size() is added
to help retrieve the uncompressed size from a pipe_format and
modifier. Both the CPU and GPU versions use it now.
A new AFBC-P driconf option "pan_afbcp_gpu_payload_sizes" is added to
fallback to the original payload sizes retrieval on the GPU.
Signed-off-by: Loïc Molinari <loic.molinari@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Acked-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35001>
Add an AFBC header block structure pan_afbc_headerblock to improve
readability when accessing header blocks. get_superblock_size(), which
will be used for AFBC packing in the next commits, has been moved to
pan_afbc.h and renamed to pan_afbc_payload_size() so that it can be
tested. Other utility functions pan_afbc_header_subblock_size() and
pan_afbc_header_subblock_uncompressed_size() hasve been added to help
retrieve the compressed or uncompressed size of a subblock from a
header. This commit also fixes a few issues like arch handling.
Signed-off-by: Loïc Molinari <loic.molinari@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Acked-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35001>
Will be used by etnaviv too.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35753>
Bifrost will require more work
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Signed-off-by: Ashley Smith <ashley.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35789>
Adds support for 64-bit atomic operations for KHR_shader_atomic_int64
using 64-bit atomic instructions. Valhall is working but Bifrost will
require some more work to implement as it requires two instructions to
execute a 64-bit atomic.
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Signed-off-by: Ashley Smith <ashley.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35789>
Add 64-bit atomic instructions for bifrost/valhall
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Signed-off-by: Ashley Smith <ashley.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35789>
Fixes: 563823c9ca ("panvk: Implement vk_shader")
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Caterina Shablia <caterina.shablia@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36006>
Previously this was allowing invalid forms like
"CLPER.i32.subgroup8.zero lane-id, src1" to reach bi_pack.
This fixes the assert that can be seen with
"dEQP-VK.glsl.derivate.dfdxsubgroup.*" but doesn't fix failures.
Fixes: 0acc6b564e ("pan/bi: Rework FAU lowering")
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36006>
Previously we were allowing passthrough to temps without using
bi_reads_temps.
This was causing instructions like CLPER to create undefined encodings.
We now check if the instruction support temps.
Fixes: 4252fb84f4 ("pan/bi: Add passthrough register rewriting helper")
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36006>
Since commit 8bb46de0, the correct way to check for a compute shader is
with `gl_shader_stage_is_compute()`.
Fixes: d2838f3c ("pan/bi: handle barriers with SUBGROUP scope")
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: John Anthony <john.anthony@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35957>
We don't need to depend on the generated wl_drm files, as wl_drm support
was removed from Vulkan quite some time ago.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Fixes: e090316570 ("vulkan/wsi/wayland: drop support for wl_drm")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35885>
The extension allows returning 0 if a given rate is unknown, which
allows us to support this on all GPUs, but since the extension depends
on Vulkan 1.1, we only expose it on v10+ for now.
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35155>
Adds pixel rate, texel rate, and fma rate to the model table for v9+
GPUs. This will be exposed via VK_ARM_shader_core_properties and
GL_ARM_shader_core_properties later.
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35155>
Rather than splitting the GPU ID in two, let the GPU ID users do that
when they need.
We also rework the model detection to use a mask so we can customize
the mask if the version major/minor fields are required to differentiate
two GPUs with the same arch major/minor and product major.
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35155>
Similar to nir_lower_alu_width(), the callback can return the
desired number of components for a phi, or 0 for no lowering.
The previous behavior of nir_lower_phis_to_scalar() with lower_all=true
can be elicited via nir_lower_all_phis_to_scalar() while the previous
behavior with lower_all=false now corresponds to nir_lower_phis_to_scalar()
with NULL callback.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35783>
The GLSL compiler always lowers inputs to temps for VS and GS, so exclude
them from driver support because the GLSL compiler will no longer do that
unconditionally. Thus, indirect VS and GS inputs are completely untested
and broken in a lot of drivers.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35945>
Before doing register allocation, use information available from
the SSA representation to determine register pressure and to
spill registers. This spilling doesn't have to be perfect (the
register allocator is still allowed to spill) but it will be
much faster to do the SSA spilling than RA spilling. In general
this should vastly improve the performance of register allocation.
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Acked-by: Boris Brezillon <boris.brezillon@collabora.com>
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34446>
We're currently not using image layouts in PanVK so we can advertise
this extension without additional changes.
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: John Anthony <john.anthony@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35911>
We have enough DUTs to increase the job parallelism to 8. At this level,
the runtime reported by deqp-runner averages about 8:30, which is below
the 10-minute target recommended by the docs.
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35913>
The JOB_OFFSET only actually affect the global id and not wg id.
In NIR common, we assume that if base wg isn't supported, it means that
global and wg id already contains it.
To follow the convention around, we remove the offset to assume
global id will need an offset added.
Alone, this doesn't change functionality as we always lower away global
id so far.
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35909>
FEX is a 64-bit process potentially running x86 (32-bit) binary, in
which case the automatic user MMIO offset detection doesn't work, so
let's explicitly set the user MMIO offset when we can.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Acked-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34573>