Dropping the private binding or transferring ownership to the bound
memory is non-trivial. Instead, track the image in the device memory
for dedicated allocations so that a wsi image alias can find the
original wsi image from the wsi memory.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36095>
Lowering IO to temps leads to problems with RA with piglit
spec@glsl-1.50@execution@geometry@max-input-component
Not doing so results in an assertion failure with piglit
spec@glsl-1.50@execution@geometry@dynamic_input_array_index
because not all indirect IO access is lowered. Using
nir_lower_indirect_derefs works around this limitation.
v2: Fix formatting (Patrick Lerda)
Fixes: 1186c73c6 (r600: implement gs indirect load_per_vertex_input)
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36051>
I overlooked that the unconditional lowering of indirect VS
inputs had been dropped. Since VS inputs are stored in
consecutive registers, the indirect access can be implemented
without additional lowering; it just needs a proper declaration
of the registers forming the array.
v2: - Fix formatting (Patrick Lerda)
- Use allocator for std::map to avoid memory leak
(Patrick Lerda)
Fixes: a43bfffe1e (r600: Correct nir_indirect_supported_mask)
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36051>
Signed-off-by: Loïc Molinari <loic.molinari@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Acked-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35001>
This commit proposes an optimized version using Arm A32 NEON
intrinsics.
Signed-off-by: Loïc Molinari <loic.molinari@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Acked-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35001>
Payload size retrieval can greatly benefit from using SIMD to sum up
the 16 6-bit packed sizes. This commit proposes an optimized version
using Arm A64 NEON intrinsics. This was measured on a Rock 5B to be ~2
times faster than the original.
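
For reference, the summation that the SIMD version speeds up can be
sketched in scalar C as below. This is only an illustration: the
sketch_* names are hypothetical, the LSB-first packing of the 16 fields
into 12 bytes is an assumption, and every field is treated as a plain
size without any special encodings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: 16 sizes of 6 bits each, packed LSB-first into
 * 12 consecutive bytes (16 * 6 = 96 bits). */
#define SKETCH_NUM_SIZES 16
#define SKETCH_PACKED_BYTES 12

/* Scalar reference: extract each 6-bit field and accumulate it. */
static uint32_t
sketch_sum_packed_sizes(const uint8_t packed[SKETCH_PACKED_BYTES])
{
   uint32_t sum = 0;

   assert(packed != NULL);

   for (unsigned i = 0; i < SKETCH_NUM_SIZES; i++) {
      unsigned bit = i * 6;
      unsigned byte = bit / 8;
      unsigned shift = bit % 8;

      /* A field can straddle two bytes, so read a 16-bit window. */
      uint32_t window = packed[byte];
      if (byte + 1 < SKETCH_PACKED_BYTES)
         window |= (uint32_t)packed[byte + 1] << 8;

      sum += (window >> shift) & 0x3f;
   }

   return sum;
}
```

A vectorized version would load the 12 bytes once, isolate the fields
with shifts and masks, and reduce them with a horizontal add instead of
looping over each field.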
Signed-off-by: Loïc Molinari <loic.molinari@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Acked-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35001>
The AFBC-P payload layout is currently retrieved in 2 steps starting
with the payload sizes retrieval using a CS job on the GPU followed by
a CPU pass to set the payload offsets. This commit proposes to do both
steps on the CPU at once using a new utility function
pan_afbc_payload_layout_packed().
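
The idea of setting sizes and offsets in one CPU pass can be sketched
as a running sum. This is not the actual
pan_afbc_payload_layout_packed() implementation: the struct and
function below are hypothetical, the payload sizes are taken as an
input array rather than read from header blocks, and no alignment is
applied between payloads.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a payload extent: where a payload starts
 * in the packed buffer and how many bytes it occupies. */
struct sketch_payload_extent {
   uint32_t size;
   uint32_t offset;
};

/* Fill sizes and offsets in a single pass and return the total packed
 * size in bytes. */
static uint32_t
sketch_payload_layout_packed(const uint32_t *payload_sizes, size_t count,
                             struct sketch_payload_extent *layout)
{
   uint32_t offset = 0;

   assert(count == 0 || (payload_sizes != NULL && layout != NULL));

   for (size_t i = 0; i < count; i++) {
      layout[i].size = payload_sizes[i];
      layout[i].offset = offset;
      offset += payload_sizes[i];
   }

   return offset;
}
```
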
A new utility function pan_afbc_payload_uncompressed_size() is added
to help retrieve the uncompressed size from a pipe_format and
modifier. Both the CPU and GPU versions use it now.
A new AFBC-P driconf option "pan_afbcp_gpu_payload_sizes" is added to
fall back to the original payload sizes retrieval on the GPU.
Signed-off-by: Loïc Molinari <loic.molinari@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Acked-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35001>
Add an AFBC header block structure pan_afbc_headerblock to improve
readability when accessing header blocks. get_superblock_size(), which
will be used for AFBC packing in the next commits, has been moved to
pan_afbc.h and renamed to pan_afbc_payload_size() so that it can be
tested. Other utility functions pan_afbc_header_subblock_size() and
pan_afbc_header_subblock_uncompressed_size() have been added to help
retrieve the compressed or uncompressed size of a subblock from a
header. This commit also fixes a few issues like arch handling.
Signed-off-by: Loïc Molinari <loic.molinari@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Acked-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35001>
Print the AFBC-P state of a resource throughout its asynchronous
packing process when PAN_MESA_DEBUG=forcepack. There's no need to
prevent tiling in that case now that packing maintains the tiling state.
Signed-off-by: Loïc Molinari <loic.molinari@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Acked-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35001>
Pack AFBC resources asynchronously in order to prevent stalls at
texture upload waiting for 1) the AFBC staging blit (sparse encoding)
to complete and 2) the AFBC payload sizes retrieval.
After a texture upload, an AFBC resource is now progressively packed
at each read access once it has been read a certain number of
consecutive times without an intervening write. This prevents most
stalls by making AFBC packing a progressive, asynchronous background
process.
A useful side effect is that consecutive glTexSubImage*() calls on the
same texture (for texture atlases for instance) don't uselessly
respawn packing.
A new AFBC-P driconf option "pan_afbcp_reads_threshold" is added to
tweak the consecutive reads threshold.
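
The consecutive-reads heuristic can be sketched as a small counter that
a write resets. This is a simplified illustration, not the driver code:
the sketch_* names are hypothetical, packing is reduced to a single
trigger instead of a multi-step process, and it is an assumption of the
sketch that a write invalidates the packed state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-resource tracking state. */
struct sketch_resource {
   unsigned consecutive_reads; /* reads since the last write */
   bool packed;                /* packing already started or done */
};

/* A write restarts the count (and, in this sketch, drops packing). */
static void
sketch_on_write(struct sketch_resource *rsrc)
{
   assert(rsrc != NULL);
   rsrc->consecutive_reads = 0;
   rsrc->packed = false;
}

/* Returns true when this read should kick off packing. */
static bool
sketch_on_read(struct sketch_resource *rsrc, unsigned threshold)
{
   assert(rsrc != NULL);

   if (rsrc->packed)
      return false;

   if (++rsrc->consecutive_reads < threshold)
      return false;

   rsrc->packed = true;
   return true;
}
```

With a threshold of 3, back-to-back uploads to the same texture keep
resetting the counter, so packing only starts on the third read after
the last write.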
Signed-off-by: Loïc Molinari <loic.molinari@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Ashley Smith <ashley.smith@collabora.com>
Acked-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35001>
It isn't used outside of pan_resource.c.
Signed-off-by: Loïc Molinari <loic.molinari@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35001>
Signed-off-by: Loïc Molinari <loic.molinari@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35001>
The pan_afbc_block_info structure describes the extent (offset and
size) of the payload data (compressed data) for a superblock, so use
the pan_afbc_payload_extent structure name instead in order to be more
precise and improve readability. This also differentiates superblocks
from payload data, which will be useful later in this series when new
helpers are added to pan_afbc.h.
A set of payload extents describes the layout of various payloads, so
use the term "layout" instead of the generic term "metadata" to
describe it.
Signed-off-by: Loïc Molinari <loic.molinari@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35001>
Preventing the use of the AFBC tiled layout could be useful to further
optimise memory usage when using AFBC packing. This commit introduces
a new option to disable it through a driconf option.
This is exposed as a new AFBC pan_afbc_tiled option (not tied to
pan_force_afbc_packing) because it would otherwise imply a useless
performance hit for the tiled to untiled conversion at packing time:
there's no need to detile if the resource is created untiled in the
first place. This could also be useful to compare the performance of
the AFBC tiled and untiled layouts.
Signed-off-by: Loïc Molinari <loic.molinari@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35001>
Layout refactoring commits broke AFBC packing while removing several
fields to simplify the logic. The stride and height are now derived
when necessary at packing time based on the resource modifier. The
problem is that the code assumes that the source and destination
headers are the same although the source and destination modifiers
might differ and create size mismatches when passed to the AFBC
utilities in pan_afbc.h. The destination modifier is set as the source
modifier without the AFBC_FORMAT_MOD_SPARSE and AFBC_FORMAT_MOD_TILED
flags. While the AFBC_FORMAT_MOD_SPARSE flag doesn't have any impact
on these utilities, the AFBC_FORMAT_MOD_TILED flag does.
This commit fixes the issue by keeping the same header block layout
(linear or tiled header layout) when packing a resource. Header blocks
can then be parsed linearly without having to deal with the internal
layout (Morton order). The tiled packed resource might also benefit
from better cache accesses.
Fixes: a2e9ce39e9 pan/layout: Drop pan_image_slice_layout::afbc::{stride_sb,nr_sblocks}
Fixes: 01d325ba63 pan/layout: Interleave header/body in AFBC(3D)
Signed-off-by: Loïc Molinari <loic.molinari@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Acked-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35001>
Implement offset lowering by computing the appropriate LOD from
gradients and adjusting coordinates accordingly.
Passes dEQP-GLES3.functional.shaders.texture_functions.texturegradoffset.* on GC7000.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35753>
Implement offset lowering by using the explicit LOD value with nearest-integer
rounding (floor(lod + 0.5)) and reusing the coordinate calculation helper.
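
The rounding step itself, floor(lod + 0.5), can be sketched in plain C
as below. The helper name is hypothetical and the sketch works on a CPU
float rather than on NIR values; the floor is written out by hand so
the example has no libm dependency.

```c
#include <assert.h>

/* Round an explicit LOD to the nearest integer mip level using
 * floor(lod + 0.5). Only meant for small, finite LOD values. */
static int
sketch_round_lod(float lod)
{
   float shifted = lod + 0.5f;
   int level = (int)shifted; /* truncates toward zero */

   assert(shifted > -1024.0f && shifted < 1024.0f);

   /* Truncation rounds negative values up, so step down to get floor. */
   if ((float)level > shifted)
      level--;

   return level;
}
```

Exact halves round up with this formula, so a LOD of 1.5 selects
level 2 and a LOD of -0.5 selects level 0.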
Passes dEQP-GLES3.functional.shaders.texture_functions.texturelodoffset.* on GC7000.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35753>
Implement offset lowering by calculating implicit LOD using coordinate derivatives (ddx/ddy)
and doing some deep floating point wizardry matching the binary blob behaviour.
Adds helper functions for coordinate calculation and LOD clamping that will be
reused by subsequent offset lowering passes.
Passes dEQP-GLES3.functional.shaders.texture_functions.textureoffset.* without explicit bias on GC7000.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35753>
Will be used by etnaviv too.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35753>
Passes dEQP-GLES3.functional.shaders.texture_functions.textureoffset.* with explicit bias on GC7000.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35753>
LDS sizes and offsets from LLVM are no longer used.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35529>
This will enable large code removal.
shader->config.lds_size is now always computed the same as ACO except for
compute shaders.
We have to add a new 8-bit user SGPR bitfield called
GS_STATE_GS_OUT_LDS_OFFSET_256B, which contains the offset
that was previously set by the relocation.
Since the offset must be a multiple of 256, we have to add padding
to the LDS size computation to make sure the alignment to 256 for the ESGS
LDS size doesn't cause us to exceed the maximum LDS size.
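
The alignment arithmetic can be sketched as below. The helper names and
the idea that the GS output region directly follows the aligned ESGS
region are assumptions made for illustration only; the maximum LDS size
is passed in as a parameter rather than taken from any real limit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_LDS_OFFSET_ALIGN 256u

/* Round v up to the next multiple of a power-of-two alignment. */
static uint32_t
sketch_align_up(uint32_t v, uint32_t align)
{
   assert(align != 0 && (align & (align - 1)) == 0);
   return (v + align - 1) & ~(align - 1);
}

/* Offset of the GS output region in units of 256 bytes, as it would be
 * stored in an 8-bit field. */
static uint32_t
sketch_gs_out_lds_offset_256b(uint32_t esgs_lds_size)
{
   uint32_t units = sketch_align_up(esgs_lds_size, SKETCH_LDS_OFFSET_ALIGN) /
                    SKETCH_LDS_OFFSET_ALIGN;

   assert(units <= 0xff); /* must fit in 8 bits */
   return units;
}

/* Check the total against the limit with the alignment padding
 * included, so that aligning later cannot push it over. */
static bool
sketch_lds_fits(uint32_t esgs_lds_size, uint32_t gs_out_lds_size,
                uint32_t max_lds_size)
{
   return sketch_align_up(esgs_lds_size, SKETCH_LDS_OFFSET_ALIGN) +
          gs_out_lds_size <= max_lds_size;
}
```

For example, an ESGS size of 257 bytes is padded to 512, so a 100-byte
GS output region no longer fits under a 600-byte limit even though the
unpadded sum (357) would.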
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35529>
It has been implemented and works for PS outputs already.
The lowering callback needs 2 variants because we can't access
pipe_screen from it. The callback is rewritten to be more general.
We also need to do nir_clear_mediump_io_flag for any outputs we don't
lower because the mediump flag might prevent optimizations if it's not
cleared.
v2: fix si_nir_optim
Acked-by: Timur Kristóf <timur.kristof@gmail.com> (v1)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35529>
outputs_written_before_ps is used to determine kill_outputs, which removes
param exports, but non-zero GS streams are xfb-only and not exported.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35529>
The result of that function was overwritten by other code, so just remove it.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35529>