Add buffer copy/fill/update helpers using compute shaders. The driver
can select the optimal per-workgroup copy/fill/update size by specifying
a non-zero vk_meta_device::buffer_access::optimal_size_per_wg size.
If zero, the core will assume a 64-byte size (the usual cache-line size).
Buffer accesses will be done through SSBOs unless
vk_meta_device::buffer_access::use_global_address is true, in which
case the core will the buffer address using GetBufferDeviceAddress()
and pass that address as a push constant to the compute shader.
Image to buffer copies are always done through a compute shader. The
optimal workgroup size will be chosen based on
vk_meta_copy_image_properties::tile_size: the copy logic picks a
workgroup size matching the tile size, and aligns accesses on a tile.
The view format is selected by the driver. To optimize things on the
shader side, pick UINT formats (usually less work to do to pack data).
Buffer to image copies can be done done through the graphics pipeline
if needed (use_gfx_pipeline passed to vk_meta_copy_buffer_to_image()),
which is useful for vendor-specific compressed formats that can't be
written outside of the graphics pipeline. Drivers should normally prefer
compute-based copies when that's an option. Just like for image to buffer
copies, the workgroup size of compute shaders is picked based on the
image tile size, and the view format must be selected by the driver.
Image to image copies is just a mix of the above, with the driver being
able to select the pipeline type, as well as define the tile size and
view format to use. When using a compute pipeline, the workgroup size
will be MAX2(src_tile_sz, dst_tile_sz), and accesses will be aligned
on the selected reference image.
For compressed formats, the caller should pick an RGBA format matching
the compressed block size.
Co-developed-by: Boris Brezillon <boris.brezillon@collabora.com>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29333>
Will be needed for partial interleaved depth/stencil copies where the
image is treated as a color image with some components assigned to the
depth and others assigned to the stencil. If only one aspect is copies
using a graphics pipeline, we need to preserve components assigned to
the other aspect, and an easy way to do that is to tweak the color
attachment write mask.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Suggested-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29333>
The view extent needs to be divided by the block width/height/depth in
that case.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29333>
vk_image_buffer_range() returns the buffer range needed for an
image <-> buffer copy, which will help us initialize storage buffer
descriptors.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29333>
If we don't do that and the source is not 32-bit we end up with a
bit_size mismatch when doing the ior operation.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29333>
VkAccessFlags2 and VkPipelineStageFlags2 being both 64-bit bitmasks
the mistake is harmless, but let's fix that anyway to avoid any
confusion.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30735>
Creating views of images using a different format should be possible
as long as the internal layout match. Pick the format of the view
rather than the original image format when creating texture planes
on v9.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30695>
On Valhall, we can store the layer index in PositionFIFO attributes and
have the primitives dispatched to the appropriate list in the tiler
context, which means we no longer have to issue N IDVS jobs when doing
layered rendering.
On the fragment shader side, we can pass the layer index through the
frame_argument field, which can be preloaded in r62-r63, so do that to
save a push uniform slot.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30695>
There's no varying outputs on a fragment shader, but we need to
initialize the varying inputs.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30695>
We don't necessarily want to allocate the root CS chunk upfront if we
don't know if there will be instructions emitted on the CS.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30695>
This way we can support decoding of descriptors that are passed through
context registers, which we will need for panvk, where the tiler/FB
descriptors come from the VkQueue object, and are passed to command
buffers.
Of course, that means we can only see the latest version of such
indirectly passed data, but that's already the case for most descriptors
that are used several times in a command buffer anyway.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30695>
Leads to invalid mappings when the selected register is not matching the
one hardcoded in pandecode_run_idvs().
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30695>
The resource table passed to the shaders needs to be aligned on 64-byte.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30695>
During disassembly triggered by PAN_MESA_DEBUG=trace,
the upper bits of the blend shader address are set from the passed
in frag_shader. However, this is 0 for some blend shaders. In this case,
skip the blend shader disassembly.
This fixes a failing assert at line 86 of panfrost/lib/genxml/decode.h.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30603>
Make pseudo instructions for the IR separate from real Bifrost and
Valhall instructions, which are kept in their own ISA.xml files.
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Acked-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30179>
Make the valhall ISA parser valhall.py have a functional interface
returning a tuple, rather than making users directly access variables
within it.
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Acked-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30179>
Updates the ISA.xml parser to be able to handle some of the constructs
from the Valhall ISA.xml (which differs in significant ways from the
Bifrost ISA.xml). The eventual intent is to avoid duplicating instructions
in the two files, although that isn't enabled in this patch.
The new features aren't used yet, that will be in a future commit.
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Acked-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30179>
We were using the first character of names to indicate the execution unit
('+' for add, '*' for fma). Change the ISA.xml file to have an explicit
`unit` attribute for instructions; this makes the XML more flexible
for future architectures and matches what the valhall ISA.xml does.
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Acked-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30179>
Forgot to remove those two entries when merging previous MR.
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Fixes: 091df61138 ("panvk: Skip blend descriptors when no fragment shader is present")
Acked-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30740>
ARL+ can dispatch indirect draws through the hardware.
Backport-to: 24.2
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30690>
ARL+ supports some form of indirect draws, instead of trying to mash
support for indirect draws across various generations, let's make things
cleaner by factoring out XI support into it's own function.
Backport-to: 24.2
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30690>
Xe2 allows us to program in a custom byte stride for indirect draws
Backport-to: 24.2
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30690>
Apply the same optimisation as ACO and AGX.
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Acked-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30484>
This trim vector srcs to the appropriate component count
based on the write mask.
This also should help with image store as the vector srcs
will be trimed according to the format if its known.
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30484>
Ensure we vectorize load/store when possible.
Also move lower pack after loop optimization.
This drastically reduce the shader size of
"dEQP-VK.graphicsfuzz.spv-stable-maze-flatten-copy-composite" and allow
it to pass instead of timing out but it might greatly help others.
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30484>
Previously, it was possible to have a texture operation marked as SKIP
while one of the dests was in use in conditional control flow.
If an helper thread was to execute that instruction, it would result
in an undefined value being used.
This fix
"dEQP-VK.graphicsfuzz.cov-nested-loops-sample-opposite-corners" where
helper threads would get stuck inside a loop depending on the result of
a TEXS_2D invocation.
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30484>
Will be used for DCE and helper invocations pass changes.
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30484>
Loop iterates viewports but for MaximumVPIndex we only need viewport
count and last stage that writes viewport.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30732>
mesa will expose GL_EXT_texture_compression_astc_decode_mode
extension if the cap is enabled by the driver.
Signed-off-by: Louis-Francis Ratté-Boulianne <lfrb@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30560>
These tests that may hit the 60s timeout in pre-merge jobs. They pass during full runs
with longer timeouts, so only skip them here.
Signed-off-by: Valentine Burley <valentine.burley@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30708>
These failures were previously fixed, but this was missed due to fractional runs.
The skips are no longer necessary either.
Add some flakes seen in various pipelines.
Signed-off-by: Valentine Burley <valentine.burley@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30708>