Pass the per-image-view minimum LOD clamp from the Vulkan runtime
(vk_image_view::min_lod) through pan_image_view into the Mali texture
descriptor's Minimum LOD field.
Mali v6+ hardware has per-texture-descriptor LOD clamp fields that
operate independently from the sampler's LOD clamps, so no shader
lowering or descriptor merging is needed.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39938>
Add an option to enable HSR Prepass.
It is currently disabled by default as it might cause performance
regressions for content that:
- Has very simple fragment work.
- Already does a ZS prepass.
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39615>
For sufficiently big images, tiled AFBC offers perf advantages
over linear AFBC. Keep using linear AFBC for images that are thin
and fall through to U-interleaved for even thinner images. Note
that interleaved 64k is skipped in this case, as such images
don't meet the minimum size criteria set by interleaved 64k's
test_props.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39737>
The passed flags are always zero on the import paths:
- panfrost_bo_import
- panvk_AllocateMemory
- panvk_GetMemoryFdPropertiesKHR
Fixes: 1c7793ea0b ("panvk: Advertise a HOST_CACHED memory type if we have WC maps")
Tested-by: Valentine Burley <valentine.burley@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39723>
Interleaved 64k should be better than U-interleaved for most
workloads so use it if we can and memory waste isn't too bad.
This also improves perf in cases when we can't use U-interleaved,
but can use interleaved 64k, such as BLOCK_TEXEL_VIEW_COMPATIBLE
images. Currently we'll end up picking linear, which is strictly
worse than interleaved 64k when it comes to perf.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39657>
Compressed formats cannot support storage operations on any Mali
generation:
- On Bifrost (v6-v7), the texture descriptor contains the compressed
format directly, and the hardware doesn't support storage operations
on compressed formats.
- On Valhall (v9+), storage operations would require
InternalConversionDescriptors, which cannot describe compressed
formats.
Storage operations on compressed formats don't make practical sense
anyway - each pixel write would require full block recompression.
Remove PAN_BIND_STORAGE_IMAGE from the FMTC macro used by all
compressed format definitions.
Fixes crashes in dEQP-VK.memory.zero_initialize_device_memory tests
that attempt to use compressed formats as storage images.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39658>
Match spec by using clean_tile_write_enable instead of
clean_pixel_write_enable for v6+ architectures.
Signed-off-by: Loïc Molinari <loic.molinari@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38422>
As opposed to pre-frame shaders 0 and 1, we can keep using the
INTERSECT mode for the post-frame shader when any of the clean
tile/pixel flags are set.
Signed-off-by: Loïc Molinari <loic.molinari@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38422>
Compute clean tile state once at FB descriptor emission and use the
cached results to set the clean tile/pixel write enable flag on the RT
and Z/S/CRC descriptors and to retrieve whether any of the clean tile
write flags is set.
This commit now also prevents setting the clean tile/pixel write
enable flag on descriptors when the associated attachment is
discarded.
Signed-off-by: Loïc Molinari <loic.molinari@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38422>
The previous commit calls rt_clear() and rt_clean_pixel_write()
directly from pan_emit_rt(). Move these function definitions
closer. This will also improve the diffs of upcoming commits.
Signed-off-by: Loïc Molinari <loic.molinari@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38422>
This simplifies the initialization of common flags by setting them
outside of the modifier handlers. This also makes it more consistent
with the depth/stencil config which uses the same technique.
Panfrost bitfield packed descriptors are most commonly pushed into
Non-Cacheable memory. For descriptors packed in more than one step
(e.g. to support various modifiers), it's important to avoid mixing
loads and stores on the same cacheline for best performance. This is
why pan_merge() is first called on stack allocated packed descriptors
before copying into a BO.
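A minimal sketch of the pattern described above, assuming a descriptor
of eight 32-bit words. DESC_WORDS, merge_words() and emit_desc() are
hypothetical names used for illustration only; they are not the
pan_merge() macro itself, whose exact signature isn't shown here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DESC_WORDS 8 /* hypothetical descriptor size, in 32-bit words */

/* OR the bits of one partially packed descriptor into another. */
static void
merge_words(uint32_t *dst, const uint32_t *src, unsigned count)
{
   for (unsigned i = 0; i < count; i++)
      dst[i] |= src[i];
}

/* Combine the common and modifier-specific parts on the stack, then copy
 * the result to the destination in one pass. The destination (possibly
 * non-cacheable memory) is only ever stored to, never loaded from. */
static void
emit_desc(void *bo_ptr, const uint32_t *common, const uint32_t *mod_specific)
{
   uint32_t tmp[DESC_WORDS];

   assert(bo_ptr != NULL);
   memcpy(tmp, common, sizeof(tmp));
   merge_words(tmp, mod_specific, DESC_WORDS);
   memcpy(bo_ptr, tmp, sizeof(tmp));
}
```

The read-modify-write of the merge happens entirely in tmp, so the
destination cacheline never sees a mix of loads and stores.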
Signed-off-by: Loïc Molinari <loic.molinari@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38422>
This only forces write-back of clean tiles when there's a clear.
Signed-off-by: Loïc Molinari <loic.molinari@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38422>
Clean tiles must actually be written back for AFBC buffers (color,
z/s) when either one of the effective tile size dimensions is smaller
than the superblock dimension. This commit fixes the current check
which compares the effective tile size to the superblock size.
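To illustrate the difference, here is a sketch of the two checks. It
assumes the old check compared total sizes (areas), which is one
reading of the description above; struct extent2d and both function
names are hypothetical and don't come from the Panfrost sources.

```c
#include <assert.h>
#include <stdbool.h>

struct extent2d {
   unsigned width, height;
};

/* Assumed form of the old check: compare areas. A 32x8 tile and a 16x16
 * superblock have the same area, so this returns false for them. */
static bool
needs_clean_tile_write_by_area(struct extent2d tile, struct extent2d sb)
{
   assert(sb.width > 0 && sb.height > 0);
   return tile.width * tile.height < sb.width * sb.height;
}

/* Per-dimension check: true as soon as the tile is narrower or shorter
 * than the superblock. */
static bool
needs_clean_tile_write(struct extent2d tile, struct extent2d sb)
{
   assert(sb.width > 0 && sb.height > 0);
   return tile.width < sb.width || tile.height < sb.height;
}
```

With a 32x8 effective tile and a 16x16 superblock, only the
per-dimension version reports that clean tiles need to be written back.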
Fixes: 762a0f4133 ("panfrost: Add the concept of render block")
Signed-off-by: Loïc Molinari <loic.molinari@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38422>
New traces can safely be added now that they are disabled by default.
Signed-off-by: Loïc Molinari <loic.molinari@collabora.com>
Reviewed-by: Ashley Smith <ashley.smith@collabora.com>
Acked-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39407>
Use the new Mesa CPU trace wrappers to give Panfrost traces a
category. For builds with either Perfetto, Gpuvis or sysprof enabled,
Panfrost traces must now be enabled at run-time through the PAN_TRACE
environment variable. This removes the CPU cost of traces when they
aren't used, lets users enable traces based on the categories they
need, and allows new traces to be added without worrying about
the CPU cost.
Signed-off-by: Loïc Molinari <loic.molinari@collabora.com>
Reviewed-by: Ashley Smith <ashley.smith@collabora.com>
Acked-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39407>
Add the PAN_TRACE_SCOPE() and PAN_TRACE_FUNC() wrappers based on
MESA_TRACE_SCOPE_IF() in order to associate a category to each trace
and let users select the set of tracing categories to enable at
run-time through the PAN_CPU_TRACE environment variable. This makes
Panfrost tracing opt-in and avoids the CPU cost of tracing by
default.
There are 3 categories for now:
- "lib" for the shared utilities
- "gl" for the Gallium driver
- "vk" for the Vulkan driver
Each of these categories is divided into subcategories so that
subsystems can easily be traced ("gl.csf" or "lib.kmod" for instance).
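As a rough sketch of how category/subcategory selection could work,
assuming a comma-separated list of enabled categories in which a parent
entry also covers its subcategories. The list syntax, the matching rule
and trace_category_enabled() are assumptions made for this example, not
the actual parser.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Returns true if `category` (e.g. "gl.csf") is covered by one entry of the
 * comma-separated `enabled` list. An entry covers a category when it equals
 * it or is one of its dot-separated parents ("gl" covers "gl.csf"). */
static bool
trace_category_enabled(const char *enabled, const char *category)
{
   assert(category != NULL);
   if (enabled == NULL)
      return false;

   const char *entry = enabled;
   while (*entry != '\0') {
      size_t len = strcspn(entry, ",");

      /* The entry must match a whole path component, not just a prefix. */
      if (len > 0 && strncmp(entry, category, len) == 0 &&
          (category[len] == '\0' || category[len] == '.'))
         return true;

      entry += len;
      if (*entry == ',')
         entry++;
   }
   return false;
}
```

A trace site in the CSF backend of the Gallium driver would then only
record when the value read from the environment covers "gl.csf".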
Signed-off-by: Loïc Molinari <loic.molinari@collabora.com>
Reviewed-by: Ashley Smith <ashley.smith@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39407>
And immediately implement it in terms of
DRM_FORMAT_MOD_ARM_INTERLEAVED_64K.
Also ban DRM_FORMAT_MOD_ARM_INTERLEAVED_64K for WSI in panfrost.
Normally, the modifier's test_props would take care of this, but as
panfrost doesn't use test_props, this has to be handled in
panfrost itself.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38986>
This replaces all full license headers with SPDX identifiers and
generally makes things more consistent. I've also dropped the few
remaining author tags. If someone wants to know who wrote a bit of
code, `git blame` is going to be way more accurate than author tags
anyway.
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Acked-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39397>
This is a little more manual (though it's actually less code) but it
gives us a lot more control and makes the whole flow nicer.
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39367>
Instead of making it explicitly about outputs, this switches it to
being a NIR version of LD_TILE. It means we have to do a bit of work in
NIR and add a builder helper but the end result is something much more
versatile.
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39367>
Texel buffers are currently described by a TextureDescriptor, which
leads to restrictive limits on size and alignment. These limits can be
avoided by using AttributeDescriptors + AttributeBufferDescriptors
instead.
This requires us to access texel buffers using attributes rather than
textures, which involves setting up AttributeDescriptors and
AttributeBufferDescriptors in their respective allocations, rather than
the previous TextureDescriptors in the texture allocation.
This is already done for images, so we simply place the texel buffer
attributes after the images and ensure the indexing is offset correctly.
Accessing a texel buffer thus becomes:
1. Get the buffer address and ConversionDescriptor with LEA_ATTR[_IMM]
2. Use LD_CVT to get the value
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38490>
It's a lot more explicit to just have an intrinsic for this than to
treat blend shaders as their own weird stage. Also, the new intrinsic
uses the same io_semantics as a fragment store so the back-end code is a
little easier to read because it now checks sem.dual_source_blend_index
instead of the generic load_input offset.
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39244>
The one non-trivial change here is that we're now using BLEND with a
constant descriptor instead of ST_TILE for MSAA blend shaders. However,
this shouldn't make any practical difference.
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39244>
We already optimize the case where the destination format does not
contain alpha. However, there are a few more cases around formats and
blend constants which we can optimize. In particular, float blending
doesn't support constants so we really want to check if the client hands
us a 0/1 constant.
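The 0/1 constant case can be sketched as follows. The enum and
fold_constant_factor() are hypothetical and only show the idea of
replacing a constant blend factor with a fixed ZERO or ONE factor when
the client-provided constant allows it.

```c
#include <assert.h>

enum blend_factor {
   FACTOR_ZERO,
   FACTOR_ONE,
   FACTOR_SRC_ALPHA,
   FACTOR_CONSTANT,
};

/* If the factor reads the blend constant and that constant is exactly 0 or
 * 1, return the equivalent fixed factor. Anything else is left untouched. */
static enum blend_factor
fold_constant_factor(enum blend_factor factor, float constant)
{
   assert(factor <= FACTOR_CONSTANT);

   if (factor != FACTOR_CONSTANT)
      return factor;
   if (constant == 0.0f)
      return FACTOR_ZERO;
   if (constant == 1.0f)
      return FACTOR_ONE;
   return factor;
}
```

Once folded, the equation no longer depends on a blend constant, so it
can be handled even where constants aren't supported.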
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39171>
This actually enables blending for 4 of the supported float formats.
Technically, RGB16F blending is possible as well, using RGBA16F
internally but we only support FORMAT_R16G16B16_SFLOAT for vertex
buffers so there's really no point. This eliminates a lot of blend
shaders and improves the performance of the 3DMark Wild Life benchmark
by about 5 FPS (7-8%) on my MediaTek Chromebook.
Reviewed-by: Aksel Hjerpbakk <aksel.hjerpbakk@arm.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39171>
Blend equations that work on float are treated a bit differently, hence
the new is_float on pan_blend_equation.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39171>
This is duplicated between the two drivers and about to get more
complicated.
Reviewed-by: Aksel Hjerpbakk <aksel.hjerpbakk@arm.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39171>
We all know who wrote a bunch of Panfrost code. No need to repeat this in a million
places, the copyright line is plenty.
In cases where there's a joint me & Italo/Eric/.. tag, I've left it alone to
respect others' potential wishes.
$ find . -type f -exec perl -i -p0e 's/ \*\s+\* Author[^\n]+\s+\*\s+Alyssa[^\n]+\n \*\// \*\//' \{} \;
v2: delete more tags (Boris).
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Boris Brezillon <boris.brezillon@collabora.com>
Acked-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39136>
The CTS issue for this was closed two months ago, so this should be
fixed now.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Ryan Mckeever <ryan.mckeever@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38997>
While the MAX2 thing here is correct for some formats, it's not correct
for all; for instance R8_SNORM doesn't need 32-bits here.
This should enable some higher sample counts on some 8- and 16-bit formats
on some Mali GPUs.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38968>
We're going to have to do this from non arch-specific code, so let's
factor the meat out into a helper so we don't need to repeat the logic.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38968>
This matches better what we do in pan_emit_fbd, where we don't increase
the cbuf_offset variable for unused render-targets. This way we simply
make sure we *at least* can fit a dummy-RT (as per the HW spec), but
since we don't write to it we also don't need to give it dedicated
memory beyond that.
This also seemingly fixes a subtle bug where we don't deal with PLS if
there are no active render-targets.
Fixes: 9ec6197a0b ("panfrost: allocate tile-buffer for dummy render-targets")
Fixes: c15a43cce0 ("pan/lib: prepare for pixel local storage support")
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38968>
pan_kmod_flush_bo_map_syncs() queues CPU-sync operations, and
pan_kmod_flush_bo_map_syncs_locked() ensures all queued
operations are flushed/executed. Those will be used when we start
adding support for CPU-cached mappings.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36385>
Fail early in pan_kmod_bo_mmap() if PAN_KMOD_BO_FLAG_NO_MMAP is set.
This saves us a user -> kernel round-trip, but most importantly, it
allows us to enforce NO_MMAP at the userspace level on BOs that the
kernel would otherwise allow us to mmap() (mapping of imported BOs
requires extra DMA_BUF_IOCTL_SYNC calls we don't have).
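A sketch of the early-fail check, using a stand-in BO type. struct bo,
BO_FLAG_NO_MMAP (including its bit value) and bo_mmap() are hypothetical
and only mirror the behaviour described above; the kernel_calls counter
stands in for the real kernel round-trip.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BO_FLAG_NO_MMAP (1u << 0) /* hypothetical bit value */

struct bo {
   uint32_t flags;
   void *backing; /* what a successful mapping would return */
};

/* Refuse to map NO_MMAP buffers before reaching the kernel, so the flag is
 * enforced in userspace even if the kernel would accept the mapping. */
static void *
bo_mmap(const struct bo *bo, unsigned *kernel_calls)
{
   assert(bo != NULL && kernel_calls != NULL);

   if (bo->flags & BO_FLAG_NO_MMAP)
      return NULL;

   (*kernel_calls)++; /* stands in for the user -> kernel round-trip */
   return bo->backing;
}
```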
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36385>
Will be used to skip cache maintenance operations when the GPU is IO
coherent.
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36385>
Will be needed to let the frontend know if it can use cached CPU-mappings,
and it allows us to extend the set of supported flags without introducing
a new field if we ever have to.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36385>
We have a few hand-rolled instances of this which work well enough but it
gets more complicated as soon as we care about checking a major version
greater than 1. Add a helper to make this more robust.
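Such a helper could look like the sketch below. version_at_least() is a
hypothetical name and signature, not necessarily the one added by this
commit. A hand-rolled "major == 1 && minor >= N" check stops being
correct once major version 2 exists, which is what the comparison on
the major number handles here.

```c
#include <assert.h>
#include <stdbool.h>

/* True if (major, minor) is equal to or newer than (req_major, req_minor).
 * The minor number only matters when the major numbers are equal. */
static bool
version_at_least(unsigned major, unsigned minor,
                 unsigned req_major, unsigned req_minor)
{
   if (major != req_major)
      return major > req_major;

   return minor >= req_minor;
}
```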
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36385>
The frontend is going to query the device props anyway, so let's just
query it at device creation time and store it in pan_kmod_dev::props.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36385>
Unlike what the comment said here, V9 does in fact support a single
float-format, so let's allow that.
But also, V10 and later support FP16 formats, but this incorrect check
made that not work. Enable the FP16 formats also while we're at it. We
don't need any additional checks here, because the 16-bit unorm formats
were also added in V10, so util_format_any_to_unorm() does the right
thing here.
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Tested-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38848>
There's no reason why the S8_UINT check should be written in a different
way than the other checks here; let's make this consistent.
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Tested-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38848>
shader_realtime_clock requires a newer kernel version in order to enable
GLB_COUNTER_EN. This change adds a check for this kernel functionality.
Remove GL_EXT_shader_realtime_clock from extensions as this now depends
on kernel version.
Fixes: e9c2c324 ("panvk: enable VK_KHR_shader_clock")
Signed-off-by: Ashley Smith <ashley.smith@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37915>
The min/max funcs are designed to operate solely on the source and
destination colors directly, without any scaling or multiplication by a
factor.
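For reference, a per-channel sketch of this rule. The enum and
blend_channel() are hypothetical names; the point is only that the
factors are applied for add/subtract and ignored for min/max.

```c
#include <assert.h>

enum blend_func {
   FUNC_ADD,
   FUNC_SUBTRACT,
   FUNC_MIN,
   FUNC_MAX,
};

/* Blend a single channel. MIN and MAX use the raw source and destination
 * values; the factors are deliberately not applied to them. */
static float
blend_channel(enum blend_func func, float src, float src_factor,
              float dst, float dst_factor)
{
   switch (func) {
   case FUNC_ADD:
      return src * src_factor + dst * dst_factor;
   case FUNC_SUBTRACT:
      return src * src_factor - dst * dst_factor;
   case FUNC_MIN:
      return src < dst ? src : dst;
   case FUNC_MAX:
      return src > dst ? src : dst;
   }

   assert(!"unknown blend function");
   return 0.0f;
}
```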
Test: dEQP-GLES3.functional.fragment_ops.blend.* pass with FPK enabled
Cc: mesa-stable
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38824>
It existed entirely to save us a switch statement. It's very unlikely
that's worth pulling GENX API command stream stuff into the compiler.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38788>