Android Vulkan loader relies on aliased ANB image support to advertise
KHR_swapchain spec v69+. This change adds panvk_android_get_wsi_memory
helper based on deep copied (and sanitized) image create info to perform
deferred image initialization and ANB memory alloc.
Also we switch to use VK_USE_PLATFORM_ANDROID_KHR instead.
Acked-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36603>
No apps or tests have hit the spec corner case yet, but in theory they
could pass invalid offset and expect the impl to ignore it for wsi alias
binding. This change ensures the offset is zero, which aligns with
common wsi side binding as well as obeying the dedicated allocation
requirement.
Fixes: 187956bd51 ("panvk: adopt wsi_common_get_memory")
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Acked-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36603>
Implemented in the way without leaking concerns. The container
panvk_android_deferred_image will be freed up upon panvk_DestroyImage
with the strictly paired allocator obeying the spec.
Acked-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36603>
Before ANB spec v8, all ANB images are created and fully initialized
upon panvk_CreateImage.
Acked-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36603>
ANB and AHB image handling will be implemented on top of
VK_EXT_image_drm_format_modifier impl. To be specific, they will be
resolved to VkImageDrmFormatModifierExplicitCreateInfoEXT when the
backing gralloc image is available:
- ANB: upon ANB image creation
- ANB alias: upon binding image to memory
- AHB: upon dedicated memory import
So for ANB alias and AHB, the initial VkImage creation only needs to
allocate the image object while the create info has to be deferred till
later to help with actual image layouting.
Acked-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36603>
The branch offset needs to fit in 8 bits, and with the shr(3) modifier,
this means the max legal value is 2040. Let's verify that while packing.
CID: 1503283
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36724>
We already have a more robust helper for this, so let's use it rather
than open-coding the same.
While we're at it, return early on error for readability here. There's
no need to continue the logic in those cases.
CID: 1444074
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36724>
lseek can return a negative value on error here. While it's not likely
to happen, let's add some error-checking here to prevent bad behavior if
we're unlucky.
CID: 1648299
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36724>
If the shader contains zero words, we would try to use -1 as an index
into an array at the end of this function, which would be bad. But
shaders without any words are, uh, no point in disassembling in the
first place, so this seems like a theoretical bug in the first place.
However, since the only thing we *really* care about last_next_tag is if
is TAG_BREAK or not, let's initialize it to TAG_BREAK instead. This
means we'll avoid a bogus print at the end here, even if we ended up
calling this on an empty shader.
CID: 1458835
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36724>
r1.w should be written, so let's add an assert here instead of making
lcra_add_node_interference() overrun a buffer here.
CID: 1510007
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36724>
Allocations can fail, and since this is an optimization pass, let's just
skip the pass and let some other code deal with the OOM situation.
Fixes: 800a861431 ("pan/bi: Fuse FCMP/ICMP on Valhall")
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36724>
That was a legacy thing only needed for the original present_id/wait
suport.
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Acked-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36835>
Now that we have pan_image_test_modifier_with_format(), use it do our
native modifier check. This involves fully describing the YUV lowering
even for formats that don't have a native YUV-as-RGB fallback.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Acked-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35761>
It's not necessarily shorter, because of the pan_image_props
initialization but we're likely to omit details when doing the check.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Acked-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35761>
pan_image_test_props() checks all the image properties at once, and
pan_image_test_modifier_with_format() just a <modifier,format> pair.
This will allow us to use the check done in pan_mod instead of
duplicating the same set of rules in panvk/panfrost and possibly having
one that's ahead of the other.
There are still checks we can't do at the pan_mod/image level, like
anything involving format lowering, but that's okay.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Acked-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35761>
pan_mod_handler::supports_format() is not used yet, and will be too
limited for panvk, so let's provide a callback that will check all
image properties at once instead of just the <modifier,format> pair.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Acked-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35761>
This allows us to call pan_mod_get_handler() from static inline
functions defined in headers that are included from per-gen and
gen-agnostic source files.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Acked-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35761>
We can do some movement for UBO and SSBO after they are lowered in
preprocess.
We already do this in postprocess but this now also catch SSBOs as they
are lowered in postprocess.
Overall, reduce fills (less load from TLS) in fossils (excluding
parallel-rdp as it crash still):
Totals:
Instrs: 115242 -> 115046 (-0.17%); split: -0.20%, +0.03%
CodeSize: 1168896 -> 1164928 (-0.34%); split: -0.35%, +0.01%
Estimated normalized CVT cycles: 762.015625 -> 757.109375 (-0.64%); split: -0.75%, +0.11%
Estimated normalized Load/Store cycles: 12693.0 -> 12680.0 (-0.10%); split: -0.11%, +0.01%
Number of spill instructions: 358 -> 359 (+0.28%)
Number of fill instructions: 1600 -> 1584 (-1.00%)
Totals from 127 (15.82% of 803) affected shaders:
Instrs: 31753 -> 31557 (-0.62%); split: -0.73%, +0.12%
CodeSize: 335104 -> 331136 (-1.18%); split: -1.22%, +0.04%
Estimated normalized CVT cycles: 205.546875 -> 200.640625 (-2.39%); split: -2.78%, +0.40%
Estimated normalized Load/Store cycles: 3935.0 -> 3922.0 (-0.33%); split: -0.36%, +0.03%
Number of spill instructions: 124 -> 125 (+0.81%)
Number of fill instructions: 452 -> 436 (-3.54%)
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Olivia Lee <olivia.lee@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36776>
This should have no effect apart cleaning up NIR_DEBUG print outputs a
bit.
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Olivia Lee <olivia.lee@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36776>
We now have lower_texture_early and lower_texture.
lower_texture_early handle nir_lower_tex and (in the future) could handle
anything that is backend specific that need to happen before nir_lower_io.
lower_texture handles actual lowering of backend specific things that
must happen after nir_lower_tex and nir_lower_io.
This allows us to finally not run nir_lower_tex two times in panvk.
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Olivia Lee <olivia.lee@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36776>
Moving it out of there will allow us to shuffle and move API specific parts
out of there.
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Olivia Lee <olivia.lee@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36776>
As we are going to move texture and IO lowering, this split preprocess
functions in two, one handling preprocess the other postprocess.
The split is done right before lower_io and has no functional change for
now.
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Olivia Lee <olivia.lee@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36776>
We are going to move to run nir_lower_tex once and before
lower_descriptors.
To avoid needing to rerun it, let's never generate a sampler or texture
index in lower_descriptors when offset is present.
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Olivia Lee <olivia.lee@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36776>
Unused outside of pan/bi and also remove orphan bifrost_nir_lower_xfb
declaration.
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Olivia Lee <olivia.lee@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36776>
This should only run on frag shaders, let's group it the same way we
have it in midgard compiler.
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Olivia Lee <olivia.lee@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36776>
BITFIELD_MASK() returns a 32-bit unsigned integer, and Clang complains
if we assign it to a 16-bit unsigned integer without a cast. Let's add
that cast.
While we're at it, add an assert() to make it clear to the compiler that
the condition in BITFIELD_MASK() can be optimized away.
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org>
Tested-by: Yiwei Zhang <zzyiwei@chromium.org>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36606>
BITFIELD_MASK() returns a 32-bit unsigned integer, and Clang complains
if we assign it to a 16-bit unsigned integer without a cast. Let's add
that cast.
While we're at it, add an assert() to make it clear to the compiler that
the condition in BITFIELD_MASK() can be optimized away.
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org>
Tested-by: Yiwei Zhang <zzyiwei@chromium.org>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36606>
The enum pan_earlyzs is just enum mali_pixel_kill under a different
name, which was needed because the enum was missing from common.xml.
However, because pan_earlyzs_lut is used in files that are both included
with PAN_ARCH unset and set to values including values lower than 6, we
get issues with the way genxml/common_pack.h gets included, resulting in
the enum not being defined.
We don't really depend on the values for this, only on the size. So
let's just use unsigned values in the struct instead, to side-step the
issue.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Tested-by: Yiwei Zhang <zzyiwei@chromium.org>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36606>
While this was also using translate_zs_format() before the commit in
question, that's didn't lead to any real issues, because only a single
value was legal here before. While it's not entirely in-spec to use
other values, it seems the HW doesn't mind.
But when this logic was reworked, the typed field was used instead. This
lead to a compiler warning on Clang.
Let's correct this properly here, rather than papering over the compiler
warning.
Fixes: 7a763bb0a3 ("pan/genxml: Rework the RT/ZS emission logic")
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Tested-by: Yiwei Zhang <zzyiwei@chromium.org>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36606>
To generate a nir_component_mask_t, we should use nir_component_mask,
not BITFIELD_MASK()...
But we're also generating the same mask twice here, so let's just
store that to a variable and reuse the mask when shifting it while we're
at it.
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org>
Tested-by: Yiwei Zhang <zzyiwei@chromium.org>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36606>
Clear big bos cache if the the user calls vkResetCommandPool with
VK_COMMAND_POOL_RESET_RELEASE_RESOURCES_BIT.
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36713>
Cache TLS in the case of large spilling. For content that is spilling
large amounts of TLS this can bring substantial uplifts in
performance.
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36713>
On Mali, GPU timestamp cycle counts are mapped to the arch counter, and
so advance at the same rate as CNTVCT (with a fixed offset). The kernel
applies gradual NTP adjustments to CLOCK_BOOTTIME by modifying the rate
of the cycle->ns conversion slightly from the nominal frequency of the
clock, which causes it to drift from the GPU clock's ns values (which
just use the nominal frequency). On a rock5b, I measured this drift in
the 25-30µs/s range.
Perfetto's clock synchronization applies a fixed offset between each
clock snapshot, and so does not handle clocks with significantly
different rates and infrequent snapshots well. For panvk, we emit
snapshots once per second, and so the drift results in an error of
~25µs right before the next snapshot. This is significant for measuring
the latency of CPU<->GPU operations, and shows up as a sawtooth pattern
on the measured latency distribution over time.
CLOCK_MONOTONIC_RAW does not have the NTP adjustment, and so the only
source of drift is error in the shift/mult approximation that the kernel
uses for cycle->ns. This error is very small, and so by emitting CPU
trace events against CLOCK_MONOTONIC_RAW instead of CLOCK_BOOTTIME, we
can get much more accurate synchronization.
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34390>
Everything is currently using CLOCK_BOOTTIME, which is perfetto's
default, and matches the previous behavior. On some hardware, different
clocks may be better synchronized with the gpu clock.
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34390>
DRM syncobjs always let you wait repeatedly on them, so we can set the
flag in the core instead of having each driver override it once they try
to enable the emulated timeline semaphores.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36563>
util_perfetto_init() was called in some places, util_cpu_trace_init()
was called in other places, and some places used tracing without ever
calling either of them
util_cpu_trace_init() is now guaranteed to be called:
* on gallium screen create
* on VK instance create
thus no driver/frontend/etc should ever need to call this manually
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36628>
Stage and access masks often go together. Representing the two
with a single value lets us have neater code.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36176>
Currently the only thing this function ever does is clear AFBC
metadata when transitioning away from UNDEFINED.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36176>
Make get_cs_deps accumulate the dependencies. This is useful when
implementing layout transitions. Layout transitions away from
UNDEFINED are implemented with a compute dispatch sandwiched
in-between src and dst, specified in the image barrier. When
implementing device event wait and the event dependency needs a
layout transition, we can defer emitting the barrier that syncs
layout transition dispatches with dst until after we have emitted
all dispatches, avoiding sync between the said transition
dispatches.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36176>