Currently it's not easy to know which modifier will be picked for
an image and thus end up causing a layout difference. The next
commit causes us, for certain images, to choose interleaved 64k if
HOST_TRANSFER is not specified, but choose U-interleaved when it
is, causing a layout difference.
See https://gitlab.freedesktop.org/panfrost/mesa/-/issues/281 for
details.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39657>
Advertise the extension and enable the zeroInitializeDeviceMemory
feature. The panfrost and panthor kernel drivers uses drm_gem_shmem which
gets zeroed pages from the shmem subsystem, so memory is already
zero-initialized by default.
VK_IMAGE_LAYOUT_ZERO_INITIALIZED_EXT is treated the same as
VK_IMAGE_LAYOUT_UNDEFINED. Since panvk doesn't use image layouts
(layout transitions are no-ops), no special barrier handling is
needed for either layout.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39658>
Compressed formats cannot support storage operations on any Mali
generation:
- On Bifrost (v6-v7), the texture descriptor contains the compressed
format directly, and the hardware doesn't support storage operations
on compressed formats.
- On Valhall (v9+), storage operations would require
InternalConversionDescriptors, which cannot describe compressed
formats.
Storage operations on compressed formats don't make practical sense
anyway - each pixel write would require full block recompression.
Remove PAN_BIND_STORAGE_IMAGE from the FMTC macro used by all
compressed format definitions.
Fixes crashes in dEQP-VK.memory.zero_initialize_device_memory tests
that attempt to use compressed formats as storage images.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39658>
Some ALU instructions will likely end up being copy propagated in the
backend, which means they would not have any cost. This helps the
scheduler make better decisions for the new open-coded patterns
produced in NIR for extracts (i.e. unpack_2x16) with MR#39511.
With this (together with previous patches) we manage to produce similar
shader-db results as with the unpack_2x16 NIR extract opcodes that
MR#39511 will drop.
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39687>
We need this to produce optimal code in the backend for sequences
like this:
32 %10 = ushr %5.x, %9 (0x10)
16 %14 = u2u16 %10
32 %17 = f2f32 %14
With such code, our copy propagation pass will drop the u216 and
with this patch we will be able to drop the ushr too.
This pattern can show up for VK_KHR_16bit_storage when we successfully
vectorize 16-bit loads into 32-bit loads, but will become a lot more
common after MR#39511 lands, since that would also affect things like
16-bit TMU loads, which are more common.
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39687>
We only really use sub-32bit integers in conversions, so we can skip
clearing the MSB bits when we produce them by converting from larger types
(leaving these bits undefined) and only clear them when we convert from them
to larger types, since we don't have native opcodes to do these conversions
that would only access relevant bits, at least on Pi4. Also, document the
cases where we could do better for Pi5.
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39687>
There's a lot that has been fixed, but nobody has been paying attentions
to t720. Let's update the results.
In addition, we used to do skips here instead of flakes. Not sure why,
flakes works just fine.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Acked-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39711>
This was an oversight of VK_KHR_dynamic_rendering_local_read which has
been addressed by VK_KHR_maintenance10 which introduced new flags to
give more information to implementations.
The Vulkan spec says:
"VK_RENDERING_ATTACHMENT_INPUT_ATTACHMENT_FEEDBACK_BIT_KHR is
intended to give implementations similar information as a subpass
where an attachment could be used as both a color attachment and
input attachment. Some implementations require extra work to make
this scenario work beyond just considering the image layouts.
Implementations which have no such considerations may treat this
flag as a noop. The primary use case for this flag is to enable
feedback loops inside a single shader."
"Applications are encouraged to use
VK_RENDERING_LOCAL_READ_CONCURRENT_ACCESS_CONTROL_BIT_KHR if
maintenance10 is available and they use feedback loops with
VK_KHR_dynamic_rendering_local_read. Feedback loops are still
allowed when not using the rendering flag, but the performance
implication was an oversight in the original definition of
VK_KHR_dynamic_rendering_local_read."
Because it's clearly defined by the Vulkan spec, let's just pessimize
always to avoid relying on some shaders state which require to do very
late decompression passes. This will allow us to do more cleanups and
optimizations related to the framebuffer. Also note that DRLR is still
a niche feature.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39538>
Its no longer an error for depth and stencil formats to have invalid
accumulator format.
Fixes the following tests:
* dEQP-VK.api.info.image_format_properties.2d.optimal.d16_unorm
* dEQP-VK.api.info.image_format_properties.2d.optimal.d24_unorm_s8_uint
* dEQP-VK.api.info.image_format_properties.2d.optimal.d32_sfloat
* dEQP-VK.api.info.image_format_properties.2d.optimal.d32_sfloat_s8_uint
* dEQP-VK.api.info.image_format_properties.2d.optimal.s8_uint
* dEQP-VK.api.info.image_format_properties.2d.optimal.x8_d24_unorm_pack32
Backport-to: 26.0
Signed-off-by: Arjob Mukherjee <arjob.mukherjee@imgtec.com>
Tested-by: Icenowy Zheng <zhengxingda@iscas.ac.cn>
Reviewed-by: Simon Perretta <simon.perretta@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39626>
The only uses of the macro can be fatal assertions instead.
No point keeping it around, especially as it doesn't work with the ASSERTED
hint to suppress warnings either.
Signed-off-by: Simon Perretta <simon.perretta@imgtec.com>
Reviewed-by: Karmjit Mahil <karmjit.mahil@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39677>
As rpi5 can work with either 16k or 4k pages, instead of hardcoding the
pagesize just query the kernel.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Maíra Canal <mcanal@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39555>
Without SUBOPTIMAL, we'd generally end up picking a modifier which isn't
scanout capable, so direct scanout wasn't possible.
This allows direct scanout to work e.g. in Talos Principle.
v2:
* Also bail from wsi_x11_swapchain_query_dri3_modifiers_changed.
(Hans-Kristian Arntzen)
* Use use_modifiers helper function.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39679>
Now that the small/large pages race is fixed, we can safely enable it
back when the kernel side report 1.4.2 support.
Fixes: f3c53cf66b ("nvk: Disable large pages for now")
Signed-off-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39706>
Now that we can use info from the shaders to better determine which
possible interpretation of a descriptor is (or might be) used, we can
dump descriptor decoding by default without causing a massive spam of
incorrect/impossible descriptor decoding.
The old arg is repurposed to show all descriptors (ie. the previous
behavior).
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39636>
Add the descriptor index and current pm4 packet to the script hook.
Knowing the pm4 packet tells the callback whether it is a 3d draw or
compute shader, so it knows which shader stages to check. Having the
descriptor index lets the script check against stats extracted from
the relevant enabled shader stages.
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39636>
We were assuming that a non-numerical offset was a pm4/descriptor packet
payload. This didn't work for ir3_shader_stats which is defined more
like a "struct" (to match a 'C' struct).
Fixes: ebde70cdce ("freedreno/decode: Allow direct access to domain bitfield")
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39636>
Extract stats about the descriptors used by a shader. A later commit
will use this information to help filter the used descriptors and types.
Unfortunately a1.x usage throws a bit of a wrench into the gears, since
(like s2en) we don't know which tex/samp or even in some cases bindless
base is used. This could perhaps be improved to detect the commmon case
of an immed value loaded into a1.x.
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39636>
Makes it more clear which src arg is the descriptor, and makes it easier
for the disassembler to locate the src which references a descriptor.
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39636>
If the rendering state is inherited in the secondary, otherwise nothing
wait for the pending flushes after a decompression pass. One more
argument to stop delaying this.
Fixes
dEQP-VK.renderpasses.dynamic_rendering.partial_secondary_cmd_buff.local_read.*
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39678>
LTO is not supported with Mesa. It has caused random impossible-to-debug
bugs for a long time, and LTO bug reports are generally considered a
"WONTFIX please disable LTO instead". Let's make this clear to users
and packagers so they don't run into more weird issues that result in
inactionable reports.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39671>
Not enough tested on over Gen12 platforms.
Turns out to be not working on DG2, for example.
Cc: mesa-stable
Closes: #14449
Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39676>
Match spec by using clean_tile_write_enable instead of
clean_pixel_write_enable for v6+ architectures.
Signed-off-by: Loïc Molinari <loic.molinari@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38422>
As opposed to pre-frame shaders 0 and 1, we can keep using the
INTERSECT mode for the post-frame shader when any of the clean
tile/pixel flags are set.
Signed-off-by: Loïc Molinari <loic.molinari@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38422>
Compute clean tile state once at FB descriptor emission and use the
cached results to set the clean tile/pixel write enable flag on the RT
and Z/S/CRC descriptors and to retrieve whether any of the clean tile
write flags is set.
This commit now also prevents setting the clean tile/pixel write
enable flag on descriptors when the associated attachment is
discarded.
Signed-off-by: Loïc Molinari <loic.molinari@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38422>
Previous commit makes rt_clear() and rt_clean_pixel_write() calls
directly from pan_emit_rt(). Move these function definitions
closer. This will also improve diffs of coming commits.
Signed-off-by: Loïc Molinari <loic.molinari@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38422>
This simplifies the initialization of common flags by setting them
outside of the modifier handlers. This also makes it more consistent
with the depth/stencil config which uses the same technique.
Panfrost bitfield packed descriptors are most commonly pushed into
Non-Cacheable memory. For descriptors packed in more than one step
(e.g. to support various modifiers), it's important to avoid mixing
loads and stores on the same cacheline for best performance. This is
why pan_merge() is first called on stack allocated packed descriptors
before copying into a BO.
Signed-off-by: Loïc Molinari <loic.molinari@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38422>