We are having problems establishing connections to
gitlab.freedesktop.org, even though performance once the link is
established is perfectly fine (10+ MB/s)... so let's disable the
farm until we can figure it out.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39719>
This change only addresses the clear of one channel via the TQ for DS
formats. This is exercised by VK_KHR_depth_stencil_resolve in two ways:
resolve depth and clear stencil, or resolve stencil and clear depth.
When resolving, we need to propagate source and destination format if the
DS format is combined because we need either combination of both for cases
where the DSMERGE and PICKD flags are set.
- Resolve op
+ For combined DS formats
1. resolve the stencil from the source merging it with the depth of the
destination. Leave source depth unchanged.
2. resolve the depth from the source merging it with the stencil of the
destination. Leave the source stencil untouched.
+ For non-combined formats
1. we can use the source for all aspects / channels, this ensures the
size to blit the source to is compatible with the destination. Note
that the TQ doesn't require src/dst to be single channel formats.
- Non resolve op
+ Not part of this change.
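The merge described for combined DS formats can be sketched in plain C as follows. This is only an illustration of the intended result, not the TQ programming itself: the bit layout (depth in bits 0..23, stencil in bits 24..31) and the names merge_ds and resolve_aspect are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed packing for illustration: depth in bits 0..23, stencil in 24..31. */
#define DEPTH_MASK   0x00ffffffu
#define STENCIL_MASK 0xff000000u

/* Hypothetical selector for the aspect being resolved. */
enum resolve_aspect { RESOLVE_DEPTH, RESOLVE_STENCIL };

/* Take the resolved aspect from src and keep the other aspect of dst. */
static uint32_t merge_ds(uint32_t src, uint32_t dst, enum resolve_aspect aspect)
{
    assert(aspect == RESOLVE_DEPTH || aspect == RESOLVE_STENCIL);
    uint32_t take = (aspect == RESOLVE_DEPTH) ? DEPTH_MASK : STENCIL_MASK;
    return (src & take) | (dst & ~take);
}
```

Resolving stencil keeps the destination depth bits, and resolving depth keeps the destination stencil bits; the source word is never modified.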
Fix for deqp:
dEQP-VK.renderpass2.depth_stencil_resolve.*.*.d24_unorm_s8_uint.compatibility*
dEQP-VK.renderpass2.depth_stencil_resolve.*.*.d32_sfloat_s8_uint.compatibility*
Signed-off-by: Luigi Santivetti <luigi.santivetti@imgtec.com>
Co-authored-by: Leon Perianu <leon.perianu@imgtec.com>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39654>
In order to set DSMERGE, and eventually PICKD-epth, both the source and
the destination have to be combined D/S formats.
Remove tests that now pass.
Fix for deqp:
dEQP-VK.renderpass2.depth_stencil_resolve.*.*.d32_sfloat_s8_uint.*
dEQP-VK.renderpass2.depth_stencil_resolve.*.*.d24_unorm_s8_uint.*
Signed-off-by: Luigi Santivetti <luigi.santivetti@imgtec.com>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39654>
Fix loop condition in pvr_isp_ctrl_stream to reset fill_blit
when processing fill blits with sources.
Fix for deqp:
dEQP-VK.renderpass2.depth_stencil_resolve.image_2d_17_1.*.d24_unorm_s8_uint.*
dEQP-VK.renderpass2.depth_stencil_resolve.image_2d_49_13.*.d24_unorm_s8_uint.*
dEQP-VK.renderpass2.depth_stencil_resolve.image_2d_5_1.*.d24_unorm_s8_uint.*
dEQP-VK.renderpass2.depth_stencil_resolve.image_2d_8_32.*.d24_unorm_s8_uint.*
Signed-off-by: Leon Perianu <leon.perianu@imgtec.com>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39654>
Since d95423686f ("pan/format: Add PAN_BIND_STORAGE_IMAGE flag"), we
have a separate flag for PAN_BIND_STORAGE_IMAGE, and can now also
properly check for this.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Acked-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39686>
There's no good reason to keep these separated; they're the same
feature from a HW point of view.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Acked-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39686>
This is to make sure early culling related Wa_16020518922 is enabled
properly.
Cc: mesa-stable
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39712>
Interleaved 64k should be better than U-interleaved for most
workloads so use it if we can and memory waste isn't too bad.
This also improves perf in cases when we can't use U-interleaved,
but can use interleaved 64k, such as BLOCK_TEXEL_VIEW_COMPATIBLE
images. Currently we'll end up picking linear, which is strictly
worse than interleaved 64k when it comes to perf.
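A rough sketch of the selection order described above, in C. Everything here is hypothetical: the names, the idea of measuring waste as padding to a 64 KiB multiple, the caller-supplied max_waste_pct threshold, and the fallback to linear when the waste is too high are assumptions for illustration and not the driver's actual heuristic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout names standing in for the real modifiers. */
enum layout { LAYOUT_LINEAR, LAYOUT_U_INTERLEAVED, LAYOUT_INTERLEAVED_64K };

#define BLOCK_64K (64u * 1024u)

/* Prefer interleaved 64k when allowed and the padding waste is acceptable,
 * then U-interleaved, then linear. */
static enum layout pick_layout(uint64_t payload_bytes, bool can_64k,
                               bool can_u, unsigned max_waste_pct)
{
    assert(max_waste_pct <= 100);

    if (can_64k) {
        /* Assumed waste model: pad the payload to a 64 KiB multiple. */
        uint64_t padded = (payload_bytes + BLOCK_64K - 1) / BLOCK_64K * BLOCK_64K;
        uint64_t waste = padded - payload_bytes;

        if (waste * 100 <= padded * max_waste_pct)
            return LAYOUT_INTERLEAVED_64K;
    }

    return can_u ? LAYOUT_U_INTERLEAVED : LAYOUT_LINEAR;
}
```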
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39657>
Currently it's not easy to know which modifier will be picked for
an image and thus end up causing a layout difference. The next
commit causes us, for certain images, to choose interleaved 64k if
HOST_TRANSFER is not specified, but choose U-interleaved when it
is, causing a layout difference.
See https://gitlab.freedesktop.org/panfrost/mesa/-/issues/281 for
details.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39657>
Advertise the extension and enable the zeroInitializeDeviceMemory
feature. The panfrost and panthor kernel drivers use drm_gem_shmem, which
gets zeroed pages from the shmem subsystem, so memory is already
zero-initialized by default.
VK_IMAGE_LAYOUT_ZERO_INITIALIZED_EXT is treated the same as
VK_IMAGE_LAYOUT_UNDEFINED. Since panvk doesn't use image layouts
(layout transitions are no-ops), no special barrier handling is
needed for either layout.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39658>
Compressed formats cannot support storage operations on any Mali
generation:
- On Bifrost (v6-v7), the texture descriptor contains the compressed
format directly, and the hardware doesn't support storage operations
on compressed formats.
- On Valhall (v9+), storage operations would require
InternalConversionDescriptors, which cannot describe compressed
formats.
Storage operations on compressed formats don't make practical sense
anyway - each pixel write would require full block recompression.
Remove PAN_BIND_STORAGE_IMAGE from the FMTC macro used by all
compressed format definitions.
Fixes crashes in dEQP-VK.memory.zero_initialize_device_memory tests
that attempt to use compressed formats as storage images.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39658>
Some ALU instructions will likely end up being copy propagated in the
backend, which means they would not have any cost. This helps the
scheduler make better decisions for the new open-coded patterns
produced in NIR for extracts (i.e. unpack_2x16) with MR#39511.
With this (together with previous patches) we manage to produce similar
shader-db results as with the unpack_2x16 NIR extract opcodes that
MR#39511 will drop.
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39687>
We need this to produce optimal code in the backend for sequences
like this:
32 %10 = ushr %5.x, %9 (0x10)
16 %14 = u2u16 %10
32 %17 = f2f32 %14
With such code, our copy propagation pass will drop the u2u16 and
with this patch we will be able to drop the ushr too.
This pattern can show up for VK_KHR_16bit_storage when we successfully
vectorize 16-bit loads into 32-bit loads, but will become a lot more
common after MR#39511 lands, since that would also affect things like
16-bit TMU loads, which are more common.
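For reference, the value computed by the three-instruction sequence above can be written in C as below. This is a minimal sketch of the semantics only (take the upper 16 bits of a 32-bit word and convert them from half to single precision); the function names are made up for the example and it says nothing about how the backend encodes the result.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Convert IEEE 754 binary16 bits to a float (subnormals, inf and NaN included). */
static float half_to_float(uint16_t h)
{
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp = (h >> 10) & 0x1fu;
    uint32_t man = h & 0x3ffu;
    uint32_t bits;
    float f;

    if (exp == 0 && man == 0) {
        bits = sign;                               /* signed zero */
    } else if (exp == 0) {
        uint32_t e = 113;                          /* normalize a subnormal */
        while (!(man & 0x400u)) {
            man <<= 1;
            e--;
        }
        bits = sign | (e << 23) | ((man & 0x3ffu) << 13);
    } else if (exp == 0x1f) {
        bits = sign | 0x7f800000u | (man << 13);   /* inf or NaN */
    } else {
        bits = sign | ((exp + 112u) << 23) | (man << 13);
    }

    assert(sizeof(f) == sizeof(bits));
    memcpy(&f, &bits, sizeof(f));
    return f;
}

/* Literal translation of the NIR sequence. */
static float convert_high_half(uint32_t word)
{
    uint32_t shifted = word >> 16;        /* ushr %5.x, 0x10 */
    uint16_t narrow = (uint16_t)shifted;  /* u2u16 */
    return half_to_float(narrow);         /* f2f32 */
}
```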
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39687>
We only really use sub-32bit integers in conversions, so we can skip
clearing the MSB bits when we produce them by converting from larger types
(leaving these bits undefined) and only clear them when we convert from them
to larger types, since we don't have native opcodes to do these conversions
that would only access relevant bits, at least on Pi4. Also, document the
cases where we could do better for Pi5.
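The approach can be illustrated with a small C model in which every value lives in a 32-bit register. The helper names are hypothetical and this is only a sketch of the idea, not the compiler code: narrowing leaves the upper bits alone, and only the widening conversions pay for a mask or a sign extension.

```c
#include <assert.h>
#include <stdint.h>

/* Narrowing from 32 bits: no masking, bits above the narrow width are
 * left undefined (here: whatever was already in the register). */
static uint32_t narrow_from_32(uint32_t reg)
{
    return reg;
}

/* Unsigned widening: clear the undefined upper bits. */
static uint32_t widen_unsigned(uint32_t reg, unsigned bits)
{
    assert(bits == 8 || bits == 16);
    return reg & ((1u << bits) - 1u);
}

/* Signed widening: clear the upper bits, then sign-extend. */
static int32_t widen_signed(uint32_t reg, unsigned bits)
{
    assert(bits == 8 || bits == 16);
    uint32_t value = reg & ((1u << bits) - 1u);
    uint32_t sign = 1u << (bits - 1);
    return (int32_t)(value ^ sign) - (int32_t)sign;
}
```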
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39687>
There's a lot that has been fixed, but nobody has been paying attention
to t720. Let's update the results.
In addition, we used to do skips here instead of flakes. Not sure why;
flakes work just fine.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Acked-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39711>
This was an oversight of VK_KHR_dynamic_rendering_local_read which has
been addressed by VK_KHR_maintenance10 which introduced new flags to
give more information to implementations.
The Vulkan spec says:
"VK_RENDERING_ATTACHMENT_INPUT_ATTACHMENT_FEEDBACK_BIT_KHR is
intended to give implementations similar information as a subpass
where an attachment could be used as both a color attachment and
input attachment. Some implementations require extra work to make
this scenario work beyond just considering the image layouts.
Implementations which have no such considerations may treat this
flag as a noop. The primary use case for this flag is to enable
feedback loops inside a single shader."
"Applications are encouraged to use
VK_RENDERING_LOCAL_READ_CONCURRENT_ACCESS_CONTROL_BIT_KHR if
maintenance10 is available and they use feedback loops with
VK_KHR_dynamic_rendering_local_read. Feedback loops are still
allowed when not using the rendering flag, but the performance
implication was an oversight in the original definition of
VK_KHR_dynamic_rendering_local_read."
Because it's clearly defined by the Vulkan spec, let's just always
pessimize to avoid relying on shader state that requires very late
decompression passes. This will allow us to do more cleanups and
optimizations related to the framebuffer. Also note that DRLR is still
a niche feature.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39538>
It's no longer an error for depth and stencil formats to have an invalid
accumulator format.
Fixes the following tests:
* dEQP-VK.api.info.image_format_properties.2d.optimal.d16_unorm
* dEQP-VK.api.info.image_format_properties.2d.optimal.d24_unorm_s8_uint
* dEQP-VK.api.info.image_format_properties.2d.optimal.d32_sfloat
* dEQP-VK.api.info.image_format_properties.2d.optimal.d32_sfloat_s8_uint
* dEQP-VK.api.info.image_format_properties.2d.optimal.s8_uint
* dEQP-VK.api.info.image_format_properties.2d.optimal.x8_d24_unorm_pack32
Backport-to: 26.0
Signed-off-by: Arjob Mukherjee <arjob.mukherjee@imgtec.com>
Tested-by: Icenowy Zheng <zhengxingda@iscas.ac.cn>
Reviewed-by: Simon Perretta <simon.perretta@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39626>
The only uses of the macro can be fatal assertions instead.
No point keeping it around, especially as it doesn't work with the ASSERTED
hint to suppress warnings either.
Signed-off-by: Simon Perretta <simon.perretta@imgtec.com>
Reviewed-by: Karmjit Mahil <karmjit.mahil@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39677>
As rpi5 can work with either 16k or 4k pages, instead of hardcoding the
pagesize just query the kernel.
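One way to do such a query on Linux is sysconf(_SC_PAGESIZE), sketched below. Whether the driver uses this call or a different helper is not stated here, so treat the function names and the 4 KiB fallback as assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <unistd.h>

/* Ask the system for the page size instead of hardcoding it. */
static uint64_t query_page_size(void)
{
    long sz = sysconf(_SC_PAGESIZE);

    /* Assumed fallback for illustration if the query fails. */
    return sz > 0 ? (uint64_t)sz : 4096u;
}

/* Round a size up to a multiple of the (power-of-two) page size. */
static uint64_t align_to_page(uint64_t size, uint64_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return (size + page_size - 1) & ~(page_size - 1);
}
```

With this, a 16385-byte allocation rounds up to 20480 bytes on a 4k-page kernel and to 32768 bytes on a 16k-page kernel.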
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Maíra Canal <mcanal@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39555>
Without SUBOPTIMAL, we'd generally end up picking a modifier which isn't
scanout capable, so direct scanout wasn't possible.
This allows direct scanout to work e.g. in Talos Principle.
v2:
* Also bail from wsi_x11_swapchain_query_dri3_modifiers_changed.
(Hans-Kristian Arntzen)
* Use use_modifiers helper function.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39679>
Now that the small/large pages race is fixed, we can safely re-enable it
when the kernel side reports 1.4.2 support.
Fixes: f3c53cf66b ("nvk: Disable large pages for now")
Signed-off-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39706>
Now that we can use info from the shaders to better determine which
possible interpretation of a descriptor is (or might be) used, we can
dump descriptor decoding by default without causing a massive spam of
incorrect/impossible descriptor decoding.
The old arg is repurposed to show all descriptors (ie. the previous
behavior).
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39636>
Add the descriptor index and current pm4 packet to the script hook.
Knowing the pm4 packet tells the callback whether it is a 3d draw or
compute shader, so it knows which shader stages to check. Having the
descriptor index lets the script check against stats extracted from
the relevant enabled shader stages.
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39636>
We were assuming that a non-numerical offset was a pm4/descriptor packet
payload. This didn't work for ir3_shader_stats which is defined more
like a "struct" (to match a 'C' struct).
Fixes: ebde70cdce ("freedreno/decode: Allow direct access to domain bitfield")
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39636>
Extract stats about the descriptors used by a shader. A later commit
will use this information to help filter the used descriptors and types.
Unfortunately a1.x usage throws a bit of a wrench into the gears, since
(like s2en) we don't know which tex/samp or even in some cases bindless
base is used. This could perhaps be improved to detect the common case
of an immed value loaded into a1.x.
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39636>
Makes it more clear which src arg is the descriptor, and makes it easier
for the disassembler to locate the src which references a descriptor.
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39636>
This is needed if the rendering state is inherited in the secondary;
otherwise nothing waits for the pending flushes after a decompression
pass. One more argument to stop delaying this.
Fixes
dEQP-VK.renderpasses.dynamic_rendering.partial_secondary_cmd_buff.local_read.*
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39678>