if %cond {
%store_reg (%reg, %val)
}
can be skipped if no invocations are active. This did not take helper
invocations into account, meaning the value of %reg could be garbage for
helper invocations.
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31802>
The expansion of DUMP_CL is missing parenthesis, making the dumping of
descriptors incorrect.
Fixes: 3b69edf825 ("pan/genxml: Enforce explicit packed types on pan_[un]pack")
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33040>
After the SPIR-V generator is optimized to generate multiple constant
types, the shader compiler of Imagination proprietary drivers can no
longer correctly handle these shaders and will bail out.
Handle this as a driver quirk and revert to the old behavior with only
uint constants when IMG proprietary drivers are detected.
Signed-off-by: Icenowy Zheng <uwu@icenowy.me>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32342>
Right now, we have a problem when we flush draws inside a render pass
and we don't have enough information to re-emit the framebuffer/tiler
descriptors.
Turns out the only situations where this happens is when an occlusion
query end happens, but we shouldn't really flush the draws in that case.
What we should do instead is record the OQ in our command buffer, so we
can signal OQ availability when the fragment job is done.
In order to solve that, we add an OQ chain to the command buffer to
track OQs ending inside the render pass. We then walk this chain at
fragment job emission time to signal the syncobjs attached to each
query.
This also simplifies the whole occlusion query synchronization model:
instead of waiting for each syncobj individually, we now wait on
the iterators to make sure all OQs have landed. Thanks to this new
synchronization, we can batch OQ reset/copy operations and make the
command stream a lot shorter when big query ranges are copied/reset.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Benjamin Lee <benjamin.lee@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32973>
We have wrappers distinguishing staging registers from sratch registers,
so let's use cs_sr_reg64() here.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Benjamin Lee <benjamin.lee@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32973>
The SYSTEM scope triggers CPU interrupts we don't really need, so let's
use the CSG scope to avoid those. Note that the scope doesn't encode
the visibility aspect, meaning changes to the sync object with a CSG
scope will still be instantly visible to the CPU, it's just that the
CPU needs to poll the value to detect a change, which is basically what
we're doing for syncobjs attached to events/queries, so we're good.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Benjamin Lee <benjamin.lee@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32973>
Let's prevent clang-format from adding the semi-colon on a new line when
we use cs_{continue,break}();
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Benjamin Lee <benjamin.lee@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32973>
We already do that in the other cs_emit(b, BRANCH, I), so let's fix this
path too.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Benjamin Lee <benjamin.lee@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32973>
The increment was wrong, which ended up generating a lot more stores
than we need.
Fixes: bf05842a8d ("pan/cs: Add an event-based tracing mechanism")
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Benjamin Lee <benjamin.lee@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32973>
Descriptor sets which have `size` of `0`, such as a descriptor set
with just dynamic descriptors, weren't being freed in
`vkDestroyDescriptorPool()` since that relies on keeping track of
descriptor sets in the `entries` list. Keep track of them in the
`entries` list.
Fixes a memory leak in:
dEQP-VK.binding_model.descriptor_copy.compute.uniform_buffer_dynamic_5
Signed-off-by: Karmjit Mahil <karmjit.mahil@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32968>
We can disable FDM in the renderpass based on
`tu_render_pass_disable_fdm()` however a pipeline could have
been bound before starting the renderpass which had
`...RENDERING_FRAGMENT_DENSITY_MAP_ATTACHMENT_BIT_EXT` set, in
which case the shader is compiled with FDM support. We still
need to apply the patchpoints. Previously `patchpoints_ctx` was
created only based on whether FDM was enabled in the renderpass,
which was leading to the patchpoints being allocated with no
context so they were never getting freed. Now setup `patchpoint_ctx`
regardless of FDM being disabled or not.
Fixes memory leaks in some tests from:
dEQP-VK.dynamic_rendering.*_cmd_buff.fragment_density_map.*
e.g.
dEQP-VK.dynamic_rendering.partial_secondary_cmd_buff
.fragment_density_map.2_views .render.divisible_density_size
.1_sample.static_subsampled_1_2
Fixes: 05f96dd00f ("tu: Add core FDM patchpoint infrastructure")
Signed-off-by: Karmjit Mahil <karmjit.mahil@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32968>
When no attachments are used in a renderpass we should ignore the
clear values. It seems to be valid applications to still pass clear
values in such a case.
Fixes memory leaks in some tests from:
dEQP-VK.image.texel_view_compatible.graphic.basic.*d_image.texture_read.*
dEQP-VK.image.texel_view_compatible.graphic.extended.*d_image.texture_read.*
Signed-off-by: Karmjit Mahil <karmjit.mahil@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32968>
On Bifrost/Midgard, when an attribute has a non-zero divisors, the
attribute offset is tweaked to take the base_instance into account,
which implies we have to re-emit the attributes if the base instance
value changed.
Let's not bother tracking the last base instance and re-emit
unconditionally in that case, which is still better than what we had
before 3db963a135 ("panfrost: Emit attribs in
panfrost_update_state_3d() on bifrost/midgard") and fixes the regression
introduced by this commit.
Fixes: 3db963a135 ("panfrost: Emit attribs in panfrost_update_state_3d() on bifrost/midgard")
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Tested-By: Chris Healy <healych@amazon.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33010>
Although in some cases python 3.x will come without "3" as suffix,
"python3" should still be preferred because sometimes "python" is python
2.x instead.
Signed-off-by: Icenowy Zheng <uwu@icenowy.me>
Reviewed-by: Eric Engestrom <None>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33025>
It's quite wasteful on the command stream space to always emit all of the
PA_SHADER_ATTRIBUTES states, as many shaders use far less varyings than
the absolute max number. Limit the emission to actually used attributes.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32991>
Halti5 has no interpolation setting which will take into account the
rasterizer flatshade state. Force the interpolation qualifier to flatshade
for the color varyings when API level flatshade is enabled. Only set the
new bit in the shader key on halti5 GPUs to avoid generating unnecessary
shader variants on GPUs where the rasterizer flatshade state is properly
applied to the varyings.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32991>
Pull the interpolation qualifiers from NIR and translate them to
HALTI5_SHADER_ATTRIBUTES states.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32991>
Use memcpy to transfer the varying setup from stack arrays into the
compiled shader state. Resulting code is less verbose and the
compiler is smart enough to optimize away the function call anyway.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32991>
During loading of snapshot, there will be a single-threaded
decoder that aquires the same mReconstructionMutex, repeatedly.
Since the mReconstructionMutex is intentionally changed to
be non-recursive, we should not aquire it at the beginning of
load call; otherwise, we will be deadlock the decoder thread.
Reviewed-by: Marcin Radomski <dextero@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33018>
Replace tryLock / unlock with regular scoped lock now that the
"extra handles" have been moved out of VkReconstruction and into
the VkSnapshotApiCallInfo.
Switch to regular std::mutex and std::lock_guard.
Annotate mReconstruction with GUARDED_BY to start to get more
thread safety analysis.
Reviewed-by: Marcin Radomski <dextero@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33018>
... so that `VkDecoderGlobalState` can append additional information
needed for snapshotting. Specifically, `VkDecoderGlobalState` may create
additional boxed handles that are not visible directly in the API surface.
For example, `vkCreateDevice()` creates boxed handles for the `VkQueue`-s
and `vkCreateDescriptorPool()` creates boxed handles for pre-allocated
`VkDescriptorSets`. These boxed handles are not recoverable from the API
for `vkCreateDevice()` nor `vkCreateDescriptorPool()` directly. This was
previously worked around by just sticking the extra boxed handles in
`VkReconstruction::mExtraHandlesForNextApi` but this is not thread safe.
Instead, let's give `VkDecoderGlobalState` and `VkDecoderSnapshot` exclusive
access to individual `VkSnapshotApiCallInfo` objects.
Reviewed-by: Marcin Radomski <dextero@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33018>
Removes the additional saving of opcode and packet len as this is
all available from within the packet itself.
Removes the "trace" position from `VulkanMemReadingStream` as these
seemed to only be used for getting the packet start and packet size
but these are already available.
Reviewed-by: Marcin Radomski <dextero@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33018>
In CoherentMemory::subAllocate function, we should
also update offset to the correct values.
Test: boot with skiavk enabled and the textures
should not be corrupted
Reviewed-by: Marcin Radomski <dextero@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33018>
Original change from: kyoungwon.kim@bytedance.com at aosp/3310239
Moving the change into gfxstream and codegen.
The issue was found by pengzejie@bytedance.com.
Note that vkReadStream's BumpPool is effectively `freeAll`'ed by
`clearPool` calls. The same call for vkStream is not being called
while alloc is called here and there.
Reviewed-by: Marcin Radomski <dextero@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33018>