Here we change things to simply clone the entire hash table. This
is much faster than trying to rebuild it and is needed to avoid
slow compilation of very large shaders.
The Blender shader compile time in issue #9326 improves as folows:
251.29 seconds -> 151.09 seconds
The CTS test dEQP-GLES31.functional.ubo.random.all_per_block_buffers.20
improves as follows:
2.38 seconds -> 1.67 seconds
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24227>
There is no point doing an expensive clone of the copies if the
if-branch is empty.
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24227>
1. merge si_set_es_return_value_for_gs into si_llvm_es_build_end
2. stop return value when mono mode in which case GS use ES input as
input instead of ES output
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24204>
1. merge si_set_ls_return_value_for_tcs into si_llvm_ls_build_end because they
do the same job to return value
2. stop return value when mono mode with different thread count, in which case
TCS use LS input as its input instead of LS output
3. use si_insert_input_ret_float
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24204>
ACO use same args for merged shader stages, but vb desc input sgpr args
is not present when second stage of merged shader. In order to share
same shaders args, move it to last so other args have same index.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24204>
We only need it to merge LS/HS or ES/GS now, prolog and epilog have
been lowered in nir already. So we just need to handle two parts and
they are sure to be first and second stage of a merged shader.
This also remove the needs SGPRs must be before VGPRs, which is required
by following commits to move some SGPRs after VGPRs.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24204>
fopencookie is a glibc feature, so we can't use it on macOS (and
probably other libc's?). It's only used for the hypervisor interface,
though, so we can just make the hypervisor piece glibc-only while
otherwise fixing the wrap.dylib build.
Fixes: ee83453f69 ("asahi: Add a shared library interface for decode")
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24293>
sliceInterleaved may be true for layered images on gfx8. Such a htile
cannot be cleared with radv_clear_htile.
Fixes 24 failures in
dEQP-VK.renderpass2.depth_stencil_resolve.image_2d_16_64_6.* on GFX8.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23959>
For secondary command buffers, vn_cmd_begin_render_pass was only used to
track inherited render pass previously. So we clean it up.
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24103>
Reset cmd states during vkBeginCommandBuffer regardless of the
VK_COMMAND_POOL_CREATE_RESET_COMMAND_BUFFER_BIT for simplicity.
Cc: mesa-stable
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24103>
With PIPE_CAP_CLEAR_TEXTURE removed, we need to set clear_texture to NULL
on svga vgpu9 device so it can use the fallback path.
Fixes: a1eabeff66 ("gallium: remove PIPE_CAP_CLEAR_TEXTURE")
Reviewed-by: Neha Bhende <bhenden@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24264>
Lavapipe has switched to layer push descriptor support atop descriptor
updates internally since 12a7fc51c7, so
it must skip retrieving immutable samplers from the write info even if
the update call itself is blessed by the spec to not hit that case.
Fixes: 12a7fc51c7 ("lavapipe: Rework descriptor handling")
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24263>
Based on the previous commit, this isn't actually a bug and is expected
behavior. Turnip should already be handling it correctly for user
flushes, we just have to make sure to handle it for flushes we insert
ourselves in turnip and freedreno.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24162>
When I wrote this code, I was under the impression that at most one
context from each cluster could be executing at a time. This would mean
that we could treat clusters as pipeline stages and only insert a WFI if
there was a bubble where an earlier stage depends on the result of a
later stage.
This mental model was wrong, though. Experiments on a6xx show it's
possible for two contexts to be executing simultaneously, even though
there are only two contexts - register writing is just stalled until the
earliest-launched context finishes.
This means that the mental model is now much simpler. Any draw can, in
theory, execute in parallel with any previous draw, blit, flush, etc,
although it seems that flushes do wait for any earlier work to finish.
Clusters are mostly just an implementation detail that only matter in
some corner cases, like setting a non-context register (written in the
last cluster) that is used by an earlier cluster that can race ahead of
the write.
An example where this makes a difference is a fragment shader that
writes an image via stib followed by a blit from that same image.
Because both operations happen in the same cluster and use the same
cache, we wouldn't emit anything in the barrier, however actually we
still need to WFI.
This was getting worse on a7xx because later clusters now have 4
contexts, making it easier for draws to be executed in parallel. However
AFAICT it was already a problem on a6xx.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24162>
using limited number of bs buffers constraints the simultaneous
use of all available jpeg engines especially when count is lesser than
that of the available engines. make sure the number of buffers
available are more than or equal to the number of jpeg engines on the asic.
Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24240>