Based on the previous commit, this isn't actually a bug and is expected
behavior. Turnip should already be handling it correctly for user
flushes, we just have to make sure to handle it for flushes we insert
ourselves in turnip and freedreno.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24162>
When I wrote this code, I was under the impression that at most one
context from each cluster could be executing at a time. This would mean
that we could treat clusters as pipeline stages and only insert a WFI if
there was a bubble where an earlier stage depends on the result of a
later stage.
This mental model was wrong, though. Experiments on a6xx show it's
possible for two contexts to be executing simultaneously, even though
there are only two contexts - register writing is just stalled until the
earliest-launched context finishes.
This means that the mental model is now much simpler. Any draw can, in
theory, execute in parallel with any previous draw, blit, flush, etc,
although it seems that flushes do wait for any earlier work to finish.
Clusters are mostly just an implementation detail that only matter in
some corner cases, like setting a non-context register (written in the
last cluster) that is used by an earlier cluster that can race ahead of
the write.
An example where this makes a difference is a fragment shader that
writes an image via stib followed by a blit from that same image.
Because both operations happen in the same cluster and use the same
cache, we wouldn't emit anything in the barrier, however actually we
still need to WFI.
This was getting worse on a7xx because later clusters now have 4
contexts, making it easier for draws to be executed in parallel. However
AFAICT it was already a problem on a6xx.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24162>
using limited number of bs buffers constraints the simultaneous
use of all available jpeg engines especially when count is lesser than
that of the available engines. make sure the number of buffers
available are more than or equal to the number of jpeg engines on the asic.
Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24240>
add support to use variable number of bitstream buffers for decode
v2: remove the always true if condition (CI report)
Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24240>
For PS epilogs we have two paths, the first one is to pre-compile PS
epilogs at pipeline creation time, while the second one is to compile
PS epilogs on-demand when some dynamic states are used.
Binding the pre-compiled PS epilog to the cmdbuf state allows us to
remove one more pipeline dependency when recording cmdbufs (for shader
objects).
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24254>
Nothing produces them any more, so remove them from NIR. This massively reduces
the size of nir_src, which should improve performance all over.
nir_src size reduced from 56 bytes -> 40 bytes (pahole results on arm64, x86_64
should be similar.)
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24253>
anv no longer needs to track if the CFE state is valid since we ensure
that the state is valid at pipeline creation time.
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23934>
Fixes: 90a39cac87 ("intel/blorp: Emit compute program based on BLORP_BATCH_USE_COMPUTE")
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23934>
Non of the GPU models know at this time have hardware support to
retrieve the dimensions of a level of a texture. Do almost the
same as the binary blob and store the needed values as uniforms.
Passes dEQP-GLES3.functional.shaders.texture_functions.texturesize.*
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24217>
Non of the know etnaviv GPUs support this feature in hardware
and the binary blob provides sizes via uniforms too.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24217>
This is just a prep commit to keep all texture related
lowerings in one c file.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24217>
Since !24096 landed, we can just use ycbcr_info to get information
of an image of the P010 format.
Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24265>
Overallocating by 2 gprs for ugprs is a wild guess by me. It does make
sense though as each subgroup shares 64 ugprs and that's 2 per thread.
Signed-off-by: Karol Herbst <git@karolherbst.de>
Reviewed-by: M Henning <drawoc@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24261>
The API bits are already implemented in clover and rusticl and by
definition a CPU driver implements SVM.
This should allow anybody to work on proper SyCL/CHIP-SPV support for
rusticl running llvmpipe.
Signed-off-by: Karol Herbst <git@karolherbst.de>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24092>
This enables the DG2 render compression modifier in anv. When I tested
this against vkcube, I observed that the full resolve which happened at
the end of every frame was converted to a partial resolve, allowing the
framebuffer to retain compression.
According to Caleb Callaway's testing, enabling this modifier positively
impacts the FPS of the following game benchmarks:
- Strange Brigade.vk-g6 +12.78%
- Strange Brigade.dx12vk-g6 + 9.33%
- Shadow of the Tomb Raider.vk-g6-lx + 2.37%
- Dota 2 (replay Jul 2020).vk-g6 + 2.28%
Thanks to Felix Degrood for pointing out that Strange Brigade would
benefit from this optimization.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24120>
Replace the aux_usage field with two booleans: one for render
compression and one for media compression.
This more accurately describes how CCS_E is used on gfx12. On those
platforms, the FCV feature may be enabled or disabled, but ISL's
modifier table has been using the FCV aux-usage for every gfx12 render
compression modifier. Instead, set the newly-added render compression
boolean to true.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24120>