If the blit formats match and the resource formats match, then that's a
memcpy whether or not the blit's view of the resource matches the
resource's format.
Improves perf of portal-2-v2's last frame on zink+anv by 1.33212% +/-
0.302829% (n=5), where there's a blit that is viewing the RGBA8_UNORM
src/dst resources as RGBA8_SRGB.
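
The format part of that check can be sketched roughly as below. This is a
minimal illustration only: the `sketch_` enum, struct, and function names are
hypothetical stand-ins rather than Mesa's real types, and the other conditions
a real blit would have to satisfy (scissor, scaling, and so on) are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for pipe formats; not Mesa's real enum. */
enum sketch_format {
   SKETCH_RGBA8_UNORM,
   SKETCH_RGBA8_SRGB,
   SKETCH_R32_FLOAT,
};

struct sketch_blit_side {
   enum sketch_format resource_format; /* format the resource was created with */
   enum sketch_format view_format;     /* format the blit views the resource as */
};

/* No conversion is needed when both sides are viewed the same way and both
 * resources store their data the same way, even if view != resource. */
bool
sketch_blit_is_memcpy(const struct sketch_blit_side *src,
                      const struct sketch_blit_side *dst)
{
   assert(src && dst);
   return src->view_format == dst->view_format &&
          src->resource_format == dst->resource_format;
}
```

With both resources RGBA8_UNORM and both views RGBA8_SRGB, the predicate is
true; changing only one view format makes it false.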
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20594>
In file included from src/vulkan/wsi/wsi_common_drm.c:34:
include/drm-uapi/dma-buf.h:23:10: fatal error: 'linux/types.h' file not found
#include <linux/types.h>
         ^~~~~~~~~~~~~~~
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Acked-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16987>
From https://cgit.freedesktop.org/drm-misc/
9cc4853e4781bf0dd0f35355dc92d97c9da02f5d
Author: Antonio Borneo <antonio.borneo@foss.st.com>
Date: Tue Jun 7 23:31:44 2022 +0200
drm: adv7511: override i2c address of cec before accessing it
This version has the new sync_file import/export ioctls.
Acked-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16987>
Fixes assertion failures in piglit isinf-and-isnan, which uses a constant infinity,
which has an out-of-bounds mantissa (but the function contract says that's
fine and we just return something undefined.)
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20563>
This acts as a depth/stencil write. The AGX compiler checks outputs_written to
determine what conservative depth settings the driver needs. Nominally, this
should work: the original store_output(FRAG_RESULT_DEPTH) intrinsic causes the
DEPTH outputs_written bit to be set, so the metadata is still correct after
lowering store_output to store_zs_agx. However, there are a handful of places
that call nir_gather_info late, which *resets* the existing outputs_written
value and regathers, causing Asahi to use the wrong conservative depth settings
when shuffling NIR pass order and breaking gl_FragDepth.
To fix, handle store_zs_agx conservatively when gathering info so we don't have
to play games with the pass order or stashing info in a sideband.
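
A toy model of the idea, not the real nir_gather_info code: regathering starts
from an empty mask, so a lowered depth/stencil store has to contribute bits
itself. All `sketch_` names are hypothetical, and setting both the depth and
stencil bits is this sketch's assumption about what "conservatively" means.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum sketch_intrinsic { SKETCH_STORE_OUTPUT, SKETCH_STORE_ZS, SKETCH_OTHER };

/* Hypothetical output slots; the real FRAG_RESULT_* values differ. */
enum {
   SKETCH_RESULT_DEPTH = 0,
   SKETCH_RESULT_STENCIL = 1,
   SKETCH_RESULT_COLOR = 2,
};

struct sketch_instr {
   enum sketch_intrinsic op;
   unsigned location; /* only meaningful for SKETCH_STORE_OUTPUT */
};

uint64_t
sketch_gather_outputs_written(const struct sketch_instr *instrs, size_t count)
{
   uint64_t written = 0; /* regathering discards any earlier value */

   for (size_t i = 0; i < count; i++) {
      switch (instrs[i].op) {
      case SKETCH_STORE_OUTPUT:
         assert(instrs[i].location < 64);
         written |= UINT64_C(1) << instrs[i].location;
         break;
      case SKETCH_STORE_ZS:
         /* Assumption: the lowered store may touch either, so claim both. */
         written |= UINT64_C(1) << SKETCH_RESULT_DEPTH;
         written |= UINT64_C(1) << SKETCH_RESULT_STENCIL;
         break;
      case SKETCH_OTHER:
         break;
      }
   }
   return written;
}
```

Because the lowered store is handled during gathering, the result no longer
depends on whether the gather runs before or after the lowering pass.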
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20563>
Rely on the common address arithmetic optimizations. We don't need the
special formats for UBO loads anyway, so this is simpler and optimizes
out the ushr.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20558>
This works like store_global, but lets us optimize address arithmetic. Like
load_agx, it is formatted to match the hardware semantic. We don't make use of
any clever formats in this series, though.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20558>
Freedreno needs to know when an image has volatile or coherent access
flags in the shader.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20612>
Don't turn gl_access_qualifier coming from NIR back into GL enums,
losing information in the process.
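
To illustrate why the round trip is lossy, here is a small sketch. The
`SKETCH_ACCESS_*` bits are hypothetical stand-ins for gl_access_qualifier (the
real enum has more flags and different values); the three GL values are the
standard GL_READ_ONLY, GL_WRITE_ONLY, and GL_READ_WRITE constants.

```c
#include <assert.h>

/* Hypothetical stand-ins for a few gl_access_qualifier bits. */
enum sketch_access {
   SKETCH_ACCESS_COHERENT      = 1 << 0,
   SKETCH_ACCESS_VOLATILE      = 1 << 1,
   SKETCH_ACCESS_NON_READABLE  = 1 << 2,
   SKETCH_ACCESS_NON_WRITEABLE = 1 << 3,
};

/* GL image access enums (values from the OpenGL registry). */
#define SKETCH_GL_READ_ONLY  0x88B8
#define SKETCH_GL_WRITE_ONLY 0x88B9
#define SKETCH_GL_READ_WRITE 0x88BA

/* A GL-enum conversion can only express read/write direction, so the
 * coherent and volatile bits have nowhere to go. */
unsigned
sketch_access_to_gl(unsigned access)
{
   if (access & SKETCH_ACCESS_NON_WRITEABLE)
      return SKETCH_GL_READ_ONLY;
   if (access & SKETCH_ACCESS_NON_READABLE)
      return SKETCH_GL_WRITE_ONLY;
   return SKETCH_GL_READ_WRITE;
}
```

An image marked coherent and volatile maps to the same GL enum as one with no
qualifiers at all, which is the information loss the change avoids.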
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20612>
Implement the get_decoder_fence vfunc. Note that waiting for
completion in this driver happens in the end_frame vfunc itself.
Signed-off-by: Sil Vilerino <sivileri@microsoft.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20133>
Implement the get_decoder_fence vfunc by waiting on the fence
previously passed in the end_frame vfunc.
Signed-off-by: Daniel Almeida <daniel.almeida@collabora.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20133>
Implement the get_decoder_fence vfunc by waiting on the fence
previously passed in the end_frame vfunc.
Signed-off-by: Daniel Almeida <daniel.almeida@collabora.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20133>
Implement the get_decoder_fence vfunc by waiting on the fence
previously passed in picture->fence in the end_frame vfunc.
Signed-off-by: Daniel Almeida <daniel.almeida@collabora.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20133>
Implement the get_decoder_fence vfunc by waiting on the fence
previously passed in picture->fence in the end_frame vfunc.
Signed-off-by: Daniel Almeida <daniel.almeida@collabora.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20133>
Use the new get_decoder_fence vfunc to implement
vaQuerySurfaceStatus and vaSyncSurface in the va state tracker.
A pointer to the surface's fence is passed to the codecs before the
end_frame vfunc and the codec is responsible for allocating a fence on
command stream submission.
This fence is then queried on vaQuerySurfaceStatus and waited on in
vaSyncSurface.
Notably, neither function was implemented as per the VA-API docs for
PIPE_VIDEO_ENTRYPOINT_BITSTREAM.
Signed-off-by: Daniel Almeida <daniel.almeida@collabora.com>
Reviewed-by: Sil Vilerino <sivileri@microsoft.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20133>
Add PIPE_DEFAULT_DECODER_FEEDBACK_TIMEOUT_NS as a way to control
how long to wait for decoders, if this is supported.
Signed-off-by: Daniel Almeida <daniel.almeida@collabora.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20133>
Add a get_decoder_fence vfunc that can be used to query the status
of the previous decode job denoted by 'fence' given 'timeout'.
A pointer to a fence pointer can be passed to the codecs before the
end_frame vfunc and the codec should then be responsible for allocating
a fence on command stream submission.
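
The shape of that contract might look roughly like the toy model below. None
of these `sketch_` types are the real pipe_video_codec interface: the fence is
a plain flag, no real waiting happens, and the return convention (nonzero when
the job has completed) is an assumption made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_fence {
   bool signalled;
};

struct sketch_codec {
   /* Assumed convention: nonzero if the job finished within timeout_ns. */
   int (*get_decoder_fence)(struct sketch_codec *codec,
                            struct sketch_fence *fence,
                            uint64_t timeout_ns);
   struct sketch_fence **out_fence; /* caller's slot, set before end_frame */
   struct sketch_fence storage;     /* stands in for a real allocation */
};

/* On submission, the codec hands a fence back through the caller's slot. */
void
sketch_end_frame(struct sketch_codec *codec)
{
   codec->storage.signalled = false;
   if (codec->out_fence)
      *codec->out_fence = &codec->storage;
}

int
sketch_get_decoder_fence(struct sketch_codec *codec,
                         struct sketch_fence *fence,
                         uint64_t timeout_ns)
{
   (void)codec;
   (void)timeout_ns; /* toy model: report the current state, never block */
   assert(fence);
   return fence->signalled ? 1 : 0;
}
```

A caller would store a pointer to its own fence slot in `out_fence`, call
`sketch_end_frame`, and later poll or wait through `get_decoder_fence` with a
timeout of its choosing.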
Signed-off-by: Daniel Almeida <daniel.almeida@collabora.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20133>
Only construct the key on-demand if the PROG state is dirty. The newly
added "virtual" PROG_KEY state is used to know when other state that the
shader key depends on changes. Worth ~13% at drawoverhead test 0.
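
A minimal sketch of the dirty-bit pattern, under stated assumptions: the
`sketch_` context, the single `sample_shading` input, and the trivial key are
all hypothetical, and rebuilding when either PROG or PROG_KEY is set is this
sketch's simplification rather than the driver's exact rule.

```c
#include <assert.h>
#include <stdint.h>

enum {
   SKETCH_DIRTY_PROG     = 1 << 0, /* a shader was rebound */
   SKETCH_DIRTY_PROG_KEY = 1 << 1, /* "virtual": a key input changed */
   SKETCH_DIRTY_BLEND    = 1 << 2, /* unrelated state */
};

struct sketch_ctx {
   uint32_t dirty;
   uint32_t sample_shading; /* hypothetical state the key depends on */
   uint32_t cached_key;
   unsigned key_builds;     /* counts rebuilds, for illustration */
};

/* State setters that feed the key flag the virtual dirty bit. */
void
sketch_set_sample_shading(struct sketch_ctx *ctx, uint32_t value)
{
   ctx->sample_shading = value;
   ctx->dirty |= SKETCH_DIRTY_PROG_KEY;
}

/* The key is rebuilt only when something it depends on was flagged. */
uint32_t
sketch_get_key(struct sketch_ctx *ctx)
{
   const uint32_t mask = SKETCH_DIRTY_PROG | SKETCH_DIRTY_PROG_KEY;

   if (ctx->dirty & mask) {
      ctx->cached_key = ctx->sample_shading;
      ctx->key_builds++;
      ctx->dirty &= ~mask;
   }
   return ctx->cached_key;
}
```

Draws that only touch unrelated state, such as the blend bit here, reuse the
cached key and skip the construction cost.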
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20572>
A pretty significant amount of the time spent in fd6_draw_vbo goes to calling
memset to zero-init the on-stack struct. And a big part of the size
of the struct is fd6_state, of which we only need to initialize
num_groups to zero. This is worth a 15% improvement in drawoverhead
test 0 ("no state change").
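
The pattern can be sketched as follows. The struct layout, the field names
other than num_groups, and the array size are hypothetical; the point is only
that fields which are always written before being read need no zeroing.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MAX_GROUPS 32

struct sketch_state {
   uint32_t groups[SKETCH_MAX_GROUPS]; /* only [0, num_groups) is ever read */
   unsigned num_groups;
};

struct sketch_emit {
   unsigned draw_id;          /* hypothetical per-draw field */
   struct sketch_state state; /* the bulk of the struct's size */
};

/* Instead of memset(emit, 0, sizeof(*emit)), initialize only what is read
 * before it is written. */
void
sketch_emit_init(struct sketch_emit *emit, unsigned draw_id)
{
   emit->draw_id = draw_id;
   emit->state.num_groups = 0;
}

void
sketch_add_group(struct sketch_state *state, uint32_t group)
{
   assert(state->num_groups < SKETCH_MAX_GROUPS);
   state->groups[state->num_groups++] = group;
}
```

The trade-off is that any field added later must be explicitly initialized,
since nothing zeroes the struct as a whole anymore.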
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20572>
We already overwrote the entire descriptor in patch_fb_read_sysmem().
Doing the same in patch_fb_read_gmem() will simplify things for moving
the fb_read descriptor to the FS's bindless descriptor set.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20572>
Split out the build-up of CP_SET_DRAW_STATE packet, as we are going to
want to re-use this for compute state later when we switch to bindless
IBO descriptors.
While we are at it, drop the enable_mask param, as this is determined
solely by the group_id, and it is easier to maintain a table for the
handful of exceptions to ENABLE_ALL. The compiler should be able to
optimize away the table lookup.
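
One way such a table could look is sketched below. The group IDs, mask bits,
and which groups are exceptions are all made up for illustration; only the
idea of defaulting to ENABLE_ALL and listing the exceptions comes from the
description above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical group IDs. */
enum sketch_group_id {
   SKETCH_GROUP_PROG,
   SKETCH_GROUP_PROG_BINNING,
   SKETCH_GROUP_VBO,
   SKETCH_GROUP_COUNT,
};

/* Hypothetical enable bits. */
enum {
   SKETCH_ENABLE_BINNING = 1 << 0,
   SKETCH_ENABLE_DRAW    = 1 << 1,
   SKETCH_ENABLE_ALL     = SKETCH_ENABLE_BINNING | SKETCH_ENABLE_DRAW,
};

uint32_t
sketch_enable_mask(enum sketch_group_id id)
{
   /* Zero means "no exception", so unlisted groups get ENABLE_ALL. */
   static const uint32_t exceptions[SKETCH_GROUP_COUNT] = {
      [SKETCH_GROUP_PROG]         = SKETCH_ENABLE_DRAW,
      [SKETCH_GROUP_PROG_BINNING] = SKETCH_ENABLE_BINNING,
   };

   assert((unsigned)id < SKETCH_GROUP_COUNT);
   return exceptions[id] ? exceptions[id] : SKETCH_ENABLE_ALL;
}
```

When the group ID is a compile-time constant at the call site, a const table
like this gives the compiler the chance to fold the lookup away.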
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20572>
Same thing as https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20530:
newly added `src/vulkan/util/rmv/vk_rmv_tokens.h` (see !17331) includes
`src/util/` files, so anything that includes it needs `idep_mesautil`.
In file included from ../src/vulkan/util/rmv/vk_rmv_common.h:29,
                 from ../src/vulkan/runtime/vk_device.h:26,
                 from ../src/vulkan/wsi/wsi_common.c:31:
../src/util/simple_mtx.h:34:12: fatal error: valgrind.h: No such file or directory
   34 | # include <valgrind.h>
      |           ^~~~~~~~~~~~
compilation terminated.
Fixes: 5f30a7538b ("vulkan: Add RMV token definitions")
Signed-off-by: Eric Engestrom <eric@igalia.com>
Reviewed-by: Friedrich Vock <friedrich.vock@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20642>
Driver workarounds for game bugs can be easily broken. This one
shouldn't be applied to meta shaders and this restores previous logic.
Fixes: da32cbb5c6 ("aco: fix missing uses of MRT output flags")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20637>
The chance we'll miss anything from non-LTO is minimal, and having
both builds in one is too slow (usually the last job to finish).
Acked-by: Martin Roukala <martin.roukala@mupuf.org>
Acked-by: Eric Engestrom <eric@igalia.com>
Signed-off-by: David Heidelberg <david.heidelberg@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20623>
This directive needs a newline following it to render correctly.
While we're at it, fix up the incorrect indentation for one of the
descriptions.
Fixes: 0c58ad3e32 ("docs: use envvar directive more")
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20640>
NIR will automatically lower all of these opcodes unless the driver
specifies that it can handle them natively. We don't have any hardware
support for any of these opcodes though, so we just let NIR lower
all of them.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20639>
In Vulkan this is expected to work with single sample scenarios too.
Fixes new test in CTS main:
dEQP-VK.pipeline.monolithic.multisample.alpha_to_one.samples_1
cc: mesa-stable
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20634>
Particularly, this makes compilation stop as soon as we get a
valid shader and doesn't try to optimize spilling by trying
fallback strategies.
Might come in handy to reduce CTS execution time, for example,
dEQP-VK.ssbo.layout.random.8bit.all_per_block_buffers.6 goes from
43m46.715s down to 15m15.068s.
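
The difference between the two modes can be sketched with a toy strategy
loop. Everything here is hypothetical: the three example strategies, the
result struct, and the `stop_at_first_valid` flag merely stand in for
whatever option and strategy list the compiler really uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_result {
   bool valid;
   unsigned spills;
};

typedef struct sketch_result (*sketch_strategy_fn)(void);

/* Example strategies with fixed outcomes, for illustration only. */
struct sketch_result sketch_strategy_invalid(void) { return (struct sketch_result){ false, 0 }; }
struct sketch_result sketch_strategy_spilly(void)  { return (struct sketch_result){ true, 8 }; }
struct sketch_result sketch_strategy_tight(void)   { return (struct sketch_result){ true, 0 }; }

/* Returns the index of the chosen strategy, or -1 if none was valid. */
int
sketch_compile(const sketch_strategy_fn *strategies, size_t count,
               bool stop_at_first_valid)
{
   int best = -1;
   unsigned best_spills = 0;

   for (size_t i = 0; i < count; i++) {
      struct sketch_result r = strategies[i]();
      if (!r.valid)
         continue;
      if (best < 0 || r.spills < best_spills) {
         best = (int)i;
         best_spills = r.spills;
      }
      /* Fast mode accepts the first valid shader; otherwise keep trying
       * fallbacks until one compiles without spilling. */
      if (stop_at_first_valid || r.spills == 0)
         break;
   }
   return best;
}
```

In fast mode the spilling shader is accepted as soon as it compiles, which
trades shader quality for compile time.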
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20601>
There's no real reason not to: WDDM supports it. It's not that useful,
and I don't expect most apps to want to do it anyway, but it does
enable some useful synchronization scenarios.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16200>
The CPU copy is horribly slow, so let's hook up DXGI swapchains. Note
that we're still limited in terms of features. For instance, we can't
support more than 2 images per swapchain because of the DXGI present
ordering constraint. We also have to do an extra copy, because DXGI
only allows rendering to a resource on the queue that the swapchain
was created against, but swapchains in Vulkan don't have a queue.
The swapchain is bound to the window using DirectComposition aka
DComp. The DComp infrastructure is set up in the surface, and is
transitioned from one swapchain to the next when the new swapchain
begins presenting.
Unlike Wayland and X, there's no requirement that the compositor has
to release a surface before you can start rendering against it. However,
since we're now supporting the non-sw path, we do need to prevent apps
from rendering to a resource *while* the blit is occurring. We do this
by blocking for a fence while acquiring an image.
Co-authored-by: Jesse Natalie <jenatali@microsoft.com>
Acked-by: Daniel Stone <daniels@collabora.com>
Acked-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16200>