The CPU copy is horribly slow, so let's hook-up DXGI swapchains. Note
that we're still limited in term of features. For instance, we can't
support more than 2 images per swapchain because of the DXGI present
ordering constraint. We also have to do an extra copy, because DXGI
only allows rendering to a resource on the queue that the swapchain
was created against, but swapchains in Vulkan don't have a queue.
The swapchain is bound to the window using DirectComposition aka
DComp. The DComp infrastructure is set up in the surface, and is
transitioned from one swapchain to the next when the new swapchain
begins presenting.
Unlike Wayland and X, there's no requirement that the compositor has
to release a surface before you can start rendering against it. However,
since we're now supporting the non-sw path, we do need to prevent apps
from rendering to a resource *while* the blit is occurring. We do this
by blocking for a fence while acquiring an image.
Co-authored-by: Jesse Natalie <jenatali@microsoft.com>
Acked-by: Daniel Stone <daniels@collabora.com>
Acked-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16200>
The win32 swapchain can be backed by a DXGI swapchain, but such swapchains
are incompatible with STORAGE images (AKA UNORDERED_ACCESS usage in
DXGI). So, we need to allocate an intermediate image that will serve as
a render-target, and copy this image to the WSI image when QueuePresent()
is called. That's pretty similar to what we do for the buffer blit case,
except the image -> buffer copy is replaced by an image -> image copy.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16200>
Right now, the WSI core supports copying WSI images to a linear buffer
for implementations that want the result in this form. This being said,
most of the blit logic can be re-used for image to image copies, and that's
exactly what we'll need if we want to hook-up DXGI swapchains in the
win32 WSI implementation. So let's rename a few fields so we no longer
imply that images are copied to a buffer, and the use_buffer_blit boolean
an enum so we can extend the implementation to support image -> image
copies.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16200>
If all potentially supported surface types support present wait,
we can expose the extension.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Reviewed-by: Joshua Ashton <joshua@froggi.es>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19279>
When vkWaitForPresentKHR succeeds, we are guaranteed
that any dependent semaphores have been unsignalled.
In an explicit sync world, we are guaranteed this automatically by
having a present complete, since that event must follow a semaphore wait
completion.
However, if the swapchain image is implicitly
synchronized, the semaphore might technically not have been unsignaled
before the present complete event triggers.
Present IDs must be signalled in monotonic order, same as timeline
semaphores.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Reviewed-by: Joshua Ashton <joshua@froggi.es>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19279>
Using the wl_drm protocol we can check whether the compositor uses the
same GPU as the application.
This allows to run vulkan applications using a DG2 GPU with the
compositor using another card.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Simon Ser <contact@emersion.fr>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19224>
This flag indicates whether or not the legacy scanout flag is supported.
It defaults to true since that has been the default assumption for the
WSI code up until now.
On NVIDIA hardware, we can't render to linear so, if we don't have
modifiers, we want to automatically fall back to the blit path. In
theory, we could do this inside the driver but it's a giant pain and
much harder to ensure that the blit only happens as part of
vkQueuePresent().
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18826>
Some drivers such as lavapipe are 100% fine with using linear for WSI
images. Most HW drivers, however, would rather render tiled and eat a
blit.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17388>
This isn't a big deal for the current buffer paths because the required
alignment for PRIME is already higher than any driver advertises.
However, the SW path we're about to add won't have the PRIME requirement.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17388>
Instead of attempting to signal based on the memory object, use the new
DMA_BUF_IOCTL_EXPORT_SYNC_FILE to get a sync_file for the dma-buf and
use that to signal the semaphore or fence. Because this happens before
we transfer ownership back to the driver, the resulting sync_file should
only contain dma_fences from the compositor and/or display and shouldn't
be mixed up with the driver in any way. This gives us a real semaphore
and fence (as opposed to the dummy objects we've used int the past)
without over-synchronization.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4037>
Not all Vulkan implementations allows rendering to linear images, so in
order to support scanning out from these on Windows we might have to copy
through a buffer like we do in the PRIME path.
To avoid reimplementing the same, let's instead generalize the code a
bit so it doesn't have to specfy any PRIME-specific details.
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12210>
We're about to need including this header from a C++ source, so let's
wrap the whole thing in extern "C".
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14850>
The idea is to offer the driver a way to execute on a different queue
than the one the app is using for Present.
For instance, this could be used to make the DRI_PRIME blit asynchronous,
by using a transfer queue.
So instead of creating a command buffer to be executed on present using
the supplied queue, this commit uses an internal transfer queue to perform
the blit.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13959>
Rather than using 2 vfuncs, use one since we've unified the
synchronization framework in the runtime with a single vk_sync object.
v2 (Jason Ekstrand):
- create_sync_for_memory is now in vk_device
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14237>
This gets rid of the bespoke vfunc interface in the WSI code.
Eventually, I'd love to get rid of wsi_display_fence entirely but
this at least makes it a lot more palatable.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13427>
Get rid of most of the guts of the base class and just leave it as a
vtable. We can also drop some of wsi_display_fence. One functional
change here is that we're now using VK_SYSTEM_ALLOCATION_SCOPE_INSTANCE
which is more correct anyway because, thanks to the funky reference
counting we do with destroyed and event_received, its lifetime is tied
to the physical device, at best.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13427>
Use os_time_get_nano() instead. At this point, it's just a wrapper
around os_time_get_nano() anyway.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13427>
These changes implement vkRegisterDeviceEventEXT and detection of
monitor hotplug. Wsi launches a thread that listens to udev events and
signals the appropriate device fences when hotplug hapens.
v2: use wsi fences instead of syncobj api (Jason Ekstrand)
v3: refactor + cleanups, create thread on demand (Samuel Pitoiset)
v4: bring back syncobj support from initial version for radv
v5: make libudev dependency optional, check for poll errors (Simon Ser)
v6: change matching mechanism to use udev device node instead of path
v7: remove the matching mechanism
v8: fix a race with thread creation + use single mutex + other cleanups
(Jason Ekstrand)
Fixes:
dEQP-VK.wsi.display_control.register_device_event
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12305>
Drivers that import sync_fd to the wsi fences can use this to set file
descriptor for syncobj related calls. This fixes permission errors when
registering display/device events and importing sync_fd from driver
side.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12305>
By default, Mesa's X11 Vulkan WSI will wait for buffers to be ready
before submitting them to Xwayland when the swapchain is created
with the IMMEDIATE mode.
This is undesirable when the Wayland compositor already monitors
fences. A Wayland compositor may want to know the delay between
the buffer submition and the end of the GPU work, this is impossible
to measure if the WSI waits for the buffer to be ready before
submission.
Since most compositors don't monitor fences, let's introduce a driconf
option for this for now. We can reconsider once more compositors
have better support for fences.
Signed-off-by: Simon Ser <contact@emersion.fr>
Acked-by: Michel Dänzer <mdaenzer@redhat.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11290>
venus only needs to know if a WSI image is a prime blit source. In an
upcoming swapchain image rework, the prime blit destination is unknown
when the WSI image is created. Replace prime_blit_buffer by a bool.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12046>
Always chain wsi_image_create_info to VkImageCreateInfo, which indicates
that the image is a wsi image and can be transitioned to/from
VK_IMAGE_LAYOUT_PRESENT_SRC_KHR.
Add prime_blit_buffer to the struct as well. When set, it indicates the
prime blit destination and implies that the image is a prime blit
source.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10789>
Until now, the WSI code would rely on VK_EXT_pci_bus_info to check if the
WSI device matched the DRM device used for display. If they matched, then
it would allow direct presentation of the swapchain images, otherwise
it would fallback to a blit path through a linear buffer.
Unfortunately, this only works for PCI devices, so embedded GPUs would
always fail the test and fallback to the prime blit path, incurring in a
performance penalty due to the extra blit.
With this change driver backends have the possibility to attach a
callback to the WSI device to take control over that decision and have
freedom to make that choice according to their own platform particularities.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5917>
This adds an option to the WSI support for a software path to be
used with the vulkan sw drivers. There is probably some changes
that could be made to improve this and use present, for now
just use put image.
v2: roll out flag across all drivers (Eric)
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6082>
DOOM Eternal happily creates a swapchain with 2 images for IMMEDIATE.
This fixes a 10% performance issue with RADV.
Cc: 20.1 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5704>
Just because a modifier is returned for the given format, that doesn't
mean it works with all usages and flags. We need to filter the list by
calling vkGetPhysicalDeviceImageFormatProperties2.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3434>
The anv implementation still isn't quite complete, but we can at least
start using the structs from the real extension.
v2: Fix circular pNext list (Lionel)
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3434>
This lets us treat the implicit synchronization that we need for X11 and
Wayland like a semaphore. Instead of trusting the driver to somehow
figure out when that memory object needs to be signaled, we provide an
explicit point where the driver can set EXEC_OBJECT_WRITE and signal the
dma_fence on the BO. Without this, we have to somehow track inside the
driver when WSI buffers are actually used to avoid extra synchronization
dependencies.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This option strictly allocate the minImageCount given by the
application at swapchain creation.
This works around application that do not deal with the fact that the
implementation allocates more images than the minimum specified.
v2: Add values in default drirc (Bas)
v3: specify engine name/version (Lionel)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111522
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Cc: 19.2 <mesa-stable@lists.freedesktop.org>
The dri options are optional. When the dri options are not provided
the WSI will not use adaptive sync.
FWIW I think for xf86-video-amdgpu this still requires an X11 config
option, so only people who opt in can get possible regressions from this.
So then the remaining question is: why do this in the WSI?
It has been suggested in another MR that the application sets this.
However, I disagree with that as I don't think we'll ever get a
reasonable set of applications setting it.
The next questions is whether this can be a layer. It definitely
can be as implemented now. However, I think this generally fits
well with the function of the WSI. Furthemore, for e.g. the DISPLAY
WSI this is much harder to do in a layer.
Of course, most of the WSI could almost be a layer, but I think
this still fits best in the WSI.
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
This is common to all Vulkan drivers and all WSI.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
anv and radv both happened to already return 2^14 for these, but
querying the ICD is safer and will help if vdreno (or whatever it's
called) doesn't have the same max.
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This lets us avoid passing the DRM fd around all over the place and gets
us closer to layer utopia.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
This got missed during 1.1 enabling because it was defined as an
interaction between device groups and WSI and it wasn't obvious it was
in the delta.
The idea behind it is that it's supposed to provide a hint to the
application in a multi-GPU setup to indicate which regions of the screen
are being scanned out by which GPU so a multi-device split-screen
rendering application can render each part of the screen on the GPU that
will be presenting it and avoid extra bus traffic between GPUs. On a
single-GPU setup or one which doesn't support this present mode, we need
to do something. We choose to return the window size (or a max-size
rect) if the compositor, X server, or crtc is associated with the given
physical device and zero rectangles otherwise.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
We already have wsi_device and we know the instance allocator at
wsi_device_init time so there's no need to pass it into the physical
device queries. This also fixes a memory allocation domain bug that can
occur if CreateSwapchain gets called prior to any queries (not likely)
in which case the cached connection gets allocated off the device
instead of the instance.
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This was added as part of 1.1 but it's very hard to track exactly what
extension added it. In any case, we should implement it.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Dave Airlie <Airlied@redhat.com>
This extension provides fences and frame count information to direct
display contexts. It uses new kernel ioctls to provide 64-bits of
vblank sequence and nanosecond resolution.
v2: Remove DRM_CRTC_SEQUENCE_FIRST_PIXEL_OUT flag. This has
been removed from the proposed kernel API.
Add NULL parameter to drmCrtcQueueSequence ioctl as we
don't care what sequence the event was actually queued to.
v3: Adapt to pthread clock switch to MONOTONIC
v4: Fix scope for wsi_display_mode andwsi_display_connector allocs
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
v5: Adopt Jason Ekstrand's coding conventions
Declare variables at first use, eliminate extra whitespace between
types and names. Wrap lines to 80 columns.
Use wsi_rel_to_abs_time helper function to convert relative
timeouts to absolute timeouts without causing overflow.
Suggested-by: Jason Ekstrand <jason.ekstrand@intel.com>
v6:
Change WSI fence wait function to return VkResult instead of
bool. This makes the meaning of the return value easier to
understand, and allows for the indication of failure.
Also change the WSI fence wait function to take only absolute
timeouts and not provide an option for a relative timeout. No
users wanted relative timeouts, and it's simpler if that option
isn't available.
Terminate the DPMS property loop once we've found the property.
Assert that the fence hasn't already been destroyed in
wsi_display_fence_destroy.
Rearrange the event handler function order in the file to place
routines in an easier to find order.
Suggested-by: Jason Ekstrand <jason.ekstrand@intel.com>
v7:
Adapt to API changes for surface_get_capabilities
v8:
Use wsi->alloc in register_display_event so that callers
don't have to dig out an allocator for us.
v9:
Fix a few minor formatting issues
Suggested-by: Jason Ekstrand <jason.ekstrand@intel.com>
v10:
Use wsi->alloc if none provided in wsi_display_fence_alloc.
Now that drivers are expected to pass the allocator argument
straight through from the application, we need to check those
for NULL everywhere.
Signed-off-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This extension is required to support EXT_display_control as it offers a
way to query whether the vblank counter is supported. Internally, it is
implemented using a fake MESA extension which provides a chain-in to
GetSurfaceCapabilities2KHR which contains the one added field. This has
the advantage of reducing number of callbacks needed in the back-ends.
It also means that anything chained into GetSurfaceCapabilities2EXT
through VkSurfaceCapabilities2KHR::pNext so we only need to handle
crawling the pNext chain once per back-end.
Reviewed-by: Keith Packard <keithp@keithp.com>