Even if we're the first job on some queue, there may be no wait
semaphores but we still need to ensure things happen in-order. (See
the "Implicit Synchronization Guarantees" section of the Vulkan spec.)
The client can submit back-to-back command buffers with no semaphores
between them and it needs to adt the same as if there were a semaphore.
If job->serialize is set because of a barrier or something, we still
need to synchronize across HW queues by waiting on last_job_syncs.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15704>
In order to properly wait for a query to be complete, we need to first
wait for the end query job to flush through on the queue. Since query
end is always handled on the CPU, we can do this with a condition
variable. The 2s timeout is taken from ANV.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15704>
Vulkan requires that, once the device has been lost, you keep returning
VK_ERROR_DEVICE_LOST. We've got tracking for this in common code; it
just needs to be wired up.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15704>
Instead of having the CPU job execute the CSD job, put both jobs on the
list with the CPU job first which modifies the GPU job which gets kicked
off next. This gives the queue code more visibility into what types of
jobs are actually in the list. In particular, if an indirect compute
job is the last job in a batch buffer, it currently appears as if the
batch ends with CPU work which isn't true because it kicks off GPU work.
In that case, the last job on the list is now a GPU job, which better
matches reality.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15704>
Thix fixes two bugs. First, we stop leaking in/out fences with
multisync. Because the in_syncs and out_syncs parameters to
set_multisync were arrays and not pointers to arrays, the caller's
in_syncs and out_syncs pointers never got set and remained NULL so
multisync_free() always sees to NULL pointers and does nothing, leaking
both arrays. Not sure how this isn't showing up in the dEQP leak check
tests.
Second, the struct drm_v3d_multi_sync was in the scope of the then
clause of the `if (device->pdevice->caps.multisync)` so it goes out of
scope before the ioctl. This is, effectively, a use-after-free and,
depending on stack allocation details, may result in the multisync
extension struct getting stompped before the ioctl.
Fixes: ff8586c345 ("v3dv: enable multiple semaphores on cl submission")
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15512>
This was waiting for multisync support in our kernel interface so
we can wait on the actual imported payload of a semaphore rather
than the last job we submitted.
Reviewed-by: Melissa Wen <mwen@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15342>
Any thread we create may end up creating/submitting at least a
noop job, which is a shared object. Before multisync, this was
an issue only for the creation of the job itself, but with
multisync we can also modify parameters of the noop job
every time it is used (for signaling and serialization
configuration).
This change adds a noop mutex that all threads (main, wait and
master) take before submitting a noop job to ensure concurrent
access is not an issue.
Fixes flakyness observed with multisync with the following test:
dEQP-VK.api.command_buffers.secondary_execute_twice
Reviewed-by: Melissa Wen <mwen@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15342>
If a CPU job comes first in a command buffer with a semaphore wait operation
we need to wait on the CPU for the semaphore to be signaled before we process
the job.
We have been doing this with a WaitForIdle operation, but that only works
if the semaphore has been submitted for signaling from the same instance
of the driver. If we have an imported payload from another instance in our
semaphore however, waitForIdle may return too early since the submission
to signal the semaphore may have been submitted by a different instance
of the driver as well, and our wait for idle checks only know about this
instance submissions.
To fix this, we always submit a noop job from our instance that waits on
the semaphores on the GPU and follow up with WaitForIdle to wait for that
to complete.
Fixes test failures and/or assert crashes in:
dEQP-VK.synchronization.cross_instance.*
(when enabling support for semaphore imports)
Reviewed-by: Melissa Wen <mwen@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15342>
When we have a wait thread we can't ensure that the last job in the last
command buffer will be the one to signal semaphores because in this case
there is no gurantee that jobs from command buffers in the batch will be
submitted to the GPU in order, as those put in a wait thread will be
submitted later when the event wait operation is completed.
Instead, we need to wait for all outstanding wait threads to complete
and only then we should signal any semaphores or fences.
This also fixes a bug where the wait for events was the last job in
the command buffer. In this case, once the event wait is completed
we have no additional jobs to submit and thus would never try to
signal semaphores or fences.
Reviewed-by: Melissa Wen <mwen@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15342>
This is preparatory work to expose support for importing semaphores, which
was waiting on kernel multisync support.
When we implemented user-space multisync support we didn't handle
temporary fence/semaphore payload imports at all, so we fix that here.
Also, we add a has_temp boolean flag to identify the case where we have
a temporary payload in a fence/sempahore instead of just checking if
temp_sync is not 0. This is necessary to support semaphore imports
(for which we are not exposing support yet) because these need to drop
the temporary payload when they are used as wait semaphores in a submit,
but we can't destroy the underlying temp_sync at that point because it
needs to survive at least until the submit is finished, so instead
we use a flag to tell if we have an active temporary payload or not,
and we simply destroy any temp_sync on a semaphore destroy or any new
import on the same semaphore. We only strictly need this flag for
semaphores because fences drop the temporary payload when they are
reset, which happens in the CPU and can only be done if the GPU is not
using the fence, but we add the same flag for the fence for consistency.
Reviewed-by: Melissa Wen <mwen@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15342>
Semaphores info was stored as an info of event_wait cpu jobs and this
leads to mem leak when the same event_wait job in the same cmd buffer
batch was submitted more than once. As a result,
`dEQP-VK.api.command_buffers.record_simul_use_primary` fails due to a
double-free of sems_info.
In this patch, we no longer use v3dv_event_wait_cpu_job_info to store
semaphores from a submit info, since semaphores is related to a queue
submission and not to the event_wait job type. If we spawn a wait_thread,
we copy semaphores to an auxiliary struct (v3dv_wait_thread_info) that
will be used in wait_thread to get job and semaphores information. When
the spawned thread finishes, it releases the related
v3dv_wait_thread_info and the semaphores copy as well.
Fixes: d5bd18fb ("v3dv: store wait semaphores in event_wait_cpu_job_info")
Suggested-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Melissa Wen <mwen@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14736>
We track last submitted jobs by queue type. After all cmd buffer
batches have been submitted, we emit a noop job that waits all jobs
submitted to each GPU queue complete and signals the fence.
Signed-off-by: Melissa Wen <mwen@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13178>
With multiple semaphores support, we can use a GPU job to handle
multiple signal semaphores in the end of a cmd buffer batch. It
means, the last job in the last cmd buffer will be in change of
signalling semaphores as long as it meets some conditions:
1 - A GPU-job signals semaphores whenever we only have submitted
jobs for the same queue (there is no syncobj created for any
other type). Otherwise, we emit a noop job that waits on the
completion of all jobs submitted and then signals semaphores.
2 - A CPU-job is never in charge of signalling semaphores. We
process it first and emit a noop job that depends on all jobs
previously submitted to signal semaphores.
Signed-off-by: Melissa Wen <mwen@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13178>
With multiple semaphore support, we can improve the way we handle
wait semaphores considering different job types and cmd buffer
batch scenarios, that means:
- A GPU job depends on wait semaphores whenever it is the first job
submitted to a queue in this command buffer batch (the `first` flag
for the job's queue type is set).
- For the first CPU job, if there are wait semaphores, we should
wait for the CPU and GPU being idle to process the job.
Signed-off-by: Melissa Wen <mwen@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13178>
The order in which a GPU job is scheduled is guaranteed within the
same queue type (CL, TFU, CSD), but the order of completion of jobs
from different queues cannot be guaranteed. Since we have multiple
semaphores support now, we can track the completion of the last job
submitted to each queue and therefore better determine when gpu is
idle. We do it using an array of syncobj (last_job_syncs) for each
GPU queue (CL, TFU, CSD). With this, job serialization also become
more accurate. We also keep tracking the very last job submitted
(last_job_sync became an element of the last_job_syncs array as
V3DV_QUEUE_ANY) for the case we don't have multisync support.
To help in handling wait semaphores, we set a flag per queue to
indicate we are starting a new cmd buffer batch and a job submitted
to this queue will be the first.
Signed-off-by: Melissa Wen <mwen@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13178>
In addition to keep a copy of wait semaphores, extend
v3dv_submit_info_semaphores to hold a copy of signal semaphores too.
With a copy of wait and signal semaphores, we can enable GPU jobs to
handle more than one wait and signal semaphores.
By now, we don't change the way as we signalling semaphores when all
jobs complete, i.e., we still use the master thread to signal
semaphores. In this context, no GPU job is actually in charge of
signalling, but the support for multiple signal semaphores is done
here.
Signed-off-by: Melissa Wen <mwen@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13178>
Whenever v3d kernel-driver supports multisync extension, use it to
enable more than one semaphores in a tfu job.
Signed-off-by: Melissa Wen <mwen@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13178>
Whenever v3d kernel-driver supports multisync extension, use it to
enable more than one semaphores in cl submission. In CL, we have two
kind of job (bin and render), therefore, we need also to determine
the stage to sync, that means to add job dependencies/wait
semaphores.
Also, as we currently process all signal semaphores of a cmd buffer
batch together in the submit master thread (when the last wait
thread completes), there isn't now a situation in which GPU jobs
need to handle signal VkSemaphores.
Signed-off-by: Melissa Wen <mwen@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13178>
Instead of a boolean (sem_wait) in v3dv_event_wait_cpu_job_info,
that is used to determine wait condition for jobs put in a wait
thread, copy the wait semaphores data and store it as struct
v3dv_submit_info_semaphores. In the following patches we enable
multiple semaphores in GPU jobs, and therefore we need this data
to add wait semaphores as job dependencies for pending jobs
submitted from a wait thread.
Signed-off-by: Melissa Wen <mwen@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13178>
Instead of pass pSubmit to queue_submit_cmd_buffer, create a struct
v3dv_submit_info_semaphores to wrap semaphores data from VkSubmitInfo.
In the next commit, this struct will help to handle wait condition for
jobs submitted in a wait event context, since we need to hold this
data when handle wait events and pass it to queue_submit_job() called
from wait threads. The main goal is to allow multiple wait semaphores
in a job submission. Later, this struct will be extended to include a
copy of signal semaphores too.
Signed-off-by: Melissa Wen <mwen@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13178>
is_wait_thread is passed, but not actually used; and cpu_queue_handle_idle
is in charge to handle wait threads spawned before this one.
Signed-off-by: Melissa Wen <mwen@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13178>
We had some with unlikely, some without it. Let's just put unlikely to
all of them.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13071>
Dumps the command list, excluding the binary resources.
v2 (Juan):
- Make this option independent from `cl`
v3 (Iago):
- Rename option name
- Fix style issues
- Do not print BO ranges
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12803>
This fixes an issue related with testing this with a kernel with the
performance counters enabled: it introduces a "pad" field that in the CL
submission structure that is not initialized.
Fixes: ca13868098 ("drm-uapi: add v3d performance counters")
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12390>
When multiview is enabled, queries must begin and end in the
same subpass and N consecutive queries are implicitly used,
where N is the number of views enabled in the subpass.
Implementations decide how results are split across queries.
In our case, only one query is really used, but we still need
to flag all N queries as available by the time we flag the one
we use so that the application doesn't receive unexpected errors
when trying to retrieve values from them.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12034>
The idea would be to move all the code that uses cl_emit,
cl_emit_with_prepack, v3dx_pack, and any enum/structure definition
defined on the v3d pack headers.
All those methods would be defined on v3dvx_private (that would be the
equivalent to v3dx_context.h on v3d).
This commit includes the definition of v3dX for the current version
supported (42), a function calling wrapper, and the move for v3dv_queue
methods as a reference.
About the function calling wrapper, I took the idea from anv. We don't
have on v3d, but we added it because we foresee that we will need that
functionality more often. So without that macro, in order to call the
correct version of the method from the general code we would need to
do like we do on v3d, and doing something like this:
if (devinfo->ver >= 42)
return v3d42_pack_sampler_state(sampler, pCreateInfo);
else
return v3d33_pack_sampler_state(sampler, pCreateInfo);
So with the macro we can just do this:
v3dv_X(device, pack_sampler_state)(sampler, pCreateInfo).
Note that as mentioned, that is to be used on the general code, so a
runtime decision. If we are already on version-dependant code (so at
v3dx_queue for example) we just use v3dX, as at that point is a build
time decision.
Also, fwiw, I don't like too much the name of that macro, but I was
not able to think on a better one.
v2: merge job_emit_noop_bin and job_emit_noop_render (Iago)
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11310>
This provides most of the implementation, but there are some
things we cannot enable until we improve of kernel submit
interface, namely:
We don't expose capacity to export SYNC_FD, although we do
have the implementation in place. This requires that we
improve our kernel interface and event wait implementation
first so we can cover the corner case where the application
submits a command buffer that includes a VkCmdWaitForEvents
and tries to export a SYNC_FD from its signal semaphores or
fence before it the event is signaled and the command buffer
is sent to the kernel for execution in full.
Likewise, we can't currently import semaphores. This is because
our current kernel submit interface can only take one syncobj.
We have been working around this so far by waiting on the last
syncobj produced from the device whenever we had to wait on any
semaphores (which is obviously suboptimal already), but this
won't work as soon as we allow importing external semaphores,
as those could (and would typically) be produced from a
different device.
Once we address the kernel bits, we should come back and enable
SYNC_FD exports as well as semaphore imports.
Relevant CTS tests:
dEQP-VK.api.external.fence.*
dEQP-VK.api.external.semaphore.*
dEQP-VK.synchronization.cross_instance.*
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11105>
We can (and should) close the descriptor immediately after the import.
Gets the following CTS test to pass without requiring to increase limits
for open file descriptors:
dEQP-VK.synchronization.basic.binary_semaphore.chain
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11105>
Check if v3dv_ioctl() or v3dv_bo_map() fail, and print a proper error
message.
This check happens in the rest of the code, so it makes sense to apply
here too.
Fixes CID#1468162 "Unchecked return value (CHECKED_RETURN)".
v2:
- Fix message error (Iago)
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10380>
Dedicated BOs waste memory and are also a significant cause of CPU
overhead when applications use hundreds of them per frame due to
all the work the kernel has to do to page in all these BOs for a job.
The UE4 Vehicle demo was hitting this causing it to freeze and stutter
under 1fps.
The hardware allows us to setup groups of 16 queries in consecutive
4-byte addresses, requiring only that each group of 16 queries is
aligned to a 1024 byte boundary. With this change, we allocate all
the queries in a pool in a single BO and we assign them different
offsets based on the above restriction. This eliminates the freezes
and stutters in the Vehicle sample.
One caveat of this solution is that we can only wait or test for
completion of a query by testing if the GPU is still using its BO,
which basically means that we can only wait for all active queries
in a pool to complete and not just the ones being requested by the
API. Since the Vulkan recommendation is to use a different query
pool per frame this should not be a big issue though.
If this ever becomes a problem (for example if an application does't
follow the recommendation and instead allocates a single pool and
splits its queries between frames), we could try to group queries
in a pool into a number of BOs to try and find a balance, but for
now this should work fine in most cases.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10253>
Instead, wait for the specific queries being reset to
not be in use by the GPU.
This takes query pool resets in the UE4 Shooter demo from
50-60ms down to 0.5-2ms.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8554>
We were allocating twice the size we need for this array. This was
probably caused by a copy and paste error from the GL driver which
grows this dynamically as BOs are added to the job.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7733>
Used as reference Hyujun's commit
5d3fdbc52b, that does the same for
turnip.
This commit also replaces in several cases alloc for zalloc, and adds
checks on more Destroy methods if the object to be free is NULL or
not. Most of them were needed to avoid crashes/weird behaviour due
trying to use un-initialized data. Note that now that vk_object_free
iterates over a array, making it more against un-initialized or just
NULL data.
Additionally, using zalloc we can also remove some memset to 0. In
fact we needed to remove them, as if not, they would override the
vk_object_base object to 0 (the alternative would me doing a memset
computing a pointer offset, but that's is not needed as we can just
use zalloc).
v2:
* Call memset(0) on reused descriptor sets when calling
ResetDescriptorPool, not when reallocating them (Iago)
* Add null check when calling DestroyImageView (detected by a full CTS run)
v3: Fixed rebase conflicts after last meta copy/clear changes
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7627>
So far, we have only been supporting X11, so we assumed that we were running
inside X11 and would always try to get an authenticated fd from Xorg during
device initialization. While this works for desktop Raspbian, it is not
really correct and it is not what we want to do when we start considering
other WSIs.
Initially, one could think we can still do this by guarding the WSI code
under the proper instance extension check. This, however, doesn't work
reliably, as the Vulkan loader can call vkEnumerateDevices without enabling
surface extensions on the instance, which then can lead to us not
initializing any display_fd and failing with VK_ERROR_INITIALIZATION_FAILED,
which is not correct, so while we can try to acquire the display_fd here,
it might not always work, and we should definitely not fail initialization
of the physical device for that.
Instead, with this change we move acquisition of display_fd to swapchain
creation time where required extensions need to be enabled in the instance.
This was also suggested by Daniel Stone during review of a work-in-progress
implementation for the Wayland WSI.
There is a special case to consider though: applications like Zink that
don't use Vulkan's swapchains at all but still allocate images that they
intend to use for WSI. We need to handle these by checking that we have
indeed acquired a display_fd before doing any memory allocation for WSI,
and acquiring one at that time if that's not the case.
This change also removes the render_fd and display_fd fields from the
logical device (which we were copying from the physical device), because
now there is no guarantee that we have acquired a display_fd at the
time we create a logical device. Instead, we now put a reference to the
physical device on the logical device from which we can access these.
Finally, this also fixes a regression introduced with VK_KHR_display, where
if that extension is enabled but we are running inside a compositor, we would
acquire a display_fd that is not authenticated and try to use that instead
of acquiring an authenticated display_fd from the display server.
Fixes: b1188c9451 (v3dv: VK_KHR_display extension support)
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7546>
V3D doesn't provide any means to acquire timestamps from the GPU
so we have to implement these in the CPU.
v2: enable timestampComputeAndGraphics and set timestampPeriod (Piñeiro)
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7373>
Fix defect reported by Coverity Scan.
Assign instead of compare (PW.ASSIGN_WHERE_COMPARE_MEANT)
assign_where_compare_meant: use of "=" where "==" may have been intended
Fixes: ca86c7c65a ("v3dv: assert command buffers are executable when submitting to a queue")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7156>