Instead, wait for the specific queries being reset to
not be in use by the GPU.
This takes query pool resets in the UE4 Shooter demo from
50-60ms down to 0.5-2ms.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8554>
We were allocating twice the size we need for this array. This was
probably caused by a copy and paste error from the GL driver which
grows this dynamically as BOs are added to the job.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7733>
Used as reference Hyujun's commit
5d3fdbc52b, that does the same for
turnip.
This commit also replaces in several cases alloc for zalloc, and adds
checks on more Destroy methods if the object to be free is NULL or
not. Most of them were needed to avoid crashes/weird behaviour due
trying to use un-initialized data. Note that now that vk_object_free
iterates over a array, making it more against un-initialized or just
NULL data.
Additionally, using zalloc we can also remove some memset to 0. In
fact we needed to remove them, as if not, they would override the
vk_object_base object to 0 (the alternative would me doing a memset
computing a pointer offset, but that's is not needed as we can just
use zalloc).
v2:
* Call memset(0) on reused descriptor sets when calling
ResetDescriptorPool, not when reallocating them (Iago)
* Add null check when calling DestroyImageView (detected by a full CTS run)
v3: Fixed rebase conflicts after last meta copy/clear changes
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7627>
So far, we have only been supporting X11, so we assumed that we were running
inside X11 and would always try to get an authenticated fd from Xorg during
device initialization. While this works for desktop Raspbian, it is not
really correct and it is not what we want to do when we start considering
other WSIs.
Initially, one could think we can still do this by guarding the WSI code
under the proper instance extension check. This, however, doesn't work
reliably, as the Vulkan loader can call vkEnumerateDevices without enabling
surface extensions on the instance, which then can lead to us not
initializing any display_fd and failing with VK_ERROR_INITIALIZATION_FAILED,
which is not correct, so while we can try to acquire the display_fd here,
it might not always work, and we should definitely not fail initialization
of the physical device for that.
Instead, with this change we move acquisition of display_fd to swapchain
creation time where required extensions need to be enabled in the instance.
This was also suggested by Daniel Stone during review of a work-in-progress
implementation for the Wayland WSI.
There is a special case to consider though: applications like Zink that
don't use Vulkan's swapchains at all but still allocate images that they
intend to use for WSI. We need to handle these by checking that we have
indeed acquired a display_fd before doing any memory allocation for WSI,
and acquiring one at that time if that's not the case.
This change also removes the render_fd and display_fd fields from the
logical device (which we were copying from the physical device), because
now there is no guarantee that we have acquired a display_fd at the
time we create a logical device. Instead, we now put a reference to the
physical device on the logical device from which we can access these.
Finally, this also fixes a regression introduced with VK_KHR_display, where
if that extension is enabled but we are running inside a compositor, we would
acquire a display_fd that is not authenticated and try to use that instead
of acquiring an authenticated display_fd from the display server.
Fixes: b1188c9451 (v3dv: VK_KHR_display extension support)
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7546>
V3D doesn't provide any means to acquire timestamps from the GPU
so we have to implement these in the CPU.
v2: enable timestampComputeAndGraphics and set timestampPeriod (Piñeiro)
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7373>
Fix defect reported by Coverity Scan.
Assign instead of compare (PW.ASSIGN_WHERE_COMPARE_MEANT)
assign_where_compare_meant: use of "=" where "==" may have been intended
Fixes: ca86c7c65a ("v3dv: assert command buffers are executable when submitting to a queue")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7156>
If we have a semaphore wait the job cannot be started before the semaphore
has been signaled, so we need to wait before starting the binning stage.
Fixes CTS failures in:
dEQP-VK.synchronization.op.single_queue.binary_semaphore.*
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
So far we have been getting away with finishing the current job in the
presence of a pipeline barrier and relying on the RCL serialization,
but of course this is not always enough.
This patch addresses synchronization across different GPU units
(i.e. draw indirect after compute), as well as cases where we need to
sync before binning.
Fixes CTS failures in:
dEQP-VK.synchronization.op.single_queue.barrier.*
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
The hardware can't do this, so we need to record a CPU job that will
map the indirect buffer at queue submission time, read the dispatch
parameters and then submit a regular dispatch.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Both the API user and the driver may attempt to map a BO, possibly
only partially and using different ranges. This is a problem because
we only have a single map per BO. Fix this by making sure that when
a BO is mapped, we always map its entire range. This way if a BO
has been mapped before, we know that map is still valid no matter the
region we need to access now.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
The blit shader path for buffer to image copies is pretty bad,
since it needs to produce a tiled image from the linear buffer
prior to emitting the blit copy.
This patch adds a new preferential path where we implement the
copy using the CPU, similar to what the GL driver does for
texture uploads. This makes vkQuake2 at least 4x faster when
dynamic lights are enabled (which triggers dynamic texture
updates).
We also tested a GPU path where we use a shader that takes the
linear buffer as a UBO and copies directly from it. This also
shows a clear performance gain, but still worse than the CPU
implementation.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Asserting on them makes it easier to identify applications and tests that
try to use unimplemented features.
Also, there are some APIs that relate to optional features we don't
(or can't) support, such as sparse, so for these we just provide
the trivial implementation.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
There are basically two types of scenarios to consider:
- Secondary command buffers that run inside a render pass.
- Secondary command buffers that run outside a render pass.
For the former we want to record their commands into a binning command
list that we can branch to when executed into a primary command
buffer. This means this kind of command buffers don't spawn new jobs,
just the default one where they record the binning commands which
won't include the frame setup, which will be provided by the primary
they will be executed in.
For the latter we don't require anything special, we just record as
many jobs as we need as usual and link that job list from the primary
job list when executed.
This handles most scenarios except:
- vkCmdWaitForEvents
- VkCmdClearAttachments
Both of these can spawn new jobs inside a render pass, which is not
what we want for secondary command buffers. We will address this is
follow-up patches.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
This reverts a previous half-attempt at an implementation of events
using a BO to hold the event state, and provides a full
implementation. V3D doesn't have any built-in GPU functionality to
wait on any kind of events, so we need to implement this in the driver
an therefore we no longer need to use a BO for the event state.
Instead, we implement GPU waits by using a CPU job for the wait
operation that spawns a wait thread if the wait operation doesn't have
all its events signaled by the time it is processed. To implement the
semantics of the wait correctly, any jobs in the same command buffer
that come after the wait will not be emitted until the wait thread
completes.
If a submit spawns any wait threads for a command buffer we can't
signal any semaphores for it until all the wait threads complete and
we know that all the jobs for those command buffers have been
submitted. The same applies to the submit fence, if present.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
While very limited in scope, this might be the most efficient way to blit
when applicable. In fact, we might also want to use this for the image copy
commands when possible instead of the TLB.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
This is similar to the scenario where we have a submit without
any command buffers, even if we don't have any actual GPU work to do
we still might need to signal fences/semaphored and possibly wait on
previous jobs to finish, so we need to submit something to the kernel
to get all that done right.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
The design for queries in Vulkan requires that some commands execute
in the GPU as part of a command buffer. Unfortunately, V3D doesn't
really have supprt for this, which means that we need to execute them
in the CPU but we still need to make it look as if they happened
inside the comamnd buffer from the point of view of the user, which
adds certain hassle.
The above means that in some cases we need to do CPU waits for certain
parts of the command buffer to execute so we can then run the CPU
code. For exmaple, we need to wait before executing a query resets
just in case the GPU is using them, and we have to do a CPU wait wait
for previous GPU jobs to complete before copying query results if the
user has asked us to do that. In the future, we may want to have
submission thread instead so we don't block the main thread in these
scenarios.
Because we now need to execute some tasks in the CPU as part of a
command buffer, this introduces the concept of job types, there is one
type for all GPU jobs, and then we have one type for each kind of job
that needs to execute in the CPU. CPU jobs are executed by the queue
in order just like GPU jobs, only that they are exclusively CPU tasks.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
It is valid to submit with an empty list ofcommand buffers, however,
we still need to wait on the pWaitSemaphores provided and only signal
the pSignalSemaphores and fence once we have finished waiting on them
to honor the semantics of the submission.
Because waiting and signaling happens in the kernel, the easiest way
to do this is to submit a trivial no-op job to the GPU. To do this,
we need to refactor some of our code so that code that might have been
operating on a command buffer starts operating on a job instead, so we
can resuse most of our infrastructure to create the no-op job.
Additionally, because no-op jobs are created internally by the driver,
we are responsible for destroying them too. For this, we bind a fence
to each no-op job we submit and we test for completion of in-flight
no-op jobs (and destory them if completed) every time vkQueueSubmit
is called.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
We were not doing this and that could lead to the kernel refusing the
job if this happened to be gargabe. Make it zero, meaning that we don't
want to keep our bin jobs waiting for anything.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
This is only used when doing a clif/cle dump, but makes it far easier
to understand.
Most names are the same that the ones used at v3d (CL, tile_alloc,
TDSA), except those that on v3d were labelled as "resource", as right
now we don't have a resource uploader that englobes different
things. In fact, the good thing of not having that uploader is that
individual bos has a more accurate description of their purpose.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
As we make progress towards more complex submissions we will need to split
our command buffers into smaller executable units (jobs) that we can
submit indepdently to the kernel. This will be required to implement
pipeline barriers, split subpasses that have depedencies on previous
subpasses, split render passes that use more than 4 render targets, etc.
For now we keep things simple and we only keep one job as current
recording target in the command buffer, and we generate a new one
with every subpass or with any commands we see outside of a render pass
(only vkCopyImageToBuffer for now). In the future we probably want to
optimize this by merging subpasses into the same job when possible,
etc.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Add support for V3D_DEBUG=clif. Useful to compare clif_dumps from
vulkan small-programs and the equivalent opengl ones.
As we are here we expand clif_dump_packet wrapper to use
v3d42_clif_dump_packet if needed, as the vulkan driver would use that
packet version.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Values still doesn't take into account having vertex elements data,
but keeps some of that half-done code in comments. It would be better
to do that when we get an example using it.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>