If we have a semaphore wait the job cannot be started before the semaphore
has been signaled, so we need to wait before starting the binning stage.
Fixes CTS failures in:
dEQP-VK.synchronization.op.single_queue.binary_semaphore.*
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
So far we have been getting away with finishing the current job in the
presence of a pipeline barrier and relying on the RCL serialization,
but of course this is not always enough.
This patch addresses synchronization across different GPU units
(i.e. draw indirect after compute), as well as cases where we need to
sync before binning.
Fixes CTS failures in:
dEQP-VK.synchronization.op.single_queue.barrier.*
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
The hardware can't do this, so we need to record a CPU job that will
map the indirect buffer at queue submission time, read the dispatch
parameters and then submit a regular dispatch.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Both the API user and the driver may attempt to map a BO, possibly
only partially and using different ranges. This is a problem because
we only have a single map per BO. Fix this by making sure that when
a BO is mapped, we always map its entire range. This way if a BO
has been mapped before, we know that map is still valid no matter the
region we need to access now.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
The blit shader path for buffer to image copies is pretty bad,
since it needs to produce a tiled image from the linear buffer
prior to emitting the blit copy.
This patch adds a new preferential path where we implement the
copy using the CPU, similar to what the GL driver does for
texture uploads. This makes vkQuake2 at least 4x faster when
dynamic lights are enabled (which triggers dynamic texture
updates).
We also tested a GPU path where we use a shader that takes the
linear buffer as a UBO and copies directly from it. This also
shows a clear performance gain, but still worse than the CPU
implementation.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Asserting on them makes it easier to identify applications and tests that
try to use unimplemented features.
Also, there are some APIs that relate to optional features we don't
(or can't) support, such as sparse, so for these we just provide
the trivial implementation.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
There are basically two types of scenarios to consider:
- Secondary command buffers that run inside a render pass.
- Secondary command buffers that run outside a render pass.
For the former we want to record their commands into a binning command
list that we can branch to when executed into a primary command
buffer. This means this kind of command buffers don't spawn new jobs,
just the default one where they record the binning commands which
won't include the frame setup, which will be provided by the primary
they will be executed in.
For the latter we don't require anything special, we just record as
many jobs as we need as usual and link that job list from the primary
job list when executed.
This handles most scenarios except:
- vkCmdWaitForEvents
- VkCmdClearAttachments
Both of these can spawn new jobs inside a render pass, which is not
what we want for secondary command buffers. We will address this is
follow-up patches.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
This reverts a previous half-attempt at an implementation of events
using a BO to hold the event state, and provides a full
implementation. V3D doesn't have any built-in GPU functionality to
wait on any kind of events, so we need to implement this in the driver
an therefore we no longer need to use a BO for the event state.
Instead, we implement GPU waits by using a CPU job for the wait
operation that spawns a wait thread if the wait operation doesn't have
all its events signaled by the time it is processed. To implement the
semantics of the wait correctly, any jobs in the same command buffer
that come after the wait will not be emitted until the wait thread
completes.
If a submit spawns any wait threads for a command buffer we can't
signal any semaphores for it until all the wait threads complete and
we know that all the jobs for those command buffers have been
submitted. The same applies to the submit fence, if present.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
While very limited in scope, this might be the most efficient way to blit
when applicable. In fact, we might also want to use this for the image copy
commands when possible instead of the TLB.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
This is similar to the scenario where we have a submit without
any command buffers, even if we don't have any actual GPU work to do
we still might need to signal fences/semaphored and possibly wait on
previous jobs to finish, so we need to submit something to the kernel
to get all that done right.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
The design for queries in Vulkan requires that some commands execute
in the GPU as part of a command buffer. Unfortunately, V3D doesn't
really have supprt for this, which means that we need to execute them
in the CPU but we still need to make it look as if they happened
inside the comamnd buffer from the point of view of the user, which
adds certain hassle.
The above means that in some cases we need to do CPU waits for certain
parts of the command buffer to execute so we can then run the CPU
code. For exmaple, we need to wait before executing a query resets
just in case the GPU is using them, and we have to do a CPU wait wait
for previous GPU jobs to complete before copying query results if the
user has asked us to do that. In the future, we may want to have
submission thread instead so we don't block the main thread in these
scenarios.
Because we now need to execute some tasks in the CPU as part of a
command buffer, this introduces the concept of job types, there is one
type for all GPU jobs, and then we have one type for each kind of job
that needs to execute in the CPU. CPU jobs are executed by the queue
in order just like GPU jobs, only that they are exclusively CPU tasks.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
It is valid to submit with an empty list ofcommand buffers, however,
we still need to wait on the pWaitSemaphores provided and only signal
the pSignalSemaphores and fence once we have finished waiting on them
to honor the semantics of the submission.
Because waiting and signaling happens in the kernel, the easiest way
to do this is to submit a trivial no-op job to the GPU. To do this,
we need to refactor some of our code so that code that might have been
operating on a command buffer starts operating on a job instead, so we
can resuse most of our infrastructure to create the no-op job.
Additionally, because no-op jobs are created internally by the driver,
we are responsible for destroying them too. For this, we bind a fence
to each no-op job we submit and we test for completion of in-flight
no-op jobs (and destory them if completed) every time vkQueueSubmit
is called.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
We were not doing this and that could lead to the kernel refusing the
job if this happened to be gargabe. Make it zero, meaning that we don't
want to keep our bin jobs waiting for anything.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
This is only used when doing a clif/cle dump, but makes it far easier
to understand.
Most names are the same that the ones used at v3d (CL, tile_alloc,
TDSA), except those that on v3d were labelled as "resource", as right
now we don't have a resource uploader that englobes different
things. In fact, the good thing of not having that uploader is that
individual bos has a more accurate description of their purpose.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
As we make progress towards more complex submissions we will need to split
our command buffers into smaller executable units (jobs) that we can
submit indepdently to the kernel. This will be required to implement
pipeline barriers, split subpasses that have depedencies on previous
subpasses, split render passes that use more than 4 render targets, etc.
For now we keep things simple and we only keep one job as current
recording target in the command buffer, and we generate a new one
with every subpass or with any commands we see outside of a render pass
(only vkCopyImageToBuffer for now). In the future we probably want to
optimize this by merging subpasses into the same job when possible,
etc.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Add support for V3D_DEBUG=clif. Useful to compare clif_dumps from
vulkan small-programs and the equivalent opengl ones.
As we are here we expand clif_dump_packet wrapper to use
v3d42_clif_dump_packet if needed, as the vulkan driver would use that
packet version.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Values still doesn't take into account having vertex elements data,
but keeps some of that half-done code in comments. It would be better
to do that when we get an example using it.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>