Here dropping is_companion_rcs_cmd_buffer parameter of a few functions
that don't need this information, it just need the right
anv_cmd_buffer for each case.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26882>
Got a error state on DG2 with a jump to secondary. The secondary is
empty and padded with MI_NOOPs to workaround the CS prefetching.
According to the error state, the return jump address from the
secondary to the primary is 0x0. The ACTHD register value is 0x10, so
it seems that the command streamer indeed jumped to 0x0 and hanged on
a few dwords after that.
The return address should have been set edited by a previous
MI_STORE_DATA_IMM instruction. So it appears it did not complete in
time for the command stream to catch it. On Gfx12+ this can happend if
we do not set ForceWriteCompletionCheck.
This change also takes the opportunity to remove the padding MI_NOOPs
at the end of secondaries on Gfx12+ by using disabling the prefetching
just before jumping into secondaries and reenabling it at the
beginning of each secondary.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26665>
Batch bos are always allocated with ANV_BO_ALLOC_HOST_CACHED_COHERENT
so there is no need to do cflush calls.
But if we ever decide to change that anv_bo_needs_host_cache_flush()
will make sure cflush is called.
Outside of batch bos, this patch is also removing the
intel_flush_range() call from anv_QueuePresentKHR because
device->debug_frame_desc is offset of workaround_bo that is also
allocated as ANV_BO_ALLOC_HOST_COHERENT.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26457>
We need to wait for the batches to complete before we return the BOs
to the pool. We were previously doing this completely synchronously,
which made the code unnecessarily wait. Now we have a timeline syncobj
that signals completion of the previous BOs, so sometimes we check
where we are in the timeline and then return the BOs that we know are
unused.
This, in addition to the previous patch that made us wait for the
other syncobjs through the execbuf ioctl instead of through the CPU,
makes TR-TT batches at least an order of magnitude faster. Still, I
don't think we'll notice any changes in games's FPS as they don't bind
sparse resources that often.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25512>
For now it just wraps the bo and size, so there's really no value to
having it. In the next commit we'll add more elements to the struct.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25512>
When sparse images are being used, applications normally use
non-opaque binds and leave opaque binds just for the miptail part.
Since miptails are always at the end of the array layers, processing
the opaque binds after processing the non-opaque binds increases the
chance that anv_sparse_submission_add() will join the miptail bind
operation with the last non-opaque opreration, especially if the user
is trying to bind the last few non-miptail levels and the miptail in
the same vkQueueBindSparse opration.
In the real world this case does happen, so we're able to save a bind
operation every once in a while in Steam games.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25512>
Move waiting/signaling to the backends so we can fix each backend
separately.
As I write this patch the vm_bind backend is back to using synchronous
vm_binds so we can't pass syncobjs to the synchronous vm_bind ioctl
anymore. We'll need more discussions and possibly some rework before
we go back to asynchronous vm_binds. This commit should allow us to
fix the TR-TT backend in the next commit and leave vm_bind for later.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25512>
Don't pass it as a parameter when it's also part of a struct. Have to
touch 9 files just for that...
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25512>
If we're going to move syncobj waiting/signaling down to the backend
we're going to need a queue to signal as lost in case those operations
fail.
In some places of the stack we don't have a queue available, such as
when we're creating or destroying resources. For those, for vm_bind
cases we don't use the queue for anything so passing it as NULL is
fine. For TR-TT we are already using device->trtt.queue.
For TR-TT specifically this also means we're going to start using the
actual queues from the call stack instead of trtt->queue, but that
shouldn't make any difference since we only ever have one queue.
Still, this is more technigally correct.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25512>
Our ultimate goal is to have the backend functions deal with the wait
and signal syncobjs instead of waiting for them on the CPU inside
anv_queue_submit_sparse_bind_locked(). For that, we'll need waits and
signals parameters to be passed all the way to the backend functions
that actually make the submission, and this is what this patch does,
through struct anv_sparse_submission.
This patch just deals with passing the parameters to the functions,
nothing is using the new variables yet. There should be no functional
changes here. The goal here is to make code review easier.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25512>
Currently, a single vkQueueBindSparse() call may lead to multiple bind
calls in the backend (either a vm_bind ioctl or a command submission
that updates the TR-TT page tables). These operations can be quite
slow so it's better for us if we try to emit as few of them as
possible.
On top of that, this gives our "just extend the last operation's size
if possible" code a little more chance to act and save us real time.
Our ultimate goal here is to also pass submit->waits and
submit->signals to the backend so we can avoid doing CPU waits, so
having a single call to the backend helps simplify things a little
too, and we just created the structure to carry these extra pointers
forward.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25512>
TR-TT is a hardware feature supported by both i915.ko and xe.ko, which
means we can now finally have Sparse Resources on i915.ko and we also
have 2 options for xe.ko (and whatever is the best should be the
default).
In this patch we use batch commands to write the page tables and
forever keep them in device memory. We maintain a mirror of both the
L3 and and L2 tables because that helps us never having to read the
tables that are in device memory.
We still have some things to improve, but with this commit, workloads
that didn't work at all due to the lack of sparse resources should
at least run.
This is still all disabled by default in i915.ko, you can turn it on
by exporting ANV_SPARSE=1 before launching the applications. For
xe.ko, switch the default with ANV_SPARSE_USE_TRTT=1.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25512>
In case we run out of space, all the parts of the driver that rely on
this should deal with failure. The helpers will set the batch in error
state so that it cannot be submitted by the application.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25955>
When the number of draw calls is very large, instead of allocating
large amounts of batch buffer space for the draws, use a ring buffer
and process the draw calls by batches.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/8645
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Tested-by: Felix DeGrood <felix.j.degrood@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25361>
Just tyding things a bit since we're about to add more.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Tested-by: Felix DeGrood <felix.j.degrood@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25361>
I don't know this fixes anything but I noticed the generated draws
jump into addresses slightly different from CPU generated jumps.
After checking the genxml, I noticed MI_BATCH_BUFFER_START "Batch
Buffer Start Address" fields have different sizes in Gfx8 & Gfx9+.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25705>
Private memory for BVH builds doesn't need to be mapped on the host,
it's purely for use by the GPU. So it can be put into a different
buffer pool that can put into VRAM only buffers.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25570>
Blitter command streamer supports MI_FLUSH_DW command so make sure we
don't end up emitting pipe control with CS stall and also handle the end
of pipe timestamp with MI_FLUSH_DW command.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18325>
This pollutes stderr a lot, but I've used it countless times while
developing this code.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23045>
This giant patch implements a huge chunk of the Vulkan Sparse
Resources API. I previously had this as a nice series of many smaller
patches that evolved as the xe.ko added more features, but once I was
asked to squash some of the major reworks I realized I wouldn't be
able easily rewrite history, so I just squased basically the whole
series into a giant patch. I may end up splitting this again later if
I find a way to properly do it.
If we want to support the DX12 API through vkd3d we need to support
part of the the Sparse Resources API. If we don't, a bunch of Steam
games won't work.
For now we only support the xe.ko backend, but the vast majority of
the code is KMD-independent and so an i915.ko implementation would use
most of what's here, just extending the part that binds and unbinds
memory.
v2+: There's no way to sanely track the version history of this patch
in this commit message. Please refer to Gitlab.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23045>
The next patch is going to introduce some locking that needs to happen
before the submission to the backend.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24744>
We need to synchronize main (CCS/BCS) and companion rcs batch, so let's
create an empty batch and make both the batches (CCS/BCS) and companion
RCS batch wait on empty sync batch and signal the fence.
Reason to execute the empty batch is we need to make sure the companion
RCS batch finish as soon as the CCS/BCS batch finish. Preemption could
prevent the companion RCS batch execution and we might end up destroying
the CCS/BCS batch before companion RCS finishes.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23661>
Add all the wait fences from the main (CCS/BCS) command buffer to the
companion RCS command buffer so that the companion RCS batch starts at
the same time as the main (CCS/BCS) batch.
v2:
- Drop unncessary flush (Jose)
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23661>
The goal of this change it to move away from a single batch buffer
containing all kind of pipeline instructions to a list of instructions
we can emit separately.
We will later implement pipeline diffing and finer state tracking that
will allow fewer instructions to be emitted.
This changes the following things :
* instead of having a batch & partially packed instructions, move
everything into the batch
* add a set of pointer in the batch that allows us to point to each
instruction (almost... we group some like URB instructions,
etc...).
At pipeline emission time, we just go through all of those pointers
and emit the instruction into the batch. No additional packing is
involved.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24536>
This was left unused after 624ac55721 ("anv: move total_batch_size to
anv_batch"). We're now going to use it to store the total amount of
commands written in a command buffer.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24628>
This name is confusing, the real thing it represents is the allocated
amount of batch space.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24628>
Xe KMD don't need relocs, so calling a nop function and avoiding the
CPU cycles and memory waste with reloc.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24411>
Mismatch allocator could cause bad things, so better set the allocator
on anv_reloc_list_init() and use it in every reloc function.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24411>
Only genX_cmd_buffer.c makes use of READ_ONCE() but that file also
defines it so it can be removed from anv_batch_chain.c.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24411>
There are cases where we never need a binding table block, for example
compute only command buffers.
This has also the nice effect of not having
dEQP-VK.api.object_management.* tests allocate 1Gb of binding tables
which are staying around forever after you run those tests.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/8806
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23079>
This also need to be executed in Xe kmd, so moving it to a function.
No changes in behavior intended here.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22171>
The decoder context needs to know what engine it's associated with.
Nowadays, we have render, compute, blitter, even video engines being
used from the same driver. Rather than trying to have a single decoder
and thwacking the engine field back and forth between calls, we make
one per queue family, and stash a pointer in anv_queue for easy access.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21149>
Submitting a batch with the first command buffer with the simultaneous
bit set followed by a command buffer without the bit set gets past the
check and triggers this assert attempting to chain them:
../src/intel/vulkan/anv_batch_chain.c:1147: anv_cmd_buffer_chain_command_buffers: Assertion `num_cmd_buffers == 1' failed.
Signed-off-by: Juston Li <justonli@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21056>
u_vector_add() don't keep the returned pointers valid.
After the initial size allocated in u_vector_init() is reached it will
allocate a bigger buffer and copy data from older buffer to the new
one and free the old buffer, making all the previous pointers returned
by u_vector_add() invalid and crashing the application when trying to
access it.
This is reproduced when running
dEQP-VK.synchronization.signal_order.timeline_semaphore.* in DG2 SKUs
that has 4 CCS engines, INTEL_COMPUTE_CLASS=1 is set and of course
perfetto build is enabled.
To fix this issue here I'm moving the storage/allocation of
struct intel_ds_queue to struct anv_queue/iris_batch and using
struct list_head to maintain a chain of intel_ds_queue of the
intel_ds_device.
This allows us to append or remove queues dynamically in future if
necessary.
Fixes: e760c5b37b ("anv: add perfetto source")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20977>
There is no change in behavior here.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Rohan Garg <rohan.garg@intel.com>
Acked-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20428>
This functions will be used by i915 and Xe KMD.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Rohan Garg <rohan.garg@intel.com>
Acked-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20428>