It could be confusing that a newer platform named with a smaller
number than a half-generation of an older platform like 'gfx20' and
'gfx75' in xml files.
Down the road, it can be a little worse once we pass something like
'gfx40' when there is already a gfx45.xml for the oldest platform.
Unify naming xml files with verx10 numbers to resolve the issue.
Signed-off-by: Jianxun Zhang <jianxun.zhang@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31943>
Pretty big change... Sorry for that.
I can't exactly remember why I created 2 heaps. I think it's because I
mistakenly thought the samplers in the binding sampler pointers needed
to be indexed from the binding table. But that's not the case, they
just need to be in the dynamic state heap.
In the future, this change will allow to also allocate buffers for
push constant data in the newly created dynamic_visible_pool which
will be useful on < Gfx12.0 where this is the only place push constant
data can live for compute shaders.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30047>
With this change, the engine initialization batches are build and
submitted at vkCreateDevice() but the function doesn't wait for them
to complete. Instead we wait at vkDestroyDevice() or whenever another
submission happens on the queue, we check whether the initialization
batch has completed (without waiting) and free it if completed.
Seems to be about 25% reduction time of vkCreateDevice()
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28975>
We can remove a bunch of TRTT specific code from the backends as well
as manual submission tracking.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28975>
We want to make this more generic so that it can be reused for device
initialization as well as TRTT submissions.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28975>
The debug identifier is put into the captured buffers for error
capture. This helps us figure out what version of the driver people
are running when encountering a GPU hang. This identifier has the
git-sha1 + driver name.
libintel_dev is also a dependency of the compiler so any change to the
git-sha1 also triggers recompile which we want to avoid.
This changes moves the debug identifier to src/intel/common which
drivers already depend on, so the compiler is not affected anymore.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11136
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29128>
The device->using_sparse variable is only used at cmd_buffer_barrier()
to decide if we need to apply the heavier-weight flushes that are only
applicable to sparse resources. The big problem here is that we need
to apply the flushes to the non-image and non-buffer memory barriers,
so we were trying to limit those only to applications that ever submit
a sparse resource to the sparse queue.
The reason why we were applying this only to devices that ever
submitted sparse resources is that dxvk games have this thing where
during startup they create and then delete tiny sparse resources, so
switching device->using_sparse to true at resource creation would make
basically every dxvk game start applying the heavier-weight
workaround.
The problem with all that is that even if an application creates a
sparse resource but doesn't ever bind them, the resource should still
behave as an unbound resource (because they are bound with a NULL
bind), so the flushes affecting them should happen. This case is
exercised by vkd3d-proton/test_buffer_feedback_instructions_sm51.
In order to satisfy all the above cases and only really apply the
heavier-weight flushes to applications actually using sparse
resources, let's just count the number of sparse resources that
currently exist and then apply the workaround only if it's not zero.
That covers the dxvk case since dxvk deletes the resources as soon as
they create, so num_sparse_resources goes back to 0.
Testcase: vkd3d-proton/test_buffer_feedback_instructions_sm51
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10960
Fixes: 6368c1445f ("anv/sparse: add the initial code for Sparse Resources")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28724>
Having just one place to check the Sparse type is less error prone.
For example in i915 it was always setting sparse_uses_trtt to true
even if running in gfx 9 that don't support sparse.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28161>
I need to add some sparse-related checks that require having the
anv_buffer and anv_image, and putting them directly inside
anv_queue_submit_sparse_bind_locked() doesn't feel like the right
thing to do. Here we change the interface so now we have
anv_sparse_bind_buffer() and anv_sparse_bind_image_opaque() as the
main interface into anv_sparse.c, so they both can call the lower
level anv_sparse_bind_resource_memory() function.
In the next patch we'll be adding changing the code of the functions
we just created, justifying their addition.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26410>
Here dropping is_companion_rcs_cmd_buffer parameter of a few functions
that don't need this information, it just need the right
anv_cmd_buffer for each case.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26882>
Got a error state on DG2 with a jump to secondary. The secondary is
empty and padded with MI_NOOPs to workaround the CS prefetching.
According to the error state, the return jump address from the
secondary to the primary is 0x0. The ACTHD register value is 0x10, so
it seems that the command streamer indeed jumped to 0x0 and hanged on
a few dwords after that.
The return address should have been set edited by a previous
MI_STORE_DATA_IMM instruction. So it appears it did not complete in
time for the command stream to catch it. On Gfx12+ this can happend if
we do not set ForceWriteCompletionCheck.
This change also takes the opportunity to remove the padding MI_NOOPs
at the end of secondaries on Gfx12+ by using disabling the prefetching
just before jumping into secondaries and reenabling it at the
beginning of each secondary.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26665>
Batch bos are always allocated with ANV_BO_ALLOC_HOST_CACHED_COHERENT
so there is no need to do cflush calls.
But if we ever decide to change that anv_bo_needs_host_cache_flush()
will make sure cflush is called.
Outside of batch bos, this patch is also removing the
intel_flush_range() call from anv_QueuePresentKHR because
device->debug_frame_desc is offset of workaround_bo that is also
allocated as ANV_BO_ALLOC_HOST_COHERENT.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26457>
We need to wait for the batches to complete before we return the BOs
to the pool. We were previously doing this completely synchronously,
which made the code unnecessarily wait. Now we have a timeline syncobj
that signals completion of the previous BOs, so sometimes we check
where we are in the timeline and then return the BOs that we know are
unused.
This, in addition to the previous patch that made us wait for the
other syncobjs through the execbuf ioctl instead of through the CPU,
makes TR-TT batches at least an order of magnitude faster. Still, I
don't think we'll notice any changes in games's FPS as they don't bind
sparse resources that often.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25512>
For now it just wraps the bo and size, so there's really no value to
having it. In the next commit we'll add more elements to the struct.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25512>
When sparse images are being used, applications normally use
non-opaque binds and leave opaque binds just for the miptail part.
Since miptails are always at the end of the array layers, processing
the opaque binds after processing the non-opaque binds increases the
chance that anv_sparse_submission_add() will join the miptail bind
operation with the last non-opaque opreration, especially if the user
is trying to bind the last few non-miptail levels and the miptail in
the same vkQueueBindSparse opration.
In the real world this case does happen, so we're able to save a bind
operation every once in a while in Steam games.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25512>
Move waiting/signaling to the backends so we can fix each backend
separately.
As I write this patch the vm_bind backend is back to using synchronous
vm_binds so we can't pass syncobjs to the synchronous vm_bind ioctl
anymore. We'll need more discussions and possibly some rework before
we go back to asynchronous vm_binds. This commit should allow us to
fix the TR-TT backend in the next commit and leave vm_bind for later.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25512>
Don't pass it as a parameter when it's also part of a struct. Have to
touch 9 files just for that...
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25512>
If we're going to move syncobj waiting/signaling down to the backend
we're going to need a queue to signal as lost in case those operations
fail.
In some places of the stack we don't have a queue available, such as
when we're creating or destroying resources. For those, for vm_bind
cases we don't use the queue for anything so passing it as NULL is
fine. For TR-TT we are already using device->trtt.queue.
For TR-TT specifically this also means we're going to start using the
actual queues from the call stack instead of trtt->queue, but that
shouldn't make any difference since we only ever have one queue.
Still, this is more technigally correct.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25512>
Our ultimate goal is to have the backend functions deal with the wait
and signal syncobjs instead of waiting for them on the CPU inside
anv_queue_submit_sparse_bind_locked(). For that, we'll need waits and
signals parameters to be passed all the way to the backend functions
that actually make the submission, and this is what this patch does,
through struct anv_sparse_submission.
This patch just deals with passing the parameters to the functions,
nothing is using the new variables yet. There should be no functional
changes here. The goal here is to make code review easier.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25512>
Currently, a single vkQueueBindSparse() call may lead to multiple bind
calls in the backend (either a vm_bind ioctl or a command submission
that updates the TR-TT page tables). These operations can be quite
slow so it's better for us if we try to emit as few of them as
possible.
On top of that, this gives our "just extend the last operation's size
if possible" code a little more chance to act and save us real time.
Our ultimate goal here is to also pass submit->waits and
submit->signals to the backend so we can avoid doing CPU waits, so
having a single call to the backend helps simplify things a little
too, and we just created the structure to carry these extra pointers
forward.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25512>
TR-TT is a hardware feature supported by both i915.ko and xe.ko, which
means we can now finally have Sparse Resources on i915.ko and we also
have 2 options for xe.ko (and whatever is the best should be the
default).
In this patch we use batch commands to write the page tables and
forever keep them in device memory. We maintain a mirror of both the
L3 and and L2 tables because that helps us never having to read the
tables that are in device memory.
We still have some things to improve, but with this commit, workloads
that didn't work at all due to the lack of sparse resources should
at least run.
This is still all disabled by default in i915.ko, you can turn it on
by exporting ANV_SPARSE=1 before launching the applications. For
xe.ko, switch the default with ANV_SPARSE_USE_TRTT=1.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25512>
In case we run out of space, all the parts of the driver that rely on
this should deal with failure. The helpers will set the batch in error
state so that it cannot be submitted by the application.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25955>
When the number of draw calls is very large, instead of allocating
large amounts of batch buffer space for the draws, use a ring buffer
and process the draw calls by batches.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/8645
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Tested-by: Felix DeGrood <felix.j.degrood@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25361>
Just tyding things a bit since we're about to add more.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Tested-by: Felix DeGrood <felix.j.degrood@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25361>
I don't know this fixes anything but I noticed the generated draws
jump into addresses slightly different from CPU generated jumps.
After checking the genxml, I noticed MI_BATCH_BUFFER_START "Batch
Buffer Start Address" fields have different sizes in Gfx8 & Gfx9+.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25705>
Private memory for BVH builds doesn't need to be mapped on the host,
it's purely for use by the GPU. So it can be put into a different
buffer pool that can put into VRAM only buffers.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25570>
Blitter command streamer supports MI_FLUSH_DW command so make sure we
don't end up emitting pipe control with CS stall and also handle the end
of pipe timestamp with MI_FLUSH_DW command.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18325>
This pollutes stderr a lot, but I've used it countless times while
developing this code.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23045>
This giant patch implements a huge chunk of the Vulkan Sparse
Resources API. I previously had this as a nice series of many smaller
patches that evolved as the xe.ko added more features, but once I was
asked to squash some of the major reworks I realized I wouldn't be
able easily rewrite history, so I just squased basically the whole
series into a giant patch. I may end up splitting this again later if
I find a way to properly do it.
If we want to support the DX12 API through vkd3d we need to support
part of the the Sparse Resources API. If we don't, a bunch of Steam
games won't work.
For now we only support the xe.ko backend, but the vast majority of
the code is KMD-independent and so an i915.ko implementation would use
most of what's here, just extending the part that binds and unbinds
memory.
v2+: There's no way to sanely track the version history of this patch
in this commit message. Please refer to Gitlab.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23045>
The next patch is going to introduce some locking that needs to happen
before the submission to the backend.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24744>
We need to synchronize main (CCS/BCS) and companion rcs batch, so let's
create an empty batch and make both the batches (CCS/BCS) and companion
RCS batch wait on empty sync batch and signal the fence.
Reason to execute the empty batch is we need to make sure the companion
RCS batch finish as soon as the CCS/BCS batch finish. Preemption could
prevent the companion RCS batch execution and we might end up destroying
the CCS/BCS batch before companion RCS finishes.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23661>
Add all the wait fences from the main (CCS/BCS) command buffer to the
companion RCS command buffer so that the companion RCS batch starts at
the same time as the main (CCS/BCS) batch.
v2:
- Drop unncessary flush (Jose)
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23661>
The goal of this change it to move away from a single batch buffer
containing all kind of pipeline instructions to a list of instructions
we can emit separately.
We will later implement pipeline diffing and finer state tracking that
will allow fewer instructions to be emitted.
This changes the following things :
* instead of having a batch & partially packed instructions, move
everything into the batch
* add a set of pointer in the batch that allows us to point to each
instruction (almost... we group some like URB instructions,
etc...).
At pipeline emission time, we just go through all of those pointers
and emit the instruction into the batch. No additional packing is
involved.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24536>