When sparse images are being used, applications normally use
non-opaque binds and leave opaque binds just for the miptail part.
Since miptails are always at the end of the array layers, processing
the opaque binds after processing the non-opaque binds increases the
chance that anv_sparse_submission_add() will join the miptail bind
operation with the last non-opaque opreration, especially if the user
is trying to bind the last few non-miptail levels and the miptail in
the same vkQueueBindSparse opration.
In the real world this case does happen, so we're able to save a bind
operation every once in a while in Steam games.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25512>
Move waiting/signaling to the backends so we can fix each backend
separately.
As I write this patch the vm_bind backend is back to using synchronous
vm_binds so we can't pass syncobjs to the synchronous vm_bind ioctl
anymore. We'll need more discussions and possibly some rework before
we go back to asynchronous vm_binds. This commit should allow us to
fix the TR-TT backend in the next commit and leave vm_bind for later.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25512>
Don't pass it as a parameter when it's also part of a struct. Have to
touch 9 files just for that...
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25512>
If we're going to move syncobj waiting/signaling down to the backend
we're going to need a queue to signal as lost in case those operations
fail.
In some places of the stack we don't have a queue available, such as
when we're creating or destroying resources. For those, for vm_bind
cases we don't use the queue for anything so passing it as NULL is
fine. For TR-TT we are already using device->trtt.queue.
For TR-TT specifically this also means we're going to start using the
actual queues from the call stack instead of trtt->queue, but that
shouldn't make any difference since we only ever have one queue.
Still, this is more technigally correct.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25512>
Our ultimate goal is to have the backend functions deal with the wait
and signal syncobjs instead of waiting for them on the CPU inside
anv_queue_submit_sparse_bind_locked(). For that, we'll need waits and
signals parameters to be passed all the way to the backend functions
that actually make the submission, and this is what this patch does,
through struct anv_sparse_submission.
This patch just deals with passing the parameters to the functions,
nothing is using the new variables yet. There should be no functional
changes here. The goal here is to make code review easier.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25512>
Currently, a single vkQueueBindSparse() call may lead to multiple bind
calls in the backend (either a vm_bind ioctl or a command submission
that updates the TR-TT page tables). These operations can be quite
slow so it's better for us if we try to emit as few of them as
possible.
On top of that, this gives our "just extend the last operation's size
if possible" code a little more chance to act and save us real time.
Our ultimate goal here is to also pass submit->waits and
submit->signals to the backend so we can avoid doing CPU waits, so
having a single call to the backend helps simplify things a little
too, and we just created the structure to carry these extra pointers
forward.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25512>
Having it helped us printing the resource offset, which made debugging
some situations easier. The problem is that we want to rework the code
a little bit and we won't have a 'sparse' struct anymore to pass
around. Since it's all debug code drop it for now so it doesn't get in
the way of the rework. If we need it later we can find a way to add it
back, or we find another way to print the value.
Drive-by drop the DEBUG_SPARSE check that's already in the caller.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25512>
Same as the L1 case, but this one deals with 64bit entry addresses and
pte addresses.
Consecutive L3/L2 writes are much rarer than L1 writes since they
require some pretty big buffers, but we can still those cases in the
wild. I just don't think any change will be noticeable though.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25512>
If the addresses are sequential, we can emit only a single
MI_STORE_DATA_IMM instruction. This is a very common case, it should
save us some space: 4 bytes per extra_write.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25512>
When using vm_bind (not TR-TT), in practice sparse addresses will be
allocated from the high_heap, so narrow down the available
sparseAddressSpaceSize from the whole address space to the part we can
actually allocate things from.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25512>
TR-TT is a hardware feature supported by both i915.ko and xe.ko, which
means we can now finally have Sparse Resources on i915.ko and we also
have 2 options for xe.ko (and whatever is the best should be the
default).
In this patch we use batch commands to write the page tables and
forever keep them in device memory. We maintain a mirror of both the
L3 and and L2 tables because that helps us never having to read the
tables that are in device memory.
We still have some things to improve, but with this commit, workloads
that didn't work at all due to the lack of sparse resources should
at least run.
This is still all disabled by default in i915.ko, you can turn it on
by exporting ANV_SPARSE=1 before launching the applications. For
xe.ko, switch the default with ANV_SPARSE_USE_TRTT=1.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25512>
Yet another bit of branchiness we should tame. 99% of the time, sources are not
for if's, so we shouldn't need to do the extra checking to handle that 1%.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26084>
We don't check the sizes for ALU srcs, which is the hot path here, so split out
that simplified version for ALU instructions to use, while deriving a sized
version for other kinds of instructions.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26084>
There's no more nir_register.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26084>
We have dominance validation elsewhere in the file.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26084>
Instead, check it at the call sites when actually required (basically just
intrinsics), reducing the branching required when not (ALU validation, the
hottest of hot paths for CI).
IMHO this is more obvious too.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26084>
No apparent performance difference, but documents the intention.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26084>
Nothing should ever be reading them, they logically do not exist. So there's no
point validating them, especially when the validation in question is so useless
(just checking the bit width, without any semantic awareness). Yet now that we
support vec16, this loop is quite hot even on scalar ISAs, and rather
pointlessly so. Just remove it and bring the ALU src validation complexity to
O(# of channels in source) instead of O(max # of channels in NIR).
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26084>
It doesn't inline and so is about 1% of M1 CTS time. Expand out the definition
and simplify the logic. Honestly, I think this is clearer too.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26084>
Profiling showed that maintaining this ssa_srcs set consumes ~3% of CTS time
with a debugoptimized build. Unfortunately, we really do benefit from getting
this coverage in CI. So rather than remove the validation, let's optimize the
data structure used so we can keep the coverage at a fraction of the cost.
The expensive piece is the pointer set, which is backed by a relatively
expensive hash table. It would be much cheaper to use an invasive set instead,
with a single "present" bit. We don't want to bloat nir_src for this, however
there's an easy solution: use a tagged pointer to steal a bit in the nir_src for
the job. We untag everything at the end of validation (and this meta-invariant
is asserted with an auxiliary counter), so while we mutate the IR while
validating, the mutations do not escape nir_validate.
We tag the parent pointer and not the def pointer, because it is dramatically
less used and therefore has far fewer disrupted call sites.
The M1 job is improved from 3:03 to 2:55 of deqp-runner reported time, which is
excellent.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26084>
Deduplicates the "get # of channels" logic which was the same between the
helpers.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26084>
Currently, multiple version scripts unconditionally use symbols from gallium
drivers that may not be enabled, which causes linking to fail with
--no-undefined-version (as is default in LLD 17), and can cause issues
with LTO. This commit adds logic to generate version scripts based on the
enabled gallium drivers, ensuring only defined symbols are used.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/8003
Signed-off-by: Violet Purcell <vimproved@inventati.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25551>
The driver already need to track this WA for blorp. We can completely
remove any blorp code dealing with this and instead have the flush
required by the workaround be combined with potential other flushes
the driver already has to insert before blorp operations.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26247>
ANV_CMD_DIRTY_PIPELINE also includes reprogramming of
3DSTATE_PUSH_CONSTANT_ALLOC_* instructions.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 50f6903bd9 ("anv: add new low level emission & dirty state tracking")
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26247>
This will make the job-frontend split easier, and it also makes sense
to update image attributes here for compute.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26221>
The dirty state tracking should allow us to conditionally re-emit the
vertex attribute and attribute buffer arrays if something relevant
changed.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26221>
We are unpacking a CALL instead of JUMP instruction. It doesn't
make a difference because the instruction layout is the same,
but let's fix that for the sake of correctness.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26221>
Some staging registers might be NULL, either because some arguments are
optional, or because the command stream is malformed. In any case, being
robust to such situations it probably a good thing.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26221>
Useful to quickly spot which stage of the pipeline is using a resource
table.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26221>
Bit 8 in the descriptor is not encoding the primary/secondary shader
information. It's a per shader-type field.
For fragment shader descriptors, it describes what the coverage bitmask
contains for per-sample execution:
- DX-style: bits for all covered samples are set
- GL-style: only the bit for the sample the shader is executed on is set
For vertex shader, it encodes the warp limit we want to apply to the
shader execution.
Patch the existing code to match the new semantics.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26221>
Recently had to debug an unbalanced ref/unref situation in some
code I added, and having an assert(refcnt > 0) in
panfrost_bo_unreference() would have made this simpler, so let's
add one.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26221>