We jump out of the loops whenever result is not VK_SUCCESS, there is
no need to check for it there. I guess I missed this detail in the
most recent rework for this function.
Reviewed-by: Iván Briano <ivan.briano@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31698>
When the VkBuffer is of size 2^32 (which matches maxBufferSize), we
have vm_bind->size set to 2^32, which is fine because it fits in an
uint64_t. What is not fine is the 'i' variable being size_t, because
on 32bit systems it will loop forever since it will always be smaller
than 2^32.
Credits to Iván for not only reporting it, but also coming up with the
solution at the same time as I did, then testing it.
Cc: mesa-stable
Reported-by: Iván Briano <ivan.briano@intel.com>
Reviewed-by: Iván Briano <ivan.briano@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31698>
The code that initializes each queue got big enough that the
repetitive error handling is getting ugly and it could benefit from
being on its own function.
v2: Rebase, try to improve the comments.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30953>
Since the L2 bug fix we've been overestimating l3l2_binds by a lot in
most of the cases: almost every single call to anv_sparse_bind_trtt
ends up using either 0 or 1 elements for l3l2_binds, with occasionally
something using 512 or more. By switching to util_dynarray we can
guarantee the best of every case:
- l1_binds will remain a stack array for the vast majority of the
calls
- even more than before, since STACK_ARRAY was limited to 8
elements and now we do 32
- l1 will be properly dimensioned without the need for reallocs
- l3l2_binds will be completely empty most of the times and only
trigger allocations when necessary
Here's the top 10 most common results of anv_sparse_bind_trtt() for a
trace of Assassin's Creed: Valhalla. The first column is how many
times we had that case while running the trace. After this patch, all
these cases will proceed without any memory allocations.
168 trtt_binds: num_vm_binds:04 l3l2:0000 l1:0004
344 trtt_binds: num_vm_binds:01 l3l2:0000 l1:0004
420 trtt_binds: num_vm_binds:01 l3l2:0000 l1:0012
422 trtt_binds: num_vm_binds:04 l3l2:0000 l1:0008
479 trtt_binds: num_vm_binds:01 l3l2:0000 l1:0024
560 trtt_binds: num_vm_binds:03 l3l2:0000 l1:0003
1005 trtt_binds: num_vm_binds:01 l3l2:0000 l1:0002
1024 trtt_binds: num_vm_binds:02 l3l2:0000 l1:0004
2145 trtt_binds: num_vm_binds:02 l3l2:0000 l1:0002
3735 trtt_binds: num_vm_binds:01 l3l2:0000 l1:0001
Only 70 out of total 11340 calls to anv_sparse_bind_trtt() contained
l3l2 elements.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30953>
We use 2MB page table BOs, as defined by ANV_TRTT_PAGE_TABLE_BO_SIZE.
Each BO is enough to hold 512 pages, since each one has 4096 bytes.
Each L1 page can fit 1024 entries of 64kb size, which means our 512
pages should be able to fit a little less than 32gb of sparse resource
memory, since we also need some L2 pages and an L3 page. I don't see
any real world application using more than a single BO.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30953>
Move it past the (n_l3l2_binds == 0 && n_l1_binds == 0) check so we
don't end up trying to do garbage collection more often than we submit
batches.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30953>
Otherwise code such as anv_sparse_trtt_garbage_collect_batches() may
end up stuck waiting forever on a timeline of a submission that
failed.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30953>
The application can submit bind operations where it simply resets
state that is already in our page tables, so there's nothing to do.
Before commit 7da5b1caef ("anv: move trtt submissions over to the
anv_async_submit") we would simply return and not submit any batches
when this happened, but the commit reorganized things in a way where
we started submitting empty batches instead.
Fix this by simply jumping out when we detect this case. Because of
this, rename the "error" labels to "out" as they can now happen on a
happy case.
It should be noted that an alternative to this implementation would be
to move all the handling of 'submit' to after the n_lX_binds check,
but this would put all the initialization inside the trtt->mutex,
creating extra contention even when we have stuff to bind. Since the
"there's nothing to bind" check is now rare (after we stopped doing
NULL binds during resource creation), it is probably better to reduce
lock contention in the common case at the expense of a little more CPU
in the rare case.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30953>
We're missing a check for 'result' in the middle of a loop and we have
an unnecessary check for 'result' after the loop.
Fixes: 7da5b1caef ("anv: move trtt submissions over to the anv_async_submit")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30953>
There is a lot that can go wrong during initialization after we assign
trtt->l3_addr, and we use its value to check if trtt is initialized.
If an initialization fails after l3_addr is already assigned, the next
bind will attempt to use the leftover values from the failed
initialization attempt and will likely cause all sorts of random
errors. So when we fail, just set l3_addr back to 0, causing the next
bind to attempt to initialize everything again.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30953>
Since everything is always NULL-bound at init and we always bind
things back to NULL in anv_free_sparse_bindings(), this means we don't
need to do NULL bindings during anv_init_sparse_bindings(), saving us
a bunch of time, espcially since we don't track L1 entries so we may
end up submitting TR-TT batches just to write zeroes on top of zeroes.
v2: Don't unnecessarily check for uses_relocs (Lionel).
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30953>
- There's only one caller.
- The caller is rather small.
- We want to introduce initialization code that's not exactly queue
state and reuse the 'submit'.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30953>
In the next commits we're going to move this out of
anv_sparse_bind_trtt() and we're also going to add more code to it.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30953>
When we create sparse resources the first thing we do is a NULL bind
on them, as the Vulkan spec mandates certain behavior even for unbound
sparse resources. We do this with the minimal effort possible: if we
can get away with marking an L2 pointer as NULL in the L3 table, we
just do it and return, instead of going all the way to creating L1
tables and marking all the final entries as NULL.
The strategy we were using had a bug that could lead to previously
created NULL entries not being marked as NULL anymore. Let's give an
example:
(before proceeding, keep in mind that a NULL entry in the L3 and L2
tables has bit 1 set, it does *not* have the value 0)
- Create a 64mb buffer that uses an entire L1 table (needs to be
properly aligned), which triggers a NULL bind.
- Our algorithm will just set the L3 entry (pointing to the L2
table) as NULL.
- Create a 64kb buffer that uses the same L2 table (but a different
L1 table).
- The NULL bind triggered won't do anything as the L2 table is
already NULL.
- Bind the first buffer to actual memory. This will end up creating
the L2 table and the L1 table. The only entry we will set in the L2
table will be the one pointing to the L1 table. All the other
values will be 0 (so they won't have neither the NULL or Invalid
bits set: access to them will lead to page faults).
- Try to use the second buffer, which is still unbound. It was
relying on the fact that its L2 table pointer was NULL, but now
it's not anymore, so the page walker will fetch the L1 entries in
the L2 table and they will all be zero instead of having the NULL
bit set.
The fix is pretty simple: whenever we create a new L2 table, set every
entry to NULL (except the one we're about to set to non-NULL). This
preserves behavior for every other NULL resource relying on the L3
entry being set to NULL.
We don't need to do this for the L1 table because its entries are
different and instead of having bits to signal NULL entries we have
a special TR-TT register that we can set that gets compared to check
if an entry is NULL, and we conveniently program it to 0: see
ANV_TRTT_L1_NULL_TILE_VAL.
I am not aware of any real workloads that are triggering this
behavior, I found this issue while investigating something else,
running a custom sparse program in our pre-silicon environment, and it
told us about the page faults.
Cc: mesa-stable
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30953>
Valgrind doesn't seem to know that drmSyncobjQuery() writes to the
variable that we pass as 'last_value'. This gets rid of:
==6275== Conditional jump or move depends on uninitialised value(s)
==6275== at 0x5308370: anv_sparse_trtt_garbage_collect_batches (anv_sparse.c:540)
==6275== by 0x53091E2: anv_sparse_bind_trtt (anv_sparse.c:825)
==6275== by 0x5309771: anv_sparse_bind (anv_sparse.c:953)
==6275== by 0x5309A3B: anv_free_sparse_bindings (anv_sparse.c:1041)
==6275== by 0x529FF21: anv_DestroyBuffer (anv_buffer.c:248)
==6275== by 0x932ADBD: ??? (in /usr/lib/x86_64-linux-gnu/libVkLayer_khronos_validation.so)
==6275== by 0x127AA2: MyVkBuffer::~MyVkBuffer() (sparse.cpp:364)
==6275== by 0x12B2D4: MyApp::test1_trivial_sparse() (sparse.cpp:1421)
==6275== by 0x13E01A: MyApp::run_test(int) (sparse.cpp:6594)
==6275== by 0x13E3B0: main (sparse.cpp:6656)
==6275== Uninitialised value was created by a stack allocation
==6275== at 0x53082D3: anv_sparse_trtt_garbage_collect_batches (anv_sparse.c:525)
An alternative to these Valgrind macros would simply have been to
zero-intialize last_value.
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31332>
In the error handling path we end up creating a vk_sync and then later
we vk_sync_wait() on it. If that wait fails somehow we'll end up calling
vk_queue_set_lost(&queue->vk, ...) which would segfault if queue is
NULL.
If we end up in this situation (no queue), return directly whatever the
backend's vm_bind function returned, propagating the error up if
necessary.
Fixes: dd5362c78a ("anv/xe: try harder when the vm_bind ioctl fails")
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31048>
From all the many possible errors returned by the vm_bind ioctl, some
can actually happen in the wild when the system is under memory
pressure. Thomas Hellström pointed to us that, due to its asynchronous
nature, the vm_bind ioctl itself has to pin some memory, so if the
number of bind operations passed is too big, there is a probability
that it may run out of memory.
Previously the Kernel would return ENOMEM when this condition
happened. Since commit e8babb280b5e ("drm/xe: Convert multiple bind
ops into single job") the Kernel has started returning ENOBUFS when it
doesn't have enough memory to do what it wants but thinks we'd succeed
if we tried to do one bind operation at a time (instead of doing
multiple operations in the same ioctl), and ENOMEM in some other
situations. Still-uncommitted commit "drm/xe: Return -ENOBUFS if a
kmalloc fails which is tied to an array of binds" proposes converting
a few more ENOMEM cases no ENOBUFS.
Still, even ENOMEM situations could in theory be possible to recover
from, because if we wait some amount of time, resources that may have
been consuming memory could end up being freed by other threads or
processes, allowing the operations to succeed. So our main idea in
this patch is that we treat both ENOMEM and ENOBUFS in the same way,
so our implementation can work with any xe.ko driver regardless of
having or not having the commits mentioned above.
So in this patch, when we detect the system is under memory pressure
(i.e., the vm_bind() function returns VK_ERROR_OUT_OF_HOST_MEMORY), we
throw away our performance expectations and try to go slowly and
steady. First we wait everything we're supposed to wait (hoping that
this alone could also help to alleviate the memory pressure), and then
we synchronously bind one piece at a time (as this will ensure ENOBUFS
can't be returned), hoping that this won't cause the Kernel to try to
reserve too much memory. All this while also hoping that whatever
thing that may be eating all the memory goes away in the meantime. If
even this fails, we give up and hope the upper layer will be able to
figure out what to do.
This fixes a bunch of LNL failures and flaky tests (as LNL is our
first officially supported xe.ko platform). This can be seen in dEQP
but only if multiple tests are being run parallel. Happens in multiple
tests, some of which may include:
- dEQP-VK.sparse_resources.image_sparse_binding.2d_array.rgba8_snorm.1024_128_8
- dEQP-VK.sparse_resources.image_sparse_binding.3d.rgba16_snorm.1024_128_8
- dEQP-VK.sparse_resources.image_sparse_binding.3d.rgba16ui.512_256_6
I don't ever see these errors when running Alchemist/DG2 with xe.ko.
Fixes: e9f63df2f2 ("intel/dev: Enable LNL PCI IDs without INTEL_FORCE_PROBE")
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30276>
We want to use actual sparse-capable queues as the default
trtt->queue, not copy queues that may have a companion_rcs_batch.
Before this patch, if we expose more than one queue *and* the
application creates a copy queue first, we'll end up setting
trtt->queue as the copy queue, which will GPU hang when we submit the
TR-TT batches as they don't support the pipe_control commands we
issue.
The trtt->queue queue is used for binding/unbinding buffers in code
paths where there's no specific queue coming from user space, such as
when we're creating or destroying a sparse resource.
This is not a problem yet on i915.ko since we are exposing
only a single queue, and it is not a problem for xe.ko since TR-TT is
not the default there. This is also not a problem in applications
that create the render or compute queue first. We plan to expose more
queues when using TR-TT, so this would become a problem without this
patch.
None of VK-GL-CTS seems to exercise that, and none of the Steam games
I tested exercise that as well. I was able to reproduce this issue
using our internal tracing tool.
v2: New implementation that doesn't break when we only have a compute
queue (Lionel).
Fixes: 04bfe828db ("anv/sparse: allow sparse resouces to use TR-TT as its backend")
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30252>
On Gen12 (the oldest we support on Mesa right now for TR-TT) we
started having per-engine TR-TT registers and we are supposed to make
all contexts share the same TR-TT programming.
On LNL+, this is documented in the BSpec page for the TRTT_CNTRL
register (68417), with more details in HSDs 14020454786 and
16022013154.
On Gen12 platforms this information is a little harder to find and
there's a whole trail of HSDs leading up to 1209977595, which links to
the documents that describe the programming. BSpec for TR-TT on Gen12
is very confusing as it still contains registers and other information
from Gen11 that were not removed.
Regarding the additional BLT and COMP registers, please notice that on
the BSpec pages for the TR-TT registers, the "Register Instance"
section only lists the GFX registers as non-privileged. However, the
"User Mode Privileged Commands" lists the other instances of the TR-TT
Regsiters as non-privileged, which matches what we see: there's no
need to put these addresses in the FORCE_TO_NONPRIV registers.
Notice that for now, when TR-TT is being used we only expose a single
queue, so this change effectively does nothing until we start exposing
extra queues. I left that part for later to help bisectability.
v2:
- s/trtt_init_context_state/trtt_init_queues_state/ (José)
- pass device as the argument to init_queues_state (José)
v3:
- use async_submit_end (José)
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30252>
Having this as a separate batch was the normal behavior until
7da5b1caef ("anv: move trtt submissions over to the
anv_async_submit").
While it certainly sounds better to do everything related to TR-TT
initialization in one batch, we need to revert it back to be a
separate batch (but now using the new anv_async_submit infrastructure)
because we'll want to run this batch on every engine.
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30252>
I haven't seen this happening anywhere, but let's have it for
correctness.
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30252>
When I wrote sparse resources support for Anv we didn't have TileYs
support so I made non-opaque binds work even for non-standard block
shapes, which meant the block size could be either 64k or 4k. Since
then we merged TileYs support and changed our sparse resources
implementation to treat all the non-standard block shape cases as
"everything is the miptail", which means non-opaque binds are not
possible. So here we adjust the code to more explicitly represent
that.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29337>
There are 3 different places in our code where we calculate the tile
size and until recently the 3 implementations were different and with
slight bugs. Unify everything and also change the calculation to use
tile_info->phys_extent_B.
While doing this we move the isl_surf_get_tile_info() calls from
anv_sparse_calc_block_shape() to its callers so we total amount of
times we call it doesn't change.
v2: Adjust the patch now that tile_info is not part of isl_surf
anymore.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> (v1)
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29337>
The code that tries to create a "pretend block shape" for linear
tiling surfaces was necessary back when we were going to support
sparse residency (non-opaque binds) for non-standard block shapes
(since there was uncertainty about TileYs support). That hasn't been
the case since before we merged sparse resources upstream, so remove
the code and leave an assertion instead, just in case.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29337>
Since commit 18d8c3ca33 we were allocating a little more than what
we were actually using (2621440 bytes instead of 2097152, aka 0x280000
instead of 0x200000), and we were not properly marking the BO as
internal. No applications should be misbehaving because of this.
Fixes: 18d8c3ca33 ("anv: Add missing ANV_BO_ALLOC_INTERNAL")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29337>
I've found myself adding this piece of code to our codebase when
debugging some Zink sparse failures recently, so let's upstream it.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29337>
This calculation was wrong for both compressed formats and
multi-sampled images. As a result, we misreported the image as having
a single miptail.
No Vulkan or GL CTS tests were tripping on this bug. I found this
while looking for tile size calculations after fixing a similar bug
elsewhere in the code.
The calculation should now match what we have in
anv_sparse_bind_image_memory(), which is widely tested.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29337>
We have to take the number of samples into account when calculating
the tile size. If we don't do this, multi-sampled images may end up
falling in the "goto out_everything_is_miptail" case, while in reality
multi-sampled images don't even have miptails.
Also assert that the value is one of the only two values we expect
this to be. This assert would have been useful to catch this issue,
since with multi-sampled images we were getting values like 16k or 32k
depending on the number of samples.
This helps move forward progress in some Zink tests, but does not
make them fully pass yet, as those tests are full of sub-cases and
this only helps some of them:
KHR-GL46.sparse_texture2_tests.UncommittedRegionsAccess
KHR-GL46.sparse_texture2_tests.SparseTexture2Commitment
KHR-GL46.sparse_texture2_tests.SparseTexture2Lookup
Fixes: 7ef3d652b2 ("anv/sparse: enable MSAA for Sparse when applicable")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29337>
The Vulkan spec splits sparse resources in two different features:
sparse binding and sparse residency.
Sparse binding is much simpler. It requires the resources to be fully
bound before being used and it treats them as a black box. We're
required to support sparse binding for all the formats that are
supported by non-sparse, but that's easy beacause this feature is
simpler.
Now sparse residency is the one where we're allowed to partially bind
resources, and the one that comes with more complicated features such
as block shapes and non-opaque binding of images. This feature is
subdivided into:
- sparseResidencyBuffer
- sparseResidencyImage2D
- sparseResidencyImage3D
- sparseResidency{2,4,8,16}Samples (which refers to 2D images)
Notice that there's no sparseResidencyImage1D. And if you read the
specs it's clear that sparse residency is meant for non-1D images.
Still, supporting it didn't require any extra effort in Anv so we just
did it.
That's until we started running GL CTS tests on Zink. There's a CTS
test that checks for the standard block shapes. It creates 1D images
and expects the block shapes for them to be the standard 2D block
shapes. While we could very well just patch
anv_sparse_calc_image_format_properties() to return the standard 2D
block shapes for 1D images, that's just wrong (block shapes for 1D
images are just line segments, not rectangles!) so let's just reject
this all until maybe one day Vulkan defines sparseResidencyImage1D and
we get GL_ARB_sparse_texture3 to match it, or somebody decides to
change the GL CTS test.
Testcase: KHR-GL46.sparse_texture2_tests.StandardPageSizesTestCase
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29337>
We can remove a bunch of TRTT specific code from the backends as well
as manual submission tracking.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28975>
Drop usage of pthread mutex so initialization never fails.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28975>
A format can't be standard and non-standard at the same time. If we
ever hit this assertion, it's because something behind the scenes has
evolved (such as the tiling formats) so something that was marked as
non-standard became standard. Add an assertion so we can quickly catch
these issues in the future and adjust the code.
I don't want to mix this assertion with the one in the line above
since that one is the most useful assertion we have in all the sparse
code, so it's good to know which one we're hitting.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27306>
The Tile64 format from Xe2 is weird and some of its MSAA shapes are
non-standard. Reject them. Otherwise, we'll get dEQP failures such as:
deqp-vk: ../../src/intel/vulkan/anv_sparse.c:829: anv_sparse_calc_image_format_properties: Assertion `is_standard || is_known_nonstandard_format' failed.
Many tests can reproduce this issue, including:
dEQP-VK.memory.requirements.extended.image.sparse_tiling_optimal
Testcase: dEQP-VK.memory.requirements.extended.image.sparse_tiling_optimal
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27306>
This is all that's needed to make anv_sparse_bind_image_memory() work
with multi-sampled images.
The assert() we just added would have been really helpful when
debugging this.
All the dEQP tests with "sparse" in their names are passing *even*
without this patch. Real-world applications show very clear visual
corruption for sparse MSAA images bound through non-opaque binds since
only a fraction of the the actual image ends up being bound.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27306>
Yes, I understand that this looks like the kind of check that the
applications should be doing instead of us, but if we don't that, dEQP
will have failures. If we claim support for any multi-sampled sparse
feature, dEQP will try to create multi-sampled sparse images with all
possible sample counts, including the ones supported by non-sparse but
not supported by sparse (x8 and x16 on Tile64 platforms) and also the
ones not supported at all, like x32 and x64.
This change affects a number of dEQP tests, including:
- dEQP-VK.api.info.sparse_image_format_properties2.2d.optimal.r32g32_sfloat
Without this patch, and with sparse multi-sampling enabled, this would
hit the following assertion:
anv_sparse.c:866: anv_sparse_calc_image_format_properties: Assertion `is_standard || is_known_nonstandard_format' failed.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27306>
Our hardware has more than one layout for multi-sampled images that
use the tiling formats that give us the sparse standard block shapes:
see enum isl_msaa_layout. Only the layout we use for colored images is
compatible with the standard block shapes, so it's the only one we can
expose for multi-sampled sparse.
This change affects a number of dEQP tests, including:
- dEQP-VK.memory.requirements.create_info.image.sparse_residency_aliased_tiling_optimal
Without this patch, and with sparse multi-sampling enabled, this test
would hit the following assertion:
anv_sparse.c:866: anv_sparse_calc_image_format_properties: Assertion `is_standard || is_known_nonstandard_format' failed.
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27306>
We're not enabling sparse on multi-sampled images yet, but having the
table here is a first step. The current approach should make the code
a little more compact.
These tables are in section 33.4.3: Standard Sparse Image Block Shapes
of the Vulkan 1.3 spec.
PS: I know we've questioned the need for us to have these tables here
as they are something dEQP should check, but I've hit the "this shape
is not standard" assertion multiple times during development of the
various sparse features, and that really helps narrowing down the
problems. For example, see the next 2 patches in this MR.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27306>
Some places doing driver internal allocations was not setting
ANV_BO_ALLOC_INTERNAL, so adding the flag in those places here.
This will increase the accuracy of the RMV report.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28677>
None of the callers of anv_free_sparse_bindings() check for its return
result, and they also don't have a way to propagate it up the stack.
So just don't return error codes that won't be checked. Instead,
add an assertion so at least we can detect failures in our CI or
development runs.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28724>
The device->using_sparse variable is only used at cmd_buffer_barrier()
to decide if we need to apply the heavier-weight flushes that are only
applicable to sparse resources. The big problem here is that we need
to apply the flushes to the non-image and non-buffer memory barriers,
so we were trying to limit those only to applications that ever submit
a sparse resource to the sparse queue.
The reason why we were applying this only to devices that ever
submitted sparse resources is that dxvk games have this thing where
during startup they create and then delete tiny sparse resources, so
switching device->using_sparse to true at resource creation would make
basically every dxvk game start applying the heavier-weight
workaround.
The problem with all that is that even if an application creates a
sparse resource but doesn't ever bind them, the resource should still
behave as an unbound resource (because they are bound with a NULL
bind), so the flushes affecting them should happen. This case is
exercised by vkd3d-proton/test_buffer_feedback_instructions_sm51.
In order to satisfy all the above cases and only really apply the
heavier-weight flushes to applications actually using sparse
resources, let's just count the number of sparse resources that
currently exist and then apply the workaround only if it's not zero.
That covers the dxvk case since dxvk deletes the resources as soon as
they create, so num_sparse_resources goes back to 0.
Testcase: vkd3d-proton/test_buffer_feedback_instructions_sm51
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10960
Fixes: 6368c1445f ("anv/sparse: add the initial code for Sparse Resources")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28724>
This went unused a while ago. If we decide we want it again we can
just add it back.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28724>
Since we moved the dump_anv_vm_bind() call to anv_sparse_bind(), that
BEGIN/END block stopped making sense, so just keep the first set of
messages.
Also wrap everything around a single INTEL_DEBUG() check so we'll only
run this check once when debug is disabled (we don't care about
running the check multiple times if it's enabled).
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28724>
In both cases we end up calling anv_image_aspect_to_plane(), which
already includes the same assertion.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28724>
If isl_surf_get_tile_info() returned the struct instead of having it
passed as a pointer, gcc would have detected this. I can write patches
for that if we want it.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28724>
Having just one place to check the Sparse type is less error prone.
For example in i915 it was always setting sparse_uses_trtt to true
even if running in gfx 9 that don't support sparse.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28161>
All these vfuncs funnel down to either stubs or the xe_vm_bind_op()
function. By returning int we're shifting VkResult generation to the
callers, which are simply not doing the correct job. If they get
VkResult they can simply throw the errors up the stack without having
to erroneously try to figure out what really happened.
Today the callers are returning either VK_ERROR_UNKNOWN or
VK_ERROR_OUT_OF_DEVICE_MEMORY, but after the patch we're returning
either VK_ERROR_OUT_OF_HOST_MEMORY or VK_ERROR_DEVICE_LOST.
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27926>
We can now finally leave the semaphore waits and signals to the
vm_bind ioctl, making vm_bind operations truly asynchronous.
This was previously done for TR-TT in 18bd00c024 ("anv/trtt: don't
wait/signal syncobjs using the CPU anymore").
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27926>