anv/trtt: set every entry to NULL when we create an L2 table

When we create sparse resources the first thing we do is a NULL bind
on them, as the Vulkan spec mandates certain behavior even for unbound
sparse resources. We do this with the minimal effort possible: if we
can get away with marking an L2 pointer as NULL in the L3 table, we
just do it and return, instead of going all the way to creating L1
tables and marking all the final entries as NULL.

The strategy we were using had a bug that could lead to previously
created NULL entries not being marked as NULL anymore. Let's give an
example:

 (before proceeding, keep in mind that a NULL entry in the L3 and L2
  tables has bit 1 set, it does *not* have the value 0)

 - Create a 64mb buffer that uses an entire L1 table (needs to be
   properly aligned), which triggers a NULL bind.
     - Our algorithm will just set the L3 entry (pointing to the L2
       table) as NULL.
 - Create a 64kb buffer that uses the same L2 table (but a different
   L1 table).
     - The NULL bind triggered won't do anything as the L2 table is
       already NULL.
 - Bind the first buffer to actual memory. This will end up creating
   the L2 table and the L1 table. The only entry we will set in the L2
   table will be the one pointing to the L1 table. All the other
   values will be 0 (so they won't have neither the NULL or Invalid
   bits set: access to them will lead to page faults).
 - Try to use the second buffer, which is still unbound. It was
   relying on the fact that its L2 table pointer was NULL, but now
   it's not anymore, so the page walker will fetch the L1 entries in
   the L2 table and they will all be zero instead of having the NULL
   bit set.

The fix is pretty simple: whenever we create a new L2 table, set every
entry to NULL (except the one we're about to set to non-NULL). This
preserves behavior for every other NULL resource relying on the L3
entry being set to NULL.

We don't need to do this for the L1 table because its entries are
different and instead of having bits to signal NULL entries we have
a special TR-TT register that we can set that gets compared to check
if an entry is NULL, and we conveniently program it to 0: see
ANV_TRTT_L1_NULL_TILE_VAL.

I am not aware of any real workloads that are triggering this
behavior, I found this issue while investigating something else,
running a custom sparse program in our pre-silicon environment, and it
told us about the page faults.

Cc: mesa-stable
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30953>
This commit is contained in:
Paulo Zanoni 2024-08-27 10:27:35 -07:00 committed by Marge Bot
parent 537ada2308
commit 4c366ef67b

View file

@ -527,6 +527,20 @@ anv_trtt_bind_add(struct anv_device *device,
anv_trtt_bind_list_add_entry(l3l2_binds, n_l3l2_binds,
trtt->l3_addr + l3_index *
sizeof(uint64_t), l2_addr);
/* We have just created a new L2 table. Other resources may already have
* been pointing to this L2 table relying on the fact that it was marked
* as NULL, so now we need to mark every one of its entries as NULL in
* order to preserve behavior for those entries.
*/
for (int i = 0; i < 512; i++) {
if (i != l2_index) {
trtt->l2_mirror[l3_index * 512 + i] = ANV_TRTT_L3L2_NULL_ENTRY;
anv_trtt_bind_list_add_entry(l3l2_binds, n_l3l2_binds,
l2_addr + i * sizeof(uint64_t),
ANV_TRTT_L3L2_NULL_ENTRY);
}
}
}
assert(l2_addr != 0 && l2_addr != ANV_TRTT_L3L2_NULL_ENTRY);
@ -697,6 +711,11 @@ anv_sparse_bind_trtt(struct anv_device *device,
* numbers are so small that a little overestimation won't hurt.
*
* We have assertions below to catch estimation errors.
*
* TODO: a bug fix caused us to put that "+ 1024" in the l3l2 capacity so
* now our estimations are super overestimating things in most cases, as
* most cases are still using a total capacity of just 1 or 2. We should
* replace this whole thing with something more efficient.
*/
int l3l2_binds_capacity = 1;
int l1_binds_capacity = 0;
@ -704,7 +723,7 @@ anv_sparse_bind_trtt(struct anv_device *device,
assert(sparse_submit->binds[b].size % (64 * 1024) == 0);
int pages = sparse_submit->binds[b].size / (64 * 1024);
l1_binds_capacity += pages;
l3l2_binds_capacity += (pages / 1024 + 1) * 2;
l3l2_binds_capacity += (pages / 1024 + 1) * 2 + 1024;
}
/* Turn a series of virtual address maps, into a list of L3/L2/L1 TRTT page