So far we only write a maximum of 4 dwords further into the batch and
it seems just going over the CS prefetch was enough.
Turns out writing more dwords can delay the writes and we start
prefetching stuff that hasn't landed in memory yet.
This fixes the issue by stalling the CS to ensure the writes have
landed before we go over the prefetch.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 796fccce63 ("intel/mi-builder: add framework for self modifying batches")
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8525>