mesa/src/intel/vulkan/xe
Paulo Zanoni 6d6b22b734 intel/xe: unify behavior with i915.ko regarding ENOMEM on DRM_IOCTL_XE_EXEC
When the system is under memory pressure (which can happen, for
example, during CI runs), don't immediately give up the exec ioctl
(which, for Vulkan, will result in the device being declared lost).
Instead, retry a little bit just like we do for i915.ko.

This is a trade-off.

One of the reasons to *not* have unified behavior regarding ENOMEM
between i915.ko and xe.ko is the fact that xe.ko uses vm_bind, so if
the user tried to bind more memory than the system can provide, we'll
just keep getting ENOMEM as long as we retry the ioctl. We now have a
retry limit, so we'll eventually return the error.
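
A bounded retry loop of the kind described above could look roughly like
the following. This is only a sketch, not the code from this commit: the
names exec_retry_on_enomem, exec_fn, fake_state and fake_exec are
hypothetical, the callback stands in for the real DRM_IOCTL_XE_EXEC call,
and the retry count and sleep interval are placeholder parameters rather
than the values Mesa actually uses.

```c
#include <errno.h>
#include <time.h>

/* Hypothetical callback type standing in for the real exec ioctl call.
 * It is assumed to return 0 on success, or -1 with errno set on failure. */
typedef int (*exec_fn)(void *ctx);

/* Retry only while the call fails with ENOMEM, for at most max_retries
 * additional attempts. Any other error, or running out of retries,
 * returns the failure to the caller. sleep_ns must be below one second. */
static int
exec_retry_on_enomem(exec_fn fn, void *ctx, int max_retries, long sleep_ns)
{
   int attempt = 0;

   for (;;) {
      int ret = fn(ctx);

      if (ret == 0 || errno != ENOMEM || attempt >= max_retries)
         return ret;

      attempt++;

      /* Give the system a moment to free memory before trying again. */
      struct timespec ts = { .tv_sec = 0, .tv_nsec = sleep_ns };
      nanosleep(&ts, NULL);
   }
}

/* Fake "ioctl" for demonstration: fails with ENOMEM a fixed number of
 * times, then succeeds. */
struct fake_state {
   int failures_left;
   int calls;
};

static int
fake_exec(void *ctx)
{
   struct fake_state *s = ctx;

   s->calls++;
   if (s->failures_left > 0) {
      s->failures_left--;
      errno = ENOMEM;
      return -1;
   }
   return 0;
}
```

With transient pressure (a few failures followed by success) the loop
eventually returns 0. With a bind that can never be satisfied it gives up
after max_retries retries and returns the ENOMEM failure, which is the
cost side of the trade-off.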

On the other hand, if the problem is other applications consuming all
the memory, having the retry loop may really help avoid unnecessarily
marking the device as lost, since one of our retries may eventually
succeed.

I believe the trade-off of "we'll now eventually succeed in some cases
where it's possible to succeed, at the expense of retrying for a few
seconds until giving up in cases where we would never be able to
succeed" is an improvement.

If xe.ko ever gives us a way to differentiate between the two
different reasons for ENOMEM, we'll be able to make things much
better. We can also tune our timeouts if needed.

Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37559>
2025-10-07 19:48:36 +00:00
anv_batch_chain.c | intel/xe: unify behavior with i915.ko regarding ENOMEM on DRM_IOCTL_XE_EXEC | 2025-10-07 19:48:36 +00:00
anv_batch_chain.h | anv: make device initialization more asynchronous | 2024-06-13 08:29:25 +00:00
anv_device.c      | anv/i915: Require HAS_EXEC_TIMELINE_FENCES | 2025-08-16 00:04:46 -04:00
anv_device.h      | anv: hide exec_flags selection inside the i915 backend | 2023-07-13 17:12:26 +00:00
anv_kmd_backend.c | build: avoid redefining unreachable() which is standard in C23 | 2025-07-31 17:49:42 +00:00
anv_queue.c       | build: avoid redefining unreachable() which is standard in C23 | 2025-07-31 17:49:42 +00:00
anv_queue.h       | anv: Optimize vkQueueWaitIdle() on Xe KMD | 2024-09-19 23:12:45 +00:00