ci/lava: Increase Docker action failure_retry counter

Our LAVA farm is currently experiencing issues with pulling Docker images
and running containers. At a low rate, LAVA has been detecting timeouts
during these commands, causing some jobs to fail with infrastructure errors.

Increasing failure_retry makes LAVA retry running the container when it
detects the failure, without the job losing its place in the queue.
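A minimal Python sketch of the retry semantics described above, assuming that failure_retry counts the total number of attempts (so 1 means no retry and 3 means up to two retries). InfrastructureError and run_with_failure_retry are hypothetical stand-ins, not lava-dispatcher's actual classes; the point is only that the retry happens inside the same job, so nothing is resubmitted.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class InfrastructureError(Exception):
    """Hypothetical stand-in for the error raised when a docker command times out."""


def run_with_failure_retry(action: Callable[[], T], failure_retry: int) -> T:
    """Run `action`, attempting it at most `failure_retry` times in total.

    Assumption: failure_retry is the total attempt count, not the number of
    extra retries. The job is never resubmitted, so it keeps its queue slot.
    """
    if failure_retry < 1:
        raise ValueError("failure_retry must be at least 1")
    last_error: Optional[InfrastructureError] = None
    for _ in range(failure_retry):
        try:
            return action()
        except InfrastructureError as err:
            # Only infrastructure failures are retried; other errors propagate.
            last_error = err
    assert last_error is not None
    raise last_error


# Usage: a flaky "docker run" that times out twice before succeeding.
calls = {"n": 0}


def flaky_docker_run() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise InfrastructureError("docker run timed out")
    return "container started"


print(run_with_failure_retry(flaky_docker_run, failure_retry=3))  # container started
```

Under this assumption, the old value of 1 would have failed the job on the first timeout, while 3 absorbs two consecutive timeouts.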

We are still investigating why Docker times out. Meanwhile, when LAVA
fails to detect the timeout, we cancel the job on our side and resubmit it
to the job queue. For more information, please refer to the following dashboard:
https://ci-stats-grafana.freedesktop.org/goto/VjZvaA_4z?orgId=1
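For contrast, the fallback path (cancel on our side and resubmit) can be sketched as below. The callables submit_job, wait_for_job and cancel_job, along with timeout_s and max_submissions, are hypothetical hooks for illustration, not the actual submitter API.

```python
from typing import Callable, Optional


def submit_with_resubmission(
    submit_job: Callable[[], int],
    wait_for_job: Callable[[int, float], Optional[str]],
    cancel_job: Callable[[int], None],
    timeout_s: float,
    max_submissions: int,
) -> str:
    """Resubmit a job whose hang LAVA itself did not detect.

    Hypothetical hooks: submit_job returns a job id, wait_for_job returns the
    final status or None if timeout_s elapsed without the job finishing, and
    cancel_job cancels the job on the LAVA side.
    """
    for _ in range(max_submissions):
        job_id = submit_job()
        status = wait_for_job(job_id, timeout_s)
        if status is not None:
            return status
        # LAVA never flagged the docker timeout: cancel the job ourselves
        # and queue a fresh one (which goes to the back of the queue).
        cancel_job(job_id)
    raise TimeoutError(f"job hung in all {max_submissions} submissions")


# Usage with fake hooks: the first job hangs silently, the resubmitted one passes.
submitted, cancelled = [], []


def fake_submit() -> int:
    submitted.append(len(submitted) + 1)
    return submitted[-1]


def fake_wait(job_id: int, timeout_s: float) -> Optional[str]:
    return None if job_id == 1 else "pass"


print(submit_with_resubmission(fake_submit, fake_wait, cancelled.append, 60.0, 3))  # pass
print(cancelled)  # [1]
```

Unlike failure_retry, every iteration of this loop puts a new job at the back of the queue, which is why retrying inside LAVA is preferable when the timeout is detected.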

Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23534>
Guilherme Gallo 2023-06-08 12:34:35 -03:00 committed by Marge Bot
parent ec819a16b9
commit d222502624

@@ -117,7 +117,7 @@ def generate_docker_test(args):
     init_stages_test = {
         "namespace": "container",
         "timeout": {"minutes": args.job_timeout_min},
-        "failure_retry": 1,
+        "failure_retry": 3,
         "definitions": [
             {
                 "name": "docker_ssh_client",