In our process of monitoring LAVA logs, we typically skip numerous
messages to enhance log clarity. We already exclude `feedback` messages
from display. These messages were just used as a heartbeat signal,
indicating that if we are receiving them, the Device Under Test (DUT) is
active.
Practically, if we continuously receive feedback messages without any
other message level (either `debug` or `target`) for several minutes,
this could be a cause for concern, as it likely indicates the device is
in a kind of livelock state.
Therefore, it is more prudent to ignore feedback messages, as they tend
to occur frequently in unstable scenarios. However, it's important to
note that any other message level will still be considered as a
heartbeat signal.
Real case where several minutes of feedback messages indicate a hang:
https://gitlab.freedesktop.org/mesa/mesa/-/jobs/53546660
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26996>
This week we found that the r8152 issue can happen during the boot
phase, make the necessary adjustments to detect it.
https://gitlab.freedesktop.org/vigneshraman/linux/-/jobs/53651940
Notes:
- The kernel messages during the boot phase is being redirected to the
feedback messages due to the namespaces from the SSH job.
- Update the unit tests:
- Add boot phase detection
- Correctly set the boot phase when mocking LogFollower
Reported-by: Vignesh Raman <vignesh.raman@collabora.com>
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27081>
We were just detecting if a log like
[ 143.080663] r8152 2-1.3:1.0 eth0: Tx status -71
happened once before
[ 316.389695] nfs: server 192.168.201.1 not responding, still trying
But we can use a counter to be more assured that the device is
struggling to recover and we can add let this detection happen during
the boot phase.
This mimics how other freedreno devices deal with this problem, see
`cros_servo_run.py:64` for example.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27081>
We need update kernel often. We need test kernel changes often.
Introduced `KERNEL_EXTERNAL_TAG` to differ between `KERNEL_TAG` which is
also used to rebuild the containers. We don't need rebuild containers
for the external kernel, so this way we don't have to.
Updating kernel goes wruuuuuum.
Signed-off-by: David Heidelberg <david.heidelberg@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23563>
Add two unit tests related to the LAVA job definition.
test_generate_lava_job_definition_sanity checks for the most important
fields, deploy actions, namespaces etc.
test_lava_job_definition compares the generated definition with static
skeleton YAML files committed inside tests/data folder.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25912>
Simplify both UART and SSH job definitions module to share common
building blocks themselves.
- generate_lava_yaml_payload is now a LAVAJobDefinition method, so
dropped the Strategy pattern between both modules
- if SSH is supported and UART is not enforced, default to SSH
- when SSH is enabled, wrap the last deploy action to run the SSH server
and rewrite the test actions, which should not change due to the boot
method
- create a constants module to load environment variables
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25912>
To absorb complexity from the building blocks to generate job
definitions for each mode:
- fastboot-uart
- uboot-uart
- uboot-ssh
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25912>
Break it to smaller pieces with variable size (fastboot has 3 deploy
actions and uboot only one) to build the base definition nicely in the
end.
Extract kernel/dtb attachment and init_stage1 extraction into functions
to be later reused by SSH job definition.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25912>
The LAVA job submitter is being used by other fd.o projects, such as
`drm/ci`, so let's make it generate more generic job definitions and
test cases.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25912>
This should help us handling possibly slower downloads of the traces,
which leads into piglit not printing anything on the output.
After Infra will get stabilized again, needs to be reverted.
Acked-by: Helen Koike <helen.koike@collabora.com>
Acked-by: Daniel Stone <daniels@collabora.com>
Signed-off-by: David Heidelberg <david.heidelberg@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25097>
At least these two can be easily used in bare-metal or Labgrid setups.
Currently I already have MR for implementing these for Labgrid.
Reviewed-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Signed-off-by: David Heidelberg <david.heidelberg@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24665>
Modify the build process for the images to include the build to have ci-kdl
available in the Mesa jobs. Modify also the init-stage2 to launch in the
background the process that will collect data and store a json file with the
relative changes on the recorded data.
Signed-off-by: Sergi Blanch Torne <sergi.blanch.torne@collabora.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24177>
This bring visible speedup while preparing the rootfs and containers.
Acked-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Acked-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: David Heidelberg <david.heidelberg@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24079>
Use a lightweight container for ping, ssh, curl and bash support.
Also use an image located at fd.o infrastructure, since we are having
some issues with Collabora's gitlab one lately.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23534>
- To keep things organized, create a base hidden jobs for alpine images,
as we have 2 now
- This image will have an SSH client ran by LAVA dispatchers
(x86_64-only) who will serve as a bypass channel of output and dmesg
and an alternative for UART.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23534>
Our LAVA farm is currently experiencing issues with running and pulling
docker. LAVA has been detecting (with a low rate) timeouts during these
commands, causing some jobs to fail with infrastructure errors.
Increasing the failure_retry will make the job retry run the container
when LAVA detects the failure without losing its place in the job queue.
We are currently investigating why docker times out. But, when LAVA
fails to detect it, we cancel the job on our side and resubmit it to the
job queue. For more information, please refer to following dashboard:
https://ci-stats-grafana.freedesktop.org/goto/VjZvaA_4z?orgId=1
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23534>
Since we use S3 artifacts for LAVA always, keep only one codepath.
Reviewed-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Signed-off-by: David Heidelberg <david.heidelberg@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23527>
We don't use MINIO for a long time. Rename variable accordingly.
Reviewed-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Signed-off-by: David Heidelberg <david.heidelberg@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23527>
Already in hard-freeze, so we don't have to worry about breaking changes.
Significant changes:
- LLVM 15 is used instead of 11 or 13
- /dev/shm has to be manually mounted
- Debian 12 uses libdrm 2.4.114
- reworked creating of rootfs, from debootstrap to mmdebstrap
- split `create-rootfs.sh` into `lava_build.sh`, `setup-rootfs.sh`, and `strip-rootfs.sh`
- dropped winehq repository for now (Debian wine is up-to-date enough)
- we use wine now, no need to call explicitly call wine64
- bumped libasan from version 6 to 8
Signed-off-by: David Heidelberg <david.heidelberg@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21977>
To ensure proper SSH functioning, the device IP should be added to the
LAVA device dictionary by setting device_ip. LAVA will then map the
value to lava-target-ip.
meson-g12b-a311d-khadas-vim3-cbg-4 has an IP in the dictionary, while
sun50i-h6-pine-h64-cbg-1 and meson-g12b-a311d-khadas-vim3-cbg-2 do not.
Since some devices are not yet properly configured, and device tag
fixing is not an option here, let's temporarily switch to a job
definition based on UART, until it gets fixed.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22870>
In some devices, it takes a few dozens of seconds to LAVA post process
the job and give final metadata related to the job.
It is worth to wait a little more (up to 30 sec) to make structured log
data more accurate.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22870>
Make hide_sensitive_data work in a block fashion, not only hiding the
JWT line, since these tokens are huge, it may break the line when it
extrapolates the YAML dump width.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22870>
Some LAVA signals have similar log outputs and the regex associated with
the log section may conflict. Use the policy of the first regex as the
chosen one, otherwise one line may produce two Gitlab sections in a row.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22870>
Test suite in the dut is just running SSH server and waiting for the
docker container to start the SSH session. So it can take all the test
cases accumulated duration, not just the init-stage1.sh part anymore.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22870>
To use the supported job definition depending on some Mesa CI job
characteristics.
The strategy here, is to use LAVA with a containerized SSH session to
follow the job output, escaping from dumping data to the UART, which
proves to be error prone in some devices.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22870>
Create a separate job definition that runs the job via SSH session.
The DUT test only sets up the SSH server via dropbear, and another
deployed docker runner in LAVA dispatcher access the DUT via SSH with
pseudo-terminal to propagate the logs in real time.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22870>
Based on work from Tomeu Vizoso.
Acked-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Signed-off-by: David Heidelberg <david.heidelberg@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22604>
LAVA farm is giving datetime in UTC timezone, let's standardize it
locally for the script run, so datetimes coming from LAVA proxy calls
will be at the same timezone as the ones we use in structural logging
and traces.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22500>
Let's use the AutoSaveDict as structural logger abstraction to enable
real-time monitoring of LAVA jobs. Mainly used for local runs and
debugging of Mesa CI LAVA jobs.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22500>
Refactor some pieces of the submitter to improve the clarity of the
functions and create a simple dictionary with aggregated data from the
submitter execution which will be dumped to a file when the script
exits.
Add support for the AutoSaveDict based structured logger as well, which
will come in a follow-up commit.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22500>
Update the LogFollower class to improve section handling and provide a
history of sections encountered during log processing:
1. Add section_history attribute to store the history of encountered
GitlabSections.
2. Make LAVA job submitter use the section history feature to improve
structural logging.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22500>