Gathering workflow data

Gathering data about workflow jobs.

Writing options

Gathered data can be written to a CSV or JSON file, configured with the --format [csv|json] option. There is no default; without --format, no output file will be produced.

$ ./multivac/gather_job_data.py --format csv

The resulting output will look like this:

job_id,job_name,branch,commit_sha,status,queued_at,started_at,completed_at,platform,runner_label,runner_name,runner_version,failure_type
7952018403,fedora_34 (gc64),master,b7cb1421c322d93dc2893ad9e827a5b4d00e265f,success,2022-08-22T12:48:45Z,2022-08-22T12:48:51Z,2022-08-22T12:58:47Z,amd64,['ubuntu-20.04-self-hosted'],ghacts-tarantool-8-16-n5,2.295.0,
7952018262,fedora_34,master,b7cb1421c322d93dc2893ad9e827a5b4d00e265f,success,2022-08-22T12:26:26Z,2022-08-22T12:26:38Z,2022-08-22T12:35:47Z,amd64,['ubuntu-20.04-self-hosted'],,2.295.0,

Same with JSON:

$ ./multivac/gather_job_data.py --format json
{
  "7918651226": {
    "job_id": 7918651226,
    "job_name": "centos_8 (gc64)",
    "status": "success",
    "queued_at": "2022-08-19T13:01:30Z",
    "started_at": "2022-08-19T13:01:41Z",
    "completed_at": "2022-08-19T13:08:35Z",
    "runner_label": [
      "ubuntu-20.04-self-hosted"
    ],
    "platform": "amd64",
    "commit_hash": "02fae15a3adb8ea450ebbe3c250a4846cf1cca69",
    "branch": "master",
    "runner_name": "ghacts-shared-8-16-n10",
    "runner_version": "2.295.0"
  },
  "7918651223": {
    "job_id": 7918651223,
    "job_name": "opensuse_15_2 (gc64)",
    "status": "failure",
    "queued_at": "2022-08-19T13:01:30Z",
    "started_at": "2022-08-19T13:01:44Z",
    "completed_at": "2022-08-19T13:08:30Z",
    "runner_label": [
      "ubuntu-20.04-self-hosted"
    ],
    "platform": "amd64",
    "commit_hash": "02fae15a3adb8ea450ebbe3c250a4846cf1cca69",
    "branch": "master",
    "runner_name": "ghacts-shared-8-16-n3",
    "runner_version": "2.295.0",
    "failure_type": "testrun_test_failed"
  }
}
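
The JSON output is convenient for ad-hoc analysis with standard tooling. The following is a minimal sketch (not part of Multivac), assuming the output above was saved to a file named jobs.json:

# Count job statuses in the JSON produced by `gather_job_data.py --format json`.
# The file name jobs.json is an assumption for this example.
import json
from collections import Counter

with open("jobs.json") as f:
    jobs = json.load(f)

# Each top-level key is a job ID; the value holds the fields shown above.
print(Counter(job["status"] for job in jobs.values()))
# e.g. Counter({'success': 1, 'failure': 1})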

Or you can store data in InfluxDB (see InfluxDB connector):

$ ./multivac/gather_job_data.py --format influxdb

Limiting and filtering workflows

Workflows which were skipped or cancelled won’t be processed.

To gather data from a given number of the most recent workflows, use --latest:

$ ./multivac/gather_job_data.py --latest 1000

To gather data for the last N days or N hours, use --since:

$ # see data for the last week (7 days)
$ ./multivac/gather_job_data.py --since 7d
$ # see data for the last 12 hours
$ ./multivac/gather_job_data.py --since 12h
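
Internally, a --since value such as 7d or 12h has to be converted into a cutoff time. The sketch below only illustrates how such strings can be interpreted; it is not Multivac's actual implementation:

# Illustrative sketch: turn a --since value like "7d" or "12h" into a cutoff time.
from datetime import datetime, timedelta, timezone

def since_to_cutoff(value: str) -> datetime:
    amount, unit = int(value[:-1]), value[-1]
    if unit == "d":
        delta = timedelta(days=amount)
    elif unit == "h":
        delta = timedelta(hours=amount)
    else:
        raise ValueError(f"unsupported --since value: {value}")
    return datetime.now(timezone.utc) - delta

print(since_to_cutoff("7d"))  # only workflows newer than this would be kept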

Writing data to a JSON file

To gather workflow data and write it to a JSON file, use the --format json option. To gather data about tests as well, add the --tests (or -t) option. The resulting output will look like this:

{
  "8374189766": {
    "job_id": 8374189766,
    "workflow_run_id": 3061077644,
    "job_name": "out_of_source",
    "branch": "master",
    "commit_sha": "416500fed508968d6d890eb3ec3620ef954d1d0a",
    "conclusion": "success",
    "queued_at": "2022-09-15T14:02:44Z",
    "started_at": "2022-09-15T14:02:56Z",
    "completed_at": "2022-09-15T14:08:45Z",
    "platform": "amd64",
    "runner_label": [
      "ubuntu-20.04-self-hosted"
    ],
    "runner_name": "ghacts-shared-8-16-n17",
    "runner_version": "2.296.2",
    "tests": [
      {
        "name": "box/tx_man.test.lua",
        "configuration": null,
        "status": "fail",
        "test_number": 0
      },
      {
        "name": "replication/qsync_advanced.test.lua",
        "configuration": "memtx",
        "status": "fail",
        "test_number": 1
      }
    ]
  }
}
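
With --tests, every job entry carries a tests array, so failing tests can be aggregated across jobs. A minimal sketch (again an assumption, not Multivac code), with the output saved to jobs.json:

# Count how often each test failed across all gathered jobs.
import json
from collections import Counter

with open("jobs.json") as f:
    jobs = json.load(f)

failed = Counter(
    test["name"]
    for job in jobs.values()
    for test in job.get("tests", [])
    if test["status"] == "fail"
)
for name, count in failed.most_common(10):
    print(count, name)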

Writing data to InfluxDB

To write data to InfluxDB, you need to set credentials as environment variables (see the .env-example file). Add them to the .env file and export them with the following command:

$ source .env && export $(cut -d= -f1 .env)
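
Once exported, the credentials become visible to the script through the process environment. The variable names in the sketch below are placeholders, not necessarily the ones Multivac uses; the real names are listed in .env-example:

# Fail early if the InfluxDB credentials are missing from the environment.
# INFLUX_URL, INFLUX_TOKEN and INFLUX_ORG are placeholder names; see
# .env-example for the variables the script actually expects.
import os
import sys

missing = [name for name in ("INFLUX_URL", "INFLUX_TOKEN", "INFLUX_ORG")
           if name not in os.environ]
if missing:
    sys.exit("missing InfluxDB credentials: " + ", ".join(missing))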

The script uses separate buckets for job data and test data. The job bucket receives the following structure:

{
  "measurement": "failure type, or 'success'",
  "tags": {
    "job_id": "job ID",
    "job_name": "job name",
    "workflow_run_id": "workflow run ID",
    "branch": "head branch",
    "commit_sha": "head commit sha",
    "gc64": "'True' or 'False', as a string",
    "platform": "runner platform, aarch64 or amd64",
    "runner_label": "runner label",
    "runner_version": "runner version",
    "runner_name": "runner name",
    "conclusion": "job conclusion, success or failure"
  },
  "fields": {
    "value": 1
  },
  "time": "time the job started at, as a timestamp in nanoseconds"
}

The test bucket receives the following structure:

{
  "measurement": "test name",
  "tags": {
    "job_id": "job ID",
    "configuration": "test configuration",
    "job_name": "job name",
    "commit_sha": "head commit sha",
    "test_attempt": "number of the test attempt"
  },
  "fields": {
    "value": 1
  },
  "time": "time the job started at, as a timestamp in nanoseconds"
}
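
Records of this shape map naturally onto the Point type of the influxdb-client Python library. The sketch below is not Multivac's actual writer: the URL, token, organization and bucket names are placeholders, and only some of the tags listed above are filled in, using values from the earlier JSON example:

# Build and write one job point with the structure described above.
# URL, token, org and bucket values are placeholders, not Multivac's settings.
from influxdb_client import InfluxDBClient, Point, WritePrecision
from influxdb_client.client.write_api import SYNCHRONOUS

point = (
    Point("testrun_test_failed")  # measurement: failure type, or "success"
    .tag("job_id", "7918651223")
    .tag("job_name", "opensuse_15_2 (gc64)")
    .tag("branch", "master")
    .tag("conclusion", "failure")
    .field("value", 1)
    .time("2022-08-19T13:01:44Z", WritePrecision.NS)
)

with InfluxDBClient(url="http://localhost:8086", token="my-token", org="my-org") as client:
    client.write_api(write_options=SYNCHRONOUS).write(bucket="job-bucket", record=point)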

Detecting workflow failure reasons

Multivac can detect types of workflow failures and calculate detailed statistics. A detailed description of known failure reasons can be found in Types of detected workflow failures.

$ ./multivac/gather_job_data.py --latest 1000 --failure-stats

total 20
package_building_error 5
unknown 4
testrun_test_failed 3
telegram_bot_error 2
integration_vshard_test_failed 1
luajit_error 1
testrun_test_hung 1
git_repo_access_error 1
dependency_autoreconf 1
tap_test_failed 1

The --watch-failure <name> option returns a list of jobs where the named failure has been detected, along with links to the workflow runs on GitHub and the matching log lines:

$ ./multivac/gather_job_data.py --latest 1000 --watch-failure testrun_test_failed
7008229080  memtx_allocator_based_on_malloc      https://github.com/tarantool/tarantool/runs/7008229080?check_suite_focus=true
                        2022-06-22T16:27:25.7389940Z * fail: 1
6936376158  osx_12       https://github.com/tarantool/tarantool/runs/6936376158?check_suite_focus=true
                        2022-06-17T13:11:18.6461930Z * fail: 1
6933185565  fedora_34 (gc64)     https://github.com/tarantool/tarantool/runs/6933185565?check_suite_focus=true
                        2022-06-17T09:24:50.6543965Z * fail: 1

This is useful when working on failure reasons that are not yet detected:

$ ./multivac/gather_job_data.py --latest 1000 --watch-failure unknown

6966228368  freebsd-13   https://github.com/tarantool/tarantool/runs/6966228368?check_suite_focus=true
                        None
6947333557  freebsd-12   https://github.com/tarantool/tarantool/runs/6947333557?check_suite_focus=true
                        None
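
Failure types are detected by matching known patterns against the job logs: the matched line is printed under each job, and None means that no known pattern matched. The sketch below only illustrates the idea; the regular expressions and type names are examples, not Multivac's actual rules:

# Classify a job log by scanning it for known failure patterns.
# The regexes and failure-type names here are examples only.
import re

FAILURE_PATTERNS = {
    "testrun_test_failed": re.compile(r"\* fail: \d+"),
    "package_building_error": re.compile(r"error: Bad exit status from"),
}

def detect_failure_type(log_lines):
    for line in log_lines:
        for failure_type, pattern in FAILURE_PATTERNS.items():
            if pattern.search(line):
                return failure_type, line
    return "unknown", None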