"Tailorbird" initiative: making CI your friend #9506

wainersm · 2024-04-17T14:27:24Z

Context

At the Virtual Kata Containers PTG Planning of April 2024 there was a discussion session led by @jodh-intel regarding the current problems that Kata developers have faced with CI. Please see the topics and notes of that session at https://etherpad.opendev.org/p/kata-ptg-planning-april-2024#L160 . We ended the session with a list of volunteers (myself, @ldoktor , @stevenhorsman , @gkurz , @littlejawa) to build a "task force" aiming to improve the CI situation as much as possible.

We want CI to be your friend!

Work items

Find them on the dashboard: https://github.com/orgs/kata-containers/projects/46/views/1

Old table:

Item	Owner	Issues	Status
"know your enemy" - gather data out of CI to identify unstable jobs	@wainersm
Establish and document policies (e.g. when to promote/demote jobs to "required")
Optimize execution of jobs (e.g. triggers by files touched)	@ldoktor
Foster the fix of current broken jobs / make nightly CI green

Done criteria

When will we be done with this initiative?

Syncing up

TBD - a meeting every X days? Or Slack?

Volunteers

We need help, and everybody is welcome! Please add your name:

@ldoktor , @stevenhorsman , @gkurz , @littlejawa, @sprt

Additional information

The Common Tailorbird is a mostly green bird with a stable population.

sprt · 2024-04-17T21:27:57Z

Added myself to the list of volunteers!

ldoktor · 2024-04-18T10:29:50Z

I have already started working on "Optimize execution of jobs (e.g. triggers by files touched)"; feel free to put me there as an owner.

wainersm · 2024-05-31T21:18:42Z

I made a script to gather data, basic stuff out of the nightly jobs actually...

import requests
import sys
import os
import re

workflow_id='ci-nightly.yaml'
list_workflow_runs_url='https://api.github.com/repos/kata-containers/kata-containers/actions/workflows/' + workflow_id + '/runs'
headers = {"Accept": "application/vnd.github+json" ,"X-GitHub-Api-Version": "2022-11-28"}

token = os.getenv("GITHUB_TOKEN")
if token is not None:
    headers['Authorization'] = "Bearer " + token

# Get the latest 10 workflow runs.
# TODO: parametrize it!
#
r = requests.get("%s?per_page=10" %(list_workflow_runs_url), headers=headers)
r.raise_for_status()

page_size=100
runs_map=[]

for run in r.json()['workflow_runs']:
    entry = {'id': run['id'],
             'created_at': run['created_at'],
             'conclusion': None,
             'jobs': []}

    jobs_map={}
    if run['status'] == "in_progress":
        runs_map.append(entry)
        continue
    else:
        entry['conclusion'] = run['conclusion']

    # Paginate, as the jobs of a run can span several pages.
    total_count = -1
    page=1
    while True:
        jobs_request = requests.get("%s?per_page=%s&page=%s" % (run['jobs_url'], page_size,page), headers=headers)
        jobs_request.raise_for_status()

        for job in jobs_request.json()['jobs']:
            entry['jobs'].append({'name': job['name'], 'run_id': job['run_id'],
                                    'conclusion': job['conclusion']})

        total_count = max(total_count, jobs_request.json()['total_count'])
        if len(entry['jobs']) >= total_count:
            break
        page += 1

    runs_map.append(entry)

def collect_jobs_stats(workflows_runs):
    '''
    Return a map of {'runs': NUMBER, 'fails': NUMBER} indexed by job name
    '''
    stats = {}
    for run in workflows_runs:
        for job in run['jobs']:
            job_stat = stats.get(job['name'], {'runs': 0, 'fails': 0})
            job_stat['runs']+=1
            if job['conclusion'] != 'success':
                job_stat['fails']+=1
                # NOTE: stored only on failure, so jobs that always pass are
                # left out, and successes seen before the first failure are
                # not counted.
                stats[job['name']] = job_stat
    return stats

jobs_stats = collect_jobs_stats(runs_map)

regex = re.compile('kata-containers-ci-on-push / run-.*-tests.*')
for name, stat in jobs_stats.items():
    if regex.match(name):
        print('%s: (%s) fail=%s' % (name, stat['runs'], stat['fails']))

@ldoktor @beraldoleal ^^^^ in case you have free cycles to help with bugs and improving it.

I just ran it; see the results below. Notice that sometimes the parent job fails but the child jobs are not charged with the failure. This is something that could be improved in the script, although interpreting the data by eye isn't difficult.

kata-containers-ci-on-push / run-cri-containerd-tests-ppc64le: (1) fail=1
kata-containers-ci-on-push / run-k8s-tests-on-ppc64le: (1) fail=1
kata-containers-ci-on-push / run-k8s-tests-on-zvsi / run-k8s-tests (qemu, nydus, k3s): (9) fail=9
kata-containers-ci-on-push / run-basic-amd64-tests / run-tracing (clh): (9) fail=7
kata-containers-ci-on-push / run-basic-amd64-tests / run-tracing (qemu): (9) fail=8
kata-containers-ci-on-push / run-basic-amd64-tests / run-vfio (clh): (9) fail=5
kata-containers-ci-on-push / run-basic-amd64-tests / run-nerdctl-tests (cloud-hypervisor): (9) fail=2
kata-containers-ci-on-push / run-kata-coco-tests / run-k8s-tests-coco-nontee (qemu-coco-dev, nydus, guest-pull): (9) fail=9
kata-containers-ci-on-push / run-kata-coco-tests / run-k8s-tests-on-sev (qemu-sev, nydus, guest-pull): (9) fail=9
kata-containers-ci-on-push / run-k8s-tests-on-aks / run-k8s-tests (ubuntu, dragonball, small): (9) fail=1
kata-containers-ci-on-push / run-kata-coco-tests / run-k8s-tests-on-tdx (qemu-tdx, nydus, guest-pull): (9) fail=9
kata-containers-ci-on-push / run-kata-coco-tests / run-k8s-tests-sev-snp (qemu-snp, nydus, guest-pull): (9) fail=9
kata-containers-ci-on-push / run-k8s-tests-on-aks / run-k8s-tests (cbl-mariner, clh, small, oci-distribution): (9) fail=1
kata-containers-ci-on-push / run-kata-deploy-tests-on-garm / run-kata-deploy-tests (clh, rke2): (9) fail=3
kata-containers-ci-on-push / run-kata-deploy-tests-on-garm / run-kata-deploy-tests (qemu, k0s): (9) fail=3
kata-containers-ci-on-push / run-kata-deploy-tests-on-garm / run-kata-deploy-tests (qemu, k3s): (9) fail=1
kata-containers-ci-on-push / run-metrics-tests / run-metrics (qemu): (9) fail=1
kata-containers-ci-on-push / run-basic-amd64-tests / run-nerdctl-tests (clh): (8) fail=2
kata-containers-ci-on-push / run-kata-monitor-tests / run-monitor (qemu, containerd): (8) fail=7
kata-containers-ci-on-push / run-kata-deploy-tests-on-garm / run-kata-deploy-tests (clh, k0s): (8) fail=2
kata-containers-ci-on-push / run-kata-deploy-tests-on-garm / run-kata-deploy-tests (clh, k3s): (8) fail=2
kata-containers-ci-on-push / run-k8s-tests-on-aks / run-k8s-tests (ubuntu, cloud-hypervisor, normal): (8) fail=1
kata-containers-ci-on-push / run-cri-containerd-tests-s390x: (1) fail=1
kata-containers-ci-on-push / run-k8s-tests-on-zvsi: (1) fail=1
kata-containers-ci-on-push / run-k8s-tests-on-aks / run-k8s-tests (ubuntu, qemu, normal): (7) fail=1
kata-containers-ci-on-push / run-k8s-tests-on-ppc64le / run-k8s-tests (qemu, kubeadm): (7) fail=5
kata-containers-ci-on-push / run-kata-monitor-tests / run-monitor (qemu, crio): (6) fail=1
kata-containers-ci-on-push / run-basic-amd64-tests / run-vfio (qemu): (4) fail=2
kata-containers-ci-on-push / run-kata-monitor-tests: (1) fail=1
kata-containers-ci-on-push / run-metrics-tests: (1) fail=1
kata-containers-ci-on-push / run-basic-amd64-tests: (1) fail=1
kata-containers-ci-on-push / run-kata-deploy-tests-on-garm: (1) fail=1
kata-containers-ci-on-push / run-k8s-tests-on-garm: (1) fail=1
kata-containers-ci-on-push / run-k8s-tests-on-aks: (1) fail=1
kata-containers-ci-on-push / run-k8s-tests-with-crio-on-garm: (1) fail=1
kata-containers-ci-on-push / run-kata-coco-tests: (1) fail=1
kata-containers-ci-on-push / run-kata-deploy-tests-on-aks: (1) fail=1
kata-containers-ci-on-push / run-basic-amd64-tests / run-docker-tests (clh): (2) fail=1
kata-containers-ci-on-push / run-basic-amd64-tests / run-runk: (1) fail=1
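
On the parent/child remark above: in the results, a parent entry such as "kata-containers-ci-on-push / run-basic-amd64-tests" is listed on its own with (1) fail=1, separately from its children. A minimal sketch of one possible improvement, assuming a failed parent entry means none of its children started in that run, is to add each parent failure to every known child. The helper charge_children is hypothetical and not part of the script; the sample counts are a subset of the report above.

```python
def charge_children(jobs_stats):
    """Return a copy of jobs_stats in which every failure recorded on a
    parent job is also counted as a run and a failure of each child job.

    A child is any job whose name starts with the parent's name plus ' / '.
    ASSUMPTION: a failed parent entry means its children never started.
    """
    charged = {name: dict(stat) for name, stat in jobs_stats.items()}
    for parent, parent_stat in jobs_stats.items():
        if parent_stat['fails'] == 0:
            continue
        prefix = parent + ' / '
        for child in jobs_stats:
            if child.startswith(prefix):
                charged[child]['runs'] += parent_stat['fails']
                charged[child]['fails'] += parent_stat['fails']
    return charged


# Sample counts: a subset of the report above.
sample = {
    'kata-containers-ci-on-push / run-basic-amd64-tests':
        {'runs': 1, 'fails': 1},
    'kata-containers-ci-on-push / run-basic-amd64-tests / run-tracing (clh)':
        {'runs': 9, 'fails': 7},
    'kata-containers-ci-on-push / run-metrics-tests / run-metrics (qemu)':
        {'runs': 9, 'fails': 1},
}

for name, stat in charge_children(sample).items():
    print('%s: (%s) fail=%s' % (name, stat['runs'], stat['fails']))
```

The input map is left untouched, so the raw and the charged numbers can be compared side by side.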

stevenhorsman · 2024-06-03T10:27:37Z

@wainersm - optional suggestion - I wonder whether using the gh cli might be worth considering to avoid things like the pagination? e.g. gh -R kata-containers/kata-containers run list --workflow ci-nightly.yaml -s completed -L 10 --json attempt,conclusion,createdAt,databaseId,name,number to get the specified fields for the last 10 completed runs
and something like: gh -R kata-containers/kata-containers run view <workflow id> --json databaseId,name,status,jobs to get the list of jobs and the status for the workflow id?
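
A minimal sketch of how the two gh commands above could be driven from Python, assuming gh is installed and authenticated. The helper names are hypothetical, and the 'name' and 'conclusion' keys of each job object are an assumption about the JSON that gh run view returns. The flags and field lists are the ones quoted in the suggestion, with -R placed after the subcommand.

```python
import json
import subprocess

REPO = "kata-containers/kata-containers"


def list_runs_cmd(workflow="ci-nightly.yaml", limit=10):
    """Build the `gh run list` command for the last `limit` completed runs."""
    return ["gh", "run", "list", "-R", REPO,
            "--workflow", workflow, "-s", "completed", "-L", str(limit),
            "--json", "attempt,conclusion,createdAt,databaseId,name,number"]


def view_run_cmd(run_id):
    """Build the `gh run view` command that lists the jobs of one run."""
    return ["gh", "run", "view", str(run_id), "-R", REPO,
            "--json", "databaseId,name,status,jobs"]


def run_json(cmd):
    """Run a gh command and decode its JSON output (needs an authenticated gh)."""
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return json.loads(result.stdout)


def job_conclusions(run_view):
    """Map job name -> conclusion from decoded `gh run view --json jobs` output.

    ASSUMPTION: each job object has 'name' and 'conclusion' keys.
    """
    return {job["name"]: job["conclusion"] for job in run_view["jobs"]}


def nightly_report(limit=10):
    """Fetch the last completed runs and the job conclusions of each one."""
    report = {}
    for run in run_json(list_runs_cmd(limit=limit)):
        view = run_json(view_run_cmd(run["databaseId"]))
        report[run["databaseId"]] = job_conclusions(view)
    return report
```

Nothing here touches the network until nightly_report() is called, so the command builders and the parser can be checked offline.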

beraldoleal · 2024-06-03T17:19:07Z

There is a gh plugin that does that for us; iirc, by default it goes over the last 100 jobs:

$ gh workflow-stats -o kata-containers -r kata-containers -f ci-on-push.yaml jobs

I pasted the output in our slack channel, but pasting here too for visibility:

🏃 Total runs: 115
  ✔ Success: 0 (0.0%)
  ✖ Failure: 28 (24.3%)
  🤔 Others: 87 (75.7%)

📈 Top 3 jobs with the highest failure counts (failure jobs / total runs)
  kata-containers-ci-on-push / run-basic-amd64-tests / run-tracing (qemu): 33/43
    └──Run tracing tests: 33/43

  kata-containers-ci-on-push / run-kata-monitor-tests / run-monitor (qemu, containerd): 30/40
    └──Run kata-monitor tests: 30/40

  kata-containers-ci-on-push / run-basic-amd64-tests / run-tracing (clh): 22/43
    └──Run tracing tests: 22/43


📊 Top 3 jobs with the longest execution average duration
  kata-containers-ci-on-push / run-k8s-tests-on-aks / run-k8s-tests (ubuntu, clh, small): 1554.00s
  kata-containers-ci-on-push / build-kata-static-tarball-s390x / build-asset (rootfs-image-confidential): 1324.60s
  kata-containers-ci-on-push / build-kata-static-tarball-s390x / build-asset (rootfs-initrd-confidential): 1227.36s

stevenhorsman · 2024-06-04T08:51:47Z

@beraldoleal - that's a cool plugin. I played a bit with the options and came up with: gh -r kata-containers -o kata-containers workflow-stats -f ci-nightly.yaml jobs -n 25 -c ">$(date -d "30 days ago" +%Y-%m-%d)" to show the 25 highest failure counts in the last 30 days, which I guess is getting closer to Wainer's aims. We could use this in combination with some json processing and the gh run list command to filter out only those with a failure percentage above 50% and get specifically the last 10 runs, but I think a time-based approach is equally valid.
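
A minimal sketch of the "failure percentage above 50%" filter mentioned above. The layout of the plugin's JSON output is not shown in this thread, so the input here is a hypothetical {job name: (failures, runs)} map that would have to be built from it first. The sample counts are the three from the workflow-stats output pasted earlier.

```python
def flaky_jobs(counts, threshold=0.5):
    """Return (name, failure_rate) pairs whose rate is above `threshold`,
    sorted from the highest rate to the lowest.

    `counts` maps a job name to a (failures, runs) tuple. This shape is an
    assumption, not the plugin's actual JSON layout.
    """
    rates = [(name, fails / runs)
             for name, (fails, runs) in counts.items() if runs > 0]
    return sorted((r for r in rates if r[1] > threshold),
                  key=lambda r: r[1], reverse=True)


# Counts copied from the workflow-stats output pasted earlier in the thread.
counts = {
    'kata-containers-ci-on-push / run-basic-amd64-tests / run-tracing (qemu)': (33, 43),
    'kata-containers-ci-on-push / run-kata-monitor-tests / run-monitor (qemu, containerd)': (30, 40),
    'kata-containers-ci-on-push / run-basic-amd64-tests / run-tracing (clh)': (22, 43),
}

for name, rate in flaky_jobs(counts):
    print('%s: %.1f%%' % (name, rate * 100))
```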

ldoktor · 2024-06-04T09:25:20Z

Very nice, @beraldoleal, with the --json it should be quite useful (either directly with jq or in python). It even includes the individual steps...

wainersm · 2024-06-04T14:05:40Z

Hey @stevenhorsman @beraldoleal @ldoktor, thanks for the feedback on the script. The workflow-stats plug-in is indeed very cool! I also played with it a little bit yesterday and it simplifies the Python script a lot (I should send a v2 soon).

One thing that intrigued me, though, is that I asked the tool to generate statistics for the last 10 days, but the "run count" of most jobs was "16" when I was expecting "~10" (more or less 10 because someone might have triggered the workflows manually).
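
A minimal sketch for checking the manual-trigger guess, assuming the run objects returned by the list-workflow-runs endpoint carry an 'event' field (for example 'schedule' or 'workflow_dispatch'). It says nothing about how workflow-stats itself counts runs; runs_by_event and the sample data are hypothetical.

```python
from collections import Counter


def runs_by_event(workflow_runs):
    """Count workflow runs per triggering event.

    `workflow_runs` is a list shaped like r.json()['workflow_runs'] in the
    script; runs without an 'event' key are counted under 'unknown'.
    """
    return Counter(run.get('event', 'unknown') for run in workflow_runs)


# Made-up sample, not real CI data.
sample_runs = [
    {'id': 1, 'event': 'schedule'},
    {'id': 2, 'event': 'schedule'},
    {'id': 3, 'event': 'workflow_dispatch'},
]

for event, count in runs_by_event(sample_runs).items():
    print('%s: %s' % (event, count))
```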

wainersm · 2024-06-14T18:07:31Z

Hi folks,

Generated the report today again, considering the last 10 executions:

kata-containers-ci-on-push / run-cri-containerd-tests-ppc64le / run-cri-containerd (active, qemu): (10) fail=4 skips=0
kata-containers-ci-on-push / run-metrics-tests / Kata Setup: (10) fail=0 skips=0
kata-containers-ci-on-push / run-basic-amd64-tests / run-cri-containerd (lts, clh): (10) fail=0 skips=0
kata-containers-ci-on-push / run-basic-amd64-tests / run-cri-containerd (lts, dragonball): (10) fail=0 skips=0
kata-containers-ci-on-push / run-basic-amd64-tests / run-containerd-stability (lts, clh): (10) fail=0 skips=0
kata-containers-ci-on-push / run-basic-amd64-tests / run-cri-containerd (lts, qemu): (10) fail=1 skips=0
kata-containers-ci-on-push / run-basic-amd64-tests / run-containerd-stability (lts, cloud-hypervisor): (10) fail=0 skips=0
kata-containers-ci-on-push / run-basic-amd64-tests / run-cri-containerd (lts, stratovirt): (10) fail=0 skips=0
kata-containers-ci-on-push / run-basic-amd64-tests / run-cri-containerd (lts, cloud-hypervisor): (10) fail=0 skips=0
kata-containers-ci-on-push / run-basic-amd64-tests / run-containerd-stability (lts, dragonball): (10) fail=0 skips=0
kata-containers-ci-on-push / run-basic-amd64-tests / run-cri-containerd (lts, qemu-runtime-rs): (10) fail=0 skips=0
kata-containers-ci-on-push / run-basic-amd64-tests / run-runk: (10) fail=0 skips=0
kata-containers-ci-on-push / run-basic-amd64-tests / run-containerd-stability (lts, qemu): (10) fail=0 skips=0
kata-containers-ci-on-push / run-basic-amd64-tests / run-nydus (lts, clh): (10) fail=0 skips=0
kata-containers-ci-on-push / run-basic-amd64-tests / run-cri-containerd (active, clh): (10) fail=0 skips=0
kata-containers-ci-on-push / run-basic-amd64-tests / run-containerd-stability (lts, stratovirt): (10) fail=0 skips=0
kata-containers-ci-on-push / run-basic-amd64-tests / run-cri-containerd (active, dragonball): (10) fail=0 skips=0
kata-containers-ci-on-push / run-basic-amd64-tests / run-nydus (lts, qemu): (10) fail=0 skips=0
kata-containers-ci-on-push / run-basic-amd64-tests / run-containerd-stability (active, clh): (10) fail=0 skips=0
kata-containers-ci-on-push / run-basic-amd64-tests / run-cri-containerd (active, qemu): (10) fail=2 skips=0
kata-containers-ci-on-push / run-basic-amd64-tests / run-nydus (lts, dragonball): (10) fail=0 skips=0
kata-containers-ci-on-push / run-basic-amd64-tests / run-containerd-stability (active, cloud-hypervisor): (10) fail=0 skips=0
kata-containers-ci-on-push / run-basic-amd64-tests / run-nydus (lts, stratovirt): (10) fail=0 skips=0
kata-containers-ci-on-push / run-basic-amd64-tests / run-containerd-stability (active, dragonball): (10) fail=0 skips=0
kata-containers-ci-on-push / run-basic-amd64-tests / run-cri-containerd (active, stratovirt): (10) fail=0 skips=0
kata-containers-ci-on-push / run-basic-amd64-tests / run-nydus (active, clh): (10) fail=0 skips=0
kata-containers-ci-on-push / run-basic-amd64-tests / run-containerd-stability (active, qemu): (10) fail=0 skips=0
kata-containers-ci-on-push / run-basic-amd64-tests / run-cri-containerd (active, cloud-hypervisor): (10) fail=0 skips=0
kata-containers-ci-on-push / run-basic-amd64-tests / run-nydus (active, qemu): (10) fail=0 skips=0
kata-containers-ci-on-push / run-basic-amd64-tests / run-containerd-stability (active, stratovirt): (10) fail=0 skips=0
kata-containers-ci-on-push / run-basic-amd64-tests / run-cri-containerd (active, qemu-runtime-rs): (10) fail=0 skips=0
kata-containers-ci-on-push / run-basic-amd64-tests / run-nydus (active, dragonball): (10) fail=0 skips=0
kata-containers-ci-on-push / run-basic-amd64-tests / run-tracing: (10) fail=0 skips=10
kata-containers-ci-on-push / run-basic-amd64-tests / run-nydus (active, stratovirt): (10) fail=0 skips=0
kata-containers-ci-on-push / run-basic-amd64-tests / run-vfio (qemu): (10) fail=7 skips=0
kata-containers-ci-on-push / run-basic-amd64-tests / run-docker-tests (clh): (10) fail=1 skips=0
kata-containers-ci-on-push / run-basic-amd64-tests / run-docker-tests (qemu): (10) fail=0 skips=0
kata-containers-ci-on-push / run-kata-monitor-tests / run-monitor (qemu, crio): (10) fail=0 skips=0
kata-containers-ci-on-push / run-basic-amd64-tests / run-nerdctl-tests (clh): (10) fail=6 skips=0
kata-containers-ci-on-push / run-kata-monitor-tests / run-monitor (containerd, lts): (10) fail=10 skips=0
kata-containers-ci-on-push / run-basic-amd64-tests / run-nerdctl-tests (dragonball): (10) fail=0 skips=0
kata-containers-ci-on-push / run-basic-amd64-tests / run-nerdctl-tests (qemu): (10) fail=0 skips=0
kata-containers-ci-on-push / run-basic-amd64-tests / run-nerdctl-tests (cloud-hypervisor): (10) fail=4 skips=0
kata-containers-ci-on-push / run-cri-containerd-tests-s390x / run-cri-containerd (active, qemu): (8) fail=0 skips=0
kata-containers-ci-on-push / run-metrics-tests / run-metrics (clh): (10) fail=0 skips=0
kata-containers-ci-on-push / run-cri-containerd-tests-s390x / run-cri-containerd (active, qemu-runtime-rs): (8) fail=0 skips=0
kata-containers-ci-on-push / run-metrics-tests / run-metrics (qemu): (10) fail=0 skips=0
kata-containers-ci-on-push / run-metrics-tests / run-metrics (stratovirt): (10) fail=0 skips=0
kata-containers-ci-on-push / run-k8s-tests-on-ppc64le / run-k8s-tests (qemu, kubeadm): (10) fail=9 skips=0
kata-containers-ci-on-push / run-kata-deploy-tests-on-aks / run-kata-deploy-tests (ubuntu, clh): (10) fail=0 skips=0
kata-containers-ci-on-push / run-kata-deploy-tests-on-garm / run-kata-deploy-tests (clh, k0s): (10) fail=5 skips=0
kata-containers-ci-on-push / run-kata-deploy-tests-on-aks / run-kata-deploy-tests (ubuntu, dragonball): (10) fail=2 skips=0
kata-containers-ci-on-push / run-kata-deploy-tests-on-garm / run-kata-deploy-tests (clh, k3s): (10) fail=1 skips=0
kata-containers-ci-on-push / run-kata-deploy-tests-on-aks / run-kata-deploy-tests (ubuntu, qemu): (10) fail=0 skips=0
kata-containers-ci-on-push / run-kata-deploy-tests-on-garm / run-kata-deploy-tests (clh, rke2): (10) fail=6 skips=0
kata-containers-ci-on-push / run-kata-deploy-tests-on-aks / run-kata-deploy-tests (ubuntu, qemu-runtime-rs): (3) fail=3 skips=0
kata-containers-ci-on-push / run-kata-deploy-tests-on-garm / run-kata-deploy-tests (qemu, k0s): (10) fail=5 skips=0
kata-containers-ci-on-push / run-kata-deploy-tests-on-aks / run-kata-deploy-tests (cbl-mariner, clh): (10) fail=1 skips=0
kata-containers-ci-on-push / run-kata-deploy-tests-on-garm / run-kata-deploy-tests (qemu, k3s): (10) fail=1 skips=0
kata-containers-ci-on-push / run-k8s-tests-on-aks / run-k8s-tests (ubuntu, clh, small): (10) fail=0 skips=0
kata-containers-ci-on-push / run-kata-deploy-tests-on-garm / run-kata-deploy-tests (qemu, rke2): (10) fail=2 skips=0
kata-containers-ci-on-push / run-k8s-tests-on-aks / run-k8s-tests (ubuntu, clh, normal): (10) fail=0 skips=0
kata-containers-ci-on-push / run-k8s-tests-on-aks / run-k8s-tests (ubuntu, dragonball, small): (10) fail=0 skips=0
kata-containers-ci-on-push / run-k8s-tests-on-aks / run-k8s-tests (ubuntu, dragonball, normal): (10) fail=0 skips=0
kata-containers-ci-on-push / run-k8s-tests-on-aks / run-k8s-tests (ubuntu, qemu, small): (10) fail=0 skips=0
kata-containers-ci-on-push / run-k8s-tests-on-aks / run-k8s-tests (ubuntu, qemu, normal): (10) fail=0 skips=0
kata-containers-ci-on-push / run-k8s-tests-on-aks / run-k8s-tests (ubuntu, stratovirt, small): (10) fail=0 skips=0
kata-containers-ci-on-push / run-k8s-tests-on-aks / run-k8s-tests (ubuntu, stratovirt, normal): (10) fail=0 skips=0
kata-containers-ci-on-push / run-k8s-tests-on-aks / run-k8s-tests (ubuntu, cloud-hypervisor, small): (10) fail=0 skips=0
kata-containers-ci-on-push / run-k8s-tests-on-garm / run-k8s-tests (clh, devmapper, k3s, garm-ubuntu-2004): (10) fail=0 skips=0
kata-containers-ci-on-push / run-k8s-tests-with-crio-on-garm / run-k8s-tests (qemu, k0s, garm-ubuntu-2204): (10) fail=0 skips=0
kata-containers-ci-on-push / run-k8s-tests-on-aks / run-k8s-tests (ubuntu, cloud-hypervisor, normal): (10) fail=0 skips=0
kata-containers-ci-on-push / run-k8s-tests-on-garm / run-k8s-tests (clh, devmapper, k3s, garm-ubuntu-2004-smaller): (10) fail=0 skips=0
kata-containers-ci-on-push / run-k8s-tests-with-crio-on-garm / run-k8s-tests (qemu, k0s, garm-ubuntu-2204-smaller): (10) fail=0 skips=0
kata-containers-ci-on-push / run-k8s-tests-on-aks / run-k8s-tests (cbl-mariner, clh, small, oci-distribution): (10) fail=7 skips=0
kata-containers-ci-on-push / run-k8s-tests-on-garm / run-k8s-tests (dragonball, devmapper, k3s, garm-ubuntu-2004): (10) fail=0 skips=0
kata-containers-ci-on-push / run-k8s-tests-on-aks / run-k8s-tests (cbl-mariner, clh, small, containerd): (10) fail=7 skips=0
kata-containers-ci-on-push / run-k8s-tests-on-garm / run-k8s-tests (dragonball, devmapper, k3s, garm-ubuntu-2004-smaller): (10) fail=0 skips=0
kata-containers-ci-on-push / run-k8s-tests-on-aks / run-k8s-tests (cbl-mariner, clh, normal): (10) fail=0 skips=0
kata-containers-ci-on-push / run-kata-coco-tests / run-k8s-tests-on-sev (qemu-sev, nydus, guest-pull): (10) fail=1 skips=0
kata-containers-ci-on-push / run-k8s-tests-on-garm / run-k8s-tests (fc, devmapper, k3s, garm-ubuntu-2004): (10) fail=0 skips=0
kata-containers-ci-on-push / run-kata-coco-tests / run-k8s-tests-on-tdx (qemu-tdx, nydus, guest-pull): (10) fail=6 skips=0
kata-containers-ci-on-push / run-k8s-tests-on-garm / run-k8s-tests (fc, devmapper, k3s, garm-ubuntu-2004-smaller): (10) fail=1 skips=0
kata-containers-ci-on-push / run-k8s-tests-on-garm / run-k8s-tests (qemu, devmapper, k3s, garm-ubuntu-2004): (10) fail=1 skips=0
kata-containers-ci-on-push / run-k8s-tests-on-garm / run-k8s-tests (qemu, devmapper, k3s, garm-ubuntu-2004-smaller): (10) fail=0 skips=0
kata-containers-ci-on-push / run-kata-coco-tests / run-k8s-tests-coco-nontee (qemu-coco-dev, nydus, guest-pull): (10) fail=0 skips=0
kata-containers-ci-on-push / run-kata-coco-tests / run-k8s-tests-sev-snp (qemu-snp, nydus, guest-pull): (10) fail=0 skips=0
kata-containers-ci-on-push / run-k8s-tests-on-garm / run-k8s-tests (cloud-hypervisor, devmapper, k3s, garm-ubuntu-2004): (10) fail=0 skips=0
kata-containers-ci-on-push / run-k8s-tests-on-garm / run-k8s-tests (cloud-hypervisor, devmapper, k3s, garm-ubuntu-2004-smaller): (10) fail=0 skips=0
kata-containers-ci-on-push / run-k8s-tests-on-zvsi / run-k8s-tests (devmapper, k3s): (8) fail=0 skips=0
kata-containers-ci-on-push / run-k8s-tests-on-zvsi / run-k8s-tests (nydus, k3s): (8) fail=3 skips=0
kata-containers-ci-on-push / run-cri-containerd-tests-s390x: (2) fail=0 skips=2
kata-containers-ci-on-push / run-k8s-tests-on-zvsi: (2) fail=0 skips=2

The above report is more accurate because I fixed two problems in my script:

wasn't counting skips
wasn't inserting jobs that passed 100%!

Note that it is counting 'cancelled' as 'failed'. I might change that in the future.

The new version:

import requests
import sys
import os
import re

workflow_id='ci-nightly.yaml'
list_workflow_runs_url='https://api.github.com/repos/kata-containers/kata-containers/actions/workflows/' + workflow_id + '/runs'
headers = {"Accept": "application/vnd.github+json" ,"X-GitHub-Api-Version": "2022-11-28"}

token = os.getenv("GITHUB_TOKEN")
if token is not None:
    headers['Authorization'] = "Bearer " + token

# Get the latest 10 workflow runs.
# TODO: parametrize it!
#
r = requests.get("%s?per_page=10" %(list_workflow_runs_url), headers=headers)
r.raise_for_status()

page_size=100
runs_map=[]

for run in r.json()['workflow_runs']:
    entry = {'id': run['id'],
             'created_at': run['created_at'],
             'conclusion': None,
             'jobs': []}

    jobs_map={}
    if run['status'] == "in_progress":
        runs_map.append(entry)
        continue
    else:
        entry['conclusion'] = run['conclusion']

    # Paginate, as the jobs of a run can span several pages.
    total_count = -1
    page=1
    while True:
        jobs_request = requests.get("%s?per_page=%s&page=%s" % (run['jobs_url'], page_size,page), headers=headers)
        jobs_request.raise_for_status()

        for job in jobs_request.json()['jobs']:
            entry['jobs'].append({'name': job['name'], 'run_id': job['run_id'],
                                    'conclusion': job['conclusion']})

        total_count = max(total_count, jobs_request.json()['total_count'])
        if len(entry['jobs']) >= total_count:
            break
        page += 1

    runs_map.append(entry)

def collect_jobs_stats(workflows_runs):
    '''
    Return a map of {'runs': NUMBER, 'fails': NUMBER, 'skips': NUMBER} indexed by job name
    '''
    stats = {}
    for run in workflows_runs:
        for job in run['jobs']:
            job_stat = stats.get(job['name'], {'runs': 0, 'fails': 0, 'skips': 0})
            job_stat['runs']+=1
            if job['conclusion'] != 'success':
                if job['conclusion'] == 'skipped':
                    job_stat['skips']+=1
                else:  # failed and cancelled
                    job_stat['fails']+=1
            stats[job['name']] = job_stat
    return stats

jobs_stats = collect_jobs_stats(runs_map)

regex = re.compile('kata-containers-ci-on-push / run-.*-tests.*')
for name, stat in jobs_stats.items():
    if regex.match(name):
        print('%s: (%s) fail=%s skips=%s' % (name, stat['runs'], stat['fails'], stat['skips']))
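
The script above still counts cancelled jobs as failures. A minimal sketch of how they could be tracked in their own bucket, assuming the conclusion string for such jobs is 'cancelled' as in the script's comment. The function name and the 'cancels' key are hypothetical, not part of the script.

```python
def collect_jobs_stats_split(workflows_runs):
    """Return a map of {'runs', 'fails', 'skips', 'cancels'} counters
    indexed by job name.

    Same input as collect_jobs_stats: a list of runs, each with a 'jobs'
    list of {'name': ..., 'conclusion': ...} entries.
    """
    stats = {}
    for run in workflows_runs:
        for job in run['jobs']:
            job_stat = stats.setdefault(
                job['name'],
                {'runs': 0, 'fails': 0, 'skips': 0, 'cancels': 0})
            job_stat['runs'] += 1
            conclusion = job['conclusion']
            if conclusion == 'skipped':
                job_stat['skips'] += 1
            elif conclusion == 'cancelled':
                job_stat['cancels'] += 1
            elif conclusion != 'success':
                # Any other non-success value still counts as a failure,
                # as in the script above.
                job_stat['fails'] += 1
    return stats


# Made-up sample, not real CI data.
sample_runs = [
    {'jobs': [{'name': 'a', 'conclusion': 'success'},
              {'name': 'b', 'conclusion': 'cancelled'}]},
    {'jobs': [{'name': 'a', 'conclusion': 'failure'},
              {'name': 'b', 'conclusion': 'skipped'}]},
]

for name, stat in collect_jobs_stats_split(sample_runs).items():
    print('%s: (%s) fail=%s skips=%s cancels=%s'
          % (name, stat['runs'], stat['fails'], stat['skips'], stat['cancels']))
```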

wainersm added enhancement Improvement to an existing feature needs-review Needs to be assessed by the team. area/ci Issues affecting the continuous integration labels Apr 17, 2024

katacontainersbot added this to To do in Issue backlog Apr 17, 2024

wainersm mentioned this issue Jun 21, 2024

Finding a dashboard to our github CI #9892
