"Tailorbird" initiative: making CI your friend #9506

Open
wainersm opened this issue Apr 17, 2024 · 9 comments
Labels
area/ci (Issues affecting the continuous integration), enhancement (Improvement to an existing feature), needs-review (Needs to be assessed by the team)

Comments

@wainersm
Contributor

wainersm commented Apr 17, 2024

Context

At the Virtual Kata Containers PTG Planning of April 2024, there was a discussion session led by @jodh-intel about the problems that Kata developers currently face with CI. Please see the topics and notes of that session at https://etherpad.opendev.org/p/kata-ptg-planning-april-2024#L160 . We ended the session with a list of volunteers (myself, @ldoktor , @stevenhorsman , @gkurz , @littlejawa) to build a "task force" aiming to improve the CI situation as much as possible.

We want CI to be your friend!

Work items

Find them on the dashboard: https://github.com/orgs/kata-containers/projects/46/views/1

Old table:

| Item | Owner | Issues | Status |
| --- | --- | --- | --- |
| "know your enemy" - gather data out of CI to identify unstable jobs | @wainersm | | |
| Establish and document policies (e.g. when to promote/demote jobs to "required") | | | |
| Optimize execution of jobs (e.g. triggers by files touched) | @ldoktor | | |
| Foster the fix of current broken jobs / make nightly CI green | | | |
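
As a rough illustration of the "triggers by files touched" item, the sketch below maps changed file paths to the groups of jobs that would need to run. The path patterns and job group names are hypothetical placeholders, not the project's actual layout or workflow configuration, and unknown paths conservatively select everything.

```python
from fnmatch import fnmatch

# Hypothetical mapping of path patterns to job groups; the real
# repository layout and job names may differ.
TRIGGERS = {
    "docs/*": set(),                          # documentation only: no test jobs
    "*.md": set(),
    "src/runtime/*": {"build", "k8s-tests"},
    "tests/metrics/*": {"metrics-tests"},
}
ALL_JOBS = {"build", "k8s-tests", "metrics-tests"}


def jobs_for_changes(changed_files):
    """Return the set of job groups to run for a list of changed paths."""
    selected = set()
    for path in changed_files:
        matched = [jobs for pattern, jobs in TRIGGERS.items()
                   if fnmatch(path, pattern)]
        if not matched:
            # Unknown path: be conservative and run everything.
            return set(ALL_JOBS)
        for jobs in matched:
            selected |= jobs
    return selected
```

A change touching only documentation would select no test jobs under this mapping, while a change to an unlisted file falls back to the full set.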

Done criteria

When will we be done with this initiative?

Syncing up

TBD - a meeting every X days? Or Slack?

Volunteers

We need help, and everybody is welcome! Please add your name:

@ldoktor , @stevenhorsman , @gkurz , @littlejawa, @sprt

Additional information

The Common Tailorbird is a mostly green bird with a stable population.

@wainersm wainersm added enhancement Improvement to an existing feature needs-review Needs to be assessed by the team. area/ci Issues affecting the continuous integration labels Apr 17, 2024
@sprt
Contributor

sprt commented Apr 17, 2024

Added myself to the list of volunteers!

@ldoktor
Contributor

ldoktor commented Apr 18, 2024

I already started working on "Optimize execution of jobs (e.g. triggers by files touched)"; feel free to put me there as an owner.

@wainersm
Contributor Author

I made a script to gather data, basic stuff out of the nightly jobs actually...

import requests
import sys
import os
import re

workflow_id='ci-nightly.yaml'
list_workflow_runs_url='https://api.github.com/repos/kata-containers/kata-containers/actions/workflows/' + workflow_id + '/runs'
headers = {"Accept": "application/vnd.github+json" ,"X-GitHub-Api-Version": "2022-11-28"}

token = os.getenv("GITHUB_TOKEN")
if token is not None:
    headers['Authorization'] = "Bearer " + token

# Get the latest 10 workflow runs.
# TODO: parameterize it!
#
r = requests.get("%s?per_page=10" %(list_workflow_runs_url), headers=headers)
r.raise_for_status()

page_size=100
runs_map=[]

for run in r.json()['workflow_runs']:
    entry = {'id': run['id'],
             'created_at': run['created_at'],
             'conclusion': None,
             'jobs': []}

    jobs_map={}
    if run['status'] == "in_progress":
        runs_map.append(entry)
        continue
    else:
        entry['conclusion'] = run['conclusion']

    # Paginate, as the jobs of a run can span several pages.
    total_count = -1
    page=1
    while True:
        jobs_request = requests.get("%s?per_page=%s&page=%s" % (run['jobs_url'], page_size,page), headers=headers)
        jobs_request.raise_for_status()

        for job in jobs_request.json()['jobs']:
            entry['jobs'].append({'name': job['name'], 'run_id': job['run_id'],
                                    'conclusion': job['conclusion']})

        total_count = max(total_count, jobs_request.json()['total_count'])
        if len(entry['jobs']) >= total_count:
            break
        page += 1

    runs_map.append(entry)

def collect_jobs_stats(workflows_runs):
    '''
    Return a map of {'runs': NUMBER, 'fails': NUMBER} indexed by job name
    '''
    stats = {}
    for run in workflows_runs:
        for job in run['jobs']:
            job_stat = stats.get(job['name'], {'runs': 0, 'fails': 0})
            job_stat['runs']+=1
            if job['conclusion'] != 'success':
                job_stat['fails']+=1
                stats[job['name']] = job_stat
    return stats

jobs_stats = collect_jobs_stats(runs_map)

regex = re.compile('kata-containers-ci-on-push / run-.*-tests.*')
for name, stat in jobs_stats.items():
    if regex.match(name):
        print('%s: (%s) fail=%s' % (name, stat['runs'], stat['fails']))

@ldoktor @beraldoleal ^^^^ in case you have free cycles to help with bugs and improving it.

I just ran it; see the results below. Each line shows the job name, the number of runs counted in parentheses, and the number of failures. Notice that sometimes the parent job fails but the child jobs are not charged with the failure. This is something that could be improved in the script, although interpreting the data by eye isn't difficult.

kata-containers-ci-on-push / run-cri-containerd-tests-ppc64le: (1) fail=1
kata-containers-ci-on-push / run-k8s-tests-on-ppc64le: (1) fail=1
kata-containers-ci-on-push / run-k8s-tests-on-zvsi / run-k8s-tests (qemu, nydus, k3s): (9) fail=9
kata-containers-ci-on-push / run-basic-amd64-tests / run-tracing (clh): (9) fail=7
kata-containers-ci-on-push / run-basic-amd64-tests / run-tracing (qemu): (9) fail=8
kata-containers-ci-on-push / run-basic-amd64-tests / run-vfio (clh): (9) fail=5
kata-containers-ci-on-push / run-basic-amd64-tests / run-nerdctl-tests (cloud-hypervisor): (9) fail=2
kata-containers-ci-on-push / run-kata-coco-tests / run-k8s-tests-coco-nontee (qemu-coco-dev, nydus, guest-pull): (9) fail=9
kata-containers-ci-on-push / run-kata-coco-tests / run-k8s-tests-on-sev (qemu-sev, nydus, guest-pull): (9) fail=9
kata-containers-ci-on-push / run-k8s-tests-on-aks / run-k8s-tests (ubuntu, dragonball, small): (9) fail=1
kata-containers-ci-on-push / run-kata-coco-tests / run-k8s-tests-on-tdx (qemu-tdx, nydus, guest-pull): (9) fail=9
kata-containers-ci-on-push / run-kata-coco-tests / run-k8s-tests-sev-snp (qemu-snp, nydus, guest-pull): (9) fail=9
kata-containers-ci-on-push / run-k8s-tests-on-aks / run-k8s-tests (cbl-mariner, clh, small, oci-distribution): (9) fail=1
kata-containers-ci-on-push / run-kata-deploy-tests-on-garm / run-kata-deploy-tests (clh, rke2): (9) fail=3
kata-containers-ci-on-push / run-kata-deploy-tests-on-garm / run-kata-deploy-tests (qemu, k0s): (9) fail=3
kata-containers-ci-on-push / run-kata-deploy-tests-on-garm / run-kata-deploy-tests (qemu, k3s): (9) fail=1
kata-containers-ci-on-push / run-metrics-tests / run-metrics (qemu): (9) fail=1
kata-containers-ci-on-push / run-basic-amd64-tests / run-nerdctl-tests (clh): (8) fail=2
kata-containers-ci-on-push / run-kata-monitor-tests / run-monitor (qemu, containerd): (8) fail=7
kata-containers-ci-on-push / run-kata-deploy-tests-on-garm / run-kata-deploy-tests (clh, k0s): (8) fail=2
kata-containers-ci-on-push / run-kata-deploy-tests-on-garm / run-kata-deploy-tests (clh, k3s): (8) fail=2
kata-containers-ci-on-push / run-k8s-tests-on-aks / run-k8s-tests (ubuntu, cloud-hypervisor, normal): (8) fail=1
kata-containers-ci-on-push / run-cri-containerd-tests-s390x: (1) fail=1
kata-containers-ci-on-push / run-k8s-tests-on-zvsi: (1) fail=1
kata-containers-ci-on-push / run-k8s-tests-on-aks / run-k8s-tests (ubuntu, qemu, normal): (7) fail=1
kata-containers-ci-on-push / run-k8s-tests-on-ppc64le / run-k8s-tests (qemu, kubeadm): (7) fail=5
kata-containers-ci-on-push / run-kata-monitor-tests / run-monitor (qemu, crio): (6) fail=1
kata-containers-ci-on-push / run-basic-amd64-tests / run-vfio (qemu): (4) fail=2
kata-containers-ci-on-push / run-kata-monitor-tests: (1) fail=1
kata-containers-ci-on-push / run-metrics-tests: (1) fail=1
kata-containers-ci-on-push / run-basic-amd64-tests: (1) fail=1
kata-containers-ci-on-push / run-kata-deploy-tests-on-garm: (1) fail=1
kata-containers-ci-on-push / run-k8s-tests-on-garm: (1) fail=1
kata-containers-ci-on-push / run-k8s-tests-on-aks: (1) fail=1
kata-containers-ci-on-push / run-k8s-tests-with-crio-on-garm: (1) fail=1
kata-containers-ci-on-push / run-kata-coco-tests: (1) fail=1
kata-containers-ci-on-push / run-kata-deploy-tests-on-aks: (1) fail=1
kata-containers-ci-on-push / run-basic-amd64-tests / run-docker-tests (clh): (2) fail=1
kata-containers-ci-on-push / run-basic-amd64-tests / run-runk: (1) fail=1
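
One possible way to charge child jobs with their parent's failures is to roll the parent's counters into every job whose name starts with the parent's name followed by " / ". This is only a sketch: it assumes a parent entry records runs in which the child jobs never started, which is not confirmed here, and `charge_children` is a hypothetical helper, not part of the script above. The sample numbers are taken from the report.

```python
def charge_children(stats):
    """Return a copy of stats where each parent's runs and fails are
    added to every job named '<parent> / ...'."""
    adjusted = {name: dict(stat) for name, stat in stats.items()}
    for parent, parent_stat in stats.items():
        prefix = parent + " / "
        for name in stats:
            if name.startswith(prefix):
                adjusted[name]["runs"] += parent_stat["runs"]
                adjusted[name]["fails"] += parent_stat["fails"]
    return adjusted


# Sample entries copied from the report above.
PREFIX = "kata-containers-ci-on-push / "
sample = {
    PREFIX + "run-k8s-tests-on-zvsi": {"runs": 1, "fails": 1},
    PREFIX + "run-k8s-tests-on-zvsi / run-k8s-tests (qemu, nydus, k3s)":
        {"runs": 9, "fails": 9},
    PREFIX + "run-metrics-tests": {"runs": 1, "fails": 1},
    PREFIX + "run-metrics-tests / run-metrics (qemu)": {"runs": 9, "fails": 1},
}

for name, stat in charge_children(sample).items():
    print("%s: (%s) fail=%s" % (name, stat["runs"], stat["fails"]))
```

The input dictionary is left untouched, so the raw and the adjusted numbers can be compared side by side.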

@stevenhorsman
Member

@wainersm - optional suggestion - I wonder whether using the gh CLI might be worth considering, to avoid things like the pagination? e.g. gh -R kata-containers/kata-containers run list --workflow ci-nightly.yaml -s completed -L 10 --json attempt,conclusion,createdAt,databaseId,name,number to get the specified fields for the last 10 completed runs,
and something like: gh -R kata-containers/kata-containers run view <workflow id> --json databaseId,name,status,jobs to get the list of jobs and their status for that workflow id?
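
For anyone who wants to stay in Python, the two gh commands above could be driven through `subprocess`, roughly as sketched below. This assumes the gh CLI is installed and authenticated, and that each entry of the `jobs` array carries `name` and `conclusion` fields; the helper names are hypothetical.

```python
import json
import subprocess

REPO = "kata-containers/kata-containers"


def run_list_cmd(workflow="ci-nightly.yaml", limit=10):
    """Build the `gh run list` command for the last completed runs."""
    return ["gh", "run", "list", "-R", REPO, "--workflow", workflow,
            "-s", "completed", "-L", str(limit),
            "--json", "attempt,conclusion,createdAt,databaseId,name,number"]


def run_view_cmd(run_id):
    """Build the `gh run view` command that lists the jobs of one run."""
    return ["gh", "run", "view", str(run_id), "-R", REPO,
            "--json", "databaseId,name,status,jobs"]


def gh_json(cmd):
    """Run a gh command and decode its JSON output."""
    done = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return json.loads(done.stdout)


def unsuccessful_jobs(run_view):
    """Names of jobs that neither passed nor were skipped (assumed fields)."""
    return [job["name"] for job in run_view.get("jobs", [])
            if job.get("conclusion") not in ("success", "skipped")]


def main():
    """Print the unsuccessful jobs of each of the last completed runs."""
    for run in gh_json(run_list_cmd()):
        details = gh_json(run_view_cmd(run["databaseId"]))
        print(run["createdAt"], run["conclusion"], unsuccessful_jobs(details))
```

Calling `main()` performs one `gh run view` per run, so pagination is left entirely to gh.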

@beraldoleal
Member

There is a gh plugin that does that for us, iirc by default goes over the last 100 jobs:

$ gh workflow-stats -o kata-containers -r kata-containers -f ci-on-push.yaml jobs

I pasted the output in our slack channel, but pasting here too for visibility:

🏃 Total runs: 115
  ✔ Success: 0 (0.0%)
  ✖ Failure: 28 (24.3%)
  🤔 Others: 87 (75.7%)

📈 Top 3 jobs with the highest failure counts (failure jobs / total runs)
  kata-containers-ci-on-push / run-basic-amd64-tests / run-tracing (qemu): 33/43
    └──Run tracing tests: 33/43

  kata-containers-ci-on-push / run-kata-monitor-tests / run-monitor (qemu, containerd): 30/40
    └──Run kata-monitor tests: 30/40

  kata-containers-ci-on-push / run-basic-amd64-tests / run-tracing (clh): 22/43
    └──Run tracing tests: 22/43


📊 Top 3 jobs with the longest execution average duration
  kata-containers-ci-on-push / run-k8s-tests-on-aks / run-k8s-tests (ubuntu, clh, small): 1554.00s
  kata-containers-ci-on-push / build-kata-static-tarball-s390x / build-asset (rootfs-image-confidential): 1324.60s
  kata-containers-ci-on-push / build-kata-static-tarball-s390x / build-asset (rootfs-initrd-confidential): 1227.36s

@stevenhorsman
Member

@beraldoleal - that's a cool plugin. I played a bit with the options and came up with: gh -r kata-containers -o kata-containers workflow-stats -f ci-nightly.yaml jobs -n 25 -c ">$(date -d "30 days ago" +%Y-%m-%d)" to show the 25 highest failure counts in the last 30 days, which I guess is getting closer to Wainer's aims. We could use this in combination with some JSON processing and the gh run list command to keep only the jobs with a failure percentage above 50% and look specifically at the last 10 runs, but I think a time-based approach is equally valid.
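
The JSON layout of the workflow-stats output is not shown in this thread, so as a stand-in the sketch below applies the same "failure percentage above 50%" filter to report lines in the format printed by the Python script earlier in the thread (`name: (runs) fail=N`). The function name and the `min_runs` parameter are hypothetical.

```python
import re

# Matches lines such as "<job name>: (9) fail=7", with or without a
# trailing " skips=N" field.
LINE_RE = re.compile(r"^(?P<name>.+): \((?P<runs>\d+)\) fail=(?P<fails>\d+)")


def flaky_jobs(report_lines, threshold=0.5, min_runs=1):
    """Return (name, failure_rate) pairs above threshold, worst first."""
    result = []
    for line in report_lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # not a report line
        runs = int(match["runs"])
        fails = int(match["fails"])
        if runs >= min_runs and runs > 0 and fails / runs > threshold:
            result.append((match["name"], fails / runs))
    return sorted(result, key=lambda item: item[1], reverse=True)
```

Raising `min_runs` drops entries that were only counted once, such as the parent jobs in the report, which would otherwise show up at 100%.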

@ldoktor
Contributor

ldoktor commented Jun 4, 2024

Very nice, @beraldoleal, with the --json it should be quite useful (either directly with jq or in python). It even includes the individual steps...

@wainersm
Contributor Author

wainersm commented Jun 4, 2024

Hey @stevenhorsman @beraldoleal @ldoktor, thanks for the feedback on the script. The workflow-stats plug-in is indeed very cool! I also played with it a little bit yesterday, and it simplifies the Python script a lot (I should send a v2 soon).

One thing that intrigued me, though, is that I asked the tool to generate statistics for the last 10 days, but the "run count" of most jobs was "16" whereas I was expecting "~10" (more or less 10, because someone might have triggered the workflows manually).
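
One way to investigate a count like this could be to break the runs down by what triggered them. The sketch below groups workflow run objects, as returned by the REST endpoint used in the script earlier in this thread, by their `event` field and counts how many were re-run. Whether manual triggers or re-runs actually explain the "16" is an assumption, not something established here, and `summarize_runs` is a hypothetical helper.

```python
from collections import Counter


def summarize_runs(workflow_runs):
    """Count runs per trigger event and how many were re-run at least once."""
    by_event = Counter(run.get("event", "unknown") for run in workflow_runs)
    rerun = sum(1 for run in workflow_runs if run.get("run_attempt", 1) > 1)
    return dict(by_event), rerun
```

Fed with `r.json()['workflow_runs']` from the script, it would return something shaped like `({'schedule': ..., 'workflow_dispatch': ...}, <number of re-run runs>)`.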

@wainersm
Contributor Author

Hi folks,

Generated the report today again, considering the last 10 executions:

kata-containers-ci-on-push / run-cri-containerd-tests-ppc64le / run-cri-containerd (active, qemu): (10) fail=4 skips=0
kata-containers-ci-on-push / run-metrics-tests / Kata Setup: (10) fail=0 skips=0
kata-containers-ci-on-push / run-basic-amd64-tests / run-cri-containerd (lts, clh): (10) fail=0 skips=0
kata-containers-ci-on-push / run-basic-amd64-tests / run-cri-containerd (lts, dragonball): (10) fail=0 skips=0
kata-containers-ci-on-push / run-basic-amd64-tests / run-containerd-stability (lts, clh): (10) fail=0 skips=0
kata-containers-ci-on-push / run-basic-amd64-tests / run-cri-containerd (lts, qemu): (10) fail=1 skips=0
kata-containers-ci-on-push / run-basic-amd64-tests / run-containerd-stability (lts, cloud-hypervisor): (10) fail=0 skips=0
kata-containers-ci-on-push / run-basic-amd64-tests / run-cri-containerd (lts, stratovirt): (10) fail=0 skips=0
kata-containers-ci-on-push / run-basic-amd64-tests / run-cri-containerd (lts, cloud-hypervisor): (10) fail=0 skips=0
kata-containers-ci-on-push / run-basic-amd64-tests / run-containerd-stability (lts, dragonball): (10) fail=0 skips=0
kata-containers-ci-on-push / run-basic-amd64-tests / run-cri-containerd (lts, qemu-runtime-rs): (10) fail=0 skips=0
kata-containers-ci-on-push / run-basic-amd64-tests / run-runk: (10) fail=0 skips=0
kata-containers-ci-on-push / run-basic-amd64-tests / run-containerd-stability (lts, qemu): (10) fail=0 skips=0
kata-containers-ci-on-push / run-basic-amd64-tests / run-nydus (lts, clh): (10) fail=0 skips=0
kata-containers-ci-on-push / run-basic-amd64-tests / run-cri-containerd (active, clh): (10) fail=0 skips=0
kata-containers-ci-on-push / run-basic-amd64-tests / run-containerd-stability (lts, stratovirt): (10) fail=0 skips=0
kata-containers-ci-on-push / run-basic-amd64-tests / run-cri-containerd (active, dragonball): (10) fail=0 skips=0
kata-containers-ci-on-push / run-basic-amd64-tests / run-nydus (lts, qemu): (10) fail=0 skips=0
kata-containers-ci-on-push / run-basic-amd64-tests / run-containerd-stability (active, clh): (10) fail=0 skips=0
kata-containers-ci-on-push / run-basic-amd64-tests / run-cri-containerd (active, qemu): (10) fail=2 skips=0
kata-containers-ci-on-push / run-basic-amd64-tests / run-nydus (lts, dragonball): (10) fail=0 skips=0
kata-containers-ci-on-push / run-basic-amd64-tests / run-containerd-stability (active, cloud-hypervisor): (10) fail=0 skips=0
kata-containers-ci-on-push / run-basic-amd64-tests / run-nydus (lts, stratovirt): (10) fail=0 skips=0
kata-containers-ci-on-push / run-basic-amd64-tests / run-containerd-stability (active, dragonball): (10) fail=0 skips=0
kata-containers-ci-on-push / run-basic-amd64-tests / run-cri-containerd (active, stratovirt): (10) fail=0 skips=0
kata-containers-ci-on-push / run-basic-amd64-tests / run-nydus (active, clh): (10) fail=0 skips=0
kata-containers-ci-on-push / run-basic-amd64-tests / run-containerd-stability (active, qemu): (10) fail=0 skips=0
kata-containers-ci-on-push / run-basic-amd64-tests / run-cri-containerd (active, cloud-hypervisor): (10) fail=0 skips=0
kata-containers-ci-on-push / run-basic-amd64-tests / run-nydus (active, qemu): (10) fail=0 skips=0
kata-containers-ci-on-push / run-basic-amd64-tests / run-containerd-stability (active, stratovirt): (10) fail=0 skips=0
kata-containers-ci-on-push / run-basic-amd64-tests / run-cri-containerd (active, qemu-runtime-rs): (10) fail=0 skips=0
kata-containers-ci-on-push / run-basic-amd64-tests / run-nydus (active, dragonball): (10) fail=0 skips=0
kata-containers-ci-on-push / run-basic-amd64-tests / run-tracing: (10) fail=0 skips=10
kata-containers-ci-on-push / run-basic-amd64-tests / run-nydus (active, stratovirt): (10) fail=0 skips=0
kata-containers-ci-on-push / run-basic-amd64-tests / run-vfio (qemu): (10) fail=7 skips=0
kata-containers-ci-on-push / run-basic-amd64-tests / run-docker-tests (clh): (10) fail=1 skips=0
kata-containers-ci-on-push / run-basic-amd64-tests / run-docker-tests (qemu): (10) fail=0 skips=0
kata-containers-ci-on-push / run-kata-monitor-tests / run-monitor (qemu, crio): (10) fail=0 skips=0
kata-containers-ci-on-push / run-basic-amd64-tests / run-nerdctl-tests (clh): (10) fail=6 skips=0
kata-containers-ci-on-push / run-kata-monitor-tests / run-monitor (containerd, lts): (10) fail=10 skips=0
kata-containers-ci-on-push / run-basic-amd64-tests / run-nerdctl-tests (dragonball): (10) fail=0 skips=0
kata-containers-ci-on-push / run-basic-amd64-tests / run-nerdctl-tests (qemu): (10) fail=0 skips=0
kata-containers-ci-on-push / run-basic-amd64-tests / run-nerdctl-tests (cloud-hypervisor): (10) fail=4 skips=0
kata-containers-ci-on-push / run-cri-containerd-tests-s390x / run-cri-containerd (active, qemu): (8) fail=0 skips=0
kata-containers-ci-on-push / run-metrics-tests / run-metrics (clh): (10) fail=0 skips=0
kata-containers-ci-on-push / run-cri-containerd-tests-s390x / run-cri-containerd (active, qemu-runtime-rs): (8) fail=0 skips=0
kata-containers-ci-on-push / run-metrics-tests / run-metrics (qemu): (10) fail=0 skips=0
kata-containers-ci-on-push / run-metrics-tests / run-metrics (stratovirt): (10) fail=0 skips=0
kata-containers-ci-on-push / run-k8s-tests-on-ppc64le / run-k8s-tests (qemu, kubeadm): (10) fail=9 skips=0
kata-containers-ci-on-push / run-kata-deploy-tests-on-aks / run-kata-deploy-tests (ubuntu, clh): (10) fail=0 skips=0
kata-containers-ci-on-push / run-kata-deploy-tests-on-garm / run-kata-deploy-tests (clh, k0s): (10) fail=5 skips=0
kata-containers-ci-on-push / run-kata-deploy-tests-on-aks / run-kata-deploy-tests (ubuntu, dragonball): (10) fail=2 skips=0
kata-containers-ci-on-push / run-kata-deploy-tests-on-garm / run-kata-deploy-tests (clh, k3s): (10) fail=1 skips=0
kata-containers-ci-on-push / run-kata-deploy-tests-on-aks / run-kata-deploy-tests (ubuntu, qemu): (10) fail=0 skips=0
kata-containers-ci-on-push / run-kata-deploy-tests-on-garm / run-kata-deploy-tests (clh, rke2): (10) fail=6 skips=0
kata-containers-ci-on-push / run-kata-deploy-tests-on-aks / run-kata-deploy-tests (ubuntu, qemu-runtime-rs): (3) fail=3 skips=0
kata-containers-ci-on-push / run-kata-deploy-tests-on-garm / run-kata-deploy-tests (qemu, k0s): (10) fail=5 skips=0
kata-containers-ci-on-push / run-kata-deploy-tests-on-aks / run-kata-deploy-tests (cbl-mariner, clh): (10) fail=1 skips=0
kata-containers-ci-on-push / run-kata-deploy-tests-on-garm / run-kata-deploy-tests (qemu, k3s): (10) fail=1 skips=0
kata-containers-ci-on-push / run-k8s-tests-on-aks / run-k8s-tests (ubuntu, clh, small): (10) fail=0 skips=0
kata-containers-ci-on-push / run-kata-deploy-tests-on-garm / run-kata-deploy-tests (qemu, rke2): (10) fail=2 skips=0
kata-containers-ci-on-push / run-k8s-tests-on-aks / run-k8s-tests (ubuntu, clh, normal): (10) fail=0 skips=0
kata-containers-ci-on-push / run-k8s-tests-on-aks / run-k8s-tests (ubuntu, dragonball, small): (10) fail=0 skips=0
kata-containers-ci-on-push / run-k8s-tests-on-aks / run-k8s-tests (ubuntu, dragonball, normal): (10) fail=0 skips=0
kata-containers-ci-on-push / run-k8s-tests-on-aks / run-k8s-tests (ubuntu, qemu, small): (10) fail=0 skips=0
kata-containers-ci-on-push / run-k8s-tests-on-aks / run-k8s-tests (ubuntu, qemu, normal): (10) fail=0 skips=0
kata-containers-ci-on-push / run-k8s-tests-on-aks / run-k8s-tests (ubuntu, stratovirt, small): (10) fail=0 skips=0
kata-containers-ci-on-push / run-k8s-tests-on-aks / run-k8s-tests (ubuntu, stratovirt, normal): (10) fail=0 skips=0
kata-containers-ci-on-push / run-k8s-tests-on-aks / run-k8s-tests (ubuntu, cloud-hypervisor, small): (10) fail=0 skips=0
kata-containers-ci-on-push / run-k8s-tests-on-garm / run-k8s-tests (clh, devmapper, k3s, garm-ubuntu-2004): (10) fail=0 skips=0
kata-containers-ci-on-push / run-k8s-tests-with-crio-on-garm / run-k8s-tests (qemu, k0s, garm-ubuntu-2204): (10) fail=0 skips=0
kata-containers-ci-on-push / run-k8s-tests-on-aks / run-k8s-tests (ubuntu, cloud-hypervisor, normal): (10) fail=0 skips=0
kata-containers-ci-on-push / run-k8s-tests-on-garm / run-k8s-tests (clh, devmapper, k3s, garm-ubuntu-2004-smaller): (10) fail=0 skips=0
kata-containers-ci-on-push / run-k8s-tests-with-crio-on-garm / run-k8s-tests (qemu, k0s, garm-ubuntu-2204-smaller): (10) fail=0 skips=0
kata-containers-ci-on-push / run-k8s-tests-on-aks / run-k8s-tests (cbl-mariner, clh, small, oci-distribution): (10) fail=7 skips=0
kata-containers-ci-on-push / run-k8s-tests-on-garm / run-k8s-tests (dragonball, devmapper, k3s, garm-ubuntu-2004): (10) fail=0 skips=0
kata-containers-ci-on-push / run-k8s-tests-on-aks / run-k8s-tests (cbl-mariner, clh, small, containerd): (10) fail=7 skips=0
kata-containers-ci-on-push / run-k8s-tests-on-garm / run-k8s-tests (dragonball, devmapper, k3s, garm-ubuntu-2004-smaller): (10) fail=0 skips=0
kata-containers-ci-on-push / run-k8s-tests-on-aks / run-k8s-tests (cbl-mariner, clh, normal): (10) fail=0 skips=0
kata-containers-ci-on-push / run-kata-coco-tests / run-k8s-tests-on-sev (qemu-sev, nydus, guest-pull): (10) fail=1 skips=0
kata-containers-ci-on-push / run-k8s-tests-on-garm / run-k8s-tests (fc, devmapper, k3s, garm-ubuntu-2004): (10) fail=0 skips=0
kata-containers-ci-on-push / run-kata-coco-tests / run-k8s-tests-on-tdx (qemu-tdx, nydus, guest-pull): (10) fail=6 skips=0
kata-containers-ci-on-push / run-k8s-tests-on-garm / run-k8s-tests (fc, devmapper, k3s, garm-ubuntu-2004-smaller): (10) fail=1 skips=0
kata-containers-ci-on-push / run-k8s-tests-on-garm / run-k8s-tests (qemu, devmapper, k3s, garm-ubuntu-2004): (10) fail=1 skips=0
kata-containers-ci-on-push / run-k8s-tests-on-garm / run-k8s-tests (qemu, devmapper, k3s, garm-ubuntu-2004-smaller): (10) fail=0 skips=0
kata-containers-ci-on-push / run-kata-coco-tests / run-k8s-tests-coco-nontee (qemu-coco-dev, nydus, guest-pull): (10) fail=0 skips=0
kata-containers-ci-on-push / run-kata-coco-tests / run-k8s-tests-sev-snp (qemu-snp, nydus, guest-pull): (10) fail=0 skips=0
kata-containers-ci-on-push / run-k8s-tests-on-garm / run-k8s-tests (cloud-hypervisor, devmapper, k3s, garm-ubuntu-2004): (10) fail=0 skips=0
kata-containers-ci-on-push / run-k8s-tests-on-garm / run-k8s-tests (cloud-hypervisor, devmapper, k3s, garm-ubuntu-2004-smaller): (10) fail=0 skips=0
kata-containers-ci-on-push / run-k8s-tests-on-zvsi / run-k8s-tests (devmapper, k3s): (8) fail=0 skips=0
kata-containers-ci-on-push / run-k8s-tests-on-zvsi / run-k8s-tests (nydus, k3s): (8) fail=3 skips=0
kata-containers-ci-on-push / run-cri-containerd-tests-s390x: (2) fail=0 skips=2
kata-containers-ci-on-push / run-k8s-tests-on-zvsi: (2) fail=0 skips=2

The report above is more accurate because I fixed two problems in my script:

  • it wasn't counting skips
  • it wasn't inserting jobs that passed 100% of the time!

Note that it counts 'cancelled' as 'failed'. I might change that in the future.

The new version:

import requests
import sys
import os
import re

workflow_id='ci-nightly.yaml'
list_workflow_runs_url='https://api.github.com/repos/kata-containers/kata-containers/actions/workflows/' + workflow_id + '/runs'
headers = {"Accept": "application/vnd.github+json" ,"X-GitHub-Api-Version": "2022-11-28"}

token = os.getenv("GITHUB_TOKEN")
if token is not None:
    headers['Authorization'] = "Bearer " + token

# Get the latest 10 workflow runs.
# TODO: parameterize it!
#
r = requests.get("%s?per_page=10" %(list_workflow_runs_url), headers=headers)
r.raise_for_status()

page_size=100
runs_map=[]

for run in r.json()['workflow_runs']:
    entry = {'id': run['id'],
             'created_at': run['created_at'],
             'conclusion': None,
             'jobs': []}

    jobs_map={}
    if run['status'] == "in_progress":
        runs_map.append(entry)
        continue
    else:
        entry['conclusion'] = run['conclusion']

    # Paginate, as the jobs of a run can span several pages.
    total_count = -1
    page=1
    while True:
        jobs_request = requests.get("%s?per_page=%s&page=%s" % (run['jobs_url'], page_size,page), headers=headers)
        jobs_request.raise_for_status()

        for job in jobs_request.json()['jobs']:
            entry['jobs'].append({'name': job['name'], 'run_id': job['run_id'],
                                    'conclusion': job['conclusion']})

        total_count = max(total_count, jobs_request.json()['total_count'])
        if len(entry['jobs']) >= total_count:
            break
        page += 1

    runs_map.append(entry)

def collect_jobs_stats(workflows_runs):
    '''
    Return a map of {'runs': NUMBER, 'fails': NUMBER, 'skips': NUMBER} indexed by job name
    '''
    stats = {}
    for run in workflows_runs:
        for job in run['jobs']:
            job_stat = stats.get(job['name'], {'runs': 0, 'fails': 0, 'skips': 0})
            job_stat['runs']+=1
            if job['conclusion'] != 'success':
                if job['conclusion'] == 'skipped':
                    job_stat['skips']+=1
                else:  # failed and cancelled
                    job_stat['fails']+=1
            stats[job['name']] = job_stat
    return stats

jobs_stats = collect_jobs_stats(runs_map)

regex = re.compile('kata-containers-ci-on-push / run-.*-tests.*')
for name, stat in jobs_stats.items():
    if regex.match(name):
        print('%s: (%s) fail=%s skips=%s' % (name, stat['runs'], stat['fails'], stat['skips']))
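
Since the script above counts cancelled jobs as failures, a possible follow-up is to tally them separately. The sketch below is a minimal variant of `collect_jobs_stats`, assuming the API reports the conclusion as the string 'cancelled'; the `cancels` counter and the `classify` helper are hypothetical names.

```python
def classify(conclusion):
    """Map a job conclusion to the counter it should increment, if any."""
    if conclusion == "success":
        return None
    if conclusion == "skipped":
        return "skips"
    if conclusion == "cancelled":
        return "cancels"
    return "fails"  # failure, timed_out and anything else


def collect_jobs_stats(workflow_runs):
    """Return per-job counters of runs, fails, skips and cancels."""
    stats = {}
    for run in workflow_runs:
        for job in run["jobs"]:
            job_stat = stats.setdefault(
                job["name"],
                {"runs": 0, "fails": 0, "skips": 0, "cancels": 0})
            job_stat["runs"] += 1
            bucket = classify(job["conclusion"])
            if bucket:
                job_stat[bucket] += 1
    return stats
```

The report line would then need an extra `cancels=%s` field to show the new counter.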
