Update purge test resources (both azure and aws) workflows #7551

Merged 1 commit on May 7, 2024
18 changes: 15 additions & 3 deletions .github/workflows/purge-aws-test-resources.yaml
@@ -4,7 +4,7 @@
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
-#
+#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
@@ -23,8 +23,8 @@ on:

env:
AWS_REGION: us-west-2
-AWS_RESOURCE_TYPES: 'AWS::RDS::DBSubnetGroup,AWS::RDS::DBInstance,AWS::S3::Bucket,AWS::Logs::MetricFilter,AWS::Logs::LogGroup'
+AWS_RESOURCE_TYPES: "AWS::RDS::DBSubnetGroup,AWS::RDS::DBInstance,AWS::S3::Bucket,AWS::Logs::MetricFilter,AWS::Logs::LogGroup"

jobs:
purge_aws_resources:
name: Delete old AWS resources created by tests
@@ -40,3 +40,15 @@ jobs:
- name: Delete old AWS resources
run: |
./.github/scripts/delete-aws-resources.sh ${{ env.AWS_RESOURCE_TYPES }}
+- name: Create issue for failing purge aws test resources run

Don't you think it is too verbose?

+uses: actions/github-script@v7
+if: failure()
+with:
+github-token: ${{ secrets.GH_RAD_CI_BOT_PAT }}
+script: |
+github.rest.issues.create({
+...context.repo,
+title: `Purge aws test resources failed - Run ID: ${context.runId}`,
+labels: ['bug', 'test-failure'],
+body: `## Bug information \n\nThis bug is generated automatically if the purge aws test resources workflow fails. For the further investigation, please visit [here](${process.env.ACTION_LINK}).`
+})
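
The issue body in the step above links to `process.env.ACTION_LINK`, which is not set anywhere in the lines shown in this diff (it may be defined in a collapsed part of the workflow). As a hedged sketch only, the run link could also be derived from the `context` object that `actions/github-script` passes to the script. `buildFailureIssue`, `workflowLabel`, and `exampleContext` are hypothetical names introduced for illustration and are not part of this PR.

```javascript
// Hypothetical helper: builds the payload for github.rest.issues.create.
// `context` is assumed to have the shape provided by actions/github-script
// (repo.owner, repo.repo, runId, serverUrl).
function buildFailureIssue(context, workflowLabel) {
  const { owner, repo } = context.repo;
  // Link to the failing workflow run, built from context instead of an env var.
  const runUrl = `${context.serverUrl}/${owner}/${repo}/actions/runs/${context.runId}`;
  return {
    owner,
    repo,
    title: `${workflowLabel} failed - Run ID: ${context.runId}`,
    labels: ['bug', 'test-failure'],
    body: `## Bug information\n\nThis issue is generated automatically when the "${workflowLabel}" workflow fails. For further investigation, see the [workflow run](${runUrl}).`,
  };
}

// Stand-in context with placeholder values so the sketch runs outside Actions.
const exampleContext = {
  repo: { owner: 'example-org', repo: 'example-repo' },
  runId: 123456,
  serverUrl: 'https://github.com',
};
console.log(buildFailureIssue(exampleContext, 'Purge aws test resources').body);
```

Inside the workflow step, the equivalent call would be `await github.rest.issues.create(buildFailureIssue(context, 'Purge aws test resources'))`, assuming the helper is defined in the same `script` block.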
20 changes: 16 additions & 4 deletions .github/workflows/purge-test-resources.yaml
@@ -4,7 +4,7 @@
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
-#
+#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
@@ -18,15 +18,15 @@ name: Purge test resources

permissions:
id-token: write # Required for requesting the JWT
-contents: read # Required for actions/checkout
+contents: read # Required for actions/checkout

on:
schedule:
# Run twice a day
- cron: "30 0,12 * * *"

env:
-AZURE_RG_DELETE_LIST_FILE: 'az_rg_list.txt'
+AZURE_RG_DELETE_LIST_FILE: "az_rg_list.txt"
# The valid resource time window in seconds to delete the test resources. 6 hours
VALID_RESOURCE_WINDOW: 6*60*60
jobs:
@@ -93,5 +93,17 @@ jobs:
cat ${{ env.AZURE_RG_DELETE_LIST_FILE}} | while read line
do
echo " * $line" >> $GITHUB_STEP_SUMMARY
-az group delete --resource-group $line --yes --verbose
+az group delete --resource-group $line --yes --verbose --no-wait
Contributor Author

I think that, because we have a lot of resource groups, deleting them takes more than 5 minutes. Partway through, the session loses its access because the login is only granted for 5 minutes. Please see Azure/login#372.

I may be wrong about this. Open to suggestions. cc/ @youngbupark

Contributor

I think this is similar to #7494. There was a regression in the az CLI around login expiration, and they provided some workarounds here: Azure/azure-cli#28737 (comment)

But we already use a service principal, and that still seems to be failing for us. I think they plan to release a fix on 4/30, so we may need to revisit this once that version is out.

It looks like az login issues a token with a 5-minute expiry. I am OK with adding --no-wait. The fix will be released in az CLI 2.60.
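
To make the reasoning in this thread concrete, here is a rough model of the delete loop. It assumes the loop calls `az group delete` once per resource group in sequence, so the total time is the number of groups multiplied by the time each call blocks. The 5-minute token lifetime comes from the thread; the group count and per-call timings are hypothetical figures chosen only for illustration, as are the names `estimateLoop` and `TOKEN_LIFETIME_SECONDS`.

```javascript
// Rough model: one sequential `az group delete` call per resource group.
// Returns the total elapsed seconds and whether the loop ends before the token expires.
function estimateLoop(groupCount, secondsPerCall, tokenLifetimeSeconds) {
  const totalSeconds = groupCount * secondsPerCall;
  return { totalSeconds, finishesInTime: totalSeconds <= tokenLifetimeSeconds };
}

const TOKEN_LIFETIME_SECONDS = 5 * 60; // 5-minute expiry mentioned in the thread

// Hypothetical figures: 20 groups, 60 s per blocking delete,
// 3 s per call when --no-wait returns right after submitting the request.
const blocking = estimateLoop(20, 60, TOKEN_LIFETIME_SECONDS);
const noWait = estimateLoop(20, 3, TOKEN_LIFETIME_SECONDS);

console.log(blocking); // { totalSeconds: 1200, finishesInTime: false }
console.log(noWait);   // { totalSeconds: 60, finishesInTime: true }
```

Under these assumed numbers the blocking loop outlives the token while the `--no-wait` loop does not. The trade-off is that with `--no-wait` the workflow no longer observes whether each deletion eventually succeeds.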

done
+- name: Create issue for failing purge test resources run

I do not think we need this. It is too verbose and noisy.

Contributor Author

Don't you think that we should know if this workflow fails? Otherwise, we may only find out when we hit resource limits. Happy to discuss further and listen to other ideas.

+uses: actions/github-script@v7
+if: failure()
+with:
+github-token: ${{ secrets.GH_RAD_CI_BOT_PAT }}
+script: |
+github.rest.issues.create({
+...context.repo,
+title: `Purge test resources failed - Run ID: ${context.runId}`,
+labels: ['bug', 'test-failure'],
+body: `## Bug information \n\nThis bug is generated automatically if the purge test resources workflow fails. For the further investigation, please visit [here](${process.env.ACTION_LINK}).`
+})