Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🌱 increase ironic-image build timeout to 3600s #519

Merged

Conversation

tuminoid
Copy link
Member

@tuminoid tuminoid commented Jun 11, 2024

Ironic-image build locally, with a fast disk, fast CPU, and fast net takes 500-600s, depending on pre-pulls, repository speeds etc. New cloud has none of those reliably, so it times out at 1000s far too often. Also scheduling time is considered when images are built in Jenkins, and often scheduling takes a long time. Thanks to #505 we get notifications of these failures now.

  • Ironic image: 1000s -> 3600s (build time 500-600s, less than 2x safety)
  • Sushy-tools: keep 1000s (build time 150s, 6x safety)
  • VMBC: keep 1000s (build time 100s, 10x safety)

I will cherry-pick this to release branches as well.

Ironic-image build locally, with a fast disk, fast CPU, and fast net
takes 500-600s, depending on pre-pulls, repository speeds etc. New
cloud has none of those reliably, so it times out at 1000s far too
often.

Ironic image: 1000s -> 3600s (build time 500-600s)
Sushy-tools: keep 1000s (build time 150s, 6x safety)
VMBC: keep 1000s (build time 100s, 10x safety)

Signed-off-by: Tuomo Tanskanen <tuomo.tanskanen@est.tech>
@metal3-io-bot metal3-io-bot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Jun 11, 2024
@tuminoid
Copy link
Member Author

/override metal3-centos-e2e-integration-test-main metal3-ubuntu-e2e-integration-test-main
GH action.

@metal3-io-bot
Copy link
Contributor

@tuminoid: Overrode contexts on behalf of tuminoid: metal3-centos-e2e-integration-test-main, metal3-ubuntu-e2e-integration-test-main

In response to this:

/override metal3-centos-e2e-integration-test-main metal3-ubuntu-e2e-integration-test-main
GH action.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@tuminoid
Copy link
Member Author

/cc @kashifest @Rozzii

@tuminoid
Copy link
Member Author

/cherry-pick release-24.1

@metal3-io-bot
Copy link
Contributor

@tuminoid: once the present PR merges, I will cherry-pick it on top of release-24.1 in a new PR and assign it to you.

In response to this:

/cherry-pick release-24.1

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@tuminoid
Copy link
Member Author

/cherry-pick release-24.0

@metal3-io-bot
Copy link
Contributor

@tuminoid: once the present PR merges, I will cherry-pick it on top of release-24.0 in a new PR and assign it to you.

In response to this:

/cherry-pick release-24.0

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@tuminoid
Copy link
Member Author

/cherry-pick release-23.1

@metal3-io-bot
Copy link
Contributor

@tuminoid: once the present PR merges, I will cherry-pick it on top of release-23.1 in a new PR and assign it to you.

In response to this:

/cherry-pick release-23.1

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@tuminoid
Copy link
Member Author

/cc @NymanRobin

Copy link
Member

@NymanRobin NymanRobin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change is good. However, please note that there have been some issues with VM scheduling for the jobs. For instance, this job took 15 hours when scheduling time is included: Build #62.

Maybe these jobs could be moved from Jenkins to github action runners in the future to improve reliability

@Rozzii
Copy link
Member

Rozzii commented Jun 11, 2024

/lgtm

@metal3-io-bot metal3-io-bot added the lgtm Indicates that a PR is ready to be merged. label Jun 11, 2024
@Rozzii
Copy link
Member

Rozzii commented Jun 11, 2024

The change is good. However, please note that there have been some issues with VM scheduling for the jobs. For instance, this job took 15 hours when scheduling time is included: Build #62.

Maybe these jobs could be moved from Jenkins to github action runners in the future to improve reliability

WOW that is not expected IMO, anyways yeah I would guess container image building could be a github action, I think that was considered originally too.

@tuminoid
Copy link
Member Author

tuminoid commented Jun 11, 2024

The change is good. However, please note that there have been some issues with VM scheduling for the jobs. For instance, this job took 15 hours when scheduling time is included: Build #62.

Maybe these jobs could be moved from Jenkins to github action runners in the future to improve reliability

Yes agreed. We can see that after scheduling, the build itself took almost 21minutes (~1250 sec), which would've timeouted as well, so this fix is needed. It would then leave 30mins for scheduling as it is counted as well. Similarly for sushy/vmbc, the safety allows ~10 minutes of scheduling. We should be able to accomplish that in Jenkins or with runners. Jenkins scheduling we can fix ourselves, but runners would be at mercy of CNCF and other runner users.

edit: Correcting myself: for runners scheduling is not an issue, since if the workflow isn't scheduled, the time isn't running. Doh :)

Copy link
Member

@kashifest kashifest left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve

@tuminoid
Copy link
Member Author

/cc @dtantsur @elfosardo

@dtantsur
Copy link
Member

/approve

@metal3-io-bot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dtantsur, kashifest, NymanRobin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@metal3-io-bot metal3-io-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 11, 2024
@metal3-io-bot metal3-io-bot merged commit 4731ff8 into metal3-io:main Jun 11, 2024
8 checks passed
@metal3-io-bot metal3-io-bot deleted the tuomo/increase-image-build-timeout branch June 11, 2024 12:57
@metal3-io-bot
Copy link
Contributor

@tuminoid: new pull request created: #520

In response to this:

/cherry-pick release-24.1

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@metal3-io-bot
Copy link
Contributor

@tuminoid: new pull request created: #521

In response to this:

/cherry-pick release-24.0

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@metal3-io-bot
Copy link
Contributor

@tuminoid: new pull request created: #522

In response to this:

/cherry-pick release-23.1

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

elfosardo pushed a commit to elfosardo/ironic-image that referenced this pull request Jul 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants