Manual flow retries #7152

Merged: 97 commits merged on Oct 24, 2022

@anticorrelator (Contributor) commented Oct 13, 2022

closes #7127

Implements the ability to manually retry flow runs in terminal states by proposing an AwaitingRetry scheduled state.
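A minimal, self-contained sketch of the transition described above. Everything in it is hypothetical: `StateType`, `State`, `FlowRun`, and `propose_awaiting_retry` are illustrative names, not Prefect's API, and the set of terminal state types is an assumption. It skips server-side orchestration and simply applies the transition, to show the idea: a run that has already finished is moved back to a Scheduled state named AwaitingRetry, while a run that is still in progress is refused.

```python
from dataclasses import dataclass
from enum import Enum


class StateType(Enum):
    SCHEDULED = "SCHEDULED"
    RUNNING = "RUNNING"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"
    CRASHED = "CRASHED"
    CANCELLED = "CANCELLED"


# Assumption: the state types treated as terminal in this sketch.
TERMINAL_TYPES = {
    StateType.COMPLETED,
    StateType.FAILED,
    StateType.CRASHED,
    StateType.CANCELLED,
}


@dataclass(frozen=True)
class State:
    type: StateType
    name: str


@dataclass
class FlowRun:
    id: str
    state: State


def propose_awaiting_retry(run: FlowRun) -> State:
    """Hypothetical helper: send a finished flow run back to a Scheduled
    state named "AwaitingRetry". Non-terminal runs are rejected."""
    if run.state.type not in TERMINAL_TYPES:
        raise ValueError(
            f"flow run {run.id} is {run.state.type.value}; "
            "only runs in a terminal state can be retried manually"
        )
    new_state = State(type=StateType.SCHEDULED, name="AwaitingRetry")
    # A real system would propose this state to orchestration, which could
    # reject it; this sketch applies it directly.
    run.state = new_state
    return new_state


failed_run = FlowRun(id="run-1", state=State(StateType.FAILED, "Failed"))
new_state = propose_awaiting_retry(failed_run)  # Scheduled, named "AwaitingRetry"
```

Modeling AwaitingRetry as a named Scheduled state (rather than a new state type) means anything that already knows how to pick up scheduled runs can pick up the retried run unchanged.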

This PR also includes a refactor of the flow retry logic:

  • we no longer check for failed task runs and mark them as rerunnable on retry/restart
  • instead, we track whether or not a flow is retrying
  • while a flow is retrying, the engine attempts to orchestrate any tasks it discovers

This should significantly reduce the overhead of retrying flows with many task runs.
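A toy model of why this refactor can reduce overhead. All names (`TaskStatus`, `TaskRecord`, `run_flow`) are hypothetical and this is not Prefect's engine. Instead of first scanning every task run and marking failed ones rerunnable, the flow carries a `retrying` flag, and a task's prior record is looked up only when execution reaches that task. As an assumption for illustration, a task that completed in an earlier attempt is skipped and a failed or never-run task executes again; the PR describes the engine orchestrating each task it discovers, so in Prefect that decision goes through orchestration rather than a local check like this one.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List, Optional


class TaskStatus(Enum):
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"


@dataclass
class TaskRecord:
    status: TaskStatus
    result: Optional[object] = None


def run_flow(
    tasks: Dict[str, Callable[[], object]],
    history: Dict[str, TaskRecord],
    retrying: bool,
) -> List[str]:
    """Run tasks in order and return the keys of tasks that actually executed.

    There is no up-front pass over `history`: a task's prior record is looked
    up only when execution reaches that task.
    """
    executed: List[str] = []
    for key, fn in tasks.items():
        prior = history.get(key)
        # Assumption for this toy model: while retrying, a task that completed
        # in an earlier attempt is skipped and its stored result is kept.
        if retrying and prior is not None and prior.status is TaskStatus.COMPLETED:
            continue
        executed.append(key)
        try:
            history[key] = TaskRecord(TaskStatus.COMPLETED, fn())
        except Exception:
            history[key] = TaskRecord(TaskStatus.FAILED)
            break  # the flow run fails at its first failed task
    return executed


# Usage: task "b" fails on its first call only.
calls = {"b": 0}


def flaky_b() -> int:
    calls["b"] += 1
    if calls["b"] == 1:
        raise RuntimeError("transient failure")
    return 2


tasks = {"a": lambda: 1, "b": flaky_b, "c": lambda: 3}
history: Dict[str, TaskRecord] = {}
first = run_flow(tasks, history, retrying=False)  # ["a", "b"]; "c" never runs
second = run_flow(tasks, history, retrying=True)  # ["b", "c"]; "a" is skipped
```

The cost of a retry here grows with the tasks the flow actually reaches, not with a separate pass over every task run recorded for the flow.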

Checklist

  • This pull request references any related issue by including "closes <link to issue>"
    • If no issue exists and your change is not a small fix, please create an issue first.
  • This pull request includes tests or only affects documentation.
  • This pull request includes a label categorizing the change e.g. fix, feature, enhancement

@anticorrelator added the feature label (A new feature) on Oct 13, 2022
@netlify bot commented Oct 13, 2022

Deploy Preview for prefect-orion ready!

🔨 Latest commit: 83d98cc
🔍 Latest deploy log: https://app.netlify.com/sites/prefect-orion/deploys/63571e23141967000975ac76
😎 Deploy Preview: https://deploy-preview-7152--prefect-orion.netlify.app

@billpalombi (Contributor) commented:

This design is consistent with what we discussed yesterday. It's somewhat more complex than the "soft" restart, but I appreciate the protection against potential failure modes.

@cicdw (Member) left a review:

A few questions for ya

  • src/prefect/orion/schemas/core.py (outdated, resolved)
  • src/prefect/orion/api/flow_runs.py (outdated, resolved)
  • src/prefect/orion/orchestration/core_policy.py (outdated, resolved)
@zanieb (Contributor) commented Oct 14, 2022

Perhaps we can separate the changes required for a hard restart from the changes required to expose the existing retry mechanism? If they're in two pull requests, we can first review exposure of the retry/restart mechanism via an API route, then, separately and without blocking the feature we need to deliver, have a more extended discussion about adding an option to that route to perform a hard restart.

@cicdw (Member) left a review:

A few questions

  • src/prefect/orion/orchestration/core_policy.py (outdated, resolved)
@jlowin (Member) left a review:

This still seems way overcomplicated to me; in particular, the need to separately track "retries", "restarts", and "run counts" suggests that the core motivation, running the flow again, is being dominated by edge cases.

@anticorrelator changed the title from "Flow restarts" to "Manual flow retries" on Oct 24, 2022
@jlowin (Member) left a review:

🎉 I really like how clean this feels now, nice work @anticorrelator!

Labels: feature (A new feature)
Issue closed by this pull request: Restart Flow Runs from Failed Tasks
6 participants