Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use original task's start_date if a task continues after deferral #20062

Merged
merged 2 commits into from
Jan 5, 2022

Conversation

eskarimov
Copy link
Contributor

The PR intends to close the issue #19382, using task's original start_date when it continues after deferral.

Comment on lines -1203 to +1211
self.start_date = timezone.utcnow()
# If the task continues after being deferred (next_method is set), use the original start_date
self.start_date = self.start_date if self.next_method else timezone.utcnow()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would start_date be if the task is not deferred? If it’s None, perhaps a better solution would be

if self.start_date is None:
    self.start_date = timezone.utcnow()

but perhaps it would not be None when being rescheduled? I guess what I’m trying to say is we might want to consider more context for this, since relying on next_method is a bit unstable since that value is meaning something else and it’s too easy to change its logic elsewhere without realising it causes side effect here.

Copy link
Contributor Author

@eskarimov eskarimov Dec 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're totally right, when a new task instance starts the first time, then start_date would be None.
In case a task was rescheduled, then start_date will be equal to the previous run attempt.

Does it make sense to consider setting start_date to None when we clear the task instance?
https://github.com/apache/airflow/blob/1349cc62df9fa872fd24865ade7b281bb229d70c/airflow/models/taskinstance.py#L202-L204

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It probably makes the most sense to change start_date in TaskInstance.set_state().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused here now:

  • There's no other indicator aside TaskInstance.next_method to use for identifying if the task was returned after deferral.
  • A task instance is refreshed from DB each time it starts. It happens in function check_and_change_state_before_execution()
    https://github.com/apache/airflow/blob/1349cc62df9fa872fd24865ade7b281bb229d70c/airflow/models/taskinstance.py#L1168-L1174
    where it got start_date assigned from the previous run.
    If we want to use the snippet from above with if self.start_date is None:, then start_date needs to be None in DB at the moment of refresh, but it doesn't feel right to set it to None in DB before the new task instance has actually started executing.
  • TaskInstance.set_state() isn't used inside TaskInstance class, I've checked for usages, used in different places like airflow.jobs, airflow.executors.debug_executor, etc.

Copy link
Member

@uranusjr uranusjr Dec 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The scheduler calls set_state() to defer a task, so I was suggesting setting start_date to None when that happens (the scheduler would commit this change to the database), so when the task instance was fetched after deferral, the database record would have start_date of None. But if you feel clearing start_date in the database is wrong (based on your comment above), the best approach would be what you have right now, plus a comment reminding future maintainers to keep an eye on this part of the code base when they change the semantic of next_method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If start_date would be set to None when the scheduler goes to defer the task, then we won't be able to know what was the original start_date, i.e. when the task before deferral started.
Because we want to count the overall execution time, including time before deferral + defer time.

Sorry, I haven't explained it correctly what I meant about clearing start_date in database:
it's getting tricky, when the task is up for retry.
In case of the second and further attempts refresh_from_db() sets start_date equal to the previous try attempt, i.e. it'd be not None.
Which means if we use the snippet above

if self.start_date is None:
    self.start_date = timezone.utcnow()

then in case of re-try, the task would continue execution with the start_date from the previous attempt, which is wrong.

@eskarimov
Copy link
Contributor Author

Could someone check the PR if it looks good, please?

@github-actions
Copy link

The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.

@github-actions github-actions bot added the full tests needed We need to run full set of tests for this PR to merge label Dec 24, 2021
@eskarimov
Copy link
Contributor Author

I've rebased the PR to run the full matrix of tests, everything looks green, do you think we could merge the PR?

@potiuk potiuk merged commit 5cc5f43 into apache:main Jan 5, 2022
@eskarimov eskarimov deleted the 19382-fix-task-start-date branch January 6, 2022 09:33
@ephraimbuddy ephraimbuddy added the type:bug-fix Changelog: Bug Fixes label Apr 11, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
full tests needed We need to run full set of tests for this PR to merge type:bug-fix Changelog: Bug Fixes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants