restart: allow restart from a different checkpoint #2033
Like before when using an old state file, using an old checkpoint requires a matching It is perhaps worth storing the corresponding
Now that checkpoints are usable in this way, can you add a bit of documentation to the user guide?
5a70aa2 to f5a9367
Doc updated.
f5a9367 to 3003509
Branch rebased.
Some documentation tweaks, code otherwise looks good. Test battery passing in my environment. Just need to manually test gui.
run. This allows restarting a suite that was shut down or killed, without
rerunning tasks that were already completed, or which were already submitted or
running when the suite went down.
A restart starts a suite run from the state recorded at a check point, which is
Shouldn't checkpoint be a single word?
previous recorded suite state so that it can carry on from wherever it got to
before being shut down or killed.
A restarted suite (see \lstinline=cylc restart --help=) is initialized from a
previous recorded check point, which is normally the end of a previous run, so
checkpoint?
checkpoint to use to restart a suite. (See also
\lstinline=cylc ls-checkpoints --help=.)
The check point ID 0 (zero) is always used for latest state of the suite. The
check point IDs of non-latest states are positive integers starting from 1, and
checkpoint?
\lstinline=cylc ls-checkpoints --help=.)
The check point ID 0 (zero) is always used for latest state of the suite. The
check point IDs of non-latest states are positive integers starting from 1, and
incremented each time a new check point is stored.
checkpoint?
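For readers following this thread, the ID scheme described in the hunk above (ID 0 is always the latest state, stored checkpoints count up from 1) can be sketched as a toy in-memory model. This is only an illustration, not the `cylc.rundb` implementation: the class `CheckpointStore` and its method names are hypothetical, and real suite state lives in the run database rather than in a dict.

```python
import copy


class CheckpointStore:
    """Toy model of the checkpoint ID scheme (hypothetical, not cylc.rundb)."""

    LATEST_ID = 0  # ID 0 always refers to the latest suite state

    def __init__(self):
        self._states = {self.LATEST_ID: {}}
        self._next_id = 1  # stored checkpoints count up from 1

    def update_latest(self, state):
        """Overwrite the continually updated latest state (ID 0)."""
        self._states[self.LATEST_ID] = copy.deepcopy(state)

    def store_checkpoint(self):
        """Snapshot the latest state under a new positive ID and return it."""
        new_id = self._next_id
        self._states[new_id] = copy.deepcopy(self._states[self.LATEST_ID])
        self._next_id += 1
        return new_id

    def load(self, checkpoint_id=LATEST_ID):
        """Return the state to restart from; defaults to the latest state."""
        return copy.deepcopy(self._states[checkpoint_id])
```

In this model, calling `load()` with no argument corresponds to a normal restart from the latest state, while `load(1)` corresponds to restarting from the first stored checkpoint.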
check point IDs of non-latest states are positive integers starting from 1, and
incremented each time a new check point is stored.

Once you have identified the check point to use, invoke the
checkpoint?
@@ -95,6 +95,11 @@ def parse_commandline(is_restart):

    if is_restart:
        parser.add_option(
            "--checkpoint",
            help="Use specified instead of latest checkpoint to restart",
Use specified instead of latest checkpoint to restart
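As a standalone illustration of the hunk under review, here is a minimal sketch of a restart-only `--checkpoint` option built with the standard-library `optparse` module. Only the option name, the `is_restart` guard, and the help string come from the diff; the `argv` parameter, the usage string, and the `metavar`, `action`, and `dest` values are assumptions added to make the example self-contained.

```python
from optparse import OptionParser


def parse_commandline(is_restart, argv):
    """Parse argv; the --checkpoint option is only defined for restarts."""
    # The usage string is a placeholder for this sketch.
    parser = OptionParser(usage="%prog [options] REG")
    if is_restart:
        parser.add_option(
            "--checkpoint",
            help="Use specified instead of latest checkpoint to restart",
            metavar="CHECKPOINT",  # assumed, mirrors --checkpoint=CHECKPOINT
            action="store",        # assumed: keep the value as a string
            dest="checkpoint",     # assumed attribute name on the options
        )
    options, args = parser.parse_args(argv)
    return options, args
```

With this sketch, `parse_commandline(True, ["--checkpoint=3", "my.suite"])` yields `options.checkpoint == "3"`, and omitting the option leaves it as `None`, which a caller could treat as "use the latest state".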
3003509 to e21b719
Docs updated.
Looks ok to me. Final say to you @hjoliver
@matthewrmshin - I don't think you've explained when checkpoints are generated? Latest state (continually updated) plus reload (before and after), and restart?
Command line option and GUI entry box updated to allow restart from a different checkpoint. Also make `state dump rolling archive length` in `global.rc` obsolete.
e21b719 to 3a84a7d
Document updated. Branch rebased.
Good.
Command line option and GUI entry box updated to allow restart from a different checkpoint.
The new `--checkpoint=CHECKPOINT` option replaces the old `STATE` argument.
Also:
`state dump rolling archive length` in `global.rc` as obsolete.
`reference.log`.
@arjclark @hjoliver please review.
(Despite a debate to death in #1827 and deferring to #1735, it looks like there is an immediate use case for doing a restart with a pre-reload checkpoint. Thank goodness I only have to put in a small logic change for the user interface, as `cylc.rundb` already leaves room in anticipation for this functionality.)