restart: allow restart from a different checkpoint #2033

Merged
hjoliver merged 3 commits into cylc:master from restart-from-checkpoint
Oct 14, 2016

Conversation

matthewrmshin
Contributor

@matthewrmshin matthewrmshin commented Oct 10, 2016

Command line option and GUI entry box updated to allow restart from a
different checkpoint.

The new --checkpoint=CHECKPOINT option replaces the old STATE argument.

Also:

  • mark state dump rolling archive length in global.rc as obsolete.
  • fix the logging interface used for diagnostics when a restart loads from the DB.
  • fix generation of reference.log.

@arjclark @hjoliver please review.

(Despite the debate to death in #1827 and the deferral to #1735, it looks like there is an immediate use case for restarting from a pre-reload checkpoint. Thankfully this only needs a small logic change in the user interface, as cylc.rundb already leaves room for this functionality.)

@matthewrmshin
Contributor Author

As before with an old state file, restarting from an old checkpoint requires a matching suite.rc in the suite definition directory if the graph differs.

It may be worth storing the corresponding suite.rc in log/suiterc/* with each checkpoint, so that the matching suite.rc is loaded automatically on restart. However, this would change the current user interface.

@hjoliver
Member

Now that checkpoints are usable in this way, can you add a bit of documentation to the user guide?

@matthewrmshin
Contributor Author

Doc updated.

@matthewrmshin
Contributor Author

Branch rebased.

Contributor

@arjclark arjclark left a comment

Some documentation tweaks needed; the code otherwise looks good. The test battery passes in my environment. I just need to test the GUI manually.

run. This allows restarting a suite that was shut down or killed, without
rerunning tasks that were already completed, or which were already submitted or
running when the suite went down.
A restart starts a suite run from the state recorded at a check point, which is
Contributor

Shouldn't checkpoint be a single word?

previous recorded suite state so that it can carry on from wherever it got to
before being shut down or killed.
A restarted suite (see \lstinline=cylc restart --help=) is initialized from a
previous recorded check point, which is normally the end of a previous run, so
Contributor

checkpoint?

checkpoint to use to restart a suite. (See also
\lstinline=cylc ls-checkpoints --help=.)
The check point ID 0 (zero) is always used for latest state of the suite. The
check point IDs of non-latest states are positive integers starting from 1, and
Contributor

checkpoint?

\lstinline=cylc ls-checkpoints --help=.)
The check point ID 0 (zero) is always used for latest state of the suite. The
check point IDs of non-latest states are positive integers starting from 1, and
incremented each time a new check point is stored.
Contributor

checkpoint?

check point IDs of non-latest states are positive integers starting from 1, and
incremented each time a new check point is stored.

Once you have identified the check point to use, invoke the
Contributor

checkpoint?
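
For reference, the ID numbering described in the user guide text under review (ID 0 is always the latest state, stored checkpoints get positive integers counting up from 1) could be sketched roughly as below. This is only an illustration: `CheckpointStore` and its methods are hypothetical names, and it uses an in-memory dict rather than whatever `cylc.rundb` actually does.

```python
class CheckpointStore(object):
    """Toy in-memory model of the checkpoint numbering (hypothetical)."""

    LATEST_ID = 0  # ID 0 is reserved for the latest state of the suite

    def __init__(self):
        self._states = {}
        self._next_id = 1  # stored checkpoints are numbered from 1

    def update_latest(self, state):
        """Overwrite the continually updated latest state (ID 0)."""
        self._states[self.LATEST_ID] = state

    def store_checkpoint(self):
        """Copy the latest state under the next positive ID; return the ID.

        Raises KeyError if no latest state has been recorded yet.
        """
        new_id = self._next_id
        self._states[new_id] = self._states[self.LATEST_ID]
        self._next_id += 1
        return new_id

    def load(self, checkpoint_id=LATEST_ID):
        """Return the state for an ID; defaults to the latest state."""
        return self._states[checkpoint_id]


store = CheckpointStore()
store.update_latest("state before reload")
saved_id = store.store_checkpoint()
store.update_latest("state after reload")
print(saved_id, store.load(saved_id), store.load())
```

A restart without an explicit ID would then correspond to `load()`, and a restart from an older checkpoint to `load(saved_id)`.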

@@ -95,6 +95,11 @@ def parse_commandline(is_restart):

    if is_restart:
        parser.add_option(
            "--checkpoint",
            help="Use specified instead of latest checkpoint to restart",
Contributor

Use specified instead of latest checkpoint to restart
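
As a standalone illustration of the hunk above, here is a minimal `optparse` sketch of an option that is only registered for restarts. The `build_parser` function, the `dest`, `metavar` and `default` values, the usage string and the suite name `my.suite` are assumptions for the example, not the actual code in this PR; only the option name and help string come from the diff.

```python
from optparse import OptionParser


def build_parser(is_restart):
    """Return a parser; --checkpoint is only added for restarts."""
    parser = OptionParser(usage="%prog [options] SUITE")
    if is_restart:
        parser.add_option(
            "--checkpoint",
            metavar="CHECKPOINT",
            action="store",
            dest="checkpoint",
            # Assumption: None stands for "use the latest checkpoint".
            default=None,
            help="Use specified instead of latest checkpoint to restart",
        )
    return parser


# Parse a restart-style command line with an explicit checkpoint ID.
opts, args = build_parser(is_restart=True).parse_args(
    ["--checkpoint=2", "my.suite"])
print(opts.checkpoint, args)
```

Note that `optparse` returns the value as a string unless a `type` is given, so the caller would need to convert it if an integer ID is expected.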

@matthewrmshin
Contributor Author

Docs updated.

@arjclark
Contributor

Looks ok to me. Final say to you @hjoliver

@arjclark arjclark removed their assignment Oct 11, 2016
@hjoliver
Member

hjoliver commented Oct 11, 2016

@matthewrmshin - I don't think you've explained when checkpoints are generated? Latest state (continually updated) plus reload (before and after), and restart?

Command line option and GUI entry box updated to allow restart from a
different checkpoint.
Also make `state dump rolling archive length` in `global.rc` obsolete.
@matthewrmshin
Contributor Author

Document updated. Branch rebased.

Member

@hjoliver hjoliver left a comment

Good.

@hjoliver hjoliver merged commit 153272f into cylc:master Oct 14, 2016
@matthewrmshin matthewrmshin deleted the restart-from-checkpoint branch December 13, 2016 20:05