Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cylc suite-state command and cycle point format. #1332

Closed
hjoliver opened this issue Feb 15, 2015 · 17 comments
Closed

cylc suite-state command and cycle point format. #1332

hjoliver opened this issue Feb 15, 2015 · 17 comments
Assignees
Labels
Milestone

Comments

@hjoliver
Copy link
Member

cylc-6 has somewhat broken automatic suite state polling tasks, which are defined like this in the graph:

polling_task<REMOTE_SUITE::REMOTE_TASK> => get-data

because the automatically-generated cylc suite-state command scripting assumes that the remote suite has the same cycle point format as the local one - which, from NIWA experience, is often not the case.

So, it would be good if cylc suite-state could take cycle point format as an optional argument, or for automatic use as above, from the environment.

@hjoliver hjoliver added this to the soon milestone Feb 15, 2015
@hjoliver
Copy link
Member Author

Even better, could the cycle point format in the remote suite database be discoverable? @arjclark - would it be worth adding a small table to the db to hold static information such as this?

@arjclark
Copy link
Contributor

it would be good if cylc suite-state could take cycle point format as an optional argument

Strictly speaking that shouldn't be necessary. The argument for --cycle-CYCLE should be formatted correctly for the target suite. Could you not add a runtime entry for polling_task<REMOTE_SUITE::REMOTE_TASK> that contains the format it should pass the cycle argument in?

could the cycle point format in the remote suite database be discoverable

I don't have any major objections, but if we did that we'd have to be careful to ensure backwards compatibility with suites that were already running but didn't have the new table (so default to whatever format the asking suite is in), though again a configuration entry in the suite doing the polling may well be preferable e.g. something like this:

[scheduling]
  [[dependencies]]
    REMOTE_SUITE_POLLING_FORMAT(<suite_id1>)=DDMMYYHH
    REMOTE_SUITE_POLLING_FORMAT(<suite_id2>)=YYMMDD

N.B. we use CLI calls under command scripting here rather than the graphing syntax so we'd tend to do:

command scripting = cylc suite-state <suite_id>
--max-polls=<MAX_POLLS> --interval=60 -c $(rose date -c -f <TARGET_SUITE_CYCLEPOINT_FORMAT>) -t <task> -S succeeded

@hjoliver
Copy link
Member Author

Strictly speaking that shouldn't be necessary.

Fair enough - if I know the target suite format it's just about (although not quite) as easy to transform the current cycle point to it, as it is to specify the format on the command line.

On reflection though I think cylc suite-state should not need to know the cycle point format of the target suite at all. I've had several users complain that their polling tasks are broken because of this, and they (quite reasonably!) don't understand why 20150208T00Z is not considered to be the same as 20150208T0000Z.

I guess the db query has to use a string literal cycle point, which means we need to be able to discover the format before doing the intended suite state query. @benfitzpatrick - can isodatetime infer the format from a cycle point string value? If so, could we infer the format by looking at an arbitrary single entry in the task_states table (i.e. no new static table required).

@hjoliver
Copy link
Member Author

N.B. we use CLI calls under command scripting here rather than the graphing syntax ...

That's what we're having to do too, at the moment.

@arjclark
Copy link
Contributor

I guess the db query has to use a string literal cycle point

@hjoliver - correct.

...which means we need to be able to discover the format before doing the intended suite state query

Yes, if you wanted to have cylc suite-state automatically determine the format to query in.

@arjclark arjclark assigned arjclark and unassigned benfitzpatrick May 20, 2015
@arjclark
Copy link
Contributor

@hjoliver - have been thinking about this one, in particular what to do when two suites don't run at the same resolution (e.g. one at sub-hourly polling a daily only suite). Does this problem usually occur when an ISO suite tries to poll a non-ISO one, or are you playing round with cycle point formats in the suites themselves?

@hjoliver
Copy link
Member Author

@arjclark - yeah we've mostly run into this problem when a newer suite needs to poll one still running with the legacy cycle point format. However, cylc-6 allows a wide variety of formats (particularly if you consider suites that don't need hour or day resolution) that I don't think this will be an uncommon experience even with strictly ISO formats.

@arjclark
Copy link
Contributor

@hjoliver - so have been wondering, if you have an upstream suite like this:

[cylc]
    cycle point format = CCYY-MM-DD
[scheduling]
    initial cycle point = 2010-01-01
    [[dependencies]]
        [[[P1D]]]
            graph = foo[-P1D] => foo

and a suite from which it is polled like this:

[cylc]
    UTC mode = True
[scheduling]
    initial cycle point = 20100101T0000Z
    [[dependencies]]
        [[[PT1H]]]
            graph = poller[-PT1H] => poller

how would you like the loss of precision to be handled? i.e. polling suite is cycling hourly whereas upstream suite is cycling daily - re-formatting of the cycle point from the polling suite format to the upstream suite's format would result in loss of the hourly information. Similarly, if the polling suite were daily and the upstream one hourly, what hour should the poll target?

If both suites contain the same units but in different orders then it's just a reformatting problem so no issue there.

Detecting the format of cycle points is straightforward enough using the isodatetime library, as is reformatting them - I've pretty much done this in #1477 anyway.

@arjclark
Copy link
Contributor

@hjoliver - any thoughts on the above problem?

@hjoliver
Copy link
Member Author

@arjclark - apologies, it seems this one slipped through the cracks 😬

We should probably handle the loss or gain of precision like for string format templates, i.e. simply don't print the extra digits, or add on the minimal extra components (00 hours on 01 Jan, etc.), e.g.:

$ cylc cyclepoint --template %Y-%m-%d 20100808T0000Z
2010-08-08
$cylc cyclepoint --template %Y-%m-%dT%H%M 2010
2010-01-01T0000

I think this would almost always be what's wanted, and if it's not I'm not sure we could automatically figure out what is wanted, so for anything else the user should do the required conversion and use the command line rather than an automatic polling task. Does that sound reasonable to you?

@arjclark
Copy link
Contributor

Sounds good.

Any thoughts on how to handle different time zones in this, or at that point do you just want it to bail out? (so Z vs Z suites will be fine but non-Z suites vs Z suites wont)

@hjoliver
Copy link
Member Author

Hmm, that's a complication. Can we detect time zone and do the conversion before truncation or extension?

@arjclark
Copy link
Contributor

My thought was to do something like:

# pseudocode
typical_pt = select max(CYCLE) from task_states in target_suite
target_format = detect_format(typical_pt)
converted_pt = reformat_pt(query_pt, target_format)

So potentially the detect and reformat steps could do that if needed. Will give it a go and keep you posted.

@benfitzpatrick
Copy link
Contributor

cylc cyclepoint (via isodatetime) can detect the format of a cycle point string if it's valid ISO 8601 or cylc legacy. There is code there to do that.

If they couldn't, you wouldn't always be able to get the output in the same format, like this:

$ cylc cyclepoint --offset=P1D 2014365
2015001

and

cylc cyclepoint --offset=P1D 2014123100
2015010100

@arjclark
Copy link
Contributor

So, why doesn't this

cylc cyclepoint 20101230T1600+05 --template %H --time-zone Z

give "11" as the answer?

@benfitzpatrick
Copy link
Contributor

The --time-zone is really an extension to the --template:

  --time-zone=TEMPLATE  Control the formatting of the result's timezone e.g.
                        (Z, +13:00, -hh

but I'm not sure when that would actually get used!

@arjclark arjclark modified the milestones: later, soon Jun 7, 2016
@hjoliver hjoliver modified the milestones: soon, later Jun 20, 2016
@hjoliver hjoliver assigned hjoliver and unassigned arjclark Jun 20, 2016
@hjoliver
Copy link
Member Author

[meeting] - this becomes a small issue if we store cycle point format in the suite db. We can do this after #1827 adds a new table for suite parameters.

@arjclark arjclark assigned arjclark and unassigned hjoliver Jan 20, 2017
@matthewrmshin matthewrmshin modified the milestones: next release, soon Jan 30, 2017
hjoliver added a commit that referenced this issue Feb 8, 2017
#1332: Implement auto use of cycle-point format in a remote suite.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants