id: pass tokens objects between interfaces #5328

Merged
MetRonnie merged 2 commits into cylc:master from data-store-opt on Mar 8, 2023



@oliver-sanders oliver-sanders commented Jan 25, 2023

Note: Built and tested on top of 8.1.x; the additional commits will disappear once #5327 is merged.

  • Change TaskProxy.tokens to hold the absolute ID rather than the
    relative ID to make the object useful in more situations.
  • Refactor the data_store_mgr interfaces to accept Tokens instances
    rather than raw inputs (e.g. cycle_point, task_name, etc).
  • This avoids doing Tokens(str(tokens)) when passing context into the
    data store interfaces.
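
As a rough illustration of the pattern described above, here is a minimal sketch, not the Cylc implementation. `SimpleTokens` and `SimpleStore` are hypothetical stand-ins invented for this example. It contrasts an interface that rebuilds an ID from raw inputs with one that accepts an ID object the caller already holds.

```python
from dataclasses import dataclass, replace
from typing import Dict, Optional


@dataclass(frozen=True)
class SimpleTokens:
    """Hypothetical stand-in for an ID object (not the real Cylc class)."""

    workflow: str
    cycle: Optional[str] = None
    task: Optional[str] = None
    job: Optional[str] = None

    @property
    def id(self) -> str:
        """Absolute ID in an assumed 'workflow//cycle/task/job' layout."""
        parts = [p for p in (self.cycle, self.task, self.job) if p is not None]
        return f"{self.workflow}//" + "/".join(parts)

    @classmethod
    def parse(cls, id_: str) -> "SimpleTokens":
        """Rebuild an object from its string form (the step to avoid)."""
        workflow, _, rest = id_.partition("//")
        parts = rest.split("/") if rest else []
        parts += [None] * (3 - len(parts))
        return cls(workflow, *parts[:3])

    def duplicate(self, **changes) -> "SimpleTokens":
        """Return a copy with some fields replaced."""
        return replace(self, **changes)


class SimpleStore:
    """Hypothetical data store keyed by absolute ID."""

    def __init__(self) -> None:
        self.states: Dict[str, str] = {}

    def update_from_raw(self, workflow: str, cycle: str, task: str, state: str) -> None:
        # Raw-input style: the ID is formatted and re-parsed on every call.
        tokens = SimpleTokens.parse(f"{workflow}//{cycle}/{task}")
        self.states[tokens.id] = state

    def update_from_tokens(self, tokens: SimpleTokens, state: str) -> None:
        # Object style: the caller's existing object is used directly.
        self.states[tokens.id] = state


store = SimpleStore()
task_tokens = SimpleTokens("wf", cycle="1", task="foo")
store.update_from_raw("wf", "1", "foo", "waiting")
store.update_from_tokens(task_tokens, "running")
job_tokens = task_tokens.duplicate(job="01")
```

Both calls write to the same key, but only the first pays for a format-and-parse round trip. Holding the absolute ID on the object also means a job ID can be derived from a task ID with a single `duplicate` call.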

Tested using this simple workflow, which measures how internal queues, job submission and management scale with the number of tasks (there are no dependencies or cycles to negotiate):

#!Jinja2

[meta]
    description = """
        A workflow to test the max throughput of Cylc measured as
        task submissions per second.
    """

[task parameters]
    x = 1..{{ TASKS | default(1000) }}

[scheduling]
    [[queues]]
        [[[default]]]
            limit = {{ LIMIT | default(1000) }}
    [[graph]]
        R1 = <x>

[runtime]
    [[<x>]]

With -s TASKS=100 -s LIMIT=100:

Before: 9.2s
After: 6.56s
Saving: ~28%

Note: Profiled using /usr/bin/time so this is an outer measurement including the overheads of Python and the Cylc codebase.
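
For reference, the saving follows directly from the two timings above. The helper name `relative_saving` is made up for this sketch.

```python
def relative_saving(before: float, after: float) -> float:
    """Return the fractional reduction in wall-clock time."""
    return (before - after) / before


# Timings quoted above, in seconds.
saving = relative_saving(9.2, 6.56)
print(f"{saving:.1%}")  # prints 28.7%
```

This is in line with the roughly 28% quoted.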

Check List

  • I have read CONTRIBUTING.md and added my name as a Code Contributor.
  • Contains logically grouped changes (else tidy your branch by rebase).
  • Does not contain off-topic changes (use other PRs for other changes).
  • Applied any dependency changes to both setup.cfg and conda-environment.yml.
  • Tests are included (or explain why tests are not needed).
  • CHANGES.md entry included if this is a change that can affect users
  • Cylc-Doc pull request opened if required at cylc/cylc-doc/pull/XXXX.
  • If this is a bug fix, PR should be raised against the relevant ?.?.x branch.

@oliver-sanders oliver-sanders added the efficiency For notable efficiency improvements label Jan 25, 2023
@oliver-sanders oliver-sanders added this to the cylc-8.2.0 milestone Jan 25, 2023
@oliver-sanders oliver-sanders self-assigned this Jan 25, 2023
@oliver-sanders oliver-sanders force-pushed the data-store-opt branch 2 times, most recently from 1b5518e to abd7cd1 Compare February 6, 2023 11:05

@wxtim wxtim left a comment

Looks good.

Comment on lines -2214 to +2236
-         node_type = TASKS
-     elif sub_num is None:
-         node_id = self.id_.duplicate(
-             cycle=str(point),
-             task=name,
-         ).id
-     else:
-         node_id = self.id_.duplicate(
-             cycle=str(point),
-             task=name,
-             job=str(sub_num),
-         ).id
-         node_type = JOBS
+     node_type = {
+         'task': TASK_PROXIES,

Before, this was TASKS; I just want to make sure this is correct.


@oliver-sanders oliver-sanders Mar 8, 2023

This had three branches for TASK_PROXIES (kwarg default), TASKS and JOBS.

The TASKS branch wasn't used so I dropped it (all store_node_fetcher calls provided the point argument).
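
A minimal sketch of the table-lookup style that replaces the branches, under stated assumptions: the constant values, the `NODE_TYPES` name and the `node_type_for` helper are illustrative only and are not taken from the Cylc source.

```python
from typing import Optional

# Assumed store keys: the names come from the discussion above,
# the string values are illustrative.
TASK_PROXIES = "task_proxies"
JOBS = "jobs"

# Map the lowest populated part of an ID to the store it belongs in.
NODE_TYPES = {
    "task": TASK_PROXIES,
    "job": JOBS,
}


def node_type_for(job: Optional[str] = None) -> str:
    """Pick the store for an ID that always has a cycle and a task.

    There is no TASKS entry because, as noted above, every caller
    supplies a cycle point, so that branch was never reached.
    """
    lowest = "job" if job is not None else "task"
    return NODE_TYPES[lowest]
```

For example, `node_type_for()` selects the task-proxy store and `node_type_for(job="01")` selects the job store.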

@MetRonnie MetRonnie merged commit 5a752f4 into cylc:master Mar 8, 2023
@oliver-sanders oliver-sanders deleted the data-store-opt branch March 8, 2023 15:18