Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Batching job-submit tasks to avoid blocking Stdout/err pipes #2870

Merged
merged 1 commit into from
Nov 24, 2018

Conversation

wxtim
Copy link
Member

@wxtim wxtim commented Nov 19, 2018

Address part of #2857
Temporary PR to check fix on codacy.

@matthewrmshin matthewrmshin added this to the next release milestone Nov 19, 2018
@matthewrmshin matthewrmshin added the bug Something is wrong :( label Nov 19, 2018
@matthewrmshin matthewrmshin modified the milestones: next release, soon Nov 19, 2018
@wxtim wxtim force-pushed the T002B_600_tasks_cause_problems branch 2 times, most recently from 196b987 to a6ee1d6 Compare November 19, 2018 14:58
@wxtim
Copy link
Member Author

wxtim commented Nov 19, 2018

  1. Does anyone have any thoughts on how (or whether) to add testing for this bug and solution?
  2. I created a new function as a sub-function of taskJobManager.submit_task_jobs called list_chopper. I'm not sure that it belongs here - is there a general utils module somewhere in the project?

@hjoliver
Copy link
Member

Temporary PR to check fix on codacy.

Note you can get codacy to analyze your own fork. Also fine to put up a "work in progress" PR though (which we seem to be labeling with "WIP" now, following @kinow).

@hjoliver
Copy link
Member

Does anyone have any thoughts on how (or whether) to add testing for this bug and solution?

Maybe add some debug output on the number of jobs submitted, e.g. "50 jobs submitted, batch 1/4", and search for that in test suite output.

Maybe also make batch size a site config item, so we can scale it down for test suites.

@hjoliver
Copy link
Member

I created a new function as a sub-function ... is there a general utils module somewhere in the project?

No, but there's several more specific utility modules (with "util" in their filenames). We don't generally bother unless/until functions are used in multiple places.

@kinow
Copy link
Member

kinow commented Nov 20, 2018

(@hjoliver on WIP prefix, in GitLab WIP in the title locks the pull request (called merge request there). Quite handy, as one cannot merge by accident. Others have submitted a feature request already)

@hjoliver
Copy link
Member

(@kinow - that is a cool feature of GitLab).

@wxtim wxtim closed this Nov 20, 2018
@wxtim wxtim force-pushed the T002B_600_tasks_cause_problems branch from de128a2 to e6ab1d3 Compare November 20, 2018 08:01
@wxtim
Copy link
Member Author

wxtim commented Nov 20, 2018

Does anyone have any thoughts on how (or whether) to add testing for this bug and solution?

Maybe add some debug output on the number of jobs submitted, e.g. "50 jobs submitted, batch 1/4", and search for that in test suite output.

Maybe also make batch size a site config item, so we can scale it down for test suites.

We had a discussion about this: I wanted to make it a config item but other people were not so sure it was necessary.

@wxtim wxtim reopened this Nov 20, 2018
@wxtim
Copy link
Member Author

wxtim commented Nov 20, 2018

I created a new function as a sub-function ... is there a general utils module somewhere in the project?

No, but there's several more specific utility modules (with "util" in their filenames). We don't generally bother unless/until functions are used in multiple places.

I pretty much got this response from @oliver-sanders review - alongside the suggestion that I collapse it into a generator comprehension for brevity.

@wxtim wxtim force-pushed the T002B_600_tasks_cause_problems branch from abdfe87 to 3390bad Compare November 21, 2018 09:50
@wxtim
Copy link
Member Author

wxtim commented Nov 21, 2018

Current State of Play with pull request:

  1. I've moved the sorting of the itasks list outside the part of the code I've changed so that the list gets sorted before being chopped up.
  2. I've changed the generator comprehension back into a list comprehension to facilitate using len(itasks_batchs).
  3. I've added a message to the LOG for debug mode that can be used by a test to check that the chunking has worked correctly.
  4. I've changed the size of the chunks back to 100. I have no idea why I changed it to 50.
  5. I've added a simple test designed to ensure that the batching is done correctly. Thanks to @matthewrmshin for the suite used to test the issue.

I haven't made the batch size a config item. The consensus here seems to be that it isn't necessary. In addition, my experience of writing the test suggests that it may be a bad idea® to allow this to be set. Setting it very low can cause large overheads in terms of numbers of processes spawned.

lib/cylc/task_job_mgr.py Outdated Show resolved Hide resolved
@wxtim wxtim changed the title T002 b 600 tasks cause problems Batching job-submit tasks to avoid blocking Stdout/err pipes Nov 21, 2018
@wxtim
Copy link
Member Author

wxtim commented Nov 22, 2018

@matthewrmshin - If you are now happy with this I'll squash it down.

Copy link
Contributor

@matthewrmshin matthewrmshin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks OK. Please remove the debug echo from the test.

@matthewrmshin matthewrmshin modified the milestones: soon, next release Nov 22, 2018
fixed the sorting of itasks so that it still works

added a debug message for testing

added a working test for debug output

added a test

made the chunks roughly even sizes and smaller than 100.

squashlater:removed lines adding two lists that can be added in situ where they are used

squashlater:move the debug msg outside for batch in batches loop
Copy link
Member

@oliver-sanders oliver-sanders left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All good here.

@hjoliver hjoliver merged commit 89d09c7 into cylc:master Nov 24, 2018
@wxtim wxtim deleted the T002B_600_tasks_cause_problems branch November 26, 2018 11:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something is wrong :(
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants