
[trainer / deepspeed] fix hyperparameter_search #16740

Merged
7 commits merged on Apr 15, 2022

Conversation

stas00
Contributor

@stas00 stas00 commented Apr 13, 2022

This PR:

  • fixes the hyperparameter_search DeepSpeed config reset, which had gotten out of sync with the normal code path
  • adds a test so that this won't happen again in the future
  • adds a new group of pip deps: deepspeed-testing

@sgugger

Fixes: #11966 (comment)

@HuggingFaceDocBuilderDev

HuggingFaceDocBuilderDev commented Apr 13, 2022

The documentation is not available anymore as the PR was closed or merged.

Collaborator

@sgugger sgugger left a comment


Thanks for fixing!

@@ -363,6 +363,32 @@ def test_stage3_nvme_offload(self):
trainer.train()
self.assertIn("DeepSpeed info", cl.out, "expected DeepSpeed logger output but got none")

def test_hyperparameter_search(self):
Collaborator


Doesn't this test require one of the HP search libs?

Contributor Author

@stas00 stas00 Apr 13, 2022


That's a good question. I didn't need to install anything, but that doesn't mean something else didn't already install what was required.

would this work?

diff --git a/setup.py b/setup.py
index 4d386ae00..a7bf8449f 100644
--- a/setup.py
+++ b/setup.py
@@ -288,6 +288,7 @@ extras["testing"] = (
     )
     + extras["retrieval"]
     + extras["modelcreation"]
+    + extras["integrations"]
 )

 extras["quality"] = deps_list("black", "isort", "flake8", "GitPython", "hf-doc-builder")

which would add:

extras["integrations"] = extras["optuna"] + extras["ray"] + extras["sigopt"]

to testing and thus have all the deps in place.

If we don't automate this and there is a skip rule, then the test won't run, which is an issue too.

Collaborator


No, we don't want to run the HP search tests on each PR, so they shouldn't be in "testing". They are installed manually on one of the scheduled jobs. There are already require_optuna/require_ray/require_sigopt decorators you can reuse.
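For context, decorators like require_optuna work by skipping the test when the backend isn't importable. This is a minimal illustrative stand-in, not the actual transformers.testing_utils implementation; the generic require_backend helper here is hypothetical:

```python
import importlib.util
import unittest


def require_backend(pkg_name):
    """Skip the decorated test (or test class) unless `pkg_name` is importable.

    A minimal sketch of the require_optuna/require_ray/require_sigopt pattern.
    """
    def decorator(test_item):
        if importlib.util.find_spec(pkg_name) is None:
            return unittest.skip(f"test requires {pkg_name}")(test_item)
        return test_item
    return decorator


# transformers defines concrete variants along these lines:
require_optuna = require_backend("optuna")
```

With this pattern, a test decorated with require_optuna simply reports as skipped on CI jobs where optuna isn't installed, rather than failing.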

Contributor Author


I hear you, but what's the point of having a test if it is not going to be run since its prerequisites won't be installed on CI?

What am I missing?

Contributor Author

@stas00 stas00 Apr 13, 2022


oh, I see, I have to add the specific prerequisite install just for the deepspeed tests. Got it.

So I just need to figure out which one it is.

Collaborator


We have more than 10,000 tests and they don't all run on each commit. That doesn't prevent us from adding more tests.

The regular CI job as well as the dev install do not install any of the integrations (wandb, NeptuneAI, MLFlow, CometML, Ray, Optuna, SigOpt, and I probably forgot some) to avoid cluttering the testing env. I think you were very unhappy to have some of those installed in your envs, for instance ;-)

Also, maintaining those integrations is not our responsibility but that of each library's maintainers. Running the corresponding tests on a schedule is enough to see if we accidentally broke something.

Contributor Author

@stas00 stas00 Apr 13, 2022


yes, this makes total sense, Sylvain.

I figured out what you meant in the first place!

I wonder if it'd be easier to have, for each "integration group", two targets:

pip install -e .[deepspeed]
pip install -e .[deepspeed-testing]

the first installing the deps needed for the integration itself, the second the deps needed for that integration's tests. e.g. deepspeed doesn't care about optuna, but its tests do.

that way it'd be much simpler to manage CI configs, which are currently plastered with copy-n-paste that has to be kept in sync across multiple workflow files.

so when adding new testing dependencies only setup.py will need to be updated.
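The proposed split can be sketched roughly as follows. The dependency names and the deps_list helper mirror the conventions in transformers' setup.py, but the entries here are illustrative stand-ins, not the real dependency table:

```python
# Hypothetical sketch of per-integration runtime vs. testing extras groups.
# Version pins below are placeholders, not the real pins in setup.py.
deps = {
    "deepspeed": "deepspeed>=0.6.0",
    "optuna": "optuna",
    "pytest": "pytest",
    "parameterized": "parameterized",
}


def deps_list(*pkgs):
    """Resolve short names to full requirement strings, setup.py-style."""
    return [deps[p] for p in pkgs]


extras = {}
# runtime deps: what `pip install -e .[deepspeed]` would pull in
extras["deepspeed"] = deps_list("deepspeed")
# test deps: deepspeed itself doesn't need optuna, but its HP-search tests do
extras["deepspeed-testing"] = extras["deepspeed"] + deps_list(
    "optuna", "pytest", "parameterized"
)
```

With such a layout, a CI workflow only needs `pip install -e .[deepspeed-testing]`, and new testing dependencies are added in setup.py alone rather than in every workflow file.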

Collaborator


We could add that deepspeed-testing group, yes.

Contributor Author


ok, the deepspeed-testing group is pushed, please have a look before I merge. Thank you!

@stas00 stas00 merged commit ce2fef2 into huggingface:main Apr 15, 2022
@stas00 stas00 deleted the ds-trainer-optuna branch April 15, 2022 00:24
elusenji pushed a commit to elusenji/transformers that referenced this pull request Jun 12, 2022
* [trainer / deepspeed] fix hyperparameter_search

* require optuna

* style

* oops

* add dep in the right place

* create deepspeed-testing dep group

* Trigger CI