Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up CI runtime #3189

Merged
merged 7 commits into from
Jan 4, 2021
Merged

Speed up CI runtime #3189

merged 7 commits into from
Jan 4, 2021

Conversation

datumbox
Copy link
Contributor

@datumbox datumbox commented Dec 18, 2020

This PR contains a number of improvements that aim to speed up the building and testing times of the project. None of the changes alter the quality and quantity of tests that we do (aka we don't modify batch sizes, vector sizes, iterations, number of tests etc). Further improvements can be achieved if we make such modifications on the slowest tests.

See sub-PRs for more details:

To properly estimate the CI cost savings and dev time improvements of all the above changes we will require doing multiple runs which is both impractical and costly. Instead the stats provided above come from a couple of runs and thus should be taken with a grain of salt. At any case to get a rough idea of the impact of the improvements:

  • CI Costs: Given that the test in the CI jobs run serially, with a "back-of-the-envelope" calculation (factoring in the tests across all platforms, python versions and devices) we save about 70 minutes of total execution time (see sub-PRs for more info). This directly affects the CI costs, though it's not easy to get a dollar figure due to differences in pricing on various instances.
  • Development time: Since the CI jobs across platforms run in parallel, to estimate the improvements on the waiting times for reviewing/merging PRs, we focus only on Windows GPU which is the slowest CI job. By comparing the before and after, we get a rough improvement of about 45% or 22 minutes less waiting.

- Cache metadata
- Increase number of workers
* Adding parallelism to cmake.

* Adding parallelism to msbuild.
* Move definition outside of nested loop to avoid repeated jit scripting.

* Fix potentially undefined var.
* Separating unrelated checks to avoid unnecessary repetition.

* Add cache on get_fn_args().
* Upgrade scipy on windows.

* Minor clean ups.
@datumbox
Copy link
Contributor Author

datumbox commented Dec 21, 2020

The failing tests are not related to this PR but just to be safe, I'll wait until the CI issues are resolved before merging.

@datumbox datumbox changed the title [WIP] Speed up CI runtime Speed up CI runtime Dec 21, 2020
Copy link
Collaborator

@vfdev-5 vfdev-5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few comments :)
Nice description of the PR @datumbox !

@@ -496,6 +497,7 @@ def expected_fn(self, x, weight, offset, mask, bias, stride=1, padding=0, dilati
out += bias.view(1, n_out_channels, 1, 1)
return out

@lru_cache(maxsize=None)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deciding on caching random data is not that simple IMO. There are pros/cons of that...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree there are pros/cons. Note that this is something we do on tests only and seems to help us speed-wise. I don't have a strong opinion about keeping it if it causes issues.

@@ -1,3 +1,3 @@
@echo on
msbuild "-p:Configuration=Release" torchvision.vcxproj
msbuild "-p:Configuration=Release" INSTALL.vcxproj
msbuild "-p:Configuration=Release" "-p:BuildInParallel=true" "-p:MultiProcessorCompilation=true" "-p:CL_MPCount=%1" torchvision.vcxproj -maxcpucount:%1
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, here /MP is not the right option ? Funny that on windows we have to set 2 more flags...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I understand after reading the Microsoft docs, MultiProcessorCompilation == /MP. And there are many more flags, related to whether the parallelism is across projects or across C++ files. Frankly I can't say I'm fully convinced that this helped much. If we have anyone who is seasoned on working with msbuild, I would love to get their input.

Copy link
Member

@fmassa fmassa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks a lot!

@datumbox datumbox merged commit 4d2d8bb into master Jan 4, 2021
@datumbox datumbox deleted the tests/speedup branch January 4, 2021 11:21
facebook-github-bot pushed a commit that referenced this pull request Jan 21, 2021
Summary:
* Speedup test_ucf101 (#2623

* Speedup Cmake builds (#3186)

* Speedup test_autoaugment (#3190)

* Speedup DeformConvTester (#3191)

* Speedup InceptionV3 and GoogleNet on Windows (#3196)

Reviewed By: datumbox

Differential Revision: D25954568

fbshipit-source-id: bdcea84b112a9343f27619aef7036369598f631e
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants