Speed up CI runtime #3189

datumbox · 2020-12-18T18:08:06Z

This PR contains a number of improvements that aim to speed up the build and test times of the project. None of the changes alter the quality or quantity of the tests we run (i.e. we don't modify batch sizes, vector sizes, iterations, number of tests, etc.). Further improvements can be achieved if we make such modifications to the slowest tests.

See sub-PRs for more details:

Speedup test_ucf101 (#3187): 40% improvement
Speedup Cmake builds (#3186): 15-50% improvement depending on the platform
Speedup test_autoaugment (#3190): 55% improvement
Speedup DeformConvTester (#3191): 7-12% improvement depending on the platform
Speedup InceptionV3 and GoogleNet tests on Windows by upgrading scipy (#3196): 96% improvement

Properly estimating the CI cost savings and dev-time improvements of all the above changes would require multiple runs, which is both impractical and costly. Instead, the stats above come from a couple of runs and should be taken with a grain of salt. In any case, to get a rough idea of the impact of the improvements:

CI costs: Given that the tests in each CI job run serially, a back-of-the-envelope calculation (factoring in the tests across all platforms, Python versions and devices) shows that we save about 70 minutes of total execution time (see sub-PRs for more info). This directly affects the CI costs, though it's not easy to get a dollar figure due to differences in pricing across instance types.
Development time: Since the CI jobs across platforms run in parallel, to estimate the improvement in waiting times for reviewing/merging PRs we focus only on Windows GPU, which is the slowest CI job. Comparing before and after, we get a rough improvement of about 45%, or 22 minutes less waiting.
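
As a rough sanity check on the development-time figures, the sketch below back-computes the Windows GPU job durations implied by the two quoted numbers (about 45% and about 22 minutes). The resulting before/after durations are not stated anywhere in this PR; they are only what those two rounded figures imply, and the helper name is hypothetical.

```python
from typing import Tuple

# Only the 45% and 22-minute values come from the PR description;
# everything else here is derived arithmetic.

def implied_durations(relative_saving: float, minutes_saved: float) -> Tuple[float, float]:
    """Return the (before, after) durations in minutes implied by the two figures."""
    before = minutes_saved / relative_saving
    after = before - minutes_saved
    return before, after

before, after = implied_durations(relative_saving=0.45, minutes_saved=22.0)
print(f"before ~ {before:.0f} min, after ~ {after:.0f} min")
```

Since both inputs are rounded, the derived durations should be read as ballpark values only.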

datumbox · 2020-12-21T13:24:49Z

The failing tests are not related to this PR but just to be safe, I'll wait until the CI issues are resolved before merging.

vfdev-5

A few comments :)
Nice description of the PR @datumbox !

vfdev-5 · 2020-12-21T15:18:04Z

test/test_ops.py

@@ -496,6 +497,7 @@ def expected_fn(self, x, weight, offset, mask, bias, stride=1, padding=0, dilati
        out += bias.view(1, n_out_channels, 1, 1)
        return out

+    @lru_cache(maxsize=None)


Deciding on caching random data is not that simple IMO. There are pros/cons of that...

datumbox · 2020-12-21

I agree there are pros/cons. Note that this is something we do in tests only, and it seems to help us speed-wise. I don't have a strong opinion about keeping it if it causes issues.
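
To make the trade-off in this thread concrete, here is a minimal sketch of caching randomly generated test inputs with `functools.lru_cache`. The `make_inputs` helper is a hypothetical stand-in and is not the `get_fn_args()` method from the diff; it uses the standard `random` module rather than tensors so that it runs without extra dependencies.

```python
import random
from functools import lru_cache

@lru_cache(maxsize=None)
def make_inputs(n: int) -> tuple:
    """Hypothetical stand-in for a test helper that generates random inputs."""
    return tuple(random.random() for _ in range(n))

first = make_inputs(4)
second = make_inputs(4)   # same arguments: served from the cache, not regenerated
other = make_inputs(5)    # different arguments: generated and cached separately

print(first is second)           # the cached object itself is returned
print(make_inputs.cache_info())  # hit/miss counters kept by lru_cache
```

The benefit is that the generation cost is paid once per distinct argument set. The costs are that every caller now sees the same "random" sample, and, if the cached value were mutable (a tensor, for example), an in-place change made by one test would be visible to the others.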

vfdev-5 · 2020-12-21T15:24:40Z

packaging/windows/internal/build_cmake.bat

@@ -1,3 +1,3 @@
 @echo on
-msbuild "-p:Configuration=Release" torchvision.vcxproj
-msbuild "-p:Configuration=Release" INSTALL.vcxproj
+msbuild "-p:Configuration=Release" "-p:BuildInParallel=true" "-p:MultiProcessorCompilation=true" "-p:CL_MPCount=%1" torchvision.vcxproj -maxcpucount:%1


So /MP is not the right option here? Funny that on Windows we have to set two more flags...

datumbox · 2020-12-21

As far as I understand from the Microsoft docs, MultiProcessorCompilation == /MP. There are many more flags, depending on whether the parallelism is across projects or across C++ files. Frankly, I'm not fully convinced that this helped much. If anyone is seasoned with msbuild, I would love to get their input.
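
For readers who want to experiment with the job count, the sketch below composes the same msbuild invocation as the added line in the diff. The flags are copied from that line; the `msbuild_command` helper and the fallback to `os.cpu_count()` are assumptions for illustration (the batch script itself receives the count as `%1`).

```python
import os
from typing import List, Optional

def msbuild_command(project: str, jobs: Optional[int] = None) -> List[str]:
    """Compose the msbuild call from the diff for a given number of parallel jobs."""
    # Assumption: fall back to the machine's CPU count when no job count is given.
    n = jobs or os.cpu_count() or 1
    return [
        "msbuild",
        "-p:Configuration=Release",
        "-p:BuildInParallel=true",
        "-p:MultiProcessorCompilation=true",
        f"-p:CL_MPCount={n}",
        project,
        f"-maxcpucount:{n}",
    ]

print(" ".join(msbuild_command("torchvision.vcxproj", jobs=4)))
```

Returning an argument list rather than a single string keeps the command usable with `subprocess.run` without shell quoting issues.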

fmassa

LGTM, thanks a lot!

Summary:
* Speedup test_ucf101 (#2623)
* Speedup Cmake builds (#3186)
* Speedup test_autoaugment (#3190)
* Speedup DeformConvTester (#3191)
* Speedup InceptionV3 and GoogleNet on Windows (#3196)

Reviewed By: datumbox
Differential Revision: D25954568
fbshipit-source-id: bdcea84b112a9343f27619aef7036369598f631e

datumbox added 2 commits December 18, 2020 18:00

Speedup test_ucf101 (#3187)

1a207e4

- Cache metadata
- Increase number of workers

Speedup Cmake builds (#3186)

c95d841

* Adding parallelism to cmake.
* Adding parallelism to msbuild.

facebook-github-bot added the cla signed label Dec 18, 2020

datumbox added 4 commits December 18, 2020 19:17

Speedup test_autoaugment (#3190)

ed0f98f

* Move definition outside of nested loop to avoid repeated jit scripting.
* Fix potentially undefined var.
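
The first point in this commit message is an instance of hoisting a costly, loop-invariant step out of a nested loop. The sketch below shows the general pattern only; `expensive_compile` is a hypothetical stand-in for a step such as JIT scripting, and none of the names come from the actual test file.

```python
calls = {"n": 0}

def expensive_compile(fn):
    """Hypothetical stand-in for a costly one-off step such as JIT scripting."""
    calls["n"] += 1
    return fn

def transform(x, policy, fill):
    return (x, policy, fill)

policies = ["a", "b", "c"]
fills = [None, 0, 255]

# Slow pattern: the costly step is repeated for every (policy, fill) combination.
for policy in policies:
    for fill in fills:
        compiled = expensive_compile(transform)
        compiled(1, policy, fill)
slow_calls = calls["n"]

# Faster pattern: run the costly step once, then reuse the result inside the loops.
calls["n"] = 0
compiled = expensive_compile(transform)
for policy in policies:
    for fill in fills:
        compiled(1, policy, fill)
fast_calls = calls["n"]

print(slow_calls, fast_calls)
```

This only works when the costly step does not depend on the loop variables; anything that does depend on them has to stay inside the loop.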

Speedup DeformConvTester (#3191)

e96bc8d

* Separating unrelated checks to avoid unnecessary repetition.
* Add cache on get_fn_args().

Speedup InceptionV3 and GoogleNet on Windows (#3196)

7a028dd

* Upgrade scipy on windows.
* Minor clean ups.

Merge branch 'master' into tests/speedup

1f037ac

datumbox changed the title ~~[WIP] Speed up CI runtime~~ Speed up CI runtime Dec 21, 2020

datumbox requested review from fmassa and vfdev-5 December 21, 2020 13:25

vfdev-5 reviewed Dec 21, 2020

fmassa approved these changes Jan 4, 2021

Merge branch 'master' into tests/speedup

974fb35

datumbox merged commit 4d2d8bb into master Jan 4, 2021

datumbox deleted the tests/speedup branch January 4, 2021 11:21

datumbox mentioned this pull request May 17, 2021

[NOMRG] Investigate kp regression #3847

Closed
