This repository has been archived by the owner on Mar 19, 2024. It is now read-only.

[feature] (activation stats): ability to track the statistics of each layer in the model when training #264

Closed

Conversation

@QuentinDuval (Contributor) commented Mar 30, 2021

Tracking activation statistics during training

This PR adds a new monitoring utility, plugged into VISSL via the Tensorboard hook, which captures the output of the "leaf modules" (not all modules), computes mean and spread statistics on the output of each layer, and tracks them in Tensorboard.
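A minimal sketch of the idea (illustrative only, not the actual VISSL implementation; `ActivationMonitor` and its methods are hypothetical names): register a forward hook on every leaf module, record mean/spread of each output, and remove the hooks when done.

```python
# Hedged sketch of activation monitoring via forward hooks (assumed API shape,
# not the real VISSL utility).
import torch
import torch.nn as nn

class ActivationMonitor:
    def __init__(self):
        self._hooks = []
        self.stats = {}  # module name -> (mean, spread)

    def start(self, model: nn.Module):
        # Hook only leaf modules, i.e. modules with no child modules.
        for name, m in model.named_modules():
            if len(list(m.children())) == 0:
                self._hooks.append(m.register_forward_hook(self._make_hook(name)))

    def _make_hook(self, name):
        def hook(module, inputs, output):
            out = output.detach()
            # Mean plus a single "maximum spread" value around the mean.
            self.stats[name] = (
                out.mean().item(),
                (out - out.mean()).abs().max().item(),
            )
        return hook

    def stop(self):
        for h in self._hooks:
            h.remove()
        self._hooks = []  # idempotent: calling stop() twice is safe

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU())
monitor = ActivationMonitor()
monitor.start(model)
model(torch.randn(2, 4))
monitor.stop()
# monitor.stats now has one entry per leaf module (the Linear and the ReLU)
```

In the real hook, these per-layer scalars would then be written to Tensorboard at the configured frequency.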

  • @QuentinDuval to re-run performance tests after review comments and refactoring on SwAV
  • @QuentinDuval to try to add some quick performance tests in the unit tests

Output example:

[Screenshot: Tensorboard activation-statistics plots, 2021-04-07]

Description

The configuration has been updated with the following option (by default disabled):

```yaml
MONITORING:
  # At which frequency do we monitor statistics on the activations:
  # - 0 means that we do not monitor statistics
  # - N > 0 means we monitor every N iterations
  MONITOR_ACTIVATION_STATISTICS: 0
```

Turn the option on with the following Hydra override: `config.MONITORING.MONITOR_ACTIVATION_STATISTICS=50` to gather statistics on all activations every 50 iterations.

NOTE: This option requires the tensorboard hook to be enabled to take effect.

Performance impact: collecting the statistics uses the following optimisation: for feature maps, we compute the statistics only on the central feature, which requires less compute and memory than the full feature map while still exercising all the weights of the BN or Conv2d layer. With this optimisation, the memory impact is negligible and the runtime impact is about 1.4% on SimCLR when statistics are collected every 50 iterations.
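The central-feature optimisation can be sketched as follows (a hedged illustration of the idea described above, not the exact VISSL code; `central_feature` and `spread_stats` are hypothetical names): for an `(N, C, H, W)` feature map, keep only the central spatial position, which still involves every channel.

```python
# Hedged sketch: statistics on the central feature of a conv feature map.
import torch

def central_feature(out: torch.Tensor) -> torch.Tensor:
    """Reduce an (N, C, H, W) feature map to its central spatial feature."""
    if out.dim() == 4:
        h, w = out.shape[2] // 2, out.shape[3] // 2
        return out[:, :, h, w]  # shape (N, C): one value per sample/channel
    return out  # non-spatial outputs (e.g. linear layers) are kept as-is

def spread_stats(out: torch.Tensor):
    sample = central_feature(out)
    mean = sample.mean().item()
    # One "maximum spread" scalar instead of tracking min and max separately.
    spread = (sample - sample.mean()).abs().max().item()
    return mean, spread

x = torch.randn(8, 16, 32, 32)   # N=8, C=16, H=W=32
mean, spread = spread_stats(x)   # computed on an (8, 16) slice, not 8*16*32*32
```

Since every output channel of a Conv2d or BN layer contributes to the central position, the sampled slice still reflects all the layer's weights.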

Further improvements

  • Support for backends other than Tensorboard: ideally, we should be able to dump raw data / images
  • Profile what takes time when computing statistics and optimise it further if possible
  • Add an "alerting" feature that varies the tracking frequency based on detected divergence

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Mar 30, 2021
@prigoyal (Contributor)

Hi @QuentinDuval , all the points you raised are great. I'd propose that we meet over VC to discuss these.

certainly interested in the fixes, feel free to make a PR :)


@prigoyal left a comment


I did a high-level design overview of this and it looks great to me! :) Early feedback, but it is heading in the right direction :) Great work @QuentinDuval :)

(review comment on vissl/config/defaults.yaml, resolved)

@prigoyal left a comment


Looks great to me @QuentinDuval, no comments code-wise. Next steps as we discussed, and then it's good to go :)

(review comment on tests/test_activation_statistics.py, resolved)

```python
        h2 = m.register_forward_hook(self._create_post_forward_hook(name))
        self._hooks.extend([h1, h2])

    def stop(self):
```

nice :)

@facebook-github-bot

@QuentinDuval has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.


@prigoyal left a comment


this is super awesome work @QuentinDuval :)

  1. quick clarification: leaf modules -> doesn't mean just head is tracked right? from the code, it seems like trunk+head both are tracked but figures in test plan are for the heads so just wanted to double check :)

  2. we should insert the copyright headers everywhere -> blocker for landing this PR

@QuentinDuval (Contributor, Author)

> this is super awesome work @QuentinDuval :)
>
>   1. quick clarification: leaf modules -> doesn't mean just head is tracked right? from the code, it seems like trunk+head both are tracked but figures in test plan are for the heads so just wanted to double check :)
>   2. we should insert the copyright headers everywhere -> blocker for landing this PR

Hi @prigoyal :)

By leaf module, I meant a module that does not encapsulate other modules. For instance, in nn.Sequential(nn.Linear(...), nn.ReLU(...)), the leaf modules are nn.Linear and nn.ReLU, and we ignore the nn.Sequential. This also means we ignore things like Bottleneck (which only contains other modules), but it might also skip some interesting modules now that I think about it (*).

Otherwise, all modules are monitored as long as they are in training mode: we ignore modules that are not being trained, such as frozen modules during linear evaluation, since there is not much to monitor in that case.

(*) I think we can improve that in a later PR: ignore modules that have no parameters of their own but only parameters in their children, or simply hardcode what we want to ignore.
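The leaf-module selection described above can be sketched in a few lines (a hedged illustration, not the VISSL code; `leaf_modules` is a hypothetical helper): a leaf is simply a module with no child modules.

```python
# Hedged sketch of "leaf module" selection: keep modules with no children.
import torch.nn as nn

def leaf_modules(model: nn.Module):
    """Yield (name, module) for modules that do not contain other modules."""
    for name, m in model.named_modules():
        if len(list(m.children())) == 0:
            yield name, m

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU())
names = [type(m).__name__ for _, m in leaf_modules(model)]
# names contains 'Linear' and 'ReLU'; the enclosing Sequential is skipped
```

As noted in (*), this criterion would also skip containers like Bottleneck, and could later be refined to, e.g., include any module that owns parameters directly.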

@facebook-github-bot

@QuentinDuval has updated the pull request. You must reimport the pull request before landing.

@QuentinDuval (Contributor, Author)

> this is super awesome work @QuentinDuval :)
>
>   1. quick clarification: leaf modules -> doesn't mean just head is tracked right? from the code, it seems like trunk+head both are tracked but figures in test plan are for the heads so just wanted to double check :)
>   2. we should insert the copyright headers everywhere -> blocker for landing this PR

I added the copyrights in d85434b 👍 Good catch!

@facebook-github-bot

@QuentinDuval has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

  • … layer in the model when training - greatly improving performance by sampling feature maps at the center (so that each parameter is used) and computing the maximum spread instead of the min and max
  • … layer in the model when training - decreasing GPU memory usage when using the "sample feature map" flag
  • … layer in the model when training - reset the _prev_module_name in post_forward_hook (avoid potential future bugs)
  • … layer in the model when training - renaming and documentation
  • … layer in the model when training - renaming and documentation
  • … layer in the model when training - renaming and documentation
  • … layer in the model when training - bug fixing: make stop idempotent
  • … layer in the model when training - add missing copyright header
@facebook-github-bot

@QuentinDuval has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot

@QuentinDuval has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot

@QuentinDuval merged this pull request in 4c245c8.

facebook-github-bot pushed a commit that referenced this pull request May 3, 2022
Summary: Pull Request resolved: fairinternal/ssl_scaling#264

Reviewed By: mannatsingh

Differential Revision: D35579626

Pulled By: QuentinDuval

fbshipit-source-id: 42d25b576ed8451ddd6bc500fdb8dc39c072bb0e