[feature] (activation stats): ability to track the statistics of each layer in the model when training #264
Conversation
Hi @QuentinDuval, all the points you raised are great. I'd propose that we meet over VC to discuss these. Certainly interested in the fixes, feel free to make a PR :)
I did a high-level design overview of this and it looks great to me! :) Early feedback, but it is heading in the right direction :) Great work @QuentinDuval :)
Looks great to me @QuentinDuval, no comments code-wise. Next steps as we discussed, and then it's good to go :)
```python
h2 = m.register_forward_hook(self._create_post_forward_hook(name))
self._hooks.extend([h1, h2])

def stop(self):
```
nice :)
@QuentinDuval has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
this is super awesome work @QuentinDuval :)
- Quick clarification: leaf modules -> this doesn't mean just the head is tracked, right? From the code, it seems like both trunk and head are tracked, but the figures in the test plan are for the heads, so I just wanted to double check :)
- We should insert the copyright headers everywhere -> blocker for landing this PR.
Hi @prigoyal :) By leaf module, I mean a module that does not encapsulate other modules. Indeed, all modules are monitored as long as they are in training mode: we ignore modules that are not trained, like frozen modules during a linear evaluation, since there is not much to monitor in that case. (*) I think we can improve this in a later PR: ignore modules that have no parameters of their own but only parameters in their children, or simply hardcode what we want to ignore.
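The "leaf module" notion discussed above (modules that do not encapsulate other modules) can be sketched with `named_modules` and `children`; the helper name below is hypothetical, not the actual VISSL implementation:

```python
import torch.nn as nn

def get_leaf_modules(model: nn.Module):
    """Yield (name, module) pairs for modules that do not
    encapsulate other modules (hypothetical helper, for
    illustration only)."""
    for name, module in model.named_modules():
        # A leaf module has no children of its own
        if len(list(module.children())) == 0:
            yield name, module

# Example: nested Sequential -> only the innermost modules are leaves
model = nn.Sequential(
    nn.Linear(4, 8),
    nn.Sequential(nn.ReLU(), nn.Linear(8, 2)),
)
leaves = [name for name, _ in get_leaf_modules(model)]
# leaves == ['0', '1.0', '1.1']  (the two Sequential containers are skipped)
```

Note this simple test also keeps parameter-less leaves like `ReLU`, which is why the comment above suggests later ignoring modules without their own parameters.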
@QuentinDuval has updated the pull request. You must reimport the pull request before landing.
I added the copyrights in d85434b 👍 Good catch!
@QuentinDuval has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
… layer in the model when training
… layer in the model when training - greatly improving performance by sampling feature maps at the center (so that each parameter is used) and computing the maximum spread instead of the min and max
… layer in the model when training - decreasing GPU memory usage when using the "sample feature map" flag
… layer in the model when training - reset the _prev_module_name in post_forward_hook (avoid potential future bugs)
… layer in the model when training - renaming and documentation
… layer in the model when training - bug fixing: make stop idempotent
… layer in the model when training - add missing copyright header
d85434b to a626dd3
@QuentinDuval has updated the pull request. You must reimport the pull request before landing.
@QuentinDuval has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@QuentinDuval merged this pull request in 4c245c8.
Summary: Pull Request resolved: fairinternal/ssl_scaling#264 Reviewed By: mannatsingh Differential Revision: D35579626 Pulled By: QuentinDuval fbshipit-source-id: 42d25b576ed8451ddd6bc500fdb8dc39c072bb0e
Tracking activation statistics during training
This PR features a new monitoring utility, plugged into VISSL via the Tensorboard hook, which captures the output of the "leaf modules" (not all modules), computes mean and spread statistics on the output of each layer, and tracks them in Tensorboard.
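The hook-based tracking described above can be sketched as follows; the class and method names are illustrative assumptions, not the actual VISSL code (the real utility also handles Tensorboard logging and feature-map sampling):

```python
import torch
import torch.nn as nn

class ActivationStatsMonitor:
    """Minimal sketch (illustrative, not the actual VISSL class):
    register a forward hook on each leaf module and record the
    mean and spread (max - min) of its output."""

    def __init__(self):
        self._hooks = []
        self.stats = {}

    def monitor(self, model: nn.Module):
        for name, module in model.named_modules():
            if len(list(module.children())) == 0:  # leaf modules only
                self._hooks.append(
                    module.register_forward_hook(self._make_hook(name))
                )

    def _make_hook(self, name):
        def hook(module, inputs, output):
            if isinstance(output, torch.Tensor):
                self.stats[name] = {
                    "mean": output.mean().item(),
                    "spread": (output.max() - output.min()).item(),
                }
        return hook

    def stop(self):
        # Idempotent: safe to call several times, since the
        # hook list is cleared after removal
        for h in self._hooks:
            h.remove()
        self._hooks.clear()

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU())
monitor = ActivationStatsMonitor()
monitor.monitor(model)
model(torch.randn(2, 4))
monitor.stop()
monitor.stop()  # idempotent, no error on a second call
```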
Output example:
Description
The configuration has been updated with the following option (by default disabled):
Turn on the option with the following hydra override:
config.MONITORING.MONITOR_ACTIVATION_STATISTICS=50
to gather the statistics on all activations every 50 iterations. NOTE: this option requires the Tensorboard hook to be enabled to take effect.
Performance impact: collecting the statistics uses the following optimisation: for feature maps, we only compute the statistics on the central feature, which requires less compute and memory than the full feature map while still exercising all the weights of the BN or Conv2D layer. With this optimisation, the memory impact is negligible and the runtime impact is about 1.4% on SimCLR when statistics are collected every 50 iterations.
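The central-feature optimisation described above can be sketched as follows, assuming an NCHW feature map (the function name is hypothetical): keeping only the central spatial position still covers every channel of a Conv2D or BN layer, while shrinking the tensor the statistics are computed on:

```python
import torch

def sample_center(feature_map: torch.Tensor) -> torch.Tensor:
    """Keep only the central spatial position of an NCHW feature
    map (illustrative sketch of the 'sample feature map' idea)."""
    n, c, h, w = feature_map.shape
    return feature_map[:, :, h // 2, w // 2]  # shape (N, C)

x = torch.randn(8, 64, 28, 28)
center = sample_center(x)          # (8, 64) instead of (8, 64, 28, 28)
mean = center.mean().item()
spread = (center.max() - center.min()).item()
```

Computing the mean and spread on the `(N, C)` slice instead of the full map reduces the work by a factor of `H * W`, which is consistent with the negligible memory impact reported above.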
Further improvements