Vision transformer #100

growlix · 2020-12-03T20:56:56Z

Vision transformer trunk and head, and functionality to run on slurm/FAIR cluster.

fbshipit-source-id: bad295c2b54e5d8258176d45951637725dd771bf

Summary: Bringing full parity with the torchvision models. A user can now use any torchvision resnet model; all they have to do is append `trunk.base_model._feature_blocks` from the config file. Reviewed By: imisra Differential Revision: D22116278 fbshipit-source-id: e8eafd4e0de61351956608b1b02096a81ebfaa9e

Summary: Properly document and clean up the logic for loading a model file specified by the user. The logic is now deduplicated and robust (it throws errors and checks for compatibility).

1. Remove the need for `LOAD_TRUNK_AND_HEADS`. It wasn't needed and could be confusing.
2. If a VISSL-compatible model is passed, we automatically figure out which prefixes should be appended to the model. The user no longer needs to pass arguments like `APPEND_PREFIX`.
3. If a non-VISSL-compatible model is passed, the model compatibility is checked and a suggestion is given to the user on how to make it compatible.
4. Rename `APPEND_SUFFIX` -> `APPEND_PREFIX` and `REMOVE_SUFFIX` -> `REMOVE_PREFIX`.
5. Add `check_model_compatibility` to make sure that the model is compatible to load. Previously, if the checkpoint was not compatible, it would not load the layers and would also NOT fail. Now it will fail.
6. Combine some of the model checkpoint logic from `base_ssl_model` and `checkpoint`. It is all centralized in the latter.

Reviewed By: blefaudeux Differential Revision: D22164172 fbshipit-source-id: 4c45682f9da679df0c91ee5b301e3551c9e7d349
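To illustrate the prefix handling and the fail-loudly compatibility check described above, here is a minimal sketch that works on a plain dict of parameter names. The helper names `append_prefix`, `remove_prefix`, and `check_compatibility`, as well as the `trunk.` prefix and the example keys, are hypothetical and are not taken from the VISSL code; the actual `check_model_compatibility` may work differently.

```python
from typing import Any, Dict, Iterable


def append_prefix(state: Dict[str, Any], prefix: str) -> Dict[str, Any]:
    """Return a copy of `state` with `prefix` prepended to every key."""
    return {prefix + key: value for key, value in state.items()}


def remove_prefix(state: Dict[str, Any], prefix: str) -> Dict[str, Any]:
    """Return a copy of `state` with `prefix` stripped from keys that start with it."""
    return {
        (key[len(prefix):] if key.startswith(prefix) else key): value
        for key, value in state.items()
    }


def check_compatibility(state: Dict[str, Any], expected_keys: Iterable[str]) -> None:
    """Raise instead of silently skipping layers when the keys do not line up."""
    expected = set(expected_keys)
    missing = sorted(expected - set(state))
    unexpected = sorted(set(state) - expected)
    if missing or unexpected:
        raise ValueError(
            f"Incompatible checkpoint: missing={missing}, unexpected={unexpected}"
        )


# Hypothetical example: a checkpoint saved without the "trunk." prefix.
checkpoint = {"conv1.weight": 0, "fc.weight": 1}
model_keys = ["trunk.conv1.weight", "trunk.fc.weight"]

renamed = append_prefix(checkpoint, "trunk.")
check_compatibility(renamed, model_keys)  # keys match, so nothing is raised
```

Calling `check_compatibility(checkpoint, model_keys)` on the unrenamed checkpoint raises a `ValueError`, which mirrors the "now it will fail" behaviour from item 5.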

#36) Summary: … my FAIR cluster devfairs Pull Request resolved: fairinternal/ssl_scaling#36 Reviewed By: blefaudeux Differential Revision: D22164969 Pulled By: prigoyal fbshipit-source-id: 1ccc44af359fcf06f174f24a20a7d277153bd65b

Summary: renaming from ssl_framework_plugin to vissl_plugin Reviewed By: blefaudeux Differential Revision: D22165297 fbshipit-source-id: 0973b007a1504d55b3bd098824888640397e4cb9

Summary: For the console handlers, the logging level was set to info, which means debug messages were skipped. Setting the correct level now. Reviewed By: blefaudeux Differential Revision: D22165926 fbshipit-source-id: 4246c7592e95eb7b319cfabdc23c8442451f4bd1
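The effect described above can be reproduced with the standard `logging` module alone. This is a small sketch, not the VISSL logging setup: the logger name is made up, and the handler writes to an in-memory buffer instead of the console so the result can be inspected.

```python
import io
import logging

logger = logging.getLogger("vissl_logging_sketch")  # hypothetical logger name
logger.setLevel(logging.DEBUG)
logger.propagate = False  # keep the root logger out of this example

stream = io.StringIO()  # stands in for the console
handler = logging.StreamHandler(stream)
handler.setLevel(logging.INFO)  # the handler filters out DEBUG records
logger.addHandler(handler)

logger.debug("dropped")  # below the handler level, never written

handler.setLevel(logging.DEBUG)  # the fix: lower the handler level
logger.debug("kept")

output = stream.getvalue()
```

Only the second message reaches the stream, even though the logger itself accepted both: the handler level is checked separately from the logger level.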

Summary:
- Trunk declaration around ModuleDict: makes it trivial to index the features you want to pull, makes sure that names and modules are in sync by design, and makes it possible to have the same forward for most trunks.
- Tentatively fix EfficientNet, which I believe was buggy around the drop connection rate, and refactor a bit.
- Simplify the code and try to make it easier to follow.

Reviewed By: prigoyal Differential Revision: D22078597 fbshipit-source-id: 26d3e50469107a53cb3cb597d9d16eb59cbe51ec

Summary: Pull Request resolved: fairinternal/ssl_scaling#37 Up to now these configs would not actually run if the machine where they were scheduled had already gone through a test and had not been wiped. Reviewed By: prigoyal Differential Revision: D22169259 fbshipit-source-id: 1ece8ac5a2c4f866f54eb7f5d97728ab5e3a365b

Summary: logging the metrics to a `metric.json` file. also used the opportunity to rename `tasks` folder to `ssl_tasks` and extract the `accuracy_list_meter.py` from the `__init__.py` for better code readability Reviewed By: blefaudeux Differential Revision: D22166885 fbshipit-source-id: d52728616a2f54223994d64267b4b9d0017d33cb

Summary: I noticed in the code that at several places we use local rank and get it from the env. given this is a helpful thing, I am creating a common utility function Reviewed By: blefaudeux Differential Revision: D22170823 fbshipit-source-id: 1da72d69235ac6da35287797eb50f026540e730c
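A common utility of this kind could look like the sketch below. The function name `get_local_rank`, the `LOCAL_RANK` variable name, and the default of 0 are assumptions made for illustration; the utility added in this commit may use different names or defaults.

```python
import os
from typing import Mapping, Optional


def get_local_rank(env: Optional[Mapping[str, str]] = None, default: int = 0) -> int:
    """Read the local rank from the environment in one place.

    `env` defaults to `os.environ`; passing a dict makes the helper easy to test.
    The variable name LOCAL_RANK is an assumption, not confirmed by the source.
    """
    env = os.environ if env is None else env
    return int(env.get("LOCAL_RANK", default))
```

Call sites then use `get_local_rank()` instead of repeating the environment lookup and the `int` conversion.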

Summary: Pull Request resolved: fairinternal/ssl_scaling#40 Unblocking master, moving all syncBNs to pytorch until we properly solve that Reviewed By: mannatsingh Differential Revision: D22196409 fbshipit-source-id: 0cae37bae5efcca5ffc17f9ca9d7982d2a3f0e55

Summary: While looking at the CircleCI setup, I ran into the problem of getting a dataset to run the integration tests. The problem is easily solved by adding a minimal synthetic dataset class that returns a mean image. The dataset size is capped at 500 by default, and the user can increase it from the yaml config. Reviewed By: mannatsingh Differential Revision: D22186576 fbshipit-source-id: 06a1562abf2d7f1849ff2d83d3b9cc2849641205

Summary: Pull Request resolved: fairinternal/ssl_scaling#41 Unit testing some losses, would need more coverage but that's a start. Just checking that types and dimensions are correct, not a correctness check Reviewed By: prigoyal Differential Revision: D22198026 fbshipit-source-id: 161c3ce6948b06736056c69096f064b14ffdf470

…#2) Summary: adding all sorts of coding quality standards: isort, flake, black, pre-commit check and some improvements to setup.py including versioning, requirements.txt etc Pull Request resolved: #2 Reviewed By: mannatsingh Differential Revision: D22190024 Pulled By: prigoyal fbshipit-source-id: 58f8ee7c59c821272a89febf436e1bae35841832

Summary: Pull Request resolved: fairinternal/ssl_scaling#42 Pull Request resolved: #4 Removed third-party completely. Classy Vision is in requirements.txt now; for apex, we simplified the install instructions with a tarball that pins a specific version. Reviewed By: mannatsingh Differential Revision: D22213297 fbshipit-source-id: 374174fae6ff91aad6f18af2c4557c6b7a157ef6

Re-sync with internal repository

…to fb (#5) Summary: Pull Request resolved: #5

1. Moving regnet files to an fb-specific folder, as the tests on GitHub will fail since regnet is not OSS yet.
2. For unit tests, use the non-internal hydra function, so they work without the hydra plugin.
3. In test tasks, test the actual vissl lib and not distributed_train, which is a binary.
4. Re-organized test files under the config/test folder for clarity.
5. Small fix to the swav loss, which wasn't working on GPUs anymore.

Reviewed By: mannatsingh Differential Revision: D22219028 fbshipit-source-id: 94671e18b3adbb6f983284b03e3db60692f2813e

Summary: Rename img pil enhancements. Add docstrings to the image transforms. Reviewed By: prigoyal Differential Revision: D22222850 fbshipit-source-id: 08e6d33fb398b9080a5deefbccf0ca87286463d2

Summary: Pull Request resolved: #7 setting up the docker file so we can have the proper environment and also provide helpful scripts to use otherwise like conda install, etc. Reviewed By: mannatsingh Differential Revision: D22233535 fbshipit-source-id: 8aa06a5586ca49c4c61fd3a404daa7a6c3fec836

Summary: Pull Request resolved: fairinternal/ssl_scaling#43 Pull Request resolved: #8 setting up the config for circle ci testing - cpu and gpu tests both also had to make some changes to make pre-commit-hook compatible and working nicely Reviewed By: mannatsingh Differential Revision: D22257436 fbshipit-source-id: 56d952c014885450dde8ee8c3f4a1292746a4328

Summary:
- Adding more GPU tests to run on CI. The CI machines have only 8GB of GPU memory (I tried getting access to the large machine in the CircleCI set, but it didn't work and I still got only 8GB). In the meantime, it's okay for us to run with a smaller batch size per replica in the GPU tests, since we are not checking correctness.
- Also disable the complexity computation for PIRL, since the model has multiple inputs but the Classy Vision API supports one input only. cc mannatsingh
- Also one small fix in the logging function of the deepclusterv2 loss.

Reviewed By: mannatsingh Differential Revision: D22266538 fbshipit-source-id: a671f10fe0b71b5d84aa44aafe88f9fef7bcfdb9

Summary: hydra plugin isn't needed anywhere (fbcode/github) so removing it Reviewed By: mannatsingh Differential Revision: D22264013 fbshipit-source-id: fafcc23fd994af855a8fcdbff768666476db781a

Summary: fixing usage of hydra.experimental after Hydra update Reviewed By: jieru-hu Differential Revision: D22264458 fbshipit-source-id: 4f42a555e9385c72b428c7e4481a45e255583d3e

Summary: Pull Request resolved: #10 tracking the hydra1.0 branch on github as per recommendation from omry Reviewed By: mannatsingh Differential Revision: D22268778 fbshipit-source-id: f9b1e976c157d5ee3e5646154b4930c6569a5c21

…#12) Summary: Pull Request resolved: #12 moved the gpu tests to the test script so we don't need to change the circleci config file everytime Reviewed By: blefaudeux Differential Revision: D22284063 fbshipit-source-id: fe925a842b1175bf27b9d82988202053cdba6b3b

Summary: Pull Request resolved: #9 Pull Request resolved: fairinternal/ssl_scaling#39

This unit test includes a couple of forward passes; I feel that's important to catch errors more easily than with an integration test.
- enforce the task run even if checkpoints exist
- add a resnet trunk test task
- add an efficientnet trunk task
- switch off the complexity computation for EfficientNet, until this is fixed

Reviewed By: prigoyal Differential Revision: D22193592 fbshipit-source-id: d29c797c029dd027ddff9c01c8ab9fe07483a3f1

Summary: Pull Request resolved: #13 conda packaging vissl for various cuda, pytorch, python versions - cuda: 9.1, 10.0, 10.1, 10.2 - pytorch: 1.4 , 1.5 - python: 3.6, 3.7, 3.8 Reviewed By: blefaudeux Differential Revision: D22286187 fbshipit-source-id: efe7d4f4a9805f1eb9af92a9c8facfa410c53d5a

Summary: Pull Request resolved: #16 - The DiskImageDataset can now use the labels that are computed by the torchvision ImageFolder dataset. - The DiskImageDataset accepts a `root_dir` argument which makes it so that the image paths used in `npy` files can be relative paths. Reviewed By: prigoyal Differential Revision: D22259478 fbshipit-source-id: 34373e02661903840b379a86270ff4590acd2730
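The `root_dir` behaviour can be pictured with the following sketch, which resolves the paths stored in a `npy` file against a root directory. The function name `resolve_image_paths` and the choice to leave absolute paths untouched are assumptions for illustration; this is not the DiskImageDataset implementation.

```python
import os
from typing import List


def resolve_image_paths(paths: List[str], root_dir: str = "") -> List[str]:
    """Join relative image paths onto `root_dir`; leave absolute paths untouched."""
    return [
        path if os.path.isabs(path) else os.path.join(root_dir, path)
        for path in paths
    ]
```

With this scheme, a file list such as `["train/cat/001.jpg"]` stays valid when the dataset is moved, as long as `root_dir` is updated in the config.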

Summary: Pull Request resolved: #15 Pull Request resolved: fairinternal/ssl_scaling#44 Reviewed By: mannatsingh Differential Revision: D22308252 Pulled By: prigoyal fbshipit-source-id: 668c22b9edfbac823177a1567815b7d4378a6c33

facebook-github-bot · 2021-01-08T23:26:53Z

@growlix has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot

@growlix has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

facebook-github-bot · 2021-01-09T02:53:52Z

@growlix has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot

@growlix has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

facebook-github-bot · 2021-01-12T20:28:44Z

@growlix has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot · 2021-01-15T20:20:45Z

@growlix has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot · 2021-01-15T20:20:54Z

@growlix has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot · 2021-01-19T17:03:21Z

@growlix has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot · 2021-01-20T03:53:12Z

@growlix has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot · 2021-01-20T19:15:21Z

@growlix has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot · 2021-01-22T02:23:42Z

@growlix has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot · 2021-01-22T03:13:42Z

@growlix has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot · 2021-01-24T18:25:43Z

@growlix has updated the pull request. You must reimport the pull request before landing.

Summary:

# Layer by layer memory profiling

A first version of the memory profiling, tracking the memory used through the forward/backward passes, with a breakdown of the memory dedicated to activations (issue fairinternal/ssl_scaling#97).

- [x] Define the test plan
- [x] Provide example curves and data output
- [x] Run on FSDP vs DDP
- [x] Run on FSDP with or without checkpointing

## Using the feature

Just add `cfg.PROFILING.TRACK_BY_LAYER_MEMORY=True` in the command line when running a job to track the memory usage, layer by layer, during both the forward and backward passes. Further configuration is available to choose:

- which rank is monitored
- for how many iterations
- starting from which iteration

Pull Request resolved: fairinternal/ssl_scaling#100

Test Plan: The feature comes with its own set of unit tests

## Example outputs

The output directory will contain the following files for each rank and iteration monitored:

```
memory_rank_0_iteration_0.json
memory_rank_0_iteration_0.jpg
```

The JSON file contains the raw data, while the JPG file provides an overview of what is happening in terms of memory:

<img width="1047" alt="Screenshot 2021-04-19 at 11 26 06" src="https://user-images.githubusercontent.com/7412790/115261974-19376780-a102-11eb-838c-688d807094d3.png">

Reviewed By: prigoyal

Differential Revision: D27977734

Pulled By: QuentinDuval

fbshipit-source-id: 4000f84e418afecb7c02dee5c5add260a04046ba
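To check which output files to expect for a given monitoring window, a sketch like the following can help. It assumes that the file names generalize from the single example above by substituting the rank and iteration numbers; the helper name `expected_profile_files` is hypothetical, and the JSON schema is not described here, so the files are only enumerated and not parsed.

```python
from typing import Iterable, List


def expected_profile_files(rank: int, iterations: Iterable[int]) -> List[str]:
    """List the JSON/JPG pairs expected for one rank over the given iterations.

    The naming pattern is inferred from the example file names and is an assumption.
    """
    files: List[str] = []
    for iteration in iterations:
        stem = f"memory_rank_{rank}_iteration_{iteration}"
        files.extend([f"{stem}.json", f"{stem}.jpg"])
    return files
```

For example, `expected_profile_files(0, range(2))` lists four names, one JSON and one JPG for each of iterations 0 and 1.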

facebook-github-bot and others added 30 commits June 22, 2020 05:47

Initial commit

45fd5c1

fbshipit-source-id: bad295c2b54e5d8258176d45951637725dd771bf

Rename hydra plugin files to vissl_plugin

ed0fc2a

Summary: renaming from ssl_framework_plugin to vissl_plugin Reviewed By: blefaudeux Differential Revision: D22165297 fbshipit-source-id: 0973b007a1504d55b3bd098824888640397e4cb9

apex <> pytorch (#40)

d8264ed

Summary: Pull Request resolved: fairinternal/ssl_scaling#40 Unblocking master, moving all syncBNs to pytorch until we properly solve that Reviewed By: mannatsingh Differential Revision: D22196409 fbshipit-source-id: 0cae37bae5efcca5ffc17f9ca9d7982d2a3f0e55

Re-sync with internal repository

5c4885b

Merge pull request #6 from prigoyal/fixup-T69036893-master

ab5bcfc

Re-sync with internal repository

rename img pil enhancements. add docstrings.

a7eb4f1

Summary: Rename img pil enhancements. Add docstrings to the image transforms. Reviewed By: prigoyal Differential Revision: D22222850 fbshipit-source-id: 08e6d33fb398b9080a5deefbccf0ca87286463d2

Remove unused hydra plugin for vissl

9f68e26

Summary: hydra plugin isn't needed anywhere (fbcode/github) so removing it Reviewed By: mannatsingh Differential Revision: D22264013 fbshipit-source-id: fafcc23fd994af855a8fcdbff768666476db781a

fixed vissl framework tests after Hydra update

3f8a04e

Summary: fixing usage of hydra.experimental after Hydra update Reviewed By: jieru-hu Differential Revision: D22264458 fbshipit-source-id: 4f42a555e9385c72b428c7e4481a45e255583d3e

Track hydra1.0 branch on github (#10)

4b0c17c

Summary: Pull Request resolved: #10 tracking the hydra1.0 branch on github as per recommendation from omry Reviewed By: mannatsingh Differential Revision: D22268778 fbshipit-source-id: f9b1e976c157d5ee3e5646154b4930c6569a5c21

multi crop simclr (#15)

918e5e6

Summary: Pull Request resolved: #15 Pull Request resolved: fairinternal/ssl_scaling#44 Reviewed By: mannatsingh Differential Revision: D22308252 Pulled By: prigoyal fbshipit-source-id: 668c22b9edfbac823177a1567815b7d4378a6c33

facebook-github-bot reviewed Jan 8, 2021

facebook-github-bot reviewed Jan 9, 2021

Fixed swav config

7779036

facebook-github-bot reviewed Jan 9, 2021

Fixed some config files

60703d2

updated configs. Also updated tensorboard hook because runs would sometimes crash because 'empty histogram'

849ea01

growlix added 3 commits January 18, 2021 13:07

Updated configs

392ebbc

Added [hacky] implementation of hybrid ViT

668d4e5

Added [hacky] functionality for interpolating position embedding for fine-tuning

bb79e39

256-d swav projection head config

7163301

finetuning configs

adaba64

DeiT/TIMM implementation

1076da8

growlix added 3 commits January 21, 2021 18:37

Script to save activations

5e3b35b

Merge branch 'vision_transformer' of github.com:facebookresearch/vissl into vision_transformer "Added deit model implementation"

5c4e09a

Hybrid model and stochastic depth param for deit

81a6688

deit model and updated configs

e0c89ff

prigoyal closed this Jan 26, 2021

prigoyal force-pushed the master branch from 09310c0 to a5c973b Compare January 26, 2021 20:36
