- python 3.7
- torch 1.11.0
- torchvision 0.12.0
- We uniformly sample 8 frames during training and inference.
- We use 1-clip 1-crop evaluation for the 2D network with a resolution of 224x224.
- `lambda_av` denotes the coefficient $\lambda_{av}$ in the loss function; we set it to $1$, $0.65$, and $0.4$ on the Something-Something V1, Something-Something V2, and Kinetics-400 datasets, respectively.
- We train the 2D network TSM with 2 NVIDIA Tesla V100 (32GB) cards, and the model is pretrained on ImageNet.
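Uniform frame sampling as described above is commonly implemented by splitting the clip into equal segments and taking the center of each; this is an illustrative sketch, not the repository's exact implementation:

```python
def uniform_sample_indices(num_total, num_frames=8):
    """Pick `num_frames` frame indices evenly spaced over a clip of
    `num_total` frames: one index from the center of each equal segment.
    Illustrative sketch of the uniform-sampling scheme, assuming
    num_total >= num_frames."""
    seg = num_total / num_frames
    return [int(seg * i + seg / 2) for i in range(num_frames)]
```

For example, an 80-frame clip yields indices 5, 15, ..., 75, one per 10-frame segment.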
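The role of $\lambda_{av}$ can be sketched as weighting an auxiliary term against the main classification loss. The function and argument names below are illustrative assumptions; only the coefficient `lambda_av` comes from this README:

```python
import torch
import torch.nn.functional as F

def total_loss(logits, labels, aux_loss, lambda_av=1.0):
    """Combine the cross-entropy classification loss with an auxiliary
    loss term weighted by lambda_av. The split into these two terms is
    an illustrative assumption, not the repository's exact loss code."""
    ce = F.cross_entropy(logits, labels)
    return ce + lambda_av * aux_loss
```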
- Specify the directory of datasets with `ROOT_DATASET` in `ops/dataset_config.py`.
- Simply run the training scripts in `exp` as follows:

```
bash exp/tsm_sthv1/run.sh      ## baseline training
bash exp/tsm_sthv1/run_MCA.sh  ## MCA training
```
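A minimal sketch of how a `ROOT_DATASET` variable is typically used inside a `dataset_config.py`-style file; the helper function and placeholder path below are illustrative assumptions, not the repository's actual code:

```python
import os

# Root directory that holds all datasets; edit this to your own path.
ROOT_DATASET = '/path/to/datasets'  # illustrative placeholder

def return_somethingv1():
    # Illustrative helper: dataset-specific paths are built on top of
    # ROOT_DATASET, so changing that one variable relocates everything.
    root = os.path.join(ROOT_DATASET, 'something/v1')
    filename_categories = os.path.join(root, 'category.txt')
    return root, filename_categories
```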
- Specify the directory of datasets with `ROOT_DATASET` in `ops/dataset_config.py`.
- Please download the pretrained models from Google Drive.
- Specify the directory of the pretrained model with `resume` in `test.sh`.
- Run the inference scripts in `exp` as follows:

```
bash exp/tsm_sthv1/test.sh
```
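A `resume` option like the one above usually points at a checkpoint that is loaded before evaluation. A hedged sketch of that loading step, assuming the common PyTorch convention of a `'state_dict'` key (an assumption, not confirmed by this repo):

```python
import torch

def load_checkpoint(model, resume_path):
    """Load saved weights into `model` before inference.

    The 'state_dict' key is a widespread convention for checkpoints
    saved during training; we fall back to treating the whole file as
    a state dict if that key is absent."""
    ckpt = torch.load(resume_path, map_location='cpu')
    model.load_state_dict(ckpt.get('state_dict', ckpt))
    return model
```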