- python 3.7
- torch 1.11.0
- torchvision 0.12.0
- We uniformly sample 8 frames during training and inference.
- We use 1-clip 1-crop evaluation for the 2D network with a resolution of 224x224.
- `lambda_av` denotes the coefficient $\lambda_{av}$ in the loss function; we set it to $1$, $0.65$, and $0.4$ on the Something-Something V1, Something-Something V2, and Kinetics-400 datasets, respectively.
- We train the 2D network TSM with 2 NVIDIA Tesla V100 (32GB) cards, and the model is pretrained on ImageNet.
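Uniform frame sampling as described above is commonly implemented by splitting the clip into equal segments and taking the center of each; this is an illustrative sketch, not the repository's exact implementation:

```python
def uniform_sample_indices(num_total, num_frames=8):
    """Pick `num_frames` frame indices evenly spaced over a clip of
    `num_total` frames: one index from the center of each equal segment.
    Illustrative sketch of the uniform-sampling scheme, assuming
    num_total >= num_frames."""
    seg = num_total / num_frames
    return [int(seg * i + seg / 2) for i in range(num_frames)]
```

For example, an 80-frame clip yields indices 5, 15, ..., 75, one per 10-frame segment.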
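The role of $\lambda_{av}$ can be sketched as weighting an auxiliary term against the main classification loss. The function and argument names below are illustrative assumptions; only the coefficient `lambda_av` comes from this README:

```python
import torch
import torch.nn.functional as F

def total_loss(logits, labels, aux_loss, lambda_av=1.0):
    """Combine the cross-entropy classification loss with an auxiliary
    loss term weighted by lambda_av. The split into these two terms is
    an illustrative assumption, not the repository's exact loss code."""
    ce = F.cross_entropy(logits, labels)
    return ce + lambda_av * aux_loss
```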
- Specify the directory of datasets with `ROOT_DATASET` in `ops/dataset_config.py`.
- Simply run the training scripts in `exp` as follows:

```
bash exp/tsm_sthv1/run.sh      ## baseline training
bash exp/tsm_sthv1/run_MCA.sh  ## MCA training
```
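A minimal sketch of how a `ROOT_DATASET` variable is typically used inside a `dataset_config.py`-style file; the helper function and placeholder path below are illustrative assumptions, not the repository's actual code:

```python
import os

# Root directory that holds all datasets; edit this to your own path.
ROOT_DATASET = '/path/to/datasets'  # illustrative placeholder

def return_somethingv1():
    # Illustrative helper: dataset-specific paths are built on top of
    # ROOT_DATASET, so changing that one variable relocates everything.
    root = os.path.join(ROOT_DATASET, 'something/v1')
    filename_categories = os.path.join(root, 'category.txt')
    return root, filename_categories
```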
- Specify the directory of datasets with `ROOT_DATASET` in `ops/dataset_config.py`.
- Please download the pretrained models from Google Drive.
- Specify the directory of the pretrained model with `resume` in `test.sh`.
- Run the inference scripts in `exp` as follows:

```
bash exp/tsm_sthv1/test.sh
```
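A `resume` option like the one above usually points at a checkpoint that is loaded before evaluation. A hedged sketch of that loading step, assuming the common PyTorch convention of a `'state_dict'` key (an assumption, not confirmed by this repo):

```python
import torch

def load_checkpoint(model, resume_path):
    """Load saved weights into `model` before inference.

    The 'state_dict' key is a widespread convention for checkpoints
    saved during training; we fall back to treating the whole file as
    a state dict if that key is absent."""
    ckpt = torch.load(resume_path, map_location='cpu')
    model.load_state_dict(ckpt.get('state_dict', ckpt))
    return model
```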