[CVPR 2024] Stronger, Fewer, & Superior: Harnessing Vision Foundation Models for Domain Generalized Semantic Segmentation

Zhixiang Wei1, Lin Chen2, et al.
1 University of Science and Technology of China  2 Shanghai AI Laboratory

Project page: https://zxwei.site/rein

Paper: https://arxiv.org/pdf/2312.04265.pdf

Rein is an efficient and robust fine-tuning method, specifically developed to effectively utilize Vision Foundation Models (VFMs) for Domain Generalized Semantic Segmentation (DGSS). It achieves state-of-the-art results on Cityscapes $\rightarrow$ ACDC and on GTAV $\rightarrow$ Cityscapes + Mapillary + BDD100K. Using only synthetic data, Rein achieves an mIoU of 78.4% on the Cityscapes validation set! Using only the data from the Cityscapes training set, it achieves an average mIoU of 77.6% on the ACDC test set!

Rein Framework
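At its core, Rein keeps the VFM backbone frozen and trains only a small set of learnable tokens that refine the feature maps passed from each backbone layer to the next. Below is a minimal, illustrative PyTorch sketch of that idea; the class and variable names are ours, not the repository's, and the actual implementation additionally uses low-rank token sequences and links tokens to instance-level queries, as described in the paper.

```python
import torch
import torch.nn as nn

class ReinLayer(nn.Module):
    """Illustrative sketch only: learnable tokens refining frozen backbone features."""

    def __init__(self, dim: int, num_tokens: int = 100):
        super().__init__()
        # A small set of trainable tokens; the backbone itself stays frozen.
        self.tokens = nn.Parameter(torch.randn(num_tokens, dim) * 0.02)
        self.proj = nn.Linear(dim, dim)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (B, N, C) patch features emitted by one frozen ViT block.
        scale = feats.shape[-1] ** -0.5
        attn = torch.softmax(feats @ self.tokens.T * scale, dim=-1)  # (B, N, T)
        delta = attn @ self.tokens        # token-conditioned refinement, (B, N, C)
        return feats + self.proj(delta)   # residual update passed to the next block
```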

Visualization

Trained on Cityscapes, Rein generalizes to unseen driving scenes and cities: nighttime Shanghai, rainy Chicago, and foggy Beijing.

night_shanghai.mp4
rain_chicago.mp4
fog_beijing.mp4

Performance Under Various Settings (DINOv2)

| Setting | mIoU | Config | Log & Checkpoint |
| --- | --- | --- | --- |
| GTAV $\rightarrow$ Cityscapes | 66.7 | config | log & checkpoint |
| +Synthia $\rightarrow$ Cityscapes | 72.2 | config | log & checkpoint |
| +UrbanSyn $\rightarrow$ Cityscapes | 78.4 | config | log & checkpoint |
| +1/16 of Cityscapes training $\rightarrow$ Cityscapes | 82.5 | config | log & checkpoint |
| GTAV $\rightarrow$ BDD100K | 60.0 | config | log & checkpoint |
| Cityscapes $\rightarrow$ ACDC | 77.6 | config | log & checkpoint |
| Cityscapes $\rightarrow$ Cityscapes-C | 60.0 | config | log & checkpoint |

Performance for Various Backbones (Trained on GTAV)

| Backbone | Pretraining | Citys. mIoU | Config | Log & Checkpoint |
| --- | --- | --- | --- | --- |
| ResNet50 | ImageNet1k | 49.1 | config | log & checkpoint |
| ResNet101 | ImageNet1k | 45.9 | config | log & checkpoint |
| ConvNeXt-Large | ImageNet21k | 57.9 | config | log & checkpoint |
| ViT-Small | DINOv2 | 55.3 | config | log & checkpoint |
| ViT-Base | DINOv2 | 64.3 | config | log & checkpoint |
| CLIP-Large | OpenAI | 58.1 | config | log & checkpoint |
| SAM-Huge | SAM | 59.2 | config | log & checkpoint |
| EVA02-Large | EVA02 | 67.8 | config | log & checkpoint |

Citation

If you find our code or data helpful, please cite our paper:

@InProceedings{Wei_2024_CVPR,
    author    = {Wei, Zhixiang and Chen, Lin and Jin, Yi and Ma, Xiaoxiao and Liu, Tianle and Ling, Pengyang and Wang, Ben and Chen, Huaian and Zheng, Jinjin},
    title     = {Stronger Fewer \& Superior: Harnessing Vision Foundation Models for Domain Generalized Semantic Segmentation},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {28619-28630}
}

🔥 News!

  • 🔥 To make it easier to integrate Rein into your own projects, we provide a simplified version: simple_reins. With this version, you can easily use Rein as a feature extractor. (Note: this version removes the Mask2Former-related features.) See the sketch after this list.

  • We have uploaded the configs for ResNet and ConvNeXt.

  • 🔥 We have uploaded the checkpoint and config for +1/16 of Cityscapes training set, which achieves 82.5% on the Cityscapes validation set!

  • Rein is accepted at CVPR 2024!

  • 🔥 Using only the data from the Cityscapes training set, we achieved an average mIoU of 77.56% on the ACDC test set! This result ranks first among DGSS methods on the ACDC benchmark! The checkpoint is available in the release.

  • Using only synthetic data (UrbanSyn, GTAV, and Synthia), Rein achieved an mIoU of 78.4% on Cityscapes! The checkpoint is available in the release.
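As referenced in the first item above, simple_reins can serve as a plain feature extractor. The outline below is hypothetical: the entry point, constructor arguments, and checkpoint handling are our assumptions, so consult the simple_reins directory for the real interface.

```python
import torch

# Hypothetical outline; all names below are illustrative assumptions,
# not the actual simple_reins API.
from simple_reins import ReinsDinoVisionTransformer  # assumed entry point

backbone = ReinsDinoVisionTransformer()  # frozen DINOv2 plus Rein tokens
state = torch.load("checkpoints/dinov2_converted.pth", map_location="cpu")
backbone.load_state_dict(state, strict=False)
backbone.eval()

with torch.no_grad():
    images = torch.randn(1, 3, 512, 512)  # example input
    feats = backbone(images)              # features for a downstream decode head
```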

Try and Test

Experience the demo: users can open demo.ipynb in any Jupyter-supported editor to explore our demonstration.

Demo Preview

For testing on the Cityscapes dataset, refer to the 'Environment Setup' and 'Dataset Preparation' sections below.

Environment Setup

To set up your environment, execute the following commands:

```bash
conda create -n rein -y
conda activate rein
conda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 pytorch-cuda=11.7 -c pytorch -c nvidia -y
pip install -U openmim
mim install mmengine
mim install "mmcv>=2.0.0"
pip install "mmsegmentation>=1.0.0"
pip install "mmdet>=3.0.0"
pip install "xformers==0.0.20"  # optional, for DINOv2
pip install -r requirements.txt
pip install future tensorboard
```
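Optionally, run a quick sanity check (ours, not part of the repository) to confirm that PyTorch and the OpenMMLab stack import cleanly and report the expected versions:

```python
# Verify that the core packages installed correctly.
import torch, mmengine, mmcv, mmseg, mmdet

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("mmengine:", mmengine.__version__, "| mmcv:", mmcv.__version__)
print("mmseg:", mmseg.__version__, "| mmdet:", mmdet.__version__)
```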

Dataset Preparation

Dataset preparation is similar to DDB.

Cityscapes: Download leftImg8bit_trainvaltest.zip and gtFine_trainvaltest.zip from the Cityscapes Dataset and extract them to data/cityscapes.

Mapillary: Download MAPILLARY v1.2 from Mapillary Research and extract it to data/mapillary.

GTA: Download all image and label packages from TU Darmstadt and extract them to data/gta.

Prepare datasets with these commands:

```bash
cd Rein
mkdir data
# Convert data for validation if preparing for the first time
python tools/convert_datasets/gta.py data/gta  # source domain
python tools/convert_datasets/cityscapes.py data/cityscapes
# Convert Mapillary to Cityscapes format and resize for validation
python tools/convert_datasets/mapillary2cityscape.py data/mapillary data/mapillary/cityscapes_trainIdLabel --train_id
python tools/convert_datasets/mapillary_resize.py data/mapillary/validation/images data/mapillary/cityscapes_trainIdLabel/val/label data/mapillary/half/val_img data/mapillary/half/val_label
```

(Optional) ACDC: Download all image and label packages from ACDC and extract them to data/acdc.

(Optional) UrbanSyn: Download all image and label packages from UrbanSyn and extract them to data/urbansyn.

The final folder structure should look like this:

```
Rein
├── ...
├── checkpoints
│   ├── dinov2_vitl14_pretrain.pth
│   ├── dinov2_rein_and_head.pth
├── data
│   ├── cityscapes
│   │   ├── leftImg8bit
│   │   │   ├── train
│   │   │   ├── val
│   │   ├── gtFine
│   │   │   ├── train
│   │   │   ├── val
│   ├── bdd100k
│   │   ├── images
│   │   │   ├── 10k
│   │   │   │   ├── train
│   │   │   │   ├── val
│   │   ├── labels
│   │   │   ├── sem_seg
│   │   │   │   ├── masks
│   │   │   │   │   ├── train
│   │   │   │   │   ├── val
│   ├── mapillary
│   │   ├── training
│   │   ├── cityscapes_trainIdLabel
│   │   ├── half
│   │   │   ├── val_img
│   │   │   ├── val_label
│   ├── gta
│   │   ├── images
│   │   ├── labels
├── ...
```
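To confirm the conversion scripts produced this layout, a small check of ours (assuming the paths shown above) counts files in the key directories:

```python
# Verify the expected dataset layout before training or evaluation.
from pathlib import Path

expected = [
    "data/cityscapes/leftImg8bit/val",
    "data/cityscapes/gtFine/val",
    "data/gta/images",
    "data/gta/labels",
    "data/mapillary/half/val_img",
]
for p in expected:
    path = Path(p)
    n = sum(1 for f in path.rglob("*") if f.is_file()) if path.exists() else 0
    print(f"{p}: {n} files" if n else f"{p}: MISSING")
```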

Pretraining Weights

  • Download: Download the pre-trained weights from facebookresearch for testing. Place them in the project directory without changing the file names.
  • Convert: Convert the pre-trained weights for training or evaluation:

```bash
python tools/convert_models/convert_dinov2.py checkpoints/dinov2_vitl14_pretrain.pth checkpoints/dinov2_converted.pth
# optional, for 1024x1024 resolution:
python tools/convert_models/convert_dinov2.py checkpoints/dinov2_vitl14_pretrain.pth checkpoints/dinov2_converted_1024x1024.pth --height 1024 --width 1024
```
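To verify the conversion succeeded, you can peek at the converted checkpoint. This snippet is ours, and the guess that weights may sit under a state_dict key is an assumption about the converter's output format:

```python
# Inspect the checkpoint produced by convert_dinov2.py.
import torch

ckpt = torch.load("checkpoints/dinov2_converted.pth", map_location="cpu")
state = ckpt.get("state_dict", ckpt) if isinstance(ckpt, dict) else ckpt
print(f"{len(state)} entries")
for name, value in list(state.items())[:5]:
    shape = tuple(value.shape) if hasattr(value, "shape") else type(value).__name__
    print(name, shape)
```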

Evaluation

Run the evaluation:

```bash
python tools/test.py configs/dinov2/rein_dinov2_mask2former_512x512_bs1x4.py checkpoints/dinov2_rein_and_head.pth --backbone dinov2_converted.pth
```

For most of the provided release checkpoints, you can run this command to evaluate:

```bash
python tools/test.py /path/to/cfg /path/to/checkpoint --backbone /path/to/dinov2_converted.pth  # or dinov2_converted_1024x1024.pth
```

Training

Start training on a single GPU:

```bash
python tools/train.py configs/dinov2/rein_dinov2_mask2former_512x512_bs1x4.py
```

Start training on multiple GPUs:

```bash
PORT=12345 CUDA_VISIBLE_DEVICES=1,2,3,4 bash tools/dist_train.sh configs/dinov2/rein_dinov2_mask2former_1024x1024_bs4x2.py NUM_GPUS
```
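For example, with the four GPUs made visible above, NUM_GPUS would be 4:

```bash
PORT=12345 CUDA_VISIBLE_DEVICES=1,2,3,4 bash tools/dist_train.sh configs/dinov2/rein_dinov2_mask2former_1024x1024_bs4x2.py 4
```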

Generate full weights

Because we fine-tune and save only the Rein and head weights, you need this script to produce a complete set of segmentor weights:

```bash
python generate_full_weights.py --segmentor_save_path SEGMENTOR_SAVE_PATH --backbone CONVERTED_BACKBONE --rein_head REIN_HEAD
```
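An illustrative invocation using the checkpoint paths from this README (the output filename here is arbitrary):

```bash
python generate_full_weights.py \
    --segmentor_save_path checkpoints/full_segmentor.pth \
    --backbone checkpoints/dinov2_converted.pth \
    --rein_head checkpoints/dinov2_rein_and_head.pth
```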

FAQs

Acknowledgment

Our implementation is mainly based on the following repositories. Thanks to their authors.

Star History

Star History Chart
