CAT-Seg 🐱: Cost Aggregation for Open-Vocabulary Semantic Segmentation

This is our official implementation of CAT-Seg!

[arXiv] [Project] [HuggingFace Demo] [Segment Anything with CAT-Seg]

by Seokju Cho*, Heeseong Shin*, Sunghwan Hong, Seungjun An, Seungjun Lee, Anurag Arnab, Paul Hongsuck Seo, Seungryong Kim

Introduction

We introduce cost aggregation to open-vocabulary semantic segmentation, which jointly aggregates both image and text modalities within the matching cost.
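
To make the term "matching cost" concrete, here is a minimal sketch of a cost volume between image and text embeddings. It assumes the cost is the cosine similarity between each image location's embedding and each class prompt's text embedding; that choice, the function names, and the toy data are illustrative assumptions rather than the repository's code, and the aggregation step itself is not shown. Pure Python is used for clarity; a real implementation would use tensor operations.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cost_volume(image_embeddings, text_embeddings):
    """Return cost[i][n]: similarity of image location i to class prompt n."""
    return [
        [cosine_similarity(pixel, text) for text in text_embeddings]
        for pixel in image_embeddings
    ]


# Toy data (hypothetical): 3 image locations, 2 class prompts, 2-D embeddings.
image_embeddings = [[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
text_embeddings = [[1.0, 0.0], [0.0, 1.0]]

cost = cost_volume(image_embeddings, text_embeddings)

# Without any aggregation, a naive prediction is the per-location argmax.
naive_labels = [max(range(len(row)), key=row.__getitem__) for row in cost]
```

The third location is equally similar to both prompts, which is the kind of ambiguity in the raw cost that an aggregation stage is meant to resolve.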

For further details and visualization results, please check out our paper and our project page.

Demo Installation

We release the code for our interactive demo, which you can run locally or on a device of your choice!

Please complete the main installation process (see Installation below) before setting up the demo.

We use gradio, which can be installed as follows:

pip install gradio

For the demo, CAT-Seg (L) and SAM (ViT-H) are used by default. Please download the weights for each model into the project directory.

The demo can be launched with the __init__.py file.

python __init__.py [--opts [OPTS]]

# For CPU usage
python __init__.py --opts MODEL.DEVICE "cpu" [OPTS]

Installation

Please follow the installation instructions in INSTALL.md.

Data Preparation

Please follow the dataset preparation instructions.

Training

We provide shell scripts for training and evaluation. run.sh trains the model with the default configuration and evaluates it after training.

To train or evaluate the model in different environments, modify the given shell script and config files accordingly.

Training script

sh run.sh [CONFIG] [NUM_GPUS] [OUTPUT_DIR] [OPTS]

# For ViT-B variant
sh run.sh configs/vitb_r101_384.yaml 4 output/
# For ViT-L variant
sh run.sh configs/vitl_swinb_384.yaml 4 output/
# For ViT-H variant
sh run.sh configs/vitl_swinb_384.yaml 4 output/ MODEL.SEM_SEG_HEAD.CLIP_PRETRAINED "ViT-H" MODEL.SEM_SEG_HEAD.TEXT_GUIDANCE_DIM 1024
# For ViT-G variant
sh run.sh configs/vitl_swinb_384.yaml 4 output/ MODEL.SEM_SEG_HEAD.CLIP_PRETRAINED "ViT-G" MODEL.SEM_SEG_HEAD.TEXT_GUIDANCE_DIM 1280
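
If you launch several variants from a Python driver, the commands above can be assembled programmatically. The following is a sketch only: `VARIANTS` and `build_train_command` are hypothetical names that do not exist in this repository, and the config paths and option overrides are copied from the commands listed above.

```python
import shlex

# Hypothetical lookup table, not part of the repository: each variant maps to
# its config file and the extra KEY VALUE overrides shown in the README.
VARIANTS = {
    "B": ("configs/vitb_r101_384.yaml", []),
    "L": ("configs/vitl_swinb_384.yaml", []),
    "H": (
        "configs/vitl_swinb_384.yaml",
        [
            "MODEL.SEM_SEG_HEAD.CLIP_PRETRAINED", "ViT-H",
            "MODEL.SEM_SEG_HEAD.TEXT_GUIDANCE_DIM", "1024",
        ],
    ),
    "G": (
        "configs/vitl_swinb_384.yaml",
        [
            "MODEL.SEM_SEG_HEAD.CLIP_PRETRAINED", "ViT-G",
            "MODEL.SEM_SEG_HEAD.TEXT_GUIDANCE_DIM", "1280",
        ],
    ),
}


def build_train_command(variant, num_gpus=4, output_dir="output/"):
    """Return the argument list for `sh run.sh [CONFIG] [NUM_GPUS] [OUTPUT_DIR] [OPTS]`."""
    config, opts = VARIANTS[variant]
    return ["sh", "run.sh", config, str(num_gpus), output_dir, *opts]


# Print the shell-quoted command line for the ViT-H variant.
print(shlex.join(build_train_command("H")))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root to start training.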

Evaluation

eval.sh automatically evaluates the model following our evaluation protocol, using the weights in the output directory unless specified otherwise. To run the model on individual datasets, please refer to the commands in eval.sh.

Evaluation script

sh eval.sh [CONFIG] [NUM_GPUS] [OUTPUT_DIR] [OPTS]

sh eval.sh configs/vitl_swinb_384.yaml 4 output/ MODEL.WEIGHTS path/to/weights.pth

Pretrained Models

We provide pretrained weights for the models reported in the paper. All models were evaluated on 4 NVIDIA RTX 3090 GPUs, and the results can be reproduced with the evaluation script above.

Name	Backbone	CLIP	A-847	PC-459	A-150	PC-59	PAS-20	PAS-20b	Download
CAT-Seg (B)	R101	ViT-B/16	8.9	16.6	27.2	57.5	93.7	78.3	ckpt
CAT-Seg (L)	Swin-B	ViT-L/14	11.4	20.4	31.5	62.0	96.6	81.8	ckpt
CAT-Seg (H)	Swin-B	ViT-H/14	13.1	20.1	34.4	61.2	96.7	80.2	ckpt
CAT-Seg (G)	Swin-B	ViT-G/14	14.1	21.4	36.2	61.5	97.1	81.4	ckpt
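
When choosing a checkpoint, it can help to see which model scores highest on each benchmark. The sketch below transcribes the scores from the table above and picks the top model per column; `RESULTS` and `best_per_benchmark` are illustrative names, not part of the repository.

```python
# Scores copied from the table above (one dict per model, keyed by benchmark).
RESULTS = {
    "CAT-Seg (B)": {"A-847": 8.9, "PC-459": 16.6, "A-150": 27.2,
                    "PC-59": 57.5, "PAS-20": 93.7, "PAS-20b": 78.3},
    "CAT-Seg (L)": {"A-847": 11.4, "PC-459": 20.4, "A-150": 31.5,
                    "PC-59": 62.0, "PAS-20": 96.6, "PAS-20b": 81.8},
    "CAT-Seg (H)": {"A-847": 13.1, "PC-459": 20.1, "A-150": 34.4,
                    "PC-59": 61.2, "PAS-20": 96.7, "PAS-20b": 80.2},
    "CAT-Seg (G)": {"A-847": 14.1, "PC-459": 21.4, "A-150": 36.2,
                    "PC-59": 61.5, "PAS-20": 97.1, "PAS-20b": 81.4},
}


def best_per_benchmark(results):
    """Map each benchmark name to the model with the highest score on it."""
    benchmarks = next(iter(results.values())).keys()
    return {
        bench: max(results, key=lambda name: results[name][bench])
        for bench in benchmarks
    }


best = best_per_benchmark(RESULTS)
for bench, model in best.items():
    print(f"{bench}: {model} ({RESULTS[model][bench]})")
```

In this table the largest CLIP backbone is not the top scorer on every benchmark: CAT-Seg (L) leads on PC-59 and PAS-20b.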

Acknowledgement

We would like to acknowledge the contributions of public projects, such as Zegformer, whose code has been utilized in this repository. We also thank Benedikt for finding an error in our inference code and evaluating CAT-Seg over various datasets!

Citing CAT-Seg 🐱🙏

@misc{cho2023catseg,
      title={CAT-Seg: Cost Aggregation for Open-Vocabulary Semantic Segmentation}, 
      author={Seokju Cho and Heeseong Shin and Sunghwan Hong and Seungjun An and Seungjun Lee and Anurag Arnab and Paul Hongsuck Seo and Seungryong Kim},
      year={2023},
      eprint={2303.11797},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
