
DreamLIP: Language-Image Pre-training with Long Captions

Kecheng Zheng, Yifei Zhang, Wei Wu, Fan Lu, Shuailei Ma, Xin Jin, Wei Chen, Yujun Shen
Project Page | Paper | Data

📰 News

  • [2024/08/26] Long captions (LLaVA-1.5, InstructBLIP, and ShareGPT4V) of CC3M and CC12M are released on Hugging Face!
  • [2024/07/26] Long captions (LLaVA-1.5, InstructBLIP, and ShareGPT4V) of CC3M and CC12M are released on Google Drive!
  • [2024/07/16] Released pretrained ViT-B/16 weights trained on CC3M, CC12M, YFCC15M, and merged-30M (long captions from ShareGPT4V)!
  • [2024/07/08] DreamLIP is accepted by ECCV 2024!

💡 Highlights

  • 🔥 Exploring how language-image pre-training can benefit from long captions (a simplified sketch of the idea follows this list).
  • 🔥 Strong improvements on semantic segmentation, image-text retrieval, and image understanding in MLLMs.

  • 🔥 DreamLIP trained with 30M image-text pairs achieves performance on par with, or even better than, CLIP trained with 400M pairs.
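
At its core, the long-caption idea is to sample several sub-captions from each image's long caption and treat all of them as positives for that image in a CLIP-style contrastive loss. Below is a minimal, unofficial sketch of such a multi-positive InfoNCE loss; the function and variable names are hypothetical, and the official training code (see In-Progress above) may differ substantially.

```python
# Unofficial sketch of a multi-positive CLIP-style loss, where K sub-captions
# sampled from one long caption all count as positives for the paired image.
import torch
import torch.nn.functional as F

def multi_caption_contrastive_loss(image_feats, text_feats, temperature=0.07):
    """image_feats: (B, D); text_feats: (B, K, D), K sub-captions per image."""
    B, K, D = text_feats.shape
    img = F.normalize(image_feats, dim=-1)                    # (B, D)
    txt = F.normalize(text_feats, dim=-1).reshape(B * K, D)   # (B*K, D)
    logits = img @ txt.t() / temperature                      # (B, B*K)
    # Each image i has K positive captions in columns i*K .. i*K + K - 1;
    # spread the target mass uniformly over them.
    targets = torch.zeros_like(logits)
    for i in range(B):
        targets[i, i * K:(i + 1) * K] = 1.0 / K
    log_probs = F.log_softmax(logits, dim=-1)
    return -(targets * log_probs).sum(dim=-1).mean()
```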

🎨 In-Progress

  • Release long captions of YFCC15M.
  • Release training code.

🏝️ Overview of supported long captions:

  • Long Captions of Supported Datasets (5)
  • Long Captions of MLLMs (3)

Generated Long Captions

| Raw/Long/Short Caption | Google Drive | Hugging Face Dataset |
| --- | --- | --- |
| CC3M | Link | Link |
| CC12M | Link | Link |
| YFCC15M | Link | TODO |

Pretrained checkpoints

| Dataset | Model | ShareGPT4V | InstructBLIP + LLaVA-1.5 + ShareGPT4V |
| --- | --- | --- | --- |
| CC3M | ViT-B/16 | Link | Link |
| CC12M | ViT-B/16 | Link | TODO |
| YFCC15M | ViT-B/16 | Link | TODO |
| CC30M | ViT-B/16 | Link | TODO |
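
Since the project builds on open_clip, a released ViT-B/16 checkpoint can presumably be restored with open_clip's standard API. The sketch below makes assumptions about the downloaded file (the filename and state-dict layout are hypothetical); adjust to the actual checkpoint.

```python
# Hypothetical loading sketch: DreamLIP builds on open_clip, so a released
# ViT-B/16 checkpoint can likely be restored like this.
import torch
import open_clip

model, _, preprocess = open_clip.create_model_and_transforms("ViT-B-16")
ckpt = torch.load("dreamlip_cc30m_vitb16.pt", map_location="cpu")  # assumed filename
state_dict = ckpt.get("state_dict", ckpt)
# Strip a possible "module." prefix left over from DDP training.
state_dict = {k.removeprefix("module."): v for k, v in state_dict.items()}
model.load_state_dict(state_dict, strict=False)
model.eval()
```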

📣 Instructions

Environment installation

```bash
pip install -r requirements.txt
```

Evaluate zero-shot classification

```bash
bash eval_zs.sh
```
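
For reference, a zero-shot classification pass with an open_clip-style model looks roughly like the following. This is an illustrative sketch, not the contents of eval_zs.sh; the class prompts and image path are made up, and the model weights would come from a DreamLIP checkpoint loaded as shown above.

```python
# Minimal zero-shot classification sketch with open_clip (illustrative only).
import torch
import open_clip
from PIL import Image

model, _, preprocess = open_clip.create_model_and_transforms("ViT-B-16")
tokenizer = open_clip.get_tokenizer("ViT-B-16")
# (Load DreamLIP weights into `model` here, as in the checkpoint sketch.)

classes = ["a photo of a dog", "a photo of a cat", "a photo of a car"]
text = tokenizer(classes)
image = preprocess(Image.open("example.jpg")).unsqueeze(0)  # assumed input image

with torch.no_grad():
    image_feat = model.encode_image(image)
    text_feat = model.encode_text(text)
    image_feat /= image_feat.norm(dim=-1, keepdim=True)
    text_feat /= text_feat.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_feat @ text_feat.T).softmax(dim=-1)

print(classes[probs.argmax().item()])
```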

📖 Citation

```bibtex
@inproceedings{DreamLIP,
  title={DreamLIP: Language-Image Pre-training with Long Captions},
  author={Zheng, Kecheng and Zhang, Yifei and Wu, Wei and Lu, Fan and Ma, Shuailei and Jin, Xin and Chen, Wei and Shen, Yujun},
  booktitle={ECCV},
  year={2024}
}
```

Acknowledgements

This project is built on open_clip; thanks for the nice work! We also thank InstructBLIP, ShareGPT4V, and LLaVA for their pretrained models and code.
