
DreamLIP: Language-Image Pre-training with Long Captions

Kecheng Zheng, Yifei Zhang, Wei Wu, Fan Lu, Shuailei Ma, Xin Jin, Wei Chen, Yujun Shen
Project Page | Paper | Data

📰 News

  • [2024/08/26] Long captions (LLaVA-1.5, InstructBLIP, and ShareGPT4V) of CC3M and CC12M are released on Hugging Face!
  • [2024/07/26] Long captions (LLaVA-1.5, InstructBLIP, and ShareGPT4V) of CC3M and CC12M are released on Google Drive!
  • [2024/07/16] Released pretrained ViT-B/16 weights trained on CC3M, CC12M, YFCC15M, and merged-30M (long captions from ShareGPT4V)!
  • [2024/07/08] DreamLIP is accepted by ECCV 2024!

💡 Highlights

  • 🔥 Exploring how language-image pre-training can benefit from long captions (a simplified sketch of the idea follows this list).
  • 🔥 Strong improvements on semantic segmentation, image-text retrieval, and image understanding in MLLMs.

  • 🔥 DreamLIP trained with 30M image-text pairs achieves performance on par with, or even better than, CLIP trained with 400M pairs.
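
At its core, the long-caption idea is to sample several sub-captions from each image's long caption and treat all of them as positives for that image in a CLIP-style contrastive loss. Below is a minimal, unofficial sketch of such a multi-positive InfoNCE loss; the function and variable names are hypothetical, and the official training code (see In-Progress above) may differ substantially.

```python
# Unofficial sketch of a multi-positive CLIP-style loss, where K sub-captions
# sampled from one long caption all count as positives for the paired image.
import torch
import torch.nn.functional as F

def multi_caption_contrastive_loss(image_feats, text_feats, temperature=0.07):
    """image_feats: (B, D); text_feats: (B, K, D), K sub-captions per image."""
    B, K, D = text_feats.shape
    img = F.normalize(image_feats, dim=-1)                    # (B, D)
    txt = F.normalize(text_feats, dim=-1).reshape(B * K, D)   # (B*K, D)
    logits = img @ txt.t() / temperature                      # (B, B*K)
    # Each image i has K positive captions in columns i*K .. i*K + K - 1;
    # spread the target mass uniformly over them.
    targets = torch.zeros_like(logits)
    for i in range(B):
        targets[i, i * K:(i + 1) * K] = 1.0 / K
    log_probs = F.log_softmax(logits, dim=-1)
    return -(targets * log_probs).sum(dim=-1).mean()
```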

🎨 In-Progress

  • Release long captions of YFCC15M.
  • Release training code.

🏝️ Overview of supported long captions:

  • Long Captions of Supported Datasets (5)
  • Long Captions of MLLMs (3)

Generated Long Captions

| Raw/Long/Short Caption | Google Drive | Hugging Face Dataset |
| --- | --- | --- |
| CC3M | Link | Link |
| CC12M | Link | Link |
| YFCC15M | Link | TODO |

Pretrained checkpoints

| Dataset | Model | ShareGPT4V | InstructBLIP + LLaVA-1.5 + ShareGPT4V |
| --- | --- | --- | --- |
| CC3M | ViT-B/16 | Link | Link |
| CC12M | ViT-B/16 | Link | TODO |
| YFCC15M | ViT-B/16 | Link | TODO |
| CC30M | ViT-B/16 | Link | TODO |
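
Since the project builds on open_clip, a released ViT-B/16 checkpoint can presumably be restored with open_clip's standard API. The sketch below makes assumptions about the downloaded file (the filename and state-dict layout are hypothetical); adjust to the actual checkpoint.

```python
# Hypothetical loading sketch: DreamLIP builds on open_clip, so a released
# ViT-B/16 checkpoint can likely be restored like this.
import torch
import open_clip

model, _, preprocess = open_clip.create_model_and_transforms("ViT-B-16")
ckpt = torch.load("dreamlip_cc30m_vitb16.pt", map_location="cpu")  # assumed filename
state_dict = ckpt.get("state_dict", ckpt)
# Strip a possible "module." prefix left over from DDP training.
state_dict = {k.removeprefix("module."): v for k, v in state_dict.items()}
model.load_state_dict(state_dict, strict=False)
model.eval()
```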

📣 Instructions

Environment installation

```bash
pip install -r requirements.txt
```

Evaluate zero-shot classification

```bash
bash eval_zs.sh
```
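
For reference, a zero-shot classification pass with an open_clip-style model looks roughly like the following. This is an illustrative sketch, not the contents of eval_zs.sh; the class prompts and image path are made up, and the model weights would come from a DreamLIP checkpoint loaded as shown above.

```python
# Minimal zero-shot classification sketch with open_clip (illustrative only).
import torch
import open_clip
from PIL import Image

model, _, preprocess = open_clip.create_model_and_transforms("ViT-B-16")
tokenizer = open_clip.get_tokenizer("ViT-B-16")
# (Load DreamLIP weights into `model` here, as in the checkpoint sketch.)

classes = ["a photo of a dog", "a photo of a cat", "a photo of a car"]
text = tokenizer(classes)
image = preprocess(Image.open("example.jpg")).unsqueeze(0)  # assumed input image

with torch.no_grad():
    image_feat = model.encode_image(image)
    text_feat = model.encode_text(text)
    image_feat /= image_feat.norm(dim=-1, keepdim=True)
    text_feat /= text_feat.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_feat @ text_feat.T).softmax(dim=-1)

print(classes[probs.argmax().item()])
```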

📖 Citation

```bibtex
@inproceedings{DreamLIP,
  title={DreamLIP: Language-Image Pre-training with Long Captions},
  author={Zheng, Kecheng and Zhang, Yifei and Wu, Wei and Lu, Fan and Ma, Shuailei and Jin, Xin and Chen, Wei and Shen, Yujun},
  booktitle={ECCV},
  year={2024}
}
```

Acknowledgements

This project is built on open_clip; thanks for the nice work! We also thank InstructBLIP, ShareGPT4V, and LLaVA for their pretrained models and code.
