A curated list of prompt/adapter learning methods for vision-language models.
- If you know of papers published in top conferences (CVPR, ICCV, ECCV, ICML, NeurIPS, ICLR) or journals (TPAMI, IJCV, TIP) that are missing from this list, please feel free to contact me at any time, either by email (zhengli97[at]qq.com) or by submitting an issue.
- We would appreciate more people joining us in maintaining this list of papers.
- Note that papers without open-source code are not recommended.
Use text-based learnable prompts/adapters.
Use image-based learnable prompts/adapters.
Use text- and image-based learnable prompts/adapters.
- A Systematic Survey of Prompt Engineering on Vision-Language Foundation Models. [Paper]
- Parameter-Efficient Fine-Tuning for Pre-Trained Vision Models: A Survey. [Paper]
Base-to-Novel Generalization. (ViT-B/16 CLIP)
Methods | Pub | Base | Novel | HM (main) | Code |
---|---|---|---|---|---|
CLIP | ICML 21 | 69.34 | 74.22 | 71.70 | Link |
CoOp | IJCV 22 | 82.69 | 63.22 | 71.66 | Link |
CoCoOp | CVPR 22 | 80.47 | 71.69 | 75.83 | Link |
ProDA | CVPR 22 | 81.56 | 72.30 | 76.65 | Link |
KgCoOp | CVPR 23 | 80.73 | 73.60 | 77.00 | Link |
RPO | ICCV 23 | 81.13 | 75.00 | 77.78 | Link |
MaPLe | CVPR 23 | 82.28 | 75.14 | 78.55 | Link |
DePT | CVPR 24 | 83.62 | 75.04 | 79.10 | Link |
TCP | CVPR 24 | 84.13 | 75.36 | 79.51 | Link |
MMA | CVPR 24 | 83.20 | 76.80 | 79.87 | Link |
PromptSRC | ICCV 23 | 84.26 | 76.10 | 79.97 | Link |
HPT | AAAI 24 | 84.32 | 76.86 | 80.23 | Link |
CoPrompt | ICLR 24 | 84.00 | 77.23 | 80.48 | Link |
PromptKD | CVPR 24 | 86.96 | 80.73 | 83.73 | Link |
Table 1. Average results on 11 datasets. (Only works with open-source code are listed.)
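As a quick sanity check on the HM column, the sketch below assumes HM is the harmonic mean of the averaged Base and Novel accuracies. This reading is an assumption, not something the list states, but it reproduces the CLIP and CoOp rows of Table 1 to two decimals. The function name `harmonic_mean` is illustrative.

```python
def harmonic_mean(base: float, novel: float) -> float:
    """Harmonic mean of base-class and novel-class accuracy (both in percent)."""
    if base + novel == 0:
        return 0.0  # avoid division by zero when both accuracies are 0
    return 2 * base * novel / (base + novel)


# (Base, Novel) pairs copied from the CLIP and CoOp rows of Table 1.
rows = {"CLIP": (69.34, 74.22), "CoOp": (82.69, 63.22)}

for name, (base, novel) in rows.items():
    print(f"{name}: HM = {harmonic_mean(base, novel):.2f}")
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a method that gains on base classes while losing on novel classes (as CoOp does relative to CLIP) sees little or no HM improvement.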
CoOp
Learning to Prompt for Vision-Language Models. IJCV 2022.
[Paper] [Code]
CoCoOp
Conditional Prompt Learning for Vision-Language Models. CVPR 2022.
[Paper] [Code]
ProDA
Prompt Distribution Learning. CVPR 2022.
[Paper] [Code]
VPT
Visual Prompt Tuning. ECCV 2022.
[Paper] [Code]
MaPLe
MaPLe: Multi-modal Prompt Learning. CVPR 2023.
[Paper] [Code]
KgCoOp
Visual-Language Prompt Tuning with Knowledge-guided Context Optimization. CVPR 2023.
[Paper] [Code]
LASP
LASP: Text-to-Text Optimization for Language-Aware Soft Prompting of Vision & Language Models. CVPR 2023.
[Paper]
DAM-VP
Diversity-Aware Meta Visual Prompting. CVPR 2023.
[Paper] [Code]
TaskRes
Task Residual for Tuning Vision-Language Models. CVPR 2023.
[Paper] [Code]
RPO
Read-only Prompt Optimization for Vision-Language Few-shot Learning. ICCV 2023.
[Paper] [Code]
KAPT
Knowledge-Aware Prompt Tuning for Generalizable Vision-Language Models. ICCV 2023.
[Paper]
CuPL
What does a platypus look like? Generating customized prompts for zero-shot image classification. ICCV 2023.
[Paper] [Code]
ProGrad
Prompt-aligned Gradient for Prompt Tuning. ICCV 2023.
[Paper] [Code]
PromptSRC
Self-regulating Prompts: Foundational Model Adaptation without Forgetting. ICCV 2023.
[Paper] [Code]
DeFo
Learning to Decompose Visual Features with Latent Textual Prompts. ICLR 2023.
[Paper]
POMP
Prompt Pre-Training with Twenty-Thousand Classes for Open-Vocabulary Visual Recognition. NeurIPS 2023.
[Paper] [Code]
MetaPrompt
Learning Domain Invariant Prompt for Vision-Language Models. TIP 2024.
[Paper]
SA2VP
SA2VP: Spatially Aligned-and-Adapted Visual Prompt. AAAI 2024.
[Paper] [Code]
HPT
Learning Hierarchical Prompt with Structured Linguistic Knowledge for Vision-Language Models. AAAI 2024.
[Paper] [Code]
LaViP
LaViP: Language-Grounded Visual Prompts. AAAI 2024.
[Paper]
CoPrompt
Consistency-guided Prompt Learning for Vision-Language Models. ICLR 2024.
[Paper] [Code]
ProText
Learning to Prompt with Text Only Supervision for Vision-Language Models. Arxiv 2024.
[Paper] [Code]
PromptKD
Unsupervised Prompt Distillation for Vision Language Models. CVPR 2024.
[Paper] [Code]
DePT
DePT: Decoupled Prompt Tuning. CVPR 2024.
[Paper] [Code]
ArGue
ArGue: Attribute-Guided Prompt Tuning for Vision-Language Models. CVPR 2024.
[Paper]
TCP
TCP: Textual-based Class-aware Prompt tuning for Visual-Language Model. CVPR 2024.
[Paper] [Code]
MMA
MMA: Multi-Modal Adapter for Vision-Language Models. CVPR 2024.
[Paper] [Code]
KDPL
Improving Zero-shot Generalization of Learned Prompts via Unsupervised Knowledge Distillation. ECCV 2024.
[Paper] [Code]
CoCoLe
Conceptual Codebook Learning for Vision-Language Models. ECCV 2024.
[Paper]
Methods | Pub | ImageNet | -A | -V2 | -R | -S | Avg. (main) | Code |
---|---|---|---|---|---|---|---|---|
CoOp | IJCV 22 | 71.51 | 49.71 | 64.20 | 75.21 | 47.99 | 59.28 | Link |
CoCoOp | CVPR 22 | 71.02 | 50.63 | 64.07 | 76.18 | 48.75 | 59.91 | Link |
TPT | NeurIPS 22 | 68.98 | 54.77 | 63.45 | 77.06 | 47.94 | 60.81 | Link |
TPT+CoOp | NeurIPS 22 | 73.61 | 57.95 | 66.83 | 77.27 | 49.29 | 62.84 | Link |
PromptAlign | NeurIPS 23 | --- | 59.37 | 65.29 | 79.33 | 59.37 | 63.55 | Link |
TPS+CoOp | Arxiv 24 | 73.73 | 60.49 | 66.84 | 77.44 | 49.08 | 65.52 | Link |
RLCF | ICLR 24 | 73.23 | 65.45 | 69.77 | 83.35 | 54.74 | 68.33 | Link |
RLCF+CoOp | ICLR 24 | 76.05 | 69.74 | 70.62 | 84.51 | 56.49 | 70.34 | Link |
Table 2. Test-time prompt tuning methods on OOD data.
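The Avg. column in Table 2 appears to be the plain mean over the four shifted variants (-A, -V2, -R, -S), excluding the in-distribution ImageNet score. This is inferred from the numbers rather than stated in the list; the sketch below checks that reading against the CoOp and RLCF rows. The helper name `ood_average` is illustrative.

```python
def ood_average(scores: dict[str, float]) -> float:
    """Mean top-1 accuracy over the out-of-distribution ImageNet variants."""
    return sum(scores.values()) / len(scores)


# OOD columns copied from the CoOp and RLCF rows of Table 2.
coop = {"-A": 49.71, "-V2": 64.20, "-R": 75.21, "-S": 47.99}
rlcf = {"-A": 65.45, "-V2": 69.77, "-R": 83.35, "-S": 54.74}

print(f"CoOp OOD avg: {ood_average(coop):.2f}")
print(f"RLCF OOD avg: {ood_average(rlcf):.2f}")
```

Leaving ImageNet out of the average keeps the column focused on robustness under distribution shift rather than on source-domain accuracy.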
TPT
Test-Time Prompt Tuning for Zero-Shot Generalization in Vision-Language Models. NeurIPS 2022.
[Paper] [Code]
SwapPrompt
SwapPrompt: Test-Time Prompt Adaptation for Vision-Language Models. NeurIPS 2023.
[Paper]
PromptAlign
Align Your Prompts: Test-Time Prompting with Distribution Alignment for Zero-Shot Generalization. NeurIPS 2023.
[Paper] [Code]
TPS
Just Shift It: Test-Time Prototype Shifting for Zero-Shot Generalization with Vision-Language Models. Arxiv 2024.
[Paper] [Code]
RLCF
Test-time Adaptation with CLIP reward for zero-shot generalization in Vision-Language Models. ICLR 2024.
[Paper] [Code]
InTTA
Invariant Test-Time Adaptation for Vision-Language Model Generalization. Arxiv 2024.
[Paper] [Code]
CLIP-Adapter
CLIP-Adapter: Better Vision-Language Models with Feature Adapters. Arxiv 2021.
[Paper] [Code]
Efficient-Prompt
Prompting visual-language models for efficient video understanding. ECCV 2022.
[Paper] [Code]
X-CLIP
Expanding Language-Image Pretrained Models for General Video Recognition. ECCV 2022.
[Paper] [Code]
RePro
Compositional Prompt Tuning with Motion Cues for Open-vocabulary Video Relation Detection. ICLR 2023.
[Paper] [Code]
L2P
Learning to Prompt for Continual Learning. CVPR 2022.
[Paper] [Code]
DualPrompt
DualPrompt: Complementary Prompting for Rehearsal-free Continual Learning. ECCV 2022.
[Paper] [Code]
EvoPrompt
Evolving Parameterized Prompt Memory for Continual Learning. AAAI 2024.
[Paper]
CPrompt
Consistent Prompting for Rehearsal-Free Continual Learning. CVPR 2024.
[Paper] [Code]
DIKI
Mind the Interference: Retaining Pre-trained Knowledge in Parameter Efficient Continual Learning of Vision-Language Models. ECCV 2024.
[Paper] [Code]