atfortes/Awesome-Controllable-Diffusion

Awesome Controllable Diffusion

Papers and Resources on Adding Conditional Controls to Diffusion Models in the Era of AIGC.

πŸ—‚οΈ Table of Contents
  1. πŸ“ Papers
  2. πŸ”— Other Resources
  3. 🌟 Other Awesome Lists
  4. ✍️ Contributing

πŸ“ Papers

  1. DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation. 🔥 [project] [paper]

    Nataniel Ruiz, Yuanzhen Li, Varun Jampani, Yael Pritch, Michael Rubinstein, Kfir Aberman. CVPR'23.
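
    DreamBooth fine-tunes the model on a handful of subject images while a class-specific prior-preservation term keeps the original class knowledge from drifting. A minimal NumPy sketch of the combined objective (all tensors and the weight `lam` are hypothetical stand-ins, not the paper's code):

```python
import numpy as np

def mse(a, b):
    """Mean-squared error between predicted and sampled noise."""
    return float(np.mean((a - b) ** 2))

rng = np.random.default_rng(0)
# Hypothetical denoiser outputs vs. sampled noise for a subject image
# ("a [V] dog") and a class-prior image ("a dog") produced by the
# frozen model before fine-tuning.
pred_subj, noise_subj = rng.normal(size=(4, 64)), rng.normal(size=(4, 64))
pred_cls, noise_cls = rng.normal(size=(4, 64)), rng.normal(size=(4, 64))

lam = 1.0  # prior-preservation weight (hypothetical value)
loss = mse(pred_subj, noise_subj) + lam * mse(pred_cls, noise_cls)
```

    The second term penalizes drift on generic class images, which is what counters the overfitting and language drift the paper describes.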

  2. Multi-Concept Customization of Text-to-Image Diffusion. [project] [paper] [code]

    Nupur Kumari, Bingliang Zhang, Richard Zhang, Eli Shechtman, Jun-Yan Zhu. CVPR'23.

  3. GLIGEN: Open-Set Grounded Text-to-Image Generation. 🔥 [project] [paper] [code]

    Yuheng Li, Haotian Liu, Qingyang Wu, Fangzhou Mu, Jianwei Yang, Jianfeng Gao, Chunyuan Li, Yong Jae Lee. CVPR'23.

  4. Adding Conditional Control to Text-to-Image Diffusion Models. 🔥 [paper] [code]

    Lvmin Zhang, Anyi Rao, Maneesh Agrawala. ICCV'23.
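
    This paper (ControlNet) attaches a trainable copy of the encoder to the frozen model and feeds its features back through zero-initialized "zero convolution" layers, so at the start of training the control branch contributes nothing and the frozen model's behavior is preserved. A toy NumPy sketch of that initialization property (shapes are hypothetical):

```python
import numpy as np

def zero_conv(x, w, b):
    # 1x1 convolution over feature channels; weights and bias start at
    # zero, so this branch is silent until training moves them.
    return x @ w + b

rng = np.random.default_rng(0)
h = rng.normal(size=(64, 320))   # hidden states of the frozen U-Net block
c = rng.normal(size=(64, 320))   # features from the trainable control copy

w = np.zeros((320, 320))         # zero-initialized weights
b = np.zeros(320)                # zero-initialized bias

out = h + zero_conv(c, w, b)     # at initialization this equals h exactly
```

    Starting from an exact identity is what lets ControlNet be trained on relatively small conditioning datasets without damaging the backbone.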

  5. T2I-Adapter: Learning Adapters to Dig out More Controllable Ability for Text-to-Image Diffusion Models. 🔥 [paper] [code]

    Chong Mou, Xintao Wang, Liangbin Xie, Yanze Wu, Jian Zhang, Zhongang Qi, Ying Shan, Xiaohu Qie. Tech Report 2023.

  6. Subject-driven Text-to-Image Generation via Apprenticeship Learning. [paper]

    Wenhu Chen, Hexiang Hu, Yandong Li, Nataniel Ruiz, Xuhui Jia, Ming-Wei Chang, William W. Cohen. NeurIPS'23.

  7. InstantBooth: Personalized Text-to-Image Generation without Test-Time Finetuning. [paper]

    Jing Shi, Wei Xiong, Zhe Lin, Hyun Joon Jung. CVPR'24.

  8. BLIP-Diffusion: Pre-trained Subject Representation for Controllable Text-to-Image Generation and Editing. 🔥 [project] [paper] [code]

    Dongxu Li, Junnan Li, Steven C.H. Hoi. NeurIPS'23.

  9. StyleDrop: Text-to-Image Generation in Any Style. 🔥 [project] [paper]

    Kihyuk Sohn, Nataniel Ruiz, Kimin Lee, Daniel Castro Chin, Irina Blok, Huiwen Chang, Jarred Barber, Lu Jiang, Glenn Entis, Yuanzhen Li, Yuan Hao, Irfan Essa, Michael Rubinstein, Dilip Krishnan. NeurIPS'23.

  10. Face0: Instantaneously Conditioning a Text-to-Image Model on a Face. [paper]

    Dani Valevski, Danny Wasserman, Yossi Matias, Yaniv Leviathan. SIGGRAPH Asia'23.

  11. Controlling Text-to-Image Diffusion by Orthogonal Finetuning. [project] [paper] [code]

    Zeju Qiu, Weiyang Liu, Haiwen Feng, Yuxuan Xue, Yao Feng, Zhen Liu, Dan Zhang, Adrian Weller, Bernhard Schölkopf. NeurIPS'23.

  12. Zero-shot spatial layout conditioning for text-to-image diffusion models.

    Guillaume Couairon, Marlène Careil, Matthieu Cord, Stéphane Lathuilière, Jakob Verbeek. ICCV'23.

  13. IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models. 🔥 [project] [paper] [code]

    Hu Ye, Jun Zhang, Sibo Liu, Xiao Han, Wei Yang. Tech Report 2023.
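
    IP-Adapter's key idea is decoupled cross-attention: image-prompt features get their own key/value projections, and the image branch's attention output is added to the text branch's with a tunable scale. A NumPy sketch of that combination (dimensions and `scale` are hypothetical):

```python
import numpy as np

def attention(q, k, v):
    # scaled dot-product attention with a numerically stable softmax
    s = q @ k.T / np.sqrt(q.shape[-1])
    w = np.exp(s - s.max(axis=-1, keepdims=True))
    return (w / w.sum(axis=-1, keepdims=True)) @ v

rng = np.random.default_rng(0)
q = rng.normal(size=(8, 32))                                       # U-Net queries
k_txt, v_txt = rng.normal(size=(6, 32)), rng.normal(size=(6, 32))  # text K/V
k_img, v_img = rng.normal(size=(4, 32)), rng.normal(size=(4, 32))  # image K/V

scale = 0.6  # image-prompt strength (hypothetical value)
out = attention(q, k_txt, v_txt) + scale * attention(q, k_img, v_img)
```

    Setting the scale to zero recovers the original text-only model, which is why the adapter composes cleanly with existing pipelines.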

  14. Kosmos-G: Generating Images in Context with Multimodal Large Language Models. 🔥 [project] [paper] [code]

    Xichen Pan, Li Dong, Shaohan Huang, Zhiliang Peng, Wenhu Chen, Furu Wei. ICLR'24.

  15. Cross-Image Attention for Zero-Shot Appearance Transfer. [project] [paper] [code]

    Yuval Alaluf, Daniel Garibi, Or Patashnik, Hadar Averbuch-Elor, Daniel Cohen-Or. SIGGRAPH'24.

  16. The Chosen One: Consistent Characters in Text-to-Image Diffusion Models. [project] [paper] [code]

    Omri Avrahami, Amir Hertz, Yael Vinker, Moab Arar, Shlomi Fruchter, Ohad Fried, Daniel Cohen-Or, Dani Lischinski. SIGGRAPH'24.

  17. MagicPose: Realistic Human Poses and Facial Expressions Retargeting with Identity-aware Diffusion. [project] [paper] [code]

    Di Chang, Yichun Shi, Quankai Gao, Jessica Fu, Hongyi Xu, Guoxian Song, Qing Yan, Xiao Yang, Mohammad Soleymani. ICML'24.

  18. ZipLoRA: Any Subject in Any Style by Effectively Merging LoRAs. [project] [paper]

    Viraj Shah, Nataniel Ruiz, Forrester Cole, Erika Lu, Svetlana Lazebnik, Yuanzhen Li, Varun Jampani. Preprint 2023.

  19. Visual Anagrams: Generating Multi-View Optical Illusions with Diffusion Models. [project] [paper] [code]

    Daniel Geng, Inbum Park, Andrew Owens. CVPR'24.

  20. Style Aligned Image Generation via Shared Attention. 🔥 [project] [paper] [code]

    Amir Hertz, Andrey Voynov, Shlomi Fruchter, Daniel Cohen-Or. CVPR'24.

  21. Context Diffusion: In-Context Aware Image Generation. [project] [paper]

    Ivona Najdenkoska, Animesh Sinha, Abhimanyu Dubey, Dhruv Mahajan, Vignesh Ramanathan, Filip Radenovic. ECCV'24.

  22. PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding. 🔥 [project] [paper] [code]

    Zhen Li, Mingdeng Cao, Xintao Wang, Zhongang Qi, Ming-Ming Cheng, Ying Shan. CVPR'24.

  23. SCEdit: Efficient and Controllable Image Diffusion Generation via Skip Connection Editing. [project] [paper] [code]

    Zeyinzi Jiang, Chaojie Mao, Yulin Pan, Zhen Han, Jingfeng Zhang. CVPR'24.

  24. PALP: Prompt Aligned Personalization of Text-to-Image Models. [project] [paper]

    Moab Arar, Andrey Voynov, Amir Hertz, Omri Avrahami, Shlomi Fruchter, Yael Pritch, Daniel Cohen-Or, Ariel Shamir. Preprint 2024.

  25. InstantID: Zero-shot Identity-Preserving Generation in Seconds. 🔥 [project] [paper] [code]

    Qixun Wang, Xu Bai, Haofan Wang, Zekui Qin, Anthony Chen, Huaxia Li, Xu Tang, Yao Hu. Tech Report 2024.

  26. Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs. 🔥 [paper] [code]

    Ling Yang, Zhaochen Yu, Chenlin Meng, Minkai Xu, Stefano Ermon, Bin Cui. ICML'24.

  27. UNIMO-G: Unified Image Generation through Multimodal Conditional Diffusion. 🔥 [project] [paper]

    Wei Li, Xue Xu, Jiachen Liu, Xinyan Xiao. ACL'24.

  28. Training-Free Consistent Text-to-Image Generation. [project] [paper]

    Yoad Tewel, Omri Kaduri, Rinon Gal, Yoni Kasten, Lior Wolf, Gal Chechik, Yuval Atzmon. SIGGRAPH'24.

  29. InstanceDiffusion: Instance-level Control for Image Generation. [project] [paper] [code]

    Xudong Wang, Trevor Darrell, Sai Saketh Rambhatla, Rohit Girdhar, Ishan Misra. CVPR'24.

  30. Direct Consistency Optimization for Compositional Text-to-Image Personalization. [project] [paper] [code]

    Kyungmin Lee, Sangkyung Kwak, Kihyuk Sohn, Jinwoo Shin. Preprint 2024.

  31. RealCompo: Dynamic Equilibrium between Realism and Composition Improves Text-to-Image Diffusion Models. [paper] [code]

    Xinchen Zhang, Ling Yang, Yaqi Cai, Zhaochen Yu, Jiake Xie, Ye Tian, Minkai Xu, Yong Tang, Yujiu Yang, Bin Cui. Preprint 2024.

  32. Visual Style Prompting with Swapping Self-Attention. [project] [paper] [code]

    Jaeseok Jeong, Junho Kim, Yunjey Choi, Gayoung Lee, Youngjung Uh. Preprint 2024.

  33. Gen4Gen: Generative Data Pipeline for Generative Multi-Concept Composition. [project] [paper] [code]

    Chun-Hsiao Yeh, Ta-Ying Cheng, He-Yen Hsieh, Chuan-En Lin, Yi Ma, Andrew Markham, Niki Trigoni, H.T. Kung, Yubei Chen. Tech Report 2024.

  34. Multi-LoRA Composition for Image Generation. [project] [paper] [code]

    Ming Zhong, Yelong Shen, Shuohang Wang, Yadong Lu, Yizhu Jiao, Siru Ouyang, Donghan Yu, Jiawei Han, Weizhu Chen. Preprint 2024.
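
    Entries such as ZipLoRA (18) and Multi-LoRA Composition combine independently trained LoRAs; a common baseline they build on is a weighted linear merge of the low-rank updates into the base weight. A NumPy sketch with hypothetical shapes and merge scales:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 64, 4                      # feature dim and LoRA rank (hypothetical)
W = rng.normal(size=(d, d))       # frozen base weight matrix

# two LoRAs, each parameterized as a low-rank update B @ A
loras = [(rng.normal(size=(d, r)), rng.normal(size=(r, d))) for _ in range(2)]
scales = [0.8, 0.5]               # per-LoRA merge weights (hypothetical)

# merged weight: W' = W + sum_i s_i * B_i A_i
W_merged = W + sum(s * (B @ A) for s, (B, A) in zip(scales, loras))
```

    Naive merging like this can cause the combined concepts to interfere, which is the failure mode these papers address with learned or decoded composition.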

  35. FeedFace: Efficient Inference-based Face Personalization via Diffusion Models. 🔥 [paper] [code]

    Chendong Xiang, Armando Fortes, Khang Hui Chua, Hang Su, Jun Zhu. Tiny Papers @ ICLR'24.

  36. Make-Your-3D: Fast and Consistent Subject-Driven 3D Content Generation. [project] [paper] [code]

    Fangfu Liu, Hanyang Wang, Weiliang Chen, Haowen Sun, Yueqi Duan. ECCV'24.

  37. Continuous Subject-Specific Attribute Control in T2I Models by Identifying Semantic Directions. [project] [paper] [code]

    Stefan Andreas Baumann, Felix Krause, Michael Neumayr, Nick Stracke, Vincent Tao Hu, Björn Ommer. Preprint 2024.

  38. Be Yourself: Bounded Attention for Multi-Subject Text-to-Image Generation. 🔥 [project] [paper] [code]

    Omer Dahary, Or Patashnik, Kfir Aberman, Daniel Cohen-Or. ECCV'24.

  39. FlashFace: Human Image Personalization with High-fidelity Identity Preservation. [project] [paper] [code]

    Shilong Zhang, Lianghua Huang, Xi Chen, Yifei Zhang, Zhi-Fan Wu, Yutong Feng, Wei Wang, Yujun Shen, Yu Liu, Ping Luo. Preprint 2024.

  40. Concept Weaver: Enabling Multi-Concept Fusion in Text-to-Image Models. [paper]

    Gihyun Kwon, Simon Jenni, Dingzeyu Li, Joon-Young Lee, Jong Chul Ye, Fabian Caba Heilbron. CVPR'24.

  41. Identity Decoupling for Multi-Subject Personalization of Text-to-Image Models. [project] [paper]

    Sangwon Jang, Jaehyeong Jo, Kimin Lee, Sung Ju Hwang. Preprint 2024.

  42. ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback. [project] [paper] [code]

    Ming Li, Taojiannan Yang, Huafeng Kuang, Jie Wu, Zhaoning Wang, Xuefeng Xiao, Chen Chen. ECCV'24.

  43. Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model. [project] [paper] [code]

    Han Lin, Jaemin Cho, Abhay Zala, Mohit Bansal. Preprint 2024.

  44. MaxFusion: Plug&Play Multi-Modal Generation in Text-to-Image Diffusion Models. [project] [paper] [code]

    Nithin Gopalakrishnan Nair, Jeya Maria Jose Valanarasu, Vishal M Patel. ECCV'24.

  45. MoA: Mixture-of-Attention for Subject-Context Disentanglement in Personalized Image Generation. [project] [paper] [code]

    Kuan-Chieh Wang, Daniil Ostashev, Yuwei Fang, Sergey Tulyakov, Kfir Aberman. Preprint 2024.

  46. Prompt Optimizer of Text-to-Image Diffusion Models for Abstract Concept Understanding. [paper]

    Zezhong Fan, Xiaohan Li, Chenhao Fang, Topojoy Biswas, Kaushiki Nag, Jianpeng Xu, Kannan Achan. WWW'24.

  47. MoMA: Multimodal LLM Adapter for Fast Personalized Image Generation. 🔥 [project] [paper] [code]

    Kunpeng Song, Yizhe Zhu, Bingchen Liu, Qing Yan, Ahmed Elgammal, Xiao Yang. ECCV'24.

  48. StyleBooth: Image Style Editing with Multimodal Instruction. [project] [paper] [code]

    Zhen Han, Chaojie Mao, Zeyinzi Jiang, Yulin Pan, Jingfeng Zhang. Preprint 2024.

  49. MultiBooth: Towards Generating All Your Concepts in an Image from Text. [project] [paper] [code]

    Chenyang Zhu, Kai Li, Yue Ma, Chunming He, Xiu Li. Preprint 2024.

  50. PuLID: Pure and Lightning ID Customization via Contrastive Alignment. [paper] [code]

    Zinan Guo, Yanze Wu, Zhuowei Chen, Lang Chen, Qian He. Tech Report 2024.

  51. InstantFamily: Masked Attention for Zero-shot Multi-ID Image Generation. [paper]

    Chanran Kim, Jeongin Lee, Shichang Joung, Bongmo Kim, Yeul-Min Baek. Preprint 2024.

  52. StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation. [paper]

    Yupeng Zhou, Daquan Zhou, Ming-Ming Cheng, Jiashi Feng, Qibin Hou. Preprint 2024.

  53. Customizing Text-to-Image Models with a Single Image Pair. [project] [paper] [code]

    Maxwell Jones, Sheng-Yu Wang, Nupur Kumari, David Bau, Jun-Yan Zhu. Preprint 2024.

  54. Compositional Text-to-Image Generation with Dense Blob Representations. 🔥 [project] [paper]

    Weili Nie, Sifei Liu, Morteza Mardani, Chao Liu, Benjamin Eckart, Arash Vahdat. ICML'24.

  55. Personalized Residuals for Concept-Driven Text-to-Image Generation. [project] [paper]

    Cusuh Ham, Matthew Fisher, James Hays, Nicholas Kolkin, Yuchen Liu, Richard Zhang, Tobias Hinz. CVPR'24.

  56. FreeCustom: Tuning-Free Customized Image Generation for Multi-Concept Composition. [project] [paper] [code]

    Ganggui Ding, Canyu Zhao, Wen Wang, Zhen Yang, Zide Liu, Hao Chen, Chunhua Shen. CVPR'24.

  57. RB-Modulation: Training-Free Personalization of Diffusion Models using Stochastic Optimal Control. 🔥 [project] [paper] [code]

    Litu Rout, Yujia Chen, Nataniel Ruiz, Abhishek Kumar, Constantine Caramanis, Sanjay Shakkottai, Wen-Sheng Chu. Preprint 2024.

  58. pOps: Photo-Inspired Diffusion Operators. 🔥 [project] [paper] [code]

    Elad Richardson, Yuval Alaluf, Ali Mahdavi-Amiri, Daniel Cohen-Or. Preprint 2024.

  59. Zero-Painter: Training-Free Layout Control for Text-to-Image Synthesis. [paper] [code]

    Marianna Ohanyan, Hayk Manukyan, Zhangyang Wang, Shant Navasardyan, Humphrey Shi. CVPR'24.

  60. Ctrl-X: Controlling Structure and Appearance for Text-To-Image Generation Without Guidance. 🔥 [project] [paper] [code]

    Kuan Heng Lin, Sicheng Mo, Ben Klingher, Fangzhou Mu, Bolei Zhou. Preprint 2024.

  61. Instant 3D Human Avatar Generation using Image Diffusion Models. [project] [paper]

    Nikos Kolotouros, Thiemo Alldieck, Enric Corona, Eduard Gabriel Bazavan, Cristian Sminchisescu. ECCV'24.

  62. Sketch-Guided Scene Image Generation. [paper]

    Tianyu Zhang, Xiaoxuan Xie, Xusheng Du, Haoran Xie. Preprint 2024.

  63. SEED-Story: Multimodal Long Story Generation with Large Language Model. [paper] [code]

    Shuai Yang, Yuying Ge, Yang Li, Yukang Chen, Yixiao Ge, Ying Shan, Yingcong Chen. Preprint 2024.

  64. Training-free Composite Scene Generation for Layout-to-Image Synthesis. [paper] [code]

    Jiaqi Liu, Tao Huang, Chang Xu. ECCV'24.

  65. ViPer: Visual Personalization of Generative Models via Individual Preference Learning. [project] [paper] [code]

    Sogand Salehi, Mahdi Shafiei, Teresa Yeo, Roman Bachmann, Amir Zamir. ECCV'24.

  66. IPAdapter-Instruct: Resolving Ambiguity in Image-based Conditioning using Instruct Prompts. [project] [paper] [code]

    Ciara Rowles, Shimon Vainer, Dante De Nigris, Slava Elizarov, Konstantin Kutsy, Simon DonnΓ©. Preprint 2024.

  67. Sketch2Scene: Automatic Generation of Interactive 3D Game Scenes from User's Casual Sketches. [project] [paper]

    Yongzhi Xu, Yonhon Ng, Yifu Wang, Inkyu Sa, Yunfei Duan, Yang Li, Pan Ji, Hongdong Li. Preprint 2024.

  68. Generative Photomontage. [project] [paper] [code]

    Sean J. Liu, Nupur Kumari, Ariel Shamir, Jun-Yan Zhu. Preprint 2024.

  69. CSGO: Content-Style Composition in Text-to-Image Generation. [project] [paper] [code]

    Peng Xing, Haofan Wang, Yanpeng Sun, Qixun Wang, Xu Bai, Hao Ai, Renyuan Huang, Zechao Li. Preprint 2024.

  70. IFAdapter: Instance Feature Control for Grounded Text-to-Image Generation. 🔥 [project] [paper] [code]

    Yinwei Wu, Xianpan Zhou, Bing Ma, Xuefeng Su, Kai Ma, Xinchao Wang. Preprint 2024.

↑ Back to Top ↑

🔗 Other Resources

  1. Regional Prompter: set a prompt to a divided region.

↑ Back to Top ↑

🌟 Other Awesome Lists

  1. Awesome-LLM-Reasoning: Collection of papers and resources on Reasoning in Large Language Models.

  2. Awesome-Controllable-T2I-Diffusion-Models: A collection of resources on controllable generation with text-to-image diffusion models.

↑ Back to Top ↑

✍️ Contributing

  • Add a new paper or update an existing one, considering which category the work belongs to.
  • Use the same format as existing entries to describe the work.
  • Link to the paper's abstract page (the /abs/ URL for arXiv publications).

Don't worry if you make a mistake; it will be fixed for you!

Contributors

Star History
