Stars
Official code of "EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model"
A curated list of image inpainting and video inpainting papers and resources
[ECCV 2024] PowerPaint, a versatile image inpainting model that supports text-guided object inpainting, object removal, image outpainting and shape-guided object inpainting with only a single model…
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance.
Character Animation (AnimateAnyone, Face Reenactment)
Multi-object image datasets with ground-truth segmentation masks and generative factors.
PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis
[RSS 2024] Learning Manipulation by Predicting Interaction
A curated list of papers, code, and resources pertaining to generative image composition or object insertion.
[ICLR 2024 Poster] SCHEMA: State CHangEs MAtter for Procedure Planning in Instructional Videos
Code and data release for the paper "Learning Object State Changes in Videos: An Open-World Perspective" (CVPR 2024)
[ECCV 2024, Oral] DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors
Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything
An efficient video loader for deep learning with smart shuffling that's super easy to digest
Taming Transformers for High-Resolution Image Synthesis
🔥🔥🔥 Latest papers, code, and datasets on Vid-LLMs.
This project aims to reproduce Sora (OpenAI's T2V model); we hope the open-source community will contribute to this project.
Download YouTube videos faster using a large number of VMs
Easily create large video datasets from video URLs
A high-throughput and memory-efficient inference and serving engine for LLMs
Official repository of Learning to Act from Actionless Videos through Dense Correspondences.
The official codebase for running the experiments described in the AVDC paper.
Official Code for MotionCtrl [SIGGRAPH 2024]
Official PyTorch implementation of TATS: A Long Video Generation Framework with Time-Agnostic VQGAN and Time-Sensitive Transformer (ECCV 2022)
[ICLR 2024] The official implementation of the paper "VDT: General-purpose Video Diffusion Transformers via Mask Modeling", by Haoyu Lu, Guoxing Yang, Nanyi Fei, Yuqi Huo, Zhiwu Lu, Ping Luo, Mingyu Ding.