Starred repositories
A Trimap-Free Portrait Matting Solution in Real Time [AAAI 2022]
Real-ESRGAN aims at developing Practical Algorithms for General Image/Video Restoration.
Code for the paper "DeepRemaster: Temporal Source-Reference Attention Networks for Comprehensive Video Enhancement". http://iizuka.cs.tsukuba.ac.jp/projects/remastering/
GPT4V-level open-source multi-modal model based on Llama3-8B
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.
ChatGLM-6B: An Open Bilingual Dialogue Language Model
Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding
Create 🔥 videos with Stable Diffusion by exploring the latent space and morphing between text prompts
CAMixerSR: Only Details Need More “Attention” (CVPR 2024)
Official repository of "Investigating Tradeoffs in Real-World Video Super-Resolution"
OpenShot Video Editor is an award-winning free and open-source video editor for Linux, Mac, and Windows, and is dedicated to delivering high quality video editing and animation solutions to the world.
Bringing Old Photos Back to Life (CVPR 2020 oral)
[ICCV 2023] DDColor: Towards Photo-Realistic Image Colorization via Dual Decoders
Infinite Photorealistic Worlds using Procedural Generation
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable…
Awesome video understanding toolkits based on PaddlePaddle. It supports video data annotation tools, lightweight RGB and skeleton-based action recognition models, practical applications for video ta…
Automagically generate thumbnails, animated GIFs, and summaries from videos
Unsupervised video summarization with deep reinforcement learning (AAAI'18)
Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.
This is a PyTorch implementation of “Context AutoEncoder for Self-Supervised Representation Learning”
JavaScript player library / DASH & HLS client / MSE-EME player
Tesseract Open Source OCR Engine (main repository)
Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, and Cyrillic.
🎥 Python and OpenCV-based scene cut/transition detection program & library.