🎯
Focusing
  • CASIA
  • beijing

Vchitect-2.0: Parallel Transformer for Scaling Up Video Diffusion Models

Python 599 16 Updated Sep 18, 2024

A collection of LLM papers, blogs, and projects, with a focus on OpenAI o1 and reasoning techniques.

4,019 219 Updated Oct 4, 2024

🔥🔥🔥 A curated list of papers on LLMs-based multimodal generation (image, video, 3D and audio).

HTML 312 16 Updated Sep 25, 2024

Replication of the paper "Text Is All You Need: Learning Language Representations for Sequential Recommendation" on KDD'23.

Python 81 25 Updated Apr 23, 2024

OpenP5: An Open-Source Platform for Developing, Training, and Evaluating LLM-based Recommender Systems

Python 228 17 Updated Jun 21, 2024

Open-MAGVIT2: Democratizing Autoregressive Visual Generation

Python 633 24 Updated Sep 27, 2024

Kolors Team

Python 3,664 242 Updated Sep 4, 2024

MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone

Python 12,139 850 Updated Sep 13, 2024

A collection of awesome video generation studies.

TeX 290 8 Updated Sep 30, 2024

DiG: Scalable and Efficient Diffusion Models with Gated Linear Attention

Python 108 3 Updated May 29, 2024

Implementation of MagViT2 Tokenizer in Pytorch

Python 543 35 Updated Jul 23, 2024
Python 115 3 Updated Jun 23, 2024

Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding

Python 3,326 284 Updated Aug 15, 2024

A lightweight, dependency-free Python library (and command-line utility) for downloading YouTube Videos.

Python 12,152 2,511 Updated Aug 15, 2024

Open-source and strong foundation image recognition models.

Jupyter Notebook 2,776 271 Updated Aug 1, 2024
Python 2,533 190 Updated Oct 4, 2024

Official repository for the paper PLLaVA

Python 569 37 Updated Jul 28, 2024

A curated list of awesome resources about multimodal recommender systems.

276 21 Updated Apr 4, 2024

LaVIT: Empower the Large Language Model to Understand and Generate Visual Content

Jupyter Notebook 506 26 Updated Jul 1, 2024

Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"

Python 3,189 278 Updated May 4, 2024

A Collection of Papers and Codes for CVPR2024/ECCV2024 AIGC

419 12 Updated Sep 25, 2024

Official code for Goldfish model for long video understanding and MiniGPT4-video for short video understanding

Python 540 59 Updated Oct 4, 2024

An interactive NVIDIA-GPU process viewer and beyond, the one-stop solution for GPU process management.

Python 4,689 148 Updated Oct 3, 2024

Karras et al. (2022) diffusion models for PyTorch

Python 2,277 374 Updated Jul 16, 2024

Comparison between Frechet Video Distance implementation from StyleGAN-V and the original paper

Python 78 5 Updated Dec 26, 2022

Open-Sora: Democratizing Efficient Video Production for All

Python 21,762 2,105 Updated Aug 9, 2024

Official implementation of OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on

Python 5,454 801 Updated May 13, 2024

[CVPR 2024] Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers

Python 505 19 Updated Jun 26, 2024

This project aims to reproduce Sora (OpenAI's T2V model); we hope the open-source community will contribute to it.

Python 11,291 1,008 Updated Oct 4, 2024

MiniSora: A community that aims to explore the implementation path and future development direction of Sora.

Python 1,179 149 Updated Sep 25, 2024