InstantX
- UTC +08:00
- wangqixun.ai@gmail.com
- https://instantid.github.io/
- https://github.com/instantX-research
Stars
Open-source multimodal large language model that can hear and talk while thinking, featuring real-time, end-to-end speech input and streaming audio output for conversation.
🔥🔥🔥 A curated list of papers on LLM-based multimodal generation (image, video, 3D, and audio).
InstantUnify: Integrates Multimodal LLM into Diffusion Models 🔥
SigLIP-based Aesthetic Score Predictor
Code for "Diffusion Model Alignment Using Direct Preference Optimization"
CSGO: Content-Style Composition in Text-to-Image Generation 🔥
A more suitable IP-Adapter for the DiT architecture
InstantStyle-Plus: Style Transfer with Content-Preserving in Text-to-Image Generation 🔥
Enjoy the magic of Diffusion models!
A generative speech model for daily dialogue.
Official implementation of FIFO-Diffusion: Generating Infinite Videos from Text without Training (NeurIPS 2024)
InstantID-ROME: Improved Identity-Preserving Generation in Seconds 🔥
Official implementation of Magic Clothing: Controllable Garment-Driven Image Synthesis
InstantStyle: Free Lunch towards Style-Preserving in Text-to-Image Generation 🔥
The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.
ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment
Official Code for Stable Cascade
[ECCV 2024] Official implementation of the paper "X-Pose: Detecting Any Keypoints"
An API wrapper for Discord written in Python.
🔥 StableIdentity: Inserting Anybody into Anywhere at First Sight
[CVPR 2024] Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data. Foundation Model for Monocular Depth Estimation
VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models (CVPR 2024)
Official implementation of the paper "AnyDoor: Zero-shot Object-level Image Customization"