- University of Science and Technology of China (USTC)
- Hefei, China
- UTC +08:00
- https://zhendongwang6.github.io/
- https://scholar.google.com.hk/citations?user=Ya5VDjQAAAAJ&hl=zh-CN
Lists (24)
chatgpt
clip
controlnet
dataset
diffusion model
face-anti-spoofing
face-forgery-detection
flow
gan
img2img
knowledge distillation
large language models
large vision model
ocr
pretrain
sam series
score metrics
segmentation
subject driven generation
survey
tools
vae
vision_language
visual text generation
Stars
Official implementation of EG4D: Explicit Generation of 4D Object without Score Distillation
Lumina-T2X is a unified framework for Text to Any Modality Generation
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
[CVPR 2024] Code release for "InstanceDiffusion: Instance-level Control for Image Generation"
[ICML 2024] Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs (PRG)
Code for "Diffusion Model Alignment Using Direct Preference Optimization"
[GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly …
Latte: Latent Diffusion Transformer for Video Generation.
OpenDiT: An Easy, Fast and Memory-Efficient System for DiT Training and Inference
GaussianCube: A Structured and Explicit Radiance Representation for 3D Generative Modeling
One-step image-to-image with Stable Diffusion turbo: sketch2image, day2night, and more
Official Implementation of Rectified Flow (ICLR2023 Spotlight)
A state-of-the-art-level open visual language model | multimodal pretrained model
[WIP] Layer Diffusion for WebUI (via Forge)
A collection of resources on controllable generation with text-to-image diffusion models.
[ICML 2024 Spotlight] FiT: Flexible Vision Transformer for Diffusion Model
Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"
FastComposer: Tuning-Free Multi-Subject Image Generation with Localized Attention
Huggingface-compatible SDXL Unet implementation that is readily hackable
The code of "Mask TextSpotter v3: Segmentation Proposal Network for Robust Scene Text Spotting"
Easily turn large sets of image URLs into an image dataset. Can download, resize and package 100M URLs in 20h on one machine.
Better Aligning Text-to-Image Models with Human Preference. ICCV 2023
[NeurIPS2023] This is the official code of the paper "GlyphControl: Glyph Conditional Control for Visual Text Generation"
Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.