Zhe Kong · Yong Zhang* · Tianyu Yang · Tao Wang · Kaihao Zhang
Bizhu Wu · Guanying Chen · Wei Liu · Wenhan Luo*
*corresponding authors
TL;DR: OMG is a framework for multi-concept image generation, supporting character and style LoRAs from Civitai.com. It can also be combined with InstantID to handle multiple IDs, using only a single image per ID.
Trailer Demo: A short trailer created with OMG + SVD.
- [2024/3/18] 🔥 We release the technical report.
- [2024/3/17] 🔥 We release the source code and gradio demo of OMG.
- The code requires `python==3.10.6`, as well as `pytorch==2.0.1` and `torchvision==0.15.2`. Please follow the instructions here to install both PyTorch and TorchVision dependencies. Installing both PyTorch and TorchVision with CUDA support is strongly recommended.
conda create -n OMG python=3.10.6
conda activate OMG
pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2
pip install -r requirements.txt
pip install git+https://github.com/facebookresearch/segment-anything.git
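After installation, a quick sanity check can confirm that the required packages are importable before moving on. This is a minimal sketch (not part of the repository); the package names are the ones installed above:

```python
import importlib.util

# Packages installed by the commands above.
REQUIRED = ["torch", "torchvision", "segment_anything"]

def missing_packages(names):
    """Return the subset of names that cannot be imported in this environment."""
    return [n for n in names if importlib.util.find_spec(n) is None]

# Report anything the environment is still missing (empty list means all good).
print(missing_packages(REQUIRED))
```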
- For visual comprehension, you can choose `YoloWorld + EfficientViT SAM` or `GroundingDINO + SAM`.
- (Recommended) YoloWorld + EfficientViT SAM:
pip install inference[yolo-world]==0.9.13
pip install onnxsim==0.4.35
- (Optional) If you cannot install `inference[yolo-world]`, you can use `GroundingDINO` for visual comprehension instead. `GroundingDINO` requires manual installation.
Run the following so that the environment variable is set in the current shell:
export CUDA_HOME=/path/to/cuda-11.3
Replace /path/to/cuda-11.3 with the path where your CUDA toolkit is installed.
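If `nvcc` happens to be on your `PATH`, the toolkit path can also be derived automatically. This is a convenience sketch (the instructions above only require setting the variable manually):

```shell
# Derive CUDA_HOME from the location of nvcc (.../bin/nvcc -> two levels up).
# Assumes nvcc is on PATH; otherwise set CUDA_HOME manually as shown above.
NVCC_PATH="$(command -v nvcc || true)"
if [ -n "$NVCC_PATH" ]; then
    export CUDA_HOME="$(dirname "$(dirname "$NVCC_PATH")")"
fi
echo "CUDA_HOME=${CUDA_HOME:-<unset>}"
```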
git clone https://github.com/IDEA-Research/GroundingDINO.git
cd GroundingDINO/
pip install -e .
More installation details can be found in the GroundingDINO repository.
Download stable-diffusion-xl-base-1.0, InstantID, antelopev2, ControlNet, controlnet-openpose-sdxl-1.0, controlnet-canny-sdxl-1.0, controlnet-depth-sdxl-1.0, dpt-hybrid-midas.
For YoloWorld + EfficientViT SAM: EfficientViT-SAM-XL1, yolo-world.
For GroundingDINO + SAM: GroundingDINO, SAM.
For Character LoRAs: Civitai-Chris Evans, Civitai-Taylor Swift, Harry Potter, Hermione Granger.
For Style LoRAs: Anime Sketch Style.
Put them under `checkpoint` as follows:
OMG
├── checkpoint
│ ├── antelopev2
│ ├── ControlNet
│ ├── controlnet-openpose-sdxl-1.0
│ ├── controlnet-canny-sdxl-1.0
│ ├── controlnet-depth-sdxl-1.0
│ ├── dpt-hybrid-midas
│ ├── style
│ ├── InstantID
│ ├── GroundingDINO
│ ├── lora
│ │ ├── Harry_Potter.safetensors
│ │ └── Hermione_Granger.safetensors
│ ├── sam
│ │ ├── sam_vit_h_4b8939.pth
│ │ └── xl1.pt
│ └── stable-diffusion-xl-base-1.0
├── gradio_demo
├── src
├── inference_instantid.py
└── inference_lora.py
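Before running inference, it can help to verify that the layout above is complete. The helper below is a sketch (not part of the repository); the directory names are taken directly from the tree above:

```python
from pathlib import Path

# Top-level entries expected under checkpoint/, per the tree above.
EXPECTED = [
    "antelopev2", "ControlNet", "controlnet-openpose-sdxl-1.0",
    "controlnet-canny-sdxl-1.0", "controlnet-depth-sdxl-1.0",
    "dpt-hybrid-midas", "style", "InstantID", "GroundingDINO",
    "lora", "sam", "stable-diffusion-xl-base-1.0",
]

def missing_checkpoints(root="checkpoint"):
    """Return the expected subdirectories that do not exist under root."""
    root = Path(root)
    return [name for name in EXPECTED if not (root / name).exists()]

if __name__ == "__main__":
    # Empty list means every expected checkpoint directory is present.
    print(missing_checkpoints())
```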
If you use YoloWorld, put `yolo-world.pt` at `/tmp/cache/yolo_world/l/yolo-world.pt`, and put `ViT-B-32.pt` (downloaded from OpenAI) at `~/.cache/clip/ViT-B-32.pt`.
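Staging these weights amounts to copying the downloaded files into fixed cache locations. The helper below is a minimal sketch (not part of the repository) that creates the parent directories first; the source paths in the comments are placeholders:

```python
import os
import shutil

def stage_checkpoint(src, dst):
    """Copy a downloaded weight file to dst, creating parent dirs as needed."""
    os.makedirs(os.path.dirname(dst), exist_ok=True)
    shutil.copy2(src, dst)

# Destinations described above (adjust the source paths to your downloads):
# stage_checkpoint("yolo-world.pt", "/tmp/cache/yolo_world/l/yolo-world.pt")
# stage_checkpoint("ViT-B-32.pt",
#                  os.path.expanduser("~/.cache/clip/ViT-B-32.pt"))
```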
Or you can manually set the checkpoint path as follows:
For OMG + LoRA:
python inference_lora.py \
--pretrained_sdxl_model <path to stable-diffusion-xl-base-1.0> \
--controlnet_checkpoint <path to controlnet-openpose-sdxl-1.0> \
--dino_checkpoint <path to GroundingDINO> \
--sam_checkpoint <path to sam> \
    --lora_path <LoRA path for character1|LoRA path for character2>
For OMG + InstantID:
python inference_instantid.py \
--pretrained_model <path to stable-diffusion-xl-base-1.0> \
--controlnet_path <path to InstantID controlnet> \
    --face_adapter_path <path to InstantID face adapter> \
--dino_checkpoint <path to GroundingDINO> \
--sam_checkpoint <path to sam> \
--antelopev2_path <path to antelopev2>
The <TOK> for `Harry_Potter.safetensors` is `Harry Potter`, and for `Hermione_Granger.safetensors` it is `Hermione Granger`.
python inference_lora.py \
    --prompt <prompt for the two characters> \
--negative_prompt <negative prompt> \
    --prompt_rewrite "[<prompt for person 1>]-*-[<negative prompt>]|[<prompt for person 2>]-*-[<negative prompt>]" \
    --lora_path "<LoRA path for character1|LoRA path for character2>"
For example:
python inference_lora.py \
--prompt "Close-up photo of the happy smiles on the faces of the cool man and beautiful woman as they leave the island with the treasure, sail back to the vacation beach, and begin their love story, 35mm photograph, film, professional, 4k, highly detailed." \
--negative_prompt 'noisy, blurry, soft, deformed, ugly' \
    --prompt_rewrite '[Close-up photo of Harry Potter with a surprised expression as he wears a Hogwarts uniform, 35mm photograph, film, professional, 4k, highly detailed.]-*-[noisy, blurry, soft, deformed, ugly]|[Close-up photo of Hermione Granger with a surprised expression as she wears a Hogwarts uniform, 35mm photograph, film, professional, 4k, highly detailed.]-*-[noisy, blurry, soft, deformed, ugly]' \
    --lora_path './checkpoint/lora/Harry_Potter.safetensors|./checkpoint/lora/Hermione_Granger.safetensors'
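The `--prompt_rewrite` string is easy to get wrong by hand. A small helper (a sketch, not part of the repository) can assemble it from per-character prompts, following the `[prompt]-*-[negative]` segments joined by `|` shown above:

```python
def build_prompt_rewrite(characters):
    """characters: list of (prompt, negative_prompt) pairs, one per person."""
    return "|".join(f"[{p}]-*-[{n}]" for p, n in characters)

# Example with two characters sharing one negative prompt.
rewrite = build_prompt_rewrite([
    ("Close-up photo of Harry Potter, 4k, highly detailed.",
     "noisy, blurry, soft, deformed, ugly"),
    ("Close-up photo of Hermione Granger, 4k, highly detailed.",
     "noisy, blurry, soft, deformed, ugly"),
])
print(rewrite)
```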
python inference_instantid.py \
    --prompt <prompt for the two characters> \
--negative_prompt <negative prompt> \
    --prompt_rewrite "[<prompt for person 1>]-*-[<negative prompt>]-*-<path to reference image1>|[<prompt for person 2>]-*-[<negative prompt>]-*-<path to reference image2>"
For example:
python inference_instantid.py \
--prompt 'Close-up photo of the happy smiles on the faces of the cool man and beautiful woman as they leave the island with the treasure, sail back to the vacation beach, and begin their love story, 35mm photograph, film, professional, 4k, highly detailed.' \
--negative_prompt 'noisy, blurry, soft, deformed, ugly' \
    --prompt_rewrite '[Close-up photo of a man, 35mm photograph, film, professional, 4k, highly detailed.]-*-[noisy, blurry, soft, deformed, ugly]-*-./example/musk_resize.jpeg|[Close-up photo of a man, 35mm photograph, film, professional, 4k, highly detailed.]-*-[noisy, blurry, soft, deformed, ugly]-*-./example/yann-lecun_resize.jpg'
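For InstantID, each `--prompt_rewrite` segment additionally carries a reference image path after a second `-*-` separator. Extending the same string-building idea (again a sketch, not repository code):

```python
def build_instantid_rewrite(characters):
    """characters: list of (prompt, negative_prompt, image_path) triples."""
    return "|".join(f"[{p}]-*-[{n}]-*-{img}" for p, n, img in characters)

# Example with the two reference images used above.
print(build_instantid_rewrite([
    ("Close-up photo of a man, 4k, highly detailed.",
     "noisy, blurry, soft, deformed, ugly", "./example/musk_resize.jpeg"),
    ("Close-up photo of a man, 4k, highly detailed.",
     "noisy, blurry, soft, deformed, ugly", "./example/yann-lecun_resize.jpg"),
]))
```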
If you choose YoloWorld + EfficientViT SAM:
python gradio_demo/app.py --segment_type yoloworld
For GroundingDINO + SAM:
python gradio_demo/app.py --segment_type GroundingDINO
Connect to the public URL displayed after the startup process is completed.