Probing Unlearned Diffusion Models: A Transferable Adversarial Attack Perspective [arXiv]
This repository contains the official implementation of the paper titled "Probing Unlearned Diffusion Models: A Transferable Adversarial Attack Perspective".
This project is developed with Python 3.9. Run the following command to install the required packages:

```bash
pip install -r requirements.txt
```
The original weights for Stable Diffusion (SD) v1.4 can be downloaded from here and placed at `stable-diffusion/models/ldm/stable-diffusion-v1/sd-v1-4.ckpt`.
We use the Diffusers version of the SD model, so the original CompVis checkpoint must be converted to the Diffusers format by running:

```bash
python stable-diffusion/train-scripts/compvis2diffusers.py
```
Additionally, download the following modules: `vae`, `tokenizer`, and `text_encoder` from here, and place them in the folder `stable-diffusion/diffusers_ckpt/ORI`.
We provide unlearned model checkpoints for an object concept (e.g., Jeep) and an ID concept (e.g., Angelina Jolie), which are placed in the folder `stable-diffusion/diffusers_ckpt`. The download links are provided in the table below:
| UCE | ESD | FMN | CA |
|---|---|---|---|
| ckpt | ckpt | ckpt | ckpt |
For ID evaluation, we integrate celeb-detection-oss into our code. Please download the facial recognition model weights from here and place them in the folder `src/tasks/utils/metrics/celeb-detection-oss/examples/resources/face_recognition`.
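At its core, this kind of ID metric compares face-recognition embeddings of generated images against a reference identity. A minimal sketch of cosine-similarity identity matching (the function names, the 512-dimensional embeddings, and the 0.5 threshold are illustrative assumptions, not the repo's actual API):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two face-embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def is_same_identity(emb: np.ndarray, ref_emb: np.ndarray, threshold: float = 0.5) -> bool:
    """Count a generated face as the target ID if its embedding is close enough to the reference."""
    return cosine_similarity(emb, ref_emb) >= threshold

# Toy embeddings standing in for real face-recognition features.
rng = np.random.default_rng(0)
ref = rng.normal(size=512)
same = ref + 0.05 * rng.normal(size=512)   # near-duplicate of the reference identity
other = rng.normal(size=512)               # unrelated identity
print(is_same_identity(same, ref), is_same_identity(other, ref))
```

In the actual pipeline, the embeddings come from the celeb-detection-oss recognition model rather than random vectors.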
Taking the restoration of Angelina Jolie as an example, we first generate images of Angelina Jolie using the original Stable Diffusion (SD). The CSV file containing the prompts for image generation is placed in the `prompts` folder. Generate images by running:

```bash
python src/execs/generate_dataset.py --prompts_path prompts/id/jolie.csv --concept jolie --save_path files/dataset/id --device cuda:0
```
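Conceptually, the script iterates over the prompt CSV and renders one image per (prompt, seed) pair. A minimal sketch of that loop with the generation step stubbed out (the column names `prompt` and `evaluation_seed` are assumptions about the CSV layout, not verified against the repo):

```python
import csv
import io

# A tiny stand-in for prompts/id/jolie.csv.
csv_text = """prompt,evaluation_seed
a photo of Angelina Jolie,42
a portrait of Angelina Jolie smiling,43
"""

def load_prompts(f):
    """Parse (prompt, seed) pairs from a prompts CSV."""
    return [(row["prompt"], int(row["evaluation_seed"])) for row in csv.DictReader(f)]

prompts = load_prompts(io.StringIO(csv_text))
for prompt, seed in prompts:
    # In the real script, each pair would seed a Stable Diffusion sampler
    # and the resulting image would be saved under --save_path.
    print(f"seed={seed}: {prompt}")
```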
(Optional but recommended): Use the classifiers to select well-generated images for the embedding search by running:

```bash
python src/execs/choose_dataset.py --concept_type 'id' --concept 'angelina jolie' --threshold 0.99
```
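The filtering step boils down to keeping only images that a concept classifier accepts with confidence above `--threshold`. A toy sketch (the filenames and scores are invented for illustration):

```python
# Hypothetical classifier confidences for the generated images.
scores = {
    "jolie_000.png": 0.997,
    "jolie_001.png": 0.420,
    "jolie_002.png": 0.991,
}

def choose_images(scores: dict, threshold: float = 0.99) -> list:
    """Keep only images the concept classifier accepts with high confidence."""
    return sorted(name for name, s in scores.items() if s >= threshold)

print(choose_images(scores))  # ['jolie_000.png', 'jolie_002.png']
```

A high threshold like 0.99 trades dataset size for quality: the embedding search is only as good as the reference images it fits.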
The configuration files for the search are located in the `configs` folder. Start the adversarial search by running:

```bash
python src/execs/attack.py --config-file configs/id/jolie/ORI_jolie.json --logger.name Adv_Search
```
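The search itself is gradient-based optimization of a prompt embedding against a frozen model. As a toy surrogate, the sketch below fits an embedding `e` so that a fixed linear map `W` sends it close to a target feature `t`; in the actual method the objective involves the diffusion model's denoising loss on the concept images, so `W`, `t`, and the learning rate here are stand-ins:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(32, 16))   # frozen "model" (stand-in for the diffusion model)
t = rng.normal(size=32)         # target feature (stand-in for the concept signal)
e = rng.normal(size=16)         # prompt embedding being optimized

def loss(e):
    """Quadratic surrogate for the search objective."""
    return float(np.sum((W @ e - t) ** 2))

initial = loss(e)
lr = 0.005
for _ in range(1000):
    grad = 2 * W.T @ (W @ e - t)  # analytic gradient of the quadratic loss
    e -= lr * grad
print(initial, "->", loss(e))
```

The same skeleton (frozen model, differentiable objective, gradient steps on the embedding) carries over to the real search; only the objective and the model change.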
To validate an obtained embedding, we feed it into the unlearned model (`task.erase_ckpt` in the configuration file) to generate images. We choose the embedding based on its performance on the unlearned model (we use the UCE-erased model for validation in the experiments).
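The selection step amounts to scoring each candidate embedding on the validation (unlearned) model and keeping the best one. A trivial sketch (the candidate names and scores are invented):

```python
# Hypothetical validation scores: how well the UCE-erased model regenerates
# the target concept from each candidate embedding.
candidates = {"emb_step100": 0.61, "emb_step200": 0.83, "emb_step300": 0.78}

best = max(candidates, key=candidates.get)
print(best)  # emb_step200
```

Validating on one unlearned model and then testing on the others is what makes the reported attack a transfer attack.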
We provide adversarial embeddings for the restoration of the object (e.g., Jeep) and ID (e.g., Angelina Jolie) concepts in the folder `files/embeddings`. A test demonstration is available in `test.ipynb`.
This repository is built upon the official codebase of UnlearnDiff; we thank the authors for their helpful contributions.
```bibtex
@misc{han2024probing,
      title={Probing Unlearned Diffusion Models: A Transferable Adversarial Attack Perspective},
      author={Xiaoxuan Han and Songlin Yang and Wei Wang and Yang Li and Jing Dong},
      year={2024},
      eprint={2404.19382},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```