
Probing Unlearned Diffusion Models: A Transferable Adversarial Attack Perspective [arXiv]

This repository contains the official implementation of the paper titled "Probing Unlearned Diffusion Models: A Transferable Adversarial Attack Perspective".

[Teaser figure]

Environment

This project is developed with Python 3.9. Please run the following command to install the required packages:

pip install -r requirements.txt

Preparation

The original weights for Stable Diffusion (SD) v1.4 can be downloaded from here and placed at stable-diffusion/models/ldm/stable-diffusion-v1/sd-v1-4.ckpt.

We use the Diffusers version of the SD model, so the original CompVis checkpoint must be converted to the Diffusers format by executing:

python stable-diffusion/train-scripts/compvis2diffusers.py

Additionally, download the vae, tokenizer, and text_encoder modules from here and place them in the folder stable-diffusion/diffusers_ckpt/ORI.
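
To sanity-check the assembled model, a minimal sketch like the following can load the converted pipeline. This assumes stable-diffusion/diffusers_ckpt/ORI follows the standard Diffusers folder layout (unet, vae, tokenizer, text_encoder, scheduler, model_index.json); the conversion script's exact output layout may differ.

```python
# Minimal sanity-check sketch (assumes a standard Diffusers folder layout).
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stable-diffusion/diffusers_ckpt/ORI",
    torch_dtype=torch.float16,
    safety_checker=None,
).to("cuda:0")

image = pipe("a photo of a jeep").images[0]
image.save("sanity_check.png")
```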

We provide unlearned model checkpoints for an object concept (e.g., Jeep) and an ID concept (e.g., Angelina Jolie); place them in the folder stable-diffusion/diffusers_ckpt. The download links are provided in the table below:

| UCE  | ESD  | FMN  | CA   |
| :--: | :--: | :--: | :--: |
| ckpt | ckpt | ckpt | ckpt |
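
How these checkpoints plug into the pipeline depends on their format. As a rough sketch, if a checkpoint is a Diffusers-format UNet state dict (an assumption; the actual files may use another layout), it could be swapped into the pipeline loaded above:

```python
# Hypothetical sketch: build an unlearned pipeline from the previous snippet.
# Assumption: the checkpoint is a plain UNet state dict saved with torch.save;
# the real files may use a different format.
import copy
import torch

unlearned_pipe = copy.deepcopy(pipe)  # keep the original pipeline intact
state_dict = torch.load(
    "stable-diffusion/diffusers_ckpt/uce_jolie.ckpt",  # hypothetical filename
    map_location="cpu",
)
unlearned_pipe.unet.load_state_dict(state_dict)
```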

For ID evaluation, we integrate celeb-detection-oss into our code. Please download the facial recognition model weights from here and place them in the folder src/tasks/utils/metrics/celeb-detection-oss/examples/resources/face_recognition.

Generate training data for restoration

Taking the restoration of Angelina Jolie as an example, we first generate images of her using the original Stable Diffusion (SD). The CSV file containing the prompts for image generation is located in the prompts folder. Generate images by running:

python src/execs/generate_dataset.py --prompts_path prompts/id/jolie.csv --concept jolie --save_path files/dataset/id --device cuda:0
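
This step essentially prompts the original SD model repeatedly; a rough standalone equivalent, reusing the pipeline from the earlier sketch, is shown below (the CSV column name "prompt" and the output naming are assumptions, not guaranteed by the actual script):

```python
# Rough standalone equivalent of the dataset-generation step.
# Assumptions: the CSV has a "prompt" column; file naming is arbitrary.
import csv
from pathlib import Path

out_dir = Path("files/dataset/id/jolie")
out_dir.mkdir(parents=True, exist_ok=True)

with open("prompts/id/jolie.csv") as f:
    for i, row in enumerate(csv.DictReader(f)):
        image = pipe(row["prompt"]).images[0]  # pipe: original SD pipeline
        image.save(out_dir / f"{i:04d}.png")
```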

(Optional but recommended) Use the classifiers to select well-generated images for the embedding search by running:

python src/execs/choose_dataset.py --concept_type 'id' --concept 'angelina jolie' --threshold 0.99
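
Conceptually, this filtering keeps only images the concept classifier scores above the threshold; a sketch of the idea, with a hypothetical classify_concept stand-in for the repo's actual classifier, looks like:

```python
# Conceptual sketch of classifier-based filtering. classify_concept() is a
# hypothetical stand-in for the repo's classifier, not a real API.
from pathlib import Path

def classify_concept(path: Path) -> float:
    """Return the classifier's confidence that the image shows the concept."""
    raise NotImplementedError  # replace with the actual classifier

threshold = 0.99
kept = [p for p in Path("files/dataset/id/jolie").glob("*.png")
        if classify_concept(p) >= threshold]
```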

Search for the embedding

The configuration files for search are located in the configs folder. Start the adversarial search by running:

python src/execs/attack.py --config-file configs/id/jolie/ORI_jolie.json --logger.name Adv_Search

To validate an obtained embedding, we feed it into the unlearned model (task.erase_ckpt in the configuration file) to generate images. We choose the embedding based on its performance on the unlearned model (in our experiments, we use the UCE-erased model for validation).
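
As a rough illustration (the saved embedding format here is an assumption; see test.ipynb for the repo's actual usage), a full prompt embedding of shape (1, 77, 768) can be fed to a Diffusers pipeline via prompt_embeds, using the unlearned pipeline from the earlier sketch:

```python
# Hypothetical validation sketch. Assumption: the saved adversarial embedding
# is a full prompt embedding of shape (1, 77, 768); the repo's actual format
# may differ (see test.ipynb).
import torch

adv_embeds = torch.load("files/embeddings/jolie.pt")  # hypothetical filename
image = unlearned_pipe(
    prompt_embeds=adv_embeds.to("cuda:0", torch.float16)
).images[0]
image.save("validation.png")
```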

Test the obtained embedding

We provide adversarial embeddings for restoring the object (e.g., Jeep) and ID (e.g., Angelina Jolie) concepts in the folder files/embeddings. A test demonstration is available in test.ipynb.

Acknowledgment

This repository is built upon the official codebase of UnlearnDiff; we thank the authors for their helpful work.

Citation

@misc{han2024probing,
      title={Probing Unlearned Diffusion Models: A Transferable Adversarial Attack Perspective}, 
      author={Xiaoxuan Han and Songlin Yang and Wei Wang and Yang Li and Jing Dong},
      year={2024},
      eprint={2404.19382},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
