This repository tracks the latest research on machine unlearning in large language models (LLMs). The goal is to offer a comprehensive list of papers, datasets, and resources relevant to the topic.
> [!NOTE]
> If you believe your paper on LLM unlearning is missing, or if you spot a mistake, typo, or outdated information, please open an issue and I will address it as soon as possible.
> If you want to add a new paper, feel free to either open an issue or create a pull request.
Paper | Author(s) | Year-Month | Venue |
---|---|---|---|
Digital Forgetting in Large Language Models: A Survey of Unlearning Methods | Blanco-Justicia et al. | 2024-04 | - |
Machine Unlearning for Traditional Models and Large Language Models: A Short Survey | Xu | 2024-04 | - |
The Frontier of Data Erasure: Machine Unlearning for Large Language Models | Qu et al. | 2024-03 | - |
Rethinking Machine Unlearning for Large Language Models | Liu et al. | 2024-02 | - |
Eight Methods to Evaluate Robust Unlearning in LLMs | Lynch et al. | 2024-02 | - |
Knowledge Unlearning for LLMs: Tasks, Methods, and Challenges | Si et al. | 2023-11 | - |
Right to be Forgotten in the Era of Large Language Models: Implications, Challenges, and Solutions | Zhang et al. | 2023-07 | - |
Blog | Author(s) |
---|---|
Deep Forgetting & Unlearning for Safely-Scoped LLMs | Stephen Casper |
Dataset | Description | Link |
---|---|---|
TOFU | A synthetic QA dataset of fictitious authors generated by GPT-4. The dataset comes with three retain/forget splits: 99/1, 95/5, and 90/10 (in percent). It also includes questions about real authors and world facts to evaluate the loss of general knowledge after unlearning. | arXiv, Hugging Face
WMDP | A benchmark for assessing hazardous knowledge in biology, chemistry, and cybersecurity, containing over 4,000 multiple-choice questions in a style similar to MMLU. It also comes with corpora for the three domains. | arXiv, Hugging Face
MMLU Subsets | A task proposed along with the WMDP dataset. The goal is to unlearn three MMLU categories while retaining closely related ones: economics (retaining econometrics and others), physics (retaining math and others), and law (retaining jurisprudence and others). The task requires high-precision unlearning, because the retain sets are closely related to the forget categories. | arXiv, Hugging Face
arXiv and GitHub corpus | A dataset for evaluating approximate unlearning algorithms on pre-trained LLMs. It contains forget and retain splits for each category, and comes with both in-distribution and general retain sets. | arXiv, Hugging Face
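As an illustration of how the TOFU splits above are typically paired up and accessed, the sketch below maps each forget percentage to its matching retain config and loads both with the Hugging Face `datasets` library. The Hub path `locuslab/TOFU` and the config names (`forget01`/`retain99`, etc.) are assumptions inferred from the split percentages described above; verify them against the dataset card before relying on them.

```python
# Sketch: pairing TOFU forget/retain configs by forget percentage.
# Config names and Hub path are assumptions -- check the TOFU dataset card.

# The three splits described above: forget x% / retain (100 - x)%.
TOFU_SPLITS = {
    1: ("forget01", "retain99"),
    5: ("forget05", "retain95"),
    10: ("forget10", "retain90"),
}

def tofu_config_names(forget_pct: int) -> tuple[str, str]:
    """Return the (forget, retain) config names for a given forget percentage."""
    if forget_pct not in TOFU_SPLITS:
        raise ValueError(f"TOFU defines 1/5/10% forget splits, got {forget_pct}")
    return TOFU_SPLITS[forget_pct]

if __name__ == "__main__":
    # Requires `pip install datasets` and network access to the Hugging Face Hub.
    from datasets import load_dataset

    forget_cfg, retain_cfg = tofu_config_names(5)
    forget = load_dataset("locuslab/TOFU", forget_cfg, split="train")
    retain = load_dataset("locuslab/TOFU", retain_cfg, split="train")
    print(len(forget), len(retain))
```

A similar pattern applies to WMDP, which ships its multiple-choice questions and domain corpora as separate configs on the Hub.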