
# Awesome Large Language Model Unlearning


This repository tracks the latest research on machine unlearning in large language models (LLMs). The goal is to offer a comprehensive list of papers, datasets, and resources relevant to the topic.

> [!NOTE]
> If your paper on LLM unlearning is missing, or if you find a mistake, typo, or outdated information, please open an issue and I will address it as soon as possible.
>
> If you want to add a new paper, feel free to open an issue or create a pull request.

## Table of Contents

- [Papers](#papers)
  - [Methods](#methods)
  - [Surveys and Position Papers](#surveys-and-position-papers)
- [Blog Posts](#blog-posts)
- [Datasets](#datasets)

## Papers

### Methods

| Paper | Author(s) | Year-Month | Venue | Code |
| --- | --- | --- | --- | --- |
| Offset Unlearning for Large Language Models | Huang et al. | 2024-04 | - | GitHub |
| Eraser: Jailbreaking Defense in Large Language Models via Unlearning Harmful Knowledge | Lu et al. | 2024-04 | - | - |
| Negative Preference Optimization: From Catastrophic Collapse to Effective Unlearning | Zhang et al. | 2024-04 | - | GitHub |
| Localizing Paragraph Memorization in Language Models | Stoehr et al. | 2024-03 | - | - |
| The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning | Li et al. | 2024-03 | - | GitHub |
| Dissecting Language Models: Machine Unlearning via Selective Pruning | Pochinkov and Schoots | 2024-03 | - | - |
| Second-Order Information Matters: Revisiting Machine Unlearning for Large Language Models | Gu et al. | 2024-03 | - | - |
| Ethos: Rectifying Language Models in Orthogonal Parameter Space | Gao et al. | 2024-03 | - | - |
| Towards Efficient and Effective Unlearning of Large Language Models for Recommendation | Wang et al. | 2024-03 | - | GitHub |
| Guardrail Baselines for Unlearning in LLMs | Thaker et al. | 2024-03 | ICLR 2024 SeT-LLM Workshop | - |
| Deciphering the Impact of Pretraining Data on Large Language Models through Machine Unlearning | Zhao et al. | 2024-02 | - | - |
| Unmemorization in Large Language Models via Self-Distillation and Deliberate Imagination | Dong et al. | 2024-02 | - | GitHub |
| Towards Safer Large Language Models through Machine Unlearning | Liu et al. | 2024-02 | - | GitHub |
| Selective Forgetting: Advancing Machine Unlearning Techniques and Evaluation in Language Models | Wang et al. | 2024-02 | - | - |
| Unlearnable Algorithms for In-context Learning | Muresanu et al. | 2024-02 | - | - |
| Machine Unlearning of Pre-trained Large Language Models | Yao et al. | 2024-02 | - | GitHub |
| Visual In-Context Learning for Large Vision-Language Models | Zhou et al. | 2024-02 | - | - |
| EFUF: Efficient Fine-grained Unlearning Framework for Mitigating Hallucinations in Multimodal Large Language Models | Xing et al. | 2024-02 | - | - |
| Unlearning Reveals the Influential Training Data of Language Models | Isonuma and Titov | 2024-01 | - | - |
| TOFU: A Task of Fictitious Unlearning for LLMs | Maini et al. | 2024-01 | - | GitHub |
| Large Language Model Unlearning | Yao et al. | 2023-10 | ICLR 2024 | GitHub |
| FairSISA: Ensemble Post-Processing to Improve Fairness of Unlearning in LLMs | Kadhe et al. | 2023-12 | NeurIPS 2023 SoLaR Workshop | - |
| Making Harmful Behaviors Unlearnable for Large Language Models | Zhou et al. | 2023-11 | - | - |
| Forgetting before Learning: Utilizing Parametric Arithmetic for Knowledge Updating in Large Language Models | Ni et al. | 2023-11 | - | - |
| Who's Harry Potter? Approximate Unlearning in LLMs | Eldan and Russinovich | 2023-10 | - | - |
| DEPN: Detecting and Editing Privacy Neurons in Pretrained Language Models | Wu et al. | 2023-10 | EMNLP 2023 | GitHub |
| Unlearn What You Want to Forget: Efficient Unlearning for LLMs | Chen and Yang | 2023-10 | EMNLP 2023 | GitHub |
| In-Context Unlearning: Language Models as Few Shot Unlearners | Pawelczyk et al. | 2023-10 | - | - |
| Forgetting Private Textual Sequences in Language Models via Leave-One-Out Ensemble | Liu and Kalinli | 2023-09 | - | - |
| Can Sensitive Information Be Deleted From LLMs? Objectives for Defending Against Extraction Attacks | Patil et al. | 2023-09 | - | GitHub |
| Separate the Wheat from the Chaff: Model Deficiency Unlearning via Parameter-Efficient Module Operation | Hu et al. | 2023-08 | AAAI 2024 | GitHub |
| Unlearning Bias in Language Models by Partitioning Gradients | Yu et al. | 2023-07 | ACL (Findings) 2023 | GitHub |
| Make Text Unlearnable: Exploiting Effective Patterns to Protect Personal Data | Li et al. | 2023-07 | - | - |
| What can we learn from Data Leakage and Unlearning for Law? | Borkar | 2023-07 | - | - |
| LEACE: Perfect linear concept erasure in closed form | Belrose et al. | 2023-06 | NeurIPS 2023 | GitHub |
| Composing Parameter-Efficient Modules with Arithmetic Operations | Zhang et al. | 2023-06 | NeurIPS 2023 | GitHub |
| KGA: A General Machine Unlearning Framework Based on Knowledge Gap Alignment | Wang et al. | 2023-05 | - | GitHub |
| Editing Models with Task Arithmetic | Ilharco et al. | 2022-12 | ICLR 2023 | GitHub |
| Privacy Adhering Machine Un-learning in NLP | Kumar et al. | 2022-12 | - | - |
| The CRINGE Loss: Learning what language not to model | Adolphs et al. | 2022-11 | - | - |
| Knowledge Unlearning for Mitigating Privacy Risks in Language Models | Jang et al. | 2022-10 | - | GitHub |
| Quark: Controllable Text Generation with Reinforced Unlearning | Lu et al. | 2022-05 | NeurIPS 2022 | GitHub |
| DExperts: Decoding-Time Controlled Text Generation with Experts and Anti-Experts | Liu et al. | 2021-05 | ACL 2021 | GitHub |
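
Many of the methods above instantiate the same basic recipe: raise the loss on a forget set (gradient ascent) while keeping the loss low on a retain set, as in, for example, Large Language Model Unlearning (Yao et al.) listed above. The sketch below is a minimal illustration of one such update step with Hugging Face Transformers; the model name, example strings, and unit loss weights are placeholders, not the exact procedure of any listed paper.

```python
# Illustrative sketch only: one gradient-ascent unlearning step with a
# retain-set regularizer. Model name, texts, and weights are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; substitute the model you want to unlearn
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_name)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

def lm_loss(texts):
    # Next-token cross-entropy over a small batch of strings.
    # (For real padded batches, mask pad positions in the labels.)
    batch = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    return model(**batch, labels=batch["input_ids"]).loss

forget_batch = ["Example text the model should forget."]
retain_batch = ["Example text whose behavior should be preserved."]

# Ascend on the forget data (negated loss) and descend on the retain data.
loss = -lm_loss(forget_batch) + lm_loss(retain_batch)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```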

### Surveys and Position Papers

| Paper | Author(s) | Year-Month | Venue |
| --- | --- | --- | --- |
| Digital Forgetting in Large Language Models: A Survey of Unlearning Methods | Blanco-Justicia et al. | 2024-04 | - |
| Machine Unlearning for Traditional Models and Large Language Models: A Short Survey | Xu | 2024-04 | - |
| The Frontier of Data Erasure: Machine Unlearning for Large Language Models | Qu et al. | 2024-03 | - |
| Rethinking Machine Unlearning for Large Language Models | Liu et al. | 2024-02 | - |
| Eight Methods to Evaluate Robust Unlearning in LLMs | Lynch et al. | 2024-02 | - |
| Knowledge Unlearning for LLMs: Tasks, Methods, and Challenges | Si et al. | 2023-11 | - |
| Right to be Forgotten in the Era of Large Language Models: Implications, Challenges, and Solutions | Zhang et al. | 2023-07 | - |

## Blog Posts

| Blog | Author(s) |
| --- | --- |
| Deep Forgetting & Unlearning for Safely-Scoped LLMs | Stephen Casper |

## Datasets

| Dataset | Description | Link |
| --- | --- | --- |
| TOFU | A synthetic QA dataset about fictitious authors, generated by GPT-4. The dataset comes with three retain/forget splits (99/1, 95/5, and 90/10, in percent) and also includes questions about real authors and world facts to evaluate the loss of general knowledge after unlearning. | arXiv, Hugging Face |
| WMDP | A benchmark for assessing hazardous knowledge in biology, chemistry, and cybersecurity, containing over 4,000 multiple-choice questions in a style similar to MMLU. It also comes with corpora for the three domains. | arXiv, Hugging Face |
| MMLU Subsets | A task proposed alongside the WMDP benchmark: unlearn three MMLU categories (economics, physics, and law) while retaining closely related categories (econometrics, math, and jurisprudence, respectively) and the rest of MMLU. Because the retain categories are so close to the forget categories, the task requires high-precision unlearning. | arXiv, Hugging Face |
| arXiv and GitHub corpus | A dataset for evaluating approximate unlearning algorithms for pre-trained LLMs. It contains forget and retain splits for each category and comes with both in-distribution and general retain sets. | arXiv, Hugging Face |
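
Both TOFU and WMDP are distributed through the Hugging Face Hub, so they can be pulled directly with the `datasets` library. The snippet below is a minimal sketch that assumes the repository ids `locuslab/TOFU` and `cais/wmdp` and the config, split, and field names shown; treat these identifiers as assumptions and verify them against the dataset cards.

```python
# Minimal sketch: loading the unlearning benchmarks above from the Hugging Face Hub.
# Repository ids, config names, and field names are assumptions -- check the dataset cards.
from datasets import load_dataset

# TOFU: fictitious-author QA with paired forget/retain splits.
tofu_forget = load_dataset("locuslab/TOFU", "forget10")["train"]  # assumed 10% forget config
tofu_retain = load_dataset("locuslab/TOFU", "retain90")["train"]  # assumed matching retain config

# WMDP: hazardous-knowledge multiple-choice questions, one config per domain.
wmdp_bio = load_dataset("cais/wmdp", "wmdp-bio")["test"]          # assumed config and split

print(tofu_forget[0]["question"], "->", tofu_forget[0]["answer"])
print(wmdp_bio[0]["question"], wmdp_bio[0]["choices"])
```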
