
# Awesome Large Language Model Unlearning


This repository tracks the latest research on machine unlearning in large language models (LLMs). The goal is to offer a comprehensive list of papers, datasets, and resources relevant to the topic.

> [!NOTE]
> If your paper on LLM unlearning is missing, or if you find a mistake, typo, or outdated information, please open an issue and I will address it as soon as possible.
>
> If you want to add a new paper, feel free to open an issue or create a pull request.

## Table of Contents

- [Papers](#papers)
  - [Methods](#methods)
  - [Surveys and Position Papers](#surveys-and-position-papers)
- [Blog Posts](#blog-posts)
- [Datasets](#datasets)

## Papers

### Methods

| Paper | Author(s) | Year-Month | Venue | Code |
| --- | --- | --- | --- | --- |
| Offset Unlearning for Large Language Models | Huang et al. | 2024-04 | - | GitHub |
| Eraser: Jailbreaking Defense in Large Language Models via Unlearning Harmful Knowledge | Lu et al. | 2024-04 | - | - |
| Negative Preference Optimization: From Catastrophic Collapse to Effective Unlearning | Zhang et al. | 2024-04 | - | GitHub |
| Localizing Paragraph Memorization in Language Models | Stoehr et al. | 2024-03 | - | - |
| The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning | Li et al. | 2024-03 | - | GitHub |
| Dissecting Language Models: Machine Unlearning via Selective Pruning | Pochinkov and Schoots | 2024-03 | - | - |
| Second-Order Information Matters: Revisiting Machine Unlearning for Large Language Models | Gu et al. | 2024-03 | - | - |
| Ethos: Rectifying Language Models in Orthogonal Parameter Space | Gao et al. | 2024-03 | - | - |
| Towards Efficient and Effective Unlearning of Large Language Models for Recommendation | Wang et al. | 2024-03 | - | GitHub |
| Guardrail Baselines for Unlearning in LLMs | Thaker et al. | 2024-03 | ICLR 2024 SeT-LLM Workshop | - |
| Deciphering the Impact of Pretraining Data on Large Language Models through Machine Unlearning | Zhao et al. | 2024-02 | - | - |
| Unmemorization in Large Language Models via Self-Distillation and Deliberate Imagination | Dong et al. | 2024-02 | - | GitHub |
| Towards Safer Large Language Models through Machine Unlearning | Liu et al. | 2024-02 | - | GitHub |
| Selective Forgetting: Advancing Machine Unlearning Techniques and Evaluation in Language Models | Wang et al. | 2024-02 | - | - |
| Unlearnable Algorithms for In-context Learning | Muresanu et al. | 2024-02 | - | - |
| Machine Unlearning of Pre-trained Large Language Models | Yao et al. | 2024-02 | - | GitHub |
| Visual In-Context Learning for Large Vision-Language Models | Zhou et al. | 2024-02 | - | - |
| EFUF: Efficient Fine-grained Unlearning Framework for Mitigating Hallucinations in Multimodal Large Language Models | Xing et al. | 2024-02 | - | - |
| Unlearning Reveals the Influential Training Data of Language Models | Isonuma and Titov | 2024-01 | - | - |
| TOFU: A Task of Fictitious Unlearning for LLMs | Maini et al. | 2024-01 | - | GitHub |
| Large Language Model Unlearning | Yao et al. | 2023-10 | ICLR 2024 | GitHub |
| FairSISA: Ensemble Post-Processing to Improve Fairness of Unlearning in LLMs | Kadhe et al. | 2023-12 | NeurIPS 2023 SoLaR Workshop | - |
| Making Harmful Behaviors Unlearnable for Large Language Models | Zhou et al. | 2023-11 | - | - |
| Forgetting before Learning: Utilizing Parametric Arithmetic for Knowledge Updating in Large Language Models | Ni et al. | 2023-11 | - | - |
| Who's Harry Potter? Approximate Unlearning in LLMs | Eldan and Russinovich | 2023-10 | - | - |
| DEPN: Detecting and Editing Privacy Neurons in Pretrained Language Models | Wu et al. | 2023-10 | EMNLP 2023 | GitHub |
| Unlearn What You Want to Forget: Efficient Unlearning for LLMs | Chen and Yang | 2023-10 | EMNLP 2023 | GitHub |
| In-Context Unlearning: Language Models as Few Shot Unlearners | Pawelczyk et al. | 2023-10 | - | - |
| Forgetting Private Textual Sequences in Language Models via Leave-One-Out Ensemble | Liu and Kalinli | 2023-09 | - | - |
| Can Sensitive Information Be Deleted From LLMs? Objectives for Defending Against Extraction Attacks | Patil et al. | 2023-09 | - | GitHub |
| Separate the Wheat from the Chaff: Model Deficiency Unlearning via Parameter-Efficient Module Operation | Hu et al. | 2023-08 | AAAI 2024 | GitHub |
| Unlearning Bias in Language Models by Partitioning Gradients | Yu et al. | 2023-07 | ACL (Findings) 2023 | GitHub |
| Make Text Unlearnable: Exploiting Effective Patterns to Protect Personal Data | Li et al. | 2023-07 | - | - |
| What can we learn from Data Leakage and Unlearning for Law? | Borkar | 2023-07 | - | - |
| LEACE: Perfect linear concept erasure in closed form | Belrose et al. | 2023-06 | NeurIPS 2023 | GitHub |
| Composing Parameter-Efficient Modules with Arithmetic Operations | Zhang et al. | 2023-06 | NeurIPS 2023 | GitHub |
| KGA: A General Machine Unlearning Framework Based on Knowledge Gap Alignment | Wang et al. | 2023-05 | - | GitHub |
| Editing Models with Task Arithmetic | Ilharco et al. | 2022-12 | ICLR 2023 | GitHub |
| Privacy Adhering Machine Un-learning in NLP | Kumar et al. | 2022-12 | - | - |
| The CRINGE Loss: Learning what language not to model | Adolphs et al. | 2022-11 | - | - |
| Knowledge Unlearning for Mitigating Privacy Risks in Language Models | Jang et al. | 2022-10 | - | GitHub |
| Quark: Controllable Text Generation with Reinforced Unlearning | Lu et al. | 2022-05 | NeurIPS 2022 | GitHub |
| DExperts: Decoding-Time Controlled Text Generation with Experts and Anti-Experts | Liu et al. | 2021-05 | ACL 2021 | GitHub |
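
Many of the methods above instantiate the same basic recipe: raise the loss on a forget set (gradient ascent) while keeping the loss low on a retain set, as in, for example, Large Language Model Unlearning (Yao et al.) listed above. The sketch below is a minimal illustration of one such update step with Hugging Face Transformers; the model name, example strings, and unit loss weights are placeholders, not the exact procedure of any listed paper.

```python
# Illustrative sketch only: one gradient-ascent unlearning step with a
# retain-set regularizer. Model name, texts, and weights are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; substitute the model you want to unlearn
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_name)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

def lm_loss(texts):
    # Next-token cross-entropy over a small batch of strings.
    # (For real padded batches, mask pad positions in the labels.)
    batch = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    return model(**batch, labels=batch["input_ids"]).loss

forget_batch = ["Example text the model should forget."]
retain_batch = ["Example text whose behavior should be preserved."]

# Ascend on the forget data (negated loss) and descend on the retain data.
loss = -lm_loss(forget_batch) + lm_loss(retain_batch)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```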

### Surveys and Position Papers

| Paper | Author(s) | Year-Month | Venue |
| --- | --- | --- | --- |
| Digital Forgetting in Large Language Models: A Survey of Unlearning Methods | Blanco-Justicia et al. | 2024-04 | - |
| Machine Unlearning for Traditional Models and Large Language Models: A Short Survey | Xu | 2024-04 | - |
| The Frontier of Data Erasure: Machine Unlearning for Large Language Models | Qu et al. | 2024-03 | - |
| Rethinking Machine Unlearning for Large Language Models | Liu et al. | 2024-02 | - |
| Eight Methods to Evaluate Robust Unlearning in LLMs | Lynch et al. | 2024-02 | - |
| Knowledge Unlearning for LLMs: Tasks, Methods, and Challenges | Si et al. | 2023-11 | - |
| Right to be Forgotten in the Era of Large Language Models: Implications, Challenges, and Solutions | Zhang et al. | 2023-07 | - |

## Blog Posts

| Blog | Author(s) |
| --- | --- |
| Deep Forgetting & Unlearning for Safely-Scoped LLMs | Stephen Casper |

## Datasets

| Dataset | Description | Link |
| --- | --- | --- |
| TOFU | A synthetic QA dataset about fictitious authors, generated by GPT-4. The dataset comes with three retain/forget splits (99/1, 95/5, and 90/10, in percent) and also includes questions about real authors and world facts to evaluate the loss of general knowledge after unlearning. | arXiv, Hugging Face |
| WMDP | A benchmark for assessing hazardous knowledge in biology, chemistry, and cybersecurity, containing over 4,000 multiple-choice questions in a style similar to MMLU. It also comes with corpora for the three domains. | arXiv, Hugging Face |
| MMLU Subsets | A task proposed alongside the WMDP benchmark: unlearn three MMLU categories (economics, physics, and law) while retaining closely related categories (econometrics, math, and jurisprudence, respectively) and the rest of MMLU. Because the retain categories are so close to the forget categories, the task requires high-precision unlearning. | arXiv, Hugging Face |
| arXiv and GitHub corpus | A dataset for evaluating approximate unlearning algorithms for pre-trained LLMs. It contains forget and retain splits for each category and comes with both in-distribution and general retain sets. | arXiv, Hugging Face |
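
Both TOFU and WMDP are distributed through the Hugging Face Hub, so they can be pulled directly with the `datasets` library. The snippet below is a minimal sketch that assumes the repository ids `locuslab/TOFU` and `cais/wmdp` and the config, split, and field names shown; treat these identifiers as assumptions and verify them against the dataset cards.

```python
# Minimal sketch: loading the unlearning benchmarks above from the Hugging Face Hub.
# Repository ids, config names, and field names are assumptions -- check the dataset cards.
from datasets import load_dataset

# TOFU: fictitious-author QA with paired forget/retain splits.
tofu_forget = load_dataset("locuslab/TOFU", "forget10")["train"]  # assumed 10% forget config
tofu_retain = load_dataset("locuslab/TOFU", "retain90")["train"]  # assumed matching retain config

# WMDP: hazardous-knowledge multiple-choice questions, one config per domain.
wmdp_bio = load_dataset("cais/wmdp", "wmdp-bio")["test"]          # assumed config and split

print(tofu_forget[0]["question"], "->", tofu_forget[0]["answer"])
print(wmdp_bio[0]["question"], wmdp_bio[0]["choices"])
```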
