Commit

Fix phrasing
chrisliu298 committed Mar 29, 2024
1 parent 3c4331f commit f601781
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion README.md
@@ -83,5 +83,5 @@ This repository tracks the latest research on machine unlearning in large langua
| Dataset | Description | Link |
| --------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ----------------------------------------- |
| [TOFU](https://huggingface.co/datasets/locuslab/TOFU) | A synthetic QA dataset of fictitious authors generated by GPT-4. The dataset comes with three splits of the retain/forget sets, including 99/1, 95/5, and 90/10 (in percentage). The dataset also includes questions about real authors and world facts to evaluate the loss of general knowledge after unlearning. | [arXiv](https://arxiv.org/abs/2401.06121) |
-| [WMDP](https://huggingface.co/datasets/cais/wmdp) | A benchmark for assessing hazardous knowledge in biology, chemistry, and cybersecurity, containing over 4000 multiple-choice questions with similar style to MMLU. It also comes with the corpora in the three domains, and auxiliary corpora in economics, law, and physics as a task to unlearn subsets of MMLU. | [arXiv](https://arxiv.org/abs/2403.03218) |
+| [WMDP](https://huggingface.co/datasets/cais/wmdp) | A benchmark for assessing hazardous knowledge in biology, chemistry, and cybersecurity, containing over 4000 multiple-choice questions with similar style to MMLU. It also comes with corpora in the three domains. | [arXiv](https://arxiv.org/abs/2403.03218) |
| [MMLU Subsets](https://huggingface.co/datasets/cais/mmlu) | A task proposed along with the WMDP dataset. The goal is to unlearn (retain) three categories in the [MMLU](https://arxiv.org/abs/2009.03300) dataset: economics (econometrics and others), physics (math and others), and law (jurisprudence and others). The task requires high-precision unlearning, because the retain sets are categories closely related to the unlearning categories. | [arXiv](https://arxiv.org/abs/2009.03300) |
