Commit
add open assistant dataset
hiyouga committed Jun 28, 2023
1 parent a136eb4 commit 4c5717c
Showing 7 changed files with 435,797 additions and 2 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -317,7 +317,7 @@ We select 100 instances in the `alpaca_gpt4_zh` dataset to evaluate the fine-tun
 - [x] Fine-tuning the quantized model.
 - [x] Writing a guidebook about how to fine-tune ChatGLM with this framework.
 - [ ] Combining with state-of-the-art model editing algorithms. (*e.g. [MEND](https://arxiv.org/abs/2110.11309)*)
-- [ ] Incorporating the [OpenAssistant Conversations Dataset](https://huggingface.co/datasets/OpenAssistant/oasst1) for SFT and alignment.
+- [x] Incorporating the [OpenAssistant Conversations Dataset](https://huggingface.co/datasets/OpenAssistant/oasst1) for SFT and alignment.
 - [ ] Incorporating the high quality Chinese instruction dataset [COIG](https://huggingface.co/datasets/BAAI/COIG).

 ## License
2 changes: 1 addition & 1 deletion README_zh.md
@@ -324,7 +324,7 @@ python src/export_model.py \
 - [x] Quantized fine-tuning.
 - [x] Writing a guidebook on fine-tuning the ChatGLM model with this framework.
 - [ ] Combining with model editing techniques. (e.g. [MEND](https://arxiv.org/abs/2110.11309))
-- [ ] Incorporating the [OpenAssistant Conversations Dataset](https://huggingface.co/datasets/OpenAssistant/oasst1) for supervised fine-tuning and intent alignment.
+- [x] Incorporating the [OpenAssistant Conversations Dataset](https://huggingface.co/datasets/OpenAssistant/oasst1) for supervised fine-tuning and intent alignment.
 - [ ] Incorporating the high-quality open-source Chinese instruction dataset [COIG](https://huggingface.co/datasets/BAAI/COIG)

 ## License
40 changes: 40 additions & 0 deletions data/dataset_info.json
@@ -77,6 +77,26 @@
       "history": "history"
     }
   },
+  "oaast_sft": {
+    "file_name": "oaast_sft.json",
+    "file_sha1": "08912e34fb165db137d3436db4c35321e33b28d1",
+    "columns": {
+      "prompt": "instruction",
+      "query": "input",
+      "response": "output",
+      "history": "history"
+    }
+  },
+  "oaast_sft_zh": {
+    "file_name": "oaast_sft_zh.json",
+    "file_sha1": "e0a2e7e8eff355434ada6c9b7f70bb915f941dd4",
+    "columns": {
+      "prompt": "instruction",
+      "query": "input",
+      "response": "output",
+      "history": "history"
+    }
+  },
   "example": {
     "script_url": "example_dataset",
     "columns": {
@@ -106,5 +126,25 @@
       "response": "output",
       "history": "history"
     }
+  },
+  "oaast_rm": {
+    "file_name": "oaast_rm.json",
+    "file_sha1": "622d420e9b70003b210618253bd3d9d2891d86cb",
+    "columns": {
+      "prompt": "instruction",
+      "query": "input",
+      "response": "output",
+      "history": "history"
+    }
+  },
+  "oaast_rm_zh": {
+    "file_name": "oaast_rm_zh.json",
+    "file_sha1": "1065af1f3784dd61be5e79713a35f427b713a232",
+    "columns": {
+      "prompt": "instruction",
+      "query": "input",
+      "response": "output",
+      "history": "history"
+    }
   }
 }
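
Each new entry carries a `columns` block that appears to map a framework-side field name (`prompt`, `query`, `response`, `history`) to the key used inside the JSON data file (`instruction`, `input`, `output`, `history`). The sketch below shows one way such a mapping could be applied to a single record. The `remap_record` helper and the sample record are hypothetical; the diff shows neither the loader code nor the contents of `oaast_sft.json`.

```python
import json

# Excerpt of the "oaast_sft" entry added to data/dataset_info.json in this commit.
DATASET_INFO = json.loads("""
{
  "oaast_sft": {
    "file_name": "oaast_sft.json",
    "file_sha1": "08912e34fb165db137d3436db4c35321e33b28d1",
    "columns": {
      "prompt": "instruction",
      "query": "input",
      "response": "output",
      "history": "history"
    }
  }
}
""")


def remap_record(record, columns):
    """Hypothetical helper: rename file-level keys to framework-level names.

    `columns` maps framework name -> key in the data file; keys missing
    from the record come back as None.
    """
    return {role: record.get(column) for role, column in columns.items()}


# Hypothetical record, made up for illustration only.
sample = {
    "instruction": "What is SFT?",
    "input": "",
    "output": "Supervised fine-tuning.",
    "history": [],
}

remapped = remap_record(sample, DATASET_INFO["oaast_sft"]["columns"])
print(remapped)
```

Because all four new entries share the same `columns` block, the same helper would cover the SFT and RM files alike under this assumption.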
232,522 changes: 232,522 additions & 0 deletions data/oaast_rm.json

Large diffs are not rendered by default.

9,484 changes: 9,484 additions & 0 deletions data/oaast_rm_zh.json

Large diffs are not rendered by default.

186,163 changes: 186,163 additions & 0 deletions data/oaast_sft.json

Large diffs are not rendered by default.

7,586 changes: 7,586 additions & 0 deletions data/oaast_sft_zh.json

Large diffs are not rendered by default.
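
The four data files listed above are the ones whose digests are recorded under `file_sha1` in `data/dataset_info.json`. Assuming that field holds the SHA-1 of the file's raw bytes (the name suggests this, but the diff does not confirm it), a local checkout could be checked with a sketch like the following. `sha1_of_file` and `verify_datasets` are hypothetical helpers, not functions from the repository.

```python
import hashlib
import json
from pathlib import Path


def sha1_of_file(path, chunk_size=1 << 20):
    """Stream the file in chunks so large datasets need not fit in memory."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_datasets(data_dir):
    """Return {dataset_name: bool} for entries that name a local file and a digest.

    Assumption: `file_sha1` is the SHA-1 hex digest of the file's bytes.
    """
    data_dir = Path(data_dir)
    info = json.loads((data_dir / "dataset_info.json").read_text(encoding="utf-8"))
    results = {}
    for name, entry in info.items():
        file_name = entry.get("file_name")
        expected = entry.get("file_sha1")
        if not file_name or not expected:
            continue  # skip entries without a local file, e.g. script-based ones
        path = data_dir / file_name
        results[name] = path.is_file() and sha1_of_file(path) == expected
    return results
```

Calling `verify_datasets("data")` from the repository root would then report, per dataset name, whether the file exists and matches its recorded digest. An entry with an empty `file_name` is skipped rather than reported as a mismatch.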
