Commit
Merge pull request hiyouga#331 from SupritYoung/main
add RLHF labeling tool to README.md
hiyouga committed Jul 21, 2023
2 parents eb26e3a + 4adfc82 commit dbe68ec
Showing 2 changed files with 8 additions and 0 deletions.
4 changes: 4 additions & 0 deletions README.md
@@ -324,6 +324,10 @@ We select 100 instances in the `alpaca_gpt4_zh` dataset to evaluate the fine-tun

> FZ: freeze tuning, PT: P-Tuning V2 (we use `pre_seq_len=16` for a fair comparison with LoRA), Params: the percentage of trainable parameters.
### RLHF Labeling

In the RLHF stage, the k responses generated by the LLM must be ranked manually. If you do not have a good labeling tool, you can use [SupritYoung/RLHF-Label-Tool](https://github.com/SupritYoung/RLHF-Label-Tool/tree/master).
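
A ranking of k responses is typically turned into pairwise (chosen, rejected) preferences for reward-model training, yielding C(k, 2) pairs per prompt. Below is a minimal Python sketch of that conversion; the field names `prompt`, `chosen`, and `rejected` are illustrative assumptions, not the exact dataset format this repository expects.

```python
from itertools import combinations

def ranked_to_pairs(prompt, ranked_responses):
    """Turn k responses, ordered best-first, into pairwise
    preference examples (hypothetical field names)."""
    pairs = []
    # combinations() preserves list order, so `better` always
    # precedes `worse` in the best-first ranking.
    for better, worse in combinations(ranked_responses, 2):
        pairs.append({"prompt": prompt, "chosen": better, "rejected": worse})
    return pairs

# k = 3 ranked responses yield C(3, 2) = 3 preference pairs.
pairs = ranked_to_pairs("Explain RLHF briefly.",
                        ["best answer", "okay answer", "weak answer"])
print(len(pairs))  # 3
```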

## Compared with Existing Implementations

- [THUDM/ChatGLM-6B](https://github.com/THUDM/ChatGLM-6B/tree/main/ptuning)
4 changes: 4 additions & 0 deletions README_zh.md
@@ -329,6 +329,10 @@ python src/export_model.py \

> FZ: Freeze tuning, PT: P-Tuning V2 (we use `pre_seq_len=16` for a fair comparison with LoRA), Trainable params: the percentage of trainable parameters among all parameters.
### RLHF Labeling Tool

In the RLHF stage, the k responses generated by the model must be ranked manually. If no good labeling tool is available, you can use [SupritYoung/RLHF-Label-Tool](https://github.com/SupritYoung/RLHF-Label-Tool/tree/master) for labeling.

## Comparison with Similar Existing Projects

- [THUDM/ChatGLM-6B](https://github.com/THUDM/ChatGLM-6B/tree/main/ptuning)
