RLHF tool
SupritYoung committed Jul 21, 2023
1 parent c9ec965 commit 2d836a5
Showing 1 changed file (README.md) with 4 additions and 0 deletions.
@@ -324,6 +324,10 @@ We select 100 instances in the `alpaca_gpt4_zh` dataset to evaluate the fine-tuned models.

> FZ: freeze tuning, PT: P-Tuning V2 (we use `pre_seq_len=16` for a fair comparison with LoRA), Params: the percentage of trainable parameters.

### RLHF Labeling

In the RLHF stage, the k responses generated by the LLM must be manually ranked. If you do not have a suitable labeling tool, you can use [SupritYoung/RLHF-Label-Tool](https://github.com/SupritYoung/RLHF-Label-Tool/tree/master).
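As a rough illustration of how such rankings are typically consumed, the sketch below converts one manually ranked list of k responses into pairwise preference records. The field names (`prompt`, `chosen`, `rejected`) follow a common RLHF reward-modeling convention and are assumptions here, not a format mandated by this repository or by RLHF-Label-Tool.

```python
from itertools import combinations

def ranking_to_pairs(prompt, ranked_responses):
    """Turn a best-first ranked list of responses into preference pairs.

    ranked_responses is ordered best-first by the human annotator, so in
    every emitted pair the earlier response is the preferred one.
    """
    pairs = []
    for better, worse in combinations(ranked_responses, 2):
        pairs.append({"prompt": prompt, "chosen": better, "rejected": worse})
    return pairs

# k ranked responses yield k*(k-1)/2 preference pairs; for k=3 that is 3.
pairs = ranking_to_pairs(
    "How do I boil an egg?",
    ["Answer A (best)", "Answer B", "Answer C (worst)"],
)
```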

## Compared with Existing Implementations

- [THUDM/ChatGLM-6B](https://github.com/THUDM/ChatGLM-6B/tree/main/ptuning)
