RLHF tool
SupritYoung committed Jul 21, 2023
1 parent c9ec965 commit 2d836a5
Showing 1 changed file (README.md) with 4 additions and 0 deletions.
@@ -324,6 +324,10 @@ We select 100 instances in the `alpaca_gpt4_zh` dataset to evaluate the fine-tuned models.

> FZ: freeze tuning, PT: P-Tuning V2 (we use `pre_seq_len=16` for a fair comparison with LoRA), Params: the percentage of trainable parameters.

### RLHF Labeling

In the RLHF stage, the k responses generated by the LLM must be manually ranked. If you do not have a suitable labeling tool, you can use [SupritYoung/RLHF-Label-Tool](https://github.com/SupritYoung/RLHF-Label-Tool/tree/master).
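As a rough illustration of how such rankings are typically consumed, the sketch below converts one manually ranked list of k responses into pairwise preference records. The field names (`prompt`, `chosen`, `rejected`) follow a common RLHF reward-modeling convention and are assumptions here, not a format mandated by this repository or by RLHF-Label-Tool.

```python
from itertools import combinations

def ranking_to_pairs(prompt, ranked_responses):
    """Turn a best-first ranked list of responses into preference pairs.

    ranked_responses is ordered best-first by the human annotator, so in
    every emitted pair the earlier response is the preferred one.
    """
    pairs = []
    for better, worse in combinations(ranked_responses, 2):
        pairs.append({"prompt": prompt, "chosen": better, "rejected": worse})
    return pairs

# k ranked responses yield k*(k-1)/2 preference pairs; for k=3 that is 3.
pairs = ranking_to_pairs(
    "How do I boil an egg?",
    ["Answer A (best)", "Answer B", "Answer C (worst)"],
)
```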

## Compared with Existing Implementations

- [THUDM/ChatGLM-6B](https://github.com/THUDM/ChatGLM-6B/tree/main/ptuning)
