Commit
Merge pull request hiyouga#331 from SupritYoung/main
add RLHF labeling tool to README.md
hiyouga committed Jul 21, 2023
2 parents eb26e3a + 4adfc82 commit dbe68ec
Showing 2 changed files with 8 additions and 0 deletions.
4 changes: 4 additions & 0 deletions README.md
@@ -324,6 +324,10 @@ We select 100 instances in the `alpaca_gpt4_zh` dataset to evaluate the fine-tun

> FZ: freeze tuning, PT: P-Tuning V2 (we use `pre_seq_len=16` for a fair comparison with LoRA), Params: the percentage of trainable parameters.
### RLHF Labeling

In the RLHF stage, the k responses generated by the LLM must be ranked manually. If you do not have a good labeling tool, you can use [SupritYoung/RLHF-Label-Tool](https://github.com/SupritYoung/RLHF-Label-Tool/tree/master).
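
A ranking of k responses is typically turned into pairwise (chosen, rejected) preferences for reward-model training, yielding C(k, 2) pairs per prompt. Below is a minimal Python sketch of that conversion; the field names `prompt`, `chosen`, and `rejected` are illustrative assumptions, not the exact dataset format this repository expects.

```python
from itertools import combinations

def ranked_to_pairs(prompt, ranked_responses):
    """Turn k responses, ordered best-first, into pairwise
    preference examples (hypothetical field names)."""
    pairs = []
    # combinations() preserves list order, so `better` always
    # precedes `worse` in the best-first ranking.
    for better, worse in combinations(ranked_responses, 2):
        pairs.append({"prompt": prompt, "chosen": better, "rejected": worse})
    return pairs

# k = 3 ranked responses yield C(3, 2) = 3 preference pairs.
pairs = ranked_to_pairs("Explain RLHF briefly.",
                        ["best answer", "okay answer", "weak answer"])
print(len(pairs))  # 3
```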

## Compared with Existing Implementations

- [THUDM/ChatGLM-6B](https://github.com/THUDM/ChatGLM-6B/tree/main/ptuning)
4 changes: 4 additions & 0 deletions README_zh.md
@@ -329,6 +329,10 @@ python src/export_model.py \

> FZ: Freeze tuning, PT: P-Tuning V2 (we use `pre_seq_len=16` for a fair comparison with LoRA), Trainable params: the percentage of trainable parameters among all parameters.
### RLHF Labeling Tool

In the RLHF stage, the k responses generated by the model must be ranked manually. If no good labeling tool is available, you can use [SupritYoung/RLHF-Label-Tool](https://github.com/SupritYoung/RLHF-Label-Tool/tree/master) for labeling.

## Comparison with Similar Existing Projects

- [THUDM/ChatGLM-6B](https://github.com/THUDM/ChatGLM-6B/tree/main/ptuning)
