update readme
hiyouga committed Jul 21, 2023
1 parent dbe68ec commit fda94d5
Showing 2 changed files with 4 additions and 4 deletions.
4 changes: 2 additions & 2 deletions README.md
@@ -324,9 +324,9 @@ We select 100 instances in the `alpaca_gpt4_zh` dataset to evaluate the fine-tun

> FZ: freeze tuning, PT: P-Tuning V2 (we use `pre_seq_len=16` for fair comparison with LoRA), Params: the percentage of trainable parameters.
-### RLHF Labeling
+## Projects

-In the RLHF stage, it is necessary to manually rank the k responses generated by the LLM. If there is no good labeling tool, you can choose [SupritYoung/RLHF-Label-Tool](https://github.com/SupritYoung/RLHF-Label-Tool/tree/master).
+- [SupritYoung/RLHF-Label-Tool](https://github.com/SupritYoung/RLHF-Label-Tool/tree/master): A tool for ranking the responses of LLMs to generate annotated samples used in RLHF training.

## Compared with Existing Implementations

4 changes: 2 additions & 2 deletions README_zh.md
@@ -329,9 +329,9 @@ python src/export_model.py \

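> FZ: Freeze tuning, PT: P-Tuning V2 tuning (we use `pre_seq_len=16` for fair comparison with LoRA), Params: the percentage of trainable parameters out of all parameters.
-### RLHF Labeling Tool
+## Related Links

-In the RLHF stage, the k outputs generated by the model must be ranked manually. If you do not have a good labeling tool, you can use [SupritYoung/RLHF-Label-Tool](https://github.com/SupritYoung/RLHF-Label-Tool/tree/master) for labeling.
+- [SupritYoung/RLHF-Label-Tool](https://github.com/SupritYoung/RLHF-Label-Tool/tree/master): A platform for ranking the outputs of large models to obtain annotated data for RLHF training.

## Comparison with Existing Similar Projects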
