🥞 RewardLM

Reward a Language Model with pancakes 🥞

Usage

This repository gathers three main modules. They share a common interface, allowing any generative model to be trained with two main techniques: Reinforcement Learning with PPO (🥞 RLAIF) and the more classical 👨🏼‍🏫 fine-tuning with PEFT techniques. The third module, ⚖️ Toxicity Meter, measures the toxicity of the generative model's responses, whether pre-trained or after the 🥞 or 👨🏼‍🏫 process.

🥞 Reinforcement Learning from AI Feedback (RLAIF)

This module uses reinforcement learning algorithms (specifically PPO) to optimise generative models in a direction decided by the reward model. The process is similar to RLHF (Reinforcement Learning from Human Feedback), but removes the human component from the loop to automate the process.
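The reward signal here comes from a classifier rather than from human annotators. As a rough illustration only (this is not the actual code path inside RLModel, and it assumes the hate/nothate label names reported on the classifier's model card), the default hate-speech classifier can be mapped to a scalar reward roughly as follows:

from transformers import pipeline

# default reward model used in this repository: a RoBERTa hate-speech classifier
reward_pipe = pipeline(
    'text-classification',
    model = 'facebook/roberta-hate-speech-dynabench-r4-target',
    top_k = None,   # return the score of every label, not just the top one
)

def toxicity_reward(responses):
    # illustrative mapping only: reward = probability of the 'nothate' label,
    # so less hateful generations receive a higher reward during PPO
    rewards = []
    for label_scores in reward_pipe(responses):
        nothate = next(s['score'] for s in label_scores if s['label'] == 'nothate')
        rewards.append(nothate)
    return rewards

print(toxicity_reward(['Thanks for sharing your point of view, let us discuss it.']))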

To 🥞 Reward a generative LM using the DIALOCONAN dataset:

  1. Select the generative and reward models you intend to use and other hyperparameters:
import torch
from rewardlm.core.RL.RLModel import RLModel

rlmanager = RLModel(
    model_id = 'EleutherAI/pythia-70m',
    reward_model_id = 'facebook/roberta-hate-speech-dynabench-r4-target',
    optimized = True,   # use 8-bit PEFT
    # log_method = 'wandb',
    bs = 256,
    # force the use of CPU on Apple Silicon devices (mps not supported):
    accelerator_kwargs = {
        'cpu': not torch.cuda.is_available(),
    },
)
  2. Download the original dataset using the built-in preprocessing functions:
from rewardlm.data.data_utils import get_DIALOCONAN_prepro

data = get_DIALOCONAN_prepro(delete_last_assistant_response = True)
dataset = rlmanager.generate_dataset(text = data)
  3. Start the PPO learning algorithm:
history = rlmanager.train_PPO(dataset = dataset)

👨🏼‍🏫 Model fine-tuning

Each generative model can be fine-tuned on the same data used for Reinforcement Learning. In this way, it is possible to compare the results obtained from both techniques.

To fine-tune a generative model using the DIALOCONAN dataset:

  1. Select the model you intend to use and instantiate a GenerativeModel to manage it:
import torch
from rewardlm.core.GenerativeModel import GenerativeModel

model_id = 'facebook/opt-350m'
generator_manager = GenerativeModel(
    model_id,
    load_dtype = '8-bit' if torch.cuda.is_available() else 'fp32',
    # force the use of CPU on Apple Silicon devices (mps not supported):
    accelerator_kwargs = {
        'cpu': not torch.cuda.is_available(),
    },
)
  2. Download the original dataset using the built-in preprocessing functions:
from rewardlm.data.data_utils import get_DIALOCONAN_prepro
from rewardlm.data.CustomDatasets import PromptDataset_CLM

data = get_DIALOCONAN_prepro()

# custom_prompt: template string wrapped around each training example
# (defined elsewhere; see the ⚖️ ToxicityMeter section below for an example of building one)
dataset = PromptDataset_CLM(
    tokenizer = generator_manager.tokenizer,
    text = data,
    custom_prompt = custom_prompt,
)
  3. Start the fine-tuning process:
generator_manager.fine_tune(
    torch_dataset = dataset, 
    optimized = torch.cuda.is_available(),
)
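With optimized = True, fine-tuning goes through PEFT, so the learned weights are typically a small adapter on top of the frozen base model rather than a full checkpoint. Below is a minimal sketch of reloading such an adapter for inference with the 🤗 peft library, assuming a LoRA-style adapter and a hypothetical adapter_path (adjust it to wherever your run saves checkpoints); the test prompt is only an example:

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = 'facebook/opt-350m'
adapter_path = './checkpoints/opt-350m-dialoconan'   # hypothetical: point this at your saved adapter

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype = torch.float32)
# attach the fine-tuned adapter weights on top of the frozen base model
model = PeftModel.from_pretrained(base_model, adapter_path)

inputs = tokenizer('Example prompt to inspect the fine-tuned behaviour.', return_tensors = 'pt')
output = model.generate(**inputs, max_new_tokens = 64)
print(tokenizer.decode(output[0], skip_special_tokens = True))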

⚖️ ToxicityMeter

ToxicityMeter measures the toxicity of a generative LM based on the output of a classifier (a RoBERTa model for hate speech by default, if no RewardModel is used).

  1. Select a configuration (or create your own):
from rewardlm.utils import load_config
config = load_config(name = 'RedPajama-INCITE-Chat-3B-v1')
  2. Use the GenerativeModel class to get a generation manager:
import torch
from transformers import GenerationConfig
from rewardlm.core.GenerativeModel import GenerativeModel
from rewardlm.ToxicityMeter import ToxicityMeter
from rewardlm.utils import load_config

generator_manager = GenerativeModel(
    config['model_id'],
    load_from_peft = config['load_from_peft'],
    generation_config=config['generation']['generation_config'],
    # force the use of CPU on Apple Silicon devices (mps not supported):
    accelerator_kwargs = {
        'cpu': not torch.cuda.is_available(),
    },
)
  3. Customize the prompt from the original dataset and generate the toxicity_df dataset:
from rewardlm.data.data_utils import get_real_toxicity_prompts

toxicity_meter = ToxicityMeter(generator_manager)
batchsize = 12
custom_prompt = (config['generation']['custom_prompt']['user_name'] + 
                 ' "{prompt}".\n' + 
                 config['generation']['custom_prompt']['bot_name'] + ' '
                )

df = get_real_toxicity_prompts()
toxicity_df = toxicity_meter.measure_toxicity(
    text_prompt = df if not config['data']['subset'] else df[:config['data']['subset_size']],
    custom_prompt=custom_prompt, 
    batch_size=batchsize,
    print_response=True,
)
  4. Save the obtained results:
load_dtype = '8-bit' if torch.cuda.is_available() else 'fp32'   # precision label used in the file name
fld = './result analysis/tmp'
toxicity_df.to_csv(
    fld + f'/measured_tox_instruct_{config["model_id"].split("/")[-1]}_{load_dtype}.csv'
)
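To take a quick look at the saved results afterwards, the CSV can be read back with pandas; the file name below is only an example of what the save step above produces for a given model and precision:

import pandas as pd

# example path matching the save step above (model name and dtype suffix depend on your run)
path = './result analysis/tmp/measured_tox_instruct_RedPajama-INCITE-Chat-3B-v1_8-bit.csv'

toxicity_df = pd.read_csv(path, index_col = 0)
print(toxicity_df.head())       # inspect the first rows
print(toxicity_df.describe())   # summary statistics of any numeric (score) columns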

Tested models and datasets

Generative language models:

  • LaMini-LM: A small-sized collection of efficient language models distilled from ChatGPT and trained on a large-scale dataset of 2.58M instructions. GitHub, Paper

  • RedPajama-*: Source

  • BloomZ: A family of models capable of following human instructions in dozens of languages zero-shot. GitHub, Paper

  • Pythia: Predominantly abandoned in favour of instructed models. A suite of models combining interpretability analysis and scaling laws to understand how knowledge develops and evolves during training in autoregressive transformers. GitHub, Paper

  • Falcon-*-instruct: Causal decoder-only models built by TII and trained on 1,500B tokens of RefinedWeb enhanced with curated corpora. Source, Source for the instructed 7B model.

Datasets:

  • Real Toxicity Prompts: Mainly used for the ⚖️ ToxicityMeter module. A dataset of 100K naturally occurring, sentence-level prompts derived from a large corpus of English web text, paired with toxicity scores from a widely-used toxicity classifier. GitHub, Paper

  • DIALOCONAN: Mainly used for the 👨🏼‍🏫 fine-tuning and 🥞 RLAIF modules. A dataset of counter-narratives to fight online hate speech. GitHub, Paper

Reward models:

  • roberta-hate-speech-dynabench-r4-target: Model trained on ∼40,000 entries, generated and labelled by trained annotators over four rounds of dynamic data creation. Paper

Development

How to set up on Google Colab:

  1. Import the main notebook in Colab
  2. Include the following cell at the beginning:
!git clone https://__TOKEN_GIT__:@github.com/DanielSc4/RewardLM.git
%cd RewardLM/
!pip install -r requirements.txt
from huggingface_hub import login
login(token = '__TOKEN_HF__')
  3. [Optional, only if the repo is private] Replace __TOKEN_GIT__ with your git token (more info here)
  4. Replace __TOKEN_HF__ with your 🤗 HuggingFace personal token
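As an optional sanity check after running the setup cell, you can verify in a new cell that the package is importable from the cloned repository and that the GPU runtime is visible:

import torch
import rewardlm   # resolvable because the working directory is the cloned RewardLM/ repo

print('CUDA available:', torch.cuda.is_available())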

How to set up the developer environment

Dependency install:

  1. Install poetry, a Python package manager
  2. It is recommended to run the following command to let poetry create the virtual environment for the project directly inside the root folder, allowing IDEs to detect dependencies and executables
poetry config virtualenvs.in-project true
  3. Inside the root folder, run poetry install to get all the dependencies. See Poetry docs for a thorough explanation of how poetry works

Activating virtual env:

To run a project file, you will need to use the interpreter installed by Poetry in the virtual environment, usually located in rewardlm/.venv/bin/. To do that, you can use the poetry run command, followed by the name of the script that you want to run (Poetry run doc).

You can also run the following command to ensure that the terminal uses the correct Python version (the one installed in the virtual env) together with its whole set of dependencies:

source .venv/bin/activate

Backlog:

  • Catch & handle the ValueError: Responses are too short. Make sure they are at least 4 tokens long. error by skipping the batch that triggers it.
  • Add support for checkpointing and tracking more info.
  • Add support for dynamic batch size based on Memory Utilities from 🤗 HuggingFace.
  • [fix] Fix short-response behaviour (fewer than 4 tokens) [fix based on generation_config; TODO: how does generation change with bigger models?]
  • Add support for model sharing (and backup) on the 🤗 HuggingFace Hub!
  • Add the possibility of using a reward manager as a reward model, to have more control over the reward system.
  • Compatibility of ⚖️ ToxicityMeter with other datasets (possibly instructional).
  • Extend ⚖️ ToxicityMeter compatibility with 🤗 Accelerate.
  • Extend the possibility of managing parameters and configurations to 🥞 RLAIF.
  • Use Inseq for analysis and interpretability of generative models in ⚖️ ToxicityMeter.