A lightweight ResNet modification for building a reward model in an image-based RLHF pipeline
- place your images in the `images/` folder and rename them as `image_{index}.jpg`
- place your preference labels in the `data/` folder
- download the pretrained ResNet50 weights and put them in the `models/` folder
- train the model using the command
python train.py --train data/train_judgements.csv --test data/test_judgements.csv --val data/validation_judgements.csv --resnet models/resnet50_best.pth --batch_size 8 --num_workers 4 --num_epoch 50 --lr 1e-3 --eval_ep 8 --grad_accum 8
- run the visualization code
python visualize.py
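The renaming step in the setup list can be scripted. The following is a minimal sketch, not part of the repository: the helper name `rename_images` is hypothetical, and starting the index at 0 is an assumption, since the README does not say whether `image_{index}.jpg` counts from 0 or 1.

```python
import tempfile
from pathlib import Path


def rename_images(folder: Path, start: int = 0) -> list:
    """Rename every .jpg/.jpeg file in `folder` to image_{index}.jpg, in sorted order.

    `start` is an assumption: the README does not state the first index.
    """
    files = sorted(p for p in folder.iterdir() if p.suffix.lower() in {".jpg", ".jpeg"})

    # Pass 1: move everything to temporary names, so that a file already
    # called image_3.jpg is never overwritten by another file taking that name.
    staged = []
    for i, path in enumerate(files):
        tmp = folder / f"__tmp_{i}{path.suffix}"
        path.rename(tmp)
        staged.append(tmp)

    # Pass 2: assign the final image_{index}.jpg names.
    new_names = []
    for i, tmp in enumerate(staged, start=start):
        target = folder / f"image_{i}.jpg"
        tmp.rename(target)
        new_names.append(target.name)
    return new_names


# Demo on a throwaway directory; point `folder` at images/ for real use.
with tempfile.TemporaryDirectory() as d:
    folder = Path(d)
    for name in ["street_b.jpg", "street_a.jpg", "notes.txt"]:
        (folder / name).touch()
    renamed = rename_images(folder)
    remaining = sorted(p.name for p in folder.iterdir())
```

Non-image files such as `notes.txt` are left untouched. Because renaming changes the indices, run it before the preference labels are created, or update the labels to match.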
Figure: change of Elo during sample runs.

Figure: distribution of overall and individual images.
After fine-tuning for 50 epochs on a small dataset with few comparisons per image (10,000 preferences across 3,000 images, about three per image), the model reached about 95% of the accuracy of the Elo baseline (73.8% vs. 77.8%; see the table below). GradCAM visualizations suggest that the model has picked up human perceptual cues related to walkability: it homes in on pedestrian infrastructure such as traffic lights, fences, and bridges, and on impediments such as vehicles and goods illegally blocking the street.
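For readers unfamiliar with reward models trained on pairwise preferences, here is a minimal sketch of a commonly used objective, the Bradley-Terry loss, together with a pairwise accuracy metric. This is an assumption for illustration: the README does not state which loss `train.py` uses or how accuracy is computed, and the function names below are hypothetical.

```python
import math


def preference_loss(score_preferred: float, score_rejected: float) -> float:
    """Bradley-Terry negative log-likelihood: -log(sigmoid(s_pref - s_rej)).

    The scores stand for the scalar outputs a reward model assigns to the
    preferred and the rejected image of one human judgement.
    """
    margin = score_preferred - score_rejected
    # -log(sigmoid(m)) equals softplus(-m); this form avoids overflow
    # when |margin| is large.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


def pairwise_accuracy(pairs) -> float:
    """Fraction of (preferred, rejected) score pairs that are ranked correctly."""
    correct = sum(1 for s_pref, s_rej in pairs if s_pref > s_rej)
    return correct / len(pairs)


# Hypothetical scores for four judgements: (preferred image, rejected image).
example_pairs = [(1.0, 0.0), (0.2, 0.5), (3.0, 1.0), (0.1, 0.0)]
example_accuracy = pairwise_accuracy(example_pairs)
example_losses = [preference_loss(p, r) for p, r in example_pairs]
```

The loss is small when the preferred image scores well above the rejected one and grows roughly linearly when the order is reversed, which pushes the model to agree with the human ranking.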
Model | Accuracy |
---|---|
Elo Score (Baseline) | 77.8% |
ResNet50 | 73.8% |
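The Elo baseline in the table can be understood through the standard Elo update, sketched below. The K-factor of 32, the initial rating of 1500, and the `(winner, loser)` tuple format are conventional assumptions for illustration only; the README does not specify the settings or the CSV layout the repository actually uses.

```python
from collections import defaultdict


def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A is preferred over B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_ratings(judgements, k: float = 32.0, initial: float = 1500.0) -> dict:
    """Replay (winner, loser) judgements in order and return the final ratings.

    `k` and `initial` are assumed defaults, not values taken from the repository.
    """
    ratings = defaultdict(lambda: initial)
    for winner, loser in judgements:
        e_win = expected_score(ratings[winner], ratings[loser])
        # The winner gains exactly what the loser gives up, so the sum of
        # all ratings stays constant.
        delta = k * (1.0 - e_win)
        ratings[winner] += delta
        ratings[loser] -= delta
    return dict(ratings)


# Hypothetical judgements: the first image of each pair was preferred.
judgements = [
    ("image_0", "image_1"),
    ("image_0", "image_2"),
    ("image_1", "image_2"),
]
ratings = elo_ratings(judgements)
ranking = sorted(ratings, key=ratings.get, reverse=True)
```

A baseline of this kind would predict a held-out preference by choosing the image with the higher rating. It needs ratings for the specific images being compared, whereas the ResNet50 reward model scores an image from its pixels alone.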