Datasets used in the paper

| Dataset | Description | Source | Percentage in Training Mixture (RT-2-PaLI-X) | Percentage in Training Mixture (RT-2-PaLM-E) |
| --- | --- | --- | --- | --- |
| WebLI | Around 10B image-text pairs across 109 languages, filtered to the top 10% scoring cross-modal similarity examples to give 1B training examples. | Chen et al. (2023b), Driess et al. (2023) | N/A | N/A |
| Episodic WebLI | Not used in co-fine-tuning RT-2-PaLI-X. | Chen et al. (2023a) | N/A | N/A |
| Robotics Dataset | Demonstration episodes collected with a mobile manipulation robot. Each demonstration is annotated with a natural language instruction from one of seven skills. | Brohan et al. (2022) | 50% | 66% |
| Language-Table | Used for training on several prediction tasks. | Lynch et al. (2022) | N/A | N/A |
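
The WebLI entry describes keeping only the top 10% of examples by cross-modal similarity score (roughly 10B pairs reduced to 1B). The snippet below is a minimal sketch of that kind of top-percent filter, not the pipeline used in the paper: the function name `keep_top_percent` and the list of scores are hypothetical, and it assumes each example already has a precomputed similarity score.

```python
def keep_top_percent(scores, percent=10):
    """Return indices of the highest-scoring `percent`% of examples.

    `scores` is a hypothetical list of precomputed cross-modal
    similarity scores, one per image-text pair.
    """
    # Integer arithmetic avoids floating-point rounding on the cutoff.
    n_keep = len(scores) * percent // 100
    # Rank example indices from highest to lowest score.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:n_keep]


if __name__ == "__main__":
    # Toy scores for ten image-text pairs (made up for illustration).
    toy_scores = [0.1, 0.9, 0.3, 0.5, 0.2, 0.8, 0.4, 0.7, 0.6, 0.05]
    print(keep_top_percent(toy_scores))
```

At the scale quoted in the table, a full sort would likely be replaced by a score threshold or a distributed top-k; the sketch only illustrates the selection rule.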