Decision Transformer for Robomimic

(this is a course project for CS391R 2022)

Table of Contents
  1. About The Project
  2. Getting Started
  3. Contact

About The Project

Decision Transformer reformulates offline reinforcement learning as a sequence modeling problem that can be solved effectively with large Transformer models. The work most closely related to our project is the original Decision Transformer.

Getting Started

TODO

Get the Datasets:

Dataset download link: https://drive.google.com/drive/folders/1dHMUOSLUr6AwW3PETn1DQO9CWMklTuqy?usp=share_link

Download the file and place the datasets/ folder at the root of this repo. These datasets are generated with robomimic using low-dimensional state representations and dense reward information.
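
The files should follow robomimic's standard HDF5 demonstration layout (a data group containing one demo group per trajectory, each with obs, actions, rewards, and dones). Below is a minimal inspection sketch for sanity-checking a download; the file path is only an example:

```python
import h5py

# Path is illustrative; point it at any of the downloaded robomimic HDF5 files.
with h5py.File("datasets/lift/mg/low_dim_dense.hdf5", "r") as f:
    demos = list(f["data"].keys())                 # ["demo_0", "demo_1", ...]
    print(f"{len(demos)} demonstrations")
    first = f["data"][demos[0]]
    print("actions:", first["actions"].shape)      # (T, action_dim)
    print("rewards:", first["rewards"].shape)      # (T,) per-step rewards
    print("obs keys:", list(first["obs"].keys()))  # low-dim state keys
```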

Dataset Types

Machine-Generated (MG)

Mixture of suboptimal data from state-of-the-art RL agents

Proficient-Human (PH) and Multi-Human (MH)

500 demonstrations total (200 proficient-human and 300 multi-human), collected from teleoperators of varying proficiency

Our setting: ALL data

A more challenging combination of MG, MH, and PH, weighted towards the lower-quality MG data

Dataset Tasks

Lift: lift the cube

Can: pick up the can and place it in the proper spot

We chose these tasks because they have large amounts of low-quality machine-generated data, which supports our goal of return conditioning on mixed-quality data.

Our Decision Transformer Architecture

(Decision Transformer architecture diagram)

We input states, actions, and returns-to-go into a causal Transformer to predict the next action. We combine each timestep's state, action, and return-to-go into a single token, which shortens the sequence length and reduces computational requirements. The original Decision Transformer uses a deterministic policy; we train a multi-modal stochastic policy, which better models continuous actions.
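
A minimal sketch of these two ideas in PyTorch; the class name, dimensions, and hyperparameters below are illustrative rather than our exact implementation. Each timestep's return-to-go, state, and previous action are concatenated into one token, passed through a causally masked Transformer, and decoded into Gaussian-mixture parameters.

```python
import torch
import torch.nn as nn

class SingleTokenDT(nn.Module):
    """Sketch: one token per timestep = concat(return-to-go, state, previous action)."""

    def __init__(self, state_dim, action_dim, d_model=128, n_layers=3, n_heads=4, n_modes=5):
        super().__init__()
        self.embed = nn.Linear(1 + state_dim + action_dim, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True
        )
        self.transformer = nn.TransformerEncoder(layer, n_layers)
        # Gaussian Mixture Model head: per-mode weight, mean, and log-std
        self.n_modes, self.action_dim = n_modes, action_dim
        self.head = nn.Linear(d_model, n_modes * (1 + 2 * action_dim))

    def forward(self, rtg, states, prev_actions):
        # rtg: (B, T, 1), states: (B, T, state_dim), prev_actions: (B, T, action_dim)
        # (positional/timestep embeddings omitted for brevity)
        tokens = self.embed(torch.cat([rtg, states, prev_actions], dim=-1))
        T = tokens.shape[1]
        # Causal mask so each timestep only attends to the past
        mask = torch.triu(torch.full((T, T), float("-inf"), device=tokens.device), diagonal=1)
        h = self.transformer(tokens, mask=mask)
        out = self.head(h).view(*h.shape[:2], self.n_modes, 1 + 2 * self.action_dim)
        logits = out[..., 0]                          # mixture weights (logits)
        means = out[..., 1 : 1 + self.action_dim]     # per-mode action means
        log_stds = out[..., 1 + self.action_dim :]    # per-mode log standard deviations
        return logits, means, log_stds
```

At training time, the mixture log-likelihood of the dataset action can be maximized (e.g. via torch.distributions.MixtureSameFamily); at evaluation, the distribution at the final timestep is sampled to produce the next action.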

The Semi-Sparse Reward Function:

During development, we found that robomimic's datasets effectively use sparse rewards: the sequence data contains only a binary (success or no success) signal. We attempted to enable dense rewards in robomimic, but found that the dense reward it returned was uncorrelated with dataset quality.

This led us to manually alter the reward function, adding a semi-sparse success bonus that decreases with every time step. This gives the Decision Transformer a wider distribution of target RTGs than robomimic's default binary success signal. The maximum sequence length is 500, so solving the task after 500 time steps earns no bonus.

The function: max(500 - success_time, 0)

In future work, we hope to iterate further on this function, and possibly to alter the underlying dense reward itself rather than post-processing the success signal.

With this change, we relabeled the training data accordingly, as reflected in the new SequenceDataset.
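
Below is a minimal sketch of that relabeling, not the actual SequenceDataset code; it assumes each demonstration's reward array contains only the binary success signal, and uses the 500-step horizon from the function above:

```python
import numpy as np

MAX_STEPS = 500  # episode horizon; a success after step 500 earns no bonus

def semi_sparse_bonus(success_time):
    """Success bonus that shrinks the later the task is solved."""
    return max(MAX_STEPS - success_time, 0)

def relabel_demo(sparse_rewards):
    """Replace the binary success reward with the semi-sparse bonus and
    recompute the undiscounted returns-to-go used as DT conditioning targets."""
    new_rewards = np.zeros_like(sparse_rewards, dtype=np.float64)
    success_steps = np.nonzero(sparse_rewards > 0)[0]
    if len(success_steps) > 0:
        t = int(success_steps[0])              # first timestep the task is solved
        new_rewards[t] = semi_sparse_bonus(t)
    rtg = np.cumsum(new_rewards[::-1])[::-1].copy()
    return new_rewards, rtg
```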

Results

We found that return and past-action conditioning can make robomimic tasks more difficult:

(results figure)

Longer sequence modeling improves action prediction and eases problems caused by multi-modal demonstrations:

(results figure)

Decision Transformer lets us model the whole range of returns, not just the expert:

(results figure)

Data Tables:

Task: Lift

Type: All

(results table)

[Naive BC]: Removing the low-quality data allows for expert performance, as in original robomimic

[DT-1, PH Only]: Removing the low-quality data allows for expert performance, as in original robomimic

[DT-20]: Decision Transformer can (mostly) filter the good demonstrations from the machine-generated noise

Task: Can

Type: All

(results tables)

[DT-3]: The action and RTG input sequence makes this task significantly more difficult, but DT is still much better than naive BC

[DT-3, DT-10, DT-20, all small]: Smaller Transformer sizes decrease performance on the Can task

[DT-3, Gaussian, Large]: Standard Gaussian policies are less capable of modeling multi-modal action distributions than our Gaussian Mixture Model default

Contact

Alex Chandler - alex [dot] chandler [at] utexas.edu

Jake Grigsby - grigsby [at] cs.utexas.edu

Omeed Tehrani - omeed [at] cs.utexas.edu
