(this is a course project for CS391R 2022)
Decision Transformer reformulates offline reinforcement learning as a sequence modeling problem that can be solved effectively with large Transformer models. The work most closely related to our project is the original Decision Transformer.
TODO
Dataset download link: https://drive.google.com/drive/folders/1dHMUOSLUr6AwW3PETn1DQO9CWMklTuqy?usp=share_link
Download the files and place the datasets/ folder in this repo. These datasets are generated with robomimic using low-dimensional state representations and dense reward information.
MG (Machine-Generated): a mixture of suboptimal data from state-of-the-art RL agents.
PH + MH (Human): 500 demonstrations total, with 200 proficient-human (PH) and 300 multi-human (MH) demonstrations from teleoperators of varying proficiency.
MG + MH + PH: a more challenging combination of MG, MH, and PH data, weighted toward lower-quality MG data.
Lift: lift the cube off the table.
Can: pick up the can and place it in its proper spot.
We chose these tasks because they come with large amounts of low-quality, machine-generated data, which supports our goal of return conditioning on mixed-quality data.
We input states, actions, and returns-to-go (RTGs) into a causal Transformer to predict the desired actions. We combine each timestep's state, action, and return-to-go into a single token, which shortens the sequence length and reduces computational requirements. While the original Decision Transformer uses a deterministic policy, we train a multi-modal stochastic policy, which better models continuous action distributions.
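The single-token-per-timestep scheme can be sketched as below; the module name, dimensions, and the choice to pair each state with the previous action (so the token never contains the action being predicted) are our own illustrative assumptions rather than the project's exact implementation:

```python
import torch
import torch.nn as nn

class TimestepTokenizer(nn.Module):
    """Fuse (return-to-go, state, previous action) at each timestep into a
    single token, shortening the sequence fed to the causal Transformer
    versus the original three-tokens-per-timestep layout."""

    def __init__(self, state_dim: int, action_dim: int, d_model: int = 128):
        super().__init__()
        # One linear projection over the concatenated inputs -> one token.
        self.fuse = nn.Linear(state_dim + action_dim + 1, d_model)

    def forward(self, states, prev_actions, rtgs):
        # states: (B, T, state_dim); prev_actions: (B, T, action_dim);
        # rtgs: (B, T, 1). The previous action is paired with the current
        # state so the fused token does not leak the action to predict.
        x = torch.cat([states, prev_actions, rtgs], dim=-1)
        return self.fuse(x)  # (B, T, d_model)
```

With a context of T timesteps, this yields T tokens instead of 3T, cutting the attention cost roughly ninefold.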
During development, we found that robomimic uses sparse rewards by default: a binary success/no-success signal in the sequence data. We attempted to enable dense rewards in robomimic, but found that the dense returns were uncorrelated with dataset quality.
This debugging led us to manually alter the reward function, adding a semi-sparse success bonus that decreases with every time step. This gives the Decision Transformer a wider distribution of target RTGs than robomimic's default binary success signal. The maximum episode length is 500 steps, so a success after step 500 earns no bonus.
The function: bonus = max(500 - success_time, 0)
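The bonus above can be written as a small function; the name and the None-for-failure convention are our own assumptions:

```python
from typing import Optional

MAX_HORIZON = 500  # robomimic task horizon, in time steps

def semi_sparse_bonus(success_step: Optional[int]) -> int:
    """Success bonus that decays linearly with the time step at which the
    task succeeds. Episodes that never succeed, or that succeed after the
    500-step horizon, earn nothing."""
    if success_step is None:
        return 0
    return max(MAX_HORIZON - success_step, 0)
```

For example, succeeding at step 100 earns a bonus of 400, while succeeding at step 499 earns only 1, so faster successes map to visibly higher target RTGs.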
In future work, we hope to iterate further on this function, and possibly to alter the underlying dense reward rather than the bonus itself.
With this change, we recomputed the returns in the training data accordingly, as reflected in the new SequenceDataset.
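Recomputing the targets amounts to taking suffix sums of the modified rewards; a minimal sketch (the helper name is ours, not the SequenceDataset API):

```python
import numpy as np

def returns_to_go(rewards: np.ndarray) -> np.ndarray:
    """Undiscounted return-to-go at every timestep: the suffix sum of the
    (semi-sparse) per-step rewards. Used when rebuilding the training
    sequences after changing the reward function."""
    return np.cumsum(rewards[::-1])[::-1].copy()
```

Conditioning on these wider-spread RTGs is what lets the model distinguish high-quality trajectories from low-quality ones at training time.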
Longer sequence modeling improves action prediction and eases problems caused by multi-modal demonstrations:
[Naive BC] and [DT-1, PH Only]: Removing the low-quality data allows for expert performance, as in the original robomimic results
[DT-20]: Decision Transformer can (mostly) filter the good demonstrations from the machine-generated noise
[DT-3]: Including actions and RTGs in the input sequence makes this task significantly more difficult, but DT still far outperforms naive BC
[DT-3, DT-10, DT-20, all small]: Smaller Transformer sizes decrease performance on the Can task
[DT-3, Gaussian, Large]: Standard Gaussian policies are less capable of modeling multi-modal action distributions than our default Gaussian Mixture Model policy
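The Gaussian Mixture Model policy head referenced above can be sketched with PyTorch's built-in distributions; the class name, mode count, and log-std clamp range are our own assumptions:

```python
import torch
import torch.nn as nn
import torch.distributions as D

class GMMPolicyHead(nn.Module):
    """Map a Transformer output embedding to a mixture of diagonal
    Gaussians over continuous actions, allowing the stochastic policy to
    capture multi-modal demonstration behavior."""

    def __init__(self, d_model: int, action_dim: int, n_modes: int = 5):
        super().__init__()
        self.n_modes, self.action_dim = n_modes, action_dim
        self.logits = nn.Linear(d_model, n_modes)               # mixture weights
        self.means = nn.Linear(d_model, n_modes * action_dim)   # per-mode means
        self.log_stds = nn.Linear(d_model, n_modes * action_dim)

    def forward(self, h: torch.Tensor) -> D.MixtureSameFamily:
        batch = h.shape[:-1]
        mix = D.Categorical(logits=self.logits(h))
        means = self.means(h).reshape(*batch, self.n_modes, self.action_dim)
        stds = self.log_stds(h).reshape(*batch, self.n_modes,
                                        self.action_dim).clamp(-5, 2).exp()
        # Independent(..., 1) treats the action dimension as one event,
        # giving a per-mode diagonal Gaussian over the full action vector.
        comp = D.Independent(D.Normal(means, stds), 1)
        return D.MixtureSameFamily(mix, comp)
```

Training then maximizes the mixture log-likelihood of the demonstrated actions, and a unimodal Gaussian baseline falls out as the `n_modes=1` special case.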
Alex Chandler - alex [dot] chandler [at] utexas.edu
Jake Grigsby - grigsby [at] cs.utexas.edu
Omeed Tehrani - omeed [at] cs.utexas.edu