From 51d29ae3f0e67767862d13a10051bf7959c66a7b Mon Sep 17 00:00:00 2001
From: Hongyao Tang <847606784@qq.com>
Date: Fri, 15 Apr 2022 11:29:39 +0800
Subject: [PATCH] Add files via upload

---
 self-supervised-rl/README.md | 171 ++++++++++-------------------------
 1 file changed, 50 insertions(+), 121 deletions(-)

diff --git a/self-supervised-rl/README.md b/self-supervised-rl/README.md
index 96f6ddf7..238f3a1f 100644
--- a/self-supervised-rl/README.md
+++ b/self-supervised-rl/README.md
@@ -1,150 +1,79 @@
-# **Self-supervised RL**: A Unified Algorithmic Framework & Opensource Code Implementation of Algorithms for Self-supervised Reinforcement Leanring (SSRL) with Representations
+# RL with Policy Representation

-This repo contains representative research works of TJU-RL-Lab on the topic of Self-supervised Representation Learning for RL.
+Policy Representation is one major category in our taxonomy.
+The core research content of policy representation is to discover or learn **low-dimensional representations of RL policies**, which are beneficial to downstream RL tasks (e.g., policy learning).
+In a general view, any decision-making problem that involves multiple policies or a policy optimization process can be a potential downstream task of policy representation.

-This repo will be constantly updated to include new researches made by TJU-RL-Lab.
-(The development of this repo is in progress at present.)
-
-
-
-
-## Introduction
-Reinforcement Learning (RL) is a major branch of Machine Learning, with expertise in solving sequential decision-making problems.
-Following the typical paradigm of Agent-Environment Interface,
-an RL agent interacts with the environment by performing its policy and receiving environmental states (or observations) and rewards.
-The agent's purpose is to maximize its expected discounted cumulative rewards, through trial-and-error.
-
-Since the RL agent always receives, processes and delivers all kinds of data in the learning process,
-how to **properly deal with such "data"** is naturally one key point to the effectiveness and efficiency of RL.
-In general, whenever you are dealing with high-dimensional or complex data (note that even the dimensionality is low, the data can also be complex to our learning problem),
-or in another word we may call it "not well represented", we often need good representation of data.
+In our opinion, RL with Policy Representation covers research on:
+- **What an optimal policy representation should be like.**
+- **How to obtain or learn the desired policy representation in specific cases.**
+- **How to make use of policy representation to develop and improve RL in different ways.**

-One may be familiar to many examples in Computer Vision (CV) and Natural Language Processing (NLP).
-In recent years, **Self-supervised Learning** (SSL) prevails in CV and NLP, boosting great advances in unsupervised pre-training, large model and etc.
-In most cases mentioned above, the main idea of SSL is to **learn good representation without supervisions**,
-which is often done by optimizing various pretext tasks (i.e., auxiliary tasks), e.g., reconstruction, prediction and contrast.
-Now, we focus on the representations in RL, seeking for an answer to the question above - **"how to properly consider learn/use representations for RL"**.
+## Repo Content

-Among possible representations in RL, state representation is one major branch.
-The researches on state representation dates back to heuristic representations in linear approximation,
-state abstraction theories & methods in the early 21th century (actually even earlier).
-New advances on state representation are also emerging in recent years, mainly to deal with high-dimensional states (or observations), e.g., image inputs.
+This repo contains representative research works of TJU-RL-Lab on the topic of RL with Policy Representation.
+Currently, we focus on how policy representation can improve the policy learning process in a general way.

-In our framework, we focus on three key questions:
-- **What should a good representation for RL be?**
-- **How can we obtain or realize such good representations?**
-- **How can we making use of good representations to improve RL?**
+### Two Types of Generalization

-We view the three questions above as a general guidance for our researches.
+One major characteristic brought by policy representation is **value (more broadly, function) generalization among policies**.
+Two general types of generalization are shown below:
+- **Global Generalization**: denotes the general cases where values (or other policy-dependent functions) already learned (or known) for some policies can generalize to the values of other policies (i.e., unknown or unseen ones).
+- **Local Generalization**: denotes the specific cases where values (or other policy-dependent functions) already learned (or known) for historical (or previous) policies encountered along the **policy improvement path** can generalize to the values of the following (or successive) policies we are going to estimate later.

-### Taxonomy of SSRL
-This repo follow a systematic taxnomy of Self-supervised RL with Representations proposed by TJU-RL-Lab, which consists of:
-- SSRL with State Representation
-- SSRL with Action Representation
-- SSRL with Policy Representation
-- SSRL with Environment (and Task) Representation
-- SSRL with Other Representation

+<div align=center>[figure: policy_generalization, illustrating the two types of value generalization among policies]</div>
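+To make the notion of a low-dimensional policy representation concrete, here is a minimal, hypothetical sketch (PyTorch assumed; the class name, sizes, and the parameter-flattening choice are for illustration only and are not this repo's API) of encoding a policy into an embedding that downstream, policy-dependent functions can condition on:
+
+```python
+import torch
+import torch.nn as nn
+
+class PolicyParamEncoder(nn.Module):
+    """Map a policy's flattened network parameters to a low-dimensional embedding."""
+    def __init__(self, param_dim: int, embed_dim: int = 64):
+        super().__init__()
+        self.net = nn.Sequential(
+            nn.Linear(param_dim, 256), nn.ReLU(),
+            nn.Linear(256, embed_dim),
+        )
+
+    def forward(self, flat_params: torch.Tensor) -> torch.Tensor:
+        return self.net(flat_params)
+
+# Example: embed a small actor network by flattening its parameters.
+actor = nn.Sequential(nn.Linear(4, 64), nn.Tanh(), nn.Linear(64, 2))
+flat_params = torch.cat([p.detach().flatten() for p in actor.parameters()])
+encoder = PolicyParamEncoder(param_dim=flat_params.numel())
+policy_embedding = encoder(flat_params)  # a 64-dimensional representation of the policy
+```
+Encoding a batch of the policy's state-action pairs (a "surface" view of its behavior) is another natural choice besides raw parameters.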
-For a tutorial of this taxnomy, we refer the reader to our [ZhiHu blog series](https://zhuanlan.zhihu.com/p/413321572).

+### GPI with PeVFA
+An obvious consequence of local generalization is that we now have additional value generalization for the successive policies during typical Generalized Policy Iteration (GPI).
+Taking advantage of this characteristic, we propose a new learning paradigm called **Generalized Policy Iteration with Policy-extended Value Function Approximator (GPI with PeVFA)**.
+A comparison between conventional GPI and GPI with PeVFA is illustrated below:

+<div align=center>[figure: GPI-with-PeVFA, comparing conventional GPI and GPI with PeVFA]</div>

-### A Unified Algorithmic Framework (Implementation Design) of SSRL Algorithm
-
-All SSRL algorithms with representation in our taxonmy follows the same algorithmic framework.
-The illsutration of our Unified Algorithmic Framework (Implementation Design) of SSRL Algorithm is shown below.
-From left to right, the framework consists of four phases:
-- **Data Input**
-- **Encoding and Transformation**
-- **Methods and Criteria of Representation Learning**
-- **Downstream RL Problems**
-
-The unified framework we propose is general. Almost all currently existing SSRL algorithms can be interpreted with our framework.
-In turn, this unified framework can also serve as a guidance when we are working on designing a new algorithm.
-
-<div align=center>[figure: Algorithmic Framework of SSRL]</div>
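+As a code-level intuition for the paradigm above, here is a minimal sketch (PyTorch assumed; the module name, layer sizes, and interface are hypothetical and not the exact implementation released in this repo) of a policy-extended value function approximator, i.e., a single value network that additionally takes a policy representation as input and can therefore produce value estimates for different policies:
+
+```python
+import torch
+import torch.nn as nn
+
+class PeVFA(nn.Module):
+    """Value network V(s, z): state plus policy embedding -> scalar value,
+    enabling value generalization across (representations of) policies."""
+    def __init__(self, state_dim: int, policy_embed_dim: int, hidden: int = 128):
+        super().__init__()
+        self.net = nn.Sequential(
+            nn.Linear(state_dim + policy_embed_dim, hidden), nn.ReLU(),
+            nn.Linear(hidden, hidden), nn.ReLU(),
+            nn.Linear(hidden, 1),
+        )
+
+    def forward(self, state: torch.Tensor, policy_embedding: torch.Tensor) -> torch.Tensor:
+        return self.net(torch.cat([state, policy_embedding], dim=-1))
+
+# Querying the same approximator with different policy embeddings yields value
+# estimates for different policies, including ones it has never been trained on.
+value_fn = PeVFA(state_dim=4, policy_embed_dim=64)
+states = torch.randn(32, 4)
+z_old = torch.randn(32, 64)  # embedding of a historical policy
+z_new = torch.randn(32, 64)  # embedding of the successive policy
+v_old, v_new = value_fn(states, z_old), value_fn(states, z_new)
+```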
+GPI with PeVFA is general and can be fulfilled by various means in principle. The key points we may have to consider are:
+- Whether the additional generalization is beneficial to or improves conventional RL.
+- How to properly represent policies and establish PeVFA to realize the best potential (a minimal code sketch of one such update step is given further below).

+## An Overall View of Research Works in This Repo

+This repo will be constantly updated to include new researches made by TJU-RL-Lab.
+(The development of this repo is in progress at present.)

+| Method | Is Contained | Is README Prepared | Author | Publication | Link |
+| ------ | --- | --- | ------ | ------ | ------ |
+| [PPO-PeVFA](./Policy-based_RL_with_PeVFA/PPO-PeVFA) | ✅ | ✅ | Hongyao Tang | AAAI 2022 | https://arxiv.org/abs/2010.09536 |

-### Ecology of SSRL
-
-Beyond the opensource of our research works, we plan to establish the ecology of SSRL in the future.
-Driven by **three key fundamental challenges of RL**, we are working on research explorations at the frontier
-**from the different perspectives of self-supervised representation in RL**.
-For algorithms and methods proposed, we plan to study **a unified algorithmic framework** togather with **a unified opensource code-level implementation framework**.
-These representations are expected to **boost the learning in various downstream RL problems**, in straightforward or sophatiscated ways.
-Finally, our ultimate goal is to **land self-supervised representation driven RL in real-world decision-making scenarios**.
-
-<div align=center>[figure: Ecology of SSRL]</div>
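+To connect the key points above with code, here is one hypothetical round of GPI with PeVFA (reusing the PeVFA-style module sketched earlier; function and variable names are illustrative, not this repo's API): the approximator is fitted on returns gathered by historical policies along the improvement path, so its estimates for the newly improved policy come from local generalization before any new data is collected:
+
+```python
+import torch.nn.functional as F
+
+def pevfa_regression_step(value_fn, optimizer, states, returns, policy_embeddings):
+    """One supervised step: fit V(s, z) on returns gathered by historical policies."""
+    v_pred = value_fn(states, policy_embeddings).squeeze(-1)
+    loss = F.mse_loss(v_pred, returns)
+    optimizer.zero_grad()
+    loss.backward()
+    optimizer.step()
+    return loss.item()
+
+# After fitting, value_fn(states, z_new) -- with z_new the embedding of the
+# successive policy -- already provides generalized value estimates that can
+# serve as the critic/baseline in the next policy-improvement step (e.g., PPO).
+```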
## Installation

-The algorithms in this repo are all implemented **python 3.5** (and versions above).
-**Tensorflow 1.x** and **PyTorch** are the main DL code frameworks we adopt in this repo with different choices in different algorithms.
-
-First of all, we recommend the user to install **anaconada** and or **venv** for convenient management of different python envs.
-
-In this repo, the following RL environments may be needed:
-- [OpenAI Gym](https://github.com/openai/gym) (e.g., MuJoCo, Robotics)
-- [MinAtar](https://github.com/kenjyoung/MinAtar)
-- ......
-- And some environments designed by ourselves.
-
-Note that each algorithm may use only one or several environments in the ones listed above. Please refer to the page of specific algorithm for concrete requirements.
-
-To clone this repo:
-
-```
-git clone http://rl.beiyang.ren/tju_rl/self-supervised-rl.git
-```
-
-Note that this repo is a collection of multiple research branches (according to the taxonomy).
-Environments and code frameworks may differ among different branches. Thus, please follow the installation guidance provided in the specific branch you are insterested in.
-
-## An Overall View of Research Works in This Repo
-
-| Category | Method | Is Contained | Is ReadME Prepared | Author | Publication | Link |
-| ------ | ------ | --- | --- | ------ | ------ | ------ |
-| Action | HyAR |✅ | ✅ | Boyan Li | ICLR 2022 | https://openreview.net/forum?id=64trBbOhdGU |
-| Policy | PPO-PeVFA | ✅ | ✅ | Hongyao Tang |AAAI 2022 | https://arxiv.org/abs/2010.09536 |
-| Env&task | CCM | ❌ | ❌ |Haotian Fu | AAAI 2021 | https://ojs.aaai.org/index.php/AAAI/article/view/16914 |
-| Env&task | PAnDR |✅ | ❌ |Tong Sang| [ICLR 2022 GPL Workshop](https://ai-workshops.github.io/generalizable-policy-learning-in-the-physical-world/) | https://arxiv.org/abs/2204.02877 |
-| Other | VDFP |✅ | ✅ |Hongyao Tang| AAAI 2021 | https://ojs.aaai.org/index.php/AAAI/article/view/17182 |
+The algorithms in this repo are all implemented in **Python 3.5** (and versions above). **Tensorflow 1.x** and **PyTorch** are the main DL code frameworks we adopt in this repo, with different choices in different algorithms.
+Note that the algorithms contained in this repo may not all use the same environments. Please check the README of each specific algorithm for detailed installation guidance.

## TODO
+- [ ] Reconstruct PPO-PeVFA for modularity
+- [x] Add README file for PPO-PeVFA
-- [ ] Update the README files for each branch
-
-## Liscense
-
-This repo uses [MIT Liscense](https://github.com/TJU-DRL-LAB/self-supervised-rl/blob/main/LICENSE).
-
-## Citation
-
-If this repository has helped your research, please cite the following:
-```
-@article{tjurllab22ssrl,
-  author = {TJU RL Lab},
-  title = {A Unified Repo for Self-supervised RL with Representations},
-  year = {2022},
-  url = {https://github.com/TJU-DRL-LAB/self-supervised-rl},
-}
-```
-
+## Related Work
+Here we provide a useful list of representative related works on policy representation and policy-extended value functions.

-## Major Update Log
-2022-04-07:
-- Readme files updated for several branches (state/environment representation).
-- Codes of our work PAnDR are uploaded.

+### Policy-extended Value Function:
+- Hongyao Tang, Zhaopeng Meng, Jianye Hao, Chen Chen, Daniel Graves, Dong Li, Hangyu Mao, Wulong Liu, Yaodong Yang, Changmin Yu. What About Inputing Policy in Value Function: Policy Representation and Policy-extended Value Function Approximator. AAAI 2022.
+- Jean Harb, Tom Schaul, Doina Precup, Pierre-Luc Bacon. Policy Evaluation Networks. arXiv:2002.11833
+- Francesco Faccio, Jürgen Schmidhuber. Parameter-based Value Functions. ICLR 2021
+- Roberta Raileanu, Max Goldstein, Arthur Szlam, Rob Fergus. Fast Adaptation to New Environments via Policy-Dynamics Value Functions. ICML 2020

+### Policy Representation:
+- Aditya Grover, Maruan Al-Shedivat, Jayesh K. Gupta, Yuri Burda, Harrison Edwards. Learning Policy Representations in Multiagent Systems. ICML 2018
+- Roberta Raileanu, Max Goldstein, Arthur Szlam, Rob Fergus. Fast Adaptation to New Environments via Policy-Dynamics Value Functions. ICML 2020
+- Nemanja Rakicevic, Antoine Cully, Petar Kormushev. Policy manifold search: exploring the manifold hypothesis for diversity-based neuroevolution. GECCO 2021
+- Rundong Wang, Runsheng Yu, Bo An, Zinovi Rabinovich. I²HRL: Interactive Influence-based Hierarchical Reinforcement Learning. IJCAI 2020
+- Oscar Chang, Robert Kwiatkowski, Siyuan Chen, Hod Lipson. Agent Embeddings: A Latent Representation for Pole-Balancing Networks. AAMAS 2019
+- Isac Arnekvist, Danica Kragic, Johannes A. Stork. VPE: Variational Policy Embedding for Transfer Reinforcement Learning. ICRA 2019
+- Thomas Unterthiner, Daniel Keysers, Sylvain Gelly, Olivier Bousquet, Ilya O. Tolstikhin. Predicting Neural Network Accuracy from Weights. arXiv:2002.11448

-2022-03-24:
-- Readme files updated for several branches (action/policy/other representation) and individual works (VDFP/HyAR/PeVFA).
-2022-03-18:
-- Main page readme uploaded.
-- VDFP, HyAR, PeVFA codes - first commit.