From 51d29ae3f0e67767862d13a10051bf7959c66a7b Mon Sep 17 00:00:00 2001
From: Hongyao Tang <847606784@qq.com>
Date: Fri, 15 Apr 2022 11:29:39 +0800
Subject: [PATCH] Add files via upload

---
 self-supervised-rl/README.md | 171 ++++++++++-------------------------
 1 file changed, 50 insertions(+), 121 deletions(-)

diff --git a/self-supervised-rl/README.md b/self-supervised-rl/README.md
index 96f6ddf7..238f3a1f 100644
--- a/self-supervised-rl/README.md
+++ b/self-supervised-rl/README.md
@@ -1,150 +1,79 @@
-# **Self-supervised RL**: A Unified Algorithmic Framework & Opensource Code Implementation of Algorithms for Self-supervised Reinforcement Leanring (SSRL) with Representations
+# RL with Policy Representation

-This repo contains representative research works of TJU-RL-Lab on the topic of Self-supervised Representation Learning for RL.
+Policy Representation is one major category in our taxonomy.
+The core research content of policy representation is to discover or learn **low-dimensional representations of RL policies**, which are beneficial to downstream RL tasks (e.g., policy learning).
+In a general view, any decision-making problem that involves multiple policies or a policy optimization process can be a potential downstream task of policy representation.

-This repo will be constantly updated to include new researches made by TJU-RL-Lab.
-(The development of this repo is in progress at present.)
-
-
-
-
-## Introduction
-Reinforcement Learning (RL) is a major branch of Machine Learning, with expertise in solving sequential decision-making problems.
-Following the typical paradigm of Agent-Environment Interface,
-an RL agent interacts with the environment by performing its policy and receiving environmental states (or observations) and rewards.
-The agent's purpose is to maximize its expected discounted cumulative rewards, through trial-and-error.
-
-Since the RL agent always receives, processes and delivers all kinds of data in the learning process,
-how to **properly deal with such "data"** is naturally one key point to the effectiveness and efficiency of RL.
-In general, whenever you are dealing with high-dimensional or complex data (note that even the dimensionality is low, the data can also be complex to our learning problem),
-or in another word we may call it "not well represented", we often need good representation of data.
+In our opinion, RL with Policy Representation covers research on:
+- **What an optimal policy representation should be like.**
+- **How to obtain or learn the desired policy representation in specific cases.**
+- **How to make use of policy representation to develop and improve RL in different ways.**

-One may be familiar to many examples in Computer Vision (CV) and Natural Language Processing (NLP).
-In recent years, **Self-supervised Learning** (SSL) prevails in CV and NLP, boosting great advances in unsupervised pre-training, large model and etc.
-In most cases mentioned above, the main idea of SSL is to **learn good representation without supervisions**,
-which is often done by optimizing various pretext tasks (i.e., auxiliary tasks), e.g., reconstruction, prediction and contrast.
-Now, we focus on the representations in RL, seeking for an answer to the question above - **"how to properly consider learn/use representations for RL"**.
+## Repo Content

-Among possible representations in RL, state representation is one major branch.
-The researches on state representation dates back to heuristic representations in linear approximation,
-state abstraction theories & methods in the early 21th century (actually even earlier).
-New advances on state representation are also emerging in recent years, mainly to deal with high-dimensional states (or observations), e.g., image inputs.
+This repo contains representative research works of TJU-RL-Lab on the topic of RL with Policy Representation.
+Currently, we focus on how policy representation can improve the policy learning process in a general way.

-In our framework, we focus on three key questions:
-- **What should a good representation for RL be?**
-- **How can we obtain or realize such good representations?**
-- **How can we making use of good representations to improve RL?**
+### Two Types of Generalization

-We view the three questions above as a general guidance for our researches.
+One major characteristic brought by policy representation is **value (more broadly, function) generalization among policies**.
+Two general types of generalization are shown below:
+- **Global Generalization**: denotes the general cases where values (or other policy-dependent functions) already learned (or known) for some policies can generalize to the values of other policies (i.e., unknown or unseen ones).
+- **Local Generalization**: denotes the specific cases where values (or other policy-dependent functions) already learned (or known) for historical (or previous) policies encountered along the **policy improvement path** can generalize to the values of the following (or successive) policies we are going to estimate later.

-### Taxonomy of SSRL
-This repo follow a systematic taxnomy of Self-supervised RL with Representations proposed by TJU-RL-Lab, which consists of:
-- SSRL with State Representation
-- SSRL with Action Representation
-- SSRL with Policy Representation
-- SSRL with Environment (and Task) Representation
-- SSRL with Other Representation

+<div align=center>[figure: policy_generalization, illustrating the two types of value generalization among policies]</div>
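+To make the notion of a low-dimensional policy representation concrete, here is a minimal, hypothetical sketch (PyTorch assumed; the class name, sizes, and the parameter-flattening choice are for illustration only and are not this repo's API) of encoding a policy into an embedding that downstream, policy-dependent functions can condition on:
+
+```python
+import torch
+import torch.nn as nn
+
+class PolicyParamEncoder(nn.Module):
+    """Map a policy's flattened network parameters to a low-dimensional embedding."""
+    def __init__(self, param_dim: int, embed_dim: int = 64):
+        super().__init__()
+        self.net = nn.Sequential(
+            nn.Linear(param_dim, 256), nn.ReLU(),
+            nn.Linear(256, embed_dim),
+        )
+
+    def forward(self, flat_params: torch.Tensor) -> torch.Tensor:
+        return self.net(flat_params)
+
+# Example: embed a small actor network by flattening its parameters.
+actor = nn.Sequential(nn.Linear(4, 64), nn.Tanh(), nn.Linear(64, 2))
+flat_params = torch.cat([p.detach().flatten() for p in actor.parameters()])
+encoder = PolicyParamEncoder(param_dim=flat_params.numel())
+policy_embedding = encoder(flat_params)  # a 64-dimensional representation of the policy
+```
+Encoding a batch of the policy's state-action pairs (a "surface" view of its behavior) is another natural choice besides raw parameters.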
-For a tutorial of this taxnomy, we refer the reader to our [ZhiHu blog series](https://zhuanlan.zhihu.com/p/413321572).

+### GPI with PeVFA
+An obvious consequence of local generalization is that we now have additional value generalization for the successive policies during typical Generalized Policy Iteration (GPI).
+Taking advantage of this characteristic, we propose a new learning paradigm called **Generalized Policy Iteration with Policy-extended Value Function Approximator (GPI with PeVFA)**.
+A comparison between conventional GPI and GPI with PeVFA is illustrated below:

+<div align=center>[figure: GPI-with-PeVFA, comparing conventional GPI and GPI with PeVFA]</div>

-### A Unified Algorithmic Framework (Implementation Design) of SSRL Algorithm
-
-All SSRL algorithms with representation in our taxonmy follows the same algorithmic framework.
-The illsutration of our Unified Algorithmic Framework (Implementation Design) of SSRL Algorithm is shown below.
-From left to right, the framework consists of four phases:
-- **Data Input**
-- **Encoding and Transformation**
-- **Methods and Criteria of Representation Learning**
-- **Downstream RL Problems**
-
-The unified framework we propose is general. Almost all currently existing SSRL algorithms can be interpreted with our framework.
-In turn, this unified framework can also serve as a guidance when we are working on designing a new algorithm.
-
-<div align=center>[figure: Algorithmic Framework of SSRL]</div>
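+As a code-level intuition for the paradigm above, here is a minimal sketch (PyTorch assumed; the module name, layer sizes, and interface are hypothetical and not the exact implementation released in this repo) of a policy-extended value function approximator, i.e., a single value network that additionally takes a policy representation as input and can therefore produce value estimates for different policies:
+
+```python
+import torch
+import torch.nn as nn
+
+class PeVFA(nn.Module):
+    """Value network V(s, z): state plus policy embedding -> scalar value,
+    enabling value generalization across (representations of) policies."""
+    def __init__(self, state_dim: int, policy_embed_dim: int, hidden: int = 128):
+        super().__init__()
+        self.net = nn.Sequential(
+            nn.Linear(state_dim + policy_embed_dim, hidden), nn.ReLU(),
+            nn.Linear(hidden, hidden), nn.ReLU(),
+            nn.Linear(hidden, 1),
+        )
+
+    def forward(self, state: torch.Tensor, policy_embedding: torch.Tensor) -> torch.Tensor:
+        return self.net(torch.cat([state, policy_embedding], dim=-1))
+
+# Querying the same approximator with different policy embeddings yields value
+# estimates for different policies, including ones it has never been trained on.
+value_fn = PeVFA(state_dim=4, policy_embed_dim=64)
+states = torch.randn(32, 4)
+z_old = torch.randn(32, 64)  # embedding of a historical policy
+z_new = torch.randn(32, 64)  # embedding of the successive policy
+v_old, v_new = value_fn(states, z_old), value_fn(states, z_new)
+```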
+GPI with PeVFA is general and can be fulfilled by various means in principle. The key points we may have to consider are:
+- Whether the additional generalization is beneficial to or improves conventional RL.
+- How to properly represent policies and establish PeVFA to realize the best potential (a minimal code sketch of one such update step is given further below).

+## An Overall View of Research Works in This Repo

+This repo will be constantly updated to include new researches made by TJU-RL-Lab.
+(The development of this repo is in progress at present.)

+| Method | Is Contained | Is README Prepared | Author | Publication | Link |
+| ------ | --- | --- | ------ | ------ | ------ |
+| [PPO-PeVFA](./Policy-based_RL_with_PeVFA/PPO-PeVFA) | ✅ | ✅ | Hongyao Tang | AAAI 2022 | https://arxiv.org/abs/2010.09536 |

-### Ecology of SSRL
-
-Beyond the opensource of our research works, we plan to establish the ecology of SSRL in the future.
-Driven by **three key fundamental challenges of RL**, we are working on research explorations at the frontier
-**from the different perspectives of self-supervised representation in RL**.
-For algorithms and methods proposed, we plan to study **a unified algorithmic framework** togather with **a unified opensource code-level implementation framework**.
-These representations are expected to **boost the learning in various downstream RL problems**, in straightforward or sophatiscated ways.
-Finally, our ultimate goal is to **land self-supervised representation driven RL in real-world decision-making scenarios**.
-
-<div align=center>[figure: Ecology of SSRL]</div>
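+To connect the key points above with code, here is one hypothetical round of GPI with PeVFA (reusing the PeVFA-style module sketched earlier; function and variable names are illustrative, not this repo's API): the approximator is fitted on returns gathered by historical policies along the improvement path, so its estimates for the newly improved policy come from local generalization before any new data is collected:
+
+```python
+import torch.nn.functional as F
+
+def pevfa_regression_step(value_fn, optimizer, states, returns, policy_embeddings):
+    """One supervised step: fit V(s, z) on returns gathered by historical policies."""
+    v_pred = value_fn(states, policy_embeddings).squeeze(-1)
+    loss = F.mse_loss(v_pred, returns)
+    optimizer.zero_grad()
+    loss.backward()
+    optimizer.step()
+    return loss.item()
+
+# After fitting, value_fn(states, z_new) -- with z_new the embedding of the
+# successive policy -- already provides generalized value estimates that can
+# serve as the critic/baseline in the next policy-improvement step (e.g., PPO).
+```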
## Installation

-The algorithms in this repo are all implemented **python 3.5** (and versions above).
-**Tensorflow 1.x** and **PyTorch** are the main DL code frameworks we adopt in this repo with different choices in different algorithms.
-
-First of all, we recommend the user to install **anaconada** and or **venv** for convenient management of different python envs.
-
-In this repo, the following RL environments may be needed:
-- [OpenAI Gym](https://github.com/openai/gym) (e.g., MuJoCo, Robotics)
-- [MinAtar](https://github.com/kenjyoung/MinAtar)
-- ......
-- And some environments designed by ourselves.
-
-Note that each algorithm may use only one or several environments in the ones listed above. Please refer to the page of specific algorithm for concrete requirements.
-
-To clone this repo:
-
-```
-git clone http://rl.beiyang.ren/tju_rl/self-supervised-rl.git
-```
-
-Note that this repo is a collection of multiple research branches (according to the taxonomy).
-Environments and code frameworks may differ among different branches. Thus, please follow the installation guidance provided in the specific branch you are insterested in.
-
-## An Overall View of Research Works in This Repo
-
-| Category | Method | Is Contained | Is ReadME Prepared | Author | Publication | Link |
-| ------ | ------ | --- | --- | ------ | ------ | ------ |
-| Action | HyAR |✅ | ✅ | Boyan Li | ICLR 2022 | https://openreview.net/forum?id=64trBbOhdGU |
-| Policy | PPO-PeVFA | ✅ | ✅ | Hongyao Tang |AAAI 2022 | https://arxiv.org/abs/2010.09536 |
-| Env&task | CCM | ❌ | ❌ |Haotian Fu | AAAI 2021 | https://ojs.aaai.org/index.php/AAAI/article/view/16914 |
-| Env&task | PAnDR |✅ | ❌ |Tong Sang| [ICLR 2022 GPL Workshop](https://ai-workshops.github.io/generalizable-policy-learning-in-the-physical-world/) | https://arxiv.org/abs/2204.02877 |
-| Other | VDFP |✅ | ✅ |Hongyao Tang| AAAI 2021 | https://ojs.aaai.org/index.php/AAAI/article/view/17182 |
+The algorithms in this repo are all implemented in **Python 3.5** (and versions above). **Tensorflow 1.x** and **PyTorch** are the main DL code frameworks we adopt in this repo, with different choices in different algorithms.
+Note that the algorithms contained in this repo may not all use the same environments. Please check the README of each specific algorithm for detailed installation guidance.

## TODO
+- [ ] Reconstruct PPO-PeVFA for modularity
+- [x] Add README file for PPO-PeVFA
-- [ ] Update the README files for each branch
-
-## Liscense
-
-This repo uses [MIT Liscense](https://github.com/TJU-DRL-LAB/self-supervised-rl/blob/main/LICENSE).
-
-## Citation
-
-If this repository has helped your research, please cite the following:
-```
-@article{tjurllab22ssrl,
-  author = {TJU RL Lab},
-  title = {A Unified Repo for Self-supervised RL with Representations},
-  year = {2022},
-  url = {https://github.com/TJU-DRL-LAB/self-supervised-rl},
-}
-```
-
+## Related Work
+Here we provide a useful list of representative related works on policy representation and policy-extended value functions.

-## Major Update Log
-2022-04-07:
-- Readme files updated for several branches (state/environment representation).
-- Codes of our work PAnDR are uploaded.

+### Policy-extended Value Function:
+- Hongyao Tang, Zhaopeng Meng, Jianye Hao, Chen Chen, Daniel Graves, Dong Li, Hangyu Mao, Wulong Liu, Yaodong Yang, Changmin Yu. What About Inputing Policy in Value Function: Policy Representation and Policy-extended Value Function Approximator. AAAI 2022.
+- Jean Harb, Tom Schaul, Doina Precup, Pierre-Luc Bacon. Policy Evaluation Networks. arXiv:2002.11833
+- Francesco Faccio, Jürgen Schmidhuber. Parameter-based Value Functions. ICLR 2021
+- Roberta Raileanu, Max Goldstein, Arthur Szlam, Rob Fergus. Fast Adaptation to New Environments via Policy-Dynamics Value Functions. ICML 2020

+### Policy Representation:
+- Aditya Grover, Maruan Al-Shedivat, Jayesh K. Gupta, Yuri Burda, Harrison Edwards. Learning Policy Representations in Multiagent Systems. ICML 2018
+- Roberta Raileanu, Max Goldstein, Arthur Szlam, Rob Fergus. Fast Adaptation to New Environments via Policy-Dynamics Value Functions. ICML 2020
+- Nemanja Rakicevic, Antoine Cully, Petar Kormushev. Policy manifold search: exploring the manifold hypothesis for diversity-based neuroevolution. GECCO 2021
+- Rundong Wang, Runsheng Yu, Bo An, Zinovi Rabinovich. I²HRL: Interactive Influence-based Hierarchical Reinforcement Learning. IJCAI 2020
+- Oscar Chang, Robert Kwiatkowski, Siyuan Chen, Hod Lipson. Agent Embeddings: A Latent Representation for Pole-Balancing Networks. AAMAS 2019
+- Isac Arnekvist, Danica Kragic, Johannes A. Stork. VPE: Variational Policy Embedding for Transfer Reinforcement Learning. ICRA 2019
+- Thomas Unterthiner, Daniel Keysers, Sylvain Gelly, Olivier Bousquet, Ilya O. Tolstikhin. Predicting Neural Network Accuracy from Weights. arXiv:2002.11448

-2022-03-24:
-- Readme files updated for several branches (action/policy/other representation) and individual works (VDFP/HyAR/PeVFA).
-2022-03-18:
-- Main page readme uploaded.
-- VDFP, HyAR, PeVFA codes - first commit.