An Implementation of Synchronous Advantage Actor Critic in TensorFlow

gdao-research/A2C

Advantage Actor Critic (A2C)

An implementation of A2C, the synchronous variant of A3C described in the OpenAI blog post.

Intuition:

  • Multiple workers each step their own copy of the environment to collect a batch of data $\rightarrow$ no replay buffer is needed.
  • Noise is added to the policy logits to encourage exploration.
  • A single gradient update step is performed on each collected batch.
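
The points above can be illustrated with a minimal, framework-free sketch. It is not taken from this repository: the function names `sample_action`, `discounted_returns`, and `advantages` are hypothetical, and it assumes the noise is Gumbel noise (the Gumbel-max trick), which may differ from what this code base actually does. The second half shows one common way to turn a worker's rollout into the advantage targets that a single gradient step would consume.

```python
import math
import random


def sample_action(logits, rng):
    """Pick an action by adding Gumbel noise to the logits and taking the argmax.

    Assumption: Gumbel noise is used; the argmax of the noisy logits is then a
    sample from softmax(logits).
    """
    noisy = []
    for logit in logits:
        # random() returns values in [0, 1); clamp to avoid log(0).
        u = max(rng.random(), 1e-12)
        noisy.append(logit - math.log(-math.log(u)))
    return max(range(len(noisy)), key=noisy.__getitem__)


def discounted_returns(rewards, dones, bootstrap_value, gamma=0.99):
    """Compute n-step returns for one worker's rollout, working backwards."""
    returns = []
    running = bootstrap_value
    for reward, done in zip(reversed(rewards), reversed(dones)):
        # A finished episode cuts off bootstrapping from later steps.
        running = reward + gamma * running * (0.0 if done else 1.0)
        returns.append(running)
    returns.reverse()
    return returns


def advantages(returns, values):
    """Advantage estimate: n-step return minus the critic's value prediction."""
    return [ret - val for ret, val in zip(returns, values)]


# Hypothetical rollout of three steps that ends the episode on the last step.
rng = random.Random(0)
action = sample_action([0.1, 0.2, 0.3], rng)
rets = discounted_returns([1.0, 1.0, 1.0], [False, False, True],
                          bootstrap_value=5.0, gamma=0.5)
advs = advantages(rets, [1.0, 1.0, 1.0])
# rets is [1.75, 1.5, 1.0] and advs is [0.75, 0.5, 0.0]
```

In a full implementation, the returns and advantages from all workers would be concatenated into one batch and fed to a single optimizer step. The bootstrap value is ignored here because the episode terminates on the final step.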

Environment

  • Python 3.6.5
  • TensorFlow 1.12
  • OpenAI Gym 0.10.5
  • OpenCV 4.0.0
  • mpi4py 3.0.0

* Note: All of the environment modifications were taken from the OpenAI Baselines repository.
