Advantage Actor Critic (A2C)

An implementation of A2C (a variant of A3C) from OpenAI blog post.

Intuition:

Multiple workers work on different copies of an environment to collect a batch of data $\rightarrow$ No need for replay buffer.
Noise is added to logits of policy to ensure exploration.
Perform 1 gradient update step based on the data batch.

* Note: All of the environment modification were taken from OpenAI baseline repository.

Name		Name	Last commit message	Last commit date
Latest commit History 17 Commits
.gitignore		.gitignore
README.md		README.md
config.py		config.py
logger.py		logger.py
main.py		main.py
model.py		model.py
monitor.py		monitor.py
runner.py		runner.py
utils.py		utils.py
wrappers.py		wrappers.py