# Crafter

Open world survival game for evaluating a wide range of agent abilities within
a single environment.

![Crafter Terrain](https://github.com/danijar/crafter/raw/main/media/terrain.png)

## Overview

Crafter features randomly generated 2D worlds where the player needs to forage
for food and water, find shelter to sleep, defend against monsters, collect
materials, and build tools. Crafter aims to be a fruitful benchmark for
reinforcement learning by focusing on the following design goals:

- **Meaningful evaluation:** Agents are evaluated by semantically meaningful
  achievements that can be unlocked in each episode, offering insights into the
  ability spectrum of both reward agents and unsupervised agents.

- **Iteration speed:** Crafter evaluates many agent abilities within a single
  environment, vastly reducing the computational requirements over benchmark
  suites that require training on many separate environments from scratch.

See the research paper to find out more: [Benchmarking the Spectrum of Agent
Capabilities](https://arxiv.org/pdf/2109.06780.pdf)

If you find this code useful, please reference it in your paper:

```
@article{hafner2021crafter,
title={Benchmarking the Spectrum of Agent Capabilities},
author={Danijar Hafner},
year={2021},
journal={arXiv preprint arXiv:2109.06780},
}
```

## Play Yourself

You can play the game yourself with an interactive window and keyboard input.

## Evaluation

The environment defines `CrafterReward-v1` for agents that learn from the
provided reward and `CrafterNoReward-v1` for unsupervised agents. Agents are
allowed a budget of 1M environment steps and are evaluated by their success
rates on the 22 achievements and by their geometric mean score. Example scripts
for computing these metrics are included in the `analysis` directory of the
repository.
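
For example, a minimal interaction loop under this budget could look as
follows. This is only a sketch: it assumes the Gym API of the time, where
`env.step` returns `(obs, reward, done, info)`, and uses a random policy as a
stand-in for a learned agent:

```python
import gym
import crafter  # Importing the package registers the environments.

env = gym.make('CrafterReward-v1')  # CrafterNoReward-v1 for unsupervised agents.
budget = 1_000_000  # Evaluation budget of 1M environment steps.

obs = env.reset()
for _ in range(budget):
  action = env.action_space.sample()  # Stand-in for a learned policy.
  obs, reward, done, info = env.step(action)
  if done:
    obs = env.reset()
```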

- **Reward:** The sparse reward is `+1` for unlocking an achievement during
  the episode and `-0.1` or `+0.1` for lost or regenerated health points.
  Results should be reported not as reward but as success rates and score.

- **Success rates:** The success rates of the 22 achievements are computed
  as the percentage of all training episodes in which the achievement was
  unlocked, allowing insights into the ability spectrum of an agent.

- **Crafter score:** The score is the geometric mean of success rates, so that
  improvements on difficult achievements contribute more than improvements on
  achievements with already high success rates; see the sketch after this list.
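
As an illustration, the aggregation can be sketched in a few lines. The
offset-by-one geometric mean below follows the formula described in the paper
and should be treated as an assumption; the scripts in the `analysis`
directory are the authoritative implementation:

```python
import numpy as np

def crafter_score(success_rates):
  """Geometric mean of success rates given in percent (0 to 100)."""
  rates = np.asarray(success_rates, dtype=np.float64)
  # Offsetting by 1 keeps achievements with a 0% success rate from
  # collapsing the geometric mean to zero.
  return np.exp(np.mean(np.log(1 + rates))) - 1

# A gain on a rare achievement moves the score more than the same
# gain on an already common one.
print(crafter_score([90.0, 40.0, 0.5]))
```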

## Baselines
