
Commit

typo fix + screen fix
Theo Jaunet committed Jul 30, 2019
1 parent b5bc835 commit 10bff3f
Showing 6 changed files with 34 additions and 29 deletions.
File renamed without changes.
30 changes: 15 additions & 15 deletions index.html
@@ -128,35 +128,35 @@ <h5 style="width: 100%" id="card_title"> Full Memory</h5>
<div style="text-align: left;width: 800px;display: inline-block">
<h4>Deep Reinforcement Learning and Memory</h4>

- <p> We used the Advantage Actor Critic (A2C), as presented by <i><a
- href="https://arxiv.org/abs/1904.01806"> E.Beetching et. al. </a> </i>.
+ <p> We used the Advantage Actor Critic (A2C) method, as presented by <i><a
+ href="https://arxiv.org/abs/1904.01806"> E.Beeching et al. </a> </i>.
This model learns through trial and error to associate an observation (i.e. a matrix of pixels) at
time-step t with an action (at), such as turning left.
It can achieve this by using neural networks with shared parameters <i>theta</i>.
</p>

<p>
The model is composed of three stages with different purposes.
- First, 3 convolutional layers to analyze and extract features from the inputted game capture
+ First, 3 convolutional layers to analyze and extract features from the input game screen
(image).
This results in a tensor of 16 features <i>(ft)</i> shaped as 10x4 matrices. Those features are then
- flatten into a vector of 1x32 using a Fully Connected layer.
- The purpose of such operation is to prepare them for the next stage which is the memory.
+ flattened into a vector of 1x32 using a Fully Connected layer.
+ The purpose of such an operation is to prepare them for the next stage, which is the memory.
</p>
<p>
The memory of such a model is handled by a GRU layer, which takes a vector as input and outputs a
hidden state <i>(ht)</i>, a vector of 32 elements.
- GRU layers maintains and updates a latent representation through time-steps using the combination of
+ GRU layers maintain and update a latent representation through time-steps using the combination of
its current input <i>(ft)</i> and its previous hidden state <i>ht-1</i>.
Each element of the hidden state is a quantity within the range [−1, 1]. A value close to 0
represents low activity, whereas a value close to any extremity represents high activity.
Hidden states can change their values between two time-steps. Such value changes can be widely
observed across hidden states elements during trajectories.
- However, it remains unclear which elements, correspond to which representations, and thus,
+ However, it remains unclear which elements correspond to which representations, and thus are
responsible for decisions.


- Finally, the last stage consist of mapping the current hidden state <i>h_t</i> to a probability
+ Finally, the last stage consists of mapping the current hidden state <i>h_t</i> to a probability
distribution over the 5 available actions <i>(right, left, forward, forward+right, forward+left)</i>.
</p>
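To make the three stages concrete, the following is a minimal PyTorch-style sketch of such an architecture. It is written for this description only: the class name, convolution hyperparameters, activation placements, and value head are assumptions, not code from this repository; only the 16 feature maps, the 32-element vectors f_t and h_t, and the 5 actions come from the text above.

```python
import torch
import torch.nn as nn

class A2CMemoryAgent(nn.Module):
    """Sketch: conv features -> 1x32 vector f_t -> GRU memory h_t -> policy over 5 actions."""

    def __init__(self, n_actions=5, hidden_size=32):
        super().__init__()
        # Stage 1: three convolutional layers extract features from the game screen.
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 16, kernel_size=3, stride=1), nn.ReLU(),  # 16 feature maps (10x4 in the text)
        )
        # Stage 2: a fully connected layer turns the flattened features into the 1x32 vector f_t.
        self.fc = nn.LazyLinear(hidden_size)  # input size inferred from the conv output
        # Stage 3: a GRU cell combines f_t with the previous hidden state h_{t-1}.
        self.gru = nn.GRUCell(hidden_size, hidden_size)
        # Final stage: map h_t to a distribution over the 5 actions (plus a value head for A2C).
        self.policy = nn.Linear(hidden_size, n_actions)
        self.value = nn.Linear(hidden_size, 1)

    def forward(self, obs, h_prev):
        f_t = torch.relu(self.fc(self.conv(obs).flatten(start_dim=1)))
        h_t = self.gru(f_t, h_prev)  # 32 values, each kept in [-1, 1] by the GRU's tanh gating
        return torch.softmax(self.policy(h_t), dim=-1), self.value(h_t), h_t
```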

@@ -168,7 +168,7 @@ <h4>Deep Reinforcement Learning and Memory</h4>
which is then used to optimize the neural network with gradient descent.
</p>
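For reference, a generic A2C update (a textbook sketch, not taken from this commit or repository) uses an advantage estimate to weight the policy gradient, regresses the critic toward the observed return, and adds an entropy bonus to encourage exploration:

```python
import torch

def a2c_loss(log_prob, value, ret, entropy, value_coef=0.5, entropy_coef=0.01):
    advantage = ret - value.detach()               # how much better the action was than the critic expected
    policy_loss = -(log_prob * advantage).mean()   # push up probabilities of better-than-expected actions
    value_loss = (ret - value).pow(2).mean()       # fit the critic to the observed return
    return policy_loss + value_coef * value_loss - entropy_coef * entropy.mean()
```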

- <p> For a more detailed introduction to memory, we recommand reading <a
+ <p> For a more detailed introduction to sequential memory, we recommend reading <a
href="https://colah.github.io/posts/2015-08-Understanding-LSTMs/"> Christopher Olah's blog</a>
on LSTMs. </p>
</div>
@@ -180,16 +180,16 @@ <h4>Deep Reinforcement Learning and Memory</h4>
<div style="text-align: left;width: 800px;display: inline-block">
<h4> Why Manipulate the Memory?</h4>
<p>
- As detailed in the previous section, the agent's actions are directly linked to its memory,
+ As detailed in the previous section, the agents actions are directly linked to its memory,
therefore,
- each of its decisions may be justified by its current hidden state <i>ht</i>, and its values.
+ each of its decisions depends on its current hidden state <i>ht</i> and its values.
However, such memory is hard to understand because it is time-varying and uses
- abstract values.
- Being able to erase memory elements, and observing how the agent behaves without them,
+ numerical values.
+ Being able to erase memory elements, and to observe how the agent behaves without them,
may help in understanding and interpreting their roles in the decision process and the information they may
represent.

- In addition, the hidden states length is manually set by the model's builder, therefore such value
+ In addition, the hidden state's length is manually set by the model's designer, therefore such values
may be unfit to the agent's needs, which may result in unused or redundant elements.
Removing them, and thus reducing the memory length, can reduce the computation power needed by the
agent, as well as the training time and the energy consumption footprint.
@@ -208,7 +208,7 @@ <h4> How do we Erase Memory Elements?</h4>

<p> In order to simulate a reduced memory, we implemented a method that allows us to generate trajectories
from agents with limited memory.
- Technically, we hijack the memory vectors by applying a mask to them before each decisions.
+ Technically, we hijack the memory vectors by applying a mask to them before each decision.
This mask is a 1x32 vector, with its values either set to 0 (remove the element) or set to 1 (keep
the element).
Each memory element is multiplied by its corresponding mask element, and therefore either have
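As a hedged illustration of this masking trick (not code from this commit, and reusing the hypothetical A2CMemoryAgent sketch above; whether the mask is applied to h_{t-1}, h_t, or both is an assumption), the element-wise multiplication could look like:

```python
import torch

def masked_step(agent, obs, h_prev, mask):
    """mask: tensor of shape (1, 32) with 0 = erase the memory element, 1 = keep it."""
    probs, value, h_t = agent(obs, h_prev * mask)  # hijack the memory before the decision
    return probs, value, h_t * mask                # keep the element erased at the next step as well

# Example: erase memory element #28 and keep the other 31 elements.
mask = torch.ones(1, 32)
mask[0, 28] = 0.0
```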
5 changes: 3 additions & 2 deletions js/drawVector.js
@@ -2,7 +2,7 @@ let ve_h;
let ve_rows = [];
let ve_w = 15;
let col = d3.scaleLinear().domain([-1, 0, 1]).range(['#2266a6', '#effce5', '#bf6b09']).interpolate(d3.interpolateHcl);
- let hst = 850;
+ let hst = 521;
let sels = [-1, -1];
let old_sels = [-1, -1];
let cur_tri = 'nm';
@@ -25,11 +25,12 @@ function ve_init_rows(svg, data, height, width, mask, elem) {

let g = svg.append('g').attr('class', 'hiddensgrp').attr('id', 'hiddensgrp');

- if (width < 1000) {
+ /* if (width < 1000) {
hst = (710 * width) / 1200;
hst += 42
}
+ console.log(hst);*/
ve_h = Math.min(((height - 140) / data[0].length), 60);
ve_w = Math.min((width - hst - 10) / data.length, 13);

6 changes: 4 additions & 2 deletions js/main.js
@@ -52,7 +52,7 @@ function meta_change(filename, index) {
function load_data(data, index) {
data = tofloat(data);
let tbbox = tool[0].node().getBoundingClientRect();
- let traj_s = ((450 * tbbox.width) / 1300);
+ let traj_s = ((600 * tbbox.width) / 1300);
tdata = data;

console.log('lqlalqlq');
@@ -62,7 +62,7 @@ function load_data(data, index) {
curStep = tdata.hiddens.length - 1
}

- ve_init_rows(tool[0], tdata.hiddens, tool[2], tool[1], tdata.mask, index);
+ ve_init_rows(tool[0], tdata.hiddens, 685, 811, tdata.mask, index);
$('.traj-sel').toggleClass('traj-sel');
draw_traj(tdata.positions, tool[0], traj_s, traj_s, false, 'sec-traj traj-sel');
update_bars(tool[0], tdata.probabilities[start + curStep]);
@@ -74,6 +74,7 @@ function load_data(data, index) {

$('#timebar').val(curStep);
update_time();
+ d3.selectAll('.item').moveToFront()

switch (stage) {
case "0":
@@ -121,6 +122,7 @@ function step() {
draw_agent_path(tool[0], tdata.positions[start + curStep], tdata.orientations[start + curStep]);
drawImage(tool[0], 'data:image/png;base64,' + tdata.inputs[start + curStep], tool[2]);
update_bars(tool[0], tdata.probabilities[start + curStep]);
+ d3.selectAll('.item').moveToFront()
} else {
pl = false;
$('.play ').attr('src', 'assets/play-sign.svg');
12 changes: 6 additions & 6 deletions js/stage.js
@@ -3,9 +3,9 @@ let stage = 0;
let stages_titles = ['Full Memory', 'Random Memory Reductions', 'Top Memory Elements', 'Memory Elements Selection', 'Do it Yourself!'];
let stages_txt = [

- 'In order to play doom, the artificial player receive at each time-step, a game capture (image) corresponding to its field of view.\n' +
- ' From this game capture, it decides which action it should do. As the player\n' +
- ' decides, it builds an inner representation of the previously seen game captures. To do so, it combines the current game capture and its previous inner representation. Such representation,\n' +
+ 'In order to solve the task of playing Doom, the artificial player receives at each instant an observed image (a screen capture) corresponding to its field of view.\n' +
+ ' From this image, it decides which action it should do. As the game episode goes on, the agent ' +
+ ' builds an inner representation of the previously seen game captures. To do so, it combines the current game capture and its previous inner representation. The representation\n' +
' is a vector <i>(1x32)</i> with values in a scale from inactive <span class="cell"></span> to active <span class="cell" style="background-color: rgb(191, 84, 47)"></span>. ' +
'Each vector is vertically aligned in the order in which it is produced.' +
'<br>' +
@@ -16,9 +16,9 @@ let stages_txt = [
// '<br>' +
// '<br>' +
'Also, the <a onmouseover="highelems([28])" onmouseout="resetelems()"> element # 28</a> (row) remained active until the agent ' +
- 'gathered the red armor and inactive after. How the agent would behave without it? <a onclick="meta_change(\'nDIY/red28_-1.json\', [28,-1])">Let\'s find out! </a><br> ' +
- 'The new trajectory stars as the previous one, however once the agent gathered the red armor, it turned left instead of right. ' +
- 'What if we go further and remove more memory elements? Having smaller models would be useful as they may be more interpretable, but also requiring less computing power and less energy consumption footprint. ' +
+ 'gathered the red armor and inactive after. How would the agent behave without row #28 of its memory? <a onclick="meta_change(\'nDIY/red28_-1.json\', [28,-1])">Let\'s find out! </a><br> ' +
+ 'The new trajectory starts as the previous one; however, once the agent gathered the red armor, it turned left instead of right. ' +
+ 'What if we go further and remove more memory elements? Having smaller models would be useful as they may be more interpretable, but also require less computing power and have a lower energy consumption footprint. ' +
'<br>',

'A naive approach is to randomly remove memory elements regardless of their activation. Here, they each have 1 chance out of 2 to be erased. ' +
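A sketch of the "Random Memory Reductions" stage this string describes (illustrative only, using the hypothetical masked_step helper above): each of the 32 memory elements is independently kept or erased with probability 1/2.

```python
import torch

random_mask = (torch.rand(1, 32) > 0.5).float()  # each element has a 1-in-2 chance of being erased
# probs, value, h_t = masked_step(agent, obs, h_prev, random_mask)
```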
10 changes: 6 additions & 4 deletions js/trajectory.js
@@ -64,8 +64,9 @@ function draw_traj(data, svg, width, height, cs, cla) {

d3.select('.traj_bg').moveToFront()
d3.select('.traj_top').moveToFront()
- d3.select('.item').moveToFront()
d3.select('.traj-sel').moveToFront()
+ d3.selectAll('.item').moveToFront()

}


@@ -77,26 +78,27 @@ function place_items(svg, st) {
g.append("svg:image")
.attr('x', traj_x(80) + offx + 1)
.attr('y', traj_y(80) + offy - 15.5 + 1)
.attr('class', 'item')
.attr('width', 45)
.attr('height', 57)
.attr('class', 'item')

.attr("xlink:href", 'assets/armorGreen.png')

g.append("svg:image")
.attr('x', traj_x(-400) + offx + 1)
.attr('y', traj_y(-80) + offy - 13.5 + 1)
.attr('class', 'item')
.attr('width', 33)
.attr('height', 27)
.attr('class', 'item')
.attr("xlink:href", 'assets/armorRed.png')


g.append("svg:image")
.attr('x', traj_x(-240) + offx + 1)
.attr('y', traj_y(240) + offy - 13.5 + 1)
.attr('width', 33)
.attr('class', 'item')
.attr('height', 27)
.attr('class', 'item')
.attr("xlink:href", 'assets/hp.png');


