Commit

Merge pull request udacity#170 from jxm-math/master
Fix: some misspellings at projects/smartcab/smartcab.ipynb
adarsh0806 committed Dec 14, 2016
2 parents b88acf9 + 8fb38ce commit 2274839
Showing 1 changed file with 9 additions and 9 deletions.
18 changes: 9 additions & 9 deletions projects/smartcab/smartcab.ipynb
@@ -126,7 +126,7 @@
"To obtain results from the initial simulation, you will need to adjust following flags:\n",
"- `'enforce_deadline'` - Set this to `True` to force the driving agent to capture whether it reaches the destination in time.\n",
"- `'update_delay'` - Set this to a small value (such as `0.01`) to reduce the time between steps in each trial.\n",
"- `'log_metrics'` - Set this to `True` to log the simluation results as a `.csv` file in `/logs/`.\n",
"- `'log_metrics'` - Set this to `True` to log the simulation results as a `.csv` file in `/logs/`.\n",
"- `'n_test'` - Set this to `'10'` to perform 10 testing trials.\n",
"\n",
"Optionally, you may disable to the visual simulation (which can make the trials go faster) by setting the `'display'` flag to `False`. Flags that have been set here should be returned to their default setting when debugging. It is important that you understand what each flag does and how it affects the simulation!\n",
@@ -153,7 +153,7 @@
"### Question 3\n",
"Using the visualization above that was produced from your initial simulation, provide an analysis and make several observations about the driving agent. Be sure that you are making at least one observation about each panel present in the visualization. Some things you could consider:\n",
"- *How frequently is the driving agent making bad decisions? How many of those bad decisions cause accidents?*\n",
"- *Given that the agent is driving randomly, does the rate of reliabilty make sense?*\n",
"- *Given that the agent is driving randomly, does the rate of reliability make sense?*\n",
"- *What kind of rewards is the agent receiving for its actions? Do the rewards suggest it has been penalized heavily?*\n",
"- *As the number of trials increases, does the outcome of results change significantly?*\n",
"- *Would this Smartcab be considered safe and/or reliable for its passengers? Why or why not?*"
@@ -222,7 +222,7 @@
"metadata": {},
"source": [
"### Question 5\n",
"*If a state is defined using the features you've selected from **Question 4**, what would be the size of the state space? Given what you know about the evironment and how it is simulated, do you think the driving agent could learn a policy for each possible state within a reasonable number of training trials?* \n",
"*If a state is defined using the features you've selected from **Question 4**, what would be the size of the state space? Given what you know about the environment and how it is simulated, do you think the driving agent could learn a policy for each possible state within a reasonable number of training trials?* \n",
"**Hint:** Consider the *combinations* of features to calculate the total number of states!"
]
},
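As a concrete illustration of the hint in **Question 5**, the state-space size is the product of the number of values each chosen feature can take. The feature set and value counts below are only an example, not the required answer.

```python
# Example only: hypothetical feature cardinalities for a smartcab state.
# With features waypoint, traffic light, oncoming traffic, and left traffic,
# the state space is the product of their value counts.
feature_values = {
    'waypoint': 3,   # e.g. left, right, forward
    'light':    2,   # red, green
    'oncoming': 4,   # None, left, right, forward
    'left':     4,   # None, left, right, forward
}

state_space_size = 1
for count in feature_values.values():
    state_space_size *= count

print(state_space_size)  # 3 * 2 * 4 * 4 = 96 states in this example
```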
@@ -249,7 +249,7 @@
"source": [
"-----\n",
"## Implement a Q-Learning Driving Agent\n",
"The third step to creating an optimized Q-Learning agent is to begin implementing the functionality of Q-Learning itself. The concept of Q-Learning is fairly straightforward: For every state the agent visits, create an entry in the Q-table for all state-action pairs available. Then, when the agent encounters a state and performs an action, update the Q-value associated with that state-action pair based on the reward received and the interative update rule implemented. Of course, additional benefits come from Q-Learning, such that we can have the agent choose the *best* action for each state based on the Q-values of each state-action pair possible. For this project, you will be implementing a *decaying,* $\\epsilon$*-greedy* Q-learning algorithm with *no* discount factor. Follow the implementation instructions under each **TODO** in the agent functions.\n",
"The third step to creating an optimized Q-Learning agent is to begin implementing the functionality of Q-Learning itself. The concept of Q-Learning is fairly straightforward: For every state the agent visits, create an entry in the Q-table for all state-action pairs available. Then, when the agent encounters a state and performs an action, update the Q-value associated with that state-action pair based on the reward received and the interactive update rule implemented. Of course, additional benefits come from Q-Learning, such that we can have the agent choose the *best* action for each state based on the Q-values of each state-action pair possible. For this project, you will be implementing a *decaying,* $\\epsilon$*-greedy* Q-learning algorithm with *no* discount factor. Follow the implementation instructions under each **TODO** in the agent functions.\n",
"\n",
"Note that the agent attribute `self.Q` is a dictionary: This is how the Q-table will be formed. Each state will be a key of the `self.Q` dictionary, and each value will then be another dictionary that holds the *action* and *Q-value*. Here is an example:\n",
"\n",
Expand Down Expand Up @@ -278,7 +278,7 @@
"To obtain results from the initial Q-Learning implementation, you will need to adjust the following flags and setup:\n",
"- `'enforce_deadline'` - Set this to `True` to force the driving agent to capture whether it reaches the destination in time.\n",
"- `'update_delay'` - Set this to a small value (such as `0.01`) to reduce the time between steps in each trial.\n",
"- `'log_metrics'` - Set this to `True` to log the simluation results as a `.csv` file and the Q-table as a `.txt` file in `/logs/`.\n",
"- `'log_metrics'` - Set this to `True` to log the simulation results as a `.csv` file and the Q-table as a `.txt` file in `/logs/`.\n",
"- `'n_test'` - Set this to `'10'` to perform 10 testing trials.\n",
"- `'learning'` - Set this to `'True'` to tell the driving agent to use your Q-Learning implementation.\n",
"\n",
@@ -329,7 +329,7 @@
"source": [
"-----\n",
"## Improve the Q-Learning Driving Agent\n",
"The third step to creating an optimized Q-Learning agent is to perform the optimization! Now that the Q-Learning algorithm is implemented and the driving agent is successfully learning, it's necessary to tune settings and adjust learning paramaters so the driving agent learns both **safety** and **efficiency**. Typically this step will require a lot of trial and error, as some settings will invariably make the learning worse. One thing to keep in mind is the act of learning itself and the time that this takes: In theory, we could allow the agent to learn for an incredibly long amount of time; however, another goal of Q-Learning is to *transition from experimenting with unlearned behavior to acting on learned behavior*. For example, always allowing the agent to perform a random action during training (if $\\epsilon = 1$ and never decays) will certainly make it *learn*, but never let it *act*. When improving on your Q-Learning implementation, consider the impliciations it creates and whether it is logistically sensible to make a particular adjustment."
"The third step to creating an optimized Q-Learning agent is to perform the optimization! Now that the Q-Learning algorithm is implemented and the driving agent is successfully learning, it's necessary to tune settings and adjust learning parameters so the driving agent learns both **safety** and **efficiency**. Typically this step will require a lot of trial and error, as some settings will invariably make the learning worse. One thing to keep in mind is the act of learning itself and the time that this takes: In theory, we could allow the agent to learn for an incredibly long amount of time; however, another goal of Q-Learning is to *transition from experimenting with unlearned behavior to acting on learned behavior*. For example, always allowing the agent to perform a random action during training (if $\\epsilon = 1$ and never decays) will certainly make it *learn*, but never let it *act*. When improving on your Q-Learning implementation, consider the implications it creates and whether it is logistically sensible to make a particular adjustment."
]
},
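Because the improvement step revolves around how quickly exploration fades, here are a few generic decay schedules for epsilon written as plain functions of the training-trial count. These are illustrations with arbitrary constants, not the specific schedule the project asks for.

```python
import math

# Illustrative epsilon-decay schedules as functions of the training trial t.
# The constants (0.05, a) are arbitrary example values.

def linear_decay(t, step=0.05):
    """epsilon = 1 - step * t, floored at 0."""
    return max(0.0, 1.0 - step * t)

def exponential_decay(t, a=0.05):
    """epsilon = e^(-a * t): decays smoothly toward 0."""
    return math.exp(-a * t)

def inverse_square_decay(t):
    """epsilon = 1 / t^2 for t >= 1: decays very quickly."""
    return 1.0 / (t ** 2) if t >= 1 else 1.0

# Example: how many training trials until exponential decay falls below a
# 0.05 epsilon-tolerance?
t, epsilon = 0, 1.0
while epsilon > 0.05:
    t += 1
    epsilon = exponential_decay(t)
print(t)  # 60, since e^(-0.05 * 60) ~= 0.0498
```

The choice of schedule directly controls how many training trials occur before testing begins, which is one of the points **Question 7** asks about.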
{
@@ -340,7 +340,7 @@
"To obtain results from the initial Q-Learning implementation, you will need to adjust the following flags and setup:\n",
"- `'enforce_deadline'` - Set this to `True` to force the driving agent to capture whether it reaches the destination in time.\n",
"- `'update_delay'` - Set this to a small value (such as `0.01`) to reduce the time between steps in each trial.\n",
"- `'log_metrics'` - Set this to `True` to log the simluation results as a `.csv` file and the Q-table as a `.txt` file in `/logs/`.\n",
"- `'log_metrics'` - Set this to `True` to log the simulation results as a `.csv` file and the Q-table as a `.txt` file in `/logs/`.\n",
"- `'learning'` - Set this to `'True'` to tell the driving agent to use your Q-Learning implementation.\n",
"- `'optimized'` - Set this to `'True'` to tell the driving agent you are performing an optimized version of the Q-Learning implementation.\n",
"\n",
@@ -379,7 +379,7 @@
"### Question 7\n",
"Using the visualization above that was produced from your improved Q-Learning simulation, provide a final analysis and make observations about the improved driving agent like in **Question 6**. Questions you should answer: \n",
"- *What decaying function was used for epsilon (the exploration factor)?*\n",
"- *Approximately how many training trials were needed for your agent before begining testing?*\n",
"- *Approximately how many training trials were needed for your agent before beginning testing?*\n",
"- *What epsilon-tolerance and alpha (learning rate) did you use? Why did you use them?*\n",
"- *How much improvement was made with this Q-Learner when compared to the default Q-Learner from the previous section?*\n",
"- *Would you say that the Q-Learner results show that your driving agent successfully learned an appropriate policy?*\n",
@@ -423,7 +423,7 @@
"source": [
"-----\n",
"### Optional: Future Rewards - Discount Factor, `'gamma'`\n",
"Curiously, as part of the Q-Learning algorithm, you were asked to **not** use the discount factor, `'gamma'` in the implementation. Including future rewards in the algorithm is used to aid in propogating positive rewards backwards from a future state to the current state. Essentially, if the driving agent is given the option to make several actions to arrive at different states, including future rewards will bias the agent towards states that could provide even more rewards. An example of this would be the driving agent moving towards a goal: With all actions and rewards equal, moving towards the goal would theoretically yield better rewards if there is an additional reward for reaching the goal. However, even though in this project, the driving agent is trying to reach a destination in the allotted time, including future rewards will not benefit the agent. In fact, if the agent were given many trials to learn, it could negatively affect Q-values!"
"Curiously, as part of the Q-Learning algorithm, you were asked to **not** use the discount factor, `'gamma'` in the implementation. Including future rewards in the algorithm is used to aid in propagating positive rewards backwards from a future state to the current state. Essentially, if the driving agent is given the option to make several actions to arrive at different states, including future rewards will bias the agent towards states that could provide even more rewards. An example of this would be the driving agent moving towards a goal: With all actions and rewards equal, moving towards the goal would theoretically yield better rewards if there is an additional reward for reaching the goal. However, even though in this project, the driving agent is trying to reach a destination in the allotted time, including future rewards will not benefit the agent. In fact, if the agent were given many trials to learn, it could negatively affect Q-values!"
]
},
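For comparison with the no-discount rule used in this project, the sketch below contrasts it with the standard discounted Q-Learning update, shown only to illustrate how `'gamma'` would propagate future rewards backwards; the function names are illustrative.

```python
def q_update_no_discount(q_sa, reward, alpha):
    """One way to write the no-discount update: only the immediate reward matters."""
    return q_sa + alpha * (reward - q_sa)

def q_update_with_gamma(q_sa, reward, max_q_next, alpha, gamma):
    """Standard Q-Learning update: the best Q-value of the next state,
    scaled by gamma, is folded into the target."""
    return q_sa + alpha * (reward + gamma * max_q_next - q_sa)

# With gamma > 0, a promising next state (max_q_next = 2.0) raises the updated
# Q-value even when the immediate reward is 0.
print(q_update_no_discount(0.0, reward=0.0, alpha=0.5))                             # 0.0
print(q_update_with_gamma(0.0, reward=0.0, max_q_next=2.0, alpha=0.5, gamma=0.9))   # 0.9
```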
{