sweeps
sergeyk committed Nov 16, 2019
1 parent fec8dab commit b8350a9
Showing 12 changed files with 53 additions and 21 deletions.
17 changes: 13 additions & 4 deletions lab3/readme.md
@@ -18,20 +18,25 @@ cd lab3/

## Intro to Weights & Biases

Weights & Biases is an experiment tracking tool that ensures you never lose track of your progress.

### Motivation for W&B

- Keep track of all experiments in one place
- Easily compare runs
- Create reports to document your progress
- Look at results from the whole team

### Let's get started with W&B!

-NOTE: These instructions are optional if you're working in the pre-configured Jupyter hub.
+> NOTE: These instructions are optional if you're working in the pre-configured Jupyter hub.
```
pipenv run wandb init
```

You should see something like:

```
? Which team should we use? (Use arrow keys)
> your_username
@@ -44,6 +49,7 @@ Select your username.
Which project should we use?
> Create New
```

Select `fsdl-text-recognizer-project`.

How do we implement W&B in training code?
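
Conceptually, the integration is only a few lines. Here is a minimal sketch, assuming a hypothetical training loop (the metric names and values are stand-ins, not the lab's actual code):

```python
import random

import wandb

# Hedged sketch: instrument a training loop with W&B.
# wandb.init() starts a tracked run; wandb.log() records metrics per step.
wandb.init(project="fsdl-text-recognizer-project")
wandb.config.update({"batch_size": 512, "epochs": 8})  # assumed hyper-parameters

for epoch in range(wandb.config.epochs):
    train_loss = 1.0 / (epoch + 1) + random.random() * 0.01  # fake metric
    val_accuracy = 1.0 - train_loss                          # fake metric
    wandb.log({"epoch": epoch, "train_loss": train_loss, "val_accuracy": val_accuracy})
```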
@@ -61,8 +67,9 @@ tasks/train_character_predictor.sh
You should see:

```
-wandb: Started W&B process version 0.6.17 with PID <xxxx>
-wandb: Syncing https://api.wandb.ai/<USERNAME>/fsdl-text-recognizer-project/runs/<xxxxxx>
+wandb: Tracking run with wandb version 0.8.15
+wandb: Run data is saved locally in wandb/run-20191116_020355-1n7aaz5g
+wandb: Syncing run flowing-waterfall-1
```

Click the link to see your run train.
@@ -79,15 +86,17 @@ Click the link to see your run train.
pipenv run python training/run_experiment.py --save '{"dataset": "EmnistDataset", "model": "CharacterModel", "network": "mlp", "train_args": {"batch_size": 512}}' --gpu=1
```

-Check out both runs at https://app.wandb.ai
+Check out both runs at https://app.wandb.ai/<USERNAME>/fsdl-text-recognizer-project

### Automatically running multiple experiments

Desiderata for single-machine parallel experimentation code:

- Define multiple experiments and run them simultaneously on all available GPUs
- Run more experiments than GPUs and automatically queue up extras

Let's look at a simple implementation of these (a simplified sketch follows the list):

- Look at `training/prepare_experiments.py`
- Look at `training/gpu_manager.py`
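
A minimal sketch of the queueing idea, as a simplified stand-in for what those two files do (the experiment configs and GPU count below are assumptions):

```python
import queue
import subprocess
from concurrent.futures import ThreadPoolExecutor

NUM_GPUS = 2  # assumption: set this to the number of GPUs on your machine

# Hypothetical experiment configs in the run_experiment.py JSON format.
EXPERIMENTS = [
    '{"dataset": "EmnistDataset", "model": "CharacterModel", "network": "mlp"}',
    '{"dataset": "EmnistDataset", "model": "CharacterModel", "network": "mlp", "train_args": {"batch_size": 256}}',
    '{"dataset": "EmnistDataset", "model": "CharacterModel", "network": "mlp", "train_args": {"batch_size": 512}}',
]

# Pool of free GPU indices; extra experiments wait here until one frees up.
gpu_pool = queue.Queue()
for gpu_index in range(NUM_GPUS):
    gpu_pool.put(gpu_index)

def run_experiment(config):
    gpu_index = gpu_pool.get()  # take a free GPU
    try:
        subprocess.run(
            ["pipenv", "run", "python", "training/run_experiment.py",
             f"--gpu={gpu_index}", config],
            check=True,
        )
    finally:
        gpu_pool.put(gpu_index)  # hand the GPU back for the next queued job

with ThreadPoolExecutor(max_workers=NUM_GPUS) as executor:
    list(executor.map(run_experiment, EXPERIMENTS))  # list() surfaces errors
```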

37 changes: 35 additions & 2 deletions lab4/readme.md
@@ -3,14 +3,18 @@
In this lab we will introduce the IAM handwriting dataset and give you a chance to try out different things, run experiments, and review results on W&B.

## Goal of the lab

- Introduce IAM handwriting dataset
- Try some ideas & review results on W&B
- See who can get the best score :)
- Automate trials with hyper-parameter sweeps

## Outline

- Intro to IAM datasets
- Train a baseline model
- Try your own ideas
- Run a sweep

## Follow along

@@ -23,7 +27,7 @@ cd lab4/

- Look at `notebooks/03-look-at-iam-lines.ipynb`.

-## Training
+## Training individual runs

Let's train with the default params by running `tasks/train_lstm_line_predictor_on_iam.sh`, which runs the following command:

@@ -35,11 +39,40 @@ This uses our LSTM with CTC model. 8 epochs gets accuracy of 40% and takes about

Training longer keeps improving accuracy: the same settings reach 60% in 40 epochs.

## Configuring sweeps

Sweeps enable automated trials of hyper-parameters. W&B provides built-in support for running [sweeps](https://docs.wandb.com/library/sweeps). We've set up an initial sweep configuration in `training/sweep.yaml`, which performs a basic grid search across 3 parameters. There are many other [configuration options](https://docs.wandb.com/library/sweeps/configuration) for defining more complex sweeps. Any time you modify this configuration, you'll need to create a new sweep in wandb by running:

```bash
pipenv run wandb sweep training/sweep.yaml
```

```text
Creating sweep from: sweep.yaml
Create sweep with ID: 0nnj74vx
```

Take note of the 8-character ID returned by this command. It's best to store it in an environment variable by running `export SWEEP_ID=0nnj74vx`. W&B sweeps work by running a command and passing arguments into it; we wrote a wrapper at `training/run_sweep.py` to convert these arguments into a JSON config object.
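
The idea of the wrapper can be sketched as follows: the agent invokes the program with flags like `--network_args.window_width=10`, and each dotted key is folded into a nested dict. This is a hedged stand-in, not the actual `run_sweep.py` code:

```python
import json
import sys

def args_to_config(argv):
    """Fold agent flags like --network_args.window_width=10 into a nested dict."""
    config = {}
    for arg in argv:
        key, _, value = arg.lstrip("-").partition("=")
        *parents, leaf = key.split(".")
        node = config
        for parent in parents:
            node = node.setdefault(parent, {})
        try:
            node[leaf] = json.loads(value)  # numbers, booleans, lists
        except json.JSONDecodeError:
            node[leaf] = value  # plain strings pass through unchanged
    return config

if __name__ == "__main__":
    # e.g. python sketch.py --network_args.window_width=10 --train_args.batch_size=128
    print(json.dumps(args_to_config(sys.argv[1:]), indent=2))
```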

> NOTE: Be sure to edit **config_defaults** in `training/run_sweep.py` if you train on different datasets or models.

To run the sweep, start one or more agents; each agent queries W&B for the next set of parameters and runs an experiment with them:

```bash
pipenv run wandb agent $SWEEP_ID
```

This will print a URL to W&B that you can use to monitor or control the sweep.
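
As an aside, newer versions of the wandb client can also run an agent in-process from Python. A hedged sketch, assuming the `wandb.agent` API is available in your version (`train` is a made-up placeholder):

```python
import wandb

def train():
    # The sweep controller chooses the hyper-parameters; they appear
    # in run.config after wandb.init().
    run = wandb.init()
    print(dict(run.config))  # a real version would launch training here

# Same 8-character ID printed by `wandb sweep`, e.g. the one above.
wandb.agent("0nnj74vx", function=train, count=4)  # stop after 4 trials
```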

### Stopping a sweep

If you choose the **random** sweep strategy, the agent will run forever. Our **grid** search strategy will stop once all options have been tried. You can stop a sweep from the W&B UI, or directly from the terminal: hitting CTRL-C once will prevent the agent from starting a new experiment but allow the current one to finish; hitting CTRL-C again will kill the currently running experiment.

## Ideas for things to try

For the rest of the lab, let's play around with different things and see if we can improve performance quickly.

-You can see all of our training runs here: https://app.wandb.ai/fsdl/fsdl-text-recognizer-nov16
+You can see all of our training runs here: https://app.wandb.ai/fsdl/fsdl-text-recognizer-nov2019
Feel free to peek in on your neighbors!

If you commit and push your code changes, the run will also be linked to the exact code you ran, which you can review months later if necessary.
2 changes: 1 addition & 1 deletion lab4/training/run_sweep.py
@@ -20,7 +20,7 @@
    },
    "train_args": {
        "batch_size": 128,
-        "epochs": 10
+        "epochs": 5
    }
}

2 changes: 0 additions & 2 deletions lab4/training/sweep.yaml
@@ -4,8 +4,6 @@ metric:
  name: val_loss
  goal: minimize
parameters:
-  dataset_args.max_overlap:
-    values: [0.1, 0.4, 0.7]
  network_args.window_width:
    values: [10, 20]
  network_args.window_stride:
2 changes: 1 addition & 1 deletion lab5/training/run_sweep.py
@@ -20,7 +20,7 @@
    },
    "train_args": {
        "batch_size": 128,
-        "epochs": 10
+        "epochs": 5
    }
}

2 changes: 0 additions & 2 deletions lab5/training/sweep.yaml
@@ -4,8 +4,6 @@ metric:
  name: val_loss
  goal: minimize
parameters:
-  dataset_args.max_overlap:
-    values: [0.1, 0.4, 0.7]
  network_args.window_width:
    values: [10, 20]
  network_args.window_stride:
2 changes: 1 addition & 1 deletion lab6/training/run_sweep.py
@@ -20,7 +20,7 @@
    },
    "train_args": {
        "batch_size": 128,
-        "epochs": 10
+        "epochs": 5
    }
}

2 changes: 0 additions & 2 deletions lab6/training/sweep.yaml
@@ -4,8 +4,6 @@ metric:
  name: val_loss
  goal: minimize
parameters:
-  dataset_args.max_overlap:
-    values: [0.1, 0.4, 0.7]
  network_args.window_width:
    values: [10, 20]
  network_args.window_stride:
2 changes: 1 addition & 1 deletion lab7/training/run_sweep.py
@@ -20,7 +20,7 @@
    },
    "train_args": {
        "batch_size": 128,
-        "epochs": 10
+        "epochs": 5
    }
}

2 changes: 0 additions & 2 deletions lab7/training/sweep.yaml
@@ -4,8 +4,6 @@ metric:
  name: val_loss
  goal: minimize
parameters:
-  dataset_args.max_overlap:
-    values: [0.1, 0.4, 0.7]
  network_args.window_width:
    values: [10, 20]
  network_args.window_stride:
2 changes: 1 addition & 1 deletion lab8/training/run_sweep.py
@@ -20,7 +20,7 @@
    },
    "train_args": {
        "batch_size": 128,
-        "epochs": 10
+        "epochs": 5
    }
}

2 changes: 0 additions & 2 deletions lab8/training/sweep.yaml
@@ -4,8 +4,6 @@ metric:
  name: val_loss
  goal: minimize
parameters:
-  dataset_args.max_overlap:
-    values: [0.1, 0.4, 0.7]
  network_args.window_width:
    values: [10, 20]
  network_args.window_stride:
