sweeps
sergeyk committed Nov 16, 2019
1 parent fec8dab commit b8350a9
Showing 12 changed files with 53 additions and 21 deletions.
17 changes: 13 additions & 4 deletions lab3/readme.md
@@ -18,20 +18,25 @@ cd lab3/

## Intro to Weights & Biases

Weights & Biases is an experiment tracking tool that ensures you never lose track of your progress.

### Motivation for W&B

- Keep track of all experiments in one place
- Easily compare runs
- Create reports to document your progress
- Look at results from the whole team

### Let's get started with W&B!

-NOTE: These instructions are optional if you're working in the pre-configured Jupyter hub.
+> NOTE: These instructions are optional if you're working in the pre-configured Jupyter hub.
```
pipenv run wandb init
```

You should see something like:

```
? Which team should we use? (Use arrow keys)
> your_username
@@ -44,6 +49,7 @@ Select your username.
Which project should we use?
> Create New
```

Select `fsdl-text-recognizer-project`.

How do we implement W&B in training code?
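
Conceptually, the integration is only a few lines. Here is a minimal sketch, assuming a hypothetical training loop (the metric names and values are stand-ins, not the lab's actual code):

```python
import random

import wandb

# Hedged sketch: instrument a training loop with W&B.
# wandb.init() starts a tracked run; wandb.log() records metrics per step.
wandb.init(project="fsdl-text-recognizer-project")
wandb.config.update({"batch_size": 512, "epochs": 8})  # assumed hyper-parameters

for epoch in range(wandb.config.epochs):
    train_loss = 1.0 / (epoch + 1) + random.random() * 0.01  # fake metric
    val_accuracy = 1.0 - train_loss                          # fake metric
    wandb.log({"epoch": epoch, "train_loss": train_loss, "val_accuracy": val_accuracy})
```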
@@ -61,8 +67,9 @@ tasks/train_character_predictor.sh
You should see:

```
-wandb: Started W&B process version 0.6.17 with PID <xxxx>
-wandb: Syncing https://api.wandb.ai/<USERNAME>/fsdl-text-recognizer-project/runs/<xxxxxx>
+wandb: Tracking run with wandb version 0.8.15
+wandb: Run data is saved locally in wandb/run-20191116_020355-1n7aaz5g
+wandb: Syncing run flowing-waterfall-1
```

Click the link to see your run train.
@@ -79,15 +86,17 @@ Click the link to see your run train.
pipenv run python training/run_experiment.py --save '{"dataset": "EmnistDataset", "model": "CharacterModel", "network": "mlp", "train_args": {"batch_size": 512}}' --gpu=1
```

-Check out both runs at https://app.wandb.ai
+Check out both runs at https://app.wandb.ai/<USERNAME>/fsdl-text-recognizer-project

### Automatically running multiple experiments

Desiderata for single-machine parallel experimentation code:

- Define multiple experiments and run them simultaneously on all available GPUs
- Run more experiments than GPUs and automatically queue up extras

Let's look at a simple implementation of these (a simplified sketch follows the list):

- Look at `training/prepare_experiments.py`
- Look at `training/gpu_manager.py`
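
A minimal sketch of the queueing idea, as a simplified stand-in for what those two files do (the experiment configs and GPU count below are assumptions):

```python
import queue
import subprocess
from concurrent.futures import ThreadPoolExecutor

NUM_GPUS = 2  # assumption: set this to the number of GPUs on your machine

# Hypothetical experiment configs in the run_experiment.py JSON format.
EXPERIMENTS = [
    '{"dataset": "EmnistDataset", "model": "CharacterModel", "network": "mlp"}',
    '{"dataset": "EmnistDataset", "model": "CharacterModel", "network": "mlp", "train_args": {"batch_size": 256}}',
    '{"dataset": "EmnistDataset", "model": "CharacterModel", "network": "mlp", "train_args": {"batch_size": 512}}',
]

# Pool of free GPU indices; extra experiments wait here until one frees up.
gpu_pool = queue.Queue()
for gpu_index in range(NUM_GPUS):
    gpu_pool.put(gpu_index)

def run_experiment(config):
    gpu_index = gpu_pool.get()  # take a free GPU
    try:
        subprocess.run(
            ["pipenv", "run", "python", "training/run_experiment.py",
             f"--gpu={gpu_index}", config],
            check=True,
        )
    finally:
        gpu_pool.put(gpu_index)  # hand the GPU back for the next queued job

with ThreadPoolExecutor(max_workers=NUM_GPUS) as executor:
    list(executor.map(run_experiment, EXPERIMENTS))  # list() surfaces errors
```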

37 changes: 35 additions & 2 deletions lab4/readme.md
@@ -3,14 +3,18 @@
In this lab we will introduce the IAM handwriting dataset and give you a chance to try out different things, run experiments, and review results on W&B.

## Goal of the lab

- Introduce IAM handwriting dataset
- Try some ideas & review results on W&B
- See who can get the best score :)
- Automate trials with hyper-parameter sweeps

## Outline

- Intro to IAM datasets
- Train a baseline model
- Try your own ideas
- Run a sweep

## Follow along

@@ -23,7 +27,7 @@ cd lab4/

- Look at `notebooks/03-look-at-iam-lines.ipynb`.

-## Training
+## Training individual runs

Let's train with the default params by running `tasks/train_lstm_line_predictor_on_iam.sh`, which runs the following command:

@@ -35,11 +39,40 @@ This uses our LSTM with CTC model. 8 epochs gets accuracy of 40% and takes about

Training longer keeps improving accuracy: the same settings reach 60% in 40 epochs.

## Configuring sweeps

Sweeps enable automated trials of hyper-parameters. W&B provides built-in support for running [sweeps](https://docs.wandb.com/library/sweeps). We've set up an initial sweep configuration in `training/sweep.yaml`, which performs a basic grid search across 3 parameters. There are many other [configuration options](https://docs.wandb.com/library/sweeps/configuration) for defining more complex sweeps. Any time you modify this configuration, you'll need to create a new sweep in wandb by running:

```bash
pipenv run wandb sweep training/sweep.yaml
```

```text
Creating sweep from: sweep.yaml
Create sweep with ID: 0nnj74vx
```

Take note of the 8-character ID returned by this command. It's best to store it in an environment variable by running `export SWEEP_ID=0nnj74vx`. W&B sweeps work by running a command and passing arguments into it; we wrote a wrapper at `training/run_sweep.py` to convert these arguments into a JSON config object.
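
The idea of the wrapper can be sketched as follows: the agent invokes the program with flags like `--network_args.window_width=10`, and each dotted key is folded into a nested dict. This is a hedged stand-in, not the actual `run_sweep.py` code:

```python
import json
import sys

def args_to_config(argv):
    """Fold agent flags like --network_args.window_width=10 into a nested dict."""
    config = {}
    for arg in argv:
        key, _, value = arg.lstrip("-").partition("=")
        *parents, leaf = key.split(".")
        node = config
        for parent in parents:
            node = node.setdefault(parent, {})
        try:
            node[leaf] = json.loads(value)  # numbers, booleans, lists
        except json.JSONDecodeError:
            node[leaf] = value  # plain strings pass through unchanged
    return config

if __name__ == "__main__":
    # e.g. python sketch.py --network_args.window_width=10 --train_args.batch_size=128
    print(json.dumps(args_to_config(sys.argv[1:]), indent=2))
```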

> NOTE: Be sure to edit **config_defaults** in `training/run_sweep.py` if you train on different datasets or models.

To run the sweep, start one or more agents; each agent queries W&B for the next set of parameters and runs an experiment with them:

```bash
pipenv run wandb agent $SWEEP_ID
```

This will print a URL to W&B that you can use to monitor or control the sweep.
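
As an aside, newer versions of the wandb client can also run an agent in-process from Python. A hedged sketch, assuming the `wandb.agent` API is available in your version (`train` is a made-up placeholder):

```python
import wandb

def train():
    # The sweep controller chooses the hyper-parameters; they appear
    # in run.config after wandb.init().
    run = wandb.init()
    print(dict(run.config))  # a real version would launch training here

# Same 8-character ID printed by `wandb sweep`, e.g. the one above.
wandb.agent("0nnj74vx", function=train, count=4)  # stop after 4 trials
```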

### Stopping a sweep

If you choose the **random** sweep strategy, the agent will run forever. Our **grid** search strategy will stop once all options have been tried. You can stop a sweep from the W&B UI, or directly from the terminal: hitting CTRL-C once will prevent the agent from starting a new experiment but allow the current one to finish; hitting CTRL-C again will kill the currently running experiment.

## Ideas for things to try

For the rest of the lab, let's play around with different things and see if we can improve performance quickly.

-You can see all of our training runs here: https://app.wandb.ai/fsdl/fsdl-text-recognizer-nov16
+You can see all of our training runs here: https://app.wandb.ai/fsdl/fsdl-text-recognizer-nov2019
Feel free to peek in on your neighbors!

If you commit and push your code changes, the run will also be linked to the exact code you ran, which you can review months later if necessary.
2 changes: 1 addition & 1 deletion lab4/training/run_sweep.py
@@ -20,7 +20,7 @@
    },
    "train_args": {
        "batch_size": 128,
-        "epochs": 10
+        "epochs": 5
    }
}

2 changes: 0 additions & 2 deletions lab4/training/sweep.yaml
@@ -4,8 +4,6 @@ metric:
  name: val_loss
  goal: minimize
parameters:
-  dataset_args.max_overlap:
-    values: [0.1, 0.4, 0.7]
  network_args.window_width:
    values: [10, 20]
  network_args.window_stride:
2 changes: 1 addition & 1 deletion lab5/training/run_sweep.py
@@ -20,7 +20,7 @@
    },
    "train_args": {
        "batch_size": 128,
-        "epochs": 10
+        "epochs": 5
    }
}

2 changes: 0 additions & 2 deletions lab5/training/sweep.yaml
@@ -4,8 +4,6 @@ metric:
  name: val_loss
  goal: minimize
parameters:
-  dataset_args.max_overlap:
-    values: [0.1, 0.4, 0.7]
  network_args.window_width:
    values: [10, 20]
  network_args.window_stride:
2 changes: 1 addition & 1 deletion lab6/training/run_sweep.py
@@ -20,7 +20,7 @@
    },
    "train_args": {
        "batch_size": 128,
-        "epochs": 10
+        "epochs": 5
    }
}

2 changes: 0 additions & 2 deletions lab6/training/sweep.yaml
@@ -4,8 +4,6 @@ metric:
  name: val_loss
  goal: minimize
parameters:
-  dataset_args.max_overlap:
-    values: [0.1, 0.4, 0.7]
  network_args.window_width:
    values: [10, 20]
  network_args.window_stride:
2 changes: 1 addition & 1 deletion lab7/training/run_sweep.py
@@ -20,7 +20,7 @@
    },
    "train_args": {
        "batch_size": 128,
-        "epochs": 10
+        "epochs": 5
    }
}

2 changes: 0 additions & 2 deletions lab7/training/sweep.yaml
@@ -4,8 +4,6 @@ metric:
  name: val_loss
  goal: minimize
parameters:
-  dataset_args.max_overlap:
-    values: [0.1, 0.4, 0.7]
  network_args.window_width:
    values: [10, 20]
  network_args.window_stride:
2 changes: 1 addition & 1 deletion lab8/training/run_sweep.py
@@ -20,7 +20,7 @@
    },
    "train_args": {
        "batch_size": 128,
-        "epochs": 10
+        "epochs": 5
    }
}

2 changes: 0 additions & 2 deletions lab8/training/sweep.yaml
@@ -4,8 +4,6 @@ metric:
  name: val_loss
  goal: minimize
parameters:
-  dataset_args.max_overlap:
-    values: [0.1, 0.4, 0.7]
  network_args.window_width:
    values: [10, 20]
  network_args.window_stride:
