Name		Name	Last commit message	Last commit date
parent directory ..
notebooks		notebooks
tasks		tasks
text_recognizer		text_recognizer
training		training
readme.md		readme.md

readme.md

Lab 1: Single-character prediction

Goal of the lab

Train a model to solve a simplified version of the line text recognition problem.

Outline

Intro to EMNIST, a character prediction dataset.
Explore the networks and training code.
Train simple MLP/CNN baselines to solve EMNIST.
Test your model.

Follow along

git pull
pipenv sync -d
cd lab1/

Intro to EMNIST

EMNIST = Extended Mini-NIST :)
All English letters and digits presented in the MNIST format.
Look at: notebooks/01-look-at-emnist.ipynb

Networks and training code

- text_recognizer/networks/mlp.py
- text_recognizer/networks/lenet.py
- text_recognizer/models/base.py
- text_recognizer/models/character_model.py
- training/util.py

Train MLP and CNN

You can run the shortcut command tasks/train_character_predictor.sh, which runs the following:

pipenv run training/run_experiment.py --save \
  '{"dataset": "EmnistDataset", "model": "CharacterModel", "network": "mlp",  "train_args": {"batch_size": 256}}'

It will take a couple of minutes to train your model.

Just for fun, you could also try a larger MLP, with a smaller batch size:

pipenv run training/run_experiment.py \
  '{"dataset": "EmnistDataset", "model": "CharacterModel", "network": "mlp", "network_args": {"num_layers": 8}, "train_args": {"batch_size": 128}}'

Let's also train a CNN on the same task.

pipenv run training/run_experiment.py '{"dataset": "EmnistDataset", "model": "CharacterModel", "network": "lenet", "train_args": {"epochs": 1}}'

Training the single epoch will take about 2 minutes (that's why we only do one epoch in this lab :)). Leave it running while we go on to the next part.

It is very useful to be able to subsample the dataset for quick experiments. This is possibe by passing subsample_fraction=0.1 (or some other fraction) at dataset initialization, or in dataset_args in the run_experiment.py dictionary, for example:

pipenv run training/run_experiment.py '{"dataset": "EmnistDataset", "dataset_args": {"subsample_fraction": 0.1}, "model": "CharacterModel", "network": "lenet"}'

Testing

First, let's take a look at how the test works at

text_recognizer/tests/test_character_predictor.py

Now let's see if it works by running:

pipenv run pytest -s text_recognizer/tests/test_character_predictor.py

Or, use the shorthand tasks/test_functionality.sh

Testing should finish quickly.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

lab1

lab1

readme.md

Lab 1: Single-character prediction

Goal of the lab

Outline

Follow along

Intro to EMNIST

Networks and training code

Train MLP and CNN

Testing

Files

lab1

Directory actions

More options

Directory actions

More options

Latest commit

History

lab1

Folders and files

parent directory

readme.md

Lab 1: Single-character prediction

Goal of the lab

Outline

Follow along

Intro to EMNIST

Networks and training code

Train MLP and CNN

Testing