The examples in this directory have been adapted from the TensorFlow tutorials. To execute these examples, you will have to unzip the MNIST data files in data/.

Linear Classifier

The code can be found in mnist_linear.rs.

We first load the MNIST data. This is done using the MNIST helper module; labels are returned as a vector of integers. The train images and labels are used when training the model, while the test images and labels are used to estimate the validation error.

    let m = vision::mnist::load_dir("data").unwrap();
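
The returned value gives access to the images and labels as tensors. As a quick sanity check, one can print their shapes; the sizes mentioned in the comment below assume the standard MNIST split of 60,000 training and 10,000 test images, each flattened to 784 pixels:

    // Print the tensor shapes to confirm the dataset loaded as expected,
    // e.g. train images [60000, 784] and test images [10000, 784].
    println!("train images: {:?}", m.train_images.size());
    println!("test images:  {:?}", m.test_images.size());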

After that, two tensors are initialized to hold the weights and biases for the linear classifier. set_requires_grad is used when creating the tensors to inform torch that we will compute gradients with respect to these tensors.

    let mut ws = Tensor::zeros(&[IMAGE_DIM, LABELS], kind::FLOAT_CPU).set_requires_grad(true);
    let mut bs = Tensor::zeros(&[LABELS], kind::FLOAT_CPU).set_requires_grad(true);
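
The IMAGE_DIM and LABELS constants used above correspond, for MNIST, to the 28x28 = 784 input pixels per image and the 10 digit classes:

    const IMAGE_DIM: i64 = 784; // 28 * 28 pixels per image
    const LABELS: i64 = 10; // digits 0 through 9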

Using these, the model is defined as multiplying an input by the weight matrix and adding the bias. A softmax function is then used to transform the output into a probability distribution.

    let logits = m.train_images.mm(&ws) + &bs;
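
The softmax itself does not appear explicitly in the training loop below, since the loss is computed from log-probabilities via log_softmax. For illustration, the class probabilities could be obtained, for instance, as:

    // Illustrative only: turn the raw scores into per-class probabilities.
    let probs = logits.softmax(-1, Kind::Float);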

We use gradient descent to minimize the cross-entropy loss with respect to the variables ws and bs, and we iterate this a couple of hundred times.

Rather than using an optimizer, we perform the gradient descent updates manually. This is only to illustrate how gradients can be computed and used. Other examples such as mnist_nn.rs or mnist_conv.rs use an Adam optimizer.

    for epoch in 1..200 {
        // Compute the logits and the cross-entropy loss on the training set.
        let logits = m.train_images.mm(&ws) + &bs;
        let loss = logits.log_softmax(-1, Kind::Float).nll_loss(&m.train_labels);
        // Reset the gradients, then run the backward pass.
        ws.zero_grad();
        bs.zero_grad();
        loss.backward();
        // Apply a plain gradient descent step (learning rate of 1).
        no_grad(|| {
            ws += ws.grad() * (-1);
            bs += bs.grad() * (-1);
        });
    }

Running this code should build a model that has ~92% accuracy.
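
The snippet above does not compute the accuracy itself. One possible way to estimate it on the test set, sketched below, uses the accuracy_for_logits helper from tch:

    // Estimate the validation accuracy on the held-out test set.
    let test_logits = m.test_images.mm(&ws) + &bs;
    let test_accuracy = test_logits
        .accuracy_for_logits(&m.test_labels)
        .double_value(&[]);
    println!("test accuracy: {:5.2}%", 100. * test_accuracy);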

A Simple Neural Network

The code can be found in mnist_nn.rs; accuracy should reach ~96%.
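
The model in mnist_nn.rs is built with the nn module. As a rough sketch (the hidden layer size HIDDEN_NODES below is an illustrative assumption, not necessarily the value used in the example), such a network can be expressed as a sequence of linear layers:

    // Illustrative sketch of a one-hidden-layer network using tch's nn module.
    const HIDDEN_NODES: i64 = 128; // hypothetical hidden layer size
    fn net(vs: &nn::Path) -> impl nn::Module {
        nn::seq()
            .add(nn::linear(vs / "layer1", IMAGE_DIM, HIDDEN_NODES, Default::default()))
            .add_fn(|xs| xs.relu())
            .add(nn::linear(vs / "layer2", HIDDEN_NODES, LABELS, Default::default()))
    }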

Convolutional Neural Network

The code can be found in mnist_conv.rs; accuracy should reach ~99%.

When building models with multiple weight and bias parameters, we use a variable store to keep track of these variables and to let the optimizer know about them. The variable store is created in the first line of the following snippet; the model is then built using this variable store, and finally we create an optimizer that will perform gradient descent over the parameters that were added to the variable store when creating the network.

    let vs = nn::VarStore::new(Device::cuda_if_available());
    let net = Net::new(&vs.root());
    let mut opt = nn::Adam::default().build(&vs, 1e-4)?;
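
The Net type used above is defined in mnist_conv.rs. As an illustrative sketch of what a comparable two-convolution model can look like (the layer sizes are assumptions for a 28x28 MNIST input, not necessarily the exact values from the example):

    // Illustrative sketch of a small convolutional model built from a VarStore path.
    #[derive(Debug)]
    struct Net {
        conv1: nn::Conv2D,
        conv2: nn::Conv2D,
        fc1: nn::Linear,
        fc2: nn::Linear,
    }

    impl Net {
        fn new(vs: &nn::Path) -> Net {
            Net {
                conv1: nn::conv2d(vs, 1, 32, 5, Default::default()),
                conv2: nn::conv2d(vs, 32, 64, 5, Default::default()),
                fc1: nn::linear(vs, 1024, 1024, Default::default()),
                fc2: nn::linear(vs, 1024, 10, Default::default()),
            }
        }
    }

    impl nn::ModuleT for Net {
        fn forward_t(&self, xs: &Tensor, train: bool) -> Tensor {
            // 28x28 -> conv 5x5 -> 24x24 -> pool -> 12x12 -> conv 5x5 -> 8x8 -> pool -> 4x4x64 = 1024.
            xs.view([-1, 1, 28, 28])
                .apply(&self.conv1)
                .max_pool2d_default(2)
                .apply(&self.conv2)
                .max_pool2d_default(2)
                .view([-1, 1024])
                .apply(&self.fc1)
                .relu()
                .dropout(0.5, train)
                .apply(&self.fc2)
        }
    }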

Note that this will automatically run on a GPU when one is available. For a convolutional model, we cannot run a training step on all the training images at once as this would require lots of memory; instead we run on mini-batches. An iterator makes it easy to loop over all the training images, shuffled and grouped into mini-batches. This is done in the main training loop:

    for epoch in 1..100 {
        // Loop over mini-batches of 256 shuffled training images.
        for (bimages, blabels) in m.train_iter(256).shuffle().to_device(vs.device()) {
            let loss = net
                .forward_t(&bimages, true)
                .cross_entropy_for_logits(&blabels);
            // Compute the gradients and apply an optimization step.
            opt.backward_step(&loss);
        }
        // Evaluate on the test set at the end of each epoch.
        let test_accuracy =
            net.batch_accuracy_for_logits(&m.test_images, &m.test_labels, vs.device(), 1024);
        println!("epoch: {:4} test acc: {:5.2}%", epoch, 100. * test_accuracy,);
    }