Traffic Sign Classification


This repository implements an image classification pipeline for the traffic sign classification project of the Udacity Self-Driving Car Engineer Nanodegree (cf. repo).

test_sample05

Table of Contents

  • Getting started
  • Usage
  • Approach
  • Credits
  • License

Getting started

Prerequisites

  • Python 3.6 (or more recent)
  • pip

Installation

You can install the project requirements as follows:

git clone https://github.com/frgfm/sdcnd-p3-traffic-sign-classification.git
cd sdcnd-p3-traffic-sign-classification
pip install -r requirements.txt
mkdir data

Get the German Traffic Sign Dataset from here and extract its content into the data folder in order to train. Download the class labels and put them in the same folder.

You can download the trained model (put the checkpoint in the ./data folder) and the testing images (extract it to a test_images folder) from the latest release.

Note: if you wish to use CPU only, replace the tensorflow-gpu dependency with tensorflow.

Usage

Training

A training script is available to train an image classifier on the dataset.

usage: train.py [-h] [--folder FOLDER] [--batch-size BATCH_SIZE] [--lr LR]
                [--distribution]
                epochs

Traffic sign classification training

positional arguments:
  epochs                Number of epochs to train

optional arguments:
  -h, --help            show this help message and exit
  --folder FOLDER       Path to data folder (default: ./data)
  --batch-size BATCH_SIZE
                        Batch size (default: 128)
  --lr LR               Learning rate (default: 0.001)
  --distribution        Should the class distribution be displayed (default:
                        False)

Test

An inference script is available for you to test your trained model or the one from the latest release.

usage: test.py [-h] [--folder FOLDER] [--model MODEL] imgfolder

Traffic sign classification training

positional arguments:
  imgfolder        Path to image folder

optional arguments:
  -h, --help       show this help message and exit
  --folder FOLDER  Images to test (default: ./data)
  --model MODEL    Path to model checkpoint (default: ./data/model.h5)

Activation visualization

Finally, you can visualize the activations of specific layers of your trained model.

usage: visualize.py [-h] [--layer LAYER] [--folder FOLDER] [--model MODEL] img

Traffic sign classification activation visualization

positional arguments:
  img              Path to image

optional arguments:
  -h, --help       show this help message and exit
  --layer LAYER    Layer name (options: conv2d, max_pooling2d, conv2d_1,
                   max_pooling2d_1) (default: conv2d)
  --folder FOLDER  Images to test (default: ./data)
  --model MODEL    Path to model checkpoint (default: ./data/model.h5)

Approach

Dataset

The dataset has 34799 training examples, 4410 validation examples and 12630 testing examples. Each sample is an RGB image of shape (32, 32, 3). Labels cover 43 classes of traffic signs.

class_distribution

The class distribution is imbalanced as shown above. The data preprocessing includes:

  • conversion to grayscale: to reduce the number of channels (data dimensionality)
  • normalization: inputs are standardized with the mean and standard deviation of the training set, so the model receives consistently scaled data even on new distributions (see the sketch below)
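
As an illustration, here is a minimal preprocessing sketch; the function name and the luma grayscale weights are assumptions, not taken from train.py:

# Illustrative preprocessing: grayscale conversion, then normalization
# with training-set statistics (function name and weights are assumptions).
import numpy as np

def preprocess(images, mean=None, std=None):
    # RGB -> grayscale with the usual luma weights, keeping a channel axis
    gray = np.dot(images[..., :3], [0.299, 0.587, 0.114])[..., np.newaxis]
    if mean is None:  # compute statistics on the training set only
        mean, std = gray.mean(), gray.std()
    return (gray - mean) / std, mean, std

x_train, mean, std = preprocess(train_images)
x_valid, _, _ = preprocess(valid_images, mean, std)  # reuse training stats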

Architecture

As suggested by the default repository, the LeNet-5 architecture was used for image classification. Despite its small size, this architecture has already shown great results on small grayscale images (cf. MNIST), a similar problem in terms of number of classes (a few dozen), input size (32x32), number of channels (grayscale) and learned task (digit classification vs. traffic sign classification).

lenet5

Source: Gradient-based learning applied to document recognition (1998)

Picking a lightweight architecture like LeNet-5 is expected to limit overfitting and to ease training convergence, thanks to its small number of learnable parameters (compared to a VGG or ResNet architecture, for instance).

By implementing the architecture from the paper and introducing dropout in the fully connected layers, we get our final model:

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d (Conv2D)              (None, 28, 28, 6)         156       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 14, 14, 6)         0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 10, 10, 16)        2416      
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 5, 5, 16)          0         
_________________________________________________________________
flatten (Flatten)            (None, 400)               0         
_________________________________________________________________
dense (Dense)                (None, 120)               48120     
_________________________________________________________________
dense_1 (Dense)              (None, 84)                10164     
_________________________________________________________________
dropout (Dropout)            (None, 84)                0         
_________________________________________________________________
dense_2 (Dense)              (None, 43)                3655      
=================================================================
Total params: 64,511
Trainable params: 64,511
Non-trainable params: 0
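
For reference, here is a minimal tf.keras sketch that reproduces this summary; the ReLU activations are an assumption, since only the layer shapes and parameter counts are given above:

# Reconstruction of the model from the summary above (illustrative):
# the 156-parameter first conv layer implies 5x5 kernels on a grayscale
# (32, 32, 1) input; ReLU activations are an assumption.
from tensorflow.keras import Sequential, layers

model = Sequential([
    layers.Conv2D(6, 5, activation='relu', input_shape=(32, 32, 1)),  # (28, 28, 6)
    layers.MaxPooling2D(2),                                           # (14, 14, 6)
    layers.Conv2D(16, 5, activation='relu'),                          # (10, 10, 16)
    layers.MaxPooling2D(2),                                           # (5, 5, 16)
    layers.Flatten(),                                                 # (400,)
    layers.Dense(120, activation='relu'),
    layers.Dense(84, activation='relu'),
    layers.Dropout(0.2),
    layers.Dense(43, activation='softmax'),
])
model.summary()  # 64,511 trainable parameters, matching the table above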

Training sequence

The training sequence is handled by the train.py script as follows:

  • optimizer: Adam (commonly used for image classification)
  • batch size: 128 (selected based on GPU RAM capacity)
  • epochs: 10
  • learning rate scheduler: flat (constant)
  • learning rate: 1e-3
  • dropout rate: 0.2 (reduces overfitting without extending the number of epochs too much)
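
A minimal sketch of how these settings translate into tf.keras calls; the loss (sparse categorical cross-entropy) and variable names are assumptions, not taken from train.py:

# Illustrative training configuration matching the list above; the loss
# assumes integer class labels rather than one-hot vectors.
from tensorflow.keras.optimizers import Adam

model.compile(optimizer=Adam(learning_rate=1e-3),  # flat (constant) learning rate
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(x_train, y_train,
          batch_size=128, epochs=10,
          validation_data=(x_valid, y_valid))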

The training procedure yields strong results, with a final validation accuracy of 95.92%.

training_monitoring

The accuracy on the testing set reaches 92.52%, which indicates good generalization without apparent overfitting. The trained model parameters were saved here.

Evaluation on new images

Eight images were selected from the Web for testing purposes (their resized versions are shown below):

all_samples

Differences in resolution and size may have a detrimental effect on the model's performance for these images. Other visual deformations would also have an influence, but the set was manually selected to avoid them.

Using the test.py script, we can inspect the predictions of the trained model over the test images. Feel free to use your own.
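
Under the hood, inference boils down to a forward pass; here is an illustrative sketch (not the actual test.py internals), assuming the image has already been preprocessed to (32, 32, 1):

# Predict one preprocessed image and print the top-5 classes (illustrative).
import numpy as np
from tensorflow.keras.models import load_model

model = load_model('./data/model.h5')
image = np.zeros((32, 32, 1), dtype=np.float32)  # stand-in for a preprocessed sign
probs = model.predict(image[np.newaxis, ...])[0]
for class_idx in np.argsort(probs)[::-1][:5]:
    print(f"class {class_idx}: {probs[class_idx]:.2%}")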

test_sample01

test_sample02

test_sample03

test_sample04

test_sample05

With this limited dataset, the aggregated accuracy of the model is 62.50%. The results for the first 5 samples are shown above. A larger test set would be needed to validate ideas for improving the architecture and training procedure.

Feature visualization

Finally, by passing data through the trained model, we can visualize the activations of specific layers for a given image. Below are examples for different layers on a sample image, followed by a short extraction sketch.

1st convolution layer

activ_conv2d

1st maxpooling layer

activ_maxpool2d

2nd convolution layer

activ_conv2d_1

2nd maxpooling layer

activ_maxpool2d_1
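
These maps can be obtained by probing intermediate outputs; a minimal sketch, assuming the checkpoint loads with tf.keras (the layer names come from the model summary above):

# Probe an intermediate layer of the trained model (illustrative sketch).
import numpy as np
from tensorflow.keras.models import Model, load_model

model = load_model('./data/model.h5')
probe = Model(inputs=model.input,
              outputs=model.get_layer('conv2d').output)  # 1st convolution
image = np.zeros((32, 32, 1), dtype=np.float32)  # stand-in for a preprocessed sign
activations = probe.predict(image[np.newaxis, ...])  # shape (1, 28, 28, 6)
# each of the 6 channels is a 28x28 feature map that can be plotted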

Credits

This implementation is largely based on the following method:

  • LeNet-5: Gradient-based learning applied to document recognition (1998)

License

Distributed under the MIT License. See LICENSE for more information.