GoogLeNet from Scratch

Objectives

Implement the GoogLeNet family from scratch, including MiniGoogLeNet and GoogLeNet, and train the networks on the CIFAR-10, Tiny ImageNet, and ImageNet datasets.

Construct MiniGoogLeNet and train the network on the CIFAR-10 dataset to obtain ≥90% accuracy.
Construct GoogLeNet, train the network on the Tiny ImageNet Visual Recognition Challenge dataset, and claim a top-ranking position on the leaderboard.

Packages Used

Python 3.6
OpenCV 4.0.0
Keras 2.2.4 for GoogLeNet on CIFAR-10 and 2.1.0 for the rest
TensorFlow 1.13.0
CUDA Toolkit 10.0
cuDNN 7.4.2
scikit-learn 0.20.2
Imutils
NumPy

Approaches

MiniGoogLeNet on CIFAR-10

Details about the CIFAR-10 dataset can be found here.

The MiniGoogLeNet for the CIFAR-10 dataset is inspired by Eric Jang and pluskid.

MiniGoogLeNet is built from three modules: a convolution module, an inception module, and a downsample module. Figure 1 shows the MiniGoogLeNet architecture.

Figure 1: MiniGoogLeNet architecture (reference).

The MiniGoogLeNet architecture is defined in minigooglenet.py (check here) under the pipeline/nn/conv/ directory. The model takes the image dimensions (width, height, depth) and the number of classes as input. In this part, the input is (width = 32, height = 32, depth = 3, classes = 10).

The googlenet_cifar10.py script (check here) is responsible for training the network, evaluating the model (plotting the loss and accuracy curves of the training and validation sets and providing the classification report), and serializing the model to disk.

There is a helper class:

The trainingmonitor.py (check here) under the pipeline/callbacks/ directory creates a TrainingMonitor callback that is called at the end of every epoch when training a network. The monitor constructs a plot of training loss and accuracy. Applying this callback during training lets us babysit the training process and spot overfitting early, so that we can abort the experiment and continue tuning parameters.
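
The monitoring idea can be sketched without Keras as a plain class exposing an `on_epoch_end(epoch, logs)` method, the same hook signature that Keras callbacks use. This is a simplified illustration under stated assumptions, not the contents of trainingmonitor.py: the class name `HistoryMonitor`, the `json_path` parameter, and the `gap()` helper are hypothetical, and plotting is omitted.

```python
import json
import os
import tempfile


class HistoryMonitor:
    """Hypothetical stand-in for a Keras callback; records metrics per epoch."""

    def __init__(self, json_path=None):
        self.json_path = json_path  # optional file used to persist the history
        self.history = {}           # metric name -> list of per-epoch values

    def on_epoch_end(self, epoch, logs=None):
        # Append each metric reported for this epoch to its running list.
        for name, value in (logs or {}).items():
            self.history.setdefault(name, []).append(float(value))
        # Persist after every epoch so the history survives an aborted run.
        if self.json_path is not None:
            with open(self.json_path, "w") as f:
                json.dump(self.history, f)

    def gap(self):
        # Latest validation loss minus latest training loss; a gap that keeps
        # growing from epoch to epoch is a common sign of overfitting.
        return self.history["val_loss"][-1] - self.history["loss"][-1]


# Usage with made-up numbers (illustrative only, not results from this project).
path = os.path.join(tempfile.mkdtemp(), "history.json")
monitor = HistoryMonitor(json_path=path)
monitor.on_epoch_end(0, {"loss": 1.9, "val_loss": 2.0})
monitor.on_epoch_end(1, {"loss": 1.2, "val_loss": 1.8})
```

Writing the history to disk after each epoch is one way to keep the curves available even when a run is stopped early.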

Use the following command to train the model.

python googlenet_cifar10.py --model output/minigooglenet_cifar10.hdf5 --output output

GoogLeNet on Tiny ImageNet Visual Recognition Challenge

The details about the challenge and dataset can be found here.

The tiny_imagenet_config.py (check here) under the config/ directory stores all relevant configurations for the project, including the paths to the input images, the total number of class labels, information on the training, validation, and testing splits, the paths to the HDF5 datasets, and the paths to output models, plots, etc.

Build the infrastructure for `HDF5` dataset

The hdf5datasetwriter.py (check here) under the pipeline/io/ directory defines a class that helps to write raw images or features into an HDF5 dataset.

The build_tiny_imagenet.py (check here) serializes the raw images into an HDF5 dataset. Although Keras has methods that allow us to use the raw file paths on disk as input to the training process, this approach is highly inefficient: every image residing on disk requires an I/O operation, which introduces latency into the training pipeline. HDF5 is not only capable of storing massive datasets but is also optimized for I/O operations.

Use the following command to build the Tiny ImageNet dataset.

python build_tiny_imagenet.py

Build image pre-processors

The meanpreprocessor.py (check here) under the pipeline/preprocessing/ directory subtracts the mean red, green, and blue pixel intensities across the training set, which is a form of data normalization. Mean subtraction is used to reduce the effects of lighting variations during classification.

The simplepreprocessor.py (check here) under the pipeline/preprocessing/ directory defines a class that resizes an image. This class is only used to ensure that each input image has dimensions of 64x64x3.

The imagetoarraypreprocessor.py (check here) under the pipeline/preprocessing/ directory defines a class that converts the image dataset into Keras-compatible arrays.
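
Mean subtraction can be illustrated on tiny images stored as nested lists of (R, G, B) tuples. This is a minimal sketch, not the code in meanpreprocessor.py: the function names `channel_means` and `subtract_means` are hypothetical, the pixel values are made up, and the repository's preprocessor may differ in details such as channel order, since OpenCV loads images as BGR.

```python
def channel_means(images):
    """Return the mean (R, G, B) over every pixel of every image."""
    totals = [0.0, 0.0, 0.0]
    count = 0
    for image in images:
        for row in image:
            for pixel in row:
                for c in range(3):
                    totals[c] += pixel[c]
                count += 1
    return tuple(t / count for t in totals)


def subtract_means(image, means):
    """Subtract the per-channel training-set means from one image."""
    return [[tuple(pixel[c] - means[c] for c in range(3)) for pixel in row]
            for row in image]


# Two 1x2 "images" with made-up pixel values.
train = [
    [[(10, 20, 30), (30, 40, 50)]],
    [[(50, 60, 70), (70, 80, 90)]],
]
means = channel_means(train)                # (40.0, 50.0, 60.0)
centered = subtract_means(train[0], means)  # pixel values now centered on zero
```

The means are computed once over the training set and then reused unchanged for validation and test images, so all splits are shifted by the same amount.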

Construct GoogLeNet architecture from scratch

Figure 2 shows the micro-architecture of inception module in GoogLeNet.

Figure 2: Inception module, which is the micro architecture of the GoogLeNet (reference).

Table 1 illustrates the GoogLeNet architecture (reference).

layer type	patch size/stride	output size	depth	#1x1	#3x3 reduce	#3x3	#5x5 reduce	#5x5	pool proj
convolution	7x7/2	112x112x64	1
max pool	3x3/2	56x56x64	0
convolution	3x3/1	56x56x192	2		64	192
max pool	3x3/2	28x28x192	0
inception(3a)		28x28x256	2	64	96	128	16	32	32
inception(3b)		28x28x480	2	128	128	192	32	96	64
max pool	3x3/2	14x14x480	0
inception(4a)		14x14x512	2	192	96	208	16	48	64
inception(4b)		14x14x512	2	160	112	224	24	64	64
inception(4c)		14x14x512	2	128	128	256	24	64	64
inception(4d)		14x14x528	2	112	144	288	32	64	64
inception(4e)		14x14x832	2	256	160	320	32	128	128
max pool	3x3/2	7x7x832	0
inception(5a)		7x7x832	2	256	160	320	32	128	128
inception(5b)		7x7x1024	2	384	192	384	48	128	128
avg pool	7x7/1	1x1x1024	0
dropout(40%)		1x1x1024	0
linear		1x1x1000	1
softmax		1x1x1000	0
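
The output depth of each inception row in Table 1 is the sum of the four branch outputs that are concatenated: #1x1, #3x3, #5x5, and pool proj. The two "reduce" columns describe intermediate 1x1 convolutions and do not add to the output depth. The short check below, with the hypothetical helper `inception_depth`, verifies that arithmetic against the rows of the table.

```python
# (name, #1x1, #3x3, #5x5, pool proj, output depth) copied from Table 1.
INCEPTION_ROWS = [
    ("3a", 64, 128, 32, 32, 256),
    ("3b", 128, 192, 96, 64, 480),
    ("4a", 192, 208, 48, 64, 512),
    ("4b", 160, 224, 64, 64, 512),
    ("4c", 128, 256, 64, 64, 512),
    ("4d", 112, 288, 64, 64, 528),
    ("4e", 256, 320, 128, 128, 832),
    ("5a", 256, 320, 128, 128, 832),
    ("5b", 384, 384, 128, 128, 1024),
]


def inception_depth(n1x1, n3x3, n5x5, pool_proj):
    """Channels after concatenating the four branch outputs."""
    return n1x1 + n3x3 + n5x5 + pool_proj


# Collect the names of rows whose branch sum disagrees with the listed depth.
mismatches = [name for name, a, b, c, d, depth in INCEPTION_ROWS
              if inception_depth(a, b, c, d) != depth]
print(mismatches)  # prints []
```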

Instead of using 7x7 filters with a stride of 2x2 in the first convolution layer, I use 5x5 filters with a stride of 1x1. The input images here have dimensions of 64x64x3, unlike the original GoogLeNet, whose input is 224x224x3, so 7x7 filters with a stride of 2x2 would reduce the spatial dimensions too quickly. For the same reason, the average pooling layer should be 4x4 instead of 7x7, with a stride of 1.
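
The effect of the first-layer stride on the final feature-map size can be worked out with a few lines of arithmetic. The sketch below assumes "same" padding in every layer, so each layer's output size is ceil(input / stride) regardless of filter size, and it counts only the first convolution and the four stride-2 max-pool layers listed in Table 1. The helper names `same_out` and `trace` are hypothetical.

```python
import math


def same_out(size, stride):
    """Output width/height of a 'same'-padded conv or pool layer."""
    return math.ceil(size / stride)


def trace(input_size, first_conv_stride):
    """Sizes after the first conv and after each of the four max pools."""
    sizes = [same_out(input_size, first_conv_stride)]
    for _ in range(4):  # Table 1 lists four stride-2 max-pool layers
        sizes.append(same_out(sizes[-1], 2))
    return sizes


original = trace(224, 2)   # stride-2 first conv on a 224x224 input
adapted = trace(64, 1)     # stride-1 first conv on a 64x64 input
unchanged = trace(64, 2)   # stride-2 first conv kept on a 64x64 input
```

Under these assumptions, `original` ends at 7 (matching the 7x7 average pool in Table 1), `adapted` ends at 4 (matching the 4x4 average pool used here), and `unchanged` would shrink the feature map to 2x2 before the average pool.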

The GoogLeNet architecture is defined in googlenet.py (check here) under the nn/conv/ directory.

Train the GoogLeNet and evaluate it

I use a "ctrl+c" method to train the baseline model. With this method, I start training with an initial learning rate (and associated set of hyperparameters), monitor training, and quickly adjust the learning rate based on the results as they come in.

The train.py script (check here) is responsible for training the baseline model. The TrainingMonitor callback plots the loss and accuracy curves of the training and validation sets, and the EpochCheckpoint callback saves the model every 5 epochs.
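
Saving every 5 epochs amounts to a counter and a modulo test. The sketch below is a hypothetical illustration rather than the repository's EpochCheckpoint: the class name `PeriodicCheckpoint`, its parameters, and the `epoch_N.hdf5` file name pattern are assumptions, and `save_fn` stands in for whatever actually writes the model (in Keras this could be `model.save`).

```python
class PeriodicCheckpoint:
    """Hypothetical sketch: call `save_fn(path)` every `every` epochs."""

    def __init__(self, save_fn, output_dir, every=5, start_at=0):
        self.save_fn = save_fn
        self.output_dir = output_dir
        self.every = every
        self.epoch = start_at  # number of epochs already completed

    def on_epoch_end(self, epoch=None, logs=None):
        # Count completed epochs ourselves so a resumed run keeps numbering
        # from `start_at` instead of restarting at zero.
        self.epoch += 1
        if self.epoch % self.every == 0:
            path = "{}/epoch_{}.hdf5".format(self.output_dir, self.epoch)
            self.save_fn(path)


# Record the paths instead of writing files, to show when saves would happen.
saved = []
checkpoint = PeriodicCheckpoint(saved.append, "output/checkpoints", every=5)
for _ in range(12):
    checkpoint.on_epoch_end()
```

After 12 simulated epochs, `saved` holds the paths for epochs 5 and 10.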

After getting a sense of the baseline model, I switch to learning rate decay to re-train the model. The train_decay.py script (check here) changes the method from "ctrl+c" to learning rate decay. The TrainingMonitor callback is again responsible for plotting the loss and accuracy curves of the training and validation sets, and the LearningRateScheduler callback is responsible for the learning rate decay.
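
The text does not state which decay function train_decay.py uses. As one common choice, a polynomial decay can be sketched as follows; the constants `MAX_EPOCHS`, `BASE_LR`, and `POWER` and the function name `poly_decay` are assumptions made for illustration only.

```python
MAX_EPOCHS = 70   # assumed total number of training epochs
BASE_LR = 1e-3    # assumed initial learning rate
POWER = 1.0       # 1.0 gives a linear decay; larger values drop faster early on


def poly_decay(epoch):
    """Learning rate for a zero-based epoch index under polynomial decay."""
    return BASE_LR * (1.0 - epoch / float(MAX_EPOCHS)) ** POWER


# Learning rates at the start, the midpoint, and the end of training.
rates = [poly_decay(e) for e in (0, 35, 70)]
```

A function with this shape (epoch index in, learning rate out) is what the Keras `LearningRateScheduler` callback expects as its schedule argument.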

The rank_accuracy.py (check here) measures the rank-1 and rank-5 accuracy of the model by using the testing set.

Several helper classes support the training process:

The EpochCheckpoint.py (check here) under the pipeline/callbacks/ directory stores individual checkpoints for GoogLeNet so that we do not have to retrain the network from the beginning. The model is stored every 5 epochs.

The hdf5datasetgenerator.py (check here) under the pipeline/io/ directory yields batches of images and labels from an HDF5 dataset. This class makes it possible to work with datasets that are too big to fit into memory.

The ranked.py (check here) under the pipeline/utils/ directory contains a helper function that measures both the rank-1 and rank-5 accuracy when the model is evaluated on the testing set.
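
Rank-k accuracy counts a prediction as correct when the true label is among the k most probable classes. A minimal sketch of that computation follows; the function name `rank_accuracy` and its signature are hypothetical and not necessarily those used in ranked.py, and the probabilities are made up.

```python
def rank_accuracy(probabilities, labels, k=5):
    """Return (rank-1, rank-k) accuracy as fractions in [0, 1]."""
    rank1 = rankk = 0
    for probs, label in zip(probabilities, labels):
        # Class indices sorted from most to least probable.
        order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
        if label == order[0]:
            rank1 += 1
        if label in order[:k]:
            rankk += 1
    n = len(labels)
    return rank1 / n, rankk / n


preds = [
    [0.1, 0.6, 0.3],   # top guess is class 1
    [0.5, 0.2, 0.3],   # top guess is class 0, second guess is class 2
    [0.2, 0.3, 0.5],   # top guess is class 2, class 0 is ranked last
]
truth = [1, 2, 0]
r1, r2 = rank_accuracy(preds, truth, k=2)
```

In this toy example only the first sample is correct at rank 1, while the first two are correct at rank 2, so `r1` is 1/3 and `r2` is 2/3.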

Use the following command to train the model from the beginning.

python train.py --checkpoints output/checkpoints

To resume training from a saved checkpoint, replace the placeholders in braces with the epoch number to start from:

python train.py --checkpoints output/checkpoints --model output/checkpoints/epoch_{epoch_number_you_want_to_start}.hdf --start_epoch {the_epoch_number_you_want_to_start}

For learning rate decay, use the following command:

python train_decay.py --model output/googlenet_tinyimagenet_decay.hdf5

To evaluate the network on the testing set, use the following command:

python rank_accuracy.py

Results

MiniGoogLeNet on CIFAR-10

Figure 3 shows the loss and accuracy curves of the training and validation sets. Figure 4 shows the evaluation of the network, which indicates 90% accuracy.

Figure 3: Plot of training and validation loss and accuracy.

Figure 4: Evaluation of the network, indicating 90% accuracy.

GoogLeNet on Tiny ImageNet

Experiment 1

In experiment 1, I use the "ctrl+c" method with the learning rate schedule shown in Table 2. The SGD optimizer is used with a momentum of 0.9 and Nesterov acceleration. The layer sequence in convolution_module is CONV => BN => RELU.

Table 2: Learning rate schedule for experiment 1.

Epoch	Learning Rate
1 - 40	1e-3
41 - 60	1e-4
61 - 70	1e-5
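
Table 2 can be read as a piecewise-constant function of the epoch number. With the "ctrl+c" method the rate changes are applied by hand, so the function below only encodes the table for reference; the names `SCHEDULE` and `learning_rate` are hypothetical.

```python
# (last epoch of the stage, learning rate), using the 1-based epochs of Table 2.
SCHEDULE = [(40, 1e-3), (60, 1e-4), (70, 1e-5)]


def learning_rate(epoch):
    """Learning rate for a 1-based epoch number between 1 and 70."""
    if epoch < 1:
        raise ValueError("epoch numbers start at 1")
    for last_epoch, lr in SCHEDULE:
        if epoch <= last_epoch:
            return lr
    raise ValueError("epoch {} is beyond the schedule".format(epoch))


# Rates at the boundaries of each stage.
boundary_rates = [learning_rate(e) for e in (1, 40, 41, 60, 61, 70)]
```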

Figure 5 shows the loss and accuracy curves of the training and validation sets. Figure 6 shows the evaluation of the network, which indicates 55.05% rank-1 accuracy and 79.64% rank-5 accuracy.

Figure 5: Plot of training and validation loss and accuracy.

Figure 6: Evaluation of the network, indicating 55.05% rank-1 accuracy and 79.64% rank-5 accuracy.

Experiment 2

In experiment 2, I keep the learning rate schedule in Table 2. To increase the accuracy, I change the convolution module to use the CONV => RELU => BN sequence instead of CONV => BN => RELU.

Figure 7 shows the loss and accuracy curves of the training and validation sets. Figure 8 shows the evaluation of the network, which indicates 55.41% rank-1 accuracy and 80.68% rank-5 accuracy. Compared to experiment 1, rank-1 accuracy increases by 0.36 percentage points and rank-5 accuracy by 1.04 percentage points.

Figure 7: Plot of training and validation loss and accuracy.

Figure 8: Evaluation of the network, indicating 55.41% rank-1 accuracy and 80.68% rank-5 accuracy.

Experiment 3

In experiment 3, I use the convolution module from experiment 2 but change the method from "ctrl+c" to learning rate decay. The number of epochs is still 70.

Figure 9 shows the loss and accuracy curves of the training and validation sets. Figure 10 shows the evaluation of the network, which indicates 57.34% rank-1 accuracy and 81.25% rank-5 accuracy.

Figure 9: Plot of training and validation loss and accuracy.

Figure 10: Evaluation of the network, indicating 57.34% rank-1 accuracy and 81.25% rank-5 accuracy.

With this rank-1 accuracy, I can claim the #5 position on the Tiny ImageNet Visual Recognition Challenge leaderboard.

meng1994412/GoogLeNet_from_scratch

Folders and files

Latest commit

History

Repository files navigation

GoogLeNet from Scratch

Objectives

Packages Used

Approaches

MiniGoogLeNet on CIFAR-10

GoogLeNet on Tiny ImageNet Visual Recognition Challenge

Build the infrastructure for HDF5 dataset

Build image pre-processors

Construct GoogLeNet architecture from scratch

Train the GoogLeNet and evaluate it

Results

MiniGoogLeNet on CIFAR-10

GoogLeNet on Tiny ImageNet

Experiment 1

Experiment 2

Experiment 3

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Build the infrastructure for `HDF5` dataset

Packages