porting cxxnet data science bowl 1 example to mxnet #1143

Closed
Lopezurrutia opened this issue Jan 3, 2016 · 13 comments

@Lopezurrutia
Contributor

I am trying to port the example code in cxxnet for Kaggle Data Science Bowl (1) to mxnet: https://github.com/dmlc/cxxnet/tree/master/example/kaggle_bowl

I have most things working, but I do not understand something in the bowl.conf for cxxnet (excuse my ignorance, I am new to deep learning; I am a marine biologist with an interest in image classification).

Although the images are scaled to 48x48, the bowl.conf file has "input_shape = 3,40,40". Why is that?
In mxnet, I have rescaled all images to 48x48, but if I use
data_shape = (3, 48, 48)
when I define the ImageRecordIter as:

def get_iterator(args, kv):
    data_shape = (3, 48, 48)
    train = mx.io.ImageRecordIter(
        path_imgrec = args.data_dir + "tr.rec",
        mean_img    = args.data_dir + "mean.bin",
        data_shape  = data_shape,
        batch_size  = args.batch_size,
        rand_crop   = True,
        rand_mirror = True,
        max_rotate_angle=180,
        max_aspect_ratio = 0.5,
        max_shear_ratio = 0.3,
        min_crop_size=32,
        max_crop_size=48,
        num_parts   = kv.num_workers,
        part_index  = kv.rank)

    val = mx.io.ImageRecordIter(
        path_imgrec = args.data_dir + "va.rec",
        mean_img    = args.data_dir + "mean.bin",
        data_shape  = data_shape,
        batch_size  = args.batch_size,
        num_parts   = kv.num_workers,
        part_index  = kv.rank)
    return (train, val)

I get train/val accuracies of 0.44/0.59, while if I use data_shape = (3, 40, 40) I get 0.58/0.59.
I guess it has something to do with the crop sizes; could someone explain how to set the correct data_shape?
Thanks,
Angel

This is the network structure I am using (translated from bowl.conf):


def get_symbol(num_classes = 121):
    input_data = mx.symbol.Variable(name="data")
    conv1 = mx.symbol.Convolution(
        data=input_data, kernel=(4, 4), stride=(1, 1), num_filter=48, pad=(2, 2))
    relu1 = mx.symbol.Activation(data=conv1, act_type="relu")
    pool1 = mx.symbol.Pooling(
        data=relu1, pool_type="max", kernel=(3, 3), stride=(2,2))
    ###############                                                                                                                                                             
    conv2 = mx.symbol.Convolution(
        data=pool1, kernel=(3, 3), pad=(1, 1), stride=(1, 1), num_filter=96)
    relu2 = mx.symbol.Activation(data=conv2, act_type="relu")
    conv3 = mx.symbol.Convolution(
        data=relu2, kernel=(3, 3), pad=(1, 1), stride=(1, 1), num_filter=96)
    relu3 = mx.symbol.Activation(data=conv3, act_type="relu")
    pool2 = mx.symbol.Pooling(data=relu3, kernel=(3, 3), stride=(2, 2), pool_type="max")
    ###############                                                                                                                                                             
    conv4 = mx.symbol.Convolution(
        data=pool2, kernel=(2, 2), stride=(1, 1), num_filter=128)
    relu4 = mx.symbol.Activation(data=conv4, act_type="relu")
    conv5 = mx.symbol.Convolution(
        data=relu4, kernel=(3, 3), stride=(1, 1), num_filter=128)
    pool3 = mx.symbol.Pooling(data=conv5, kernel=(3, 3), stride=(2, 2), pool_type="max")
    ##                                                                                                                                                                          
    flatten = mx.symbol.Flatten(data=pool3)
    fc1 = mx.symbol.FullyConnected(data=flatten, num_hidden=256)
    dropout1 = mx.symbol.Dropout(data=fc1, p=0.5)
    fc2 = mx.sym.FullyConnected(data=dropout1, num_hidden=num_classes)
    softmax = mx.symbol.SoftmaxOutput(data=fc2, name='softmax')
    return softmax
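
For reference, a minimal training sketch that ties get_iterator and get_symbol together with the FeedForward model API might look like the following (the hyperparameters, the mx.gpu(0) context, and the placeholder args are assumptions, not values taken from bowl.conf):

# Minimal sketch, not from bowl.conf: hyperparameters, context and args are placeholders.
import argparse
import logging

import mxnet as mx

logging.basicConfig(level=logging.INFO)

args = argparse.Namespace(data_dir="./data/", batch_size=64)  # placeholder arguments
kv = mx.kvstore.create("local")                               # single-machine kvstore

train, val = get_iterator(args, kv)      # the iterators defined above
net = get_symbol(num_classes=121)        # the NDSB data has 121 classes

model = mx.model.FeedForward(
    symbol=net,
    ctx=mx.gpu(0),                       # or mx.cpu()
    num_epoch=35,
    learning_rate=0.01,
    momentum=0.9,
    wd=0.0001,
    initializer=mx.init.Xavier(factor_type="in", magnitude=2.34))

model.fit(
    X=train,
    eval_data=val,
    kvstore=kv,
    batch_end_callback=mx.callback.Speedometer(args.batch_size, 50))
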
@antinucleon
Contributor

Hi @Gelu74,

I will probably make an example of the original sample today and post it here. After you successfully run it, I recommend trying the example here: https://github.com/auroraxie/Kaggle-NDSB

@Lopezurrutia
Contributor Author

Thanks @antinucleon.
Looking forward to it. Yes, the solution by auroraxie was my next step; I am trying to understand the simpler model first.

@antinucleon
Contributor

@Gelu74
Here is a new example. You may also find the MXNet documentation very useful: https://mxnet.readthedocs.org/en/latest/

@antinucleon
Contributor

@Lopezurrutia
Contributor Author

Thanks a lot @antinucleon for taking the time to write the example.
It works great, except that the second sed command throws an error on my system; I think it should be:
sed -n '20001, 30337p' train.lst > va.lst (i.e. without the "p" after 20001)
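
For what it's worth, a plain-Python equivalent of that split (assuming the first command takes lines 1-20000 of train.lst for tr.lst and the remainder for va.lst) would be something like:

# Sketch: split train.lst into tr.lst (lines 1-20000) and va.lst (the rest),
# mirroring the two sed commands in the example.
with open("train.lst") as f:
    lines = f.readlines()
with open("tr.lst", "w") as f:
    f.writelines(lines[:20000])
with open("va.lst", "w") as f:
    f.writelines(lines[20000:])
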
The training runs fine, but I am reaching very different accuracies from run to run. I guess it is due to the random initialization of the network ending up in different local minima.

These are my results for 5 consecutive runs:

INFO:root:Epoch[34] Train-accuracy=0.539437
INFO:root:Epoch[34] Time cost=105.434
INFO:root:Epoch[34] Validation-accuracy=0.576292

INFO:root:Epoch[34] Train-accuracy=0.065096
INFO:root:Epoch[34] Time cost=103.263
INFO:root:Epoch[34] Validation-accuracy=0.061150

INFO:root:Epoch[34] Train-accuracy=0.570038
INFO:root:Epoch[34] Time cost=36.997
INFO:root:Epoch[34] Validation-accuracy=0.604167

INFO:root:Epoch[34] Train-accuracy=0.065096
INFO:root:Epoch[34] Time cost=36.892
INFO:root:Epoch[34] Validation-accuracy=0.061150

INFO:root:Epoch[34] Train-accuracy=0.601637
INFO:root:Epoch[34] Time cost=36.978
INFO:root:Epoch[34] Validation-accuracy=0.630112

@antinucleon
Contributor

That's strange; there should not be such a large gap. Are you using cuDNN? Which card are you using?

@Lopezurrutia
Contributor Author

Yes, cuDNN.
Some info about my system:
lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 14.04.3 LTS
Release: 14.04
Codename: trusty

lspci | grep -i nvidia
02:00.0 3D controller: NVIDIA Corporation GK110BGL [Tesla K40m] (rev a1)

nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2015 NVIDIA Corporation
Built on Tue_Aug_11_14:27:32_CDT_2015
Cuda compilation tools, release 7.5, V7.5.17

@antinucleon
Contributor

cuDNN may produce slightly different output, but not such a large gap. Could you try adding this line at the beginning of the file? mx.random.seed(8964)

@Lopezurrutia
Contributor Author

With that random seed, I did not get the low-accuracy results in four runs, although I am still getting quite a bit of variability:

INFO:root:Epoch[34] Train-accuracy=0.550869
INFO:root:Epoch[34] Time cost=37.199
INFO:root:Epoch[34] Validation-accuracy=0.598765

INFO:root:Epoch[34] Train-accuracy=0.510184
INFO:root:Epoch[34] Time cost=37.007
INFO:root:Epoch[34] Validation-accuracy=0.547164

INFO:root:Epoch[34] Train-accuracy=0.460114
INFO:root:Epoch[34] Time cost=37.002
INFO:root:Epoch[34] Validation-accuracy=0.502990

INFO:root:Epoch[34] Train-accuracy=0.538588
INFO:root:Epoch[34] Time cost=36.958
INFO:root:Epoch[34] Validation-accuracy=0.580633

I thought that by setting the random seed there wouldn't be any variability... where does the randomness come from?

@antinucleon
Contributor

Try building without cuDNN; cuDNN doesn't guarantee reproducible results.
Also, in the iterator, we should set a seed, since we are using random augmentation.
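
As a rough sketch, something like this (the seed argument on ImageRecordIter is meant to seed the random augmenter; whether it is exposed may depend on your mxnet build, and the other parameters are just copied from the iterator above):

# Sketch: fix both the weight-initialization RNG and the augmentation RNG.
import mxnet as mx

mx.random.seed(8964)                   # seeds weight initialization

train = mx.io.ImageRecordIter(
    path_imgrec="tr.rec",
    mean_img="mean.bin",
    data_shape=(3, 40, 40),
    batch_size=64,
    rand_crop=True,
    rand_mirror=True,
    max_rotate_angle=180,
    seed=8964)                         # seeds the random augmenter, if supported
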

@Lopezurrutia
Contributor Author

Sorry, there must have been an error with my cuDNN installation, and I guess mxnet was not using cuDNN.
I am trying to reinstall now, but I am encountering some errors building mxnet with cuDNN support. I will open a separate issue.
Thanks!

@antinucleon
Contributor

No, it is not a problem on your side: it is known that cuDNN's fastest mode does not produce deterministic results.

@Lopezurrutia
Contributor Author

Hmm, I am now not sure whether I was running with cuDNN at all... I have reinstalled my system and am not able to compile with cuDNN support now; see #1207.
