
Add NASNet models #8714

Merged (12 commits into keras-team:master) on Dec 12, 2017

Conversation

@titu1994 (Contributor) commented Dec 7, 2017

Includes the model builders for NASNet CIFAR, Mobile and Large.

Notes:

  • Weights for NASNet Mobile and Large are available.
  • Only the pre-built models are made available, not the general builder
  • Includes application tests
  • Does not include the auxiliary branch in any model.
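
For illustration, a minimal sketch of how one of the pre-built models would be used for inference. The 224x224 default input size for NASNetMobile comes from the diff context later in this thread; everything else follows the pattern of the other Keras applications and is an assumption, not part of this PR's text.

```python
import numpy as np
from keras.applications.nasnet import NASNetMobile

# Build the mobile variant with its ImageNet classification head.
model = NASNetMobile(weights='imagenet', include_top=True)

# Run a dummy image through it at the model's default 224x224 resolution.
x = np.random.rand(1, 224, 224, 3).astype('float32')
preds = model.predict(x)
print(preds.shape)  # (1, 1000): one probability per ImageNet class
```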

@ahundt (Contributor) commented Dec 7, 2017

Looks nice! For the PR here in Keras, it might be wise to explain in the docs how using this model differs from using the other models, particularly w.r.t. the auxiliary network.

@titu1994 (Contributor, Author) commented Dec 7, 2017

As per the suggestion in the preceding issue, I've removed the auxiliary branch from these models. They are meant for inference rather than training, so it makes sense to do so.

The contrib version will contain the full model, with the auxiliary branch for training purposes.

@ahundt (Contributor) commented Dec 7, 2017

For others coming across this, #8711 is the discussion being referred to.


Based on the following implementations:
- [TF Slim Implementation]
(https://github.com/tensorflow/models/blob/master/research/slim/nets/nasnet/nasnet.)
@bdwyer2 (Contributor) commented Dec 7, 2017

This link isn't working for me.

Did you mean https://github.com/tensorflow/models/blob/master/research/slim/nets/nasnet/nasnet.py?

@titu1994 (Contributor, Author)

Yup. I missed the .py part.

@fchollet (Member) left a review comment

Thanks for the PR!

default_size=224)


def NASNetCIFAR(input_shape=None,
@fchollet (Member)

What's the use case for this model? Do we need to include it?

@titu1994 (Contributor, Author)

It can be trained on smaller image sizes, and has different parameter requirements than for ImageNet. However, since the purpose of the applications is towards inference rather than training, I suppose we should not include it.

from .. import backend as K

_BN_DECAY = 0.9997
_BN_EPSILON = 1e-3
@fchollet (Member)

If we include NASNetCIFAR, then these are not global constants; they are variables that must be changed later on. Thus they should be passed around as function arguments.

If we don't include NASNetCIFAR, that solves the problem I think?
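
For illustration, a hedged sketch of the alternative being suggested: if a non-ImageNet variant were kept, the batch norm settings would be passed as function arguments rather than read from module-level constants. The helper name and signature are hypothetical.

```python
from keras.layers import BatchNormalization

def _batch_norm(x, channel_axis, bn_momentum=0.9997, bn_epsilon=1e-3, name=None):
    # Defaults match the ImageNet configuration; a CIFAR variant could pass
    # different values per call instead of mutating global constants.
    return BatchNormalization(axis=channel_axis,
                              momentum=bn_momentum,
                              epsilon=bn_epsilon,
                              name=name)(x)
```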

@titu1994 (Contributor, Author) commented Dec 10, 2017

NASNet CIFAR is useful for training on small images (32x32 rather than 224x224). However, since keras applications are not generally used for training, I think removing it is best.

_BN_DECAY = 0.9997
_BN_EPSILON = 1e-3

NASNET_MOBILE_WEIGHT_PATH = "https://github.com/titu1994/Keras-NASNet/releases/download/v1.0/NASNet-mobile.h5"
@fchollet (Member)

Use ' as string delimiter, for consistency. This applies everywhere in the file.


def NASNet(input_shape=None,
penultimate_filters=4032,
nb_blocks=6,
@fchollet (Member)

Use blocks or num_blocks for consistency with the rest of the API.

else:
img_input = input_tensor

assert penultimate_filters % 24 == 0, "`penultimate_filters` needs to be divisible " \
@fchollet (Member)

Don't use assert, instead use a ValueError with a nice error message. See https://blog.keras.io/user-experience-design-for-apis.html
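
For illustration, a sketch of the suggested replacement for the quoted assert, wrapped as a hypothetical validation helper; the exact error message wording is illustrative.

```python
def _check_penultimate_filters(penultimate_filters):
    # Raise a descriptive ValueError instead of using assert.
    if penultimate_filters % 24 != 0:
        raise ValueError(
            '`penultimate_filters` must be divisible by 24. '
            'Current value: %d' % penultimate_filters)
```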

nb_blocks=6,
stem_filters=96,
skip_reduction=True,
filters_multiplier=2,
@fchollet (Member)

filter_multiplier

skip_reduction=True,
filters_multiplier=2,
dropout=0.5,
weight_decay=5e-5,
@fchollet (Member)

Dropout and weight decay are regularization parameters and should not be included.

@titu1994 (Contributor, Author)

I would argue that for fine-tuning purposes (with include_top=False), weight decay is important.

As for dropout, I included it since the original codebase uses it with these values. I can remove it from the Keras version, but the weights will need to be updated on your end to ignore the Dropout layer (or use by_name=True).

model = Model(inputs, x, name='NASNet')

# load weights
if weights == 'imagenet':
@fchollet (Member)

You should also support the case where weights is a path to a file (see other applications).
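
For illustration, a simplified sketch of the loading pattern used by the other applications that this comment points to: `weights` may be 'imagenet', a path to a local weights file, or None. The helper name and arguments are hypothetical, and the Large/Mobile file selection and hash checks are omitted.

```python
from keras.utils.data_utils import get_file

def _load_weights(model, weights, imagenet_weight_url, file_name):
    # Hypothetical helper mirroring the weights handling of the other
    # Keras applications.
    if weights == 'imagenet':
        weights_file = get_file(file_name, imagenet_weight_url,
                                cache_subdir='models')
        model.load_weights(weights_file)
    elif weights is not None:
        # Treat any other non-None value as a local path to a weights file.
        model.load_weights(weights)
```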

filters = penultimate_filters // 24

if not skip_reduction:
x = Conv2D(stem_filters, (3, 3), strides=(2, 2), padding='valid', use_bias=False, name='stem_conv1',
@fchollet (Member)

Throughout the file, reduce line length to 80 characters.

x = SeparableConv2D(filters, kernel_size, strides=strides, name='separable_conv_1_%s' % id,
padding='same', use_bias=False, kernel_initializer='he_normal',
kernel_regularizer=l2(weight_decay))(x)
x = BatchNormalization(axis=channel_dim, momentum=_BN_DECAY, epsilon=_BN_EPSILON,
@fchollet (Member)

Better to make momentum and epsilon function arguments

@titu1994 (Contributor, Author) commented Dec 10, 2017

Since we are keeping only the imagenet version, there is no need to keep the batchnorm parameters as arguments, and they can be inlined.

@titu1994 (Contributor, Author)

I've applied most of the corrections discussed in the review.
A few need further discussion:

  • The stem_filters name. I suggest num_stem_block_filters.
  • Removal of the Dropout and weight decay parameters. I don't think weight decay should be removed.

Dropout can be removed, but the weight files must be updated (to remove the Dropout Layer) when they are replicated in the Keras Deep Learning repository. Currently, I use by_name=True to avoid having to update the weight files.

@fchollet (Member) left a review comment

Can you also introduce these two in the Applications docs page?

stem_filters=96,
skip_reduction=True,
filter_multiplier=2,
weight_decay=5e-5,
@fchollet (Member)

Remove weight decay (it's regularization)

pooling=None,
classes=1000,
default_size=None):
'''Instantiates a NASNet model.
@fchollet (Member)

Introduce a blank line after this

@fchollet (Member)

> Currently, I use by_name=True to avoid having to update the weight files.

Don't, as this is extremely brittle. Weights files should always be loaded topologically.

Dropout doesn't affect weight loading. To remove the Dropout layer, just load the model and save it again.
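
For illustration, one way to do the one-time conversion described above. Since Dropout holds no weights, a weight file saved from the architecture that still had the layer loads topologically into the Dropout-free model; the file names here are hypothetical.

```python
from keras.applications.nasnet import NASNetMobile

# Build the new, Dropout-free architecture with random weights.
model = NASNetMobile(weights=None)

# Load the old weight file topologically (Dropout contributes no weights,
# so it does not affect the layer matching), then re-save it so that
# by_name=True is never needed again.
model.load_weights('NASNet-mobile-with-dropout.h5')  # hypothetical old file
model.save_weights('NASNet-mobile.h5')               # hypothetical new file
```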

@titu1994 (Contributor, Author) commented Dec 10, 2017

@fchollet I am wondering, should we expose the NASNet model builder itself?

Right now I am using __init__.py to expose only NASNetLarge and NASNetMobile, which are the pre-built versions. NASNet, which is the builder method, is hidden.

@fchollet (Member)

It's not hidden, it can still be imported from keras.applications.nasnet. This is fine.
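
For reference, the two import paths being discussed:

```python
# The pre-built models are exposed at the package level via __init__.py ...
from keras.applications import NASNetLarge, NASNetMobile

# ... while the general builder remains importable from its own module.
from keras.applications.nasnet import NASNet
```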

@titu1994 (Contributor, Author)

Oh, that's good. I've made the other changes as requested.

@fchollet (Member) left a review comment

Thank you! A few last comments (I think)

NASNet models use the notation `NASNet (N @ P)`, where:
- N is the number of blocks
- P is the number of penultimate filters
num_stem_block_filters: number of filters in the initial stem block
@fchollet (Member)

For consistency with penultimate_filters (and filters arg in conv layers) this should be stem_block_filters

model.load_weights(weights_file)
else:
raise ValueError(
'ImageNet weights can only be loaded on NASNetLarge'
@fchollet (Member)

"with" would fit better than "on" I think

into, only to be specified if `include_top` is True, and
if no `weights` argument is specified.
default_size: specifies the default image size of the model
# Returns
@fchollet (Member)

Introduce blank line before section

default_size: specifies the default image size of the model
# Returns
A Keras model instance.
# Raises
@fchollet (Member)

Introduce blank line before section

p: input tensor which needs to be modified
ip: input tensor whose shape needs to be matched
filters: number of output filters to be matched
weight_decay: l2 regularization weight
@fchollet (Member)

There's no weight_decay arg now

epsilon=1e-3,
name='adjust_bn_%s' % id)(p)

elif p._keras_shape[channel_dim] != filters:
@fchollet (Member)

Don't use _keras_shape, instead use K.int_shape
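
For illustration, a hedged sketch of the public-API equivalent of the quoted check; the helper name is hypothetical.

```python
from keras import backend as K

def _channels_differ(p, channel_dim, filters):
    # Equivalent to `p._keras_shape[channel_dim] != filters`, but using the
    # public K.int_shape instead of the private attribute.
    return K.int_shape(p)[channel_dim] != filters
```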

if p is None:
p = ip

elif p._keras_shape[img_dim] != ip._keras_shape[img_dim]:
@fchollet (Member)

Don't use _keras_shape, instead use K.int_shape

ip: input tensor `x`
p: input tensor `p`
filters: number of output filters
weight_decay: l2 regularization weight
@fchollet (Member)

Remove

return p


def _normal_A(ip, p, filters, id=None):
@fchollet (Member)

Don't use the built-in id as a keyword argument name.

Function names should be snake_case (no caps).
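
For illustration, the kind of renaming being asked for; the replacement names are illustrative and the body is elided.

```python
def _normal_a_cell(ip, p, filters, block_id=None):
    # snake_case name, and `block_id` instead of shadowing the built-in `id`.
    pass
```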

return x, ip


def _reduction_A(ip, p, filters, id=None):
@fchollet (Member)

Same remarks as above

@fchollet (Member)

Tests are failing; seems int_shape is getting called on None. https://travis-ci.org/fchollet/keras/jobs/314533802

@titu1994 (Contributor, Author)

Sorry about that. Forgot about the p being None case. Fixed now.
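
For illustration, a hedged sketch of the guarded flow being described, combining the two review suggestions above (use K.int_shape, and fall back when p is None); the helper name and the adjust_fn callback are hypothetical.

```python
from keras import backend as K

def _adjust_if_needed(p, ip, img_dim, adjust_fn):
    # Fall back to ip when p is None so K.int_shape is never called on None;
    # otherwise adjust p only when its spatial size differs from ip's.
    if p is None:
        return ip
    if K.int_shape(p)[img_dim] != K.int_shape(ip)[img_dim]:
        return adjust_fn(p)
    return p
```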

@fchollet (Member) left a review comment

LGTM, thanks!

@fchollet fchollet merged commit dc95cec into keras-team:master Dec 12, 2017
@micahprice

I'm confused about why Dropout is being removed when it is included in the MobileNet application. I would also like to echo @ahundt's sentiment that it wasn't clear to me that Applications are intended for inference, not transfer learning (which I have always used them for).

@fchollet (Member)

Because each training round requires a specific regularization configuration based on the task and dataset you're working with. The base regularization configuration in these models is designed specifically for training them from scratch on ImageNet, and wouldn't really work for fine-tuning anyway.

@ahundt (Contributor) commented Mar 15, 2018

There are several independent reports of problems with the NASNet weights; see #9586.
