
Add NASNet models #8714

Merged (12 commits into keras-team:master) on Dec 12, 2017

Conversation

@titu1994 (Contributor) commented Dec 7, 2017

Includes the model builders for NASNet CIFAR, Mobile and Large.

Notes:

  • Weights for NASNet Mobile and Large are available.
  • Only the pre-built models are made available, not the general builder
  • Includes application tests
  • Does not include the auxiliary branch in any model.
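
For illustration, a minimal sketch of how one of the pre-built models would be used for inference. The 224x224 default input size for NASNetMobile comes from the diff context later in this thread; everything else follows the pattern of the other Keras applications and is an assumption, not part of this PR's text.

```python
import numpy as np
from keras.applications.nasnet import NASNetMobile

# Build the mobile variant with its ImageNet classification head.
model = NASNetMobile(weights='imagenet', include_top=True)

# Run a dummy image through it at the model's default 224x224 resolution.
x = np.random.rand(1, 224, 224, 3).astype('float32')
preds = model.predict(x)
print(preds.shape)  # (1, 1000): one probability per ImageNet class
```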

@ahundt (Contributor) commented Dec 7, 2017

Looks nice! For the PR here in Keras, it might be wise to explain in the docs how using this model differs from using the other models, particularly w.r.t. the auxiliary network.

@titu1994 (Contributor, Author) commented Dec 7, 2017

As per the suggestion in the preceding issue, I've removed the auxiliary branch from these models. They are meant for inference rather than training, so it makes sense to do so.

The contrib version will contain the full model, with the auxiliary branch for training purposes.

@ahundt (Contributor) commented Dec 7, 2017

For others coming across this, #8711 is the discussion being referred to.


Based on the following implementations:
- [TF Slim Implementation]
(https://github.com/tensorflow/models/blob/master/research/slim/nets/nasnet/nasnet.)
@bdwyer2 (Contributor) commented Dec 7, 2017

This link isn't working for me.

Did you mean https://github.com/tensorflow/models/blob/master/research/slim/nets/nasnet/nasnet.py?

@titu1994 (Contributor, Author)

Yup. I missed the .py part.

@fchollet (Member) left a review comment

Thanks for the PR!

default_size=224)


def NASNetCIFAR(input_shape=None,
@fchollet (Member)

What's the use case for this model? Do we need to include it?

@titu1994 (Contributor, Author)

It can be trained on smaller image sizes, and has different parameter requirements than for ImageNet. However, since the purpose of the applications is towards inference rather than training, I suppose we should not include it.

from .. import backend as K

_BN_DECAY = 0.9997
_BN_EPSILON = 1e-3
@fchollet (Member)

If we include NASNetCIFAR, then these are not global constants; they are variables that must be changed later on. Thus they should be passed around as function arguments.

If we don't include NASNetCIFAR, that solves the problem I think?
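
For illustration, a hedged sketch of the alternative being suggested: if a non-ImageNet variant were kept, the batch norm settings would be passed as function arguments rather than read from module-level constants. The helper name and signature are hypothetical.

```python
from keras.layers import BatchNormalization

def _batch_norm(x, channel_axis, bn_momentum=0.9997, bn_epsilon=1e-3, name=None):
    # Defaults match the ImageNet configuration; a CIFAR variant could pass
    # different values per call instead of mutating global constants.
    return BatchNormalization(axis=channel_axis,
                              momentum=bn_momentum,
                              epsilon=bn_epsilon,
                              name=name)(x)
```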

@titu1994 (Contributor, Author) commented Dec 10, 2017

NASNet CIFAR is useful for training on small images (32x32 rather than 224x224). However, since keras applications are not generally used for training, I think removing it is best.

_BN_DECAY = 0.9997
_BN_EPSILON = 1e-3

NASNET_MOBILE_WEIGHT_PATH = "https://github.com/titu1994/Keras-NASNet/releases/download/v1.0/NASNet-mobile.h5"
@fchollet (Member)

Use ' as string delimiter, for consistency. This applies everywhere in the file.


def NASNet(input_shape=None,
penultimate_filters=4032,
nb_blocks=6,
@fchollet (Member)

Use blocks or num_blocks for consistency with the rest of the API.

else:
img_input = input_tensor

assert penultimate_filters % 24 == 0, "`penultimate_filters` needs to be divisible " \
@fchollet (Member)

Don't use assert, instead use a ValueError with a nice error message. See https://blog.keras.io/user-experience-design-for-apis.html
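
For illustration, a sketch of the suggested replacement for the quoted assert, wrapped as a hypothetical validation helper; the exact error message wording is illustrative.

```python
def _check_penultimate_filters(penultimate_filters):
    # Raise a descriptive ValueError instead of using assert.
    if penultimate_filters % 24 != 0:
        raise ValueError(
            '`penultimate_filters` must be divisible by 24. '
            'Current value: %d' % penultimate_filters)
```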

nb_blocks=6,
stem_filters=96,
skip_reduction=True,
filters_multiplier=2,
@fchollet (Member)

filter_multiplier

skip_reduction=True,
filters_multiplier=2,
dropout=0.5,
weight_decay=5e-5,
@fchollet (Member)

Dropout and weight decay are regularization parameters and should not be included.

@titu1994 (Contributor, Author)

I would argue that for fine-tuning purposes (with include_top=False), weight decay is important.

As for dropout, I included it since the original codebase uses it with these values. I can remove it from the Keras version, but the weights will need to be updated on your end to ignore the Dropout layer (or use by_name=True).

model = Model(inputs, x, name='NASNet')

# load weights
if weights == 'imagenet':
@fchollet (Member)

You should also support the case where weights is a path to a file (see other applications).
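
For illustration, a simplified sketch of the loading pattern used by the other applications that this comment points to: `weights` may be 'imagenet', a path to a local weights file, or None. The helper name and arguments are hypothetical, and the Large/Mobile file selection and hash checks are omitted.

```python
from keras.utils.data_utils import get_file

def _load_weights(model, weights, imagenet_weight_url, file_name):
    # Hypothetical helper mirroring the weights handling of the other
    # Keras applications.
    if weights == 'imagenet':
        weights_file = get_file(file_name, imagenet_weight_url,
                                cache_subdir='models')
        model.load_weights(weights_file)
    elif weights is not None:
        # Treat any other non-None value as a local path to a weights file.
        model.load_weights(weights)
```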

filters = penultimate_filters // 24

if not skip_reduction:
x = Conv2D(stem_filters, (3, 3), strides=(2, 2), padding='valid', use_bias=False, name='stem_conv1',
@fchollet (Member)

Throughout the file, reduce line length to 80 characters.

x = SeparableConv2D(filters, kernel_size, strides=strides, name='separable_conv_1_%s' % id,
padding='same', use_bias=False, kernel_initializer='he_normal',
kernel_regularizer=l2(weight_decay))(x)
x = BatchNormalization(axis=channel_dim, momentum=_BN_DECAY, epsilon=_BN_EPSILON,
@fchollet (Member)

Better to make momentum and epsilon function arguments

@titu1994 (Contributor, Author) commented Dec 10, 2017

Since we are keeping only the imagenet version, there is no need to keep the batchnorm parameters as arguments, and they can be inlined.

@titu1994 (Contributor, Author)

I've applied most of the corrections discussed in the review.
A few need further discussion:

  • The stem_filters name. I suggest num_stem_block_filters.
  • Removal of the Dropout and weight decay parameters. I don't think weight decay should be removed.

Dropout can be removed, but the weight files must be updated (to remove the Dropout Layer) when they are replicated in the Keras Deep Learning repository. Currently, I use by_name=True to avoid having to update the weight files.

@fchollet (Member) left a review comment

Can you also introduce these two in the Applications docs page?

stem_filters=96,
skip_reduction=True,
filter_multiplier=2,
weight_decay=5e-5,
@fchollet (Member)

Remove weight decay (it's regularization)

pooling=None,
classes=1000,
default_size=None):
'''Instantiates a NASNet model.
@fchollet (Member)

Introduce a blank line after this

@fchollet (Member)

> Currently, I use by_name=True to avoid having to update the weight files.

Don't, as this is extremely brittle. Weights files should always be loaded topologically.

Dropout doesn't affect weight loading. To remove the Dropout layer, just load the model and save it again.
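
For illustration, one way to do the one-time conversion described above. Since Dropout holds no weights, a weight file saved from the architecture that still had the layer loads topologically into the Dropout-free model; the file names here are hypothetical.

```python
from keras.applications.nasnet import NASNetMobile

# Build the new, Dropout-free architecture with random weights.
model = NASNetMobile(weights=None)

# Load the old weight file topologically (Dropout contributes no weights,
# so it does not affect the layer matching), then re-save it so that
# by_name=True is never needed again.
model.load_weights('NASNet-mobile-with-dropout.h5')  # hypothetical old file
model.save_weights('NASNet-mobile.h5')               # hypothetical new file
```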

@titu1994 (Contributor, Author) commented Dec 10, 2017

@fchollet I am wondering, should we expose the NASNet model builder itself?

Right now I am using __init__.py to expose only NASNetLarge and NASNetMobile, which are the pre-built versions. NASNet, which is the builder method, is hidden.

@fchollet (Member)

It's not hidden, it can still be imported from keras.applications.nasnet. This is fine.
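
For reference, the two import paths being discussed:

```python
# The pre-built models are exposed at the package level via __init__.py ...
from keras.applications import NASNetLarge, NASNetMobile

# ... while the general builder remains importable from its own module.
from keras.applications.nasnet import NASNet
```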

@titu1994 (Contributor, Author)

Oh, that's good. I've made the other changes as requested.

@fchollet (Member) left a review comment

Thank you! A few last comments (I think)

NASNet models use the notation `NASNet (N @ P)`, where:
- N is the number of blocks
- P is the number of penultimate filters
num_stem_block_filters: number of filters in the initial stem block
@fchollet (Member)

For consistency with penultimate_filters (and filters arg in conv layers) this should be stem_block_filters

model.load_weights(weights_file)
else:
raise ValueError(
'ImageNet weights can only be loaded on NASNetLarge'
@fchollet (Member)

"with" would fit better than "on" I think

into, only to be specified if `include_top` is True, and
if no `weights` argument is specified.
default_size: specifies the default image size of the model
# Returns
@fchollet (Member)

Introduce blank line before section

default_size: specifies the default image size of the model
# Returns
A Keras model instance.
# Raises
@fchollet (Member)

Introduce blank line before section

p: input tensor which needs to be modified
ip: input tensor whose shape needs to be matched
filters: number of output filters to be matched
weight_decay: l2 regularization weight
@fchollet (Member)

There's no weight_decay arg now

epsilon=1e-3,
name='adjust_bn_%s' % id)(p)

elif p._keras_shape[channel_dim] != filters:
@fchollet (Member)

Don't use _keras_shape, instead use K.int_shape
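
For illustration, a hedged sketch of the public-API equivalent of the quoted check; the helper name is hypothetical.

```python
from keras import backend as K

def _channels_differ(p, channel_dim, filters):
    # Equivalent to `p._keras_shape[channel_dim] != filters`, but using the
    # public K.int_shape instead of the private attribute.
    return K.int_shape(p)[channel_dim] != filters
```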

if p is None:
p = ip

elif p._keras_shape[img_dim] != ip._keras_shape[img_dim]:
@fchollet (Member)

Don't use _keras_shape, instead use K.int_shape

ip: input tensor `x`
p: input tensor `p`
filters: number of output filters
weight_decay: l2 regularization weight
@fchollet (Member)

Remove

return p


def _normal_A(ip, p, filters, id=None):
@fchollet (Member)

Don't use the built-in id as a keyword argument name.

Function names should be snake_case (no caps).
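
For illustration, the kind of renaming being asked for; the replacement names are illustrative and the body is elided.

```python
def _normal_a_cell(ip, p, filters, block_id=None):
    # snake_case name, and `block_id` instead of shadowing the built-in `id`.
    pass
```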

return x, ip


def _reduction_A(ip, p, filters, id=None):
@fchollet (Member)

Same remarks as above

@fchollet (Member)

Tests are failing; seems int_shape is getting called on None. https://travis-ci.org/fchollet/keras/jobs/314533802

@titu1994 (Contributor, Author)

Sorry about that. Forgot about the p being None case. Fixed now.
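
For illustration, a hedged sketch of the guarded flow being described, combining the two review suggestions above (use K.int_shape, and fall back when p is None); the helper name and the adjust_fn callback are hypothetical.

```python
from keras import backend as K

def _adjust_if_needed(p, ip, img_dim, adjust_fn):
    # Fall back to ip when p is None so K.int_shape is never called on None;
    # otherwise adjust p only when its spatial size differs from ip's.
    if p is None:
        return ip
    if K.int_shape(p)[img_dim] != K.int_shape(ip)[img_dim]:
        return adjust_fn(p)
    return p
```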

@fchollet (Member) left a review comment

LGTM, thanks!

@fchollet fchollet merged commit dc95cec into keras-team:master Dec 12, 2017
@micahprice

I'm confused about why Dropout is being removed when it is included in the MobileNet application. I would also like to echo @ahundt's sentiment that it wasn't clear to me that Applications are intended for inference, not transfer learning (which I have always used them for).

@fchollet (Member)

Because each training round requires a specific regularization configuration based on the task and dataset you're working with. The base regularization configuration in these models is designed specifically for training them from scratch on ImageNet, and wouldn't really work for fine-tuning anyway.

@ahundt (Contributor) commented Mar 15, 2018

There are several independent reports of problems with the NASNet weights; see #9586.
