-
Notifications
You must be signed in to change notification settings - Fork 19.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add NASNet models #8714
Add NASNet models #8714
Conversation
looks nice! I think for the pr here in keras it might be wise to consider explaining how the usage of this model is different from using other models in the docs, particularly w.r.t. the aux network. |
As per the suggestion in the preceding issue, I've removed the auxiliary branch to these models. They are meant for inference rather than training, so it makes sense to do so. The contrib version will contain the full model, with the auxiliary branch for training purposes. |
For others coming across this, #8711 is the discussion being referred to. |
keras/applications/nasnet.py
Outdated
|
||
Based on the following implementations: | ||
- [TF Slim Implementation] | ||
(https://github.com/tensorflow/models/blob/master/research/slim/nets/nasnet/nasnet.) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This link isn't working for me.
Did you mean https://github.com/tensorflow/models/blob/master/research/slim/nets/nasnet/nasnet.py
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup. I missed the .py part.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR!
keras/applications/nasnet.py
Outdated
default_size=224) | ||
|
||
|
||
def NASNetCIFAR(input_shape=None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the use case for this model? Do we need to include it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can be trained on smaller image sizes, and has different parameter requirements than for ImageNet. However, since the purpose of the applications is towards inference rather than training, I suppose we should not include it.
keras/applications/nasnet.py
Outdated
from .. import backend as K | ||
|
||
_BN_DECAY = 0.9997 | ||
_BN_EPSILON = 1e-3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we include NASNetCIFAR, then these are not global constants, these are variables that must be changed later on. Thus they should be passed around as function arguments.
If we don't include NASNetCIFAR, that solves the problem I think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NASNet CIFAR is useful for training on small images (32x32 rather than 224x224). However, since keras applications are not generally used for training, I think removing it is best.
keras/applications/nasnet.py
Outdated
_BN_DECAY = 0.9997 | ||
_BN_EPSILON = 1e-3 | ||
|
||
NASNET_MOBILE_WEIGHT_PATH = "https://github.com/titu1994/Keras-NASNet/releases/download/v1.0/NASNet-mobile.h5" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use '
as string delimiter, for consistency. This applies everywhere in the file.
keras/applications/nasnet.py
Outdated
|
||
def NASNet(input_shape=None, | ||
penultimate_filters=4032, | ||
nb_blocks=6, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use blocks
or num_blocks
for consistency with the rest of the API.
keras/applications/nasnet.py
Outdated
else: | ||
img_input = input_tensor | ||
|
||
assert penultimate_filters % 24 == 0, "`penultimate_filters` needs to be divisible " \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't use assert
, instead use a ValueError
with a nice error message. See https://blog.keras.io/user-experience-design-for-apis.html
keras/applications/nasnet.py
Outdated
nb_blocks=6, | ||
stem_filters=96, | ||
skip_reduction=True, | ||
filters_multiplier=2, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
filter_multiplier
keras/applications/nasnet.py
Outdated
skip_reduction=True, | ||
filters_multiplier=2, | ||
dropout=0.5, | ||
weight_decay=5e-5, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dropout, weight decay are regularization parameter and should not be included.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would argue that for fine tuning purposes (with include_top=False
), weight decay is important.
As to dropout, I included it since the original codebase uses it with these values. I can remove them from the Keras version, but the weights will need to be updated on your end to ignore the Dropout layer (or use by_name=True)
model = Model(inputs, x, name='NASNet') | ||
|
||
# load weights | ||
if weights == 'imagenet': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should also support the case where weights
is a path to a file (see other applications).
keras/applications/nasnet.py
Outdated
filters = penultimate_filters // 24 | ||
|
||
if not skip_reduction: | ||
x = Conv2D(stem_filters, (3, 3), strides=(2, 2), padding='valid', use_bias=False, name='stem_conv1', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Throughout the file, reduce line size to 80 char.
keras/applications/nasnet.py
Outdated
x = SeparableConv2D(filters, kernel_size, strides=strides, name='separable_conv_1_%s' % id, | ||
padding='same', use_bias=False, kernel_initializer='he_normal', | ||
kernel_regularizer=l2(weight_decay))(x) | ||
x = BatchNormalization(axis=channel_dim, momentum=_BN_DECAY, epsilon=_BN_EPSILON, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better to make momentum
and epsilon
function arguments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we are keeping only the imagenet version, there is no need to keep the batchnorm parameters as arguments, and they can be inlined.
I've applied most of the corrections discussed in the review.
Dropout can be removed, but the weight files must be updated (to remove the Dropout Layer) when they are replicated in the Keras Deep Learning repository. Currently, I use |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you also introduce these 2 in the Application docs page?
keras/applications/nasnet.py
Outdated
stem_filters=96, | ||
skip_reduction=True, | ||
filter_multiplier=2, | ||
weight_decay=5e-5, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove weight decay (it's regularization)
pooling=None, | ||
classes=1000, | ||
default_size=None): | ||
'''Instantiates a NASNet model. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Introduce a blank line after this
Don't, as this is extremely brittle. Weights files should always be loaded topologically. Dropout doesn't affect weight loading. To remove the dp layer, just load the model and save it again. |
@fchollet I am wondering, should we expose the NASNet model builder itself? Right now I am using |
It's not hidden, it can still be imported from |
Oh, that's good. I've made the other changes as requested. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! A few last comments (I think)
keras/applications/nasnet.py
Outdated
NASNet models use the notation `NASNet (N @ P)`, where: | ||
- N is the number of blocks | ||
- P is the number of penultimate filters | ||
num_stem_block_filters: number of filters in the initial stem block |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For consistency with penultimate_filters
(and filters
arg in conv layers) this should be stem_block_filters
keras/applications/nasnet.py
Outdated
model.load_weights(weights_file) | ||
else: | ||
raise ValueError( | ||
'ImageNet weights can only be loaded on NASNetLarge' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"with" would fit better than "on" I think
keras/applications/nasnet.py
Outdated
into, only to be specified if `include_top` is True, and | ||
if no `weights` argument is specified. | ||
default_size: specifies the default image size of the model | ||
# Returns |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Introduce blank line before section
keras/applications/nasnet.py
Outdated
default_size: specifies the default image size of the model | ||
# Returns | ||
A Keras model instance. | ||
# Raises |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Introduce blank line before section
keras/applications/nasnet.py
Outdated
p: input tensor which needs to be modified | ||
ip: input tensor whose shape needs to be matched | ||
filters: number of output filters to be matched | ||
weight_decay: l2 regularization weight |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no weight_decay arg now
keras/applications/nasnet.py
Outdated
epsilon=1e-3, | ||
name='adjust_bn_%s' % id)(p) | ||
|
||
elif p._keras_shape[channel_dim] != filters: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't use _keras_shape
, instead use K.int_shape
keras/applications/nasnet.py
Outdated
if p is None: | ||
p = ip | ||
|
||
elif p._keras_shape[img_dim] != ip._keras_shape[img_dim]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't use _keras_shape
, instead use K.int_shape
keras/applications/nasnet.py
Outdated
ip: input tensor `x` | ||
p: input tensor `p` | ||
filters: number of output filters | ||
weight_decay: l2 regularization weight |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove
keras/applications/nasnet.py
Outdated
return p | ||
|
||
|
||
def _normal_A(ip, p, filters, id=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't use id
keyword
function names should be snake case (no caps)
keras/applications/nasnet.py
Outdated
return x, ip | ||
|
||
|
||
def _reduction_A(ip, p, filters, id=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same remarks as above
Tests are failing; seems |
Sorry about that. Forgot about the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
I'm confused why Dropout is being removed, when it is included in the MobileNet Application? I would also like to echo @athundt's sentiment that it wasn't clear to me that Applications are intended for inference, not transfer learning (which I have always used them for). |
Because each training round requires a specific regularization configuration based on the task and dataset you're working with. The base regularization configuration in these models is designed specifically for training them from scratch on ImageNet, and wouldn't really work for fine-tuning anyway. |
There are several independent reports of problems with the NASNet weights, see #9586 |
Includes the model builders for NASNet CIFAR, Mobile and Large.
Notes: