Introduction

This page introduces the layer-related configuration of cxxnet.

Layer Specification

All layer configurations are placed inside a netconfig block:

netconfig = start
layer[from->to] = layer_type:name
netconfig = end
  • from is the name of the source node; 0 refers to the input data
  • to is the name of the destination node
  • layer_type is one of the layer types described below
  • name is optional, but if you need to fine-tune the network on another task, a name is required, since it indicates which layer's weights should be copied. See the example below.
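
As a concrete illustration (the node numbers, the name fc1, and the nhidden value are arbitrary choices for this sketch, not required values), a small network with one named layer could be written as:

netconfig = start
layer[0->1] = fullc:fc1
  nhidden = 100
layer[1->2] = relu
netconfig = end

Because fc1 is named, its weights can be copied when fine-tuning on another task; the unnamed relu layer has no weights to copy.
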
Weight Initialization

Fully Connected Layers and Convolution Layers require random weight initialization. We provide two initialization methods: gaussian and xavier:

random_type = gaussian
init_sigma = 0.01

Alternatively, the Xavier initialization method [1] can be selected with the configuration

random_type = xavier

The global setting can be overridden in the layer configuration, e.g.

# global setting
random_type = gaussian
netconfig = start
wmat:lr  = 0.01
wmat:wd  = 0.0005
bias:wd  = 0.000
bias:lr  = 0.02
layer[0->1] = fullc:fc1
  # local setting start
  nhidden = 50
  random_type = xavier
  # local setting end 
layer[1->2] = relu
layer[2->3] = fullc
  # local setting start
  nhidden = 6
  init_sigma = 0.005
  wmat:lr = 0.1
  # local setting end
netconfig = end

With this configuration, the fc1 layer is initialized with the Xavier method, while the unnamed fully connected layer is initialized with Gaussian random numbers with mu = 0 and sigma = 0.005. The unnamed fully connected layer also uses a learning rate (wmat:lr = 0.1) that overrides the global setting.

Layer Types

  • Connection Layer
  • Activation Layer
  • Loss Layer
  • Computation Layers
  • Pooling Layers
  • Other Layers

Connection Layer

Flatten Layer
  • Flatten Layer is used to flatten the output of a convolution layer. After flattening, the convolution output can be used in a feed-forward (fully connected) network. The shape of the output node becomes (batch, 1, 1, num_feature) instead of (batch, channel, width, height). Here is an example:
layer[15->16] = flatten
Split Layer
  • Split Layer is used for one-to-many connections. It duplicates the input node in the forward pass and accumulates the gradients from the output nodes in the backward pass.
layer[15->16,17] = split
Concat Layer
  • Concat Layer is used to concatenate the last dimension (namely, num_feature) of the outputs of two nodes. It is usually used together with fully connected layers.
layer[18,19->20] = concat
Channel Concat Layer
  • Channel Concat Layer is used to concatenate the second dimension (namely, channel) of the outputs of two nodes. It is usually used together with convolution layers. A combined sketch is shown below.
layer[18,19->20] = ch_concat
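
As a sketch of how the connection layers combine (the node numbers, layer names, and convolution settings below are illustrative only), split and ch_concat can form a simple two-branch block:

layer[15->16,17] = split
layer[16->18] = conv:branch_a
  kernel_size = 3
  stride = 1
  nchannel = 32
  pad = 1
layer[17->19] = conv:branch_b
  kernel_size = 5
  stride = 1
  nchannel = 32
  pad = 2
layer[18,19->20] = ch_concat

Both branches receive the same input through split, and ch_concat stacks their feature maps along the channel dimension; the padding is chosen so that the two branches keep the same spatial size.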

Activation Layer

We provide common activation layers, including Rectified Linear (ReLU), Sigmoid, Tanh, and Parametric ReLU (pReLU).

Rectified Linear
  • The output of Rectified Linear is max(0, x). It is the most commonly used activation function in modern deep learning methods.
layer[15->16] = relu

Tanh
  • Tanh uses tanh as the activation function. It transforms the input into the range [-1, 1].
layer[15->16] = tanh

Sigmoid
  • Sigmoid uses the sigmoid as the activation function. It transforms the input into the range [0, 1].
layer[15->16] = sigmoid

Parametric Rectified Linear
  • pReLU is an implementation of [2]. In addition, we provide a parameter that adds noise to the negative slope to reduce overfitting.
layer[15->16] = prelu
  random=0.5
  • random [optional] denotes the standard deviation of the Gaussian noise added to the negative slope of pReLU. At test time, this noise is discarded.

Loss Layer

Loss layers are self-looped layers, i.e. the from and to nodes are the same. They define the loss function used for training.

  • Common parameters:
  • grad_scale [optional]: scales the gradient generated by the loss layer.

Softmax
  • Softmax Loss Layer is the implementation of the multi-class softmax loss function.
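
As a minimal sketch, assuming the layer keyword is softmax (the node number is arbitrary), a self-looped softmax loss with an explicit gradient scale would be configured as:

layer[20->20] = softmax
  grad_scale = 1.0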

Euclidean
  • Euclidean Loss Layer is the implementation of the element-wise L2 loss function.

Elementwise Logistic
  • Elementwise Logistic Loss Layer is the implementation of the element-wise logistic loss function. It is suitable for multi-label classification problems.

Computation Layers

Fully Connected Layer
  • Fully Connected Layer is the basic building block of a feed-forward neural network.
layer[18->19] = fullc
  nhidden = 1024
  • nhidden denotes the number of hidden units in the layer.

Convolution Layer

If cxxnet is built with CuDNN, convolution uses CuDNN R2 by default. Without CuDNN R2, convolution runs on our own kernel. The configuration looks like

layer[0->1] = conv
  kernel_size = 11
  stride = 4
  nchannel = 96
  pad = 1
  • kernel_size is the convolution kernel size
  • stride is the stride of the convolution operation
  • nchannel is the number of output channels
  • pad is the padding size
  • temp_col_max [optional] is the maximum size of the unrolled buffer used by the convolution operation. The default value is 64, meaning the maximum size of temp_col is 64 MB. Adjusting this variable may speed up training, especially when the input to the convolution is small. Note that it only takes effect when CuDNN is not used. The resulting output size is sketched below.
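
For reference, under the usual convolution arithmetic (the exact rounding cxxnet applies is an assumption to verify), the output spatial size is out = floor((in + 2 * pad - kernel_size) / stride) + 1; with the settings above on a 224x224 input this gives floor((224 + 2 - 11) / 4) + 1 = 54, i.e. a 54x54 output with 96 channels.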

Pooling Layer

Currently we provide three pooling methods: Sum Pooling, Max Pooling and Average Pooling. All pooling layers share the same parameters: stride and kernel_size.

Sum Pooling
  • Sum Pooling sums up the values in the pooling region, e.g.
layer[4->5] = sum_pooling
  kernel_size = 3
  stride = 2
Max Pooling
  • Max Pooling takes the maximum value in the pooling region, e.g.
layer[4->5] = max_pooling
  kernel_size = 3
  stride = 2
Average Pooling
  • Average Pooling averages the values in the pooling region, e.g.
layer[4->5] = avg_pooling
  kernel_size = 3
  stride = 2

Other Layers

Dropout
  • Note that Dropout Layer is a self-looped layer. You need to set the to node equal to the from node, e.g.
layer[3->3] = dropout:dp
  threshold = 0.5
  • threshold is the probability of dropping an output.

Local Response Normalization

LRN normalizes the response across nearby kernels. Details can be found in Alex Krizhevsky's paper [3].

layer[3->4] = lrn
  local_size = 5
  alpha = 0.001
  beta = 0.75
  knorm = 1
  • local_size denotes the number of nearby kernels included in the normalization
  • alpha, beta and knorm are the normalization parameters; see the formula sketch below.
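
As a sketch of the normalization in [3], each activation a_i is scaled as b_i = a_i / (knorm + alpha * sum_j a_j^2)^beta, where the sum runs over the local_size nearby channels at the same spatial position. Whether cxxnet additionally divides alpha by local_size, as some implementations do, is left to verify against the source.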

Batch Normalization Layer

The BN layer is an implementation of [4]. The difference is that at test time we use the mini-batch statistics instead of global statistics computed on the training data as in the original paper. It is an experimental layer and may not be stable.
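
A minimal sketch of how it might appear in a configuration, assuming the layer keyword is batch_norm (both the keyword and the node numbers are assumptions to check against the cxxnet layer factory):

# assumed keyword: batch_norm; verify against the cxxnet layer factory
layer[4->5] = conv:conv2
  kernel_size = 3
  stride = 1
  nchannel = 64
  pad = 1
layer[5->6] = batch_norm
layer[6->7] = relu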

References

[1] Xavier Glorot and Yoshua Bengio. "Understanding the difficulty of training deep feedforward neural networks." AISTATS, 2010.

[2] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification." arXiv preprint arXiv:1502.01852. 2015.

[3] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. "ImageNet classification with deep convolutional neural networks." NIPS, 2012.

[4] Sergey Ioffe and Christian Szegedy. "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift." arXiv preprint arXiv:1502.03167, 2015.