Introduction

This page introduces the updater settings for cxxnet.

Updater

By default, cxxnet uses the SGDUpdater. The eta, wd and momentum parameters can be set separately for wmat and bias (namely, the weight and bias in each layer) by configuring, for example:

wmat:eta = 0.1
bias:eta = 0.2

If no target is specified, the setting takes effect globally.

  • Basic configuration:
updater = sgd
eta = 0.01
momentum = 0.9
wd = 0.0005
  • eta is the learning rate; default is 0.01

  • momentum is the momentum term; default is 0.9

  • wd is the weight decay (L2 regularization); default is 0.005

  • Global updater settings can be overridden in a layer configuration, e.g.:

# Global setting
eta = 0.01
momentum = 0.9
# Layer setting
netconfig=start
layer[0->1] = fullc:fc1
  nhidden = 100
  eta = 0.02
  momentum = 0.5
layer[1->2] = sigmoid:se1
layer[2->3] = fullc:fc2
  nhidden = 10
layer[3->3] = softmax
netconfig=end

In layer fc1 the learning rate will be 0.02 and the momentum 0.5, while layer fc2 follows the global settings, with a learning rate of 0.01 and a momentum of 0.9.


Learning Rate Scheduling

The SGDUpdater provides some advanced features such as learning rate scheduling. Four scheduling methods are available: constant, expdecay, polydecay and factor.

Common parameters

  • lr:start_epoch: the iteration after which learning rate scheduling starts; default is 0. Before that, the constant learning rate specified in eta is used.
  • lr:minimum_lr: the minimum learning rate; default is 0.0001
  • lr:step: a step parameter used by each scheduling method; its meaning is elaborated under each method (and sketched below)
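
A minimal Python sketch (illustrative only, not cxxnet internals; the function and argument names are assumptions) of how these common parameters typically interact with a schedule:

def scheduled_lr(iteration, base_lr, schedule, start_iter=0, minimum_lr=0.0001):
    # Before the start iteration, keep the constant learning rate given by eta.
    if iteration < start_iter:
        return base_lr
    # Apply the chosen schedule, then clamp at the configured minimum learning rate.
    return max(schedule(base_lr, iteration), minimum_lr)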

Constant Scheduling

  • Example of constant scheduling, in which the learning rate stays the same:
updater = sgd
eta = 0.01
momentum = 0.9
wd = 0.0005
lr:schedule = constant

Exp Decay

Exponential learning rate decay adjusts the learning rate according to the formula: new_learning_rate = base_learning_rate * pow(gamma, iteration / step). A small sketch of this formula follows the example below.

  • Example of expdecay scheduling, in which the learning rate drops exponentially:
updater = sgd
eta = 0.01
momentum = 0.9
wd = 0.0005
lr:schedule = expdecay
lr:start_iteration = 3000
lr:minimum_lr = 0.001
lr:gamma = 0.5
lr:step = 1000
  • lr:gamma: the decay parameter gamma in the formula above; default is 0.5
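
A sketch of the expdecay formula in Python (illustrative only; whether iteration / step is a true or an integer division inside cxxnet is not specified here, integer division is assumed):

def expdecay_lr(base_lr, iteration, gamma=0.5, step=1000):
    # new_learning_rate = base_learning_rate * pow(gamma, iteration / step)
    return base_lr * gamma ** (iteration // step)

For example, with a base learning rate of 0.01, gamma = 0.5 and step = 1000, an iteration count of 2000 gives 0.01 * 0.5**2 = 0.0025 (before the lr:minimum_lr clamp).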

Poly Decay

Polynomial learning rate decay adjusts the learning rate according to the formula: new_learning_rate = base_learning_rate * pow(1.0 + (iteration / step) * gamma, -alpha). A sketch of this formula follows the parameter list below.

  • Example of polydecay scheduling, in which the learning rate drops polynomially:
updater = sgd
eta = 0.01
momentum = 0.9
wd = 0.0005
lr:schedule = polydecay
lr:start_epoch = 3000
lr:minimum_lr = 0.001
lr:alpha = 0.5
lr:gamma = 0.1
lr:step = 1000
  • lr:gamma: the decay parameter gamma in the formula above; default is 0.5
  • lr:alpha: the decay exponent alpha in the formula above; default is 0.5
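
A sketch of the polydecay formula in Python (illustrative only; parameter names follow the configuration keys above):

def polydecay_lr(base_lr, iteration, gamma=0.5, alpha=0.5, step=1000):
    # new_learning_rate = base_learning_rate * pow(1.0 + (iteration / step) * gamma, -alpha)
    return base_lr * (1.0 + (iteration / step) * gamma) ** (-alpha)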

Factor Decay

Factor decay multiplies the learning rate by lr:factor every lr:step iterations; it is also called step scheduling. A sketch follows the example below.

  • Example of factor scheduling:
updater = sgd
eta = 0.01
momentum = 0.9
wd = 0.0005
lr:schedule = factor
lr:minimum_lr = 0.001
lr:factor = 0.1
lr:step = 10000
  • lr:factor: the multiplicative decay factor; default is 0.1
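
A sketch of factor (step) scheduling in Python (illustrative only; the multiply-per-step behaviour is taken from the description above, and the minimum_lr clamp from the common parameters):

def factor_lr(base_lr, iteration, factor=0.1, step=10000, minimum_lr=0.001):
    # Multiply the base learning rate by `factor` once for every completed `step` iterations,
    # never letting it fall below the configured minimum.
    lr = base_lr * factor ** (iteration // step)
    return max(lr, minimum_lr)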