I found that Caffe's initializers are somewhat hidden from the Caffe documentation, so in this post I summarize the initializers that Caffe provides.
Specifying initializers
In Caffe, initializers for layers with trainable parameters can be specified while defining a neural network.
On defining nets with Caffe prototxt
Caffe provides an interface to define the architecture of a neural network with a simple .prototxt
file. If you are not familiar with how it works, please refer to the Caffe documentation.
In the following example, I take the definition of the first convolutional layer of LeNet from the Caffe examples.
Here, the weights of this layer are initialized with Xavier initialization and the biases are initialized with a constant (default value 0).
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  convolution_param {
    num_output: 20
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
Caffe uses weight_filler to indicate the initializer used for the weights and bias_filler for the biases.
On defining nets with Pycaffe
Defining complex networks with .prototxt files can be tedious and error-prone, so writing the .prototxt files programmatically with a script is often more practical. Fortunately, Pycaffe provides an interface to do so. Again, if you are not very familiar with Pycaffe, this blog post by Tarek Allam Jr may be helpful.
The following Python snippet defines the same convolutional layer as the one above. Note that the fillers are passed as dicts.
import caffe
from caffe import layers

net = caffe.NetSpec()
net.data = ...
net.conv1 = layers.Convolution(net.data, num_output=20, kernel_size=5,
                               stride=1,
                               weight_filler=dict(type='xavier'),
                               bias_filler=dict(type='constant'))
...
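To get an actual .prototxt file out of this definition, the NetSpec can be serialized with to_proto(). A minimal sketch; the file name net.prototxt is just an example:

# Serialize the network definition and write it to a .prototxt file.
with open('net.prototxt', 'w') as f:
    f.write(str(net.to_proto()))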
Now you may wonder what these fillers are and how to know which one is appropriate.
Well, as a rule of thumb, the “xavier” filler [Glorot & Bengio, He et al.] works just fine most of the time.
Caffe Initializers
The source code of all the weight/bias fillers, together with brief docstrings, can be found in filler.hpp.
This section briefly introduces the ideas behind the different initialization methods.
(Work in progress…)
Constant Filler
- Type: “constant”
- Simply sets \(x = c\) for a constant \(c\); \(c\) defaults to 0 (example below).
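The constant can be changed through the value field of the filler. A small sketch reusing the net and layers objects from the Pycaffe snippet above; the layer name fc1 and the sizes are made up for illustration:

# Hypothetical fully connected layer whose biases all start at 0.1
# instead of the default 0.
net.fc1 = layers.InnerProduct(net.conv1, num_output=500,
                              weight_filler=dict(type='xavier'),
                              bias_filler=dict(type='constant', value=0.1))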
Uniform Filler
- Type: “uniform”
- Sample small values from a uniform distribution.
- \(x \sim U(a, b)\), where \(a\) and \(b\) determine the sampling range (example below).
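In the filler parameter, \(a\) and \(b\) correspond to the min and max fields. A sketch with made-up bounds, again reusing the objects from the Pycaffe snippet:

# Hypothetical layer with weights drawn from U(-0.05, 0.05).
net.conv2 = layers.Convolution(net.conv1, num_output=50, kernel_size=5,
                               weight_filler=dict(type='uniform', min=-0.05, max=0.05),
                               bias_filler=dict(type='constant'))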
Gaussian Filler
- Type: “gaussian”
- Sample small values from a Gaussian distribution: \(x \sim N(\mu, \sigma^2)\), where \(\mu\) and \(\sigma\) are hand-picked constants rather than values derived from the layer size (example below).
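Here \(\mu\) and \(\sigma\) correspond to the mean and std fields of the filler. A sketch with values I picked only for illustration:

# Hypothetical layer with weights drawn from N(0, 0.01^2).
net.conv3 = layers.Convolution(net.conv2, num_output=50, kernel_size=5,
                               weight_filler=dict(type='gaussian', mean=0, std=0.01),
                               bias_filler=dict(type='constant'))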
Positive Unitball Filler
- Type: “positive_unitball”
- Fills the weights with positive values in \([0, 1]\) such that the incoming weights of each output unit sum to 1 (rough sketch below).
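If I read filler.hpp correctly, the filler draws uniform values and then normalizes the incoming weights of each output unit so they sum to 1. A rough NumPy sketch of that idea (not Caffe code):

import numpy as np

def positive_unitball(fan_out, fan_in):
    # Draw positive values, then normalize each row (the incoming
    # weights of one output unit) so that it sums to 1.
    w = np.random.uniform(0, 1, size=(fan_out, fan_in))
    return w / w.sum(axis=1, keepdims=True)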
Xavier Filler
- Type: “xavier”
- Sample values from a uniform distribution
- \(x \sim U(-a, +a)\), where \(a=\sqrt{\frac{3}{n}}\)
- Here, \(n\) can be fan_in, fan_out, or their average; fan_in refers to the number of inputs of a neuron and fan_out to the number of outputs (and the same below).
- Xavier initialization makes sure the initial values of the weights for each layer are:
- neither too small so that the signal passed through shrinks
- nor too large so that the signal explodes
- He et al. found that \(a=\sqrt{\frac{6}{n}}\) works better for the ReLU nonlinearity.
- Similar to:
  - keras.initializers.glorot_uniform
  - tensorflow.contrib.layers.xavier_initializer
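As a sanity check, here is a NumPy sketch of what the “xavier” filler computes, assuming the default fan_in normalization:

import numpy as np

def xavier_uniform(fan_out, fan_in):
    # n = fan_in by default; fan_out or the average of the two can be
    # selected via the variance_norm field of the filler.
    a = np.sqrt(3.0 / fan_in)
    return np.random.uniform(-a, a, size=(fan_out, fan_in))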
MSRA Filler
- Type: “msra”
- \(x \sim N(0, \sigma^2)\), where \(\sigma^2 = \frac{2}{n}\) (i.e. \(\sigma \propto \frac{1}{\sqrt{n}}\))
- Proposed by He et al. for ReLU nonlinearities (sketch below).
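A matching NumPy sketch, again assuming fan_in normalization (the Caffe default, as far as I know):

import numpy as np

def msra_gaussian(fan_out, fan_in):
    # Zero-mean Gaussian with variance 2/n, as proposed by He et al.
    std = np.sqrt(2.0 / fan_in)
    return np.random.normal(0.0, std, size=(fan_out, fan_in))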
Bilinear Filler
- Type: “bilinear”
- Fills the blob with the coefficients of a bilinear interpolation kernel; commonly used with Deconvolution layers for upsampling (sketch below).
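Based on my reading of the code in filler.hpp, the kernel it produces can be sketched in NumPy as follows (square kernels only; treat this as an approximation):

import numpy as np

def bilinear_kernel(k):
    # Coefficients of a k x k bilinear interpolation kernel.
    f = np.ceil(k / 2.0)
    c = (2 * f - 1 - f % 2) / (2.0 * f)
    x = np.arange(k)
    w = 1 - np.abs(x / f - c)
    return np.outer(w, w)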