Caffe Initializers

Jihong on May 10, 2017

I found that Caffe's initializers are missing from the official Caffe documentation, so in this post I would like to summarize the initializers that Caffe provides.


Specifying initializers

In Caffe, initializers for layers with trainable parameters can be specified while defining a neural network.

On defining nets with Caffe prototxt

Caffe provides an interface to define the architecture of a neural network with a simple .prototxt file. If you are not familiar with how it works, please refer to the Caffe documentation.

In the following example, I take the definition of the first convolutional layer of LeNet from the Caffe examples.

Here, the weights of this layer are initialized with Xavier initialization and the bias is initialized with a constant (the default value is 0).

layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  convolution_param {
    num_output: 20
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}

Caffe uses weight_filler to specify the initializer for the weights and bias_filler for the bias.

On defining nets with Pycaffe

Defining complex networks by hand-writing .prototxt files can be error-prone, so generating the .prototxt files programmatically with a script is often beneficial in practice. Fortunately, Pycaffe provides an interface to do so. Again, if you are not very familiar with Pycaffe, this blog post by Tarek Allam Jr. may be helpful.

The following Python snippet defines the same convolutional layer as the one above. Note that fillers are passed as dictionaries of filler parameters.

import caffe
from caffe import layers

net = caffe.NetSpec()
net.data = ...  # a data layer definition goes here
# Fillers are passed as dicts of filler parameters, not plain strings.
net.conv1 = layers.Convolution(net.data, num_output=20, kernel_size=5,
                               stride=1,
                               weight_filler=dict(type='xavier'),
                               bias_filler=dict(type='constant'))
...
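
Once the layers are defined, the NetSpec above can be serialized back into a .prototxt file via its to_proto() method (a minimal sketch; the file name is just illustrative):

# Serialize the NetSpec into prototxt text and write it to disk.
with open('lenet.prototxt', 'w') as f:
    f.write(str(net.to_proto()))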

Now you may wonder: what are these fillers, and how do I know which one is appropriate?

Well, as a rule of thumb, the “xavier” filler [Glorot & Bengio, He et al.] works just fine most of the time.

Caffe Initializers

The source code of all the weight/bias fillers, together with brief docstrings, can be found in filler.hpp.

This section briefly introduces the idea behind each of the different initialization methods.

Ongoing…

Constant Filler

  • Type: “constant”
  • Simply sets all values to a constant, \(x = const\) (the default constant is 0).
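
Continuing the NetSpec example above, a constant other than 0 can be set through the filler's value field (the 0.1 here is just illustrative):

# Initialize all biases of this layer to 0.1 instead of the default 0.
net.conv1 = layers.Convolution(net.data, num_output=20, kernel_size=5,
                               weight_filler=dict(type='xavier'),
                               bias_filler=dict(type='constant', value=0.1))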

Uniform Filler

  • Type: “uniform”
  • Sample small values from a uniform distribution.
  • \(x\sim U(a, b)\), where \(a\) and \(b\) determine the sampling range.
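
Continuing the same example, the sampling range is controlled by the filler's min and max fields (the range here is just illustrative):

# Sample the initial weights uniformly from [-0.05, 0.05].
net.conv1 = layers.Convolution(net.data, num_output=20, kernel_size=5,
                               weight_filler=dict(type='uniform',
                                                  min=-0.05, max=0.05))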

Gaussian Filler

  • Type: “gaussian”
  • Sample small values from a Gaussian distribution: \(x \sim N(\mu, \sigma^2)\), where the mean \(\mu\) and standard deviation \(\sigma\) are specified in the filler parameters.
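
Again continuing the example, the mean and standard deviation are controlled by the filler's mean and std fields (the values here are just illustrative):

# Sample the initial weights from N(0, 0.01^2).
net.conv1 = layers.Convolution(net.data, num_output=20, kernel_size=5,
                               weight_filler=dict(type='gaussian',
                                                  mean=0, std=0.01))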

Positive Unitball Filler

  • Type: “positive_unitball”
  • Fills the weights with values \(x \in [0, 1]\) such that \(\forall i, \sum_j x_{ij} = 1\), i.e. the weights feeding into each unit are positive and sum to 1
  • Internally, values are drawn from a uniform distribution and then normalized

Xavier Filler

  • Type: “xavier”
  • Sample values from a uniform distribution
  • \(x \sim U(-a, +a)\), where \(a=\sqrt{\frac{3}{n}}\) (see the sketch after this list)
  • Here, \(n\) can be fan_in, fan_out or their average
  • fan_in refers to the number of inputs of a neuron and fan_out to the number of outputs (the same notation is used below)
  • Xavier initialization makes sure the initial weights of each layer are:
    • neither so small that the signal shrinks as it passes through the layers
    • nor so large that the signal explodes
  • He et al. found \(a=\sqrt{\frac{6}{n}}\) works better for ReLU nonlinearity
  • Similar to:
    • keras.initializers.glorot_uniform
    • tensorflow.contrib.layers.xavier_initializer
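
To make the formula concrete, here is a minimal numpy sketch of the sampling rule, assuming \(n=\) fan_in (Caffe's default) and illustrative layer shapes:

import numpy as np

fan_in = 1 * 5 * 5            # input channels * kernel height * kernel width
a = np.sqrt(3.0 / fan_in)     # scale used by the "xavier" filler
W = np.random.uniform(-a, a, size=(20, 1, 5, 5))  # (num_output, channels, kH, kW)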

MSRA Filler

  • Type: “msra”
  • \(x \sim N(0, \sigma^2)\), where \(\sigma^2 = \frac{2}{n}\)
  • Proposed by He et al. for ReLU nonlinearities: the factor 2 compensates for ReLU zeroing out half of its inputs
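
As with the Xavier sketch above, a minimal numpy sketch of the sampling rule, assuming \(n=\) fan_in and the same illustrative shapes:

import numpy as np

fan_in = 1 * 5 * 5            # input channels * kernel height * kernel width
std = np.sqrt(2.0 / fan_in)   # standard deviation used by the "msra" filler
W = np.random.randn(20, 1, 5, 5) * std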

Bilinear Filler

  • Type: “bilinear”
  • Fills the weights with the coefficients of a bilinear interpolation kernel
  • Typically used to initialize Deconvolution layers that perform upsampling
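
A minimal pycaffe sketch of the common use case, upsampling by a factor of 2 with a Deconvolution layer (the layer and its parameters are illustrative; note that NetSpec requires Deconvolution parameters to be wrapped in convolution_param explicitly):

# 2x upsampling initialized as bilinear interpolation.
net.upsample = layers.Deconvolution(net.conv1,
    convolution_param=dict(num_output=20, kernel_size=4, stride=2,
                           bias_term=False,
                           weight_filler=dict(type='bilinear')))

Here, kernel_size = 4 and stride = 2 follow the rule kernel_size = 2 * factor - factor % 2 suggested in the filler.hpp docstring.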