In this tutorial, we demonstrate how to apply batch normalization to a Convolutional Neural Network (CNN) in MatConvNet. The network is of the SimpleNN type, and we train it on the CIFAR-10 dataset.

Part 1: Building a Batch Normalized Network

Before we begin, please make sure that you are familiar with batch normalization: "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift". Also make sure you have completed our tutorial on building a CNN with the CIFAR-10 dataset; this example expands on that tutorial by incorporating batch normalization into the network. As before, we first create our training/validation database (used for both training and testing) and then define the CNN architecture. This time, however, we add an option, opts.batchNormalization, and set it to true. When this flag is set, a short if-statement inserts a batch normalization layer after each convolutional layer. We then train the CNN on the designated training data, and once training is finished we prepare the trained network for testing by making a few small changes to it.


	clc, clear all, close all;

	run ../../../matconvnet-1.0-beta24/matlab/vl_setupnn ;                          % Run required MatConvNet Setup

	%--------------------------------------------------------------
	% Initialize Parameters
	%--------------------------------------------------------------

	opts.train = struct();                                                          % Initialize SimpleNN training options
	opts.train.gpus = [1];                                                          % Training with GPU, so setting to [1], otherwise set to []
	opts.train.continue = true;                                                     % Keep true, so will continue every epoch
	opts.expDir = 'epoch_data';                                                     % Folder used for storing each epoch

	% opts.whitenData = true ;                                                      % Optional parameter for whitening data
	% opts.contrastNormalization = true ;                                           % Optional parameter for contrast normalization

	% -------------------------------------------------------------------------
	% Prepare model and data
	% -------------------------------------------------------------------------

	load('imdb_cifar.mat');                                                         % Load the CIFAR-10 image database (IMDB)
	net.meta.classes.name = imdb.meta.classes(:)';                                  % Assign class names from IMDB

	% -------------------------------------------------------------------------
	% Create CNN Architecture and Train
	% -------------------------------------------------------------------------

	net.layers = {};                                                                % Initialize CNN architecture, layers

	net.meta.inputSize = [32 32 3];                                                 % Assign data input size, cifar sample is [32, 32, 3]
	net.meta.trainOpts.learningRate = 1e-3 ;                                        % Assign learning rate
	net.meta.trainOpts.weightDecay = 3e-4 ;                                         % Assign weight decay
	net.meta.trainOpts.batchSize = 100 ;                                            % Assign batch size
	net.meta.trainOpts.momentum = 0.9;                                              % Assign momentum
	net.meta.trainOpts.numEpochs = 150 ;                                            % Assign number of epochs

	opts.batchNormalization = true ;                                                % Set true for batch normalization

	% Block 1
	net.layers{end+1} = struct('type', 'conv', 'weights', {{0.05*randn(5,5,3,32, 'single'), randn(1, 32, 'single')}}, 'stride', [1 1], 'pad', [0 0 0 0]) ;
	net.layers{end+1} = struct('type', 'relu') ;
	net.layers{end+1} = struct('type', 'pool', 'method', 'max', 'pool', [2 2], 'stride', [2 2], 'pad', [0 0 0 0]);

	% Block 2
	net.layers{end+1} = struct('type', 'conv', 'weights', {{0.05*randn(5,5,32,32, 'single'), randn(1, 32, 'single')}}, 'stride', [1 1], 'pad', [0 0 0 0]) ;
	net.layers{end+1} = struct('type', 'relu') ;
	net.layers{end+1} = struct('type', 'pool', 'method', 'avg', 'pool', [2 2], 'stride', [2 2], 'pad', [0 0 0 0]);

	% Block 3
	net.layers{end+1} = struct('type', 'conv', 'weights', {{0.05*randn(5,5,32,64, 'single'), randn(1,64,'single')}}, 'stride', [1 1], 'pad', [0 0 0 0]) ;
	net.layers{end+1} = struct('type', 'relu') ;

	% Block Multi-Layer-Perceptron
	net.layers{end+1} = struct('type', 'conv', 'weights', {{0.05*randn(1,1,64,10, 'single'), zeros(1,10,'single')}}, 'stride', [1 1], 'pad', [0 0 0 0]) ;

	% Loss layer
	net.layers{end+1} = struct('type', 'softmaxloss') ;


	% optionally switch to batch normalization
	if opts.batchNormalization                                                                    % Be sure that this flag is set to true
	  net = insertBnorm(net, 1) ;                                                                 % Insert Batch Norm after each convolutional layer
	  net = insertBnorm(net, 5) ;                                                                 % Keep in mind the indexes of the layers will change every time we add a new layer
	  net = insertBnorm(net, 9) ;
	end

	% -------------------------------------------------------------------------
	% Call CNN_Train Function
	% -------------------------------------------------------------------------

	trainfn = @cnn_train ;

	[net, info] = trainfn(net, imdb, getBatch(opts), 'expDir', opts.expDir, net.meta.trainOpts, opts.train, 'val', find(imdb.images.set == 2));

	net = TidySimpleCNN(net);                                                                     % Prepare the trained network for testing (merge and remove batch norm)

	fprintf('\n***** CNN.mat has been created! *****\n');
	save('CNN.mat', 'net', '-v7.3');                                                              % Saving CNN as .mat file


This function adds a batch normalization layer to the CNN architecture. It takes the network being constructed and the index of a convolutional layer, and inserts a bnorm layer immediately after that layer. Note that each time a batch normalization layer is inserted, the indices of all subsequent layers shift by one, which is why the calls above use indices 1, 5, and 9 rather than the original conv-layer positions.


	function net = insertBnorm(net, l)

	    assert(isfield(net.layers{l}, 'weights'));                                                % Check that the indexed layer is a conv layer
	    ndim = size(net.layers{l}.weights{1}, 4);                                                 % Find size of filters
	    layer = struct('type', 'bnorm', ...                                                       % Batch Norm Layer format
	                   'weights', {{ones(ndim, 1, 'single'), zeros(ndim, 1, 'single')}}, ...
	                   'learningRate', [1 1 0.05], ...
	                   'weightDecay', [0 0]) ;
	 %   net.layers{l}.weights{2} = []                                                            % Removed due to biases issue with MergeBatchNorm()
	    net.layers = horzcat(net.layers(1:l), layer, net.layers(l+1:end)) ;                       % Splice the bnorm layer in immediately after layer l

	end
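
If you prefer not to hard-code the indices 1, 5, and 9, the following sketch (not part of the original script) achieves the same result by locating the convolutional layers programmatically and inserting the bnorm layers in reverse order, so that earlier indices stay valid as the network grows:


	convIdx = find(cellfun(@(l) strcmp(l.type, 'conv'), net.layers));               % Indices of all conv layers
	convIdx = convIdx(1:end-1);                                                     % Skip the final 1x1 classification layer
	for l = fliplr(convIdx)                                                         % Insert back-to-front so earlier indices do not shift
	    net = insertBnorm(net, l);
	end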

Next, we can use the vl_simplenn_display() command to view how the data dimensionality is reduced throughout the network and where the batch normalization layers have been added. Note that the last convolutional layer (the classification layer) does not need to be batch normalized.


	vl_simplenn_display(net)


     layer|    0|     1|     2|     3|     4|     5|     6|     7|     8|     9|     10|     11|     12|     13|
      type|input|  conv| bnorm|  relu| mpool|  conv| bnorm|  relu| apool|  conv|  bnorm|   relu|   conv|softmxl|
      name|  n/a|layer1|layer2|layer3|layer4|layer5|layer6|layer7|layer8|layer9|layer10|layer11|layer12|layer13|
----------|-----|------|------|------|------|------|------|------|------|------|-------|-------|-------|-------|
   support|  n/a|     5|     1|     1|     2|     5|     1|     1|     2|     5|      1|      1|      1|      1|
  filt dim|  n/a|     3|   n/a|   n/a|   n/a|    32|   n/a|   n/a|   n/a|    32|    n/a|    n/a|     64|    n/a|
filt dilat|  n/a|     1|   n/a|   n/a|   n/a|     1|   n/a|   n/a|   n/a|     1|    n/a|    n/a|      1|    n/a|
 num filts|  n/a|    32|   n/a|   n/a|   n/a|    32|   n/a|   n/a|   n/a|    64|    n/a|    n/a|     10|    n/a|
    stride|  n/a|     1|     1|     1|     2|     1|     1|     1|     2|     1|      1|      1|      1|      1|
       pad|  n/a|     0|     0|     0|     0|     0|     0|     0|     0|     0|      0|      0|      0|      0|
----------|-----|------|------|------|------|------|------|------|------|------|-------|-------|-------|-------|
   rf size|  n/a|     5|     5|     5|     6|    14|    14|    14|    16|    32|     32|     32|     32|     32|
 rf offset|  n/a|     3|     3|     3|   3.5|   7.5|   7.5|   7.5|   8.5|  16.5|   16.5|   16.5|   16.5|   16.5|
 rf stride|  n/a|     1|     1|     1|     2|     2|     2|     2|     4|     4|      4|      4|      4|      4|
----------|-----|------|------|------|------|------|------|------|------|------|-------|-------|-------|-------|
 data size|   32|    28|    28|    28|    14|    10|    10|    10|     5|     1|      1|      1|      1|      1|
data depth|    3|    32|    32|    32|    32|    32|    32|    32|    32|    64|     64|     64|     10|      1|
  data num|  100|   100|   100|   100|   100|   100|   100|   100|   100|   100|    100|    100|    100|      1|
----------|-----|------|------|------|------|------|------|------|------|------|-------|-------|-------|-------|
  data mem|  1MB|  10MB|  10MB|  10MB|   2MB|   1MB|   1MB|   1MB| 312KB|  25KB|   25KB|   25KB|    4KB|     4B|
 param mem|  n/a|  10KB|  512B|    0B|    0B| 100KB|  512B|    0B|    0B| 200KB|    1KB|     0B|    3KB|     0B|


The following functions prepare our trained CNN for testing. Due to how MatConvNet handles batch normalization, the bnorm layers must be merged into the preceding convolutional layers and then removed before the network is deployed; dropout layers (if included) likewise need to be removed. These functions are adapted from the deployment code in examples/imagenet/cnn_imagenet_deploy.m in your MatConvNet installation. Invoking them ensures that the batch-normalized CNN produces a distinct output for each test sample. They are not needed if batch normalization is not included in the CNN architecture.


	%----------------------------------------------------------------------------
	% Prepare CNN for Testing that includes: Batch Normalization
	% - For DagNN code, see cnn_imagenet_deploy file found in
	% /MatconvNet/examples/imagenet or in this folder
	%----------------------------------------------------------------------------

	function net = TidySimpleCNN(net)

	    net = simpleRemoveLayersOfType(net, 'softmaxloss') ;                                            % First, remove the softmaxloss, only useful for training
	    net = simpleRemoveLayersOfType(net, 'dropout') ;                                                % Remove dropout if used as well, only useful for training

	    net.layers{end+1} = struct('name', 'prob', 'type', 'softmax') ;                                 % For Testing, a softmax layer is more desirable

	    net = simpleMergeBatchNorm(net) ;                                                               % Merge batch norm parameters into the preceding conv layers
	    net = simpleRemoveLayersOfType(net, 'bnorm') ;                                                  % Then remove the (now redundant) batch norm layers

	    net = simpleRemoveMomentum(net) ;                                                               % Remove Momentum

	    for l = simpleFindLayersOfType(net, 'conv')                                                     % Find Conv Layers
	        net.layers{l}.opts = removeCuDNNMemoryLimit(net.layers{l}.opts) ;                           % Remove Memory Limit
	    end

	end

	%----------------------------------------------------------------------------
	% Function to remove the cuDNN workspace memory limit option, GPU
	%----------------------------------------------------------------------------

	function opts = removeCuDNNMemoryLimit(opts)

	    remove = false(1, numel(opts)) ;
	    for i = 1:numel(opts)
	        if ischar(opts{i}) && strcmpi(opts{i}, 'CudnnWorkspaceLimit')                             % Match the option name case-insensitively
	            remove([i i+1]) = true ;                                                              % Flag both the option name and its value for removal
	        end
	    end
	    opts = opts(~remove) ;
	end

	%----------------------------------------------------------------------------
	% Function to remove Momentum, SimpleNN
	%----------------------------------------------------------------------------

	function net = simpleRemoveMomentum(net)
	    for l = 1:numel(net.layers)
	        if isfield(net.layers{l}, 'momentum')
	            net.layers{l} = rmfield(net.layers{l}, 'momentum') ;
	        end
	    end
	end

	%----------------------------------------------------------------------------
	% Function to find layers of a given type, SimpleNN
	%----------------------------------------------------------------------------

	function layers = simpleFindLayersOfType(net, type)
	    layers = find(cellfun(@(x)strcmp(x.type, type), net.layers)) ;
	end

	%----------------------------------------------------------------------------
	% Function to remove layers of a given type, SimpleNN
	%----------------------------------------------------------------------------

	function net = simpleRemoveLayersOfType(net, type)
	    layers = simpleFindLayersOfType(net, type) ;
	    net.layers(layers) = [] ;
	end

	%----------------------------------------------------------------------------
	% Function to merge Batch Norm parameters into the preceding conv layers, SimpleNN
	%----------------------------------------------------------------------------

	function net = simpleMergeBatchNorm(net)
	    for l = 1:numel(net.layers)
	        if strcmp(net.layers{l}.type, 'bnorm')
	            if ~strcmp(net.layers{l-1}.type, 'conv')
	                error('Batch normalization cannot be merged as it is not preceded by a conv layer.') ;
	            end
	            [filters, biases] = mergeBatchNorm(...
	                net.layers{l-1}.weights{1}, ...
	                net.layers{l-1}.weights{2}, ...
	                net.layers{l}.weights{1}, ...
	                net.layers{l}.weights{2}, ...
	                net.layers{l}.weights{3}) ;                                                       % weights{3} holds the moments [mean, sigma] accumulated during training
	            net.layers{l-1}.weights = {filters, biases} ;
	        end
	    end
	end

	function [filters, biases] = mergeBatchNorm(filters, biases, multipliers, offsets, moments)
	    % wk / sqrt(sigmak^2 + eps)
	    % bk - wk muk / sqrt(sigmak^2 + eps)
	    a = multipliers(:) ./ moments(:,2) ;
	    b = offsets(:) - moments(:,1) .* a ;

	    %biases(:) = biases(:) + b(:) ;                                                             % Stock deploy code (correct only when the conv biases are zero)
	    biases(:) = a(:) .* biases(:) + b(:);                                                       % Modified here: the conv biases were kept, so scale them by a before adding b
	    sz = size(filters) ;
	    numFilters = sz(4) ;
	    filters = reshape(bsxfun(@times, reshape(filters, [], numFilters), a'), sz) ;
	end
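
As a quick sanity check on the merging algebra (an illustrative snippet, not part of the original tutorial; all variable names here are made up), we can treat a 1x1 convolution on a single pixel as a per-channel scalar multiplication and confirm that convolution followed by batch normalization matches the merged convolution returned by mergeBatchNorm():


	w    = single(reshape([0.7 -1.2], 1, 1, 1, 2));    % Two 1x1 conv filters
	bias = single([0.2 0.4]);                          % Conv biases (kept, not zeroed)
	g    = single([1.3; 0.8]);                         % Bnorm multipliers (gamma)
	off  = single([-0.1; 0.3]);                        % Bnorm offsets (beta)
	mom  = single([0.4 1.5; -0.2 0.9]);                % Per-channel moments [mean, sqrt(var + eps)]
	x    = single(2.0);                                % A single input "pixel"

	% Path 1: 1x1 convolution (scalar multiply per channel) followed by batch normalization
	y  = w(:).' * x + bias;                            % 1x2 conv responses
	y1 = g(:).' .* (y - mom(:,1).') ./ mom(:,2).' + off(:).';

	% Path 2: fold the bnorm parameters into the conv weights and biases
	[wm, bm] = mergeBatchNorm(w, bias, g, off, mom);
	y2 = wm(:).' * x + bm;

	assert(max(abs(y1 - y2)) < 1e-5);                  % Both paths should agree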


During the training phase, the SimpleNN wrapper produces three plots (top-1 error, top-5 error, and objective) after each completed epoch. The top-1 error is the fraction of samples for which the class with the highest predicted probability is not the true target; in other words, it measures how often the CNN's single best guess is wrong. The top-5 error is the fraction of samples for which the true target is not among the five highest-scoring classes. The objective (unlike the objective in the DagNN wrapper) is the energy of the network plotted against training epochs, and for the SimpleNN network it should mirror the shape of the top-1 and top-5 error curves. In all of the plots, the training error is drawn in blue and the validation error in orange. LEFT: output with batch normalization, RIGHT: output without batch normalization. Comparing the two, one can see that incorporating batch normalization not only reduces the top-1 and top-5 errors but also does so in fewer epochs.
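
For reference, here is a minimal sketch (not from the tutorial; scores and labels are hypothetical variables) of how the top-1 and top-5 errors can be computed from a [numClasses x numSamples] score matrix and a 1 x numSamples vector of true class labels:


	[~, ranked] = sort(scores, 1, 'descend');                                       % Class indices, best guess first
	top1Error = mean(ranked(1,:) ~= labels);                                        % Fraction where the single best guess is wrong
	top5Error = mean(~any(bsxfun(@eq, ranked(1:5,:), labels), 1));                  % Fraction where the truth is not in the top five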

Part 2: Testing a Batch Normalized Network

Begin by loading the database and the trained SimpleNN network into memory. Assign the class names from the database to the trained CNN, and then switch the trained CNN into testing mode.


	%--------------------------------------------------------------------
	% Loading created CNN and IMDB
	%--------------------------------------------------------------------

	load('imdb_cifar.mat');                                                                             % Load IMDB
	load('CNN.mat');                                                                                    % Load in CNN
	                                                                                                    % Trained with softmaxloss, but evaluated with softmax
	net.meta.classes.name = imdb.meta.classes(:)' ;                                                     % Give CNN the needed class descriptions for testing
	net.mode = 'test';                                                                                  % CNN into test mode
	net = vl_simplenn_tidy(net) ;                                                                       % Trained Cifar CNN is a simpleNN, thus treat it as such


Lastly, use the database as testing data. We draw test samples from the database here, but one can also test with the separate test set provided by the helperCIFAR10Data() function mentioned earlier.


	%--------------------------------------------------------------------
	% Loading IMDB test data
	%--------------------------------------------------------------------

	i = 1;
	while(1)                                                                                            % Run until satisfied

	    im = imdb.images.data(:,:,:,i);                                                                 % Initialize current sample from IMDB (zero mean + single precision)
	    orig = imdb.images.original(:,:,:,i);                                                           % Load in corresponding image to go with sample (for viewing)
	    i = i + 1;                                                                                      % Next sample

	    res = vl_simplenn(net, im);                                                                     % Run test sample through CNN
	    scores = squeeze(gather(res(end).x));                                                           % Gather all scores

	    % show the classification results
	    [bestScore, best] = max(scores);                                                                % Acquire best score and associated class name
	    figure(1) ; clf ; imagesc(orig);                                                                % Plot results from sample, use original image for visual assessment
	    title(sprintf('%s (%d), score %.3f', net.meta.classes.name{best}, best, bestScore));
	    output = net.meta.classes.name{best};                                                           % Outputs associated class name of best score
	    keyboard;                                                                                       % Pause here; type dbcont (or press F5) to advance to the next sample

	end
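
Rather than inspecting one sample at a time, one may also want an overall accuracy figure. The sketch below (not part of the original script, and assuming the IMDB stores the ground-truth labels in imdb.images.labels) scores the whole validation split (imdb.images.set == 2) and reports the top-1 accuracy:


	valIdx  = find(imdb.images.set == 2);                                           % Validation samples, as used during training
	correct = 0;
	for i = valIdx
	    res    = vl_simplenn(net, imdb.images.data(:,:,:,i));                       % Forward pass for one sample
	    scores = squeeze(gather(res(end).x));
	    [~, best] = max(scores);                                                    % Predicted class index
	    correct = correct + (best == imdb.images.labels(i));                        % Assumes labels are stored in imdb.images.labels
	end
	fprintf('Validation top-1 accuracy: %.2f%%\n', 100 * correct / numel(valIdx));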


The following are example classification outputs from the SimpleNN network on the CIFAR-10 data. Due to the [32 x 32] size of the CIFAR images, the displayed images appear low resolution, as expected.