In this tutorial, we demonstrate how to manually build a Convolutional Neural Network (CNN) using the Tensorflow library. The network will be trained on the MNIST dataset. Before we continue, note that Google has many wonderful Tensorflow tutorials (including a great MNIST example), so please make sure you have at least looked into the basics of Tensorflow (Node, Graph, Session, Placeholder, One Hot Vector, etc.) and Tensorflow visualization (TensorBoard): Getting Started Tensorflow, Google (link), MNIST Tensorflow, Google (link), Cifar Tensorflow, Google (link), and TensorBoard Visualization, Google (link).

Part 1: Preparing Data, Tensorflow Formatting

Before we get started, first import the Tensorflow library. Next, we need to set up our training and testing data. We can do this by importing the pre-arranged MNIST dataset and loading it into Python. The read_data_sets command loads the training data, the testing data, and their corresponding labels, with the labels stored in "One Hot Vector" format.


import tensorflow as tf                                                         # First, import the Tensorflow library
from tensorflow.examples.tutorials.mnist import input_data                      # Data is conveniently stored, ready for training
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)                   # Load in MNIST data, "one hot vector" format for labels
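
To make the "One Hot Vector" label format concrete, here is a minimal plain-Python sketch. The helper name one_hot is hypothetical and is not part of Tensorflow or the MNIST loader; it only illustrates what a single label row looks like.

```python
def one_hot(label, num_classes=10):
    """Return a list of length num_classes with 1.0 at index `label`."""
    if not 0 <= label < num_classes:
        raise ValueError("label must be in [0, num_classes)")
    row = [0.0] * num_classes
    row[label] = 1.0
    return row


# The digit 3 becomes a row with a single 1.0 in position 3.
encoded = one_hot(3)
print(encoded)
```

Taking the index of the largest entry recovers the original digit, which is how the labels are decoded later when accuracy is computed.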

Next, we can go ahead and initialize our Interactive Session.


#---------------------------------------------------------------------
# InteractiveSession
#   - One can run variables without having to refer to session object
#   - Essentially the same as a normal Session (Execution of a graph)
#   - https://www.tensorflow.org/api_docs/python/tf/InteractiveSession
#---------------------------------------------------------------------

sess = tf.InteractiveSession()                                                  # Create an InteractiveSession (installs itself as the default session)

Now, we define the properties of our input data (i.e. the MNIST dataset) by creating placeholder nodes on our computational graph. Each sample of the MNIST dataset is [28 x 28 x 1], which flattens to 28 * 28 * 1 = 784 values. The MNIST data has 10 labels (0-9). With placeholders in place, each new batch can pass its values to the corresponding placeholder: x (data) and y_ (actual labels).


#---------------------------------------------------------------------
# Create Placeholder Nodes, Reshape Input Sample
#   - Placeholder being a promise to fill its value later
#   - Note: Nodes exist on a Computational Graph (Tensorflow Basics)
#---------------------------------------------------------------------

x = tf.placeholder(tf.float32, shape=[None, 784])                               # None: *any* batch-size; 784 pixels: single flattened 28 x 28 sample
x_image = tf.reshape(x, [-1, 28, 28, 1])                                        # Reshape input image into 4D Tensor [Batch, Height, Width, Depth]
y_ = tf.placeholder(tf.float32, shape=[None, 10])                               # Initialize node for target output classes, 10 classes specified
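
The reshape from a flat 784-value sample back to a 28 x 28 image can be sketched in plain Python. This assumes the row-major ordering that tf.reshape uses; the helper name unflatten is hypothetical and exists only for this illustration.

```python
HEIGHT, WIDTH = 28, 28


def unflatten(flat, height=HEIGHT, width=WIDTH):
    """Split a flat, row-major list into `height` rows of `width` pixels."""
    if len(flat) != height * width:
        raise ValueError("expected %d values, got %d" % (height * width, len(flat)))
    return [flat[r * width:(r + 1) * width] for r in range(height)]


# Use the pixel index as the pixel value so the mapping is easy to read.
flat_sample = list(range(HEIGHT * WIDTH))
image = unflatten(flat_sample)

print(len(image), len(image[0]))  # rows, columns
print(image[1][0])                # first pixel of the second row
```

Flat index i lands at row i // 28 and column i % 28, so no pixel values change; only the shape does.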

Part 2: Create Convolutional Neural Network (CNN)

Next, we can build the architecture of our Convolutional Neural Network on our existing graph. In this example, each "block" is a convolutional layer followed by a ReLU layer and then a max pooling layer. After the blocks, the data is passed through a fully-connected layer and then a dropout layer (to reduce overfitting). Lastly, a readout layer produces one score per class; the softmax itself is applied later, inside the loss function.


#---------------------------------------------------------------------
# Create CNN Architecture
#   - Conv Filter: [Height, Width, Depth, Neurons]
#   - Dropout is used in training and turned off for testing
#---------------------------------------------------------------------

# Block 1
conv1 = conv_and_relu(x_image, [5, 5, 1, 32])                                   # Apply Convolutional layer and ReLU layer
pool1 = max_pool_2x2(conv1)                                                     # Apply max_pooling layer after ReLU input

# Block 2
conv2 = conv_and_relu(pool1, [5, 5, 32, 64])                                    # Apply Convolutional layer and ReLU layer
pool2 = max_pool_2x2(conv2)                                                     # Pooling Layer after ReLU, keep track of data dimensionality

# Fully-Connected Layer (MLP)
fc = fully_connected(pool2, [7 * 7 * 64, 1024])                                 # Weights, Fully-Connected layer. 1024 Neurons/feature maps.

# Dropout Layer - Reduce Overfitting, Scaling handled automatically
keep_prob = tf.placeholder(tf.float32)                                          # Stores probability that neurons output is kept during dropout.
fc_drop = tf.nn.dropout(fc, keep_prob)                                          # The placeholder allows us to train with dropout and test without it

# Readout Layer (Network Conclusion)
weights, biases = weight_and_bias([1024,10])                                    # Weights & Bias for the readout layer, shape [Inputs, Classes]
y_conv = tf.matmul(fc_drop, weights) + biases                                   # Logits (class scores); softmax is applied inside the loss
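
The automatic scaling mentioned in the dropout comment can be illustrated without Tensorflow. The sketch below is a plain-Python stand-in, not the tf.nn.dropout implementation; the name dropout_sketch and the seeded random generator are hypothetical choices for this illustration. Each value is kept with probability keep_prob and divided by keep_prob, so the expected activation stays the same.

```python
import random


def dropout_sketch(values, keep_prob, rng):
    """Keep each value with probability keep_prob, scaled by 1 / keep_prob."""
    if not 0.0 < keep_prob <= 1.0:
        raise ValueError("keep_prob must be in (0, 1]")
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]


rng = random.Random(0)           # Seeded only so the run is repeatable
activations = [1.0] * 10000

dropped = dropout_sketch(activations, 0.5, rng)
mean_after = sum(dropped) / len(dropped)
print(mean_after)                # Close to 1.0, the mean before dropout

# With keep_prob = 1.0 nothing is dropped, which is the testing setting.
unchanged = dropout_sketch([1.0, 2.0], 1.0, rng)
print(unchanged)
```

This is why the same graph can be used for training (keep_prob below 1.0) and testing (keep_prob of 1.0) without rescaling the weights by hand.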


Before we continue, let's look at the details behind each layer of the CNN. We will start with the Convolutional and ReLU layer. The Convolutional layer is defined by the short function below, which requires an input (a new batch, or the previous layer's output) and a Convolutional filter [Height, Width, Depth, Neurons]. First, the weights and bias for the layer are created from the shape of the Convolutional filter. Next, we invoke Tensorflow's convolution method by passing in our input, weights, stride, and a chosen type of padding. With SAME padding, the input is zero-padded so that the output size of each spatial dimension is ceil(float(input) / float(stride)); with VALID padding, no padding is added and the output size is ceil(float(input - filter + 1) / float(stride)). Afterwards, we invoke Tensorflow's ReLU method on the output of the convolution plus the bias. Note that the functions covered here and below show how one invokes the corresponding layers (Conv, ReLU, Max Pooling, Fully-Connected, etc.), but not the details of how they are programmatically defined.


#--------------------------------------------------------------------------
# Define Convolutional and Pooling Layers
#   - Initialize weights with noise to prevent 0 gradients
#   - Initialize slightly positive bias to avoid "dead neurons" (B/c RELU)
#   - Stride -> Batch, Height, Width, Channels
#--------------------------------------------------------------------------

def conv_and_relu(x, conv_filter, name = 'Conv and ReLU'):                      # Function, design convolutional layer. Needs 4D input, filter tensors
    weights, biases = weight_and_bias(conv_filter)                              # Weights & Bias, Convolutional Filter [Height, Width, Depth, Feature Maps]
    conv = tf.nn.conv2d(x, weights, strides=[1, 1, 1, 1], padding='SAME')       # Convolutional Layer
    relu = tf.nn.relu(conv + biases)                                            # ReLU layer: Takes in output of Convolutional Layer, ReLU(w*x + b)
    return relu                                                                 # Return ReLU layer
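
The effect of the padding choice on output size can be checked with a few lines of plain Python. This is a sketch of the standard SAME and VALID size rules for one spatial dimension; the helper name conv_output_size is hypothetical and is not a Tensorflow function.

```python
import math


def conv_output_size(input_size, filter_size, stride, padding):
    """Output length of one spatial dimension for SAME or VALID padding."""
    if padding == 'SAME':
        return math.ceil(input_size / stride)
    if padding == 'VALID':
        return math.ceil((input_size - filter_size + 1) / stride)
    raise ValueError("padding must be 'SAME' or 'VALID'")


# 5 x 5 convolution with stride 1 on a 28-pixel-wide input.
print(conv_output_size(28, 5, 1, 'SAME'))   # 28, size is preserved
print(conv_output_size(28, 5, 1, 'VALID'))  # 24, the border is lost

# 2 x 2 max pooling with stride 2 follows the same rule.
print(conv_output_size(28, 2, 2, 'SAME'))   # 14
```

With SAME padding and a stride of 1, as used in conv_and_relu, the height and width of the data pass through the convolution unchanged.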

For the Convolutional layer, the weights are initialized with a small amount of noise for symmetry breaking. The bias will be initialized slightly positive to avoid "dead neurons", due to using ReLU.


#--------------------------------------------------------------------------
# Define Weights and Bias
#   - Initialize slightly positive bias to avoid "dead neurons" (B/c RELU)
#   - Weights: normal distribution, specified mean and standard deviation
#       - |Values| > 2 * stddev's from the mean are dropped and re-picked.
#--------------------------------------------------------------------------

def weight_and_bias(myFilter):                                                  # Function, initialize weights and bias for a given shape list
    initial_w = tf.truncated_normal(myFilter, stddev=0.1)                       # Normal distribution Node, specified standard deviation value
    if(len(myFilter) == 2):                                                     # When len(shape) == 2, at end of CNN network
        initial_b = tf.constant(0.1, shape = [myFilter[1]])                     # Assign second element of filter to bias instead of last
    else:
        initial_b = tf.constant(0.1, shape = [myFilter[3]])                     # Positive value, one bias per feature map (last element of filter)
    return tf.Variable(initial_w), tf.Variable(initial_b)                       # Return weights as Variable (So they can be updated via training)
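
The "dropped and re-picked" rule for the weight initialization can be sketched with the standard library. This is an illustration of the idea only, not the code behind tf.truncated_normal; the function name truncated_normal_sketch and the seed are hypothetical.

```python
import random


def truncated_normal_sketch(count, mean=0.0, stddev=0.1, rng=None):
    """Draw normal samples, re-drawing any that fall more than 2 stddevs from the mean."""
    if rng is None:
        rng = random.Random()
    samples = []
    while len(samples) < count:
        value = rng.gauss(mean, stddev)
        if abs(value - mean) <= 2 * stddev:   # Keep only values inside the window
            samples.append(value)
    return samples


weights = truncated_normal_sketch(1000, stddev=0.1, rng=random.Random(42))
print(min(weights), max(weights))   # Both lie within [-0.2, 0.2]
```

The truncation keeps every initial weight small, while the randomness still breaks the symmetry between neurons.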

Next, we can look at the max pooling layer. This network uses standard [2 x 2] max pooling with stride [2 x 2], which halves the height and width of the data. Note that the batch size and the number of channels are unaffected by max pooling.


#--------------------------------------------------------------------------
# Define Max Pooling Layer
#   - Stride -> Batch, Height, Width, Channels
#--------------------------------------------------------------------------

def max_pool_2x2(x, name = 'Max Pooling'):                                      # Function, design max pooling layer, need 4D input
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],                                # Specify pooling filters, stride, padding.
                        strides=[1, 2, 2, 1], padding='SAME')
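
What [2 x 2] max pooling with stride 2 does to a single channel can be shown on a small grid. The sketch below is plain Python with a hypothetical helper name, and it assumes an even height and width so that no padding is needed.

```python
def max_pool_2x2_plain(image):
    """2 x 2 max pooling with stride 2 on a 2D list with even height and width."""
    height, width = len(image), len(image[0])
    return [[max(image[r][c], image[r][c + 1],
                 image[r + 1][c], image[r + 1][c + 1])
             for c in range(0, width, 2)]
            for r in range(0, height, 2)]


sample = [[1, 2, 3, 4],
          [5, 6, 7, 8],
          [9, 10, 11, 12],
          [13, 14, 15, 16]]

pooled = max_pool_2x2_plain(sample)
print(pooled)   # [[6, 8], [14, 16]]
```

Each output value is the largest entry of one non-overlapping 2 x 2 window, so a 4 x 4 input becomes 2 x 2, just as 28 x 28 becomes 14 x 14 in the network.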

Lastly, we can look at the fully connected layer. At this point, the image has been reduced spatially to [7 x 7] with 64 channels, so every neuron needs one weight per input value, which is equivalent to a filter of size [7 x 7] that covers the entire image. Note that the shape passed into this function already has its input dimensions (Height, Width, Depth) flattened: [7 * 7 * 64, 1024].


#--------------------------------------------------------------------------
# Define Fully-Connected Layer
#   - Input is flattened to [Batch, 7 * 7 * 64] before the matrix multiply
#--------------------------------------------------------------------------

def fully_connected(x, conv_filter, name = 'Fully Connected'):                  # Function, design fully connected layer
    weights, biases = weight_and_bias(conv_filter)                              # Weights & Bias, shape [Inputs, Neurons] (2D, not a conv filter)
    x_flat = tf.reshape(x, [-1, 7*7*64])                                        # Reshape pooling layer into batch of vectors. Flatten!
    fc = tf.nn.relu(tf.matmul(x_flat, weights) + biases)                        # Take ReLU. ** Note: matmul used instead of Convolution **
    return fc                                                                   # Return Fully Connected Layer

Now, what if one wants to confirm the size of the data after it has passed through N layers of the Convolutional Neural Network? One can simply print the static shape of each layer's output tensor (the shape attribute belongs to the Tensorflow tensor; numpy is not needed). Note that since SAME padding is used with a stride of 1 in the convolutions, the spatial dimensions (Height, Width) of the data only change after each max pooling layer. However, one can print out all layer shapes. The batch dimension is reported as unknown because the placeholder leaves it as None.


print(conv1.shape)
print(pool1.shape)
print(conv2.shape)
print(pool2.shape)
print(fc.shape)
print(fc_drop.shape)
print(y_conv.shape)
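
The shapes these print statements should report can be worked out by hand. The sketch below does the arithmetic in plain Python, omitting the batch dimension; the function trace_shapes is hypothetical and simply applies the SAME-padding rule (output = ceil(input / stride)) layer by layer.

```python
import math


def same_size(size, stride):
    """Spatial output size under SAME padding."""
    return math.ceil(size / stride)


def trace_shapes(height=28, width=28):
    """Return the per-sample output shape of every layer in this network."""
    shapes = {}
    h, w = same_size(height, 1), same_size(width, 1)
    shapes['conv1'] = (h, w, 32)          # 5 x 5 conv, stride 1, 32 feature maps
    h, w = same_size(h, 2), same_size(w, 2)
    shapes['pool1'] = (h, w, 32)          # 2 x 2 max pool, stride 2
    h, w = same_size(h, 1), same_size(w, 1)
    shapes['conv2'] = (h, w, 64)          # 5 x 5 conv, stride 1, 64 feature maps
    h, w = same_size(h, 2), same_size(w, 2)
    shapes['pool2'] = (h, w, 64)          # 2 x 2 max pool, stride 2
    shapes['fc'] = (1024,)                # Fully-connected layer
    shapes['fc_drop'] = (1024,)           # Dropout does not change the shape
    shapes['y_conv'] = (10,)              # One score per class
    return shapes


for layer, shape in trace_shapes().items():
    print(layer, shape)
```

The pool2 entry comes out as (7, 7, 64), which is where the 7 * 7 * 64 input size of the fully-connected layer comes from.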

Part 3: Training and Testing the CNN

After the network architecture has been created, we must add the training operations for our CNN to the existing graph. To do this, we first need a way to measure the loss of our model. Here, we use the popular cross-entropy loss, computed from the network's raw outputs (logits) and the true labels.


cross_entropy = tf.reduce_mean(                                                 # Mean cross-entropy loss over the batch
    tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y_conv))          # Softmax is applied internally: labels (y_), logits (y_conv)
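
For a single sample, the quantity being averaged can be written out in plain Python. This is a sketch of the textbook softmax cross-entropy formula, not Tensorflow's implementation; the function names are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)                       # Subtract the max for numerical stability
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy_sketch(one_hot_label, logits):
    """Cross-entropy between a one-hot label and the softmax of the logits."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(one_hot_label, probs) if t > 0)


label = [0.0, 0.0, 1.0]

confident = cross_entropy_sketch(label, [0.0, 0.0, 5.0])   # Correct and confident
unsure = cross_entropy_sketch(label, [0.0, 0.0, 0.0])      # Uniform guess
print(confident, unsure)
```

A uniform guess over k classes gives a loss of log(k), and the loss shrinks toward 0 as the score of the correct class grows.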

Next, an optimization tool needs to be implemented to minimize the loss of our network. A popular option is Gradient Descent, but here we will use the ADAM optimizer. For most Tensorflow optimizers, we pass in the learning rate as an argument to the optimization tool. Here we use a learning rate of 1e-4.


train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)               # Train Network, Tensorflow minimizes cross_entropy via ADAM Optimization instead of SGD

Next, we need to find a way to evaluate how well our CNN performs. To do this, we will check each prediction and use that to calculate the overall accuracy of the model.


correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))           # Check if prediction is correct with tf.equal(CNN_result, True_result)
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))              # Find the percent accuracy, take mean of correct_prediction output
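
The same argmax-and-compare logic can be followed on a tiny batch in plain Python. The helper names and the numbers below are made up for illustration; only the procedure mirrors the two Tensorflow lines above.

```python
def argmax(values):
    """Index of the largest entry."""
    return max(range(len(values)), key=lambda i: values[i])


def batch_accuracy(logits_batch, labels_batch):
    """Fraction of samples whose highest score matches the one-hot label."""
    correct = [argmax(logits) == argmax(label)
               for logits, label in zip(logits_batch, labels_batch)]
    return sum(correct) / len(correct)


logits_batch = [[0.1, 2.0, 0.3],
                [1.5, 0.2, 0.1],
                [0.0, 0.1, 3.0],
                [0.9, 0.8, 0.1]]
labels_batch = [[0, 1, 0],
                [1, 0, 0],
                [0, 0, 1],
                [0, 1, 0]]

result = batch_accuracy(logits_batch, labels_batch)
print(result)   # 0.75: three of the four predictions match
```

Casting the True/False comparisons to numbers and averaging them is exactly what tf.cast followed by tf.reduce_mean does.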

Lastly, we begin our session and initialize all of our Tensorflow variables. We then specify how long we want to train for in iterations rather than epochs. In each iteration, a new mini-batch of data and its associated labels is pushed through the network. Once enough mini-batches have been processed to cover the whole training set, a single epoch is complete.


with tf.Session() as sess:                                                      # Open a session; it is the default inside this block
    sess.run(tf.global_variables_initializer())                                 # Initialize all Tensorflow Variables
    for i in range(20000):                                                      # 20,000 (default) training iterations, these are NOT epochs.
        batch = mnist.train.next_batch(50)                                      # Specify Batch size here, default is 50
        if i % 100 == 0:                                                        # Test progress every 100 iterations
            train_accuracy = accuracy.eval(feed_dict={                          # Evaluate CNN on the current batch, dropout off
                x: batch[0], y_: batch[1], keep_prob: 1.0})
            print('step %d, training accuracy %g' % (i, train_accuracy))        # Display Evaluation Results while training
        train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})   # Feed Batch into the network
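
The relationship between iterations and epochs is simple arithmetic. The sketch below assumes 55,000 training images, which is the training split this loader provides by default (the remaining images are held out for validation); the helper names are hypothetical.

```python
import math


def iterations_per_epoch(batch_size, train_size):
    """Mini-batches needed to see every training sample once."""
    return math.ceil(train_size / batch_size)


def epochs_seen(iterations, batch_size, train_size):
    """How many passes over the training set a run amounts to."""
    return iterations * batch_size / train_size


TRAIN_SIZE = 55000   # Assumed size of mnist.train
BATCH_SIZE = 50

per_epoch = iterations_per_epoch(BATCH_SIZE, TRAIN_SIZE)
total_epochs = epochs_seen(20000, BATCH_SIZE, TRAIN_SIZE)
print(per_epoch)               # 1100 iterations make one epoch
print(round(total_epochs, 2))  # About 18.18 epochs in total
```

Under that assumption, the 20,000 iterations in the loop correspond to roughly 18 passes over the training data.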


Finally, after training we can test the trained network with a single statement. It must run inside the same session block as the training loop, because the trained variables only exist in that session. The final testing accuracy should be around 99%.


    print('test accuracy %g' % accuracy.eval(feed_dict={
        x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))          # Print overall test accuracy


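Below is our finalized code. Note that this listing trains for only 100 iterations so that it finishes quickly; use range(20000) as in the walkthrough for a full training run.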



#--------------------------------------------------------------------------
# Example of MNIST CNN
#--------------------------------------------------------------------------

import tensorflow as tf                                                         # First, import the Tensorflow library
from tensorflow.examples.tutorials.mnist import input_data                      # Data is conveniently stored, ready for training

#--------------------------------------------------------------------------
# Define Weights and Bias
#   - Initialize slightly positive bias to avoid "dead neurons" (B/c RELU)
#   - Weights: normal distribution, specified mean and standard deviation
#       - |Values| > 2 * stddev's from the mean are dropped and re-picked.
#--------------------------------------------------------------------------

def weight_and_bias(myFilter):                                                  # Function, initialize weights and bias for a given shape list
    initial_w = tf.truncated_normal(myFilter, stddev=0.1)                       # Normal distribution Node, specified standard deviation value
    if(len(myFilter) == 2):                                                     # When len(shape) == 2, at end of CNN network
        initial_b = tf.constant(0.1, shape = [myFilter[1]])                     # Assign second element of filter to bias instead of last
    else:
        initial_b = tf.constant(0.1, shape = [myFilter[3]])                     # Positive value, one bias per feature map (last element of filter)
    return tf.Variable(initial_w), tf.Variable(initial_b)                       # Return weights as Variable (So they can be updated via training)

#--------------------------------------------------------------------------
# Define Convolutional and Pooling Layers
#   - Initialize weights with noise to prevent 0 gradients
#   - Initialize slightly positive bias to avoid "dead neurons" (B/c RELU)
#   - Default Conv format: NHWC -> Batch Size, Height, Width, Channels
#   - Default Pool format: NHWC
#--------------------------------------------------------------------------

def conv_and_relu(x, conv_filter, name = 'Conv and ReLU'):                      # Function, design convolutional layer. Needs 4D input, filter tensors
    weights, biases = weight_and_bias(conv_filter)                              # Weights & Bias, Convolutional Filter [Height, Width, Depth, Feature Maps]
    conv = tf.nn.conv2d(x, weights, strides=[1, 1, 1, 1], padding='SAME')       # Convolutional Layer
    relu = tf.nn.relu(conv + biases)                                            # ReLU layer: Takes in output of Convolutional Layer, ReLU(w*x + b)
    return relu                                                                 # Return ReLU layer

def max_pool_2x2(x, name = 'Max Pooling'):                                      # Function, design max pooling layer, need 4D input
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],                                # Specify pooling filters, stride, padding.
                            strides=[1, 2, 2, 1], padding='SAME')

def fully_connected(x, conv_filter, name = 'Fully Connected'):                  # Function, design fully connected layer
    weights, biases = weight_and_bias(conv_filter)                              # Weights & Bias, shape [Inputs, Neurons] (2D, not a conv filter)
    x_flat = tf.reshape(x, [-1, 7*7*64])                                        # Reshape pooling layer into batch of vectors. Flatten!
    fc = tf.nn.relu(tf.matmul(x_flat, weights) + biases)                        # Take ReLU. ** Note: matmul used instead of Convolution **
    return fc                                                                   # Return Fully Connected Layer

def main():

    mnist = input_data.read_data_sets('MNIST_data', one_hot=True)               # Load in MNIST data, "one hot vector" format for labels
    sess = tf.InteractiveSession()                                              # Create an InteractiveSession (installs itself as the default session)

    #---------------------------------------------------------------------
    # Create Placeholders, Reshape Input Sample
    #   - Placeholder being a promise to fill its value later
    #---------------------------------------------------------------------

    x = tf.placeholder(tf.float32, shape=[None, 784])                           # None: *any* batch-size; 784 pixels: single flattened 28 x 28 sample
    x_image = tf.reshape(x, [-1, 28, 28, 1])                                    # Reshape input image into 4D Tensor [Batch, H, W, D]
    y_ = tf.placeholder(tf.float32, shape=[None, 10])                           # Initialize node for target output classes, 10 classes specified

    #---------------------------------------------------------------------
    # Create CNN Architecture
    #   - Dropout is used in training and turned off for testing
    #---------------------------------------------------------------------

    # Block 1
    conv1 = conv_and_relu(x_image, [5, 5, 1, 32])                               # Apply Convolutional layer and ReLU layer
    pool1 = max_pool_2x2(conv1)                                                 # Apply max_pooling layer after ReLU input

    # Block 2
    conv2 = conv_and_relu(pool1, [5, 5, 32, 64])                                # Apply Convolutional layer and ReLU layer
    pool2 = max_pool_2x2(conv2)                                                 # Pooling Layer after ReLU, keep track of data dimensionality

    # Fully-Connected Layer (MLP)
    fc = fully_connected(pool2, [7 * 7 * 64, 1024])                             # Weights, Fully-Connected layer. 1024 Neurons/feature maps.

    # Dropout Layer - Reduce Overfitting, Scaling handled automatically
    keep_prob = tf.placeholder(tf.float32)                                      # Stores probability that neurons output is kept during dropout.
    fc_drop = tf.nn.dropout(fc, keep_prob)                                      # The placeholder allows us to train with dropout and test without it

    # Readout Layer (Network Conclusion)
    weights, biases = weight_and_bias([1024,10])                                # Weights & Bias for the readout layer, shape [Inputs, Classes]
    y_conv = tf.matmul(fc_drop, weights) + biases                               # Logits (class scores); softmax is applied inside the loss

    print(conv1.shape)
    print(pool1.shape)
    print(conv2.shape)
    print(pool2.shape)
    print(fc.shape)
    print(fc_drop.shape)
    print(y_conv.shape)

    #---------------------------------------------------------------------
    # Training and Testing Convolutional Neural Network
    #   - Learning rate is set to 1e-4 in "Train_step"
    #   - Tensorflow offers MANY Optimization techniques: ADAM, SGD, etc
    #   - tf.argmax() returns index of highest entry. **One hot Vector**
    #       - tf.argmax(y_conv) - prediction output from model
    #       - tf.argmax(y_) - True Output label
    #   - keep_prob and feed_dict are used to control dropout for testing
    #---------------------------------------------------------------------

    cross_entropy = tf.reduce_mean(                                             # Mean cross-entropy loss over the batch
        tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y_conv))      # Softmax is applied internally: labels (y_), logits (y_conv)

    train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)           # Train Network, Tensorflow minimizes cross_entropy via ADAM Optimization instead of SGD

    correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))       # Check if prediction is correct with tf.equal(CNN_result, True_result)
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))          # Find the percent accuracy, take mean of correct_prediction output

    with tf.Session() as sess:                                                  # Open a session; it is the default inside this block

        sess.run(tf.global_variables_initializer())                             # Initialize all Tensorflow Variables

        for i in range(100):                                                    # 100 iterations for a quick run (walkthrough uses 20,000); NOT epochs
            batch = mnist.train.next_batch(50)                                  # Specify Batch size here, default is 50

            if i % 5 == 0:                                                      # Test progress every 5 iterations
                train_accuracy = accuracy.eval(feed_dict={                      # Evaluate CNN
                    x: batch[0], y_: batch[1], keep_prob: 1.0})
                print('step %d, training accuracy %g' % (i, train_accuracy))    # Display Evaluation Results while training

            train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5}) # Feed Batch into the network

        print('test accuracy %g' % accuracy.eval(feed_dict={
            x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))      # Print overall test accuracy

main()