Matt Rajca

Getting Started with Deep MNIST and TensorFlow on iOS

November 25, 2016

In this article, we’ll walk through getting TensorFlow, Google’s machine learning library, set up to perform inference directly on an iOS device. We’ll work with the MNIST dataset of handwritten digits.

Installing (or Upgrading) TensorFlow

The binary distribution of TensorFlow for macOS does not include the iOS static library or some of the scripts we’ll need, so we’ll have to build both ourselves in a later section. Since it helps if the version of the iOS library is the same as the version of TensorFlow installed on our system, we’ll re-install the latest version of TensorFlow from source (even if it’s already installed). This can save us some headaches down the road.

At the time of this writing, I ended up building TensorFlow 0.12 from the tip-of-tree sources on GitHub, since that development version fixes some compatibility issues with macOS 10.12.

You can build TensorFlow without GPU support since neither OpenCL nor CUDA is supported on iOS. When running the configure script, you can also disable Google Cloud Platform support. If you already have TensorFlow installed and run into installation issues, you can force an upgrade by installing the newly built pip package with sudo pip install -U ....

Writing the Training Script

This article builds on top of the official Deep MNIST for Experts TensorFlow tutorial, which we’ll extend such that we can save and re-use the trained model for inference in an iOS app. For reference, this is the script we’re left with after completing the tutorial (slightly re-formatted for clarity):

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets('MNIST_data', one_hot=True)
session = tf.InteractiveSession()

x = tf.placeholder(tf.float32, shape=[None, 784])
y_ = tf.placeholder(tf.float32, shape=[None, 10])

def weight_variable(shape):
  initial = tf.truncated_normal(shape, stddev=0.1)
  return tf.Variable(initial)

def bias_variable(shape):
  initial = tf.constant(0.1, shape=shape)
  return tf.Variable(initial)

def conv2d(x, W):
  return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
  return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                        strides=[1, 2, 2, 1], padding='SAME')

x_image = tf.reshape(x, [-1,28,28,1])

W_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])

h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)

W_conv2 = weight_variable([5, 5, 32, 64])
b_conv2 = bias_variable([64])

h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)

W_fc1 = weight_variable([7 * 7 * 64, 1024])
b_fc1 = bias_variable([1024])

h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)

W_fc2 = weight_variable([1024, 10])
b_fc2 = bias_variable([10])

keep_prob = tf.placeholder(tf.float32)
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)

y_conv = tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)

cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y_conv), reduction_indices=[1]))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
init = tf.initialize_all_variables()
session.run(init)

for i in range(20000):
  batch = mnist.train.next_batch(50)
  if i % 100 == 0:
    train_accuracy = accuracy.eval(feed_dict={
      x:batch[0], y_: batch[1], keep_prob: 1.0
    })
    print("step %d, training accuracy %g" % (i, train_accuracy))
  train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})

print("test accuracy %g" % accuracy.eval(feed_dict={
  x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0
}))

All work in TensorFlow is built up as a computational graph. This training script sets up a graph that takes a 28x28 image as input and runs it through two convolutional layers, each using a ReLU activation function and 2x2 max pooling. These are followed by two fully-connected layers that eventually leave us with a 10-unit output vector. We run this output vector through a softmax function and interpret the result as a probability distribution over likely digit classes (we have 10 outputs since we support the digits 0-9). Once the graph is set up, we minimize the loss function (cross-entropy in this case) with an Adam optimizer and a learning rate of 0.0001. We train on batches of 50 images at a time and log the accuracy on the current training batch every 100 steps. Once training is complete, we evaluate the final accuracy on the test set. If you train for 20,000 iterations, the final accuracy should be around 99%. For a more complete description of the code, refer to the original Deep MNIST tutorial before continuing, if necessary.
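
If you’d like to verify where the 7 * 7 * 64 figure in the first fully-connected layer comes from, you can print the static shape of each intermediate tensor. This is just a quick check of my own (not part of the original tutorial), meant to be appended right after the graph is built in the script above:

# Each 'SAME'-padded convolution preserves the spatial size, and each 2x2 max
# pool halves it: 28x28 -> 14x14 -> 7x7, with 64 feature maps after conv2.
for tensor in [x_image, h_conv1, h_pool1, h_conv2, h_pool2, h_pool2_flat, h_fc1, y_conv]:
  print("%s %s" % (tensor.name, tensor.get_shape()))
# e.g. h_pool2 has shape (?, 7, 7, 64), so h_pool2_flat has shape (?, 3136).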

Extending the Training Script

Now we’ll make some modifications to the training script. When using our trained model from our iOS project, we’ll have to reference the inputs and outputs by name. To make this easier, we’ll give our input placeholder x an explicit name:

x = tf.placeholder(tf.float32, shape=[None, 784], name="x")

We’ll also give the output softmax an explicit name:

y_conv = tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2, name="softmax")

These names are shorter than the default names TensorFlow provides (such as Softmax:0 in the latter case).
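
To confirm the names that were actually assigned, you can print them after building the graph (again, just a sanity check of my own):

print(x.name)       # x:0
print(y_conv.name)  # softmax:0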

One issue with our graph as it stands is that Dropout operations are not supported on iOS. This is because Dropout is only useful during training, which normally occurs offline rather than on an iOS device. Moreover, during inference, we always pass a “keep probability” of 1 to the Dropout operation, effectively making it a no-op (none of the inputs end up getting zeroed out).
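
You can convince yourself of this with a quick experiment in a Python shell; the exact values in the 0.5 case will vary from run to run since Dropout is random:

import tensorflow as tf

with tf.Session() as session:
  t = tf.constant([1.0, 2.0, 3.0])
  print(session.run(tf.nn.dropout(t, 1.0)))  # [1. 2. 3.] -- unchanged
  print(session.run(tf.nn.dropout(t, 0.5)))  # e.g. [2. 0. 6.] -- kept values scaled by 1/0.5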

Since our graph currently contains a Dropout operation, we won’t be able to load it on iOS. If we try it, we’ll get the following error:

Invalid argument: No OpKernel was registered to support Op 'RandomUniform' with these attrs.  Registered devices: [CPU], Registered kernels:
  <no registered kernels>

      [[Node: dropout/random_uniform/RandomUniform = RandomUniform[T=DT_INT32, dtype=DT_FLOAT, seed=0, seed2=0](dropout/Shape)]]

One potential solution involves maintaining two graphs that share the same weights: one with Dropout for training and one without Dropout for inference. Since this takes us a bit beyond the scope of this blog post and is just an implementation nuisance, we’ll remove Dropout altogether for now (the loss in accuracy won’t be significant).
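
For the curious, here is a minimal sketch of what the two-graph approach could look like, condensed to a single fully-connected layer. The helper names and variable scopes are mine for illustration and aren’t part of the original tutorial; note also that the op names pick up the scope prefixes, so some extra care would be needed to keep the exported output named softmax:

# Illustrative sketch only -- the helper names and scopes below are not from the tutorial.
import tensorflow as tf

def fc_layer(x, out_units, name):
  # Fully-connected layer built with tf.get_variable so its weights can be
  # shared between the training and inference graphs.
  in_units = x.get_shape()[1].value
  with tf.variable_scope(name):
    W = tf.get_variable("W", [in_units, out_units],
                        initializer=tf.truncated_normal_initializer(stddev=0.1))
    b = tf.get_variable("b", [out_units],
                        initializer=tf.constant_initializer(0.1))
  return tf.matmul(x, W) + b

def model(x, keep_prob):
  # Passing keep_prob=None builds the Dropout-free variant used for export.
  h = tf.nn.relu(fc_layer(x, 1024, "fc1"))
  if keep_prob is not None:
    h = tf.nn.dropout(h, keep_prob)
  return tf.nn.softmax(fc_layer(h, 10, "fc2"), name="softmax")

x = tf.placeholder(tf.float32, shape=[None, 784], name="x")
keep_prob = tf.placeholder(tf.float32)

with tf.variable_scope("mnist"):
  y_train = model(x, keep_prob)   # graph used for training (with Dropout)
with tf.variable_scope("mnist", reuse=True):
  y_infer = model(x, None)        # shares the same weights, no Dropout

Back to our simpler approach: with Dropout removed, the training script should now look something like this: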

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets('MNIST_data', one_hot=True)
session = tf.InteractiveSession()

x = tf.placeholder(tf.float32, shape=[None, 784], name="x")
y_ = tf.placeholder(tf.float32, shape=[None, 10])

def weight_variable(shape):
  initial = tf.truncated_normal(shape, stddev=0.1)
  return tf.Variable(initial)

def bias_variable(shape):
  initial = tf.constant(0.1, shape=shape)
  return tf.Variable(initial)

def conv2d(x, W):
  return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
  return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                        strides=[1, 2, 2, 1], padding='SAME')

x_image = tf.reshape(x, [-1,28,28,1])

W_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])

h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)

W_conv2 = weight_variable([5, 5, 32, 64])
b_conv2 = bias_variable([64])

h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)

W_fc1 = weight_variable([7 * 7 * 64, 1024])
b_fc1 = bias_variable([1024])

h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)

W_fc2 = weight_variable([1024, 10])
b_fc2 = bias_variable([10])

y_conv = tf.nn.softmax(tf.matmul(h_fc1, W_fc2) + b_fc2, name="softmax")

cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y_conv), reduction_indices=[1]))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
init = tf.initialize_all_variables()
session.run(init)

for i in range(20000):
  batch = mnist.train.next_batch(50)
  if i % 100 == 0:
    train_accuracy = accuracy.eval(feed_dict={
      x:batch[0], y_: batch[1]
    })
    print("step %d, training accuracy %g" % (i, train_accuracy))
  train_step.run(feed_dict={x: batch[0], y_: batch[1]})

print("test accuracy %g" % accuracy.eval(feed_dict={
  x: mnist.test.images, y_: mnist.test.labels
}))

Now, we just have to export the graph and our learned parameters so we can load both of them in our iOS app. To export our learned parameters, we’ll first create a Saver object after all of our variables are declared:

saver = tf.train.Saver()

I put this line of code right before:

session.run(init)

After training is complete, we can save our learned parameters with the tf.train.Saver.save method:

saver.save(session, "model.ckpt")

Since our “checkpoint” of learned parameters does not know anything about the structure of our deep network, we’ll also have to save our graph with a call to tf.train.write_graph:

tf.train.write_graph(session.graph_def, '', 'graph.pb')

The final training script should look something like:

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets('MNIST_data', one_hot=True)
session = tf.InteractiveSession()

x = tf.placeholder(tf.float32, shape=[None, 784], name="x")
y_ = tf.placeholder(tf.float32, shape=[None, 10])

def weight_variable(shape):
  initial = tf.truncated_normal(shape, stddev=0.1)
  return tf.Variable(initial)

def bias_variable(shape):
  initial = tf.constant(0.1, shape=shape)
  return tf.Variable(initial)

def conv2d(x, W):
  return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
  return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                        strides=[1, 2, 2, 1], padding='SAME')

x_image = tf.reshape(x, [-1,28,28,1])

W_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])

h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)

W_conv2 = weight_variable([5, 5, 32, 64])
b_conv2 = bias_variable([64])

h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)

W_fc1 = weight_variable([7 * 7 * 64, 1024])
b_fc1 = bias_variable([1024])

h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)

W_fc2 = weight_variable([1024, 10])
b_fc2 = bias_variable([10])

y_conv = tf.nn.softmax(tf.matmul(h_fc1, W_fc2) + b_fc2, name="softmax")

cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y_conv), reduction_indices=[1]))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
init = tf.initialize_all_variables()
saver = tf.train.Saver()
session.run(init)

for i in range(20000):
  batch = mnist.train.next_batch(50)
  if i % 100 == 0:
    train_accuracy = accuracy.eval(feed_dict={
      x:batch[0], y_: batch[1]
    })
    print("step %d, training accuracy %g" % (i, train_accuracy))
  train_step.run(feed_dict={x: batch[0], y_: batch[1]})

saver.save(session, "model.ckpt")
tf.train.write_graph(session.graph_def, '', 'graph.pb')

print("test accuracy %g" % accuracy.eval(feed_dict={
  x: mnist.test.images, y_: mnist.test.labels
}))

If you run the training script, you should see a test accuracy of around 98-99%, and your working directory should contain the graph.pb graph file and model.ckpt.* checkpoint files. If you’ve just installed (or upgraded) TensorFlow in the first section, this will also ensure your installation works correctly.

Optimizing the Graph

Before continuing, we’ll optimize the graph and checkpoint we exported for inference.

TensorFlow ships with a freeze_graph script that can merge our exported graph and checkpoint files, turning any variables we used during training into constants. This script is not installed by default, so we’ll have to build it from the sources we obtained in the first section.

bazel build tensorflow/python/tools:freeze_graph

We can now invoke it as follows:

bazel-bin/tensorflow/python/tools/freeze_graph --input_graph=/path/to/graph.pb --input_checkpoint=/path/to/model.ckpt --output_node_names=softmax --output_graph=/path/to/frozen.pb
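
As an optional sanity check of my own (assuming frozen.pb is in the working directory), you can confirm that freezing really did replace every variable with a constant by inspecting the op types in the frozen graph:

import tensorflow as tf

# Assumes frozen.pb from the freeze_graph step is in the working directory.
graph_def = tf.GraphDef()
with open('frozen.pb', 'rb') as f:
  graph_def.ParseFromString(f.read())

# After freezing, the graph should contain Const nodes but no Variable ops.
print(sorted({node.op for node in graph_def.node}))
assert not any(node.op in ('Variable', 'VariableV2') for node in graph_def.node)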

TensorFlow also includes an optimize_for_inference script that can remove operations from the graph that are only needed for training (unfortunately, it doesn’t currently remove Dropout operations, but I’ve filed an issue tracking this).

Again, we’ll have to build the script from sources:

bazel build tensorflow/python/tools:optimize_for_inference

And we can invoke it on our frozen model as follows:

bazel-bin/tensorflow/python/tools/optimize_for_inference --input=/path/to/frozen.pb --output=/path/to/final.pb --output_names=softmax --frozen_graph=True --input_names=x
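
Before moving over to Xcode, it’s worth checking that the optimized graph still loads and that the x and softmax names behave as expected. Here’s a small standalone script I’d use for that; it assumes final.pb and the MNIST_data directory are both in the working directory:

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets('MNIST_data', one_hot=True)

# Assumes final.pb from optimize_for_inference is in the working directory.
graph_def = tf.GraphDef()
with open('final.pb', 'rb') as f:
  graph_def.ParseFromString(f.read())

with tf.Graph().as_default() as graph:
  tf.import_graph_def(graph_def, name='')

with tf.Session(graph=graph) as session:
  x = graph.get_tensor_by_name('x:0')
  softmax = graph.get_tensor_by_name('softmax:0')
  probabilities = session.run(softmax, {x: mnist.test.images[:1]})
  print(probabilities.argmax())  # predicted digit for the first test image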

Next, we’ll build the TensorFlow static library for iOS.

Building TensorFlow for iOS

From our directory of TensorFlow sources, we can simply run:

tensorflow/contrib/makefile/build_all_ios.sh

Running this script will place a TensorFlow static library in the tensorflow/contrib/makefile/gen/lib subdirectory. This library is compiled for ARMv7, ARM64, and x86 architectures, so you can use it both in the iOS Simulator and on iOS hardware.

Setting up the Xcode Project

We’ll create a new Xcode project that uses our optimized graph and static library to perform inference directly on an iOS device. We’ll call the project MNIST and use Objective-C as the target language (we can’t use C++ APIs from Swift without writing C or Objective-C++ wrappers). Since we’ll have to tell our iOS target about TensorFlow’s sources, it’s easiest to save the project in a directory that is a sibling of the tensorflow directory.

Now, we have to tell the iOS target about the TensorFlow static library we built for iOS (and the protocol buffer library it relies on). To do this, add ../tensorflow/tensorflow/contrib/makefile/gen/lib and ../tensorflow/tensorflow/contrib/makefile/gen/protobuf_ios/lib to Library Search Paths. We’ll also add the following flags to Other Linker Flags:

-ltensorflow-core
-lprotobuf-lite
-lprotobuf
-force_load
../tensorflow/tensorflow/contrib/makefile/gen/lib/libtensorflow-core.a

The -force_load flag ensures that the global C++ objects which register TensorFlow’s classes inside the library are not stripped out by the linker. The TensorFlow iOS Examples page includes more information about this and is also a good reference if you run into issues. Before we forget, we also have to link the target against the Accelerate framework.

Finally, we’ll set up our Header Search Paths to point to the following directories:

../tensorflow
../tensorflow/tensorflow/contrib/makefile/gen/proto
../tensorflow/tensorflow/contrib/makefile/downloads
../tensorflow/tensorflow/contrib/makefile/downloads/protobuf/src
../tensorflow/tensorflow/contrib/makefile/downloads/eigen

Now that the Xcode project is set up, we’ll load up our model and perform inference on the test images from the MNIST dataset.

Loading the Model

First, drag the final.pb graph we obtained in the third section into the Xcode project and ensure it gets copied as a bundle resource. The 10k MNIST test set can be obtained here. Be sure to download both the images and their labels. For simplicity, I’ll rename these files to images and labels, respectively. Again, we’ll drag both of them into the Xcode project and ensure they get copied as bundle resources.
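
These files use the IDX format: a big-endian header (8 bytes for labels, 16 bytes for images) followed by the raw bytes, which is where the offsets in the parsing code later on come from. If you’d like to double-check the downloaded (and decompressed) files before adding them to the project, a few lines of Python will do; this assumes they sit in the current directory under their new names:

import struct

# Assumes the decompressed test files were renamed to 'images' and 'labels'.
with open('images', 'rb') as f:
  magic, count, rows, cols = struct.unpack('>IIII', f.read(16))
  assert (magic, count, rows, cols) == (2051, 10000, 28, 28)

with open('labels', 'rb') as f:
  magic, count = struct.unpack('>II', f.read(8))
  assert (magic, count) == (2049, 10000)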

Next, we’ll rename ViewController.m to ViewController.mm (Objective-C++) so we can use TensorFlow’s C++ library from it. At the top of the file, we’ll include the TensorFlow sources we need and import its namespace:

#include <tensorflow/core/public/session.h>
#include <tensorflow/core/platform/env.h>

using namespace tensorflow;

We’ll also add a new -test: method that will get invoked when a button is pressed in the UI. Here, we’ll set up a new TensorFlow session and load our optimized model:

Session* session;
Status status = NewSession(SessionOptions(), &session);
if (!status.ok()) {
    std::cout << status.ToString() << "\n";
    return;
}

NSString *modelPath = [[NSBundle mainBundle] pathForResource:@"final" ofType:@"pb"];

GraphDef graph;
status = ReadBinaryProto(Env::Default(), modelPath.fileSystemRepresentation, &graph);
if (!status.ok()) {
    std::cout << status.ToString() << "\n";
    return;
}

status = session->Create(graph);
if (!status.ok()) {
    std::cout << status.ToString() << "\n";
    return;
}

// Write inference code here.

session->Close();

You can try running the project now. Even though it won’t do anything useful yet, you can make sure the model is being loaded correctly by checking that no TensorFlow-related error messages appear in the debug console.

Performing Inference

Lastly, we run inference on our test data to see how well our model performs. As we already know from the second section, we should get around 98-99% accuracy.

We’ll first define a few useful constants for values such as the number of output classes and input image dimensions:

static constexpr int kUsedExamples = 5000;
static constexpr int kImageSide = 28;
static constexpr int kOutputs = 10;
static constexpr int kInputLength = kImageSide * kImageSide;

We’ll also use only 5,000 of the 10,000 test examples. TensorFlow appears to use more memory than necessary when first loading the data, and this may cause the app to be terminated if we try to analyze too many images at once.

We can now load our test data into a Tensor x:

Tensor x(DT_FLOAT, TensorShape({ kUsedExamples, kInputLength }));

NSString *imagesPath = [[NSBundle mainBundle] pathForResource:@"images" ofType:nil];
NSString *labelsPath = [[NSBundle mainBundle] pathForResource:@"labels" ofType:nil];
NSData *imageData = [NSData dataWithContentsOfFile:imagesPath];
NSData *labelsData = [NSData dataWithContentsOfFile:labelsPath];

uint8_t *expectedLabels = new uint8_t[kUsedExamples];

for (auto exampleIndex = 0; exampleIndex < kUsedExamples; exampleIndex++) {
    // Actual labels start at offset 8.
    [labelsData getBytes:&expectedLabels[exampleIndex] range:NSMakeRange(8 + exampleIndex, 1)];

    for (auto i = 0; i < kInputLength; i++) {
        uint8_t pixel;
        // Actual image data starts at offset 16.
        [imageData getBytes:&pixel range:NSMakeRange(16 + exampleIndex * kInputLength + i, 1)];
        x.matrix<float>().operator()(exampleIndex, i) = pixel / 255.0f;
    }
}

std::vector<std::pair<string, Tensor>> inputs = {
    { "x", x }
};

This is fairly routine data processing code. We first load the images and labels data from our app bundle. Then we build a vector of expected labels and a matrix of input images, where each row corresponds to an image and each column holds one of its 28x28 grayscale pixel values. Note that we normalize the pixel bytes, which range from 0 to 255, to floats in the range 0 to 1, since this matches how the training data was normalized. This matrix will get passed to the placeholder x we defined in our training script.

Now, we can actually run the computation graph by asking for the softmax (class probability distribution) of our test data:

const auto start = CACurrentMediaTime();

std::vector<Tensor> outputs;
status = session->Run(inputs, {"softmax"}, {}, &outputs);
if (!status.ok()) {
    std::cout << status.ToString() << "\n";
    return;
}

NSLog(@"Time: %g seconds", CACurrentMediaTime() - start);

Note that we also time this operation to get a sense of how long inference takes. On a 2016 iPad Pro, it takes around 5.4 seconds with two CPU cores utilized, which works out to roughly 1 ms per image. This number may be of interest to anyone using TensorFlow to classify images in sliding-window searches (for example, to perform OCR with TensorFlow, you may end up running classification on thousands of small image patches taken from a scanned document).

Finally, let’s see how well our model performs:

const auto outputMatrix = outputs[0].matrix<float>();
int correctExamples = 0;

for (auto exampleIndex = 0; exampleIndex < kUsedExamples; exampleIndex++) {
    int bestIndex = -1;
    float bestProbability = 0;
    for (auto i = 0; i < kOutputs; i++) {
        const auto probability = outputMatrix(exampleIndex, i);
        if (probability > bestProbability) {
            bestProbability = probability;
            bestIndex = i;
        }
    }

    if (bestIndex == expectedLabels[exampleIndex]) {
        correctExamples++;
    }
}

NSLog(@"Accuracy: %f", static_cast<float>(correctExamples) / kUsedExamples);

delete[] expectedLabels;  // free the labels buffer allocated earlier

Here, we simply obtain the output matrix and check how many of the predicted labels (effectively the argmax of the probability distribution) match the vector of expected labels. In my case, the accuracy is around 98.5%, which matches the result we obtained when running the training script.

Conclusion

And that’s what it takes to run TensorFlow on iOS! While there may have been quite a bit of setup work involved, most of it is something you’ll only have to do once per project. As you obtain better training data or come up with better networks, you can simply export a new graph, optimize it, and drop it into your Xcode project for evaluation!

The test project can be found on GitHub. If you have any questions, feel free to reach out to me.