Formatting issues: there should be Jupyter-style cells within this post. If you don't see them, you can refer directly to the notebook itself through the GitHub link.
In many computer vision problems we want to classify images into classes: for example, is the animal a cat, a dog, a hamster, etc.? In the world of CNNs, a classifier looks at an image and outputs a series of probabilities for the various classes that it might be (you are then free to pick the highest probability and label the image as this class). This is different from an object detector, which has to locate the object as well as classify it.
Writing a basic CNN in Keras is very simple, and there are plenty of good tutorials and Jupyter notebooks that will show you how to do it. For simple tasks, a shallow CNN with only a few layers is enough. But for many tasks where classification is applied to feature-rich objects (animals or people's faces, for example) it is preferable to try out more complex (that is to say, deeper) networks. Deep networks run into many issues if implemented naively by just adding layers ad infinitum, and a lot of research has yielded techniques and tricks to solve these problems and allow deep networks to perform well on complex classification challenges.
So, why not make use of all this work? Good news: you can, because Keras now ships with many of the state-of-the-art classifier architectures and you can just use them off the shelf. This tutorial aims to give you the code to do that easily.
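As a minimal illustration of that last step (the class names and probabilities below are made up for the example, not produced by any model), picking the label amounts to an argmax over the output vector:

```python
# Hypothetical class names and classifier output, for illustration only.
class_names = ["cat", "dog", "hamster"]
probabilities = [0.15, 0.70, 0.15]

# Index of the largest probability, then the matching label.
best_index = max(range(len(probabilities)), key=lambda i: probabilities[i])
predicted_label = class_names[best_index]
print(predicted_label)  # dog
```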
The code is available on my GitHub, and the examples in this post follow the Jupyter notebook found there.
1. Load Some Data
The first step is to get some data to work with. Since this is a tutorial about the classifiers and not the data, I will make use of another handy feature in Keras: the built-in datasets.
You can see that the shape of the images is printed, and they are each of shape (w,h,d) = (32,32,3). We can examine a few of the images using the plotting function found at the end of this tutorial.
The first 100 images from the CIFAR-10 training data
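A minimal sketch of loading a built-in dataset, assuming CIFAR-10 (the dataset named in the figure caption further down) and the `tensorflow.keras` import path; the helper name `load_cifar10` is my own and not from the notebook. The load is wrapped in a function because the first call downloads the data:

```python
from tensorflow import keras


def load_cifar10():
    """Download (on first call) and return the CIFAR-10 train/test arrays."""
    (x_train, y_train), (x_test, y_test) = keras.datasets.cifar10.load_data()
    # Print the array shapes so the image dimensions can be checked.
    print("train images:", x_train.shape, "train labels:", y_train.shape)
    print("test images:", x_test.shape, "test labels:", y_test.shape)
    return (x_train, y_train), (x_test, y_test)


# Usage (uncomment to trigger the download):
# (x_train, y_train), (x_test, y_test) = load_cifar10()
```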
2. Choose Your Model and Construct it
Most of the models work the same way: once you've chosen one, you import a class which, when instantiated, creates a Model object. All the arguments are optional, but it is worth understanding a few of them now. We will focus on the ResNet model (for the others, see the Keras documentation).
Let's start by just running the default model (this will trigger some downloads if you have never instantiated the class before)
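A minimal sketch of this step, assuming ResNet50 as the ResNet variant and the `tensorflow.keras` import path. The `WEIGHTS` switch is my own addition: the Keras default is `weights="imagenet"`, which is what triggers the download, while `None` builds the same architecture with random weights.

```python
from tensorflow import keras

# Assumption for this sketch: None skips the download. Set this to "imagenet"
# (the Keras default) to get the pretrained weights instead.
WEIGHTS = None

model = keras.applications.ResNet50(weights=WEIGHTS)
model.summary()  # one row per layer, followed by the parameter counts
```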
The summary provides a lot of very useful information (I have shortened it in the snippet above as ResNet has a lot of layers). You can see from the summary that the model is expecting images with shape (224,224,3), but to be sure let's check:
Well, that makes sense: it is the same shape as the data used in the original paper. However, it does not match the data we have now.
We shouldn't be forced to use the original shape though, since the convolutional weights don't depend at all on the shape of the input image. Only the fully connected last layer(s) do.
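One way to check is the model's `input_shape` attribute. This sketch again assumes ResNet50 and uses `weights=None` only to avoid the download:

```python
from tensorflow import keras

model = keras.applications.ResNet50(weights=None)  # None: skip the download
print(model.input_shape)  # (None, 224, 224, 3); the leading None is the batch size
```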
We can replace the original input layer with one having a shape of our choosing using the arguments include_top=False and input_shape=(32, 32, 3).
Or explicitly,
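A sketch of the full constructor call, assuming ResNet50 and the CIFAR-10 image shape. The `weights=None` setting is my assumption to keep the example download-free:

```python
from tensorflow import keras

model = keras.applications.ResNet50(
    include_top=False,        # drop the fully connected classification head
    input_shape=(32, 32, 3),  # CIFAR-10 image shape (w, h, d)
    weights=None,             # assumption: random initialisation, no download
)
print(model.input_shape)   # (None, 32, 32, 3)
print(model.output_shape)  # 4-D: (batch, rows, cols, 2048 feature maps)
```

The spatial size of the output (rows, cols) depends on the input size, so it is worth printing rather than assuming.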
This has done two things:
Remove the fully connected final layers, freeing the model weights of any dependency on input shape. You can check this yourself by looking at the .summary() method.
Replace the input layer to expect images of the shape we want.
This leaves us with an incomplete model though, because the fully connected layer (which actually performs the classification) has been removed, leaving the model with an output tensor of shape (2, 2, 2048). We want an output of shape (10,) representing the probabilities for each of the 10 classes.
Normally the last convolutional layer is flattened with the Flatten() layer. We could do that here, but most people opt to use global pooling (I'll be honest, I'd have to look up exactly why), which takes each feature map (the 2048 is the number of feature maps) and finds the global maximum (GlobalMaxPooling2D) or average (GlobalAveragePooling2D) across the feature map.
This can be done by hand, or just by passing the argument pooling='max' (or pooling='avg') into the model constructor.
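The Keras application constructors accept a `pooling` argument (`"max"` or `"avg"`) for this. A minimal sketch, assuming ResNet50 with the CIFAR-10 input shape and `weights=None` to avoid the download:

```python
from tensorflow import keras

model = keras.applications.ResNet50(
    include_top=False,
    input_shape=(32, 32, 3),
    pooling="max",  # or "avg" for global average pooling
    weights=None,   # assumption: random initialisation, no download
)
print(model.output_shape)  # (None, 2048)
```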
Now the output of the model will be a vector of length 2048 regardless of the original image size. This wouldn't be the case with a flatten operation, which would have given us a vector of length 2x2x2048 = 8192 here. If the input image is very large, the pooling prevents an extremely large number of weights being required in any fully connected layers.
Right... we still need to reduce the 2048-long vector to 10. This is most simply done by putting back the fully connected layers (Dense layers in Keras speak). It can be done with a single extra layer and a slight rearrangement of the code:
The Dense layer takes our number of classes (10), and the activation should be appropriate for creating mutually exclusive probabilities (softmax is the standard choice).
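The arithmetic can be sketched in plain Python (no Keras needed). The helper names are my own, and the tiny 2x2 feature maps are made-up values used only to show what global max pooling does per feature map:

```python
def flatten_length(rows, cols, channels):
    """Length of the vector produced by flattening a (rows, cols, channels) tensor."""
    return rows * cols * channels


def global_max_pool(feature_maps):
    """feature_maps[c] is a 2-D list of values; return one maximum per feature map."""
    return [max(max(row) for row in fmap) for fmap in feature_maps]


print(flatten_length(2, 2, 2048))  # 8192, the figure quoted in the text
print(flatten_length(7, 7, 2048))  # 100352, for a larger 7x7 spatial output

# Two made-up 2x2 feature maps: pooling returns one number per map,
# so the output length equals the number of maps whatever their size.
tiny = [[[1, 5], [3, 2]], [[0, -1], [4, 4]]]
print(global_max_pool(tiny))  # [5, 4]
```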
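A sketch of how the classification head can be put back with the functional API, assuming ResNet50 as the base, global max pooling, and `weights=None` to avoid the download. `NUM_CLASSES` is a name I have introduced for the example:

```python
from tensorflow import keras

NUM_CLASSES = 10  # CIFAR-10 has 10 classes

# Headless ResNet50 whose output is a 2048-long vector per image.
base = keras.applications.ResNet50(
    include_top=False,
    input_shape=(32, 32, 3),
    pooling="max",
    weights=None,  # assumption: random initialisation, no download
)

# A single Dense layer maps the 2048 features to class probabilities.
outputs = keras.layers.Dense(NUM_CLASSES, activation="softmax")(base.output)
model = keras.Model(inputs=base.input, outputs=outputs)
print(model.output_shape)  # (None, 10)
```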
A second, equivalent way to build the same model is
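The original cell is not visible here, so the following is only one plausible reading of the "second way": stacking the headless base and the Dense layer in a `Sequential` model. It assumes ResNet50, global max pooling, and `weights=None`:

```python
from tensorflow import keras

base = keras.applications.ResNet50(
    include_top=False,
    input_shape=(32, 32, 3),
    pooling="max",
    weights=None,  # assumption: random initialisation, no download
)

# The whole base network is treated as a single layer in the stack.
model = keras.Sequential([
    keras.Input(shape=(32, 32, 3)),
    base,
    keras.layers.Dense(10, activation="softmax"),
])
print(model.output_shape)  # (None, 10)
```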
2.2 Transfer Learning
The idea of transfer learning is simple: you save the weights that you got from training on some dataset A, and re-use them when training on dataset B, which may be targeting a completely different problem.
The logic is that a network learns to pick out features ranging from very simple, common ones (lines and colours) to more abstract ones. The simple features are likely to be quite generic and equally useful for different types of problems.
It's a detailed topic, but the main advantages of transfer learning to us right now are:
Training is likely to be much faster as the network doesn't have to re-learn simple features.
Training on a limited dataset gets a "leg-up" where otherwise it would struggle to learn complex features from limited data.
Fortunately, it is such a powerful and common practice that it's already implemented! You have to accept that ImageNet was the dataset chosen to train the weights on. I have used ImageNet weights on x-ray data with an improvement in training time, so there must be generic features in there somewhere.
Waffle waffle waffle... To use the pretrained ImageNet weights just set weights='imagenet'.
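The setting in question is the constructor's `weights` argument. A minimal sketch, assuming ResNet50 and the headless configuration used earlier; the wrapper function `build_resnet50` is my own, added so that the download only happens when you ask for it:

```python
from tensorflow import keras


def build_resnet50(weights="imagenet"):
    """Headless ResNet50; weights='imagenet' loads pretrained weights, None gives random ones."""
    return keras.applications.ResNet50(
        include_top=False,
        input_shape=(32, 32, 3),
        pooling="max",
        weights=weights,
    )


# model = build_resnet50()              # pretrained; downloads on first use
# model = build_resnet50(weights=None)  # random initialisation, no download
```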
Bear in mind that the first time you do this for each architecture an automatic download will start and will be around 100MB.
3. Train the Model
Section 2 covered creating the model, and from here the steps are the same as if you had built the model layer by layer yourself. I will include the steps anyway so that this tutorial has a full end-to-end example.
First we compile the model, which is where we select the optimizer and the loss (be careful to pair the loss and the activation of the output layer correctly: a softmax output goes with a categorical cross-entropy loss, for example). The metrics which will be printed and stored in the model history are also selected, and at a minimum should include 'accuracy'.
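A sketch of the compile step. The optimizer (`adam`) and loss (`sparse_categorical_crossentropy`, which suits integer labels like those shipped with CIFAR-10) are my assumptions rather than the notebook's choices. A tiny stand-in model replaces ResNet50 so that the example runs quickly:

```python
from tensorflow import keras

# Tiny stand-in for the ResNet50-based model (assumption, for speed only).
model = keras.Sequential([
    keras.Input(shape=(32, 32, 3)),
    keras.layers.GlobalMaxPooling2D(),
    keras.layers.Dense(10, activation="softmax"),
])

model.compile(
    optimizer="adam",                        # assumed optimizer
    loss="sparse_categorical_crossentropy",  # pairs with softmax + integer labels
    metrics=["accuracy"],
)
```

With one-hot encoded labels the matching loss would be `categorical_crossentropy` instead.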
Now we can fit the model
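A sketch of the fit call. Random arrays with CIFAR-10's shapes stand in for the real data, and a tiny model stands in for ResNet50, so this shows the mechanics only. The batch size is an assumption:

```python
import numpy as np
from tensorflow import keras

# Random stand-in data with CIFAR-10 shapes (assumption, for illustration).
rng = np.random.default_rng(0)
x_train = rng.random((64, 32, 32, 3), dtype=np.float32)
y_train = rng.integers(0, 10, size=(64, 1))
x_val = rng.random((16, 32, 32, 3), dtype=np.float32)
y_val = rng.integers(0, 10, size=(16, 1))

# Tiny stand-in for the ResNet50-based model.
model = keras.Sequential([
    keras.Input(shape=(32, 32, 3)),
    keras.layers.GlobalMaxPooling2D(),
    keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

history = model.fit(
    x_train, y_train,
    validation_data=(x_val, y_val),
    epochs=10,
    batch_size=32,
    verbose=2,  # one line of output per epoch
)
print(sorted(history.history.keys()))  # metric names recorded per epoch
```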
This will run the full training (10 epochs) and then return a History object, whose .history attribute is a dictionary holding the metrics for the training and validation data. These can then be plotted as your training curves. I have cut the training very short above as I was on a CPU-only machine and didn't want to wait.
Unfortunately, if you stop the training before it completes, the history dictionary won't be stored and you won't be able to plot the training curves. Let's now solve that problem...
4. Extra : Saving the Model via Checkpoints
There is a problem you will soon run into when choosing the number of epochs: if the training per epoch is slow and you choose a large number, you have to wait a long time before you can save your model.
Additionally, the model performance may fluctuate up and down, and the best weights may not necessarily be the ones from the last epoch.
So, you use a callback (a function which is passed into another function to be called at some particular time or on some particular condition). There is a quick way to create a callback that saves your model during training: the ModelCheckpoint class.
This will cause the fitting loop to check the 'val_acc' metric (named 'val_accuracy' in newer Keras versions) at the end of each epoch, and save the model in HDF5 format if the metric is better than the previous best. There will be a few saves early in training, when the accuracy improves most rapidly, but they will become less frequent after 10 or so epochs.
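A sketch of creating the callback. The filename is hypothetical, and I have assumed the native `.keras` format and the newer metric name `val_accuracy`. To follow the post's HDF5 setup, use a `.h5` filename where your Keras version accepts it, and `val_acc` on older versions:

```python
from tensorflow import keras

checkpoint = keras.callbacks.ModelCheckpoint(
    filepath="best_model.keras",  # hypothetical filename
    monitor="val_accuracy",       # 'val_acc' on older Keras versions
    save_best_only=True,          # only overwrite when the metric improves
    mode="max",                   # higher accuracy is better
    verbose=1,
)
callbacks_list = [checkpoint]
```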
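The save-if-better rule can be sketched in plain Python. This is not Keras's implementation, only the idea, and the validation accuracies are made-up numbers:

```python
def epochs_that_save(val_metric_per_epoch):
    """Return the (1-based) epochs at which a 'save best only' rule would save."""
    best = float("-inf")
    saved = []
    for epoch, value in enumerate(val_metric_per_epoch, start=1):
        if value > best:  # strictly better than the previous best
            best = value
            saved.append(epoch)
    return saved


# Made-up validation accuracies: fast gains early, slower later.
val_acc = [0.30, 0.45, 0.52, 0.50, 0.58, 0.57, 0.57, 0.60]
print(epochs_that_save(val_acc))  # [1, 2, 3, 5, 8]
```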
The callbacks_list is passed to the callbacks argument of the fit function as above.
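A sketch of passing the list to `fit`, using random stand-in data and a tiny stand-in model so it runs quickly. The temporary directory, filename, and `val_accuracy` metric name are assumptions:

```python
import os
import tempfile

import numpy as np
from tensorflow import keras

# Random stand-in data with CIFAR-10 shapes (assumption, for illustration).
rng = np.random.default_rng(0)
x_train = rng.random((64, 32, 32, 3), dtype=np.float32)
y_train = rng.integers(0, 10, size=(64, 1))
x_val = rng.random((16, 32, 32, 3), dtype=np.float32)
y_val = rng.integers(0, 10, size=(16, 1))

# Tiny stand-in for the ResNet50-based model.
model = keras.Sequential([
    keras.Input(shape=(32, 32, 3)),
    keras.layers.GlobalMaxPooling2D(),
    keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Hypothetical checkpoint location in a temporary directory.
checkpoint_path = os.path.join(tempfile.mkdtemp(), "best_model.keras")
callbacks_list = [
    keras.callbacks.ModelCheckpoint(
        checkpoint_path, monitor="val_accuracy",
        save_best_only=True, mode="max",
    )
]

history = model.fit(
    x_train, y_train,
    validation_data=(x_val, y_val),
    epochs=3,
    callbacks=callbacks_list,  # the checkpoint runs at the end of each epoch
    verbose=0,
)
print(os.path.exists(checkpoint_path))  # the first epoch always counts as a new best
```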
When you need to use the model again (for further training or for inference) you can simply call
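In Keras the usual call for this is `keras.models.load_model`. A minimal round-trip sketch, with a tiny stand-in model and a hypothetical file path in a temporary directory:

```python
import os
import tempfile

import numpy as np
from tensorflow import keras

# Tiny stand-in for the trained model (assumption, for illustration).
model = keras.Sequential([
    keras.Input(shape=(32, 32, 3)),
    keras.layers.GlobalMaxPooling2D(),
    keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Save to a hypothetical path, then load the whole model back from that file.
path = os.path.join(tempfile.mkdtemp(), "best_model.keras")
model.save(path)
restored = keras.models.load_model(path)

# The restored model should give the same predictions as the original.
x = np.zeros((1, 32, 32, 3), dtype="float32")
same = np.allclose(model.predict(x, verbose=0),
                   restored.predict(x, verbose=0), atol=1e-6)
print(same)
```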
The entire trained model can be passed around in a single file, making collaboration much easier!
Conclusions & References
That's about all for this tutorial. It's fairly long, but the underlying code is just a few lines to run a state-of-the-art classifier network on your data! Incredible stuff, and credit to the guys who work so hard in the background to create this powerful open source software.
Although, if you are going to run some of the larger networks, Google Colab (free use of a GPU) or a local GPU machine is pretty much essential.
The Keras documentation is very good as a reference and has useful code snippets:
Making the grid plot
Below is the code to make the grid plot of training data
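The original cell is not visible here, so this is only a sketch of what such a helper might look like, assuming matplotlib. The function name `plot_image_grid` and its parameters are hypothetical:

```python
import matplotlib.pyplot as plt


def plot_image_grid(images, rows=10, cols=10, figsize=(8, 8)):
    """Show the first rows*cols images in a grid and return the figure."""
    # squeeze=False keeps `axes` a 2-D array even for a 1x1 grid.
    fig, axes = plt.subplots(rows, cols, figsize=figsize, squeeze=False)
    flat_axes = axes.ravel()
    for ax, image in zip(flat_axes, images):
        ax.imshow(image)
    for ax in flat_axes:
        ax.axis("off")  # hide ticks and frames on every cell
    fig.tight_layout()
    return fig


# Usage, assuming x_train holds the CIFAR-10 training images:
# plot_image_grid(x_train[:100])
# plt.show()
```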