Convolutional neural networks are powerful tools in computer vision, and they tend to do very well at image recognition and classification tasks. Understanding why they work as well as they do can be daunting, however, especially for networks that do more than my toy MNIST example. Luckily, tools and techniques for visualizing ConvNet filters already exist and should be easy to apply here. Almost all of the concepts and much of the code below are adapted from a blog post by Francois Chollet, the creator of the Keras library.
In my last post I created a convolutional neural network (ConvNet) using Keras and trained it on MNIST data for a Kaggle competition. This time I will create images for all of the filters in each of the four convolutional layers of that model, and then have the model generate the “perfect” version of each of the 10 digits it has been trained on. My hope is that this will help illuminate how the model turns messy pixels of handwritten digits into clean digital representations.
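Both the filter images and the “perfect” digits come from the same basic technique: gradient ascent on the input. You start from a faint grey noise image and repeatedly nudge its pixels in the direction that increases a chosen activation. Chollet’s post does this on a trained model using Keras backend gradients. The sketch below is only a NumPy stand-in, not the code used for this post. It assumes a single hypothetical 3×3 filter followed by a ReLU and computes the gradient by hand. It then rescales that gradient by its RMS, similar to the normalization in Chollet’s post. The names (`visualize_filter`, `activation_grad`, and so on) are illustrative. For the “perfect” digits, the objective would instead be the model’s score for one output class.

```python
import numpy as np


def conv2d_valid(img, kernel):
    """'Valid' cross-correlation: roughly what one Conv2D filter computes on a
    single-channel input (bias omitted)."""
    kh, kw = kernel.shape
    oh, ow = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out


def filter_activation(img, kernel):
    """Objective to maximize: mean ReLU response of one filter over the image."""
    return np.maximum(conv2d_valid(img, kernel), 0.0).mean()


def activation_grad(img, kernel):
    """Hand-written gradient of filter_activation with respect to the input pixels."""
    kh, kw = kernel.shape
    pre = conv2d_valid(img, kernel)
    grad = np.zeros_like(img)
    for i in range(pre.shape[0]):
        for j in range(pre.shape[1]):
            if pre[i, j] > 0:  # ReLU only passes gradient where the response is positive
                grad[i:i + kh, j:j + kw] += kernel
    return grad / pre.size  # divide by the number of positions averaged in the objective


def visualize_filter(kernel, size=28, steps=20, step_size=0.1, seed=0):
    """Gradient ascent on the input image, starting from faint grey noise.

    Returns the final image and the objective value before and after each step."""
    rng = np.random.default_rng(seed)
    img = 0.5 + 0.1 * (rng.random((size, size)) - 0.5)
    history = [filter_activation(img, kernel)]
    for _ in range(steps):
        g = activation_grad(img, kernel)
        g /= np.sqrt(np.mean(g ** 2)) + 1e-5  # RMS-normalize so step sizes stay stable
        img = img + step_size * g
        history.append(filter_activation(img, kernel))
    return img, history
```

The mean of ReLU responses is convex in the pixels, so each step along the normalized gradient can only raise the objective. As a result, `history` should be non-decreasing. A real run on a trained Keras model would use automatic differentiation instead of a hand-written gradient. It might also clip or regularize the image between steps. The sketch omits both.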
If you squint really hard, the above images do sort of look like the digits they’re meant to represent. They are very grey, however, which isn’t at all like the original white-on-black MNIST digits. We can “de-average” the digits to restore them to a darker, less grey state:
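The exact “de-averaging” step isn’t spelled out here, so what follows is one plausible reading with hypothetical helper names. As far as I can tell, the grey cast comes from the post-processing in Chollet’s post. That step standardizes the generated image and re-centers it on mid-grey (0.5) with a small spread (std 0.1), which `deprocess_grey` mimics. The `de_average` function instead subtracts the image’s own mean and clips everything below it to black. It then stretches the rest to [0, 1], which gives white strokes on a black background.

```python
import numpy as np


def deprocess_grey(img):
    """Chollet-style normalization: center on 0.5 with std 0.1, leaving images mostly grey."""
    x = img.astype(np.float64)
    x = (x - x.mean()) / (x.std() + 1e-5)
    return np.clip(x * 0.1 + 0.5, 0.0, 1.0)


def de_average(img, eps=1e-8):
    """Hypothetical 'de-averaging': below-average pixels become black (0),
    and the remaining pixels are rescaled so the brightest one is ~1."""
    x = img.astype(np.float64)
    shifted = np.clip(x - x.mean(), 0.0, None)
    return shifted / (shifted.max() + eps)
```

Applying `de_average` after `deprocess_grey` keeps only the brighter-than-average strokes. That is closer to how the original MNIST digits look.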
We can now check that we computed things correctly by feeding these “perfect” digits back to the neural network for classification. If it misclassifies any of them, we should suspect that something has gone wrong.
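Here is a minimal sketch of that check. It assumes the ten generated images are stacked in order (digit 0 first) into one array of shape `(10, 28, 28, 1)`. It also assumes the model returns ten class scores per image. `predicted_digits` and `check_perfect_digits` are hypothetical helpers. With a Keras model, `model.predict` would be passed as `predict_fn`.

```python
import numpy as np


def predicted_digits(predict_fn, images):
    """Return the most likely class for each image.

    predict_fn is assumed to map a batch of shape (n, 28, 28, 1) to scores of
    shape (n, 10); for a Keras model this would be model.predict."""
    scores = np.asarray(predict_fn(images))
    return scores.argmax(axis=1)


def check_perfect_digits(predict_fn, images):
    """Return (ok, preds), where ok is True only if image i is classified as digit i."""
    preds = predicted_digits(predict_fn, images)
    return bool(np.array_equal(preds, np.arange(10))), preds
```

Taking the argmax keeps the check independent of whether the final layer outputs probabilities or raw scores.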
[0 1 2 3 4 5 6 7 8 9]
Everything checks out. It looks like the above images are good representations of what my MNIST convolutional neural network considers “perfect” for each digit. Ultimately it didn’t do too badly, either; I can sort of recognize the digits myself. That probably has a lot to do with my prior knowledge of what each one is supposed to represent, though, and it might not go so well if I asked someone to identify each image without providing any context.
Still, it is reassuring that the above images contain features a human can recognize. In the blog post by Francois Chollet that inspired me to apply these techniques to my own model, the VGG16 (OxfordNet) model’s idea of the perfect sea snake or perfect magpie looked nothing like what a human would consider either of those to be. My model has at least made it past the psychedelic-pattern stage: some proper abstraction appears to be occurring in the final layer. This suggests it may not be completely overfit to the training data, and gives me a shred of hope that it will do well on the final private leaderboard scoring of the Kaggle MNIST digit recognition competition.