Deep learning explained

Deep Learning Hardware

In recent years Deep Learning has become the most successful approach to pattern recognition for perceptual tasks such as speech and image recognition. When you speak to Siri, Cortana, or Google Voice, your speech is being interpreted by a Deep Neural Network. And in the ImageNet Large Scale Visual Recognition Challenge, Deep Neural Networks now match or beat the estimated human error rate on the image classification task.

Deep Learning is fast becoming mainstream: from film recommendations to Facebook tags, from autonomous vehicles to defeating the European Go champion, Deep Learning is finding applications everywhere. Deep Learning is not new, however; in one form or another it has been around since the 1970s.

Deep learning in a nutshell

The basic idea is to train a very deep neural network, i.e. one with lots of layers. Multiple studies have shown that neural networks, if appropriately configured, can approximate any continuous function to arbitrary accuracy (the universal approximation theorem; think universal Turing machines). However, this doesn't mean that we know how to configure them.
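To make "if appropriately configured" concrete, here is a minimal NumPy sketch (an illustration, not a proof of the theorem) that configures a one-hidden-layer ReLU network by hand, with no training at all, so that it traces sin(x) on [0, π] as a piecewise-linear curve through a set of knot points. The function name build_piecewise_linear_net and the choice of 50 knots are hypothetical.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def build_piecewise_linear_net(f, knots):
    """Hand-configure (no training) a one-hidden-layer ReLU network that
    linearly interpolates f between consecutive knot points."""
    knots = np.asarray(knots, dtype=float)
    values = f(knots)
    slopes = np.diff(values) / np.diff(knots)        # slope on each interval
    # Hidden unit i computes relu(x - knots[i]) (input weight 1),
    # i.e. it "switches on" at knot i.
    hidden_bias = -knots[:-1]
    # Its output weight is the change in slope at that knot.
    out_weight = np.concatenate(([slopes[0]], np.diff(slopes)))
    out_bias = values[0]

    def net(x):
        x = np.asarray(x, dtype=float)
        hidden = relu(x[..., None] + hidden_bias)    # shape (..., n_hidden)
        return hidden @ out_weight + out_bias
    return net

knots = np.linspace(0.0, np.pi, 50)
net = build_piecewise_linear_net(np.sin, knots)

xs = np.linspace(0.0, np.pi, 1000)
max_err = float(np.max(np.abs(net(xs) - np.sin(xs))))
print(f"{len(knots) - 1} hidden units, max error {max_err:.1e}")
```

Adding more hidden units (knots) makes the error as small as we like. The catch mirrors the text: here we could write the weights down because we already knew the target function; for a real task such as recognising speech we don't, and the weights have to be found by training instead.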

This is where Deep Learning comes in: by having lots and lots of layers, a Deep Neural Network solves a problem in many little steps rather than in one or two big ones.
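A minimal sketch of the "many little steps" idea, assuming NumPy: a deep network is a chain of small layers, each an affine map followed by a ReLU, applied one after another. The layer widths and random weights are arbitrary, hypothetical values; the network is untrained, so its outputs mean nothing, and only the structure is the point.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_layer(n_in, n_out):
    """One 'little step': an affine map followed by a ReLU nonlinearity.
    Weights are random (untrained), scaled by 1/sqrt(n_in)."""
    W = rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_in, n_out))
    b = np.zeros(n_out)
    # For simplicity every layer, including the output layer, uses ReLU.
    return lambda x: np.maximum(x @ W + b, 0.0)

# Hypothetical widths: 8 inputs, four hidden layers of 16 units, 4 outputs.
sizes = [8, 16, 16, 16, 16, 4]
layers = [make_layer(n_in, n_out) for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def forward(x):
    # Each layer transforms the output of the one before it.
    for layer in layers:
        x = layer(x)
    return x

batch = rng.normal(size=(3, 8))   # three example inputs with 8 features each
out = forward(batch)
print(len(layers), out.shape)     # 5 (3, 4)
```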

While this may not seem such a revolutionary idea, it means that with a sufficiently large set of training data it is possible to train (that is, configure) these neural networks to solve tasks that had previously eluded us.

Deep learning revolution

While all this is true, training Deep Neural Networks is incredibly computationally expensive: not only are the networks themselves very large, but huge data sets are required to train them well. Until recently we simply didn’t have the computational power, or access to the data, required for Deep Learning to show what it can do. This changed when NVIDIA graphics cards (GPUs) were adopted for general-purpose parallel computing, and Deep Neural Networks are now almost exclusively trained on GPUs. Deploying the resulting trained networks, by contrast, can be a relatively light load.
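A back-of-the-envelope sketch of why training is so expensive, counting multiply-accumulate operations (MACs) for a single convolution layer. The layer dimensions, the hypothetical one-million-image data set, and the rule of thumb that a training step costs roughly three forward passes are all assumptions chosen for illustration.

```python
def conv2d_macs(h_out, w_out, c_in, c_out, k):
    """Multiply-accumulates for one k x k convolution layer on one image
    (bias ignored): each of the h_out*w_out*c_out outputs needs k*k*c_in."""
    return h_out * w_out * c_out * (k * k * c_in)

# Hypothetical layer: 3x3 kernels, 64 -> 64 channels, 224x224 output.
per_image = conv2d_macs(224, 224, 64, 64, 3)

# Assumed rule of thumb: forward + backward pass is about 3x the forward cost.
TRAIN_FACTOR = 3
N_IMAGES = 1_000_000                 # hypothetical training-set size
per_epoch = per_image * TRAIN_FACTOR * N_IMAGES

print(f"{per_image:,} MACs per image (forward, one layer)")   # 1,849,688,064
print(f"{per_epoch:.2e} MACs per training epoch (one layer)") # 5.55e+15
```

The 224 × 224 × 64 output values of the layer can all be computed independently of one another, which is exactly the kind of work a GPU's thousands of cores do well in parallel. Running a trained network needs only the forward pass, and only for the inputs actually being processed, which is why deployment can be comparatively light.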

Deep learning essentially means large neural networks with many complex layers. What has changed since the neural networks we knew in the 1980s and 1990s is that (a) computers such as the NVIDIA DGX-1 have become fast enough, (b) data sets, particularly imaging and video data, have become big enough, and (c) through many improved techniques we can now initialize neural network training better.
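The text does not say which initialization techniques it has in mind; one well-known example is He (Kaiming) initialization, which scales random Gaussian weights by the layer's fan-in so that signals neither vanish nor explode as they pass through many ReLU layers. The NumPy sketch below compares it with a naive fixed-scale initialization on a hypothetical 20-layer, 256-unit network; the depth, width, and naive standard deviation of 0.01 are illustrative choices.

```python
import numpy as np

def he_normal(fan_in, fan_out, rng):
    """He initialization for ReLU layers: zero-mean Gaussian weights
    with variance 2 / fan_in."""
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))

def naive_normal(fan_in, fan_out, rng, std=0.01):
    """Small fixed-scale Gaussian weights that ignore the layer width."""
    return rng.normal(0.0, std, size=(fan_in, fan_out))

def activation_rms(init, depth=20, width=256, seed=0):
    """Push random inputs through `depth` ReLU layers initialized with
    `init` and return the root-mean-square activation at the last layer."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(512, width))
    for _ in range(depth):
        x = np.maximum(x @ init(width, width, rng), 0.0)
    return float(np.sqrt(np.mean(x ** 2)))

he_rms = activation_rms(he_normal)
naive_rms = activation_rms(naive_normal)
print(f"He: {he_rms:.2f}   naive: {naive_rms:.1e}")
```

With the naive scheme the activations shrink by a roughly constant factor at every layer and are vanishingly small after 20 layers, which makes it very hard for training to get started; with He initialization their scale stays of order one.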

Deep Neural Networks

Most deep learning applications today still rely on supervised learning, in which the network is trained on examples labelled with the correct answer (although great strides are currently being made in unsupervised learning).

However, unlike other machine learning algorithms, whose performance plateaus as the scale and volume of data grow, neural networks keep improving as we build larger networks and train them on greater quantities of data.

How does Deep Learning work?

The most common form of Deep Learning uses what is called a convolutional neural network: a special kind of neural network in which each artificial neurone is connected to only a small window over the input or previous layer. For example, in a visual task, each neurone in the first convolution layer will only see a small part of the image, maybe only a few pixels. This convolution layer consists of multiple feature maps, each searching for a different feature, with each neurone in a map searching for that feature in a slightly different location.
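A minimal NumPy sketch of a convolution layer's core operation, plus the pooling (sub-sampling) step described in the next paragraph. The two 2 × 2 kernels, a vertical-edge and a horizontal-edge detector, are hand-written for illustration; in a real network these weights are learned. Like most deep learning libraries, the code actually computes a cross-correlation (the kernel is not flipped), which is conventionally still called convolution.

```python
import numpy as np

def conv2d(image, kernel):
    """Single-channel 2-D convolution, stride 1, no padding ('valid').
    Each output value sees only a kernel-sized window of the input, and the
    same kernel weights are reused at every location (one feature map).
    Technically cross-correlation: the kernel is not flipped."""
    kh, kw = kernel.shape
    out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool2d(fmap, size=2):
    """Non-overlapping max pooling. Rows/columns that do not fill a whole
    window are dropped."""
    h, w = fmap.shape[0] // size, fmap.shape[1] // size
    trimmed = fmap[:h * size, :w * size]
    return trimmed.reshape(h, size, w, size).max(axis=(1, 3))

# A 6x6 toy image: dark left half, bright right half (one vertical edge).
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Hand-written 2x2 kernels; a trained network would learn these weights.
vertical_edge = np.array([[-1.0, 1.0],
                          [-1.0, 1.0]])
horizontal_edge = vertical_edge.T

feature_maps = [conv2d(image, k) for k in (vertical_edge, horizontal_edge)]
pooled = max_pool2d(feature_maps[0])

print(feature_maps[0])   # 5x5 map: 2.0 down column 2, zeros elsewhere
print(feature_maps[1])   # 5x5 map of zeros: no horizontal edges
print(pooled)            # values [[0, 2], [0, 2]]
```

Each kernel produces one feature map: the vertical-edge map responds only along the column where the image changes from dark to bright, while the horizontal-edge map stays at zero. Pooling then shrinks the map while preserving the fact that a vertical edge is present in the right-hand half.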

After some training, this first layer will come to identify useful low-level features in the image, such as lines, edges, and gradients in different orientations. The convolution layer is then sub-sampled by what is called a pooling layer, before the whole process starts again with another convolution layer, this time finding combinations of the features of the previous layer (lines, corners, curves, etc.).

As with most neural networks, the parameters (weights) of the system start out random, and the network will perform poorly. During training, however, the network is shown the correct classification of each image, and over many, many examples its weights are gradually adjusted, typically by gradient descent using backpropagation, until it gives the correct classification.
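The idea of weights that start out random and are gradually adjusted towards the correct classification can be sketched with the simplest possible "network": a single-layer logistic-regression classifier trained by gradient descent on hypothetical two-cluster data, assuming NumPy. This is a stand-in, not a convolutional network; real deep networks compute their gradients by backpropagation through every layer and usually update on small random mini-batches rather than on the whole data set at once.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical labelled data: two 2-D clusters, class 0 and class 1.
n = 200
X = np.vstack([rng.normal(loc=(-2.0, -2.0), scale=1.0, size=(n, 2)),
               rng.normal(loc=(2.0, 2.0), scale=1.0, size=(n, 2))])
y = np.concatenate([np.zeros(n), np.ones(n)])

def predict_proba(X, w, b):
    """Probability of class 1: sigmoid of a weighted sum of the inputs."""
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))

def cross_entropy(X, y, w, b):
    p = predict_proba(X, w, b)
    eps = 1e-12                                   # avoid log(0)
    return float(-np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps)))

# The weights start out random.
w = rng.normal(size=2)
b = 0.0
loss_before = cross_entropy(X, y, w, b)

learning_rate = 0.1
for _ in range(500):
    p = predict_proba(X, w, b)
    # Gradient of the mean cross-entropy with respect to w and b.
    grad_w = X.T @ (p - y) / len(y)
    grad_b = float(np.mean(p - y))
    # Nudge the weights a small step in the direction that lowers the loss.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

loss_after = cross_entropy(X, y, w, b)
accuracy = float(np.mean((predict_proba(X, w, b) > 0.5) == y))
print(f"loss {loss_before:.3f} -> {loss_after:.3f}, accuracy {accuracy:.1%}")
```

With only two weights, a random start is occasionally lucky, so the sketch reports the loss, which falls with every update, rather than relying on the starting accuracy being poor.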