10 Days Of Grad: Deep Learning From The First Principles
Day by day, here we demystify neural networks.
Neural networks is a topic that recurrently appears throughout my life. Once, when I was a BSc student, I got obsessed with the idea to build an "intelligent" machine1. I spent a couple of sleepless nights thinking. I read a few essays shedding some light on this philosophical subject, among which the most prominent, perhaps, stand Marvin Minsky's writings2. As a result, I came across the neural networks idea. It was 2010, and deep learning was not nearly as popular as it is now3.
In the previous article, we have introduced the concept of learning in a single-layer neural network. Today, we will learn about the benefits of multi-layer neural networks, how to properly design and train them.
Sometimes I discuss neural networks with students who have just started discovering machine learning techniques:
"I have built a handwritten digits recognition network. But my accuracy is only Y."
"It seems to be much less than state-of-the-art", I contemplate.
Now that we have seen how neural networks work, we realize that understanding of the gradients flow is essential for survival. Therefore, we will revise our strategy on the lowest level. However, as neural networks become more complicated, calculation of gradients by hand becomes a murky business. Yet, fear not young padawan, there is a way out! I am very excited that today we will finally get acquainted with automatic differentiation, an essential tool in your deep learning arsenal.
Which purpose do neural networks serve for? Neural networks are learnable models. Their ultimate goal is to approach or even surpass human cognitive abilities. As Richard Sutton puts it, 'The biggest lesson that can be read from 70 years of AI research is that general methods that leverage computation are ultimately the most effective'. In his essay, Sutton argues that only models without encoded human-knowledge can outperform human-centeric approaches. Indeed, neural networks are general enough and they leverage computation.
Today we will talk about one of the most important deep learning architectures, the "master algorithm" in computer vision. That is how François Chollet, author of Keras, calls convolutional neural networks (CNNs). Convolutional network is an architecture that, like other artificial neural networks, has a neuron as its core building block. It is also differentiable, so the network is conveniently trained via backpropagation. The distinctive feature of CNNs, however, is the connection topology, resulting in sparsely connected convolutional layers with neurons sharing their weights.
Last week Apple has acquired XNOR.ai startup for amazing $200 million. The startup is known for promoting binarized neural network algorithms to save the energy and computational resources. That is definitely a way to go for mobile devices, and Apple just acknowledged that it is a great deal for them too. I feel now is a good time to explain what binarized neural networks are so that you can better appreciate their value for the industry.
So far we have explored neural networks almost in the vacuum. Although we have provided some illustrations for better clarity, relying an existing framework would allow us to benefit from the knowledge of previous contributors. One such framework is called Hasktorch. Among the practical reasons to use Hasktorch is relying on a mature Torch Tensor library. Another good reason is strong GPU acceleration, which is necessary for almost any serious deep learning project.
Wouldn't it be nice if the model also told us which predictions are not reliable? Can this be done even on unseen data? The good news is yes, and even on new, completely unseen data. It is also simple to implement in practice. A canonical example is in a medical setting. By measuring model uncertainty, the doctor can learn how reliable is their AI-assisted patient's diagnosis. This allows the doctor to make a better informed decision whether to trust the model or not.
Imagine you are a designer and you want a new font: A little bit heavier, with rounder letters, more casual or a little bit more fancy. Could this font be created just by tuning a handful of parameters? Or imagine that you are a fashion designer and you would like to create a new collection as a mix of previous seasons? Or that you are a musician desperately looking for inspiration.
Ever wondered how machines defeated the best human Go player Lee Sedol in 2016? A historical moment for the game that was previously considered to be very tough. What is reinforcement learning that generates so much buzz recently? Superhuman performance playing arcade Atari games, real-time Tokamak plasma control, Google data center cooling, and autonomous chemical synthesis are all the recent achievements behind the approach. Let's dive to learn what empowers deep reinforcement learning.