10 Days Of Grad: Deep Learning From The First Principles
Day by day, here we demystify neural networks.
Neural networks is a topic that recurrently appears throughout my life. Once, when I was a BSc student, I got obsessed with the idea to build an "intelligent" machine1. I spent a couple of sleepless nights thinking. I read a few essays shedding some light on this philosophical subject, among which the most prominent, perhaps, stand Marvin Minsky's writings2. As a result, I came across neural networks idea. It was 2010, and deep learning was not nearly as popular as it is now3.
In the previous article, we have introduced the concept of learning in a single-layer neural network. Today, we will learn about the benefits of multi-layer neural networks, how to properly design and train them.
Sometimes I discuss neural networks with students who have just started discovering machine learning techniques:
"I have built a handwritten digits recognition network. But my accuracy is only Y."
"It seems to be much less than state-of-the-art", I contemplate.
Now that we have seen how neural networks work, we realize that understanding of the gradients flow is essential for survival. Therefore, we will revise our strategy on the lowest level. However, as neural networks become more complicated, calculation of gradients by hand becomes a murky business. Yet, fear not young padawan, there is a way out! I am very excited that today we will finally get acquainted with automatic differentiation, an essential tool in your deep learning arsenal.
Which purpose do neural networks serve for? Neural networks are learnable models. Their ultimate goal is to approach or even surpass human cognitive abilities. As Richard Sutton puts it, 'The biggest lesson that can be read from 70 years of AI research is that general methods that leverage computation are ultimately the most effective'. In his essay, Sutton argues that only models without encoded human-knowledge can outperform human-centeric approaches. Indeed, neural networks are general enough and they leverage computation.
Today we have finally arrived at the "master algorithm" in computer vision. That is how François Chollet calls convolutional neural networks (CNNs). First, we are going to build an intuition behind those algorithms. Then, we are taking a look at basic CNN architecture. After discussing the differences between convolutional layer types, we are going to implement them in Haskell.
Day 1: Learning Neural Networks The Hard Way Day 2: What Do Hidden Layers Do?