Deep learning is a subfield of machine learning (ML) concerned with algorithms inspired by the structure and function of the brain, called artificial neural networks.
If you are just starting out in deep learning, have some experience with neural networks, or want to become a full stack deep learning practitioner, you may be confused about what deep learning actually is. Leaders and experts in the field each have their own view of it, and these specific, nuanced perspectives shed a lot of light on the concept.
Deep learning is big neural networks
Andrew Ng, co-founder of Coursera and chief scientist at Baidu Research, also founded Google Brain, which eventually led to deep learning technologies being built into a large number of Google services. He has spoken and written extensively about what deep learning is, which makes him a good place to start.
In his early talks on deep learning, Andrew described deep learning in the context of traditional artificial neural networks. In his 2013 talk titled “Deep Learning, Self-Taught Learning and Unsupervised Feature Learning,” he described the idea of deep learning as follows:
“Using brain simulations, hope to: make learning algorithms much better and easier to use; make revolutionary advances in machine learning and AI. I believe this is our best shot at progress towards real AI.”
Later, his comments became more nuanced. According to Andrew, the core of deep learning is that we now have computers fast enough and data plentiful enough to actually train large neural networks. Discussing why deep learning is taking off now, in a talk titled “What Data Scientists Should Know About Deep Learning” at ExtractConf 2015, he commented:
“…the very large neural networks that we can now have and … the huge amounts of data that we have access to…”
He also made an important point: it is all about scale. As we build larger neural networks and train them with more and more data, their performance keeps improving. This generally differs from other machine learning techniques, whose performance reaches a plateau:
“For most flavors of the old generations of learning algorithms … performance will plateau. … Deep learning … is the first class of algorithms … that is scalable. … Performance just keeps getting better as you feed them more data.”
Finally, he makes it clear that the benefits of deep learning we are seeing in practice come from supervised learning. In the same 2015 ExtractConf talk, he commented:
“Almost all the value today of deep learning is through supervised learning, or learning from labeled data.”
Andrew often mentions that we should see more benefits from unsupervised learning as the field of deep learning matures, because in practice we usually have to deal with an abundance of unlabeled data.
Jeff Dean is a Google Senior Fellow in the Systems and Infrastructure Group. He was involved in, and perhaps partially responsible for, the scaling and adoption of deep learning within Google. Jeff worked on the Google Brain project and on the development of the large-scale deep learning software DistBelief and, later, TensorFlow.
In his 2016 talk titled “Deep Learning for Building Intelligent Computer Systems,” he made a comment in the same vein, that deep learning is really about large neural networks:
“When you hear the term deep learning, just think of a large deep neural net. Deep refers to the number of layers typically, and so this is kind of the popular term that's been adopted in the press. I think of them as deep neural networks generally.”
He has given this talk several times, and in a modified set of slides for the same talk he emphasizes the scalability of neural networks, pointing out that results get better with more data and larger models, which in turn require more computation to train.
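To make “deep usually refers to the number of layers” concrete, here is a minimal pure-Python sketch of a stacked feedforward network. The helper names (`dense_layer`, `build_network`, `depth`, `forward`), the tanh activation, and the layer sizes are illustrative assumptions, not anything from the talk. Depth here counts layers of weights, which is one common convention; some authors also count the input layer.

```python
import math
import random


def dense_layer(n_in, n_out, rng):
    """Create one fully connected layer as (weights, biases) with small random weights."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def build_network(layer_sizes, seed=0):
    """Stack dense layers; layer_sizes[0] is the input width (an assumed convention)."""
    rng = random.Random(seed)
    return [dense_layer(a, b, rng) for a, b in zip(layer_sizes, layer_sizes[1:])]


def depth(network):
    """'Deep' usually refers to this number: how many weight layers are stacked."""
    return len(network)


def forward(network, x):
    """Each layer computes new features from the previous layer's output."""
    for weights, biases in network:
        x = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(weights, biases)]
    return x


shallow = build_network([4, 8, 1])
deep = build_network([4, 16, 16, 16, 16, 1])
print(depth(shallow), depth(deep))  # 2 5
print(len(forward(deep, [0.5, -0.2, 0.1, 0.9])))  # 1
```

Making the stack wider or deeper is what "larger model" means here, and every extra layer adds another round of multiply-and-add work in `forward`, which is why larger models need more computation.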
Deep Learning – Hierarchical Feature Learning
In addition to scalability, another often-cited benefit of deep learning models is their ability to perform automatic feature extraction from raw data, also called feature learning.
Yoshua Bengio is another leader in deep learning, although he began with a strong interest in the automatic feature learning that large neural networks are capable of.
He describes deep learning in terms of an algorithm's ability to discover and learn good representations using feature learning. In his 2012 paper titled “Deep Learning of Representations for Unsupervised and Transfer Learning,” he commented:
“Deep learning algorithms seek to exploit the unknown structure in the input distribution in order to discover good representations, often at multiple levels, with higher-level learned features defined in terms of lower-level features.”
Scalable deep learning across domains
Deep learning excels in problem domains where the inputs (and even the outputs) are analog. That is, the data is not a few quantities in a tabular format but images of pixel data, documents of text data, or files of audio data.
Yann LeCun is the director of Facebook Research and the father of the network architecture that excels at object recognition in image data, the Convolutional Neural Network (CNN). The technique has been very successful because, like multilayer perceptron feedforward neural networks, it scales with data and model size and can be trained with backpropagation.
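"Can be trained with backpropagation" means computing gradients layer by layer, from the output back toward the input, and adjusting the weights by gradient descent. The sketch below shows this on a tiny multilayer perceptron with one hidden layer, trained on XOR as a hypothetical toy dataset. It uses dense layers and sigmoid activations for brevity; a CNN would use convolutional layers trained on the same principle. All names and hyperparameters (`train_mlp`, `n_hidden`, `lr`, `epochs`, `seed`) are assumptions chosen for illustration.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_mlp(data, n_hidden=4, lr=0.5, epochs=2000, seed=1):
    """Train a one-hidden-layer perceptron with backpropagation and
    per-example gradient descent on squared error (the factor of 2 in the
    derivative is folded into the learning rate)."""
    rng = random.Random(seed)
    n_in = len(data[0][0])
    w1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    b2 = 0.0

    def predict(x):
        # Forward pass: input -> hidden features -> single output.
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(w1, b1)]
        y = sigmoid(sum(w * hj for w, hj in zip(w2, h)) + b2)
        return h, y

    def mse():
        return sum((predict(x)[1] - t) ** 2 for x, t in data) / len(data)

    before = mse()
    for _ in range(epochs):
        for x, t in data:
            h, y = predict(x)
            # Backward pass: error at the output, then propagated to the
            # hidden layer (computed before any weight is changed).
            delta_out = (y - t) * y * (1 - y)
            delta_hidden = [delta_out * w2[j] * h[j] * (1 - h[j])
                            for j in range(n_hidden)]
            # Gradient-descent updates for both layers.
            for j in range(n_hidden):
                w2[j] -= lr * delta_out * h[j]
                b1[j] -= lr * delta_hidden[j]
                for i in range(n_in):
                    w1[j][i] -= lr * delta_hidden[j] * x[i]
            b2 -= lr * delta_out
    return predict, before, mse()


xor_data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]
predict, loss_before, loss_after = train_mlp(xor_data)
print(f"MSE before: {loss_before:.4f}, after: {loss_after:.4f}")
```

The same forward/backward loop works unchanged if you add more hidden units or more data, which is the scaling property the paragraph above refers to; exact loss values depend on the seed and learning rate.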
This biases his definition of deep learning toward the development of very large CNNs, which have had great success at recognizing objects in photos.
In his 2016 talk at Lawrence Livermore National Laboratory titled “Accelerating Understanding: Deep Learning, Intelligent Applications, and GPUs,” he described deep learning generally as learning hierarchical representations and defined it as a scalable approach to building object recognition systems:
“Deep learning [is] … a pipeline of modules all of which are trainable. … [It is] deep because [it has] multiple stages in the process of recognizing an object and all of those stages are part of the training.”
Hierarchical representation of learning types.
Jürgen Schmidhuber is the father of another popular algorithm that, like the multilayer perceptron (MLP) and the CNN, scales with model size and dataset size and can be trained with backpropagation, but is instead tailored to learning sequence data: the Long Short-Term Memory network (LSTM), a type of recurrent neural network.
There is some confusion in the naming of the field as “deep learning.” In his 2014 paper titled “Deep Learning in Neural Networks: An Overview,” Schmidhuber comments on the problematic naming of the field and on the distinction between deep and shallow learning. Interestingly, he also describes depth in terms of the complexity of the problem rather than the model used to solve it:
“At which problem depth does Shallow Learning end, and Deep Learning begin? Discussions with DL experts have not yet yielded a conclusive response to this question. […], let me just define for the purposes of this overview: problems of depth > 10 require Very Deep Learning.”
Demis Hassabis is the founder of DeepMind, which was later acquired by Google. DeepMind made a breakthrough by combining deep learning techniques with reinforcement learning to handle complex learning problems such as game playing, famously demonstrated on Atari games and by AlphaGo playing the game of Go.
In keeping with the naming, they called their new technique a Deep Q-Network, combining deep learning with Q-learning. They also named the broader field of study “deep reinforcement learning.”
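Q-learning estimates the value Q(s, a) of taking action a in state s and refines it with the update Q(s, a) ← Q(s, a) + α[r + γ max_a' Q(s', a') − Q(s, a)]. Below is a minimal tabular sketch of that update; a Deep Q-Network replaces the table with a deep neural network that predicts Q-values from raw input. The function name, the dictionary-based table, and the state and action labels are hypothetical, and the sketch omits the experience replay and target network that DeepMind's DQN also relies on.

```python
from collections import defaultdict


def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.99, done=False):
    """One tabular Q-learning step:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    # A terminal transition has no future value to bootstrap from.
    best_next = 0.0 if done else max(q[(next_state, a)] for a in actions)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


q = defaultdict(float)  # unseen (state, action) pairs start at 0.0
actions = ["left", "right"]
v = q_learning_update(q, "s0", "right", 1.0, "s1", actions, alpha=0.5, gamma=0.9)
print(v)  # 0.5
```

In a DQN, the table lookup `q[(state, action)]` becomes a forward pass through a deep network, and the same target `reward + gamma * best_next` is used as the regression label for training that network.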
In their 2015 Nature paper titled “Human-level control through deep reinforcement learning,” they comment on the important role of deep neural networks in their breakthrough and highlight the need for hierarchical abstraction:
“To this end, we developed a novel agent, a deep Q-network (DQN), which is able to combine reinforcement learning with a class of artificial neural network known as deep neural networks. Notably, recent advances in deep neural networks, in which several layers of nodes are used to build up progressively more abstract representations of the data, have made it possible for artificial neural networks to learn concepts such as object categories directly from raw sensory data.”
Finally, in what may be considered a defining paper for the field, Yann LeCun, Yoshua Bengio, and Geoffrey Hinton published a paper in Nature titled simply “Deep Learning.” In it, they open with a clean definition of deep learning that highlights the multilayered approach:
“Deep learning allows computational models that are composed of multiple processing layers to learn representations of data with multiple levels of abstraction.”