Deep neural networks are all the rage in artificial intelligence. Their success is recent, but the main idea behind neural networks has been around for much longer. Let’s take a look at when neural networks were first introduced and how they became so popular.
The origins of deep learning go back as early as the 1940s. Most people are unaware of this because the early approaches lacked the resources to be impactful; they had many shortcomings and were deemed unsuccessful.
The evolution of deep learning can be divided into three waves: cybernetics, connectionism and deep learning.
One of the earliest predecessors of modern deep learning is known as “cybernetics”. Cybernetics was based on how brains learn: its simple computational models were designed to help build systems that would learn like actual animal brains. Research in this vein continues to this day under the name “computational neuroscience”. In our timeline, the era of cybernetics ran from the 1940s to the 1960s.
At the core of cybernetics was the attempt to mimic an actual biological neuron. Many basic concepts introduced under cybernetics, such as the perceptron and stochastic gradient descent, are still used as building blocks of neural networks today. These models, however, came with very bold promises about output and performance that they could not fulfil at the time, and their popularity declined as a result. Because the models were inspired by neuroscientific research, this dip in popularity also prompted the exploration of models without a neuroscientific basis.
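To make the perceptron idea concrete, here is a minimal sketch of the classic perceptron learning rule in plain Python. The function names, the learning rate and the toy AND dataset are assumptions chosen for illustration; they do not come from any particular historical implementation.

```python
def predict(weights, bias, x):
    """Fire (return 1) when the weighted sum of the inputs exceeds zero."""
    activation = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1 if activation > 0 else 0


def train_perceptron(samples, epochs=10, lr=1.0):
    """Perceptron rule: nudge the weights only when a prediction is wrong."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for x, target in samples:
            error = target - predict(weights, bias, x)  # -1, 0 or +1
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias


# Toy dataset (an assumption for this sketch): the logical AND function.
AND_DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights, bias = train_perceptron(AND_DATA)
predictions = [predict(weights, bias, x) for x, _ in AND_DATA]
print(predictions)  # prints [0, 0, 0, 1]
```

A single unit like this can only separate classes with a straight line, which is one way to see why such early models could not live up to the bolder promises made for them.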
Connectionism, or parallel distributed processing, became popular in the 1980s. This approach was inspired by cognitive science. The concept of artificial neural networks was established during this wave. The basic idea was to develop a network of simple individual components that together can be programmed to achieve “artificial intelligence”. This was also the first time the concept of multiple layers, better known as “hidden layers”, was introduced.
The units of such a network are connected with each other, allowing parallel signal processing that is distributed along multiple branches of the network. Each connection between two units (neurons) carries a weight that indicates the strength of that connection. This approach was modelled on what happens inside our nervous system.
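The idea of weighted connections feeding a hidden layer can be sketched in a few lines. The example below is only an illustration: the layer sizes, the weight values and the choice of a sigmoid activation are all assumptions, not details taken from the connectionist literature.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def layer_forward(inputs, weights, biases):
    """Each unit sums its weighted inputs, adds a bias and applies sigmoid."""
    return [
        sigmoid(b + sum(w * x for w, x in zip(row, inputs)))
        for row, b in zip(weights, biases)
    ]


# Hypothetical weights: one row per receiving unit, one column per input.
HIDDEN_WEIGHTS = [[0.5, -0.4], [0.3, 0.8], [-0.6, 0.1]]  # 2 inputs -> 3 hidden
HIDDEN_BIASES = [0.0, 0.1, -0.1]
OUTPUT_WEIGHTS = [[0.7, -0.2, 0.4]]  # 3 hidden -> 1 output
OUTPUT_BIASES = [0.05]


def forward(inputs):
    """Pass a signal through the hidden layer, then the output layer."""
    hidden = layer_forward(inputs, HIDDEN_WEIGHTS, HIDDEN_BIASES)
    output = layer_forward(hidden, OUTPUT_WEIGHTS, OUTPUT_BIASES)
    return hidden, output


hidden, output = forward([1.0, 0.5])
```

Every hidden unit processes the same input in parallel, and a larger weight means that the connection has a stronger influence on the receiving unit.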
During this phase of our timeline, models such as the LSTM and algorithms such as back-propagation for training deep neural networks were developed. Both remain essential components of deep learning to this day.
This wave ended in the 1990s, when AI-based startups made impractical claims about what these models could do and, because of computational limitations, ultimately could not deliver on them. Investors pulled back, leading to a dip in the second wave of deep learning.
Technically, this wave didn’t die, because research in the field continued; applications, however, remained scarce until the early 2000s.
After the first two waves came the final breakthrough in 2006, in the form of greedy layer-wise training of deep belief networks (DBNs). Simply put, a DBN is a composition of multiple hidden layers, each containing various latent variables. Connections exist only between layers, not between the variables inside a layer. A DBN with a single hidden layer is a restricted Boltzmann machine.
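The "connections only between layers" structure can be illustrated with a small sketch. It assumes a single visible/hidden layer pair in the style of a restricted Boltzmann machine with binary units; the weights, biases and function names are hypothetical values chosen for illustration. Because no hidden unit is connected to another hidden unit, each hidden unit's activation probability depends only on the visible layer.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# WEIGHTS[i][j] links visible unit i to hidden unit j (hypothetical values).
# There is no visible-visible or hidden-hidden weight matrix at all.
WEIGHTS = [[0.2, -0.5], [0.7, 0.1], [-0.3, 0.4]]
HIDDEN_BIASES = [0.0, -0.2]


def hidden_probabilities(visible):
    """Probability that each hidden unit is on, given the visible units."""
    probs = []
    for j, bias in enumerate(HIDDEN_BIASES):
        total = bias + sum(v * WEIGHTS[i][j] for i, v in enumerate(visible))
        probs.append(sigmoid(total))
    return probs


probs = hidden_probabilities([1, 0, 1])
```

In greedy layer-wise training, one such layer pair would be trained first, and its hidden activations would then serve as the input for training the next pair; the training step itself is not shown in this sketch.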
These advances made it much easier for researchers to train different types of deep neural networks, and the term “deep learning” became very popular.
Other factors behind this popularisation were the increase in computational power and the widespread availability of large datasets. The latter turned out to be highly important in making deep learning models powerful: algorithms developed during the connectionism period started to give better results when trained on larger and larger datasets.
One of the major differences between the previous phases and this one is that more people now use online services, so we have far more data and much better resources to work with it, which increases the accuracy of these models. Another effect of these factors is that we are able to discover and implement many more practical uses of deep learning.