Hardly a day passes without new research or innovation, which makes it difficult to keep up with the latest advancements in deep learning.
Deep learning has had an extraordinary impact on several high-profile fields, including natural language processing (NLP), speech recognition, face recognition, machine translation, and many more. Let us look at these advancements through the architectures that have been successful, particularly in the computer vision domain.
- Among the five deep learning architectures covered here, CNN and LSTM are the oldest yet most widely used.
- The RNN is useful in speech recognition and handwriting recognition.
- LSTM/GRU networks are helpful in natural language text compression, handwriting recognition, speech recognition, gesture recognition, and image captioning.
- CNN is for image recognition, video analysis, and natural language processing.
- DBN is for image recognition, information retrieval, natural language understanding, and failure prediction.
- DSN is mostly for information retrieval and continuous speech recognition.
Let us now study them in brief.
Recurrent Neural Network (RNN)
The Recurrent Neural Network (RNN) is one of the foundational network architectures from which other deep learning architectures are built. The primary difference between a typical recurrent network and a multilayer network is that, rather than having entirely feed-forward connections, a recurrent network may have connections that feed back into the same layer (or into a prior layer). This feedback allows RNNs to maintain a memory of past inputs and model problems that unfold over time.
RNNs encompass a rich set of architectures. The critical differentiator is the feedback within the network, which could come from the output layer, a hidden layer, or some combination of the two. RNNs can be unfolded in time and trained with standard back-propagation, or with a variant of back-propagation called back-propagation through time (BPTT).
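To make the feedback concrete, here is a minimal NumPy sketch of a vanilla RNN forward pass. It is an illustration only, not code from any particular library; the weight names (W_xh, W_hh, W_hy) and the toy sizes are assumptions for this example.

```python
import numpy as np

def rnn_forward(inputs, W_xh, W_hh, W_hy, b_h, b_y):
    """Run a sequence through a single-layer vanilla RNN, returning outputs and hidden states."""
    h = np.zeros(W_hh.shape[0])          # initial hidden state
    outputs, hiddens = [], []
    for x_t in inputs:                   # one step per element of the sequence
        # Feedback: the previous hidden state h is fed back into the same layer,
        # which is what lets the network keep a memory of past inputs.
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
        outputs.append(W_hy @ h + b_y)
        hiddens.append(h)
    return outputs, hiddens

# Toy usage: a sequence of 4 input vectors of size 3, hidden size 5, output size 2
rng = np.random.default_rng(0)
inputs = [rng.standard_normal(3) for _ in range(4)]
W_xh, W_hh, W_hy = rng.standard_normal((5, 3)), rng.standard_normal((5, 5)), rng.standard_normal((2, 5))
outputs, hiddens = rnn_forward(inputs, W_xh, W_hh, W_hy, np.zeros(5), np.zeros(2))
```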
LSTM or GRU Networks
Hochreiter and Schmidhuber created the LSTM in 1997, but its popularity as an RNN architecture for various applications has grown in recent years. You can find LSTMs in products you use every day, such as smartphones. IBM applied LSTMs in IBM Watson for milestone-setting conversational speech recognition.
Departing from typical neuron-based neural network architectures, the LSTM introduced the concept of a memory cell. The memory cell can retain its value for a short or long time as a function of its inputs, which allows the cell to remember what is important and not just its last computed value.
The LSTM memory cell contains three gates that control how information flows into or out of the cell. The input gate controls when new information can flow into the memory. The forget gate controls when an existing piece of information is forgotten, allowing the cell to remember new data. Finally, the output gate controls when the information contained in the cell is used in the cell's output.
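The gating logic can be written compactly. The sketch below is one common formulation of a single LSTM step; the dictionary-based parameter layout and names are assumptions made for readability, not the definitive implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step; W, U, b are dicts holding parameters for the three gates
    (i = input, f = forget, o = output) and the candidate cell update g."""
    i = sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])  # input gate: lets new information in
    f = sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])  # forget gate: decides what old memory to drop
    o = sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])  # output gate: decides what the cell exposes
    g = np.tanh(W["g"] @ x_t + U["g"] @ h_prev + b["g"])  # candidate new cell content
    c = f * c_prev + i * g                                # memory cell keeps or overwrites its value
    h = o * np.tanh(c)                                    # hidden state seen by the rest of the network
    return h, c
```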
In 2014, a simplification of the LSTM called the Gated Recurrent Unit (GRU) was introduced. The GRU does away with the output gate present in the LSTM and uses only two gates. Its performance is similar to the LSTM's in many applications, but being simpler means fewer weights and faster execution.
The GRU's two gates are an update gate and a reset gate. The update gate determines how much of the previous cell contents to maintain, and the reset gate defines how to incorporate the new input with the previous cell contents. By setting the reset gate to 1 and the update gate to 0, a GRU can model a standard RNN.
The GRU is more straightforward than the LSTM, can be trained more quickly, and can be more efficient in its execution.
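For comparison with the LSTM sketch above, here is one GRU step. Sign conventions for the update gate vary between write-ups; this sketch follows the convention used in the text, where a reset gate of 1 and an update gate of 0 recover a standard RNN. The parameter layout is again an assumption for readability.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, W, U, b):
    """One GRU step with an update gate z and a reset gate r (no separate memory cell)."""
    z = sigmoid(W["z"] @ x_t + U["z"] @ h_prev + b["z"])   # update gate: how much of the old state to keep
    r = sigmoid(W["r"] @ x_t + U["r"] @ h_prev + b["r"])   # reset gate: how much old state feeds the candidate
    h_tilde = np.tanh(W["h"] @ x_t + U["h"] @ (r * h_prev) + b["h"])  # candidate state from the new input
    # With z = 0 and r = 1, this reduces to the vanilla RNN update, matching the text above.
    return z * h_prev + (1 - z) * h_tilde
```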
Convolutional Neural Network or CNN
A Convolutional Neural Network (CNN) aims to learn higher-order features in the data with the help of convolutions. As the name suggests, the network employs convolution, a mathematical operation: CNNs are simply neural networks that use convolution in place of general matrix multiplication in at least one of their layers.
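To give a concrete picture of that operation, here is a minimal NumPy sketch of a single convolution over one image channel. It is an illustration only; real CNN layers add multiple channels, many learned filters, bias terms, and nonlinearities.

```python
import numpy as np

def conv2d(image, kernel):
    """Naive 'valid' 2D convolution (implemented as cross-correlation, as most CNN layers are)."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Each output value is a weighted sum over a small local patch of the input,
            # replacing the full matrix multiplication of a dense layer.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Toy usage: a 3x3 vertical-edge kernel applied to a random 8x8 "image"
rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8))
kernel = np.array([[-1.0, 0.0, 1.0]] * 3)
feature_map = conv2d(image, kernel)   # shape (6, 6)
```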
CNNs can identify faces, street signs, platypuses, individuals, and many other aspects of visual data. They are well suited to object recognition and consistently top image classification competitions; through optical character recognition they also overlap with text analysis, and they are useful for analyzing sound or words treated as discrete textual units.
Deep Belief Network or DBN
The DBN is a typical multilayer network architecture, but it includes a novel training algorithm. Each pair of connected layers in a DBN is a restricted Boltzmann machine (RBM), so a DBN can be represented as a stack of RBMs.
In the DBN, the input layer represents the raw sensory inputs, and each hidden layer learns abstract representations of this input. The output layer, which is treated somewhat differently from the other layers, implements the network classification. Training happens in two steps: unsupervised pre-training and supervised fine-tuning.
In unsupervised pre-training, each RBM is trained to reconstruct its input. The next RBM is trained in the same way, but with the first hidden layer treated as the visible (input) layer. This process continues until every layer has been pre-trained. When pre-training is complete, supervised fine-tuning begins.
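The sketch below illustrates the greedy layer-wise pre-training idea with a tiny mean-field RBM trained by one step of contrastive divergence. It is a simplified illustration, not a production recipe: real RBM training typically samples binary hidden states, uses mini-batches and momentum, and the layer sizes here are arbitrary toy values.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class RBM:
    """Tiny restricted Boltzmann machine trained with one step of contrastive divergence (CD-1)."""
    def __init__(self, n_visible, n_hidden, rng):
        self.W = 0.01 * rng.standard_normal((n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)
        self.b_h = np.zeros(n_hidden)

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b_h)

    def train_step(self, v0, lr=0.1):
        h0 = self.hidden_probs(v0)                     # positive phase: hidden activity driven by the data
        v1 = sigmoid(h0 @ self.W.T + self.b_v)         # negative phase: reconstruct the visible layer
        h1 = self.hidden_probs(v1)                     # re-infer the hidden layer from the reconstruction
        self.W += lr * (np.outer(v0, h0) - np.outer(v1, h1))
        self.b_v += lr * (v0 - v1)
        self.b_h += lr * (h0 - h1)

# Greedy layer-wise pre-training: each RBM learns to reconstruct the layer below it,
# then its hidden activations become the "visible" input for the next RBM in the stack.
rng = np.random.default_rng(0)
layer_sizes = [8, 6, 4]                                # assumed toy layer sizes
inputs = rng.random((100, layer_sizes[0]))             # assumed toy data in [0, 1]
rbms = []
for n_vis, n_hid in zip(layer_sizes[:-1], layer_sizes[1:]):
    rbm = RBM(n_vis, n_hid, rng)
    for v in inputs:
        rbm.train_step(v)
    rbms.append(rbm)
    inputs = rbm.hidden_probs(inputs)                  # feed activations upward to the next RBM
```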
Deep stacking networks
The final architecture is the DSN, also called a deep convex network. Although it is a deep network, a DSN differs from traditional deep learning frameworks: it is actually a deep set of individual networks, each with its own hidden layers. This architecture addresses one of the problems with deep learning, the complexity of training. Because the complexity of training grows with each additional layer in a deep architecture, the DSN views training not as a single problem but as a set of individual training problems.
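One way to picture the modular structure is the sketch below, where each module is a small single-hidden-layer network whose input is the original input concatenated with the outputs of the earlier modules. This is only an assumed, untrained illustration of the stacking idea; in a real DSN each module is trained on its own sub-problem, which is omitted here.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class DSNModule:
    """One module of a deep stacking network: a single-hidden-layer subnetwork."""
    def __init__(self, n_in, n_hidden, n_out, rng):
        self.W1 = 0.1 * rng.standard_normal((n_in, n_hidden))
        self.W2 = 0.1 * rng.standard_normal((n_hidden, n_out))

    def forward(self, x):
        return sigmoid(x @ self.W1) @ self.W2

def dsn_forward(x, modules):
    """Stack modules: each module sees the original input plus the outputs of prior modules."""
    outputs = []
    for module in modules:
        augmented = np.concatenate([x] + outputs)      # original input + earlier module outputs
        outputs.append(module.forward(augmented))
    return outputs[-1]

# Toy usage: three stacked modules, input size 10, 2 output classes (assumed sizes)
rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 10, 16, 2
modules = [DSNModule(n_in + i * n_out, n_hidden, n_out, rng) for i in range(3)]
y = dsn_forward(rng.standard_normal(n_in), modules)
```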
Conclusion
Deep learning offers a wide and varied range of architectures and algorithms. Here, we have covered five of the most popular deep learning architectures.