Illustrated Guide to Recurrent Neural Networks

If you are just getting started in ML and want to build some intuition for recurrent neural networks, this post is for you. You can also watch the video version of this post if you prefer. If you want to get into machine learning, recurrent neural networks are a powerful technique that is important to understand.

A feed-forward neural network has an input layer, a hidden layer, and an output layer.

[Figure: Feed-forward neural network]

How do we get a feed-forward neural network to use previous information to affect later information? An RNN has a looping mechanism that acts as a highway, allowing information to flow from one step to the next.

[Figure: Passing the hidden state to the next time step]

This information is the hidden state, which is a representation of previous inputs. RNNs work sequentially, so we feed them one word at a time.

[Figure: Breaking up a sentence into word sequences]

The first step is to feed "What" into the RNN. The RNN encodes "What" and produces an output. For the next step, we feed in the word "time" along with the hidden state from the previous step. By the final step, the RNN has encoded information from all the words in the previous steps. Since the final output was created from the rest of the sequence, we can take that final output and pass it to a feed-forward layer to classify an intent.

For those of you who like looking at code, here is some Python showcasing the control flow.

[Figure: Pseudo code for the RNN control flow]

First, you initialize your network layers and the initial hidden state. The shape and dimension of the hidden state depend on the shape and dimension of your recurrent neural network. Then you loop through your inputs, passing each word and the hidden state into the RNN. The control flow of a forward pass of a recurrent neural network is a for loop.

Vanishing Gradient

You may have noticed the odd distribution of colors in the hidden states. That illustrates an issue with RNNs known as short-term memory.

[Figure: Final hidden state of the RNN]

Short-term memory is caused by the infamous
vanishing gradient problem, which is also prevalent in other neural network architectures. Short-term memory and the vanishing gradient are due to the nature of back-propagation, the algorithm used to train and optimize neural networks. You can think of each time step in a recurrent neural network as a layer. As the gradient is propagated back through those layers, it can shrink at every step, leaving almost no learning signal for the earliest time steps. Not being able to learn on earlier time steps causes the network to have a short-term memory.

LSTMs and GRUs

OK, so RNNs suffer from short-term memory. How do we combat that?
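Before moving on, the forward-pass control flow described earlier can be sketched concretely. This is a minimal illustration, not the post's actual pseudocode: the weight names (`wxh`, `whh`, `bh`), the tanh cell, and the toy dimensions are all assumptions chosen for the example.

```python
import numpy as np

def rnn_forward(inputs, hidden, wxh, whh, bh):
    """Feed one word vector at a time through a simple tanh RNN cell,
    carrying the hidden state forward between steps."""
    for x in inputs:
        # Combine the current input with the previous hidden state.
        hidden = np.tanh(x @ wxh + hidden @ whh + bh)
    return hidden  # encodes information from the whole sequence

# Toy dimensions: 4 words, 3-dim word vectors, 5-dim hidden state.
rng = np.random.default_rng(0)
inputs = rng.normal(size=(4, 3))   # stand-ins for "What", "time", "is", "it?"
hidden = np.zeros(5)               # the initial hidden state
wxh = rng.normal(size=(3, 5)) * 0.1
whh = rng.normal(size=(5, 5)) * 0.1
bh = np.zeros(5)

final_hidden = rnn_forward(inputs, hidden, wxh, whh, bh)
# final_hidden can now be passed to a feed-forward layer to classify intent.
```

As the post says, the control flow really is just a for loop: initialize the hidden state, then update it once per word.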
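Why does the gradient vanish? Back-propagating through T time steps multiplies the gradient by a per-step factor at every layer; if that factor is below 1, the signal reaching the earliest steps shrinks exponentially. The sketch below uses a made-up fixed factor of 0.5 as a stand-in for the recurrent Jacobian, purely for illustration.

```python
# Each backward step through time scales the gradient by a per-step
# factor (a stand-in for the recurrent Jacobian's magnitude).
per_step_factor = 0.5
gradient = 1.0
for step in range(20):  # back-propagate through 20 time steps
    gradient *= per_step_factor

print(gradient)  # ~9.5e-07: almost no learning signal for the earliest words
```

With 20 steps, the earliest word's update is roughly a millionth of the latest word's, which is exactly the short-term memory the post describes.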
