Long Short-Term Memory (LSTM) is an advanced variant of the recurrent neural network (RNN) architecture, designed to address the vanishing gradient problem in RNNs.
During backpropagation, the weights are updated using gradients computed through the chain rule. When these gradients are propagated back through many time steps, the derivatives at each step are small numbers; multiplied together, the resulting gradient shrinks toward zero, so the weight updates become negligible and the network effectively stops learning. This is the vanishing gradient problem.
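A minimal numeric sketch of this effect: the derivative of the sigmoid activation is at most 0.25, so the chain-rule product over many time steps is bounded by powers of 0.25 and collapses quickly.

```python
# Sketch of the vanishing gradient: multiplying many small
# per-step derivatives together drives the gradient toward zero.
# 0.25 is the maximum value of the sigmoid's derivative.
grad = 1.0
for step in range(50):
    grad *= 0.25  # upper bound on sigmoid'(x) at each time step

print(grad)  # a vanishingly small number after 50 steps
```

With 50 time steps the gradient is already far below machine-meaningful scale, which is why plain RNNs struggle with long-term dependencies.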
In the above diagram, each line carries an entire vector from the output of one node to the inputs of others. The pink circles represent pointwise operations, such as vector addition, while the yellow boxes are learned neural network layers. Lines merging denote concatenation, while a line forking denotes its content being copied, with the copies going to different locations.
An LSTM cell is divided into a few components:
- Memory cell
- Forget gate
- Input gate
- Output gate
Together, these components make up the LSTM network. First we will start with the memory cell:
The memory cell remembers and forgets information based on the context of the input. As the context of a statement changes, the cell should be able to retain the relevant previous information and also add new information.
The first operation is a pointwise multiplication between the forget gate's output vector and the previous cell state. Where that vector contains zeros, the corresponding information is forgotten; where it contains ones, the information passes through unchanged to the next step, which performs the addition operation. In other words, when the context changes, the cell forgets some information.
The previous output and the current input are concatenated and multiplied by a learned weight matrix. The result is passed through a sigmoid, which squashes the values to between zero and one. If we get more values near one, the context has not changed; values near zero mean the context has changed, and the cell will forget some information.
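The forget gate step above can be sketched with NumPy. The variable names, shapes, and random weights here are illustrative assumptions, not the notation of any particular library:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

hidden, inputs = 4, 3
rng = np.random.default_rng(0)

h_prev = rng.standard_normal(hidden)   # previous output (hidden state)
x_t = rng.standard_normal(inputs)      # current input
W_f = rng.standard_normal((hidden, hidden + inputs))  # learned weights (random here)
b_f = np.zeros(hidden)

z = np.concatenate([h_prev, x_t])      # concatenate previous output and input
f_t = sigmoid(W_f @ z + b_f)           # forget gate: every value lies in (0, 1)
# values near 0 mean "forget", values near 1 mean "keep"
```

Multiplying `f_t` pointwise with the previous cell state then performs the selective forgetting described above.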
The next step is to decide what to store in the cell state. The concatenated vector is passed through a sigmoid and, separately, through a tanh, which maps values to between -1 and +1. The outputs of the sigmoid and the tanh are combined with a pointwise multiplication, and the result is added to the memory cell. This whole process is called the input gate.
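A sketch of the input gate and the cell-state update, under the same illustrative shapes and random weights as before:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

hidden, inputs = 4, 3
rng = np.random.default_rng(1)
z = rng.standard_normal(hidden + inputs)   # [h_prev, x_t] concatenated
c_prev = rng.standard_normal(hidden)       # previous memory cell state
f_t = sigmoid(rng.standard_normal((hidden, hidden + inputs)) @ z)  # forget gate

W_i = rng.standard_normal((hidden, hidden + inputs))
W_c = rng.standard_normal((hidden, hidden + inputs))

i_t = sigmoid(W_i @ z)             # input gate: how much to let in, in (0, 1)
c_hat = np.tanh(W_c @ z)           # candidate values, in (-1, +1)
c_t = f_t * c_prev + i_t * c_hat   # forget some of the old, add some of the new
```

The pointwise product `i_t * c_hat` is exactly the sigmoid-times-tanh operation described above, and the addition merges it into the memory cell.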
Finally, there is the output gate. The same concatenated information is again passed through a sigmoid function. The information from the memory cell is passed through a tanh function, and the two are combined with a pointwise multiplication. Only the meaningful information is passed as output to the next cell.
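Putting all the gates together, one full LSTM time step can be sketched as a single function. The stacked weight layout and variable names are assumptions made for this sketch, not a reference implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step. W stacks the four gate weight matrices row-wise."""
    z = np.concatenate([h_prev, x_t])  # concatenate previous output and input
    gates = W @ z + b
    f, i, o, g = np.split(gates, 4)    # forget, input, output, candidate
    f, i, o = sigmoid(f), sigmoid(i), sigmoid(o)
    g = np.tanh(g)                     # candidate cell values in (-1, +1)
    c_t = f * c_prev + i * g           # forget some of the old, add some of the new
    h_t = o * np.tanh(c_t)             # output gate filters the cell state
    return h_t, c_t

hidden, inputs = 4, 3
rng = np.random.default_rng(2)
W = rng.standard_normal((4 * hidden, hidden + inputs))
b = np.zeros(4 * hidden)
h, c = np.zeros(hidden), np.zeros(hidden)
h, c = lstm_step(rng.standard_normal(inputs), h, c, W, b)
```

Because the output gate's sigmoid lies in (0, 1) and tanh lies in (-1, +1), the hidden state passed to the next cell is always bounded, which is part of what keeps the gradients stable.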
- LSTM is a special kind of recurrent neural network capable of handling long-term dependencies.
- The above blog gives the architecture and working of an LSTM network.
- LSTMs are more widely used than plain RNNs.