The tanh (hyperbolic tangent) function is a non-linear activation function. For any input between -infinity and +infinity, its output always lies between -1 and +1; the plot below shows what the tanh function looks like. As the input tends to -infinity the output approaches -1, and as the input tends to +infinity it approaches +1. The tanh function is mainly used for classification between two classes.
tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))
In this equation the numerator is e^x - e^(-x) and the denominator is e^x + e^(-x).
Like sigmoid, tanh suffers from the vanishing gradient problem. The tanh function is also described as a scaled and shifted version of the sigmoid function:
f(x) = tanh(x) = 2 / (1 + e^(-2x)) - 1
tanh(x) = 2 * sigmoid(2x) - 1
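As a quick numerical check of the forms above, the following minimal sketch compares the exponential definition and the sigmoid-based form against Python's built-in math.tanh. The helper names (sigmoid, tanh_from_exponentials, tanh_from_sigmoid) are illustrative and not taken from any library, and the naive exponential form is only meant for moderate inputs, since math.exp overflows for very large x.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid: 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def tanh_from_exponentials(x: float) -> float:
    """tanh from its definition: (e^x - e^(-x)) / (e^x + e^(-x))."""
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))


def tanh_from_sigmoid(x: float) -> float:
    """tanh written as a rescaled sigmoid: 2 * sigmoid(2x) - 1."""
    return 2.0 * sigmoid(2.0 * x) - 1.0


# Print all three versions side by side for a few moderate inputs.
for x in (-3.0, -0.5, 0.0, 0.5, 3.0):
    print(x, tanh_from_exponentials(x), tanh_from_sigmoid(x), math.tanh(x))
```

The three columns should agree up to floating-point rounding, which is what the identity predicts.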
It is used in the hidden layers of a neural network. Because its values lie between -1 and 1, the mean of a hidden layer's activations comes out to be 0 or very close to it, which helps center the data. This makes learning much easier for the next layer.
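To illustrate the zero-centering point, here is a small sketch that passes the same zero-mean random inputs through tanh and sigmoid and compares the mean outputs. The standard-normal inputs, the sample size, and the seed are assumptions chosen purely for illustration; real hidden-layer inputs need not look like this.

```python
import math
import random


def sigmoid(x: float) -> float:
    """Logistic sigmoid: 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


# Assumption: pre-activations are zero-mean, unit-variance Gaussian samples.
random.seed(0)
inputs = [random.gauss(0.0, 1.0) for _ in range(10_000)]

# Average activation produced by each function on the same inputs.
tanh_mean = sum(math.tanh(x) for x in inputs) / len(inputs)
sigmoid_mean = sum(sigmoid(x) for x in inputs) / len(inputs)

print("mean tanh output:   ", tanh_mean)
print("mean sigmoid output:", sigmoid_mean)
```

With inputs like these, the tanh mean should land close to 0 while the sigmoid mean sits near 0.5, so the next layer receives roughly centered values only in the tanh case.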
Near zero, the derivative of tanh is larger than the derivative of sigmoid: it peaks at 1 at x = 0, compared with 0.25 for sigmoid. In other words, you can often minimize your cost function faster if you use tanh as an activation function.
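The derivative comparison can be checked with a short sketch. It uses the standard closed forms tanh'(x) = 1 - tanh(x)^2 and sigmoid'(x) = sigmoid(x) * (1 - sigmoid(x)); the function names tanh_grad and sigmoid_grad are illustrative.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid: 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def tanh_grad(x: float) -> float:
    """Derivative of tanh: 1 - tanh(x)^2."""
    return 1.0 - math.tanh(x) ** 2


def sigmoid_grad(x: float) -> float:
    """Derivative of sigmoid: sigmoid(x) * (1 - sigmoid(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)


# Compare the two gradients at the origin and further out.
for x in (0.0, 1.0, 3.0):
    print(x, tanh_grad(x), sigmoid_grad(x))
```

At x = 0 this gives 1.0 for tanh and 0.25 for sigmoid. The advantage holds around the origin; further out (for example at x = 3) tanh has already saturated and its gradient drops below that of sigmoid.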
The output range of -1 to 1, compared with 0 to 1 for sigmoid, makes the function more convenient for neural networks.
- The error surface of a tanh network can be very flat near the origin, so initializing with very small weights should be avoided.
- It suffers from the vanishing gradient problem and sometimes the exploding gradient problem (the latter is driven by large weights, since the tanh derivative itself never exceeds 1).
- It may converge to different results depending on variation in the data.
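The vanishing gradient point can be sketched numerically. The example below multiplies the tanh derivative across several layers under a deliberately simplified assumption: every layer sees the same pre-activation value and the weights are ignored. The function chained_gradient and the depth of 10 are hypothetical and chosen only to show the trend.

```python
import math


def tanh_grad(x: float) -> float:
    """Derivative of tanh: 1 - tanh(x)^2."""
    return 1.0 - math.tanh(x) ** 2


def chained_gradient(pre_activation: float, depth: int) -> float:
    """Product of tanh derivatives over `depth` layers.

    Assumption: each layer has the same pre-activation and weights are
    ignored, so this only isolates the effect of the activation function.
    """
    return tanh_grad(pre_activation) ** depth


# The further the units are from zero, the faster the product shrinks.
for z in (0.0, 1.0, 2.5):
    print(z, chained_gradient(z, depth=10))
```

At a pre-activation of 0 the product stays at 1, but once the units move into the saturated region the product collapses toward zero within a handful of layers, which is why early layers can receive almost no gradient.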