SoftMax is a non-linear activation function. An activation function is a non-linear transformation applied to a neuron's input before its output is sent on to the next layer of neurons. Suppose we have a neural network trained to classify inputs into classes (A, B, C, D…) and it uses a SoftMax layer as its output. The SoftMax function then produces probabilities (P1, P2, P3, P4…), one per class, each giving the probability that the input belongs to that class. The number of SoftMax units in the output layer should always equal the number of classes, so that each unit holds the probability of one class. Together these outputs form a probability distribution: the probabilities of all the classes sum to 1.
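A minimal sketch of such an output layer, assuming a hypothetical four-class problem (A, B, C, D) with three input features. The layer has one unit per class: each unit computes a raw score, and SoftMax turns the four scores into four probabilities that sum to 1. The weights, biases, and the `output_layer` helper are made up for illustration and do not come from a trained network.

```python
import math
from typing import List

# Hypothetical 4-class problem (classes A, B, C, D) with 3 input features.
CLASSES = ["A", "B", "C", "D"]

# One row of weights and one bias per output unit, so the layer has exactly
# len(CLASSES) units. The numbers are invented for illustration only.
WEIGHTS = [
    [0.2, -0.5, 1.0],
    [1.5, 0.3, -0.2],
    [-0.7, 0.8, 0.4],
    [0.0, 0.1, 0.1],
]
BIASES = [0.1, -0.3, 0.0, 0.2]


def softmax(scores: List[float]) -> List[float]:
    """Turn raw scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def output_layer(x: List[float]) -> List[float]:
    # Raw score (logit) of each unit: dot(weights_row, x) + bias
    logits = [sum(w * xi for w, xi in zip(row, x)) + b
              for row, b in zip(WEIGHTS, BIASES)]
    # One probability per unit, i.e. one per class
    return softmax(logits)


probs = output_layer([1.0, 2.0, 0.5])
for name, p in zip(CLASSES, probs):
    print(f"P({name}) = {p:.3f}")
```

Because there is one unit per class, the probability vector always has the same length as the list of classes.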

**INTUITION**

When you have more than two classes, SoftMax is especially helpful because it converts all of your raw output values into a normalized probability distribution. The SoftMax function is a ratio of two terms: the numerator is the exponential of one class's score, and the denominator is the sum of the exponentials of all the class scores, so each class receives its share of the total.
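A worked example of that ratio, assuming three hypothetical raw scores $z = (2, 1, 0)$ for the classes Cat, Dog and Mouse. Each numerator is $e^{z_i}$ and the shared denominator is the sum of all three exponentials:

$$
\begin{aligned}
e^{2} + e^{1} + e^{0} &\approx 7.389 + 2.718 + 1 = 11.107 \\
P(\text{Cat}) &= \frac{e^{2}}{11.107} \approx 0.665 \\
P(\text{Dog}) &= \frac{e^{1}}{11.107} \approx 0.245 \\
P(\text{Mouse}) &= \frac{e^{0}}{11.107} \approx 0.090
\end{aligned}
$$

The three probabilities sum to 1 (0.665 + 0.245 + 0.090 = 1.000), and the class with the largest score receives the largest share.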

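Formula: for a vector of $K$ raw scores (logits) $z = (z_1, \dots, z_K)$, the SoftMax output for class $i$ is

$$
\sigma(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}, \qquad i = 1, \dots, K
$$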
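A minimal pure-Python sketch of the SoftMax formula: each class's numerator is $e$ raised to its score, and the shared denominator is the sum of those exponentials. Subtracting the largest score before exponentiating is a standard numerical-stability trick that is not part of the original text. It does not change the result, because the common factor cancels in numerator and denominator, but it avoids overflow for large scores.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Map a vector of real-valued scores to a probability distribution."""
    if not logits:
        raise ValueError("softmax needs at least one score")
    # Shift by the max: e^(z_i - m) / sum_j e^(z_j - m) equals
    # e^(z_i) / sum_j e^(z_j), but never overflows.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]  # numerators
    total = sum(exps)                         # shared denominator
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.0])
print([round(p, 3) for p in probs])  # [0.665, 0.245, 0.09]

# Without the max shift, math.exp(1000.0) would raise OverflowError.
print(softmax([1000.0, 1000.0]))     # [0.5, 0.5]
```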

**When to use?**

Use SoftMax when dealing with a classification problem that has more than two classes, for example Cat, Dog, Mouse, etc. The raw output values of a neural network can range from negative to positive; passing them through SoftMax turns them into class probabilities, and the class with the maximum probability (the argmax) is the prediction.
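A small sketch of that prediction step, assuming hypothetical raw network outputs for the classes Cat, Dog and Mouse. The `predict` helper is illustrative, not part of any library: it applies SoftMax and then picks the index of the largest probability.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # shift by the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(class_names: List[str], logits: List[float]) -> str:
    probs = softmax(logits)
    # argmax: index of the largest probability
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best]


classes = ["Cat", "Dog", "Mouse"]
logits = [-1.2, 3.4, 0.5]  # hypothetical raw outputs; they may be negative
print(predict(classes, logits))  # Dog
```

Because the exponential is an increasing function, the class with the largest probability is always the class with the largest raw score. SoftMax matters when the probabilities themselves are needed, for example to report how confident the prediction is.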

**Key TAKEAWAYS**

SoftMax is used as the activation function in the output layer of neural network models.

SoftMax assumes that each example is a member of exactly one class.

Takes a vector of real-valued scores (one per class) as input and returns a probability distribution over the classes.

Each SoftMax output lies between 0 and 1, and the outputs sum to 1.

SoftMax is a non-linear activation function and is simple to compute: one exponential per class, followed by a division by their sum.
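The takeaways above can be checked numerically with a short sketch. The score vectors are arbitrary examples, and `check_distribution` is a hypothetical helper that tests whether every output lies strictly between 0 and 1 and the outputs sum to 1. In floating point, extremely large gaps between scores can round an output to exactly 0 or 1, so the check is meant for moderate inputs with at least two classes.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # shift by the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def check_distribution(probs: List[float], tol: float = 1e-9) -> bool:
    """True if every entry is in (0, 1) and the entries sum to 1."""
    return all(0.0 < p < 1.0 for p in probs) and abs(sum(probs) - 1.0) < tol


examples = ([2.0, 1.0, 0.0], [-3.0, -1.5, 4.2, 0.0], [10.0, -10.0])
for scores in examples:
    probs = softmax(scores)
    print(scores, "->", [round(p, 4) for p in probs], check_distribution(probs))
```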