Loss for multiclass classification

Map $x$ to one of $K > 2$ classes
Distribution - Categorical Distribution
The parameters $\lambda_{k}$ must each lie between zero and one and must sum to one, so that $Pr(y = k) = \lambda_{k}$ is a valid probability distribution
The network computes $K$ parameters from the input $x$
Since the raw network outputs do not satisfy these constraints, we pass them through a Softmax so that every value is positive and the $K$ values sum to one
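The Softmax step can be sketched in plain Python as follows. This is a minimal illustration, not a reference implementation: the function name `softmax` and the example scores are assumptions made here, and subtracting the maximum score is a common numerical-stability trick that the note itself does not mention.

```python
import math

def softmax(logits):
    """Map K real-valued scores to K positive values that sum to one."""
    # Subtracting the maximum leaves the result unchanged
    # but keeps exp() from overflowing on large scores.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw network outputs for K = 3 classes.
probs = softmax([2.0, 1.0, -1.0])
```

Each entry of `probs` can then be read as $\lambda_{k}$ for the corresponding class, and a larger raw score always maps to a larger probability.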
Loss function: Negative Log Likelihood of the true class under the Softmax output
This is equivalent to the multiclass Cross Entropy loss
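A rough sketch of this loss, assuming labels are integer class indices from 0 to $K-1$ and that the loss is averaged over the batch (both are choices made for this example, as are the names `log_softmax` and `multiclass_nll`). For each example it computes $-\log \lambda_{y}$, where $\lambda$ is the Softmax of the network outputs and $y$ is the true class.

```python
import math

def log_softmax(logits):
    """Log of the softmax, computed with the log-sum-exp trick for stability."""
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_norm for z in logits]

def multiclass_nll(batch_logits, labels):
    """Mean negative log-likelihood of the true classes (cross-entropy)."""
    total = 0.0
    for logits, y in zip(batch_logits, labels):
        # Penalize low log-probability assigned to the true class y.
        total -= log_softmax(logits)[y]
    return total / len(labels)

# Equal scores over K = 3 classes give probability 1/3 to each class.
loss_uniform = multiclass_nll([[0.0, 0.0, 0.0]], [1])
# A large score on the true class should give a smaller loss.
loss_confident = multiclass_nll([[5.0, 0.0, 0.0]], [0])
```

With equal scores the loss equals $\log K$, which is a handy sanity check for an untrained network.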

Subhaditya's KB