Loss for multiclass classification

  • Map each input to one of K classes
  • Output distribution: the categorical distribution over the K classes
  • Its parameters must each lie between zero and one and sum to one, so that they form a valid probability distribution
  • The network computes these parameters from the input x
  • The raw network outputs (logits) are unconstrained real numbers, so we pass them through a softmax, which makes all K values positive and forces them to sum to one
  • Loss function: negative log likelihood of the true class under the predicted categorical distribution
  • For a one-hot target this is exactly the cross-entropy loss
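
The pipeline above (logits → softmax → negative log likelihood) can be sketched as follows. This is a minimal NumPy illustration, not any particular framework's API; the logit values are made up for the example.

```python
import numpy as np

def softmax(z):
    # Subtract the max logit for numerical stability; the result is unchanged
    # because softmax is invariant to adding a constant to all logits.
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

def nll_loss(logits, label):
    # Negative log likelihood of the true class under the softmax distribution.
    # With a one-hot target, this is the cross-entropy loss.
    p = softmax(logits)
    return -np.log(p[label])

# Hypothetical unconstrained network outputs for K = 3 classes
logits = np.array([2.0, 1.0, 0.1])
probs = softmax(logits)   # all positive, sums to one
loss = nll_loss(logits, 0)  # loss when the true class is class 0
```

Note that frameworks typically fuse the softmax and the log into one numerically stable operation (e.g. a "cross-entropy from logits" loss) rather than computing the probabilities explicitly as done here.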