Loss for binary classification
- Two classes: the label y ∈ {0, 1}
- Probability distribution over y: Bernoulli distribution with a single parameter λ ∈ [0, 1]
- Training → the model predicts λ, but its raw output might not lie in [0, 1], and λ must. → Pass the output through a sigmoid
- Loss: binary cross-entropy, i.e. the negative log-likelihood of the Bernoulli distribution
- The sigmoid output λ represents the probability that y = 1, so 1 − λ is the probability that y = 0. At inference we may want a point estimate of y, so we set y = 1 if λ > 0.5 and y = 0 otherwise.
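
The steps in these notes can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names (`sigmoid`, `binary_cross_entropy`, `predict`), the clamping constant `eps`, and the example raw output `z = 2.0` are all assumptions introduced here. The loss is the standard binary cross-entropy, L = −[y log λ + (1 − y) log(1 − λ)].

```python
import math

def sigmoid(z):
    """Map a real-valued model output z to a value in (0, 1)."""
    # Branch on the sign of z so math.exp is never called on a large
    # positive argument (which would overflow).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def binary_cross_entropy(lam, y, eps=1e-12):
    """Negative log-likelihood of label y (0 or 1) under Bernoulli(lam)."""
    # Assumption: clamp lam away from 0 and 1 so log() stays finite.
    lam = min(max(lam, eps), 1.0 - eps)
    return -(y * math.log(lam) + (1 - y) * math.log(1.0 - lam))

def predict(lam, threshold=0.5):
    """Point estimate of y: 1 if lam > threshold, else 0."""
    return 1 if lam > threshold else 0

# Illustrative raw model output (hypothetical value).
z = 2.0
lam = sigmoid(z)                            # estimated Pr(y = 1)
loss_if_y1 = binary_cross_entropy(lam, 1)   # small: prediction agrees with label
loss_if_y0 = binary_cross_entropy(lam, 0)   # larger: prediction disagrees
y_hat = predict(lam)
```

Because λ > 0.5 here, the point estimate is y = 1, and the loss is lower when the true label is 1 than when it is 0. Note that with the strict inequality, λ = 0.5 exactly maps to y = 0.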