Loss for binary classification

Two classes
Probability distribution - Bernoulli Distribution
Training → Model to predict $λ$ , but these values might not lie in [0,1] and we need it to. → Sigmoid
Cross Entropy
This represents the probability that y = 1, and it follows that 1 − λ represents the probability that y = 0. When we perform inference, we may want a point estimate of y, so we set y = 1 if λ > 0.5 and y = 0 otherwise.