Contrastive loss
- Minimize distance between similar inputs Gradient Descent, maximize between dissimilar Gradient Ascent
- Learn Embedding/Feature space using neighbors
- dim(Embedding d) < dim(input Space D)
- Encoded using a learnable function(NN) Gθ(x):RD→Rd
- Binary labels : similar or not
- $$D_\theta(x_1, x_2) = ||G_\theta(x_1) - G_\theta(x_2)||_2$
- $L(\theta, y, x_1, x_2) = \frac{(1-y)(D_\theta(x_1, x_2))^2}{2} + \frac{y(max(0,m-D\theta(x_1, x_2)))^2}{2}$$
- m is enforced margin between similar and dissimilar (m>0)
- Labeled points (y,x1,x2) are generated