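Distillation Loss

$$\ell\big(p, \mathrm{softmax}(z)\big) + T^2\,\ell\big(\mathrm{softmax}(r/T), \mathrm{softmax}(z/T)\big)$$

The first term is the negative log-likelihood (cross-entropy) against the true labels; the second is the additional distillation term that matches the student to the teacher.

p is the true probability distribution.
z and r are the outputs (logits) of the student and teacher models, respectively.
T is the temperature; dividing the logits by T > 1 makes the softmax smoother.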
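
The loss above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: it assumes that l is the cross-entropy, that the temperature divides the logits before the softmax, and that the function names (softmax, cross_entropy, distillation_loss) and the sample values are hypothetical choices made for the example.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax of logits / temperature (assumed convention), computed stably."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max to avoid overflow in exp
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target, predicted, eps=1e-12):
    """Cross-entropy: -sum_i target_i * log(predicted_i)."""
    return -sum(t * math.log(q + eps) for t, q in zip(target, predicted))


def distillation_loss(p, z, r, T):
    """Hard-label term plus T^2 times the temperature-softened teacher term."""
    hard = cross_entropy(p, softmax(z))
    soft = cross_entropy(softmax(r, T), softmax(z, T))
    return hard + T ** 2 * soft


# Illustrative values only: a one-hot label and made-up logits.
p = [0.0, 1.0, 0.0]   # true distribution
z = [1.0, 2.0, 0.5]   # student outputs
r = [0.5, 3.0, 0.2]   # teacher outputs
loss = distillation_loss(p, z, r, T=2.0)
```

Raising T flattens both softened distributions, so the student is also penalised for mismatching the teacher's relative scores on the wrong classes; the T^2 factor is commonly used to keep the soft term's gradient magnitude comparable to the hard term as T changes.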