Recipe for constructing loss functions
- Using Maximum Likelihood
- For training data {xᵢ, yᵢ}
- Choose a probability distribution Pr(y∣θ) defined over the domain of the predictions y with distribution parameters θ
- Choose an ML model f[x,ϕ] where θ=f[x,ϕ] and Pr(y∣θ)=Pr(y∣f[x,ϕ])
- Training → Find the parameters ϕ̂ that minimize the negative log-likelihood over the training data {xᵢ, yᵢ}
- Inference → Either return the full distribution Pr(y∣f[x,ϕ̂]) or the value of y at which this distribution is maximized
- If the data follow a distribution for which no suitable loss is available, transform the data beforehand so that they match a distribution that has one
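
The recipe above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation, and everything in it is an assumption chosen for the example: a univariate Gaussian with fixed standard deviation `sigma` as Pr(y∣θ), a hypothetical linear `model` whose output is the Gaussian mean, synthetic training data, and plain gradient descent as the optimizer. With these choices, minimizing the negative log-likelihood should recover the ordinary least-squares fit.

```python
import math
import random


def model(x, phi):
    """Hypothetical linear model f[x, phi]; its output is the Gaussian mean."""
    return phi[0] + phi[1] * x


def negative_log_likelihood(phi, xs, ys, sigma=1.0):
    """Sum over the training data of -log Pr(y_i | f[x_i, phi]) for a Gaussian."""
    const = 0.5 * math.log(2 * math.pi * sigma**2)
    return sum(
        const + (y - model(x, phi)) ** 2 / (2 * sigma**2)
        for x, y in zip(xs, ys)
    )


def fit(xs, ys, sigma=1.0, lr=0.1, steps=2000):
    """Training: gradient descent on the mean negative log-likelihood."""
    phi = [0.0, 0.0]
    n = len(xs)
    for _ in range(steps):
        residuals = [model(x, phi) - y for x, y in zip(xs, ys)]
        # Gradient of the mean NLL with respect to phi[0] and phi[1].
        grad0 = sum(residuals) / (n * sigma**2)
        grad1 = sum(r * x for r, x in zip(residuals, xs)) / (n * sigma**2)
        phi = [phi[0] - lr * grad0, phi[1] - lr * grad1]
    return phi


def least_squares(xs, ys):
    """Closed-form least-squares line, used only as a reference for comparison."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return [mean_y - slope * mean_x, slope]


# Synthetic training data (an assumption made for this example).
random.seed(0)
xs = [random.uniform(-1.0, 1.0) for _ in range(200)]
ys = [1.0 + 2.0 * x + random.gauss(0.0, 0.3) for x in xs]

phi_hat = fit(xs, ys)
phi_ls = least_squares(xs, ys)

# Inference: for a Gaussian, the most probable y is the mean f[x, phi_hat].
x_new = 0.5
y_pred = model(x_new, phi_hat)
print(phi_hat, phi_ls, y_pred)
```

Because `sigma` is held fixed, it only rescales the loss and does not move the minimizer, which is why the fitted `phi_hat` can be compared directly against the least-squares reference.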