Maximum Likelihood

Effect

  • Consider a model f[x, φ] that computes an output from an input x.
  • Instead, consider a model that computes a conditional probability distribution Pr(y | f[x, φ]), where y is the output.
  • This encourages each training output y to have high probability under the distribution computed from its input x.

Computing a distribution over outputs

  • Choose a parametric distribution Pr(y | θ) defined on the output domain of y.
  • Use the network to compute one or more of the parameters θ of this distribution.
  • For example, suppose the prediction domain is the set of real numbers, so y ∈ ℝ. Here, we might choose the univariate normal distribution, which is defined on ℝ. This distribution is defined by its mean μ and variance σ², so θ = {μ, σ²}. The machine learning model might predict the mean μ, and the variance σ² could be treated as an unknown constant.
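As a minimal sketch of the normal-distribution example (the predicted mean and the observed value below are hypothetical placeholders for a network output and a training label), the model's role is just to supply μ, and we evaluate the density of the observed y under N(μ, σ²) with σ² held fixed:

```python
import math

def normal_pdf(y, mu, sigma2):
    """Density of y under a univariate normal with mean mu and variance sigma2."""
    return math.exp(-(y - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

# Hypothetical values: the network predicts mu = 1.5 for some input x;
# sigma2 is treated as a fixed constant.
mu, sigma2 = 1.5, 1.0
likelihood = normal_pdf(1.4, mu, sigma2)  # high: observed y lies near the predicted mean
```

Outputs close to the predicted mean receive high density, which is exactly what the maximum likelihood criterion rewards.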

Maximum likelihood criterion
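Written out (using f[x, φ] for the network and assuming I i.i.d. training pairs {x_i, y_i}), the criterion chooses the parameters that maximize the joint probability of the observed outputs:

```latex
\hat{\boldsymbol{\phi}} = \underset{\boldsymbol{\phi}}{\operatorname{argmax}}
\prod_{i=1}^{I} Pr\bigl(y_i \mid f[x_i, \boldsymbol{\phi}]\bigr)
```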

Log likelihood criterion
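Since the logarithm is monotonically increasing, maximizing the product of likelihoods is equivalent to maximizing the sum of their logs, which is numerically far better behaved (the product of many small probabilities underflows):

```latex
\hat{\boldsymbol{\phi}} = \underset{\boldsymbol{\phi}}{\operatorname{argmax}}
\sum_{i=1}^{I} \log Pr\bigl(y_i \mid f[x_i, \boldsymbol{\phi}]\bigr)
```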

Minimizing Log Likelihood Loss
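By convention we minimize rather than maximize, so the loss is the negative log likelihood, L[φ] = −Σᵢ log Pr(yᵢ | f[xᵢ, φ]). The sketch below (with hypothetical targets and predictions standing in for training labels and network outputs) shows that for the fixed-variance normal this loss differs from half the sum of squared errors only by an additive constant, so minimizing it is least-squares regression:

```python
import math

def gaussian_nll(ys, mus, sigma2=1.0):
    """Negative log likelihood of observations ys under N(mu_i, sigma2)."""
    return sum(
        0.5 * math.log(2 * math.pi * sigma2) + (y - mu) ** 2 / (2 * sigma2)
        for y, mu in zip(ys, mus)
    )

# Hypothetical targets and network-predicted means.
ys  = [1.0, 2.0, 0.5]
mus = [1.1, 1.8, 0.4]
loss = gaussian_nll(ys, mus)

# With sigma2 fixed, the NLL is half the sum of squared errors
# plus a constant that does not depend on the predictions.
sse = sum((y - mu) ** 2 for y, mu in zip(ys, mus))
```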

Inference

  • When we perform inference, we often want a point estimate rather than a distribution, so we return the value at which the distribution is maximized (its mode)
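In symbols, the point estimate is the mode of the predicted distribution:

```latex
\hat{y} = \underset{y}{\operatorname{argmax}}\;
Pr\bigl(y \mid f[x, \hat{\boldsymbol{\phi}}]\bigr)
```

For the univariate normal example above, the density peaks at its mean, so this maximization has a closed form: the point estimate is simply the predicted mean, ŷ = μ.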