Bayesian Prior

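  • Prior knowledge is expressed as beliefs about the parameter vector $\theta$, cast in the form of a probability distribution over the parameter space $\Theta$.
    • Most of the time this prior knowledge is weak.
    • For a $K$-parametric PDF $p_X(x \mid \theta)$, we have $\theta \in \Theta \subseteq \mathbb{R}^K$.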
    • The prior is not the distribution of an observable random variable (RV).
    • It does not model outcomes; instead, it expresses "beliefs" about the true distribution of $X$.
    • Each $\theta \in \Theta$ corresponds to one specific PDF $p_X(\cdot \mid \theta)$, i.e. a single candidate distribution for the values $x$ of $X$ (in the frequentist setting, a PDF models single data points).
    • Since the prior is a distribution over distributions, it is a hyperdistribution.
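    • The $N$-dimensional PDF $p(D \mid \theta) = \prod_{i=1}^{N} p_X(x_i \mid \theta)$ describes the distribution of samples $D = (x_1, \dots, x_N)$ (assuming the $N$ points are drawn i.i.d.).
      • It assigns a PDF value to each data sample $D$.
    • When $\theta$ is fixed, $p(D \mid \theta)$ is a function of the data vectors $D$. For each sample, it describes how probable that sample is, assuming the true distribution of $X$ is $p_X(\cdot \mid \theta)$.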
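    • When $D$ is fixed, $p(D \mid \theta)$ is a function of $\theta$. It is not a probability density over $\theta$, so its values are not probabilities of any particular $\theta$.
      • Its integral over $\Theta$ is in general not 1.
      • As a function of $\theta$ it is called the likelihood function, $L(\theta) = p(D \mid \theta)$; maximizing it gives the maximum-likelihood estimate (MLE) $\hat{\theta} = \arg\max_{\theta \in \Theta} L(\theta)$.
      • Given data $D$, it shows which models (values of $\theta$) make the observed data more probable than others.
      • Higher values of $L(\theta)$ are better.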
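To make the "distribution over distributions" idea concrete, here is a minimal sketch under assumptions not in the notes: the model family is a Gaussian with unknown mean and unit variance, $p_X(x \mid \theta) = \mathcal{N}(x; \theta, 1)$ (so $K = 1$), and the hypothetical prior over $\theta$ is a broad $\mathcal{N}(0, 2^2)$ standing in for weak knowledge. The names `model_pdf`, `prior_pdf` and `riemann` are illustrative helpers only. It checks that the prior integrates to about 1 over $\Theta$, and that drawing from it yields parameters, each of which selects one candidate PDF over $x$.

```python
import math
import random

def gauss_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Hypothetical model family: p_X(x | theta) = N(x; theta, 1), so K = 1.
def model_pdf(x, theta):
    return gauss_pdf(x, theta, 1.0)

# Hypothetical prior over Theta = R: a broad N(0, 2^2) encoding weak knowledge.
def prior_pdf(theta):
    return gauss_pdf(theta, 0.0, 2.0)

def riemann(f, lo, hi, n=20000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    step = (hi - lo) / n
    return sum(f(lo + (i + 0.5) * step) for i in range(n)) * step

# The prior is a proper PDF over parameter space: it integrates to ~1 over theta.
prior_mass = riemann(prior_pdf, -20.0, 20.0)

# Sampling the prior yields parameters, not data: each draw selects one candidate PDF.
rng = random.Random(0)
candidate_thetas = [rng.gauss(0.0, 2.0) for _ in range(3)]
candidates = [lambda x, t=t: model_pdf(x, t) for t in candidate_thetas]

# Each candidate is itself a PDF over x, so it also integrates to ~1 over x.
candidate_masses = [riemann(c, -30.0, 30.0) for c in candidates]
print(round(prior_mass, 4), [round(m, 4) for m in candidate_masses])
```

The integration ranges are truncated to finite intervals wide enough that the Gaussian tails are negligible.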
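For the case where $\theta$ is fixed, the sketch below checks numerically that $p(D \mid \theta)$ behaves as a genuine PDF over samples. It assumes, for illustration only, the model $p_X(x \mid \theta) = \mathcal{N}(x; \theta, 1)$, i.i.d. points, $N = 2$ so the sample space can be gridded, and a hypothetical fixed value $\theta = 0.5$. The integral over $D$ comes out near 1, and a sample close to $\theta$ gets a higher density than one far from it.

```python
import math

def gauss_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def sample_pdf(D, theta):
    """p(D | theta) for an i.i.d. sample D = (x_1, ..., x_N) under N(x; theta, 1)."""
    value = 1.0
    for x in D:
        value *= gauss_pdf(x, theta, 1.0)
    return value

theta = 0.5  # hypothetical fixed parameter value

# Midpoint-rule grid over the 2-D sample space (N = 2), truncated to [-8, 8]^2.
n, lo, hi = 400, -8.0, 8.0
h = (hi - lo) / n
points = [lo + (i + 0.5) * h for i in range(n)]
mass = sum(sample_pdf((x1, x2), theta) for x1 in points for x2 in points) * h * h

# Two samples compared under the same theta: the one near theta is more probable.
near = sample_pdf((0.4, 0.6), theta)
far = sample_pdf((3.0, -2.5), theta)
print(round(mass, 4), near > far)
```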
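For the case where $D$ is fixed, a minimal sketch of the likelihood function follows, again assuming the illustrative model $p_X(x \mid \theta) = \mathcal{N}(x; \theta, 1)$ with i.i.d. points and a hypothetical four-point sample `D`. It evaluates $L(\theta)$ on a grid over a truncated $\Theta$, shows that its integral over $\theta$ is far from 1, and picks the MLE by grid search. For this particular model the MLE of the mean is known to be the sample mean, which gives a check on the grid result.

```python
import math

def gauss_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Hypothetical fixed data sample D with N = 4 points.
D = [1.2, 0.7, 1.9, 1.4]

def likelihood(theta, data=D):
    """L(theta) = p(D | theta) = product of p_X(x_i | theta), assuming i.i.d. points."""
    value = 1.0
    for x in data:
        value *= gauss_pdf(x, theta, 1.0)
    return value

# Evaluate L on a grid over Theta, truncated to [-5, 5] for the sketch.
step = 0.001
grid = [-5.0 + i * step for i in range(10001)]
values = [likelihood(t) for t in grid]

# L is not a density over theta: its integral (rectangle rule) is not 1 in general.
lik_mass = sum(values) * step

# MLE by grid search: the theta with the highest likelihood value.
theta_mle = grid[max(range(len(grid)), key=values.__getitem__)]
sample_mean = sum(D) / len(D)
print(round(lik_mass, 4), round(theta_mle, 3), sample_mean)
```

Grid search keeps the sketch dependency-free; in practice one would usually maximize the log-likelihood with an optimizer or use a closed-form solution when one exists.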