- Encode prior knowledge as beliefs about the parameter vector θ, cast as a probability distribution over the parameter space Θ.
- Usually this prior knowledge is weak.
- For a K-parameter PDF p_X, the parameter space is Θ ⊆ ℝ^K.
- The prior is not attached to the random variables (RVs) themselves: it does not model outcomes x_i. Instead it expresses "beliefs" about the true distribution P_{X_i}.
- Each θ ∈ ℝ^K corresponds to one specific PDF p_X(· | θ), i.e. a single candidate distribution P̂_X for the values x_i (in the frequentist view, one such fixed distribution models the individual data points).
- Since this is a distribution over distributions, it is a hyperdistribution.
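The hyperdistribution idea can be sketched numerically. A minimal, hypothetical example, assuming the model family is a unit-variance Gaussian with unknown mean (so θ = μ, K = 1) and a standard normal prior over Θ (both are illustrative choices, not from the notes):

```python
import math
import random

def gaussian_pdf(x, mu, sigma=1.0):
    """PDF of a normal distribution N(mu, sigma^2) evaluated at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

# The prior is itself a distribution over Θ = ℝ (here: a standard normal,
# an assumed choice). Each draw θ from it selects ONE candidate PDF
# p_X(· | θ) — which is why the prior is a distribution over distributions.
random.seed(0)
theta = random.gauss(0.0, 1.0)                  # one sample from the prior
candidate_pdf = lambda x: gaussian_pdf(x, mu=theta)

print(theta, candidate_pdf(0.0))
```

Each run of `random.gauss` would pick out a different candidate distribution; the prior describes how plausible each of those candidates is believed to be.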
- The joint distribution of the RV ⊗_i X_i has an N-dimensional PDF p_{⊗_i X_i}: ℝ^N → ℝ_{≥0}.
- Under the i.i.d. assumption it factorizes: p_{⊗_i X_i}((x_1, …, x_N)) = p_{X_1}(x_1) ⋯ p_{X_N}(x_N) = ∏_i p_X(x_i).
- Writing a data sample as D = (x_1, …, x_N), the joint PDF evaluated on D under parameter θ is p_{⊗X}(D | θ) = p_{⊗_i X_i(θ)}((x_1, …, x_N)) = ∏_i p_X(x_i | θ).
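Under the i.i.d. factorization, p_{⊗X}(D | θ) is just the product of the per-point densities. A minimal sketch, again assuming a unit-variance Gaussian family so that θ is the mean:

```python
import math

def gaussian_pdf(x, mu, sigma=1.0):
    """Candidate PDF p_X(x | θ), with θ = mu (unit-variance Gaussian assumed)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def joint_pdf(data, theta):
    """p_{⊗X}(D | θ) = ∏_i p_X(x_i | θ) for an i.i.d. sample D."""
    value = 1.0
    for x in data:
        value *= gaussian_pdf(x, mu=theta)
    return value

D = [0.2, -0.1, 0.4]                 # a made-up sample for illustration
print(joint_pdf(D, theta=0.0))
```

For large N this raw product underflows, which is why log-densities are summed in practice.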
- When θ is fixed, p_{⊗X}(D | θ) is a function of the data vector D. For each sample, it describes how probable that sample is, assuming the true distribution of X is p_X(· | θ).
- When D is fixed, it is a function of θ. But this function is not a probability distribution over θ:
- its integral over θ is in general not 1.
- As a function of θ it is called the likelihood function; maximizing it yields the maximum likelihood estimate (MLE).
- Given data D, it shows which models (parameter values θ) make the data more likely than others.
- Higher values of p_{⊗X}(D | θ) indicate a better fit of the candidate distribution p_X(· | θ) to the data.
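Putting the last points together: the MLE is the θ with the highest likelihood on D. A hedged sketch using brute-force grid search (unit-variance Gaussian family assumed, where the MLE of the mean is known to be the sample mean; the data and grid are made up for illustration):

```python
import math

def gaussian_pdf(x, mu, sigma=1.0):
    """Candidate PDF p_X(x | θ), with θ = mu (unit-variance Gaussian assumed)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def log_likelihood(theta, data):
    """log p_{⊗X}(D | θ): summing logs avoids underflow of the raw product."""
    return sum(math.log(gaussian_pdf(x, mu=theta)) for x in data)

D = [0.2, -0.1, 0.4, 0.3]                       # sample mean is 0.2
grid = [i / 100.0 for i in range(-100, 101)]    # candidate θ values in [-1, 1]
theta_hat = max(grid, key=lambda t: log_likelihood(t, D))

print(theta_hat)  # the grid point at the sample mean, 0.2
```

Comparing log-likelihoods across the grid is exactly the statement above: higher values of p_{⊗X}(D | θ) single out the more plausible models.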