- Encode prior knowledge as beliefs about the parameter vector θ, cast as a probability distribution over the parameter space Θ.
- Usually this prior knowledge is weak.
- For a K-parameter PDF p_X, the parameter space is Θ ⊆ ℝ^K.
- The prior is not attached to the random variables (RVs) themselves: it does not model outcomes x_i. Instead it expresses "beliefs" about the true distribution P_{X_i}.
- Each θ ∈ ℝ^K corresponds to one specific PDF p_X(· | θ), i.e. a single candidate distribution P̂_X for the values x_i (in the frequentist view, one such fixed distribution models the individual data points).
- Since this is a distribution over distributions, it is a hyperdistribution.
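The hyperdistribution idea can be sketched numerically. A minimal, hypothetical example, assuming the model family is a unit-variance Gaussian with unknown mean (so θ = μ, K = 1) and a standard normal prior over Θ (both are illustrative choices, not from the notes):

```python
import math
import random

def gaussian_pdf(x, mu, sigma=1.0):
    """PDF of a normal distribution N(mu, sigma^2) evaluated at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

# The prior is itself a distribution over Θ = ℝ (here: a standard normal,
# an assumed choice). Each draw θ from it selects ONE candidate PDF
# p_X(· | θ) — which is why the prior is a distribution over distributions.
random.seed(0)
theta = random.gauss(0.0, 1.0)                  # one sample from the prior
candidate_pdf = lambda x: gaussian_pdf(x, mu=theta)

print(theta, candidate_pdf(0.0))
```

Each run of `random.gauss` would pick out a different candidate distribution; the prior describes how plausible each of those candidates is believed to be.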
- The joint distribution of the RV ⊗_i X_i has an N-dimensional PDF p_{⊗_i X_i}: ℝ^N → ℝ_{≥0}.
- Under the i.i.d. assumption it factorizes: p_{⊗_i X_i}((x_1, …, x_N)) = p_{X_1}(x_1) ⋯ p_{X_N}(x_N) = ∏_i p_X(x_i).
- Writing a data sample as D = (x_1, …, x_N), the joint PDF evaluated on D under parameter θ is p_{⊗X}(D | θ) = p_{⊗_i X_i(θ)}((x_1, …, x_N)) = ∏_i p_X(x_i | θ).
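Under the i.i.d. factorization, p_{⊗X}(D | θ) is just the product of the per-point densities. A minimal sketch, again assuming a unit-variance Gaussian family so that θ is the mean:

```python
import math

def gaussian_pdf(x, mu, sigma=1.0):
    """Candidate PDF p_X(x | θ), with θ = mu (unit-variance Gaussian assumed)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def joint_pdf(data, theta):
    """p_{⊗X}(D | θ) = ∏_i p_X(x_i | θ) for an i.i.d. sample D."""
    value = 1.0
    for x in data:
        value *= gaussian_pdf(x, mu=theta)
    return value

D = [0.2, -0.1, 0.4]                 # a made-up sample for illustration
print(joint_pdf(D, theta=0.0))
```

For large N this raw product underflows, which is why log-densities are summed in practice.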
- When θ is fixed, p_{⊗X}(D | θ) is a function of the data vector D. For each sample, it describes how probable that sample is, assuming the true distribution of X is p_X(· | θ).
- When D is fixed, it is a function of θ. But this function is not a probability distribution over θ:
- its integral over θ is in general not 1.
- As a function of θ it is called the likelihood function; maximizing it yields the maximum likelihood estimate (MLE).
- Given data D, it shows which models (parameter values θ) make the data more likely than others.
- Higher values of p_{⊗X}(D | θ) indicate a better fit of the candidate distribution p_X(· | θ) to the data.
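Putting the last points together: the MLE is the θ with the highest likelihood on D. A hedged sketch using brute-force grid search (unit-variance Gaussian family assumed, where the MLE of the mean is known to be the sample mean; the data and grid are made up for illustration):

```python
import math

def gaussian_pdf(x, mu, sigma=1.0):
    """Candidate PDF p_X(x | θ), with θ = mu (unit-variance Gaussian assumed)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def log_likelihood(theta, data):
    """log p_{⊗X}(D | θ): summing logs avoids underflow of the raw product."""
    return sum(math.log(gaussian_pdf(x, mu=theta)) for x in data)

D = [0.2, -0.1, 0.4, 0.3]                       # sample mean is 0.2
grid = [i / 100.0 for i in range(-100, 101)]    # candidate θ values in [-1, 1]
theta_hat = max(grid, key=lambda t: log_likelihood(t, D))

print(theta_hat)  # the grid point at the sample mean, 0.2
```

Comparing log-likelihoods across the grid is exactly the statement above: higher values of p_{⊗X}(D | θ) single out the more plausible models.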