Proxy Objective

Easier to change or measure than the actual objective
Suppose we have some sample space $S$ (such as the set of possible question-answer pairs), some Probability distribution $P$ over $S$ , a true objective (or “reward”) $R_{t r u e} : S \to R$ , proxy objective $R_{p ro x y} : S \to R$ and we optimize $R_{p ro x y}$ to get a new distribution $P^{'}$
$E_{x^{'} \sim P^{'}} [Rt r u e (x')]$ is how well the true objective is optimized
- Monte Carlo estimator used
- If $N \geq n$ samples from P, simultaneously consider every possible subset of these samples of size nnn, weight each sample by the number of subsets for which it is the best according to the proxy objective, and then take the weighted average true objective $(n - 1 k - 1)$ where k is the rank of the sample under the proxy objective, from 1 (worst) up to N (best)
- Can reuse samples of n
KL Divergence $P^{'} ∣∣ P$ measures how much optimization is done
- As long as Continous , $n - \frac{n - 1}{n}$

Refs