Proxy Objective

  • Easier to change or measure than the actual objective
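  • Suppose we have some sample space $X$ (such as the set of possible question-answer pairs), some probability distribution $P$ over $X$, a true objective (or “reward”) $R_{\text{true}}: X \to \mathbb{R}$, and a proxy objective $R_{\text{proxy}}: X \to \mathbb{R}$, and we optimize $R_{\text{proxy}}$ to get a new distribution $P'$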
  • $\mathbb{E}_{x \sim P'}[R_{\text{true}}(x)]$ measures how well the true objective is optimized
    • Under best-of-$n$ sampling, the expected true objective is estimated with a Monte Carlo estimator
    • Given $N$ samples from $P$, simultaneously consider every possible subset of these samples of size $n$, weight each sample by the number of subsets in which it is the best according to the proxy objective, and take the weighted average true objective; this weight is $\binom{k-1}{n-1}$, where $k$ is the sample's rank under the proxy objective, from 1 (worst) up to $N$ (best)
    • The same $N$ samples can be reused to estimate the true objective for different values of $n$
  • The KL divergence $D_{\text{KL}}(P' \,\|\, P)$ measures how much optimization has been done
    • For best-of-$n$, as long as $P$ is continuous, $D_{\text{KL}}(P' \,\|\, P) = \log n - \frac{n-1}{n}$
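A minimal Python sketch of the best-of-$n$ quantities, assuming the proxy scores contain no ties. `best_of_n_true_objective` weights the sample of proxy rank $k$ by $\binom{k-1}{n-1}$ (the number of size-$n$ subsets in which it is the proxy-best sample), and `best_of_n_kl` evaluates the closed form $\log n - \frac{n-1}{n}$ for continuous $P$. Both function names and the noisy-proxy toy data are hypothetical, for illustration only; this is one way to compute these quantities, not necessarily how the original analysis did it.

```python
import math
import random


def best_of_n_true_objective(proxy_scores, true_scores, n):
    """Estimate E[R_true] under best-of-n sampling from N base samples.

    The sample of proxy rank k (1 = worst, N = best) is the proxy-best element
    of exactly C(k-1, n-1) of the C(N, n) size-n subsets.
    Assumes the proxy scores contain no ties.
    """
    N = len(proxy_scores)
    if len(true_scores) != N or not 1 <= n <= N:
        raise ValueError("need equal-length score lists and 1 <= n <= N")
    # Sort sample indices by proxy score, worst first, so enumerate() yields rank k.
    order = sorted(range(N), key=lambda i: proxy_scores[i])
    total = math.comb(N, n)  # number of size-n subsets, each with one proxy-best sample
    estimate = 0.0
    for k, idx in enumerate(order, start=1):
        # Fraction of subsets won by this sample (0 when k < n).
        # int / int division avoids float overflow for huge binomial counts.
        estimate += (math.comb(k - 1, n - 1) / total) * true_scores[idx]
    return estimate


def best_of_n_kl(n):
    """KL(best-of-n || P) in nats, valid when proxy scores under P are continuous."""
    return math.log(n) - (n - 1) / n


if __name__ == "__main__":
    # Hypothetical toy setup: the proxy is the true score plus independent noise.
    random.seed(0)
    N = 1000
    true = [random.gauss(0.0, 1.0) for _ in range(N)]
    proxy = [t + random.gauss(0.0, 1.0) for t in true]
    # The same N samples are reused for every value of n.
    for n in (1, 4, 16, 64):
        est = best_of_n_true_objective(proxy, true, n)
        print(f"n={n:3d}  KL={best_of_n_kl(n):.3f} nats  E[R_true]~{est:.3f}")
```

Normalizing each count by $\binom{N}{n}$ turns the weighted average into an expectation over a uniformly random size-$n$ subset, which is exactly what best-of-$n$ sampling from the $N$ stored samples would give.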
