WebGPT

  • WebGPT: Browser-assisted Question-answering with Human Feedback
  • fine-tuned version of GPT-3 to more accurately answer open-ended questions using a text-based web browser.
  • submits search queries, follows links, and scrolls up and down web pages
  • trained to cite its sources
  • By setting up the task so that it can be performed by humans, they are able to train models on the task using imitation learning
  • models must collect references while browsing in support of their answers
  • ELI5
  • dataset of questions asked by Reddit users
  • fine-tuning GPT-3 using behavior cloning, and then performing Rejection Sampling against a reward model trained to predict human preferences