GPT-3

  • Language Models are Few-Shot Learners
  • The paper shows that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes becoming competitive with prior state-of-the-art fine-tuning approaches
  • GPT-3 is an autoregressive language model with 175 billion parameters, 10x more than any previous non-sparse language model; the paper tests its performance in the few-shot setting
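  • For all tasks, GPT-3 is applied without any gradient updates or fine-tuning; the task and the few-shot demonstrations are specified purely as text given to the model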
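  • GPT-3 performs strongly on tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or 3-digit arithmetic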
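  • The paper also identifies datasets where few-shot learning still struggles, and datasets where GPT-3 faces methodological issues related to training on large web corpora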
  • GPT-3 can generate samples of news articles that human evaluators have difficulty distinguishing from articles written by humans
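
To make the few-shot setting concrete, here is a minimal sketch of how a task and its demonstrations can be packed into a single text prompt, with no gradient updates involved. The helper name `build_few_shot_prompt`, the `=>` separator, and the example pairs are assumptions chosen for illustration, not a format these notes specify; the resulting string would be passed to whatever model is being evaluated.

```python
def build_few_shot_prompt(task_description, demonstrations, query):
    """Assemble a few-shot prompt as plain text.

    task_description: a natural-language statement of the task.
    demonstrations:   list of (input, output) pairs shown to the model.
    query:            the new input the model should complete.

    The "=>" separator is an illustrative choice, not a required format.
    """
    lines = [task_description, ""]
    for source, target in demonstrations:
        lines.append(f"{source} => {target}")
    # Leave the final output blank so the model completes it.
    lines.append(f"{query} =>")
    return "\n".join(lines)


# Hypothetical translation task with two demonstrations.
prompt = build_few_shot_prompt(
    "Translate English to French:",
    [("sea otter", "loutre de mer"), ("cheese", "fromage")],
    "peppermint",
)
print(prompt)
```

Changing the number of demonstration pairs moves between the zero-shot (no pairs), one-shot, and few-shot settings while the model weights stay fixed.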