GPT-3
- Language Models are Few-Shot Learners
- shows that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches
- Autoregressive language model with 175 billion parameters, 10x more than any previous non-sparse language model; its performance is evaluated in the few-shot setting
- tasks and few-shot demonstrations are specified purely via text interaction with the model, without any gradient updates or fine-tuning (see the prompting sketch after this list)
- strong performance on tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or 3-digit arithmetic
- identifies methodological issues related to training on large web corpora, chiefly contamination of benchmark test sets by text seen during pre-training (see the overlap-check sketch below)
- can generate samples of news articles which human evaluators have difficulty distinguishing from articles written by humans
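
A minimal sketch of the few-shot, in-context setup: task demonstrations are packed into the prompt text and the model is conditioned on them at inference time, with no parameter updates. The `generate` function below is a hypothetical stand-in for a call to an autoregressive language model; the English-French pairs are the illustrative examples used in the paper's figures.

```python
# Few-shot prompting sketch: the "training" signal is supplied entirely
# in the prompt, so no gradient updates touch the model's weights.

def build_few_shot_prompt(task_description, demonstrations, query):
    """Pack a task description, K worked examples, and the new query
    into a single text prompt for an autoregressive language model."""
    lines = [task_description, ""]
    for source, target in demonstrations:
        lines.append(f"{source} => {target}")
    lines.append(f"{query} =>")  # the model completes the text after "=>"
    return "\n".join(lines)


def generate(prompt, max_tokens=20):
    """Hypothetical stand-in for sampling a completion from a large
    autoregressive LM; replace with a real model or API call."""
    raise NotImplementedError("plug in a language model here")


if __name__ == "__main__":
    demos = [
        ("sea otter", "loutre de mer"),      # K = 3 demonstrations
        ("peppermint", "menthe poivrée"),
        ("cheese", "fromage"),
    ]
    prompt = build_few_shot_prompt(
        "Translate English to French:", demos, "plush giraffe"
    )
    print(prompt)
    # completion = generate(prompt)  # would ideally yield "girafe en peluche"
```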
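
The methodological issue noted above is largely test-set contamination: benchmark text can leak into a web-scale training corpus. Below is a rough sketch of the kind of n-gram overlap check the paper describes; the 13-gram window follows the paper's reported analysis, while the helper functions and toy data are illustrative assumptions.

```python
# Contamination-check sketch: flag benchmark examples whose word n-grams
# also appear somewhere in the (web-scraped) training corpus.

def ngrams(text, n=13):
    """Lowercased word n-grams of the given text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def build_train_index(train_documents, n=13):
    """Union of all n-grams observed in the training corpus."""
    index = set()
    for doc in train_documents:
        index |= ngrams(doc, n)
    return index


def is_contaminated(test_example, train_index, n=13):
    """A test example counts as 'dirty' if any of its n-grams was seen in training."""
    return any(g in train_index for g in ngrams(test_example, n))


if __name__ == "__main__":
    train_docs = ["the quick brown fox jumps over the lazy dog " * 3]
    train_index = build_train_index(train_docs, n=5)   # small n for the toy demo
    print(is_contaminated("quick brown fox jumps over", train_index, n=5))  # True
```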