Here we show that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches.
Specifically, we train GPT-3, an autoregressive language model with 175 billion parameters, 10x more than any previous non-sparse language model, and test its performance in the few-shot setting.
For all tasks, the model is applied without any gradient updates or fine-tuning, with tasks and few-shot demonstrations specified purely via text interaction with the model.
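The few-shot setting described above can be sketched as follows: the task is specified purely as text, with a handful of demonstrations prepended to the test input, and no model parameters are updated. The function and prompt format below are illustrative assumptions, not an API from the paper.

```python
# Minimal sketch of few-shot prompting: K (input, output) demonstrations
# are concatenated ahead of the query, and the model simply conditions
# on this string at inference time -- no gradient updates or fine-tuning.
# Names and the "Input:/Output:" format are hypothetical.

def build_few_shot_prompt(task_description, demonstrations, query):
    """Assemble a few-shot prompt from a task description,
    K demonstration pairs, and the test input."""
    lines = [task_description, ""]
    for inp, out in demonstrations:
        lines.append(f"Input: {inp}")
        lines.append(f"Output: {out}")
        lines.append("")
    lines.append(f"Input: {query}")
    lines.append("Output:")  # the model is asked to continue from here
    return "\n".join(lines)

# Example: a translation task specified in-context with K = 2 demonstrations.
prompt = build_few_shot_prompt(
    "Translate English to French.",
    [("cheese", "fromage"), ("sea otter", "loutre de mer")],
    "peppermint",
)
print(prompt)
```

Note that "zero-shot" and "one-shot" are the same construction with K = 0 or K = 1 demonstrations.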
The model also performs well on several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic.
At the same time, we identify some datasets where the model faces methodological issues related to training on large web corpora.
Finally, we find that the model can generate samples of news articles which human evaluators have difficulty distinguishing from articles written by humans.