single 540 billion parameter dense Transformer language model
few-shot language understanding and generation
drastically reduces the number of task-specific training examples needed to adapt the model to a particular application
Pathways Language Model
6144 TPU v4 chips
breakthrough performance on reasoning tasks, which require multi-step logical inference
combination of scale and chain-of-thought prompting, where the model is explicitly prompted to generate a natural language logical inference chain before making its prediction
write explicit logical inference chains to both explain jokes and answer complex questions about scenarios
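The chain-of-thought prompting described above can be sketched as prompt construction: few-shot exemplars that each contain an explicit reasoning chain before the answer, nudging the model to emit its own chain before its final prediction. A minimal illustration (the exemplar text and function names here are illustrative, not drawn from the paper):

```python
# Sketch of chain-of-thought prompting: each few-shot exemplar shows a
# worked reasoning chain before the final answer, so the model is prompted
# to produce a similar chain before its own prediction.
# The exemplar content below is a common illustrative example, not the
# paper's actual prompt.

COT_EXEMPLAR = (
    "Q: Roger has 5 tennis balls. He buys 2 cans of 3 tennis balls each. "
    "How many tennis balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 balls is 6 balls. "
    "5 + 6 = 11. The answer is 11.\n\n"
)

def build_cot_prompt(question: str, exemplars=(COT_EXEMPLAR,)) -> str:
    """Prepend reasoning-chain exemplars to the target question."""
    return "".join(exemplars) + f"Q: {question}\nA:"

prompt = build_cot_prompt(
    "A juggler can juggle 16 balls. Half of the balls are golf balls. "
    "How many golf balls are there?"
)
print(prompt)
```

The same prompt-building pattern applies whether the downstream task is free-form question answering or joke explanation; only the exemplars change.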
suggest that the improvements from scale for few-shot language understanding have not yet plateaued
When we compare results from PaLM 540B to our own identically trained 62B and 8B model variants, improvements are typically log-linear.
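"Log-linear" here means that task quality grows roughly linearly in the logarithm of parameter count, so each multiplicative jump in model size yields a comparable additive gain. A small numerical sketch with invented accuracy scores (the 8B/62B/540B sizes are from the text; the scores are made up purely to illustrate the trend):

```python
import math

# Hypothetical accuracies illustrating a log-linear scaling trend across
# the 8B, 62B, and 540B model sizes mentioned in the text.
# These scores are invented for illustration, not reported results.
params = [8e9, 62e9, 540e9]
acc = [0.40, 0.52, 0.64]

# Under a log-linear trend, the gain per decade of parameters is constant:
# slope = delta(accuracy) / delta(log10(parameters)).
slopes = [
    (acc[i + 1] - acc[i]) / (math.log10(params[i + 1]) - math.log10(params[i]))
    for i in range(len(params) - 1)
]
print(slopes)  # the two slopes are nearly equal, i.e. the trend is log-linear
```

A trend that has plateaued would instead show the second slope shrinking toward zero, which is what the comparison above argues has not yet happened.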
certain capabilities of language models only emerge when trained at sufficient scale, and there are additional capabilities that could emerge from future generations of models
demonstrating that prompting the model to generate explicit inference chains can drastically increase the quality of the predictions themselves
model’s generation (rather than just understanding) capabilities can be immensely beneficial even for tasks that are modeled as categorical prediction or regression, which typically do not require significant language generation
comprehensive analysis on bias and toxicity, and study the extent of training data memorization with respect to model scale
ethical considerations related to large language models and discuss potential mitigation strategies