PaLM

  • PaLM: Scaling Language Modeling with Pathways
  • single 540-billion-parameter dense Transformer language model
  • few-shot language understanding and generation
  • drastically reduces the number of task-specific training examples needed to adapt the model to a particular application
  • Pathways Language Model
  • 6144 TPU v4 chips
  • breakthrough performance on reasoning tasks, which require multi-step logical inference
  • combination of scale and chain-of-thought prompting, where the model is explicitly prompted to generate a natural language logical inference chain before making its prediction
  • write explicit logical inference chains to both explain jokes and answer complex questions about scenarios
  • BIG-bench
  • suggest that the improvements from scale for few-shot language understanding have not yet plateaued
  • comparing PaLM 540B against identically trained 62B and 8B model variants, improvements are typically log-linear
  • certain capabilities of language models only emerge when trained at sufficient scale, and there are additional capabilities that could emerge from future generations of models
  • demonstrating that prompting the model to generate explicit inference chains can drastically increase the quality of the predictions themselves
  • model’s generation (rather than just understanding) capabilities can be immensely beneficial even for tasks that are modeled as categorical prediction or regression, which typically do not require significant language generation
  • comprehensive analysis on bias and toxicity, and study the extent of training data memorization with respect to model scale
  • ethical considerations related to large language models and discuss potential mitigation strategies
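The chain-of-thought idea in the notes above can be sketched as prompt construction: each few-shot exemplar carries a worked reasoning chain before its answer, nudging the model to emit its own chain first. The exemplar text, the "The answer is X." convention, and the helper names below are illustrative assumptions, not PaLM's actual prompts.

```python
# Minimal sketch of few-shot chain-of-thought prompting (illustrative, not
# PaLM's real prompt format): exemplars show a reasoning chain ending in a
# fixed answer marker, and the same marker is used to parse the completion.

EXEMPLARS = [
    {
        "question": "Roger has 5 tennis balls. He buys 2 cans of 3 balls each. "
                    "How many balls does he have now?",
        "chain": "Roger started with 5 balls. 2 cans of 3 balls is 6 balls. 5 + 6 = 11.",
        "answer": "11",
    },
]


def build_prompt(question: str) -> str:
    """Assemble a few-shot prompt whose exemplars end in 'The answer is X.'"""
    parts = [
        f"Q: {ex['question']}\nA: {ex['chain']} The answer is {ex['answer']}."
        for ex in EXEMPLARS
    ]
    parts.append(f"Q: {question}\nA:")  # the model continues from here
    return "\n\n".join(parts)


def extract_answer(completion: str) -> str:
    """Pull the final answer out of a generated chain, by the same convention."""
    marker = "The answer is "
    idx = completion.rfind(marker)
    if idx == -1:
        return completion.strip()
    return completion[idx + len(marker):].rstrip(". \n")
```

Compared with direct answer-only prompting, the only change is the exemplar format; the extra generated tokens are what carries the multi-step inference.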
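The "typically log-linear" scaling claim amounts to saying task accuracy rises roughly linearly in log(parameter count). A quick way to check that on any benchmark is an ordinary least-squares fit in log space; the accuracy values below are made-up placeholders for illustration, not PaLM's reported numbers.

```python
# Sketch of checking log-linear scaling: fit accuracy = slope * log10(params)
# + intercept over the three model sizes. Accuracies here are hypothetical.
import math

# (parameter count, hypothetical few-shot accuracy)
points = [(8e9, 0.40), (62e9, 0.52), (540e9, 0.65)]

xs = [math.log10(n_params) for n_params, _ in points]
ys = [acc for _, acc in points]

# Ordinary least-squares slope and intercept.
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
    (x - mx) ** 2 for x in xs
)
intercept = my - slope * mx


def predict(params: float) -> float:
    """Extrapolate accuracy at a given parameter count under the log-linear fit."""
    return slope * math.log10(params) + intercept
```

A near-zero residual at the middle model size is what "improvements have not yet plateaued" looks like in this picture: the fitted line keeps rising with scale rather than bending over.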