BIG-bench

  • Beyond the Imitation Game: Quantifying and Extrapolating the Capabilities of Language Models
  • present and near-future capabilities and limitations of language models
  • Beyond the Imitation Game benchmark (BIG-bench)
  • benchmark that can measure progress well beyond the current state-of-the-art
  • 204 tasks, contributed by 442 authors across 132 institutions
  • Task topics are diverse, drawing problems from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development, and beyond
  • tasks that are believed to be beyond the capabilities of current language models
  • Evaluates the behavior of OpenAI’s GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to hundreds of billions of parameters