BIG-bench
- Beyond the Imitation Game: Quantifying and Extrapolating the Capabilities of Language Models
- present and near-future capabilities and limitations of language models
- Beyond the Imitation Game benchmark (BIG-bench)
- benchmark that can measure progress well beyond the current state-of-the-art
- 204 tasks, contributed by 442 authors across 132 institutions
- Task topics are diverse, drawing problems from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development
- tasks that are believed to be beyond the capabilities of current language models
- Evaluates the behavior of OpenAI’s GPT models, Google-internal dense Transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to hundreds of billions of parameters