present and near-future capabilities and limitations of language models
Beyond the Imitation Game benchmark (BIG-bench)
benchmark that can measure progress well beyond the current state-of-the-art
204 tasks, contributed by 442 authors across 132 institutions
Task topics are diverse, drawing problems from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development, and beyond
tasks that are believed to be beyond the capabilities of current language models
evaluate the behavior of OpenAI’s GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to hundreds of billions of parameters