Benchmark LLM

[[A Benchmark for LLMs on Planning and Reasoning about Change|Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change)]]
Recent advances in large language models (LLMs) have transformed the field of natural language processing (NLP).
From GPT-3 to PaLM, state-of-the-art performance on natural language tasks is pushed forward with every new large language model.
Current benchmarks are relatively simplistic, and performance on them cannot be used as evidence to support claims about LLMs' reasoning capabilities.
The paper proposes an extensible assessment framework, motivated by the above gaps in current benchmarks, to test the abilities of LLMs on a central aspect of human intelligence: reasoning about actions and change.
The framework provides multiple test cases.

Subhaditya's KB