AI’s understanding and reasoning skills can’t be assessed by current tests

Claims that large language models have humanlike cognitive abilities prove tough to verify

AI performs well on cognitive benchmarks, but how do we know if it really understands?

Glenn Harvey

“Sparks of artificial general intelligence.” “Near-human levels of comprehension.” “Top-tier reasoning capacities.” All of these phrases have been used to describe large language models, which drive generative AI chatbots like ChatGPT. Since that bot arrived on the scene in late 2022, it seems as if every new generative AI has been hailed as the next best iteration, one that not only produces humanlike content but also approaches near-human cognition (SN: 12/11/23).