Apple pours water on unreasonable LLM hype

The large language models that underpin generative AI aren’t all they’re cracked up to be, according to new research from US gadget giant Apple.

Scott Bicheno

October 14, 2024

The academic paper published by a team of Apple AI boffins, apparently led by ex-DeepMind researcher Mehrdad Farajtabar, is titled GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models. They used variations of GSM8K, which consists of thousands of school-level exam-style maths problems, to put a bunch of LLMs through their paces.

“Ultimately, our work underscores significant limitations in the ability of LLMs to perform genuine mathematical reasoning,” concludes the report. “The high variance in LLM performance on different versions of the same question, their substantial drop in performance with a minor increase in difficulty, and their sensitivity to inconsequential information indicate that their reasoning is fragile. It may resemble sophisticated pattern matching more than true logical reasoning.

“We believe further research is essential to develop AI models capable of formal reasoning, moving beyond pattern recognition to achieve more robust and generalizable problem-solving skills. This remains a critical challenge for the field as we strive to create systems with human-like cognitive abilities or general intelligence.”

In other words, AI is still pretty far from ‘thinking’, however we may define that term. The current generation of LLMs, and the chatbots that rely on them, are instead acting more like search engines on steroids. This isn’t a massive surprise but it is nonetheless useful research for anyone looking to rationalise the current AI hype cycle.

“Can Large Language Models (LLMs) truly reason? Or are they just sophisticated pattern matchers?” asks Farajtabar in his X thread introducing the study. “Overall, we found no evidence of formal reasoning in language models including open-source models like #Llama, #Phi, #Gemma, and #Mistral and leading closed models, including the recent #OpenAI #GPT-4o and #o1-series,” he concludes.

“Their behavior is better explained by sophisticated pattern matching—so fragile, in fact, that changing names can alter results by ~10%! We can scale data, parameters, and compute—or use better training data for Phi-4, Llama-4, GPT-5. But we believe this will result in 'better pattern-matchers,' not necessarily 'better reasoners.'”
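For a concrete picture of what ‘different versions of the same question’ might look like, here is a minimal Python sketch. It is not Apple’s code: the template, the name list and every function name below are hypothetical, invented purely for illustration. The idea is that the names and numbers in a word problem change while the underlying arithmetic stays fixed, so the correct answer can always be computed and compared with whatever a solver returns.

```python
import random
import re

# Hypothetical template (not from the paper): names and numbers vary,
# the underlying arithmetic (a + b - c) never changes.
TEMPLATE = (
    "{name} picks {a} apples on Monday and {b} apples on Tuesday. "
    "{name} then gives away {c} apples. How many apples does {name} have left?"
)
NAMES = ["Sophie", "Omar", "Mei", "Lucas"]  # illustrative only


def make_variant(rng):
    """Return (question, answer) for one random instantiation of the template."""
    name = rng.choice(NAMES)
    a = rng.randint(5, 20)
    b = rng.randint(5, 20)
    c = rng.randint(1, a + b)  # never give away more than was picked
    question = TEMPLATE.format(name=name, a=a, b=b, c=c)
    return question, a + b - c


def accuracy(solver, variants):
    """Fraction of variants for which solver(question) equals the true answer."""
    correct = sum(1 for question, answer in variants if solver(question) == answer)
    return correct / len(variants)


def reference_solver(question):
    """Stand-in solver that does the arithmetic; a real test would call an LLM here."""
    a, b, c = (int(n) for n in re.findall(r"\d+", question))
    return a + b - c


rng = random.Random(0)  # fixed seed so the variants are reproducible
variants = [make_variant(rng) for _ in range(50)]
print(accuracy(reference_solver, variants))
```

A real evaluation would swap `reference_solver` for a call to a language model and compare accuracy across several batches of variants. A solver that genuinely reasons should score the same whichever names and numbers are drawn; the researchers’ complaint is that the models they tested do not.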

“If the machines can’t reason, then the promises upon which multi-billion-dollar valuations are built will fall to pieces, setting the AI industry up for a painful correction,” writes Radio Free Mobile, which has long been sceptical of AI hype. “Much like the Internet before it, I think that AI has a long and bright future, but current expectations are way beyond what is possible now meaning that a painful reset is required as reality reasserts itself.”

Meanwhile Gary Marcus, self-described as ‘AI’s leading critic’, has been doing victory laps on X. “LLMs are cooked,” he tweeted. “So many investors are going to lose sooo much money. AI will survive, and even thrive, but a new paradigm is needed.” OpenAI boss Sam Altman, arguably AI’s most prominent cheerleader, doesn’t seem to have publicly commented on the report. Neither has OpenAI antagonist Elon Musk, but he does seem to have been exceptionally busy recently, even by his standards.

Apple was conspicuously absent from the latest OpenAI funding round, perhaps put off by the absurd condition that investors be monogamous thereafter. Or maybe it’s because Apple has no interest in playing excessively speculative investment games.

It seems increasingly likely that AI is currently a speculative bubble akin to the internet at the turn of the millennium. It will presumably burst at some stage and the publication of studies such as this probably brings us closer to that day of reckoning.

About the Author

Scott Bicheno

As the Editorial Director of Telecoms.com, Scott oversees all editorial activity on the site and also manages the Telecoms.com Intelligence arm, which focuses on analysis and bespoke content.
Scott has been covering the mobile phone and broader technology industries for over ten years. Prior to Telecoms.com, Scott was the primary smartphone specialist at industry analyst Strategy Analytics. Before that, Scott was a technology journalist, covering the PC and telecoms sectors from a business perspective.
Follow him @scottbicheno
