Apple Questions AI’s “Reason”-Ability

Brain in a Machine

Dr. Zunaid Kazi
CEO, Knowtomation

Apple recently published a study titled “GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models” (Mirzadeh et al., 2024). In it, the authors examine the inherent flaws in how today’s “remarkable” AI models handle logical reasoning, showing that even small changes in how a question is phrased can send these models completely off track. That fragility suggests the models depend on simple pattern matching rather than real understanding.
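To make the methodology concrete, here is a minimal Python sketch (entirely my own illustration, not code from the Apple study) of the kind of templated perturbation GSM-Symbolic describes: swap out the names and numbers in a grade-school word problem, and the reasoning required stays identical even though the surface form changes. If a model’s accuracy drops on these trivial variants, that is evidence it was matching patterns rather than reasoning.

```python
import random

# A GSM8K-style word problem turned into a symbolic template, in the spirit of
# GSM-Symbolic: the surface details vary, the underlying reasoning does not.
TEMPLATE = (
    "{name} picks {x} apples on Monday and {y} apples on Tuesday. "
    "{name} then gives away {z} apples. How many apples does {name} have left?"
)

def make_variant(seed: int):
    """Generate one perturbed instance plus its ground-truth answer."""
    rng = random.Random(seed)
    name = rng.choice(["Sophie", "Liam", "Amara", "Diego"])
    x, y = rng.randint(5, 20), rng.randint(5, 20)
    z = rng.randint(1, x + y)  # keep the answer non-negative
    question = TEMPLATE.format(name=name, x=x, y=y, z=z)
    answer = x + y - z         # the reasoning step never changes
    return question, answer

if __name__ == "__main__":
    for seed in range(3):
        q, a = make_variant(seed)
        print(q, "->", a)
```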

But, of course, this paper has sparked much debate amongst the cognoscenti. Many tongues are wagging, and for good reason. Apple’s research confirms what many of us who’ve been around since the days of symbolic computing have long believed: these sophisticated language models are not really thinking. They’re just word prediction engines on steroids — flashy and impressive, sure, but they ‘understand’ about as well as Searle’s Chinese Room: lots of show, not much actual comprehension. Sorry, Altman.

This brings me back to my early AI career, when symbolic reasoning was still the gold (and perhaps the only) standard. Back then, intelligence was all about structured logic, even if it was as rigid as my old professor’s views on grading. In my PhD work in the late 1990s (yes, I’m dating myself here), I built a knowledge-driven model that allowed me to interact with a robot through voice and gestures. It wasn’t just mimicking responses; it was making decisions based on structured knowledge. The contrast with today’s language models, which are mostly about pattern recognition rather than true understanding, is stark.

[Image: Zunaid working with the robot]

With that contextual prehistory, it might seem like a natural next step to think about neurosymbolic AI: combining the structured, logical power of symbolic AI with the pattern-finding ability of neural networks. The paper doesn’t explicitly suggest this, but it’s the direction many researchers, myself included, believe we need to move in if we really want true reasoning in machines. The goal is to blend the best of both worlds: the logical rigor of old-school AI mixed with the deep learning capabilities of modern neural networks. Think of it like having a chatbot that doesn’t just parrot back a fact because it read it online but actually understands enough to tell you, ‘Hey, it’s raining. Why don’t you grab an umbrella and reschedule that hike?’
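To ground the idea, here is a toy neurosymbolic sketch in Python. It is my own illustration, not an architecture from the paper: the extract_facts function is a stand-in for a neural component that turns messy text into symbolic facts, and a tiny hand-written rule base does the actual inference over those facts, including the umbrella-and-hike advice above.

```python
# A toy neurosymbolic sketch (illustrative only): a "neural" front end maps
# free text to symbolic facts, and a symbolic rule base reasons over them.

def extract_facts(utterance: str) -> set[str]:
    """Stand-in for a neural model that maps free text to symbolic facts."""
    facts = set()
    if "rain" in utterance.lower():
        facts.add("raining")
    if "hike" in utterance.lower():
        facts.add("plans_hike")
    return facts

# Hand-written symbolic rules: premises -> conclusion.
RULES = [
    ({"raining"}, "take_umbrella"),
    ({"raining", "plans_hike"}, "reschedule_hike"),
]

def infer(facts: set[str]) -> set[str]:
    """Forward-chain over the rules until no new conclusions appear."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in RULES:
            if premises <= derived and conclusion not in derived:
                derived.add(conclusion)
                changed = True
    return derived - facts

if __name__ == "__main__":
    print(infer(extract_facts("It's raining and I was planning a hike today.")))
    # -> {'take_umbrella', 'reschedule_hike'} (set order may vary)
```

In a real system the extraction step would be a learned model rather than a few string checks; the point of the sketch is only that the reasoning step is explicit and inspectable instead of buried in the weights.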

I’ve always had a soft spot for knowledge graphs, too (hey, I was building these in the ’90s before they were cool!). As I mentioned earlier, my PhD work relied on a knowledge-driven model that brought explicit contextual awareness. It’s like having a virtual Rolodex (dating myself again) of everything the system knows.

The great thing about knowledge graphs is that they may not be flashy, but they bring context and relationships to the table. They don’t just spit out responses because they’ve seen something similar; they can connect the dots, which is precisely why they’re a strong contender for advancing machine reasoning. Imagine your AI assistant not only pulling up a fact but also understanding the web of connections around it. That’s the power knowledge graphs have, and it’s something that pattern-matching neural networks often struggle with.
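For the skeptics, here is a bare-bones Python sketch of what “connecting the dots” means in practice. The triples and entities are invented for illustration; the point is that once knowledge is stored as explicit subject-relation-object facts, a multi-hop question reduces to a graph traversal you can inspect step by step.

```python
from collections import defaultdict, deque

# A minimal knowledge graph: facts as (subject, relation, object) triples.
TRIPLES = [
    ("Paris", "capital_of", "France"),
    ("France", "located_in", "Europe"),
    ("Seine", "flows_through", "Paris"),
    ("Europe", "is_a", "Continent"),
]

graph = defaultdict(list)
for subj, rel, obj in TRIPLES:
    graph[subj].append((rel, obj))

def connect(start: str, goal: str):
    """Breadth-first search for a chain of relations linking two entities."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for rel, neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, path + [(node, rel, neighbor)]))
    return None

if __name__ == "__main__":
    print(connect("Seine", "Europe"))
    # -> [('Seine', 'flows_through', 'Paris'),
    #     ('Paris', 'capital_of', 'France'),
    #     ('France', 'located_in', 'Europe')]
```

Each hop in the returned path is an explicit, auditable fact, which is exactly the kind of traceable chain of reasoning a purely statistical model does not give you.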

And so here we are again. Every few years (more like every few months these days), a shiny new direction promises to lead us to the promised land of true AI. Remember when the AI winter ended and suddenly everything was supposed to be intelligent again? Yet we’re still debating whether our chatbots can actually think or are just really good at mimicking.

Are we any closer to genuine AI reasoning, or do we still have a long road ahead?