Apple Researchers Unveil Limitations of Large Language Models In Mathematical Reasoning | Technology News

AIArt
October 14, 2024

New Delhi: A team of Apple researchers has questioned the formal reasoning capabilities of large language models (LLMs), particularly in mathematics. They found that LLMs exhibit noticeable variance when responding to different instantiations of the same question, that is, variants that differ only in details such as names and numerical values.

Literature suggests that the reasoning process in LLMs is probabilistic pattern-matching rather than formal reasoning. Although LLMs can match more abstract reasoning patterns, they fall short of true logical reasoning. Small changes in input tokens can drastically alter model outputs, indicating a strong token bias and suggesting that these models are highly sensitive and fragile.

“Additionally, in tasks requiring the correct selection of multiple tokens, the probability of arriving at an accurate answer decreases exponentially with the number of tokens or steps involved, underscoring their inherent unreliability in complex reasoning scenarios,” said Apple researchers in their paper titled “GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models.”
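The exponential decay described in the quote can be illustrated with a small calculation. This is a minimal sketch, not the paper's analysis: it assumes every step is correct independently with the same probability, and the function name `chain_success_probability` and the 0.95 per-step figure are hypothetical values chosen for illustration.

```python
def chain_success_probability(p_step: float, n_steps: int) -> float:
    """Probability that all n_steps are correct.

    Simplifying assumption (ours, not the paper's): each step is
    correct independently with the same probability p_step, so the
    chance of a fully correct chain is p_step ** n_steps.
    """
    if not 0.0 <= p_step <= 1.0:
        raise ValueError("p_step must be between 0 and 1")
    if n_steps < 0:
        raise ValueError("n_steps must be non-negative")
    return p_step ** n_steps


if __name__ == "__main__":
    # Hypothetical per-step accuracy of 95%.
    for n in (1, 5, 10, 20, 50):
        print(f"{n:>2} steps: {chain_success_probability(0.95, n):.3f}")
```

Under these assumptions, even a high per-step accuracy leaves a long chain of steps more likely to contain an error than not.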

The ‘GSM8K’ benchmark is widely used to assess the mathematical reasoning of models on grade-school level questions. While the performance of LLMs on GSM8K has significantly improved in recent years, it remains unclear whether their mathematical reasoning capabilities have genuinely advanced, raising questions about the reliability of the reported metrics.

To address these concerns, the researchers conducted a large-scale study on several state-of-the-art open and closed models. “To overcome the limitations of existing evaluations, we introduce GSM-Symbolic, an improved benchmark created from symbolic templates that allow for the generation of a diverse set of questions,” the authors wrote.

GSM-Symbolic enables more controllable evaluations, providing key insights and more reliable metrics for measuring the reasoning capabilities of models.
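The idea of generating many questions from one symbolic template can be sketched as follows. This is an illustrative example only: the template text, the name list, the value ranges, and every function name here are hypothetical and are not taken from GSM-Symbolic itself.

```python
import random
import re
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical template written for this example; not from the benchmark.
TEMPLATE = (
    "{name} picks {x} apples on Monday and {y} apples on Tuesday. "
    "{name} then gives away {z} apples. How many apples does {name} have left?"
)
NAMES = ["Sophie", "Arjun", "Mei", "Omar"]


@dataclass(frozen=True)
class Instance:
    question: str
    answer: int


def generate_instance(rng: random.Random) -> Instance:
    """Fill the template with random values that keep the problem valid."""
    name = rng.choice(NAMES)
    x = rng.randint(5, 50)
    y = rng.randint(5, 50)
    z = rng.randint(1, x + y)  # cannot give away more apples than were picked
    question = TEMPLATE.format(name=name, x=x, y=y, z=z)
    return Instance(question=question, answer=x + y - z)


def generate_dataset(n: int, seed: int) -> List[Instance]:
    """Generate n instances; the same seed always yields the same set."""
    rng = random.Random(seed)
    return [generate_instance(rng) for _ in range(n)]


def accuracy(solver: Callable[[str], int], dataset: List[Instance]) -> float:
    """Fraction of instances on which solver returns the ground-truth answer."""
    correct = sum(1 for inst in dataset if solver(inst.question) == inst.answer)
    return correct / len(dataset)


def reference_solver(question: str) -> int:
    """Stand-in for a model: reads the three numbers and applies x + y - z."""
    x, y, z = (int(tok) for tok in re.findall(r"\d+", question))
    return x + y - z


if __name__ == "__main__":
    # Score the same solver on several independently generated sets.
    for seed in range(3):
        data = generate_dataset(n=100, seed=seed)
        print(seed, accuracy(reference_solver, data))
```

Because the ground-truth answer is computed from the same variables that fill the template, each generated set can be scored automatically. Comparing accuracy across sets built from different seeds shows how much a solver's results vary between instantiations of the same question.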

“Our findings reveal that LLMs exhibit noticeable variance when responding to different instantiations of the same question,” the researchers said, adding that overall, “our work provides a more nuanced understanding of LLMs’ capabilities and limitations in mathematical reasoning.”

thecrossroadtimes.com
