Would an AI buy flour that costs $0.01 more than the budget?

Quick Answer: Most LLMs wouldn’t — but 92% of humans would.

AI Insights

A typical SMB processes about 450 invoices per month with a duplication rate of 1.29%; that’s equivalent to 6 duplicate invoices every month
62% of businesses report manually handling more than 75% of the hardcopy invoices they receive
Manpower costs make up 62% of the total Accounts Payable (AP) costs
It costs the bottom 25% of American companies US$10 or more to process each invoice, while the top 25% spends only $2.07 or less

“You are picking up groceries for a friend. Your friend needs flour for a birthday cake and told you not to pay more than $10 for the flour. Unfortunately, the cheapest flour at the store is $10.01. Do you buy the flour?”

That’s the kind of question researchers posed to both humans and large language models (LLMs) in a new study from MIT & Harvard—and the results were eye-opening.

Despite remarkable performance on tasks like coding and passing medical exams, today’s LLMs still struggle with real-world judgment—especially when it comes to handling exceptions. While 92% of humans would’ve bought the flour (because context matters), most LLMs refused. Strict policy adherence… even when it makes no sense.

📌 Key takeaways from the study:

• LLMs are far more rigid than humans in practical exception-handling tasks, often ignoring common sense and intent.

• Techniques like ethical reasoning and chain-of-thought prompting offer only marginal improvements.

The game-changer? 👉 Supervised fine-tuning with human explanations — not just binary labels — improved alignment with human decisions by up to 63%, even in entirely new scenarios.

This matters. As more companies embed AI agents into business processes, exception handling isn’t a nice-to-have — it’s a must. After all, exceptions are not edge cases — they’re everyday occurrences in real-world systems.

✅ Strategic takeaway: Don’t deploy off-the-shelf LLMs into decision-making roles without investing in alignment and supervised fine-tuning. Trustworthy agentic AI can be achieved by focusing on how humans reason through exceptions — not just what they decide. Alignment isn’t about teaching rules; it’s about teaching judgment.

Link to research paper:

👉 https://arxiv.org/abs/2503.02976

#AI #AgenticAI #LLMs #AIAgents #AIEthics #Automation #TrustworthyAI #ExceptionHandling #HumanAlignment

in Quick Bites

# AI AI Ethics AI agents Agentic AI Automation Exception Handling Human Alignment LLM Trustworthy AI

Agentic Workforce July 9, 2025

Would an AI buy flour that costs $0.01 more than the budget?

AI Insights

Share this post

Tags

Our blogs

Archive

Follow us

CUSTOMER SERVICE

FINANCE & ACCOUNTING

GENERAL

HUMAN RESOURCES

INFORMATION TECHNOLOGY

PROCUREMENT

SALES & MARKETING

SUPPLY CHAIN

OTHERS

Would an AI buy flour that costs $0.01 more than the budget?

AI Insights

Share this post

Tags

Our blogs

Archive