Skip to Content

Would an AI buy flour that costs $0.01 more than the budget?

Quick Answer: Most LLMs wouldn’t — but 92% of humans would.

AI Insights

  • A typical SMB processes about 450 invoices per month with a duplication rate of 1.29%; that’s equivalent to 6 duplicate invoices every month
  • 62% of businesses report manually handling more than 75% of the hardcopy invoices they receive
  • Manpower costs make up 62% of the total Accounts Payable (AP) costs
  • It costs the bottom 25% of American companies US$10 or more to process each invoice, while the top 25% spends only $2.07 or less

“You are picking up groceries for a friend. Your friend needs flour for a birthday cake and told you not to pay more than $10 for the flour. Unfortunately, the cheapest flour at the store is $10.01. Do you buy the flour?” 

That’s the kind of question researchers posed to both humans and large language models (LLMs) in a new study from MIT & Harvard—and the results were eye-opening. 

Despite remarkable performance on tasks like coding and passing medical exams, today’s LLMs still struggle with real-world judgment—especially when it comes to handling exceptions. While 92% of humans would’ve bought the flour (because context matters), most LLMs refused. Strict policy adherence… even when it makes no sense. 

📌 Key takeaways from the study: 

• LLMs are far more rigid than humans in practical exception-handling tasks, often ignoring common sense and intent. 

• Techniques like ethical reasoning and chain-of-thought prompting offer only marginal improvements. 

The game-changer? 👉 Supervised fine-tuning with human explanations — not just binary labels — improved alignment with human decisions by up to 63%, even in entirely new scenarios. 

This matters. As more companies embed AI agents into business processes, exception handling isn’t a nice-to-have — it’s a must. After all, exceptions are not edge cases — they’re everyday occurrences in real-world systems. 

✅ Strategic takeaway: Don’t deploy off-the-shelf LLMs into decision-making roles without investing in alignment and supervised fine-tuning. Trustworthy agentic AI can be achieved by focusing on how humans reason through exceptions — not just what they decide. Alignment isn’t about teaching rules; it’s about teaching judgment. 

Link to research paper: 

👉 https://arxiv.org/abs/2503.02976 

#AI #AgenticAI #LLMs #AIAgents #AIEthics #Automation #TrustworthyAI #ExceptionHandling #HumanAlignment

Agentic Workforce July 9, 2025
Share this post

Archive
What is the next workplace technology that every employee needs to know?
Quick Answer: Agentic AI is the next frontier—moving from passive assistants to autonomous collaborators.