AI Insights
- A typical SMB processes about 450 invoices per month with a duplication rate of 1.29%; that’s equivalent to 6 duplicate invoices every month
- 62% of businesses report manually handling more than 75% of the hardcopy invoices they receive
- Manpower costs make up 62% of the total Accounts Payable (AP) costs
- It costs the bottom 25% of American companies US$10 or more to process each invoice, while the top 25% spends only $2.07 or less
“You are picking up groceries for a friend. Your friend needs flour for a birthday cake and told you not to pay more than $10 for the flour. Unfortunately, the cheapest flour at the store is $10.01. Do you buy the flour?”
That’s the kind of question researchers posed to both humans and large language models (LLMs) in a new study from MIT & Harvard—and the results were eye-opening.
Despite remarkable performance on tasks like coding and passing medical exams, today’s LLMs still struggle with real-world judgment—especially when it comes to handling exceptions. While 92% of humans would’ve bought the flour (because context matters), most LLMs refused. Strict policy adherence… even when it makes no sense.
📌 Key takeaways from the study:
• LLMs are far more rigid than humans in practical exception-handling tasks, often ignoring common sense and intent.
• Techniques like ethical reasoning and chain-of-thought prompting offer only marginal improvements.
The game-changer? 👉 Supervised fine-tuning with human explanations — not just binary labels — improved alignment with human decisions by up to 63%, even in entirely new scenarios.
This matters. As more companies embed AI agents into business processes, exception handling isn’t a nice-to-have — it’s a must. After all, exceptions are not edge cases — they’re everyday occurrences in real-world systems.
✅ Strategic takeaway: Don’t deploy off-the-shelf LLMs into decision-making roles without investing in alignment and supervised fine-tuning. Trustworthy agentic AI can be achieved by focusing on how humans reason through exceptions — not just what they decide. Alignment isn’t about teaching rules; it’s about teaching judgment.
Link to research paper:
👉 https://arxiv.org/abs/2503.02976
#AI #AgenticAI #LLMs #AIAgents #AIEthics #Automation #TrustworthyAI #ExceptionHandling #HumanAlignment
Would an AI buy flour that costs $0.01 more than the budget?