How reliable are AI agents at completing real-world tasks today?

Not very. Even the most capable AI agents complete less than one-third of professional tasks autonomously.

AI Insights

State-of-the-art does not equate to job readiness: Gemini 2.5 Pro, the top performer, succeeded on just 30.3% of workplace tasks in a controlled benchmark.
Failure on “simple” tasks: Agents struggled more with admin and finance workflows than with coding tasks, due to poor social reasoning and UI handling.
Progress comes at a price: High-performing agents took 27+ steps per task and cost over $4 each—raising questions about real-world scalability.
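
To put the cost figure in perspective, here is a rough back-of-the-envelope sketch of what a *successful* task might cost once failed attempts are counted. It rests on assumptions the benchmark does not state: that the roughly $4 per-task cost and the 30.3% success rate apply to the same agent, that failed attempts cost as much as successful ones, and that retries are independent. The function name is hypothetical.

```python
def expected_cost_per_success(cost_per_attempt: float, success_rate: float) -> float:
    """Expected spend to obtain one successful completion.

    Assumption (not from the benchmark): every attempt costs the same and
    attempts are independent, so the expected number of attempts needed
    for one success is 1 / success_rate.
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_attempt / success_rate


# Illustrative inputs taken from the figures quoted above:
# about $4 per attempt and a 30.3% success rate.
cost = expected_cost_per_success(4.00, 0.303)
print(f"${cost:.2f} per successfully completed task")
```

Under those assumptions, the effective price of one completed task is roughly three times the per-attempt cost, before counting any human time spent checking the output.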

We talk a lot about what AI agents could do—but what can they actually do today?

TheAgentCompany, a newly released benchmark from CMU, evaluates AI agents on 175 real-world tasks inside a simulated software company—everything from sprint planning to resume screening and tax form completion.

The findings? We’re still far from full autonomy. Even top-tier models like Gemini 2.5 Pro could only complete 30% of tasks without human help. Worse, they fumbled tasks that seem trivial to humans—like finding information in shared drives or messaging a colleague for clarification—precisely the type of mundane, menial work that most employees would love to outsource.

It turns out the hard part isn't “thinking”; it’s navigating messy tools, collaborating effectively, and staying on track. It is also a gentle reminder that, for all the hype around agentic AI at the moment, tried-and-tested technologies like Robotic Process Automation (RPA) still have their place in any enterprise software stack, especially if you value automation that is reliable and repeatable.

If you're building or buying into AI agents, this benchmark is a must-read. It’s the clearest picture yet of what’s real, what’s not, and where we go next.

📍 Full benchmark: https://the-agent-company.com

🧠 Code & tasks: https://github.com/TheAgentCompany

#AIagents #AgenticAI #AgenticAutomation #EnterpriseAI #DigitalWorkforce #LLMbenchmarks #AIstrategy #Automation #FutureOfWork #GenerativeAI

in Quick Bites

# AI agents Agentic AI reliability

Agentic Workforce July 30, 2025
