
How reliable are AI agents at completing real-world tasks today?

Not very. Even the most capable AI agents complete less than one-third of professional tasks autonomously.

AI Insights

  • State-of-the-art does not equate to job readiness: Gemini 2.5 Pro, the top performer, succeeded on just 30.3% of workplace tasks in a controlled benchmark. 
  • Failure on “simple” tasks: Agents struggled more with admin and finance workflows than with coding tasks, largely due to weak social reasoning and poor handling of web interfaces.
  • Progress comes at a price: High-performing agents took 27+ steps per task and cost over $4 per task, which raises questions about real-world scalability.

We talk a lot about what AI agents could do—but what can they actually do today? 

TheAgentCompany, a newly released benchmark from CMU, evaluates AI agents on 175 real-world tasks inside a simulated software company—everything from sprint planning to resume screening and tax form completion. 

The findings? We’re still far from full autonomy. Even top-tier models like Gemini 2.5 Pro could only complete 30% of tasks without human help. Worse, they fumbled tasks that seem trivial to humans—like finding information in shared drives or messaging a colleague for clarification—precisely the type of mundane, menial work that most employees would love to outsource. 

It turns out the hard part isn't “thinking”; it’s navigating messy tools, collaborating effectively, and staying on track. It’s also a gentle reminder that, for all the current hype around agentic AI, tried-and-tested technologies like Robotic Process Automation (RPA) still have their place in any enterprise software stack, especially if you value automation that is reliable and repeatable.

If you're building or buying into AI agents, this benchmark is a must-read. It’s the clearest picture yet of what’s real, what’s not, and where we go next. 

📍 Full benchmark: https://the-agent-company.com 

🧠 Code & tasks: https://github.com/TheAgentCompany 

#AIagents #AgenticAI #AgenticAutomation #EnterpriseAI #DigitalWorkforce #LLMbenchmarks #AIstrategy #Automation #FutureOfWork #GenerativeAI

Agentic Workforce July 30, 2025
