New study shows AI isn’t ready for office work

Mercor’s APEX-Agents benchmark finds top AI models score under 25% accuracy on realistic consulting, legal, and finance tasks. The post New study shows AI isn’t ready for office work appeared first on Digital Trends.

Kass

Jan 25, 2026 - 07:58

Your job is safe for now as AI still struggles with real office tasks

Levart_Photographer / Unsplash

It has been nearly two years since Microsoft CEO Satya Nadella predicted that generative AI would take over knowledge work, but if you look around a typical law firm or investment bank today, the human workforce is still very much in charge. Despite all the hype about “reasoning” and “planning,” a new study from training-data company Mercor explains exactly why the robot revolution is stalled: AI just can’t handle the messiness of real work.

A reality check for the “replacement” theory

Mercor released a new benchmark called APEX-Agents, and it is brutal. Unlike the usual tests that ask AI to write a poem or solve a math problem, this one uses actual queries from lawyers, consultants, and bankers. It asks the models to do complete, multi-step tasks that require jumping between different types of information.

The results? Even the absolute best models on the market—we are talking about Gemini 3 Flash and GPT-5.2—couldn’t crack a 25% accuracy rate. Gemini led the pack at 24%, with GPT-5.2 right behind it at 23%. Most others were stuck in the teens.

Why AI is failing the “office test”

Mercor CEO Brendan Foody points out that the issue isn’t raw intelligence; it’s context. In the real world, answers aren’t served up on a silver platter. A lawyer has to check a Slack thread, read a PDF policy, look at a spreadsheet, and then synthesize all that to answer a question about GDPR compliance.

Humans do this context-switching naturally. AI, it turns out, is terrible at it. When you force these models to hunt for information across “scattered” sources, they get confused, give the wrong answer, or just give up entirely.

The “Unreliable Intern”

For anyone worried about their job security, this is a bit of a relief. The study suggests that right now, AI functions less like a seasoned professional and more like an unreliable intern who gets things right about a quarter of the time.

That said, the progress is terrifyingly fast. Foody noted that just a year ago, these models were scoring between 5% and 10%. Now they are hitting 24%. So, while they aren’t ready to take the wheel yet, they are learning to drive much faster than we expected. For now, though, the “knowledge work” revolution is on hold until the bots learn how to multitask.

Moinak Pal

Moinak Pal has been working in the technology sector, covering both consumer-centric tech and automotive technology for the…
