Microsoft researchers find AI models and agents can't handle long-running tasks


Companies exploring automated workflows would be well advised to keep their AI agents on a short leash.

Microsoft researchers have found that even the priciest frontier models introduce errors in long workflows, the very thing for which AI software has been pitched.

Anthropic, for example, says, "Claude Cowork handles tasks autonomously. Give it a goal and Claude works on your computer, local files, and applications to return a finished deliverable."

Redmond promotes similar usage, touting Microsoft 365 Copilot's ability to "Tackle complex, multistep research across your work data and the web."

The Windows maker's scientists aren't so sure about that.

Philippe Laban, Tobias Schnabel, and Jennifer Neville from Microsoft Research set out to study what happens when large language models (LLMs) are asked to complete multistep tasks.

They recently published their findings in a preprint paper with a spoiler title: "LLMs Corrupt Your Documents When You Delegate."

To test how...

Copyright of this story belongs solely to theregister.com.
