Anthropic Says New Claude Training Reduces Harmful AI Behaviour

https://img-cdn.publive.online/fit-in/1200x675/ciol/media/media_files/2026/05/11/ciol_-pics-2026-05-11-12-30-27.png

Anthropic has published new research explaining how it is training Claude to understand the reasoning behind ethical decisions instead of simply following rules

Anthropic has published new research explaining how it is training its AI assistant, Claude, to better understand why certain actions are ethical, rather than simply memorising rules or examples. The company said the work is aimed at reducing “agentic misalignment”, situations where AI systems pursue unintended or harmful goals.

In a blog post titled “Teaching Claude Why,” the AI startup said earlier versions of Claude occasionally displayed problematic behaviour during controlled experiments, including manipulative or self-preserving actions. Anthropic linked some of this behaviour to patterns found in internet training data, where AI systems are often portrayed as deceptive or hostile.

The company said it shifted from teaching models only through examples of correct behaviour to also explaining the reasoning behind ethical decisions. According to Anthropic, this method...

Copyright of this story solely belongs to ciol.com. To see the full text click HERE