Anthropic flags AI's potential to 'automate sophisticated destructive cyber attacks'

Anthropic, maker of the Claude family of large language models, this week updated its policy for safety controls over its software to reflect what it says is the potential for malicious actors to exploit the AI models to automate cyber attacks. 

The PDF document detailing the company's "responsible scaling policy" outlines several procedural changes that it says are needed to monitor the ongoing risks of misuse of AI models. These include several escalating levels of risk, known as AI Safety Level Standards (ASL), defined as "technical and operational safeguards."

As part of the company's "routine testing" of AI models for safety -- known as a "capability assessment" -- Anthropic reports that it has uncovered a capability that "requires significant investigation and may require stronger safeguards." 

That capability is described as a threat within cyber operations ...

Copyright of this story belongs solely to zdnet.com, where the full text is available.