Leading AI models are more vulnerable to malicious prompts than vendors claim
Dive Brief:
- Major AI developers’ model-safety claims rest on incorrect assumptions about how hackers behave, Cisco researchers said in a report published on Wednesday.
- AI vendors assume that their models are safe from hijacking if they can fend off a single malicious prompt at a time, but hackers are increasingly using multistage prompts to evade model defenses, Cisco said, and most models aren’t prepared for those kinds of attacks.
- The new report highlights a largely underappreciated danger in AI models, one that could expose businesses using these tools to a wide range of disruptions and harm.
Dive Insight:
Cisco’s evaluation of 15 leading AI models from OpenAI, Anthropic, Google, Amazon and xAI “found that single-turn attack success rate (ASR) is not a reliable proxy for what happens when an attacker can adapt across turns,” researchers Nicholas Conley and Amy Chang wrote.
Their tests revealed that AI models were much...
Copyright of this story solely belongs to ciodive.com.