Security researchers tricked LLMs into giving them cocaine recipes by abusing role models for prompt injection
AI + ML
If you want a picture of the future of LLM security, imagine Whac-a-Mole meets Groundhog Day
Researchers say that machine learning models cannot reliably distinguish between authorized and unauthorized input, ensuring that prompt injection will continue to present a threat until developers find new ways to have machine learning systems process inputs.
AI models provide responses to user-supplied prompts. The problem is that AI models may receive adversarial prompts – directly from a user or indirectly from an ingested document – that tell the model to take action contrary to its built-in system prompt.
Various techniques mitigate prompt injection, but defenders have not found ways to prevent such attacks.
According to independent researchers Charles Ye and Jasmine Cui, and MIT associate professor Dylan Hadfield-Menell, no one is likely to do so under the current fragile LLM security model.
As they observe in a papertitled "Prompt Injection...
Copyright of this story solely belongs to theregister.com. To see the full text click HERE