Mean Pooling Was Hiding Prompt Injections in Our RAG Pipeline
I’ve been spending way too much time lately staring at cosine similarity scores for RAG injections, and the numbers made no sense. In one notebook I was comparing a standard corporate email against a version of the same email with a malicious "write a keylogger" command tucked into the middle. The scores were almost identical (0.98 vs. 0.96). The model basically couldn't see the attack at all, even though it was right there in plain English.
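To make the dilution effect concrete, here is a minimal sketch in plain Python. It does not use a real encoder: the "token embeddings" are synthetic vectors scattered around two made-up directions (`benign_dir` and `attack_dir`), and the token counts (200 benign, 10 injected) are assumptions chosen only for illustration. It is not the notebook described above and will not reproduce the 0.98 and 0.96 figures; it only shows why averaging over many tokens can bury a short off-topic span.

```python
import math
import random


def mean_pool(token_vecs):
    """Average a list of equal-length token vectors into one document vector."""
    dim = len(token_vecs[0])
    n = len(token_vecs)
    return [sum(v[i] for v in token_vecs) / n for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def noisy(center, rng, scale=0.3):
    """Return a copy of `center` with Gaussian noise added to every dimension."""
    return [c + rng.gauss(0.0, scale) for c in center]


rng = random.Random(0)  # fixed seed so the run is repeatable
dim = 64

# Hypothetical "topic" directions: benign text lives in the first half of the
# dimensions, the injected instruction in the second half (an assumption).
benign_dir = [1.0 if i < dim // 2 else 0.0 for i in range(dim)]
attack_dir = [0.0 if i < dim // 2 else 1.0 for i in range(dim)]

# 200 benign tokens; the injected document adds 10 attack tokens in the middle.
clean = [noisy(benign_dir, rng) for _ in range(200)]
attack = [noisy(attack_dir, rng) for _ in range(10)]
injected = clean[:100] + attack + clean[100:]

clean_mean = mean_pool(clean)

# Document-level view: the pooled vectors barely differ.
doc_sim = cosine(clean_mean, mean_pool(injected))

# Token-level view: the injected tokens sit far from the pooled clean vector.
min_token_sim = min(cosine(tok, clean_mean) for tok in injected)

print(f"pooled document similarity: {doc_sim:.3f}")
print(f"lowest per-token similarity: {min_token_sim:.3f}")
```

In this toy setup the ten injected tokens are under 5% of the document, so they shift the pooled vector only slightly, while the per-token comparison still separates them clearly. Real encoder outputs are contextual and far messier, so treat this as an intuition aid rather than a detector.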
I’m an AI Product Lead at LatentView Analytics. We’ve been trying to harden RAG pipelines for some Fortune 500 clients, and everyone is pretty worried about prompt injection, specifically the "indirect" kind, where the attack is hidden in a retrieved document. If you saw the Slack AI incident in 2024, that’s the exact threat model.
My goal was to build a really cheap defense layer. Since your encoder is already turning every...
Copyright of this story solely belongs to hackernoon.com.