Jailbreaking Attacks #143
-
A few things have to happen:
In my opinion, all of this has to happen before a jailbreak is even a possibility.
-
I might take back my answer, or rather say that I can't confidently say that anymore:
https://arxiv.org/abs/2311.07590
-
Looks like I underestimated how bad this can be. (They even called it Agent Smith!)
-
Really love the work I see getting done and think this is the sort of thing I've been waiting to see developed!
A security question.
Since no LLM is currently secure against jailbreaks, wouldn't any agent that communicates with people outside the swarm make the whole network vulnerable to attack? Say someone jailbreaks an agent and instructs it to use the same jailbreaking prompt on every agent it communicates with. Those agents get compromised, they issue the same attack to every agent they communicate with, and so on. Call it the Agent Smith scenario, where a compromised agent starts to threaten the whole matrix.
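To make the scenario concrete, here is a rough sketch of how far a single compromised agent could spread. It is only an illustration under worst-case assumptions: every forwarded jailbreak succeeds, and the agent names, the `swarm` dictionary, and the `compromised_agents` function are all hypothetical, not part of any real framework.

```python
from collections import deque


def compromised_agents(links, entry_point):
    """Return every agent reachable from entry_point by following message links.

    Worst-case assumption: a jailbroken agent successfully jailbreaks
    every agent it sends messages to.
    """
    compromised = {entry_point}
    queue = deque([entry_point])
    while queue:
        agent = queue.popleft()
        for peer in links.get(agent, ()):
            if peer not in compromised:
                compromised.add(peer)
                queue.append(peer)
    return compromised


# Hypothetical swarm: each key lists the agents it sends messages to.
swarm = {
    "frontdesk": ["planner"],  # the only agent that talks to outsiders
    "planner": ["coder", "reviewer"],
    "coder": ["reviewer"],
    "reviewer": ["planner"],
    "auditor": ["planner"],  # sends messages but never receives any
}

print(sorted(compromised_agents(swarm, "frontdesk")))
# → ['coder', 'frontdesk', 'planner', 'reviewer']
```

In this toy graph only `auditor` survives, because no agent ever sends it a message. So the blast radius is whatever is reachable from the outward-facing agent, which is why the communication topology matters as much as the robustness of any single agent.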