Allow rewriting of user messages from input guardrails #1083
If you want to solve the prompt injection issue with this, I think you will need this feature. But maybe you envision fixing it in another way? Typically, for our experiments for Devoxx, we had a |
@gsmet Yes I did something similar as well, but I was thinking that we could do something like: @UserMessage("blablablabla {param1} and more blablabla2 {param2}")
String chat(@V("param1") @Guard String param1, @V("param2") String param2); So only So then, in InputGuardrails, we can have a method saying give me all the parameters values annotated with |
Yeah, that makes perfect sense |
I'm not entirely sure how this is supposed to work. Should the guardrail take in both the materialized output and the single annotated params? If so, what do you do if you change the value of a param? Do you perform the materialization again? At this point, in my opinion, it would be much clearer if we had a third form of guardrails, let's call them In this case the workflow that I envision is the following:
What do you think? |
Exactly, Mario, I totally agree with you; I didn't pay attention to this. We need a method that lets us change parameters and then flow as usual. |
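To make the "change parameters and then flow as usual" idea concrete, here is a minimal sketch in plain Java. It is not the Quarkus LangChain4j API: `ParamGuardSketch`, `guardAndMaterialize`, and the idea of passing the guarded parameter names as a set are all assumptions made for illustration. The guard runs only on the marked parameters, and the template is materialized once, afterwards.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.UnaryOperator;

public class ParamGuardSketch {

    // Hypothetical helper: rewrites only the guarded parameters, then
    // materializes the template a single time with the resulting values.
    public static String guardAndMaterialize(String template,
                                             Map<String, String> params,
                                             Set<String> guardedNames,
                                             UnaryOperator<String> guard) {
        Map<String, String> effective = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : params.entrySet()) {
            String value = guardedNames.contains(e.getKey())
                    ? guard.apply(e.getValue())   // guarded: may be rewritten
                    : e.getValue();               // unguarded: passed through
            effective.put(e.getKey(), value);
        }
        // Naive placeholder substitution, good enough for a sketch; a real
        // template engine would not re-scan already substituted values.
        String result = template;
        for (Map.Entry<String, String> e : effective.entrySet()) {
            result = result.replace("{" + e.getKey() + "}", e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("param1", "user\u0007 text");  // contains a control character
        params.put("param2", "trusted text");
        // Example guard: strip control characters from the guarded value.
        UnaryOperator<String> stripControlChars = s -> s.replaceAll("\\p{Cntrl}", "");
        System.out.println(guardAndMaterialize(
                "blablablabla {param1} and more blablabla2 {param2}",
                params, Set.of("param1"), stripControlChars));
    }
}
```

With this shape there is no second materialization step to worry about, because the guard never sees the materialized message at all.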
OK, if so, I suggest reviewing and possibly merging this pull request; we could then introduce the guardrails for params in a second pull request. |
Any news or comments on this? Is this a feature that we want? Or maybe we should only develop input guardrails working on user params as suggested by @lordofthejars ? /cc @cescoffier @geoand |
Hi, I'm wondering: can rewriting user messages have unintended side effects? Should invalid input messages be rejected instead? Sanitizing with |
Good question. Indeed, at the beginning I was also struggling to find a valid use case for input rewriting, so I didn't implement it. Subsequently I discussed this with @lordofthejars, and he pointed out that there are situations where you may want to rewrite the user input, for instance for anonymization: you rewrite the user prompt by replacing the names of people, companies, or products with placeholders, to avoid leaking sensitive information to the LLM service provider. |
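A rough sketch of what such an anonymizing rewrite could look like, in plain Java. The class name, the `anonymize` method, and the fixed name-to-placeholder map are hypothetical and not part of this pull request; a real implementation would more likely detect entities than rely on a hard-coded list.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AnonymizingRewriteSketch {

    // Hypothetical rewrite step: replaces each known sensitive name with its
    // placeholder, matching whole words only and ignoring case.
    public static String anonymize(String userMessage, Map<String, String> placeholders) {
        String result = userMessage;
        for (Map.Entry<String, String> e : placeholders.entrySet()) {
            Pattern p = Pattern.compile("\\b" + Pattern.quote(e.getKey()) + "\\b",
                    Pattern.CASE_INSENSITIVE);
            // quoteReplacement keeps '$' and '\' in placeholders literal.
            result = p.matcher(result).replaceAll(Matcher.quoteReplacement(e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> placeholders = new LinkedHashMap<>();
        placeholders.put("Alice", "<PERSON_1>");
        placeholders.put("Acme", "<COMPANY_1>");
        System.out.println(anonymize("Alice works at Acme.", placeholders));
    }
}
```

An input guardrail that is allowed to rewrite could return the anonymized text as the message actually sent to the LLM, while the application keeps the map to restore the names in the response.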
I agree that rewriting input can have side effects, but as the developer of the app you know how you are rewriting it, so maybe it is not that critical. Of course, you can always implement this change before invoking the LLM, but the good thing about guardrails is that you can have all of this logic in one place instead of spread across different places |
Thanks, minimizing the risk of the sensitive input content being leaked into LLM is a good use case. I'd probably consider reporting a message to the user instead, |
Yeah, removing weird chars is also an option. Yes, it can be done in a sanitize method as you mentioned, but if we can put everything in one place, why not? Of course, there are certainly workarounds that avoid having to rewrite the prompt. |
The way I see it, rewriting the user query is fine. |
This pull request introduces the possibility of rewriting user messages from the input guardrails. At the moment it is only possible to read and rewrite the complete materialized user message immediately before submitting it to the LLM. I don't know if it would make sense to also allow a rewrite at the level of the individual input parameters (before materializing the complete user message), but if required I'm open to iterating on this and possibly adding that as well.
/cc @lordofthejars @cescoffier @geoand
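For readers skimming the thread, the behavior described above can be sketched as a chain in which each guardrail either rejects the materialized user message or accepts it, possibly with rewritten text that the next guardrail then sees. This is an illustration in plain Java under my own assumptions, not the code in this pull request: `Result`, `accept`, `reject`, and `run` are hypothetical names.

```java
import java.util.List;
import java.util.function.Function;

public class RewritingGuardrailChainSketch {

    // Hypothetical outcome type: either accepted (carrying the possibly
    // rewritten text) or rejected (carrying the reason).
    public static final class Result {
        public final boolean accepted;
        public final String message;

        private Result(boolean accepted, String message) {
            this.accepted = accepted;
            this.message = message;
        }

        public static Result accept(String text) { return new Result(true, text); }
        public static Result reject(String reason) { return new Result(false, reason); }
    }

    // Runs the guardrails in order on the materialized user message.
    // The first rejection stops the chain; rewrites are passed along.
    public static Result run(String materializedUserMessage,
                             List<Function<String, Result>> guardrails) {
        String current = materializedUserMessage;
        for (Function<String, Result> guardrail : guardrails) {
            Result r = guardrail.apply(current);
            if (!r.accepted) {
                return r;
            }
            current = r.message;
        }
        return Result.accept(current);
    }

    public static void main(String[] args) {
        List<Function<String, Result>> guardrails = List.of(
                s -> Result.accept(s.trim()),  // rewriting guardrail
                s -> s.toLowerCase().contains("ignore previous instructions")
                        ? Result.reject("possible prompt injection")
                        : Result.accept(s));   // validating guardrail
        Result r = run("  What is the weather today?  ", guardrails);
        System.out.println(r.accepted + ": " + r.message);
    }
}
```

Keeping rejection and rewriting in the same result type means both approaches discussed in this thread (report a message to the user, or sanitize and continue) fit in one place.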