Allow rewriting of user messages from input guardrails #1083

Open · wants to merge 1 commit into main
Conversation

mariofusco (Contributor)

This pull request introduces the possibility of rewriting user messages from input guardrails. At the moment it is only possible to read and rewrite the complete materialized user message, immediately before submitting it to the LLM. I don't know if it would also make sense to allow rewriting at the level of single input parameters (before the complete user message is materialized), but if required I'm open to iterating on this and adding that possibility as well.
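
For illustration, a guardrail using this capability could look roughly like the following minimal sketch. The successWith(...) helper and the exact package names are assumptions based on the existing guardrail API, not necessarily the final shape of this change:

import dev.langchain4j.data.message.UserMessage;
import io.quarkiverse.langchain4j.guardrails.InputGuardrail;
import io.quarkiverse.langchain4j.guardrails.InputGuardrailResult;

public class NormalizingGuardrail implements InputGuardrail {

    @Override
    public InputGuardrailResult validate(UserMessage userMessage) {
        // Rewrite the fully materialized user message just before it
        // reaches the LLM: here, collapse runs of whitespace.
        String rewritten = userMessage.singleText().replaceAll("\\s+", " ").trim();
        // successWith(...) is the hypothetical helper carrying the rewritten text.
        return successWith(rewritten);
    }
}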

/cc @lordofthejars @cescoffier @geoand

mariofusco requested a review from a team as a code owner on November 15, 2024 15:14
quarkus-bot (bot) commented Nov 15, 2024

Status for workflow Build (on pull request)

This is the status report for running Build (on pull request) on commit c89bf9b.

✅ The latest workflow run for the pull request has completed successfully.

It should be safe to merge provided you have a look at the other checks in the summary.

gsmet (Member) commented Nov 15, 2024

I don't know if it would also make sense to allow rewriting at the level of single input parameters (before the complete user message is materialized)

If you want to solve the prompt injection issue with this, I think you will need that feature. But maybe you envision fixing it in another way?

Typically, in our experiments for Devoxx, we had a sanitize() method that replaced any --- in the inputs, since --- was used as the delimiter.
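
Such a sanitizer could look roughly like this (the original Devoxx code is not shown in this thread, so the body is an assumption):

// Sketch of the sanitize() idea: strip the "---" delimiter so user input
// cannot break out of its delimited prompt section (illustrative only).
static String sanitize(String input) {
    return input.replace("---", "");
}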

lordofthejars (Contributor) commented Nov 15, 2024

@gsmet Yes, I did something similar as well, but I was thinking we could do something like:

@UserMessage("blablablabla {param1} and more blablabla2 {param2}") 
String chat(@V("param1") @Guard String param1, @V("param2") String param2);

So only param1 is sent as an input guard variable. Parsing the input works with one parameter, but with multiple parameters we might end up doing a lot of regexp work.

So then, in InputGuardrails, we could have a method that returns all the parameter values annotated with @Guard.
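
As a rough sketch of what such an accessor could look like (all names here are hypothetical, not an agreed design):

import java.util.Map;

// Hypothetical accessor exposed to input guardrails: only the values of
// parameters annotated with @Guard, keyed by their template variable name
// (e.g. "param1" -> the raw argument passed to chat()).
public interface GuardedParameters {
    Map<String, Object> guardedParameterValues();
}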

geoand (Collaborator) commented Nov 15, 2024

Yeah, that makes perfect sense

mariofusco (Contributor, Author)

So then, in InputGuardrails, we could have a method that returns all the parameter values annotated with @Guard.

I'm not entirely sure how this is supposed to work. Should the guardrail take both the materialized message and the single annotated params? If so, what do you do when you change the value of a param? Do you perform the materialization again?

At this point, in my opinion, it would be much clearer to have a third form of guardrail, let's call it UserParamGuardrail, working at the level of a single param and invoked before the materialization of the whole message. This would give us even more flexibility: you could annotate different params with different guardrails.

In this case the workflow that I envision is the following (sketched in code after the list):

  1. Every single param with a UserParamGuardrail is validated and possibly rewritten by its own guardrail.
  2. The params (rewritten or not) are put together to create the materialized complete user message.
  3. The materialized user message is sent to the InputGuardrail (if any) for further validation and rewriting.
  4. The resulting materialized message finally hits the LLM.
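
A rough sketch of the proposed per-param guardrail (the UserParamGuardrail name comes from the comment above; the signature is an assumption, not an agreed design):

// Hypothetical per-parameter guardrail, invoked in step 1 above, before
// the user message is materialized. The returned value replaces the
// original argument in step 2; a validation failure could be signalled
// the same way existing guardrails do (assumption).
public interface UserParamGuardrail {
    Object validateAndRewrite(String paramName, Object paramValue);
}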

What do you think?

lordofthejars (Contributor)

Exactly, Mario, I totally agree with you; I hadn't paid attention to this. We need a method that lets us change parameters and then continue the flow as usual.

mariofusco (Contributor, Author)

Exactly, Mario, I totally agree with you; I hadn't paid attention to this. We need a method that lets us change parameters and then continue the flow as usual.

OK, in that case I suggest reviewing and merging this pull request first; we could then introduce the guardrails for params in a second pull request.

mariofusco (Contributor, Author)

Any news or comments on this? Is this a feature that we want? Or should we only develop input guardrails working on user params, as suggested by @lordofthejars?

/cc @cescoffier @geoand

sberyozkin (Contributor) commented Nov 20, 2024

Hi, I'm wondering whether rewriting user messages can have unintended side effects. Should invalid input messages be rejected instead? Sanitizing with sanitize() by removing unnecessary characters is nice; I'm just not sure what the use case is for changing some input text. Maybe I'm overthinking it, sorry.

mariofusco (Contributor, Author)

Hi, I'm wondering whether rewriting user messages can have unintended side effects. Should invalid input messages be rejected instead? Sanitizing with sanitize() by removing unnecessary characters is nice; I'm just not sure what the use case is for changing some input text. Maybe I'm overthinking it, sorry.

Good question. At the beginning I also struggled to find a valid use case for input rewriting, which is why I didn't implement it at first. I subsequently discussed this with @lordofthejars, and he pointed out that there are situations where you may want to rewrite the user input, for instance for anonymization purposes: e.g. you rewrite the user prompt by replacing people/company/product names with placeholders in order to avoid leaking sensitive information to the LLM service provider.
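
A minimal sketch of that anonymization use case, assuming the same hypothetical successWith(...) helper as above (the regex and placeholder scheme are illustrative):

import dev.langchain4j.data.message.UserMessage;
import io.quarkiverse.langchain4j.guardrails.InputGuardrail;
import io.quarkiverse.langchain4j.guardrails.InputGuardrailResult;

public class AnonymizingGuardrail implements InputGuardrail {

    @Override
    public InputGuardrailResult validate(UserMessage userMessage) {
        // Replace email addresses with a placeholder before the prompt
        // leaves the application; real anonymization would also cover
        // people/company/product names, e.g. via an NER model.
        String rewritten = userMessage.singleText()
                .replaceAll("[\\w.+-]+@[\\w-]+\\.[\\w.-]+", "<email>");
        return successWith(rewritten);
    }
}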

lordofthejars (Contributor)

I agree that rewriting input can have side effects, but first of all, as the developer of the app, you know how you are rewriting it, so maybe it is not that critical.

Of course, you can always implement this change before invoking the LLM, but the good thing about guards is that you can have all of them in one place, instead of having the logic spread across different places.

sberyozkin (Contributor)

Thanks, minimizing the risk of sensitive input content being leaked into the LLM is a good use case. I'd probably consider reporting a message to the user instead ("Please retry, the sensitivity score is too high"), but I can imagine how anonymization can be a useful mechanism as well.
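
For comparison, the rejection alternative could look roughly like this sketch (scoreSensitivity() and the threshold are hypothetical; failure(...) and success() are assumed to mirror the existing guardrail result helpers):

import dev.langchain4j.data.message.UserMessage;
import io.quarkiverse.langchain4j.guardrails.InputGuardrail;
import io.quarkiverse.langchain4j.guardrails.InputGuardrailResult;

public class SensitivityGuardrail implements InputGuardrail {

    private static final double THRESHOLD = 0.8;

    @Override
    public InputGuardrailResult validate(UserMessage userMessage) {
        // Reject instead of rewriting when the input looks too sensitive.
        if (scoreSensitivity(userMessage.singleText()) > THRESHOLD) {
            return failure("Please retry, the sensitivity score is too high");
        }
        return success();
    }

    private double scoreSensitivity(String text) {
        // Placeholder: plug in a PII detector or classifier here.
        return text.matches(".*\\d{16}.*") ? 1.0 : 0.0; // e.g. card numbers
    }
}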

lordofthejars (Contributor) commented Nov 20, 2024

Yeah, removing weird chars is also an option, and yes, it can be done as you mentioned with a sanitize method, but if we can put everything in one place, why not? Of course, there are certainly workarounds that avoid having to rewrite the prompt.

geoand (Collaborator) commented Nov 21, 2024

The way I see it, rewriting the user query is fine.
