Allow rewriting of user messages from input guardrails #1083

Open · wants to merge 1 commit into main
Conversation

mariofusco (Contributor)

This pull request introduces the possibility of rewriting user messages from input guardrails. At the moment it is only possible to read and rewrite the complete materialized user message, immediately before submitting it to the LLM. I don't know if it would also make sense to allow rewriting at the level of single input parameters (before the complete user message is materialized), but if required I'm open to iterating on this and adding that possibility as well.
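
For illustration, a guardrail using this capability could look roughly like the following minimal sketch. The successWith(...) helper and the exact package names are assumptions based on the existing guardrail API, not necessarily the final shape of this change:

import dev.langchain4j.data.message.UserMessage;
import io.quarkiverse.langchain4j.guardrails.InputGuardrail;
import io.quarkiverse.langchain4j.guardrails.InputGuardrailResult;

public class NormalizingGuardrail implements InputGuardrail {

    @Override
    public InputGuardrailResult validate(UserMessage userMessage) {
        // Rewrite the fully materialized user message just before it
        // reaches the LLM: here, collapse runs of whitespace.
        String rewritten = userMessage.singleText().replaceAll("\\s+", " ").trim();
        // successWith(...) is the hypothetical helper carrying the rewritten text.
        return successWith(rewritten);
    }
}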

/cc @lordofthejars @cescoffier @geoand

mariofusco requested a review from a team as a code owner on November 15, 2024 15:14
quarkus-bot (bot) commented Nov 15, 2024

Status for workflow Build (on pull request)

This is the status report for running Build (on pull request) on commit c89bf9b.

✅ The latest workflow run for the pull request has completed successfully.

It should be safe to merge provided you have a look at the other checks in the summary.

gsmet (Member) commented Nov 15, 2024

I don't know if it would also make sense to allow rewriting at the level of single input parameters (before the complete user message is materialized)

If you want to solve the prompt injection issue with this, I think you will need that feature. But maybe you envision fixing it in another way?

Typically, in our experiments for Devoxx, we had a sanitize() method that replaced any --- in the inputs, since --- was used as the delimiter.
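
Such a sanitizer could look roughly like this (the original Devoxx code is not shown in this thread, so the body is an assumption):

// Sketch of the sanitize() idea: strip the "---" delimiter so user input
// cannot break out of its delimited prompt section (illustrative only).
static String sanitize(String input) {
    return input.replace("---", "");
}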

lordofthejars (Contributor) commented Nov 15, 2024

@gsmet Yes, I did something similar as well, but I was thinking we could do something like:

@UserMessage("blablablabla {param1} and more blablabla2 {param2}") 
String chat(@V("param1") @Guard String param1, @V("param2") String param2);

So only param1 is sent as an input guard variable. Parsing the input works with one parameter, but with multiple parameters we might end up doing a lot of regexp work.

So then, in InputGuardrails, we could have a method that returns all the parameter values annotated with @Guard.
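
As a rough sketch of what such an accessor could look like (all names here are hypothetical, not an agreed design):

import java.util.Map;

// Hypothetical accessor exposed to input guardrails: only the values of
// parameters annotated with @Guard, keyed by their template variable name
// (e.g. "param1" -> the raw argument passed to chat()).
public interface GuardedParameters {
    Map<String, Object> guardedParameterValues();
}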

geoand (Collaborator) commented Nov 15, 2024

Yeah, that makes perfect sense

mariofusco (Contributor, Author)

So then, in InputGuardrails, we could have a method that returns all the parameter values annotated with @Guard.

I'm not entirely sure how this is supposed to work. Should the guardrail take both the materialized message and the single annotated params? If so, what do you do when you change the value of a param? Do you perform the materialization again?

At this point, in my opinion, it would be much clearer to have a third form of guardrail, let's call it UserParamGuardrail, working at the level of a single param and invoked before the materialization of the whole message. This would give us even more flexibility: you could annotate different params with different guardrails.

In this case the workflow that I envision is the following (sketched in code after the list):

  1. Every single param with a UserParamGuardrail is validated and possibly rewritten by its own guardrail.
  2. The params (rewritten or not) are put together to create the materialized complete user message.
  3. The materialized user message is sent to the InputGuardrail (if any) for further validation and rewriting.
  4. The resulting materialized message finally hits the LLM.
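
A rough sketch of the proposed per-param guardrail (the UserParamGuardrail name comes from the comment above; the signature is an assumption, not an agreed design):

// Hypothetical per-parameter guardrail, invoked in step 1 above, before
// the user message is materialized. The returned value replaces the
// original argument in step 2; a validation failure could be signalled
// the same way existing guardrails do (assumption).
public interface UserParamGuardrail {
    Object validateAndRewrite(String paramName, Object paramValue);
}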

What do you think?

lordofthejars (Contributor)

Exactly, Mario, I totally agree with you; I hadn't paid attention to this. We need a method that lets us change parameters and then continue the flow as usual.

mariofusco (Contributor, Author)

Exactly, Mario, I totally agree with you; I hadn't paid attention to this. We need a method that lets us change parameters and then continue the flow as usual.

OK, in that case I suggest reviewing and merging this pull request first; we could then introduce the guardrails for params in a second pull request.

mariofusco (Contributor, Author)

Any news or comments on this? Is this a feature that we want? Or should we only develop input guardrails working on user params, as suggested by @lordofthejars?

/cc @cescoffier @geoand

sberyozkin (Contributor) commented Nov 20, 2024

Hi, I'm wondering whether rewriting user messages can have unintended side effects. Should invalid input messages be rejected instead? Sanitizing with sanitize() by removing unnecessary characters is nice; I'm just not sure what the use case is for changing some input text. Maybe I'm overthinking it, sorry.

mariofusco (Contributor, Author)

Hi, I'm wondering whether rewriting user messages can have unintended side effects. Should invalid input messages be rejected instead? Sanitizing with sanitize() by removing unnecessary characters is nice; I'm just not sure what the use case is for changing some input text. Maybe I'm overthinking it, sorry.

Good question. At the beginning I also struggled to find a valid use case for input rewriting, which is why I didn't implement it at first. I subsequently discussed this with @lordofthejars, and he pointed out that there are situations where you may want to rewrite the user input, for instance for anonymization purposes: e.g. you rewrite the user prompt by replacing people/company/product names with placeholders in order to avoid leaking sensitive information to the LLM service provider.
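
A minimal sketch of that anonymization use case, assuming the same hypothetical successWith(...) helper as above (the regex and placeholder scheme are illustrative):

import dev.langchain4j.data.message.UserMessage;
import io.quarkiverse.langchain4j.guardrails.InputGuardrail;
import io.quarkiverse.langchain4j.guardrails.InputGuardrailResult;

public class AnonymizingGuardrail implements InputGuardrail {

    @Override
    public InputGuardrailResult validate(UserMessage userMessage) {
        // Replace email addresses with a placeholder before the prompt
        // leaves the application; real anonymization would also cover
        // people/company/product names, e.g. via an NER model.
        String rewritten = userMessage.singleText()
                .replaceAll("[\\w.+-]+@[\\w-]+\\.[\\w.-]+", "<email>");
        return successWith(rewritten);
    }
}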

lordofthejars (Contributor)

I agree that rewriting input can have side effects, but first of all, as the developer of the app, you know how you are rewriting it, so maybe it is not that critical.

Of course, you can always implement this change before invoking the LLM, but the good thing about guards is that you can have all of them in one place, instead of having the logic spread across different places.

sberyozkin (Contributor)

Thanks, minimizing the risk of sensitive input content being leaked into the LLM is a good use case. I'd probably consider reporting a message to the user instead ("Please retry, the sensitivity score is too high"), but I can imagine how anonymization can be a useful mechanism as well.
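
For comparison, the rejection alternative could look roughly like this sketch (scoreSensitivity() and the threshold are hypothetical; failure(...) and success() are assumed to mirror the existing guardrail result helpers):

import dev.langchain4j.data.message.UserMessage;
import io.quarkiverse.langchain4j.guardrails.InputGuardrail;
import io.quarkiverse.langchain4j.guardrails.InputGuardrailResult;

public class SensitivityGuardrail implements InputGuardrail {

    private static final double THRESHOLD = 0.8;

    @Override
    public InputGuardrailResult validate(UserMessage userMessage) {
        // Reject instead of rewriting when the input looks too sensitive.
        if (scoreSensitivity(userMessage.singleText()) > THRESHOLD) {
            return failure("Please retry, the sensitivity score is too high");
        }
        return success();
    }

    private double scoreSensitivity(String text) {
        // Placeholder: plug in a PII detector or classifier here.
        return text.matches(".*\\d{16}.*") ? 1.0 : 0.0; // e.g. card numbers
    }
}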

lordofthejars (Contributor) commented Nov 20, 2024

Yeah, removing weird chars is also an option, and yes, it can be done as you mentioned with a sanitize method, but if we can put everything in one place, why not? Of course, there are certainly workarounds that avoid having to rewrite the prompt.

geoand (Collaborator) commented Nov 21, 2024

The way I see it, rewriting the user query is fine.
