Allow payload request to support extra inference method kwargs #1345
Comments
@adriangonz, I have a PR for this issue too: https://github.com/SeldonIO/MLServer/pull/1418
Hey @nanbo-liu, As discussed in #1418, we don't have much control over the shape of the request payload. However, the good news is that extra kwargs can be passed through the request-level `parameters` field. Following your example, you could have something like:

```json
{
  "inputs": [
    {
      "name": "text_inputs",
      "shape": [2],
      "datatype": "BYTES",
      "data": ["My kitten's name is JoJo,", "Tell me a story:"]
    }
  ],
  "parameters": {
    "max_new_tokens": 200
  }
}
```
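A request of that shape could be sent to MLServer with a small client-side sketch like the one below. The endpoint URL and model name are assumptions, not values from this thread:

```python
import json
import urllib.request

# The payload above, expressed as a Python dict.
payload = {
    "inputs": [
        {
            "name": "text_inputs",
            "shape": [2],
            "datatype": "BYTES",
            "data": ["My kitten's name is JoJo,", "Tell me a story:"],
        }
    ],
    "parameters": {"max_new_tokens": 200},
}

def infer(url="http://localhost:8080/v2/models/text-generation/infer"):
    """POST the payload to a (hypothetical) MLServer V2 endpoint and decode the response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```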
Hi @adriangonz, we are still running into an issue with this. The Python code works fine for passing in new tokens via kwargs, or in a config like you listed above, when using the

We could potentially just extract any parameter kwargs from the payload request and append them to the kwargs list?
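The suggestion above could be sketched roughly as follows. The reserved-field names and function names are assumptions for illustration, not MLServer's actual API:

```python
from typing import Optional

# Fields of the V2 "parameters" object that are protocol-level, not pipeline kwargs.
# This set is a guess for illustration purposes.
RESERVED_FIELDS = {"content_type", "headers"}

def merge_pipeline_kwargs(parameters: Optional[dict], base_kwargs: dict) -> dict:
    """Return base_kwargs extended with any non-reserved request parameters."""
    merged = dict(base_kwargs)
    for key, value in (parameters or {}).items():
        if key not in RESERVED_FIELDS:
            merged[key] = value
    return merged

kwargs = merge_pipeline_kwargs(
    {"content_type": "str", "max_new_tokens": 200},
    {"return_full_text": False},
)
# kwargs == {"return_full_text": False, "max_new_tokens": 200}
```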
We have the same issue and don't know how to solve it.
Ah, I see... that would need some changes to the HF runtime to take into account what's passed via the request-level `parameters` field, while avoiding using by mistake other fields of the payload that aren't meant for the pipeline.
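One way to avoid forwarding unintended fields would be to filter the parameters against the pipeline callable's own signature. This is an illustrative sketch with hypothetical names, not the approach taken in any linked PR:

```python
import inspect

def allowed_kwargs(func, params: dict) -> dict:
    """Keep only the entries of params that func can actually accept as kwargs."""
    sig = inspect.signature(func)
    # If func takes **kwargs, everything can be forwarded.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in sig.parameters.values()):
        return dict(params)
    return {k: v for k, v in params.items() if k in sig.parameters}

def fake_generate(text, max_new_tokens=20, return_full_text=True):
    # Stand-in for a transformers pipeline call.
    return text

print(allowed_kwargs(fake_generate, {"max_new_tokens": 200, "content_type": "str"}))
# prints {'max_new_tokens': 200}
```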
@adriangonz, I opened up another PR for this: #1505
`max_new_tokens` and `return_full_text` are extra arguments we can pass into the pipeline's predict method. `max_new_tokens` represents the maximum number of tokens to generate, ignoring the number of tokens in the prompt. We wish they could be passed in the payload request, something like:
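The snippet that originally followed did not survive extraction; based on the rest of the thread, the wished-for payload presumably had a shape like this illustrative reconstruction (field values are examples, not the author's original):

```python
# Illustrative only: a V2 inference payload carrying the two extra pipeline kwargs.
payload = {
    "inputs": [
        {
            "name": "text_inputs",
            "shape": [1],
            "datatype": "BYTES",
            "data": ["Tell me a story:"],
        }
    ],
    "parameters": {
        "max_new_tokens": 200,      # cap on newly generated tokens, prompt excluded
        "return_full_text": False,  # return only the completion, not prompt + completion
    },
}
```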