Allow payload request to support extra inference method kwargs #1505
Conversation
@adriangonz, please take a look when you have a chance.

@adriangonz @sakoush could we please have this reviewed when either of you has a chance?
Many thanks for your contribution. I left some comments, mainly around testing.
Please also update the HF runtime docs accordingly.
(
    {"max_length": 20},
    {"max_length": 10},
    True,
It is unclear what "expected" means given the test case; I suggest refactoring a bit to make it clearer. It might be that you just need to assert that the number of tokens in each request matches what is expected (effectively converting it into 2 test cases).
Updated this into 2 test cases. Also asserting that the number of predicted tokens is the expected number of tokens.
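For illustration, a minimal sketch of what that split into two parametrized cases might look like (the `runtime` fixture reuse follows the existing test code in this PR, but the `build_payload` helper and the exact assertion are assumptions, not the actual test file):

```python
import json

import pytest


@pytest.mark.parametrize("max_length", [10, 20])
async def test_extra_kwargs_limit_generated_tokens(runtime, build_payload, max_length):
    # build_payload is a hypothetical helper that places the extra kwargs
    # into InferenceRequest.parameters.extra for this sketch.
    payload = build_payload(extra={"max_length": max_length})

    prediction = await runtime.predict(payload)
    generated_text = json.loads(prediction.outputs[0].data[0])["generated_text"]

    # Assert that the number of generated tokens respects max_length.
    tokenizer = runtime._model.tokenizer
    num_tokens = len(tokenizer(generated_text)["input_ids"])
    assert num_tokens <= max_length
```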
import pdb

pdb.set_trace()
^
oops
@sakoush, would you take a look when you have a chance?
This seems to be the solution to my problem. I'm using MLServer with something like ... Fingers crossed that this will get merged soon.
Many thanks for the changes so far. I left a few minor suggestions, mainly on additional test cases.
Apologies for the slow response due to the holiday season.
@@ -170,6 +171,10 @@ def encode_request(cls, payload: Dict[str, Any], **kwargs) -> InferenceRequest:

    @classmethod
    def decode_request(cls, request: InferenceRequest) -> Dict[str, Any]:
        """
        Decode Inference requst into dictionary
Suggested change:
- Decode Inference requst into dictionary
+ Decode Inference request into dictionary
@@ -170,6 +171,10 @@ def encode_request(cls, payload: Dict[str, Any], **kwargs) -> InferenceRequest:

    @classmethod
    def decode_request(cls, request: InferenceRequest) -> Dict[str, Any]:
        """
        Decode Inference requst into dictionary
        extra Inference kwargs can be kept in 'InferenceRequest.parameters.extra'
Suggested change:
- extra Inference kwargs can be kept in 'InferenceRequest.parameters.extra'
+ extra Inference kwargs are extracted from 'InferenceRequest.parameters.extra'
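For context, a rough sketch of how decode_request might merge those extra kwargs (the decoding of the regular inputs and the surrounding codec class are omitted; treat this as an assumption about the shape of the change, not the exact diff):

```python
import logging
from typing import Any, Dict

from mlserver.types import InferenceRequest


def decode_request(request: InferenceRequest) -> Dict[str, Any]:
    """
    Decode an InferenceRequest into a dictionary.
    Extra inference kwargs are extracted from 'InferenceRequest.parameters.extra'.
    """
    values: Dict[str, Any] = {}  # decoding of request.inputs omitted in this sketch

    # parameters and its 'extra' field are both optional on the request.
    extra = (
        getattr(request.parameters, "extra", None)
        if request.parameters is not None
        else None
    )
    if isinstance(extra, dict):
        values.update(extra)
    elif extra is not None:
        logging.warning("Extra parameters cannot be parsed, expected a dictionary.")

    return values
```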
        values.update(extra)
    else:
        logging.warn(
            "Extra inference kwargs should be kept in a dictionary."
Could you output the value of the parameter in the warning message as well? And perhaps change the warning to something like "Extra parameters cannot be parsed, expected a dictionary" to make the message more descriptive?
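Something along these lines is presumably what is being asked for (the exact wording and the placeholder value are assumptions):

```python
import logging

# 'extra' stands in for the value read from InferenceRequest.parameters.extra
# in the surrounding code; here it is stubbed so the snippet runs on its own.
extra = "not-a-dict"

# Note that logging.warning is preferred over the deprecated logging.warn.
logging.warning(
    "Extra parameters cannot be parsed, expected a dictionary. Got: %s", extra
)
```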
tokenizer = runtime._model.tokenizer

prediction = await runtime.predict(payload)
generated_text = json.loads(prediction.outputs[0].data[0])["generated_text"]
Could you try using the HF codec's decode_response method and check if it makes this line a bit more readable?
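Presumably something like the following is meant (the import path and the return shape of decode_response are assumptions based on the reviewer's hint, not a confirmed API):

```python
from mlserver_huggingface.codecs import HuggingfaceRequestCodec  # import path assumed


async def test_generated_text_via_codec(runtime, payload):
    # runtime and payload are the existing fixtures from this PR's test file.
    prediction = await runtime.predict(payload)

    # decode_response is assumed here to return the pipeline output as a
    # dictionary, replacing the manual json.loads(prediction.outputs[0].data[0])
    # call; the exact shape may differ.
    decoded = HuggingfaceRequestCodec.decode_response(prediction)
    generated_text = decoded["generated_text"]

    assert isinstance(generated_text, str)
```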
@sakoush, would you take a look? All issues you mentioned above should be addressed.
Can we please get another review? @adriangonz @sakoush @seldondev @RafalSkolasinski

What's the status of this project? Is it going to continue to be maintained?
LGTM
🥳
Yes, MLServer continues to be maintained, and we (Seldon) welcome contributions. There are releases scheduled for Seldon ecosystem projects and products in the coming weeks, which incorporate a significant amount of customer and community feedback and contributions.
Does the upcoming release include these changes?

Any reason this hasn't been merged yet?
@ajsalow @emmettprexus @sakoush @adriangonz @seldondev @ahousley

import requests
import json

payload = {
    "inputs": [
        {
            "name": "text_inputs",
            "shape": [1],
            "datatype": "BYTES",
            "data": ["My kitten's name is JoJo,", "Tell me a story:"],
        },
        {
            "name": "max_new_tokens",
            "shape": [1],
            "datatype": "INT32",
            "data": [50],
            "parameters": {
                "content_type": "raw"
            }
        },
        {
            "name": "temperature",
            "shape": [1],
            "datatype": "FP64",
            "data": [0.9],
            "parameters": {
                "content_type": "raw",
            }
        }
    ]
}

response = requests.post(
    "http://localhost:8080/v2/models/tinyllama/infer", json=payload
)
data = json.loads(response.text)
print(data["outputs"])

So, maybe we don't want to merge this PR since it's already supported. We may want to update the
This is for issue 1345.
This allows us to pass inference_kwargs in the payload.
Example below:
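The original example appears to have been lost in this thread; below is an illustrative sketch of how the extra inference kwargs would travel in the request, based on the 'InferenceRequest.parameters.extra' field discussed above (the model name, URL, and kwarg values are placeholders):

```python
import requests

payload = {
    "inputs": [
        {
            "name": "text_inputs",
            "shape": [1],
            "datatype": "BYTES",
            "data": ["Tell me a story:"],
        }
    ],
    # Extra inference kwargs go into the request-level parameters.extra dictionary.
    "parameters": {"extra": {"max_new_tokens": 50, "temperature": 0.9}},
}

response = requests.post(
    "http://localhost:8080/v2/models/tinyllama/infer", json=payload
)
print(response.json()["outputs"])
```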