
Allow payload request to support extra inference method kwargs #1505

Closed
wants to merge 23 commits

Conversation

nanbo-liu (Contributor)

This is for issue #1345.
This allows us to pass inference_kwargs into the payload.

Example below:

{
    "inputs": [
        {
            "name": "text_inputs",
            "shape": [1],
            "datatype": "BYTES",
            "data": ["My kitten's name is JoJo,", "Tell me a story:"]
        }
    ],
    "parameters": {
        "extra": {"max_new_tokens": 200}
    }
}
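For illustration, a minimal sketch of how such a payload could be posted to MLServer's V2 inference endpoint (the model name "text-model", the port, and the single-string input are placeholders, not taken from this PR):

import requests

# Hypothetical example: POST a payload carrying parameters.extra to a local
# MLServer instance; model name and URL are placeholders.
payload = {
    "inputs": [
        {
            "name": "text_inputs",
            "shape": [1],
            "datatype": "BYTES",
            "data": ["My kitten's name is JoJo,"],
        }
    ],
    "parameters": {"extra": {"max_new_tokens": 200}},
}

response = requests.post(
    "http://localhost:8080/v2/models/text-model/infer", json=payload
)
print(response.json()["outputs"])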

@nanbo-liu (Contributor Author)

@adriangonz, please take a look when you have a chance.

@nanbo-liu nanbo-liu marked this pull request as draft December 6, 2023 15:03
@nanbo-liu nanbo-liu marked this pull request as ready for review December 6, 2023 15:03
@sakoush sakoush self-requested a review December 11, 2023 08:36
@ajsalow (Contributor) commented Dec 14, 2023

@adriangonz @sakoush could we please have this reviewed when either of you have a chance?

@sakoush (Member) left a comment

Many thanks for your contribution. I left some comments, mainly around testing.

Also, please update the docs of the HF runtime accordingly.

(
{"max_length": 20},
{"max_length": 10},
True,
Member:

It is unclear what expected means given the test case; I suggest refactoring a bit to make it clearer. It might be that you just need to assert that the number of tokens in each request is as expected (effectively converting it into 2 test cases).

Contributor Author:

Updated this into 2 test cases. Also asserting that the number of predicted tokens is the expected number of tokens.
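For illustration, a rough sketch of what the two parametrized cases might look like (the runtime fixture and the generate_request / count_tokens helpers are hypothetical names, not the actual code in this PR, and async test support is assumed):

import pytest

@pytest.mark.parametrize("max_length", [10, 20])
async def test_extra_kwargs_limit_generated_tokens(runtime, max_length):
    # Build a request whose parameters.extra carries the generation kwarg
    # (generate_request is a hypothetical helper).
    payload = generate_request(extra={"max_length": max_length})

    prediction = await runtime.predict(payload)

    # Assert that the number of predicted tokens respects the requested limit
    # (count_tokens is a hypothetical helper).
    assert count_tokens(prediction) <= max_length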

Comment on lines 53 to 55
import pdb

pdb.set_trace()
Contributor:

^

Contributor Author:

oops

runtimes/huggingface/tests/test_common.py (resolved)
@nanbo-liu (Contributor Author)

@sakoush, would you take a look when you have a chance?

@emmettprexus

This seems to be the solution to my problem. I'm using MLServer with something like bhadresh-savani/distilbert-base-uncased-emotion and the default (top_k=1) just gives me the highest score - but I need all of them. Setting parameters.extra.top_k to null gives me the complete response from the model.

Fingers crossed that this will get merged soon.
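For context, a hypothetical V2 payload along the lines described above, with top_k passed through parameters.extra (the input name, text, and model are illustrative only):

payload = {
    "inputs": [
        {
            "name": "args",
            "shape": [1],
            "datatype": "BYTES",
            "data": ["I am so happy today!"],
        }
    ],
    # JSON null (Python None) asks the pipeline to return scores for all labels
    "parameters": {"extra": {"top_k": None}},
}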

@sakoush (Member) left a comment

Many thanks for the changes so far. I left a few minor suggestions, mainly on additional test cases.

Apologies for the slow response due to the holiday season.

@@ -170,6 +171,10 @@ def encode_request(cls, payload: Dict[str, Any], **kwargs) -> InferenceRequest:

@classmethod
def decode_request(cls, request: InferenceRequest) -> Dict[str, Any]:
"""
Decode Inference requst into dictionary
Member:

Suggested change
Decode Inference requst into dictionary
Decode Inference request into dictionary

@@ -170,6 +171,10 @@ def encode_request(cls, payload: Dict[str, Any], **kwargs) -> InferenceRequest:

@classmethod
def decode_request(cls, request: InferenceRequest) -> Dict[str, Any]:
"""
Decode Inference requst into dictionary
extra Inference kwargs can be kept in 'InferenceRequest.parameters.extra'
Member:

Suggested change
extra Inference kwargs can be kept in 'InferenceRequest.parameters.extra'
extra Inference kwargs are extracted from 'InferenceRequest.parameters.extra'

values.update(extra)
else:
logging.warn(
"Extra inference kwargs should be kept in a dictionary."
Member:

Could you output the value of the parameter as well in the warning message? And perhaps change the warning message to something like "Extra parameters cannot be parsed, expected a dictionary" to make it more descriptive?
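For illustration, a sketch of what that suggestion might look like in the fragment above (extra and values are the variables from the quoted code; the final wording is up to the PR author):

import logging

if isinstance(extra, dict):
    values.update(extra)
else:
    # Include the offending value so the warning is actionable.
    logging.warning(
        "Extra parameters cannot be parsed, expected a dictionary, "
        f"got: {extra!r}"
    )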

runtimes/huggingface/tests/test_codecs.py (resolved)
tokenizer = runtime._model.tokenizer

prediction = await runtime.predict(payload)
generated_text = json.loads(prediction.outputs[0].data[0])["generated_text"]
Member:

Could you try to use the hf codec decode_response method and check if it makes this line a bit more readable?
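For illustration, a sketch of that suggestion; the codec class name and import path are assumptions (the thread only confirms the HF codec exposes a decode_response method), and the exact structure of the decoded value may differ:

from mlserver_huggingface.codecs import HuggingfaceRequestCodec  # assumed import path

prediction = await runtime.predict(payload)
# Let the codec turn the raw V2 response back into pipeline-style output,
# instead of manually json.loads-ing the first output tensor.
decoded = HuggingfaceRequestCodec.decode_response(prediction)
generated_text = decoded[0]["generated_text"]  # assumed structure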

@nanbo-liu (Contributor Author)

@sakoush, would you take a look? All the issues you mentioned above should be addressed.


@ajsalow (Contributor) commented Jan 16, 2024

Can we please get another review? @adriangonz @sakoush @seldondev @RafalSkolasinski

What's the status of this project? Is it going to continue to be maintained?

@sakoush (Member) left a comment

LGTM

@emmettprexus

🥳

@ahousley (Member) commented Jan 17, 2024

> What's the status of this project? Is it going to continue to be maintained?

Yes, MLServer continues to be maintained and we (Seldon) welcome contributions. There are releases scheduled for Seldon ecosystem projects and products in the coming weeks, which incorporate a significant amount of customer and community feedback and contributions.

@emmettprexus

> What's the status of this project? Is it going to continue to be maintained?

> Yes, MLServer continues to be maintained and we (Seldon) welcome contributions. There are releases scheduled for Seldon ecosystem projects and products in the coming weeks, which incorporate a significant amount of customer and community feedback and contributions.

Does the upcoming release include these changes?

@ajsalow (Contributor) commented Feb 5, 2024

Any reason this hasn't been merged yet?

@nanbo-liu (Contributor Author)

@ajsalow @emmettprexus @sakoush @adriangonz @seldondev @ahousley
Our teammate @geodavic recently found that the current mlserver-huggingface runtime already supports loading inference kwargs; they just need to be formatted in the KServe way.
An example below:

import requests
import json
payload = {
    "inputs": [
        {
            "name": "text_inputs",
            "shape": [1],
            "datatype": "BYTES",
            "data": ["My kitten's name is JoJo,","Tell me a story:"],
        },  
        {
          "name": "max_new_tokens",
          "shape": [1],
          "datatype": "INT32",
          "data": [50],
          "parameters": {
                "content_type": "raw"
          }
        },
        {
          "name": "temperature",
          "shape": [1],
          "datatype": "FP64",
          "data": [0.9],
          "parameters": {
                "content_type": "raw",
          }
        }       
    ]
}

response = requests.post(
    "http://localhost:8080/v2/models/tinyllama/infer", json=payload
)

data = json.loads(response.text)
print(data["outputs"])

So, maybe we don't want to merge this PR since it's already supported. We may want to update the README file to show an example of how to use extra inference kwargs in the payload.

@emmettprexus

> Our teammate @geodavic recently found that the current mlserver-huggingface runtime already supports loading inference kwargs,

It looks like I owe @geodavic a cake for finding that out! I scoured through the code but didn't find any hints, so good job! Now I can move on with my project.

Thanks a ton.

@nanbo-liu nanbo-liu closed this by deleting the head repository Mar 6, 2024