Allow payload request to support extra inference method kwargs #1345

Open
nanbo-liu opened this issue Aug 23, 2023 · 6 comments

Comments

@nanbo-liu (Contributor)

nanbo-liu commented Aug 23, 2023

from transformers import LlamaForCausalLM, AutoTokenizer, TextGenerationPipeline

# Load the model in 8-bit and build a text-generation pipeline
model = LlamaForCausalLM.from_pretrained("daryl149/llama-2-7b-hf", load_in_8bit=True)
tokenizer = AutoTokenizer.from_pretrained("daryl149/llama-2-7b-hf")
pipeline = TextGenerationPipeline(model, tokenizer)

# Extra generation kwargs are passed straight through the pipeline call
pipeline("Once upon a time,", max_new_tokens=100, return_full_text=False)

max_new_tokens and return_full_text are extra arguments that we can pass into the pipeline call.
max_new_tokens is the maximum number of tokens to generate, ignoring the number of tokens in the prompt.

I wish they could be passed in the payload request too, something like:

{
    "inputs": [
        {
            "name": "text_inputs",
            "shape": [2],
            "datatype": "BYTES",
            "data": ["My kitten's name is JoJo,", "Tell me a story:"]
        }
    ],
    "inference_kwargs": {
        "max_new_tokens": 200
    }
}
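
For reference, a minimal sketch of how a client might POST this proposed payload (the model name transformer and the local port are assumptions, and inference_kwargs is the proposed field, not an existing one):

import requests

# Proposed (not yet supported) payload carrying extra pipeline kwargs
payload = {
    "inputs": [
        {
            "name": "text_inputs",
            "shape": [2],
            "datatype": "BYTES",
            "data": ["My kitten's name is JoJo,", "Tell me a story:"]
        }
    ],
    "inference_kwargs": {"max_new_tokens": 200}
}

# Model name and port are assumptions for illustration
response = requests.post(
    "http://localhost:8080/v2/models/transformer/infer", json=payload
)
print(response.json())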
@nanbo-liu (Contributor, Author)

nanbo-liu commented Sep 29, 2023

@adriangonz, I have a PR for this issue too: https://github.com/SeldonIO/MLServer/pull/1418

@adriangonz (Contributor)

Hey @nanbo-liu ,

As discussed in #1418, we don't have much control over the shape of InferenceRequest, which is kept quite agnostic of specific use cases.

However, the good news is that InferenceRequest objects already contain a parameters field that can be used to specify arbitrary parameters. Would this not be enough for your use case?

Following your example, you could have something like:

{
    "inputs": [
        {
            "name": "text_inputs",
            "shape": [2],
            "datatype": "BYTES",
            "data": ["My kitten's name is JoJo,", "Tell me a story:"]
        }
    ],
    "parameters": {
        "max_new_tokens": 200
    }
}
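
Built with MLServer's own types, the same request would look roughly like the sketch below (assuming the Parameters model accepts extra fields via pydantic's extra-allow config):

from mlserver.types import InferenceRequest, Parameters, RequestInput

# Parameters allows extra fields, so max_new_tokens rides along with the
# request alongside the usual inputs
request = InferenceRequest(
    inputs=[
        RequestInput(
            name="text_inputs",
            shape=[2],
            datatype="BYTES",
            data=["My kitten's name is JoJo,", "Tell me a story:"],
        )
    ],
    parameters=Parameters(max_new_tokens=200),
)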

@a-palacios
a-palacios commented Oct 3, 2023

Hi @adriangonz, we are still running into an issue with this. The Python code works fine for passing in new tokens via a kwarg, or in a config like you listed above, when using the MLServer code directly. But when we do a POST via Python requests to the http://localhost:8080/v2/models/transformer/infer endpoint, the parameters seem to be dropped at the decode step in the predict function in mlserver_huggingface/runtime.py on line 39:

async def predict(self, payload: InferenceRequest) -> InferenceResponse:
    # TODO: convert and validate?
    kwargs = HuggingfaceRequestCodec.decode_request(payload)
    args = kwargs.pop("args", [])

    array_inputs = kwargs.pop("array_inputs", [])
    if array_inputs:
        args = [list(array_inputs)] + args

    prediction = self._model(*args, **kwargs)

    return self.encode_response(
        payload=prediction, default_codec=HuggingfaceRequestCodec
    )

We could potentially just extract any parameter kwargs from the payload request and merge them into kwargs?
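
A rough sketch of that idea, reusing the snippet above (untested; the .dict() call assumes pydantic v1-style models, and every extra field except content_type is assumed to be a pipeline kwarg):

async def predict(self, payload: InferenceRequest) -> InferenceResponse:
    kwargs = HuggingfaceRequestCodec.decode_request(payload)
    args = kwargs.pop("args", [])

    array_inputs = kwargs.pop("array_inputs", [])
    if array_inputs:
        args = [list(array_inputs)] + args

    # Sketch: merge request-level parameters into the pipeline kwargs so
    # they reach the pipeline call; content_type is protocol metadata
    if payload.parameters is not None:
        extra = payload.parameters.dict(exclude_none=True)
        extra.pop("content_type", None)
        kwargs.update(extra)

    prediction = self._model(*args, **kwargs)

    return self.encode_response(
        payload=prediction, default_codec=HuggingfaceRequestCodec
    )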

@rivamarco
We have the same issue and we don't know how to solve it.
How can we enable such parameters?

@adriangonz (Contributor)

adriangonz commented Oct 9, 2023

Ah I see... that would need some changes to the HF runtime to take into account what's passed via the parameters field - along the lines of what @a-palacios described.

To avoid picking up other fields of the parameters object by mistake, though, it should probably whitelist well-known arg names.
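
Something along these lines, where the allowed set is purely illustrative and .dict() assumes pydantic v1-style models:

from mlserver.types import InferenceRequest

# Illustrative whitelist; the real runtime would maintain its own list of
# known pipeline kwargs
ALLOWED_PIPELINE_KWARGS = {"max_new_tokens", "return_full_text", "temperature"}

def extract_pipeline_kwargs(payload: InferenceRequest) -> dict:
    # No parameters on the request means nothing extra to forward
    if payload.parameters is None:
        return {}
    params = payload.parameters.dict(exclude_none=True)
    return {k: v for k, v in params.items() if k in ALLOWED_PIPELINE_KWARGS}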

@nanbo-liu (Contributor, Author)

@adriangonz, I opened up another PR for this: #1505
