Allow payload request to support extra inference method kwargs #1345

Open
nanbo-liu opened this issue Aug 23, 2023 · 6 comments

Comments

@nanbo-liu (Contributor)

nanbo-liu commented Aug 23, 2023

from transformers import LlamaForCausalLM, AutoTokenizer, TextGenerationPipeline

# Load the model in 8-bit and build a text-generation pipeline
model = LlamaForCausalLM.from_pretrained("daryl149/llama-2-7b-hf", load_in_8bit=True)
tokenizer = AutoTokenizer.from_pretrained("daryl149/llama-2-7b-hf")
pipeline = TextGenerationPipeline(model, tokenizer)

# Extra generation kwargs are passed straight through the pipeline call
pipeline("Once upon a time,", max_new_tokens=100, return_full_text=False)

max_new_tokens and return_full_text are extra arguments that we can pass into the pipeline call.
max_new_tokens is the maximum number of tokens to generate, ignoring the number of tokens in the prompt.

I wish they could be passed in the payload request too, something like:

{
    "inputs": [
        {
            "name": "text_inputs",
            "shape": [2],
            "datatype": "BYTES",
            "data": ["My kitten's name is JoJo,", "Tell me a story:"]
        }
    ],
    "inference_kwargs": {
        "max_new_tokens": 200
    }
}
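
For reference, a minimal sketch of how a client might POST this proposed payload (the model name transformer and the local port are assumptions, and inference_kwargs is the proposed field, not an existing one):

import requests

# Proposed (not yet supported) payload carrying extra pipeline kwargs
payload = {
    "inputs": [
        {
            "name": "text_inputs",
            "shape": [2],
            "datatype": "BYTES",
            "data": ["My kitten's name is JoJo,", "Tell me a story:"]
        }
    ],
    "inference_kwargs": {"max_new_tokens": 200}
}

# Model name and port are assumptions for illustration
response = requests.post(
    "http://localhost:8080/v2/models/transformer/infer", json=payload
)
print(response.json())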
@nanbo-liu (Contributor, Author)

nanbo-liu commented Sep 29, 2023

@adriangonz, I have a PR for this issue too: https://github.com/SeldonIO/MLServer/pull/1418

@adriangonz (Contributor)

Hey @nanbo-liu ,

As discussed in #1418, we don't have much control over the shape of InferenceRequest, which is kept quite agnostic of specific use cases.

However, the good news is that InferenceRequest objects already contain a parameters field that can be used to specify arbitrary parameters. Would this not be enough for your use case?

Following your example, you could have something like:

{
    "inputs": [
        {
            "name": "text_inputs",
            "shape": [2],
            "datatype": "BYTES",
            "data": ["My kitten's name is JoJo,", "Tell me a story:"]
        }
    ],
    "parameters": {
        "max_new_tokens": 200
    }
}
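
Built with MLServer's own types, the same request would look roughly like the sketch below (assuming the Parameters model accepts extra fields via pydantic's extra-allow config):

from mlserver.types import InferenceRequest, Parameters, RequestInput

# Parameters allows extra fields, so max_new_tokens rides along with the
# request alongside the usual inputs
request = InferenceRequest(
    inputs=[
        RequestInput(
            name="text_inputs",
            shape=[2],
            datatype="BYTES",
            data=["My kitten's name is JoJo,", "Tell me a story:"],
        )
    ],
    parameters=Parameters(max_new_tokens=200),
)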

@a-palacios
a-palacios commented Oct 3, 2023

Hi @adriangonz, we are still running into an issue with this. The Python code works fine for passing in new tokens via a kwarg, or in a config like you listed above, when using the MLServer code directly. But when we do a POST via Python requests to the http://localhost:8080/v2/models/transformer/infer endpoint, the parameters seem to be dropped at the decode step in the predict function in mlserver_huggingface/runtime.py on line 39:

async def predict(self, payload: InferenceRequest) -> InferenceResponse:
    # TODO: convert and validate?
    kwargs = HuggingfaceRequestCodec.decode_request(payload)
    args = kwargs.pop("args", [])

    array_inputs = kwargs.pop("array_inputs", [])
    if array_inputs:
        args = [list(array_inputs)] + args

    prediction = self._model(*args, **kwargs)

    return self.encode_response(
        payload=prediction, default_codec=HuggingfaceRequestCodec
    )

We could potentially just extract any parameter kwargs from the payload request and merge them into kwargs?
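
A rough sketch of that idea, reusing the snippet above (untested; the .dict() call assumes pydantic v1-style models, and every extra field except content_type is assumed to be a pipeline kwarg):

async def predict(self, payload: InferenceRequest) -> InferenceResponse:
    kwargs = HuggingfaceRequestCodec.decode_request(payload)
    args = kwargs.pop("args", [])

    array_inputs = kwargs.pop("array_inputs", [])
    if array_inputs:
        args = [list(array_inputs)] + args

    # Sketch: merge request-level parameters into the pipeline kwargs so
    # they reach the pipeline call; content_type is protocol metadata
    if payload.parameters is not None:
        extra = payload.parameters.dict(exclude_none=True)
        extra.pop("content_type", None)
        kwargs.update(extra)

    prediction = self._model(*args, **kwargs)

    return self.encode_response(
        payload=prediction, default_codec=HuggingfaceRequestCodec
    )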

@rivamarco
We have the same issue and we don't know how to solve it.
How can we enable such parameters?

@adriangonz (Contributor)

adriangonz commented Oct 9, 2023

Ah I see... that would need some changes to the HF runtime to take into account what's passed via the parameters field - along the lines of what @a-palacios described.

To avoid picking up other fields of the parameters object by mistake, though, it should probably whitelist well-known arg names.
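
Something along these lines, where the allowed set is purely illustrative and .dict() assumes pydantic v1-style models:

from mlserver.types import InferenceRequest

# Illustrative whitelist; the real runtime would maintain its own list of
# known pipeline kwargs
ALLOWED_PIPELINE_KWARGS = {"max_new_tokens", "return_full_text", "temperature"}

def extract_pipeline_kwargs(payload: InferenceRequest) -> dict:
    # No parameters on the request means nothing extra to forward
    if payload.parameters is None:
        return {}
    params = payload.parameters.dict(exclude_none=True)
    return {k: v for k, v in params.items() if k in ALLOWED_PIPELINE_KWARGS}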

@nanbo-liu (Contributor, Author)

@adriangonz, I opened up another PR for this: #1505
