I deployed an adapter using the LORA_ADAPTERS environment variable on a SageMaker endpoint. Everything is working fine except that the request does not fail when I provide a wrong adapter_id during inference; it silently returns a prediction from the base model instead.
My question: should the request fail because we are providing a wrong adapter_id?
For example:
text-generation-launcher \
    --model-id meta-llama/Meta-Llama-3-8B-Instruct \
    --lora-adapters DavidLanz/Llama3_tw_8B_btc_qlora
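On SageMaker, my deployment looks roughly like the sketch below; the container version, role, instance type, and token placeholder are illustrative, not copied from my actual setup.

import sagemaker
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

role = sagemaker.get_execution_role()  # assumes a SageMaker execution role is available

# TGI (LLM) container image; the version here is illustrative
llm_image = get_huggingface_llm_image_uri("huggingface", version="2.2.0")

model = HuggingFaceModel(
    role=role,
    image_uri=llm_image,
    env={
        "HF_MODEL_ID": "meta-llama/Meta-Llama-3-8B-Instruct",
        "LORA_ADAPTERS": "DavidLanz/Llama3_tw_8B_btc_qlora",
        "HF_TOKEN": "<hf_token>",  # gated base model, token required
    },
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",  # illustrative instance type
    container_startup_health_check_timeout=600,
)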
request/response without adapter
curl 127.0.0.1:3000/generate \
-X POST \
-H 'Content-Type: application/json' \
-d '{ "inputs": "What are three words to describe you?", "parameters": { "max_new_tokens": 20 }}'
# {"generated_text":" (e.g. funny, outgoing, creative)\nI would say that three words to describe me are"}
request/response with an adapter
curl 127.0.0.1:3000/generate \
-X POST \
-H 'Content-Type: application/json' \
-d '{ "inputs": "What are three words to describe you?", "parameters": { "max_new_tokens": 20, "adapter_id": "DavidLanz/Llama3_tw_8B_btc_qlora" }}'# {"generated_text":" A. Adventurous, B. Creative, C. Curious\nWhat are three words to describe"}%
request/response with invalid adapter
curl 127.0.0.1:3000/generate \
-X POST \
-H 'Content-Type: application/json' \
-d '{ "inputs": "What are three words to describe you?", "parameters": { "max_new_tokens": 20, "adapter_id": "random_adapter_id" }}'# {"generated_text":" (e.g. funny, outgoing, creative)\nI would say that three words to describe me are"}
From the responses, I'm guessing that it falls back to running inference with the base model.
Also, how can I opt in to more verbose logging, so that I can see when my adapter_id was invalid and inference fell back to the base model?
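In the meantime, here is a rough client-side check I use to tell whether an adapter_id was actually applied. The comparison heuristic and the endpoint URL are my own assumptions, not anything TGI documents: with greedy decoding (the default when do_sample is off), an adapter_id that gets silently ignored tends to reproduce the base model's output exactly, so I compare the two.

import logging
import requests

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("adapter-check")

ENDPOINT = "http://127.0.0.1:3000/generate"  # or the SageMaker endpoint URL

def generate(prompt: str, adapter_id: str | None = None) -> str:
    # Greedy decoding by default (no do_sample), so repeated calls are comparable.
    params = {"max_new_tokens": 20}
    if adapter_id is not None:
        params["adapter_id"] = adapter_id
    resp = requests.post(ENDPOINT, json={"inputs": prompt, "parameters": params}, timeout=60)
    resp.raise_for_status()
    return resp.json()["generated_text"]

def check_adapter(prompt: str, adapter_id: str) -> None:
    base_out = generate(prompt)
    adapter_out = generate(prompt, adapter_id=adapter_id)
    if base_out == adapter_out:
        # Heuristic: an unknown adapter_id that is silently ignored reproduces
        # the base model's greedy output exactly.
        log.warning("adapter_id %r appears to have been ignored (output matches base model)", adapter_id)
    else:
        log.info("adapter_id %r changed the output, so it was most likely applied", adapter_id)

check_adapter("What are three words to describe you?", "random_adapter_id")

This is only a heuristic, of course; a legitimate adapter could in principle produce the same greedy completion as the base model for a particular prompt.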