Multi-LORA feature question #2505

Open
imran3180 opened this issue Sep 9, 2024 · 0 comments

I deployed an adapter using the LORA_ADAPTERS environment variable on a SageMaker endpoint. Everything works fine, except that a request does not fail when I pass a wrong adapter_id at inference time; it silently returns a prediction from the base model.

My question: should the request fail when a wrong adapter_id is provided?

For example:

text-generation-launcher \
    --model-id meta-llama/Meta-Llama-3-8B-Instruct \
    --lora-adapters DavidLanz/Llama3_tw_8B_btc_qlora
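
The same adapter list can also be supplied through the LORA_ADAPTERS environment variable, which is how I configured the SageMaker endpoint. A minimal equivalent sketch, assuming the launcher maps the env var to the --lora-adapters flag:

# Equivalent configuration via environment variable (as on SageMaker)
LORA_ADAPTERS=DavidLanz/Llama3_tw_8B_btc_qlora \
text-generation-launcher \
    --model-id meta-llama/Meta-Llama-3-8B-Instruct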

request/response without adapter

curl 127.0.0.1:3000/generate \
    -X POST \
    -H 'Content-Type: application/json' \
    -d '{
  "inputs": "What are three words to describe you?",
  "parameters": {
    "max_new_tokens": 20
  }
}'
# {"generated_text":" (e.g. funny, outgoing, creative)\nI would say that three words to describe me are"}

request/response with an adapter

curl 127.0.0.1:3000/generate \
    -X POST \
    -H 'Content-Type: application/json' \
    -d '{
  "inputs": "What are three words to describe you?",
  "parameters": {
    "max_new_tokens": 20,
    "adapter_id": "DavidLanz/Llama3_tw_8B_btc_qlora"
  }
}'
# {"generated_text":" A. Adventurous, B. Creative, C. Curious\nWhat are three words to describe"}%

request/response with invalid adapter

curl 127.0.0.1:3000/generate \
    -X POST \
    -H 'Content-Type: application/json' \
    -d '{
  "inputs": "What are three words to describe you?",
  "parameters": {
    "max_new_tokens": 20,
    "adapter_id": "random_adapter_id"
  }
}'
# {"generated_text":" (e.g. funny, outgoing, creative)\nI would say that three words to describe me are"}

From the response, I'm guessing that it is running inference on the base model.
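
As a workaround, I am currently guarding requests client-side against the same list that was passed to --lora-adapters. A rough shell sketch (the whitelist and the wrapper are my own, not part of TGI):

# Hypothetical client-side guard: reject adapter ids that were never
# passed to --lora-adapters, since the server silently falls back.
VALID_ADAPTERS="DavidLanz/Llama3_tw_8B_btc_qlora"
ADAPTER_ID="random_adapter_id"

case " $VALID_ADAPTERS " in
  *" $ADAPTER_ID "*)
    ;;  # known adapter, proceed with the request
  *)
    echo "unknown adapter_id: $ADAPTER_ID" >&2
    exit 1
    ;;
esac

curl 127.0.0.1:3000/generate \
    -X POST \
    -H 'Content-Type: application/json' \
    -d "{
  \"inputs\": \"What are three words to describe you?\",
  \"parameters\": {
    \"max_new_tokens\": 20,
    \"adapter_id\": \"$ADAPTER_ID\"
  }
}"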


Also, how can I enable more verbose logging (opt-in), so that I can see when my adapter_id is invalid and the server is falling back to inference on the base model?
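
Something like the following is what I'd hope to do at launch; this is only a guess, assuming the launcher honors a LOG_LEVEL environment variable with per-crate filters (I haven't confirmed the exact variable name for my TGI version):

# Sketch: raise router verbosity at launch. LOG_LEVEL is an assumption
# here; check your TGI version's docs for the exact variable or flag.
LOG_LEVEL=info,text_generation_router=debug \
text-generation-launcher \
    --model-id meta-llama/Meta-Llama-3-8B-Instruct \
    --lora-adapters DavidLanz/Llama3_tw_8B_btc_qlora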
