I've used many models with Outlines without problems. However, I've been unable to use Qwen successfully. Running this simple example:
```python
import outlines
from outlines import models

model = models.transformers(
    "Qwen/Qwen-7B-Chat",
    device="auto",
    model_kwargs={"trust_remote_code": True},
    tokenizer_kwargs={"trust_remote_code": True},
)

prompt = """You are a sentiment-labelling assistant.
Is the following review positive or negative?
Review: This restaurant is great."""

answer = outlines.generate.choice(model, ["Positive", "Negative"])(prompt)
```
gives this error:
ValueError: Asking to pad but the tokenizer does not have a padding token. Please select a token to use as pad_token(tokenizer.pad_token = tokenizer.eos_token e.g.) or add a new pad token via tokenizer.add_special_tokens({'pad_token': '[PAD]'}).
It was frustrating because I could use Qwen models with the transformers library (i.e. directly, without Outlines) with no issues. Anyway, after some research, I managed to get past this error.
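Roughly, the workaround is to give the tokenizer an explicit pad token when loading the model. A minimal sketch (using Qwen's "<|endoftext|>" token as the pad token is an assumption here; adjust if your checkpoint uses a different one):

```python
model = models.transformers(
    "Qwen/Qwen-7B-Chat",
    device="auto",
    model_kwargs={"trust_remote_code": True},
    tokenizer_kwargs={
        "trust_remote_code": True,
        "pad_token": "<|endoftext|>",  # assumed pad token, so the tokenizer has something to pad with
    },
)
```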
But then I get another error:

RuntimeError: Index put requires the source and destination dtypes match, got Float for the destination and BFloat16 for the source.
I managed to solve this error by adding "fp32": True to model_kwargs in the code above (see the sketch below). After that, the model runs without issues. So I just wanted to share this here for anyone facing the same issue, and maybe it's useful for improving Qwen support in Outlines.

Note: this works for the Transformers model. For AWQ, you also need to add "bf16": False to model_kwargs, since it defaults to True.
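Putting it together, the working configuration looks roughly like this; the fp32/bf16 entries are Qwen-specific loading flags forwarded through model_kwargs, and the AWQ variant is the commented-out line:

```python
model = models.transformers(
    "Qwen/Qwen-7B-Chat",
    device="auto",
    model_kwargs={
        "trust_remote_code": True,
        "fp32": True,     # load in fp32 to avoid the Float/BFloat16 mismatch
        # "bf16": False,  # additionally needed for AWQ checkpoints, where bf16 defaults to True
    },
    tokenizer_kwargs={"trust_remote_code": True},  # plus the pad-token workaround above
)
```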
However, it becomes slower since we are now using 32-bit instead of 16-bit precision, and I get a warning saying:
Your device support faster inference by passing bf16=True in "AutoModelForCausalLM.from_pretrained"
So that made me wonder: is there a way to use models with bf16=True with Outlines? I think the performance boost would be significant.
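For reference, what I would ideally like to be able to run with Outlines is roughly the following sketch; as far as I can tell, this is the combination that currently runs into the dtype mismatch above:

```python
model = models.transformers(
    "Qwen/Qwen-7B-Chat",
    device="auto",
    model_kwargs={
        "trust_remote_code": True,
        "bf16": True,  # faster on supported GPUs, but presumably hits the Float/BFloat16 mismatch today
    },
    tokenizer_kwargs={"trust_remote_code": True},
)
```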