Add additional chat templates to dllama-api #73
Conversation
OpenChat has finetunes for both Llama 2 and Llama 3, so I've just added an openchat3 template that should work with their Llama 3 finetune, whereas openchat should work with their Llama 2 finetune.
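For reference, a rough sketch of how the two OpenChat prompt formats differ. The exact token strings are assumptions based on the OpenChat model cards, not taken from this PR's diff, and system-message handling is left out for brevity:

```python
# Hedged sketch: approximate prompt formats for the two OpenChat variants.
# Token strings are assumptions from the OpenChat model cards, not copied
# from the dllama-api implementation.

def format_openchat(messages):
    """OpenChat 3.5 style (Llama 2 based finetune)."""
    prompt = ""
    for m in messages:
        role = "GPT4 Correct User" if m["role"] == "user" else "GPT4 Correct Assistant"
        prompt += f"{role}: {m['content']}<|end_of_turn|>"
    return prompt + "GPT4 Correct Assistant:"

def format_openchat3(messages):
    """OpenChat 3.6 style (Llama 3 based finetune, Llama 3 header tokens)."""
    prompt = "<|begin_of_text|>"
    for m in messages:
        role = "GPT4 Correct User" if m["role"] == "user" else "GPT4 Correct Assistant"
        prompt += f"<|start_header_id|>{role}<|end_header_id|>\n\n{m['content']}<|eot_id|>"
    return prompt + "<|start_header_id|>GPT4 Correct Assistant<|end_header_id|>\n\n"
```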
I was able to successfully test openchat-3.6-8b, and it worked correctly with the openchat3 chat template:

```
./dllama-api.exe --model D:\openchat-3.6-8b-20240522-distributed\dllama_model_openchat-3.6-8b-20240522_q40.m --tokenizer D:\openchat-3.6-8b-20240522-distributed\dllama_tokenizer_llama3.t --weights-float-type q40 --buffer-float-type q80 --nthreads 8 --chat-template openchat3 --port 10111
```

(Excuse the weird characters; the Windows terminal can't render those symbols correctly.)

I was not able to test openchat-3.5: although I could convert the model using convert-hf.py, I could not convert the tokenizer.

https://huggingface.co/NousResearch/Hermes-2-Theta-Llama-3-8B uses the chatml template, so I will test with that.
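A minimal sketch of querying the server started above, assuming dllama-api exposes an OpenAI-compatible /v1/chat/completions endpoint on the configured port (not verified against this PR's code):

```python
# Hedged sketch: call the dllama-api server started with --port 10111,
# assuming an OpenAI-compatible /v1/chat/completions endpoint.
import requests

resp = requests.post(
    "http://localhost:10111/v1/chat/completions",
    json={
        "messages": [
            {"role": "user", "content": "Hello, who are you?"}
        ],
        "max_tokens": 128,
    },
)
print(resp.json())
```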
I am having the weirdest issue. If I run Hermes-2-Theta-Llama-3-8B using the converted Llama 3 tokenizer, it works fine, although it is missing some tokens since it comes from a different model of the same architecture. Hermes-2-Theta-Llama-3-8B doesn't have a tokenizer.model file, so I was in a bit of a jam as to what to do. I put together a convert-tokenizer-hf.py script that's meant to do the same as convert-tokenizer-llama3.py, except it uses the transformers AutoTokenizer to pull all the necessary data to build the tokenizer.t file. But I think I am doing something wrong, as when I run dllama with the generated tokenizer.t file it crashes when encoding text. convert-tokenizer-hf.py
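The script itself was not preserved here; the following is only a rough sketch of the general approach it describes (pulling vocabulary and special-token data via transformers' AutoTokenizer), with the distributed-llama-specific serialization left as a placeholder, since the real tokenizer.t layout is defined by the project's own converter scripts:

```python
# Rough sketch of the approach described above: use transformers' AutoTokenizer
# to pull the vocabulary and special tokens for a model that ships without a
# tokenizer.model file. write_tokenizer_file() is a placeholder; the actual
# tokenizer.t binary layout must match distributed-llama's own converters.
from transformers import AutoTokenizer

def extract_vocab(model_id: str):
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # Sort (token, id) pairs by id so the vocabulary is in index order.
    vocab = sorted(tokenizer.get_vocab().items(), key=lambda kv: kv[1])
    return {
        "tokens": [token for token, _ in vocab],
        "bos_id": tokenizer.bos_token_id,
        "eos_id": tokenizer.eos_token_id,
    }

def write_tokenizer_file(data, path: str):
    # Placeholder: serialize `data` in the format expected by dllama
    # (see convert-tokenizer-llama3.py in the repository for the real layout).
    raise NotImplementedError

if __name__ == "__main__":
    data = extract_vocab("NousResearch/Hermes-2-Theta-Llama-3-8B")
    print(f"extracted {len(data['tokens'])} tokens, bos={data['bos_id']}, eos={data['eos_id']}")
```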
I'm wondering if this is a good direction. I mean, for sure the source code should not include all possible templates. Maybe this is something that should be moved to the tokenizer file. Basically, the tokenizer now contains:
So this design assumes there may be differences in the chat mode. In the end, the converter would be responsible for setting the correct values, so this would not be a responsibility of DL. WDYT?
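A conceptual sketch of the design discussed above: the converter, rather than the inference code, decides the chat formatting by writing template metadata into the tokenizer file. The field names here are hypothetical illustrations, not the actual tokenizer.t schema:

```python
# Conceptual sketch: chat-format metadata written by the converter so that
# dllama itself needs no hard-coded template list. Field names are
# hypothetical, not the real tokenizer.t schema.
from dataclasses import dataclass

@dataclass
class TokenizerMetadata:
    bos_token: str
    eos_token: str
    chat_message_start: str   # e.g. "<|start_header_id|>{role}<|end_header_id|>\n\n"
    chat_message_end: str     # e.g. "<|eot_id|>"

def build_metadata_for_llama3() -> TokenizerMetadata:
    # A converter script would fill these in per model family.
    return TokenizerMetadata(
        bos_token="<|begin_of_text|>",
        eos_token="<|eot_id|>",
        chat_message_start="<|start_header_id|>{role}<|end_header_id|>\n\n",
        chat_message_end="<|eot_id|>",
    )
```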
I've tried to type a reply twice but keep getting a blue screen just as I'm about to send :/ Converting the tokenizer is very quick, so in the long run it's probably good to go that route. I just wanted to add a few of the common chat templates, i.e. Llama 2, Llama 3 and chatml, as that already covers the majority of models. The bigger issue I have is with the script I showed above: I cannot create tokenizers for some models because they do not have the tokenizer.model file, so I tried creating something that converts using AutoTokenizer, but the converted tokenizer doesn't work for some reason; dllama errors out at tokenizer.cpp line 202, for instance with this model: https://huggingface.co/NousResearch/Hermes-2-Theta-Llama-3-8B
@DifferentialityDevelopment please check this PR. This may solve the problem for different models.
@DifferentialityDevelopment this would probably require updating the tokenizer in your repository on HuggingFace. Please don't do this until the PR is merged. Later, I want to test a different model.
@DifferentialityDevelopment
I've added a few of the most common chat templates, namely llama2, llama3, chatml and openchat.
This should make a lot more models compatible with distributed-llama's API.
I've also added an additional argument to AppArgs that lets you specify the chat template used by the model at startup instead of on a per-request basis.
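For illustration, this is the widely used ChatML convention that the new chatml template targets; the actual formatting code in dllama-api (C++) is not reproduced here:

```python
# Illustration of the ChatML prompt convention targeted by the `chatml`
# template: each message is wrapped in <|im_start|>/<|im_end|> markers,
# and the prompt ends with an open assistant turn.
def format_chatml(messages):
    prompt = ""
    for m in messages:
        prompt += f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
    return prompt + "<|im_start|>assistant\n"

print(format_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]))
```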