Can madlad400 gguf models from huggingface be used? #8300
-
I compiled the latest version, which has T5 support, and tried running a madlad400 model from https://huggingface.co/jbochi/madlad400-3b-mt/resolve/main/model-q4k.gguf
Is there a change in the conversion process from .safetensors that is needed for T5 models?
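For anyone trying the same thing, a rough sketch of a run command (assumptions on my part: a llama-cli build with the new T5 support, the file from the link above, and madlad400's <2xx> target-language prefix convention, if I have that right):

# sketch only: the prompt prefix selects the target language
./llama-cli -m model-q4k.gguf -p "<2fi> The weather is nice today." -n 64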
-
Well, I couldn't figure out a way to use jbochi's GGUF directly either, so I think it's necessary to use the conversion script convert_hf_to_gguf.py. Btw, I'm super enthused about these recent additions; this project just keeps getting better :D
EDIT: OK, so this behaves somewhat similarly to candle, but the glitches are slightly different.
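For the record, the standard route would be something like the sketch below (paths and the quant type are placeholders; it assumes a local copy of the original safetensors checkpoint):

# convert the original HF checkpoint to an f16 GGUF, then quantize
python convert_hf_to_gguf.py ./madlad400-3b-mt --outfile madlad400-3b-mt-f16.gguf --outtype f16
./llama-quantize madlad400-3b-mt-f16.gguf madlad400-3b-mt-Q4_K_M.gguf Q4_K_M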
-
Okay, I got it working now (I think), and man it feels FAST!!
The bad news is that my GGUF conversion procedure from jbochi => llama.cpp was quite a messy business indeed. It involved conjuring up an empty GGUF, filling it with metadata, and doing some frankensteining with KerfuffleV2's gguf-tools. I also wrote a custom script to rename the tensors, and llama.cpp itself needed a teeny weeny change too. The upside of this method is that the quantized tensors remain untouched. I can give more details if there's interest, but somehow I feel there must be a better way :D
EDIT: I've now managed to polish the conversion process a little bit, so that no llama.cpp customization is necessary any longer. Here's the patch if anyone wants to try this version. You'll need the original jbochi model and xdelta3.
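If anyone is unfamiliar with xdelta3, applying the patch is roughly this (file names here are made up; the real ones are whatever the patch attachment uses):

xdelta3 -d -s model-q4k.gguf madlad400-q4k.xdelta3 madlad400-3b-mt-q4k-fixed.gguf   # -d = decode, -s = source file to patch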
-
Has anyone made a llama.cpp-compatible GGUF for another T5 model, aya-101: https://huggingface.co/CohereForAI/aya-101 ?
-
Oh wow, there they are, popping up at HF now:
-
OK, just got aya-101 working! The catch is that you have to quantize it yourself. I wanted to test quantizing a large model with meager resources, and this was as good a candidate as any. (Of course, "large" is relative... in the era of 405B this is peanuts, really :)
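The quantization step itself is just the stock tool, roughly like this (file names are placeholders, assuming you already produced an f16 GGUF with convert_hf_to_gguf.py):

./llama-quantize aya-101-f16.gguf aya-101-IQ4_XS.gguf IQ4_XS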
-
aya-101 is missing the spiece.model file, which is needed to convert it. I copied the one from mt5-xxl, which enabled the conversion to work, and created an IQ4_XS quant.

bash-5.1$ lm "translate to finnish: I wanted to test quantizing a large model with meager resources and this was as good a candidate as any."

The model is pretty dumb; it looks mainly useful for translations:

lm "Answer the following yes/no question by reasoning step-by-step. Could a dandelion suffer from hepatitis?"

Translated the question to German with madlad400, same answer:

bash-5.1$ lm "Beantworten Sie die folgende Ja/Nein-Frage schrittweise: Könnte ein Löwenzahn an Hepatitis leiden?"
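In case anyone wants to replicate the workaround, it amounts to something like this sketch (the mt5-xxl URL follows the usual HF layout; the exact placement next to the aya-101 files is my assumption):

# borrow mt5-xxl's sentencepiece model, drop it into the local aya-101 checkout, then convert as usual
wget -P aya-101/ https://huggingface.co/google/mt5-xxl/resolve/main/spiece.model
python convert_hf_to_gguf.py aya-101 --outfile aya-101-f16.gguf --outtype f16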
-
Gotta agree on dumb :D IQ4_XS, you say? I wonder how that imatrix thing is handled in these multilingual models. Btw, in case anyone's wondering: yes, you can run this on said C2D/4GB machine. Well, it's more of a crawl though.
vvv Thanks vvv
--repeat-penalty 2.0 and leveling up to IQ4_XS mitigated the looping problem, but not all the way.
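i.e. something along these lines (a sketch only; --repeat-penalty is a standard llama-cli sampling flag, the rest of the command is assumed):

./llama-cli -m aya-101-IQ4_XS.gguf --repeat-penalty 2.0 -p "translate to finnish: Could a dandelion suffer from hepatitis?"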
-
I tried this: https://huggingface.co/Eddishockwave/madlad400-10b-mt-Q8_0-GGUF. It works and produces quite good results.
-
I got it working with … I've noticed that in interactive mode it doesn't return anything, and in API mode it returns … Anybody know why? Using code from the latest git.
-
@misutoneko Hi, sir. Could you complete the example in llama.swiftui using T5? I tried using the madlad400 model in Swift, but I got an error: