**`README.md`** (+3):

```diff
@@ -712,6 +712,9 @@ Building the program with BLAS support may lead to some performance improvements
 
 ### Prepare and Quantize
 
+> [!NOTE]
+> You can also use the [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space on Hugging Face to quantise your model weights without any setup. It is synced from `llama.cpp` main every 6 hours.
+
 To obtain the official LLaMA 2 weights please see the <a href="#obtaining-and-using-the-facebook-llama-2-model">Obtaining and using the Facebook LLaMA 2 model</a> section. There is also a large selection of pre-quantized `gguf` models available on Hugging Face.
 
 Note: `convert.py` does not support LLaMA 3; use `convert-hf-to-gguf.py` instead, with LLaMA 3 weights downloaded from Hugging Face.
```
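For context, the local alternative that this note complements is the usual convert-then-quantize flow. A minimal sketch, assuming a Hugging Face checkpoint directory and a contemporary `llama.cpp` checkout; the model path and output name are placeholders, and flags may differ between versions:

```bash
# Convert a Hugging Face model to GGUF. convert.py does not support LLaMA 3,
# so convert-hf-to-gguf.py is used here instead.
# (path/to/hf-model and model-f16.gguf are placeholders)
python convert-hf-to-gguf.py path/to/hf-model --outfile model-f16.gguf --outtype f16
```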
**`examples/quantize/README.md`** (+3, -1):

```diff
@@ -1,6 +1,8 @@
 # quantize
 
-TODO
+You can also use the [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space on Hugging Face to build your own quants without any setup.
+
+Note: It is synced from llama.cpp `main` every 6 hours.
```
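For comparison with the hosted space, a typical local invocation of the `quantize` tool is sketched below; the file names are placeholders, and the exact usage may vary by `llama.cpp` version:

```bash
# Quantize an f16 GGUF model down to 4-bit Q4_K_M weights.
# General form: ./quantize <input.gguf> <output.gguf> <quant-type> [nthreads]
./quantize model-f16.gguf model-Q4_K_M.gguf Q4_K_M
```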