Hello,
I followed the sample Colab notebook and fine-tuned the "unsloth/Meta-Llama-3.1-8B-bnb-4bit" model.
I used the latest llama.cpp, compiled with: cmake -B build -DGGML_CUDA=ON -DGGML_CUDA_ENABLE_UNIFIED_MEMORY=1
It generated the GGUF file with no problem, but when I tried to use the generated GGUF I got this error:
c$ ./main -m ./models/unsloth.Q4_K_M.gguf -p "hello"
Log start
main: build = 3482 (e54c35e4)
main: built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu
main: seed = 1735960410
gguf_init_from_file: invalid magic characters ''
llama_model_load: error loading model: llama_model_loader: failed to load model from ./models/unsloth.Q4_K_M.gguf
llama_load_model_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model './models/unsloth.Q4_K_M.gguf'
main: error: unable to load model
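The "invalid magic characters" error means gguf_init_from_file read the first four bytes of the file and they were not the ASCII magic GGUF, so llama.cpp refused to parse it any further. A quick way to confirm this, and to read the header fields if the magic is intact, is a few lines of Python (a minimal sketch, assuming Python 3; the path is from my setup):

```python
# Check the GGUF header: 4-byte magic "GGUF", then uint32 version,
# then uint64 tensor count and uint64 metadata key/value count.
import struct

with open("./models/unsloth.Q4_K_M.gguf", "rb") as f:
    magic = f.read(4)
    print("magic:", magic)  # a valid file prints b'GGUF'
    if magic == b"GGUF":
        version, = struct.unpack("<I", f.read(4))
        n_tensors, n_kv = struct.unpack("<QQ", f.read(16))
        print("version:", version, "tensors:", n_tensors, "kv pairs:", n_kv)
    else:
        print("not a valid GGUF header")
```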
More info:
The 16-bit generated model works fine. During the process of constructing the quantized 4-bit unsloth/llama3.2 model, it first creates a BF16 (brain float 16) model, and that one works. Here is some more info:
model_q4_k_m$ ls -lrt
total 7002584
-rw-rw-r-- 1 d d 54628 Jan 3 19:10 tokenizer_config.json
-rw-rw-r-- 1 d d 454 Jan 3 19:10 special_tokens_map.json
-rw-rw-r-- 1 d d 17209920 Jan 3 19:10 tokenizer.json
-rw-rw-r-- 1 d d 994 Jan 3 19:10 config.json
-rw-rw-r-- 1 d d 234 Jan 3 19:10 generation_config.json
-rw-rw-r-- 1 d d 4417802560 Jan 3 19:10 model.safetensors
-rw-rw-r-- 1 d d 2479595168 Jan 3 19:10 unsloth.BF16.gguf <<<<< WORKS FINE.
-rw-rw-r-- 1 d d 255954592 Jan 3 19:10 unsloth.Q4_K_M.gguf <<<< DOES NOT WORK -INVALID MAGIC NUMBER
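One thing stands out in the listing: the BF16 GGUF is about 2.5 GB, while the Q4_K_M file is only about 256 MB, roughly a tenth of the 16-bit size. Q4_K_M normally lands at roughly a third of the 16-bit size, so the quantized file looks truncated, as if the quantize step died partway through or never finalized the output. Since the BF16 GGUF loads fine, one workaround is to re-run the quantization manually against it with llama.cpp's quantize tool (a sketch, assuming the tool was built under ./build/bin and is named llama-quantize; older builds call it quantize):

```python
# Re-quantize from the known-good BF16 GGUF using llama.cpp's quantize tool.
import subprocess

subprocess.run(
    [
        "./build/bin/llama-quantize",             # adjust the path/name to your build
        "model_q4_k_m/unsloth.BF16.gguf",         # known-good 16-bit input
        "model_q4_k_m/unsloth.Q4_K_M.redo.gguf",  # fresh output file
        "Q4_K_M",                                 # target quantization type
    ],
    check=True,  # raise CalledProcessError if the tool exits non-zero
)
```

If the resulting file loads, the problem is in the step that produced the original Q4_K_M file rather than in the model itself.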
Here are the first few bytes of the generated GGUF file. Do any experts see any issues with it?
(netai) d@d:/hp/NetAnalytics/dev/netai/syslog/syslog_scraper_netai/t80/rc$ hexdump -C ./models/unsloth.Q4_K_M.gguf | head -n 10
00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00777e20 00 00 80 3f 00 00 80 3f 00 00 80 3f 00 00 80 3f |...?...?...?...?|
*
00777e50 00 00 80 3f 00 00 80 3f 00 00 80 3f c2 5e d3 3f |...?...?...?.^.?|
00777e60 6f b4 52 40 ee aa 1a 41 00 00 00 42 00 00 00 42 |[email protected]|
00777e70 00 00 00 42 00 00 00 42 00 00 00 42 00 00 00 42 |...B...B...B...B|
*
00777ea0 dc 5a 06 ac 97 b8 0f 2a 94 88 da 3f c1 7d 8e 71 |.Z.....*...?.}.q|
00777eb0 f4 a2 db 17 fe 31 75 eb 87 6f 00 0b 58 39 54 44 |.....1u..o..X9TD|
Any ideas on how to start debugging this?