Replies: 3 comments
-
You can just use -ctk q4_0 on its own; -ctv q4_0 may rule out CPU inference for this model.
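For example (a sketch, not a tested command; -ctk is shorthand for llama.cpp's --cache-type-k, and the model path is taken from the post below):

llama-server -m /home/ash/ai/llms/DeepSeek-R1-Q4_K_M-00001-of-00011.gguf -c 32768 -ctk q4_0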
-
Thanks. That reduced memory requirements by 80 GB at 32K context for the Q4_K_M version of the model.
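For anyone who wants to sanity-check that figure, here is a rough back-of-envelope sketch in Python. The attention dimensions (61 layers, 128 heads, K head dim 192 = 128 nope + 64 rope, V head dim 128) and the q4_0 cost of 4.5 bits per element (18-byte blocks of 32) are assumptions taken from DeepSeek-R1's published config and llama.cpp's block format, not numbers from this thread:

# Rough KV cache sizing for DeepSeek-R1 in llama.cpp (assumed dims, see above).
N_LAYERS, N_HEADS = 61, 128
K_HEAD_DIM, V_HEAD_DIM = 192, 128
BITS_PER_ELEM = {"f16": 16.0, "q4_0": 4.5}

def cache_gb(n_ctx: int, head_dim: int, cache_type: str) -> float:
    """Size in GB of one cache (K or V) for the full context."""
    elems = N_LAYERS * N_HEADS * head_dim * n_ctx
    return elems * BITS_PER_ELEM[cache_type] / 8 / 1e9

n_ctx = 32768
k_f16, k_q4 = cache_gb(n_ctx, K_HEAD_DIM, "f16"), cache_gb(n_ctx, K_HEAD_DIM, "q4_0")
v_f16 = cache_gb(n_ctx, V_HEAD_DIM, "f16")
print(f"K: {k_f16:.1f} GB (f16) -> {k_q4:.1f} GB (q4_0), saving {k_f16 - k_q4:.1f} GB")
print(f"V (left at f16): {v_f16:.1f} GB")

With these assumptions the K-cache saving comes out around 70 GB at 32K context, the same ballpark as the 80 GB reported above.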
-
I still get this error using -ctk q4_0 and no -ctv, but I'm using the unsloth GGUFs, not bartowski's, if I do something like
-
Running the llama.cpp server with -ctk q4_0 -ctv q4_0 throws a flash_attn error for bartowski/DeepSeek-R1-GGUF:
llama_init_from_model: flash_attn requires n_embd_head_k == n_embd_head_v - forcing off
llama_init_from_model: V cache quantization requires flash_attn
common_init_from_params: failed to create context with model '/home/ash/ai/llms/DeepSeek-R1-Q4_K_M-00001-of-00011.gguf'
srv load_model: failed to load model, '/home/ash/ai/llms/DeepSeek-R1-Q4_K_M-00001-of-00011.gguf'
main: exiting due to model loading error
It works if I remove -ctk q4_0 -ctv q4_0, but then the context memory requirements are much higher. I would really like to use a KV cache quant for this model.
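For reference, the two log lines amount to this gating chain (a minimal sketch of the logic, not llama.cpp's actual code; the head dimensions are assumptions from DeepSeek-R1's config):

# Why -ctv q4_0 fails for this model (assumed dims, not from the log).
N_EMBD_HEAD_K = 192  # per-head K dim: 128 nope + 64 rope (assumed)
N_EMBD_HEAD_V = 128  # per-head V dim (assumed)

def context_creation_ok(quantize_v_cache: bool) -> bool:
    # "flash_attn requires n_embd_head_k == n_embd_head_v - forcing off"
    flash_attn = N_EMBD_HEAD_K == N_EMBD_HEAD_V  # False for this model
    # "V cache quantization requires flash_attn"
    return flash_attn or not quantize_v_cache

print(context_creation_ok(True))   # False: -ctk q4_0 -ctv q4_0 fails to load
print(context_creation_ok(False))  # True: -ctk q4_0 alone works

So quantizing only the K cache (-ctk q4_0, as suggested above) appears to be the workable option for this model.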
Thanks,
Ash