
Conversation

@p1-0tr (Member) commented May 21, 2025

No description provided.

Signed-off-by: Piotr Stankiewicz <[email protected]>

@ericcurtin commented Jul 31, 2025

Note: Georgi suggests doing this for Metal and CUDA:

https://x.com/ggerganov/status/1909657397964292209

So we might want to put some conditionals here; for example, this flag may be better off absent in the case of CPU-only inference.

@xenoscopic (Collaborator) commented

@p1-0tr Is this still worth pushing ahead? I guess we need to shift it to pkg/inference/backends/llamacpp/llamacpp_config.go and condition it on runtime.GOOS == "darwin" || hasCUDA11CapableGPU(...)?
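
For concreteness, a minimal Go sketch of that conditional, assuming a hypothetical check along the lines of the hasCUDA11CapableGPU(...) mentioned above supplies the GPU capability result and that llamacpp_config.go assembles the raw llama-server argument list; the actual wiring in this repo may differ:

```go
// Minimal sketch only: shouldEnableFlashAttention and the boolean it
// receives (fed by something like the hasCUDA11CapableGPU check
// mentioned above) are hypothetical; the real config plumbing would
// live in pkg/inference/backends/llamacpp/llamacpp_config.go.
package llamacpp

import "runtime"

// shouldEnableFlashAttention reports whether the flash attention flag
// should be passed to llama-server: on macOS (Metal) or when a capable
// CUDA GPU was detected, and never for CPU-only inference.
func shouldEnableFlashAttention(hasCUDACapableGPU bool) bool {
	return runtime.GOOS == "darwin" || hasCUDACapableGPU
}

// appendFlashAttentionArg conditionally appends llama-server's
// "--flash-attn" switch to the backend argument list.
func appendFlashAttentionArg(args []string, hasCUDACapableGPU bool) []string {
	if shouldEnableFlashAttention(hasCUDACapableGPU) {
		args = append(args, "--flash-attn")
	}
	return args
}
```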

@ericcurtin commented

Note: they are discussing an auto flash attention (auto-fa) flag coming soon upstream:

ggml-org/llama.cpp#15454

This optimization is worth it.

@ericcurtin commented

This is not worth adding anymore: there has been a lot of activity upstream, and llama-server now turns on flash attention automatically when appropriate. It will be picked up here whenever llama.cpp is rebased.

@xenoscopic (Collaborator) commented

Perfect. It looks like @p1-0tr has a pending PR to bump llama.cpp with most of the flash attention work you mention. @p1-0tr, shall we close this one out?
