main: port basic LLaVA (multimodal) support from llava-cli #5730

Nekotekina · 2024-02-26T12:47:57Z

Allows interactive chat in the terminal and possibly other features I haven't tested.
Example:

~/github/llama.cpp/build/bin/main \
  -m ~/Downloads/LLM-Models/llava-v1.6-34b.Q5_K_M.gguf \
  --mmproj ~/Downloads/LLM-Models/mmproj-llava-34b-f16-q6_k.gguf \
  --image <(convert -resize 448x448\> <(xclip -selection clipboard -t image/png -o) PNG:-) \
  -tb $(nproc) -c 4096 -n 1024 -ngl 10 --temp 0.2 --repeat-last-n 1024 -s 0 \
  --color -cml -p "Give detailed answers to user's question about the image: <image>"

As in llava-cli, <image> gets replaced with the actual image embedding. I'm not sure about the 448x448 size and it may be wrong; it is only here as an example of image preprocessing (X server clipboard -> ImageMagick).
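
For readers unfamiliar with llava-cli, the placeholder handling described above can be sketched roughly as follows. This is an illustration in Python, not the PR's C++ code: `split_prompt_at_image`, `evaluation_plan`, and `IMAGE_MARKER` are hypothetical names, and the sketch assumes that only the first <image> occurrence is treated as the placeholder.

```python
from typing import List, Optional, Tuple

IMAGE_MARKER = "<image>"  # placeholder text used in the prompt above


def split_prompt_at_image(prompt: str) -> Tuple[str, Optional[str]]:
    """Split a prompt around the first IMAGE_MARKER.

    Returns (before, after). If the marker is absent, returns (prompt, None)
    so the caller can fall back to plain text-only evaluation.
    """
    before, marker, after = prompt.partition(IMAGE_MARKER)
    if not marker:
        return prompt, None
    return before, after


def evaluation_plan(prompt: str) -> List[Tuple[str, Optional[str]]]:
    """List the segments in the order they would be fed to the model."""
    before, after = split_prompt_at_image(prompt)
    if after is None:
        return [("text", before)]
    plan = [("text", before), ("image_embedding", None), ("text", after)]
    # Skip empty text segments, e.g. when the marker ends the prompt.
    return [seg for seg in plan if seg[0] == "image_embedding" or seg[1]]
```

With the prompt from the example command, the plan would be the text up to the marker followed by the image embedding, with no trailing text segment.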

Nekotekina · 2024-03-29T09:38:22Z

If this is unwelcomed you could just close it...

lin72h · 2024-03-29T22:59:02Z

Wow RPCS3's developer is helping us! Thanks

slaren · 2024-03-31T20:11:54Z

> If this is unwelcomed you could just close it...

It's not unwelcomed, but nobody is working on the llava code at the moment, so it is hard to review. The llava code is also pending a refactor so that it can be added back to the server (#4216 #6027). After that it may make more sense to add this functionality to other examples.

Nekotekina force-pushed the vision branch 7 times, most recently from 5dc5a79 to d1d65e5 (February 27, 2024 09:13)

Nekotekina force-pushed the vision branch from d1d65e5 to 8c6aefa (April 21, 2024 13:28)

Nekotekina added 2 commits April 21, 2024 16:36

llava.cpp: allow --image from pipes/sockets

d2b7b46

Limit image size to ~128 MiB. Don't try to get file size by seeking. This should increase flexibility.
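
The commit message above describes two choices: cap the image size, and avoid seeking, since pipes and sockets (such as the `<(...)` process substitution in the example command) do not support it. A minimal sketch of that idea follows, in Python rather than the commit's C++. `read_image_bytes` and `MAX_IMAGE_BYTES` are hypothetical names, and the exact limit and error handling in the commit may differ.

```python
from typing import BinaryIO

MAX_IMAGE_BYTES = 128 * 1024 * 1024  # assumed cap, mirroring "~128 MiB"


def read_image_bytes(stream: BinaryIO, limit: int = MAX_IMAGE_BYTES) -> bytes:
    """Read a whole image from a possibly non-seekable stream.

    Reads in chunks until EOF instead of asking for the file size via
    seek/tell, and raises ValueError once more than `limit` bytes arrive.
    """
    chunks = []
    total = 0
    while True:
        chunk = stream.read(64 * 1024)
        if not chunk:  # empty read means EOF
            break
        total += len(chunk)
        if total > limit:
            raise ValueError(f"image exceeds {limit} bytes")
        chunks.append(chunk)
    return b"".join(chunks)
```

Because the function only calls `read`, the same code path would serve regular files, FIFOs, and sockets opened in binary mode.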

examples/main: basic multimodal support ported from llava-cli

8ac7656

<image> keyword gets replaced with image embed within prompt.

Nekotekina force-pushed the vision branch from 8c6aefa to 8ac7656 (April 21, 2024 13:37)

mofosyne added the llava, Review Complexity : Medium, refactoring, and porting labels and removed the refactoring label on May 10, 2024

Nekotekina closed this Sep 27, 2024
