main: port basic LLaVA (multimodal) support from llava-cli #5730

Closed
wants to merge 2 commits into from


@Nekotekina (Contributor) commented Feb 26, 2024

This allows interactive chat about an image in the terminal, and possibly other features I haven't tested.
Example:

```sh
~/github/llama.cpp/build/bin/main \
    -m ~/Downloads/LLM-Models/llava-v1.6-34b.Q5_K_M.gguf \
    --mmproj ~/Downloads/LLM-Models/mmproj-llava-34b-f16-q6_k.gguf \
    --image <(convert -resize 448x448\> <(xclip -selection clipboard -t image/png -o) PNG:-) \
    -tb $(nproc) -c 4096 -n 1024 -ngl 10 --temp 0.2 --repeat-last-n 1024 -s 0 \
    --color -cml -p "Give detailed answers to user's question about the image: <image>"
```

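As in llava-cli, the `<image>` keyword in the prompt is replaced with the actual image embedding. I'm not sure 448x448 is the right size, so it may be wrong; this is just an example of image preprocessing (X server clipboard -> ImageMagick). The `\>` suffix on the geometry tells ImageMagick to only shrink images larger than 448x448.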

@Nekotekina force-pushed the vision branch 7 times, most recently from 5dc5a79 to d1d65e5 (February 27, 2024 09:13)
@Nekotekina (Contributor, Author) commented:

If this is unwelcome you could just close it...

@lin72h commented Mar 29, 2024

Wow, RPCS3's developer is helping us! Thanks

@slaren (Collaborator) commented Mar 31, 2024

> If this is unwelcome you could just close it...

It's not unwelcome, but nobody is working on the llava code at the moment, so it is hard to review. The llava code is also pending a refactor so that it can be added back to the server (#4216, #6027). After that, it may make more sense to add this functionality to other examples.

Limit image size to ~128 MiB
Don't try to get file size by seeking
This should increase flexibility
The `<image>` keyword is replaced with the image embedding within the prompt.
@mofosyne added the llava, Review Complexity : Medium, and porting labels (a refactoring label was added and then removed) on May 10, 2024
@Nekotekina closed this on Sep 27, 2024