
support llava_onevision #7

Open
defaultak01 opened this issue Dec 13, 2024 · 2 comments

Comments

@defaultak01

No description provided.

@defaultak01 changed the title from "support llava" to "support llava_onevision" on Dec 13, 2024

defaultak01 commented Dec 13, 2024

@Yangsenqiao First of all, this is an excellent piece of work, and I really appreciate you making the code publicly available.

I noticed that in your demonstration of how VisionZip speeds up video understanding, you used LLaVA-OneVision. However, I ran into some compatibility issues when deploying it locally. For example:

  1. The prepare_inputs_labels_for_multimodal_visionzip function does not accept a **modalities** parameter, whereas the corresponding function in LLaVA-OneVision does.
  2. LLaVA-OneVision uses the SigLIP vision encoder (**SigLipVisionTower**), but the released code is only adapted for **CLIPVisionTower**.

Could you please share the code related to LLAVA_OneVision that you used in the demonstration?
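For point 1, a minimal sketch of one possible workaround, assuming the patched function can simply accept and forward the extra keyword. All names here mirror the issue's description and are illustrative only, not the actual VisionZip or LLaVA-OneVision code:

```python
# Illustrative sketch only: shows one way a patched
# prepare_inputs_labels_for_multimodal function could tolerate the extra
# `modalities` keyword that LLaVA-OneVision passes in. The function body
# is a stand-in, not the real VisionZip implementation.

def prepare_inputs_labels_for_multimodal_visionzip(input_ids, images, **kwargs):
    # LLaVA-OneVision passes modalities such as ["image"] or ["video"] per
    # sample; accept it via **kwargs and default to "image" when absent.
    modalities = kwargs.pop("modalities", ["image"] * len(images))
    # ... VisionZip token pruning would happen here ...
    return {"input_ids": input_ids, "images": images, "modalities": modalities}

# Call with the extra keyword, as LLaVA-OneVision would:
out = prepare_inputs_labels_for_multimodal_visionzip(
    [101, 102], ["frame_0"], modalities=["video"]
)
print(out["modalities"])  # → ['video']
```

This only papers over the signature mismatch in point 1; point 2 (adapting the pruning logic to SigLipVisionTower) would still need changes in the vision-tower code itself.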

@effortprogrammer

+1
