
Llama3.2-vision and Qwen2-VL Support #4

Open
boom-bang opened this issue Dec 9, 2024 · 4 comments

Comments

@boom-bang

Amazing and impressive work. Any plans for the support of new generation multimodal LLMs such as Llama-3.2-11B-Vision or Qwen2-VL?

@Yangsenqiao
Collaborator

Thank you for your interest in our work! I appreciate the suggestions and will try applying VisionZip to these models and exploring its performance in the future. We also look forward to community pull requests🔥.
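Since VisionZip's core idea (keeping the most-attended "dominant" visual tokens and merging the remainder into a few averaged "contextual" tokens) does not depend on a particular backbone, extending it to Llama-3.2-Vision or Qwen2-VL should mainly be a matter of hooking the selection step into each model's vision encoder. A minimal NumPy sketch of that selection step, with illustrative function and parameter names (not taken from the repo):

```python
import numpy as np

def select_visual_tokens(tokens, attn_scores, num_dominant=4, num_contextual=2):
    """Reduce N visual tokens to num_dominant + num_contextual tokens.

    tokens:      (N, D) array of visual token embeddings.
    attn_scores: (N,) attention each token receives (e.g. from the [CLS] query).
    """
    # Rank tokens by attention, highest first.
    order = np.argsort(attn_scores)[::-1]
    dominant = tokens[order[:num_dominant]]
    rest = tokens[order[num_dominant:]]
    # Merge the remaining tokens into a few averaged contextual tokens.
    groups = np.array_split(rest, num_contextual)
    contextual = np.stack([g.mean(axis=0) for g in groups])
    return np.concatenate([dominant, contextual], axis=0)

rng = np.random.default_rng(0)
tokens = rng.standard_normal((16, 8))   # 16 toy visual tokens of dim 8
scores = rng.random(16)
reduced = select_visual_tokens(tokens, scores)
print(reduced.shape)  # (6, 8): 16 tokens compressed to 4 dominant + 2 contextual
```

The actual method merges leftover tokens by feature similarity rather than by rank order as shown here; this only illustrates where per-model integration work would go.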

@effortprogrammer

effortprogrammer commented Dec 12, 2024

@Yangsenqiao Do you have any plans to implement VisionZip in the Hugging Face Transformers environment? The current implementation is tied to the original LLaVA GitHub repo, so a Transformers-based version would be useful for other people.

@Yangsenqiao
Collaborator

Thank you for your recommendation! We plan to provide a version compatible with Hugging Face in the future, but it may not be in the next few weeks as my final exams are approaching (T▽T).

@effortprogrammer

@Yangsenqiao Let me know when you start working on it. I will do my best to help as much as possible.
