Hi @Yangsenqiao,
I’m Denny, a PhD student with a strong interest in Vision-Language Models. I found your demo particularly fascinating, especially its ability to interactively select visual tokens. I believe this feature could be incredibly useful for assessing the importance of different tokens.
Would you be able to share the code for the demo or provide guidance on how to develop a similar implementation?
Thank you in advance for your time and assistance!
Best regards,
Denny
Thank you for your interest in our work! Since this demo involves some modifications to the LLaVA code itself, it may not be as straightforward or concise to release as VisionZip. Please kindly allow me some time to consider an appropriate release method.