
Can VisionZip be used on VLMs deployed on NPU? #6

Open
geniusxxx opened this issue Dec 12, 2024 · 3 comments

Comments

@geniusxxx

No description provided.

@Yangsenqiao
Collaborator

Hi,

To be honest, I am not very familiar with the NPU. However, VisionZip is designed to reduce redundant visual tokens before feeding them into the LLM. Therefore, we believe it can be applied and deployed alongside most LLM acceleration algorithms.
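The idea of reducing redundant visual tokens before they reach the LLM can be sketched in a few lines. This is an illustrative NumPy toy, not VisionZip's actual implementation: the function name, the scoring input, and the token counts are assumptions for the example. It keeps the highest-scoring "dominant" tokens and merges the leftovers into a few averaged "contextual" tokens, so the LLM sees far fewer tokens:

```python
import numpy as np

def reduce_visual_tokens(tokens, attn_scores, num_dominant=54, num_contextual=10):
    """Toy token reduction in the spirit of VisionZip (illustrative only).

    tokens:      (N, D) visual token features from the vision encoder
    attn_scores: (N,)   per-token importance scores (e.g. attention weights)

    Keeps the `num_dominant` highest-scoring tokens and averages the
    remaining tokens into `num_contextual` merged tokens, so the LLM
    receives num_dominant + num_contextual tokens instead of N.
    """
    order = np.argsort(attn_scores)[::-1]          # descending by score
    dominant = tokens[order[:num_dominant]]        # (num_dominant, D)
    rest = tokens[order[num_dominant:]]            # (N - num_dominant, D)
    # Merge the leftover tokens into a few "contextual" tokens by
    # chunked averaging (a stand-in for a real merging strategy).
    chunks = np.array_split(rest, num_contextual)
    contextual = np.stack([c.mean(axis=0) for c in chunks])
    return np.concatenate([dominant, contextual], axis=0)

# 576 patch tokens (e.g. CLIP ViT-L/14 at 336px) with 1024-dim features
rng = np.random.default_rng(0)
tokens = rng.normal(size=(576, 1024))
scores = rng.random(576)
reduced = reduce_visual_tokens(tokens, scores)
print(reduced.shape)  # 64 tokens instead of 576
```

Because this runs between the vision encoder and the LLM, it is independent of how the LLM itself is accelerated, which is why it should compose with most LLM-side optimizations.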

As for deployment on an NPU: if the raw VLM can be deployed on the NPU, then VisionZip should also be applicable.

Best regards,
Senqiao

@geniusxxx
Author

geniusxxx commented Dec 12, 2024


Thanks for your reply! I need to convert the model's weights into ONNX format and then use an inference framework to run inference on an edge-side NPU. Should I apply VisionZip during the post-processing stage after converting to ONNX, or before converting to ONNX? Looking forward to your response.

@Yangsenqiao
Collaborator

Hi Boyu,

I am not familiar with the NPU and ONNX; I just quickly looked into them with ChatGPT. GPT-4o suggested that VisionZip should be applied before converting to ONNX. Below is the answer it provided. Please note that it may not be entirely correct!


VisionZip should be applied before converting to ONNX.

Reasoning:

  1. If VisionZip modifies the token structure (e.g., reducing redundant tokens), this needs to be part of the model's behavior and integrated into the ONNX graph. This ensures the ONNX model correctly represents the optimized token processing pipeline.
  2. Applying VisionZip after ONNX conversion would require additional post-processing outside the model, complicating the deployment pipeline.

Workflow:

  1. Integrate VisionZip into your model pipeline.
  2. Convert the full pipeline (vision encoder + VisionZip + subsequent stages) to ONNX.
  3. Deploy the ONNX model for inference on the edge-side NPU.

This approach keeps everything streamlined and efficient for edge deployment.


If you have any questions, please feel free to discuss them with me.

Best regards,
Senqiao
