
Can VisionZip be used on VLMs deployed on NPU? #6

Open
geniusxxx opened this issue Dec 12, 2024 · 3 comments

Comments

@geniusxxx

No description provided.

@Yangsenqiao
Collaborator

Hi,

To be honest, I am not very familiar with the NPU. However, VisionZip is designed to reduce redundant visual tokens before feeding them into the LLM. Therefore, we believe it can be applied and deployed alongside most LLM acceleration algorithms.
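The idea of reducing redundant visual tokens before they reach the LLM can be sketched in a few lines. This is an illustrative NumPy toy, not VisionZip's actual implementation: the function name, the scoring input, and the token counts are assumptions for the example. It keeps the highest-scoring "dominant" tokens and merges the leftovers into a few averaged "contextual" tokens, so the LLM sees far fewer tokens:

```python
import numpy as np

def reduce_visual_tokens(tokens, attn_scores, num_dominant=54, num_contextual=10):
    """Toy token reduction in the spirit of VisionZip (illustrative only).

    tokens:      (N, D) visual token features from the vision encoder
    attn_scores: (N,)   per-token importance scores (e.g. attention weights)

    Keeps the `num_dominant` highest-scoring tokens and averages the
    remaining tokens into `num_contextual` merged tokens, so the LLM
    receives num_dominant + num_contextual tokens instead of N.
    """
    order = np.argsort(attn_scores)[::-1]          # descending by score
    dominant = tokens[order[:num_dominant]]        # (num_dominant, D)
    rest = tokens[order[num_dominant:]]            # (N - num_dominant, D)
    # Merge the leftover tokens into a few "contextual" tokens by
    # chunked averaging (a stand-in for a real merging strategy).
    chunks = np.array_split(rest, num_contextual)
    contextual = np.stack([c.mean(axis=0) for c in chunks])
    return np.concatenate([dominant, contextual], axis=0)

# 576 patch tokens (e.g. CLIP ViT-L/14 at 336px) with 1024-dim features
rng = np.random.default_rng(0)
tokens = rng.normal(size=(576, 1024))
scores = rng.random(576)
reduced = reduce_visual_tokens(tokens, scores)
print(reduced.shape)  # 64 tokens instead of 576
```

Because this runs between the vision encoder and the LLM, it is independent of how the LLM itself is accelerated, which is why it should compose with most LLM-side optimizations.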

As for deployment on an NPU: if the raw VLM can be deployed on the NPU, then VisionZip should also be applicable.

Best regards,
Senqiao

@geniusxxx
Author

geniusxxx commented Dec 12, 2024


Thanks for your reply! I need to convert the model's weights into ONNX format and then use an inference framework to run inference on an edge-side NPU. Should I apply VisionZip during the post-processing stage after converting to ONNX, or before converting to ONNX? Looking forward to your response.

@Yangsenqiao
Collaborator

Hi Boyu,

I am not familiar with the NPU and ONNX; I just quickly looked into them with ChatGPT. GPT-4o suggested that VisionZip should be applied before converting to ONNX. Below is the answer it provided. Please note that it may not be entirely correct!


VisionZip should be applied before converting to ONNX.

Reasoning:

  1. If VisionZip modifies the token structure (e.g., reducing redundant tokens), this needs to be part of the model's behavior and integrated into the ONNX graph. This ensures the ONNX model correctly represents the optimized token processing pipeline.
  2. Applying VisionZip after ONNX conversion would require additional post-processing outside the model, complicating the deployment pipeline.

Workflow:

  1. Integrate VisionZip into your model pipeline.
  2. Convert the full pipeline (vision encoder + VisionZip + subsequent stages) to ONNX.
  3. Deploy the ONNX model for inference on the edge-side NPU.

This approach keeps everything streamlined and efficient for edge deployment.


If you have any questions, please feel free to discuss them with me.

Best regards,
Senqiao
