Description
Hi, thank you for this awesome repository. It works really well.
However, I’m encountering an issue when sending two base64-encoded images to the Qwen2-VL-7B-Instruct model (served via vLLM with --limit-mm-per-prompt "image=4"): the model's response becomes incoherent.
When using only one image (k=1), the results are accurate and contextually relevant. Here is a minimal example to reproduce the issue:
@document_router.post("/query-pdfs")
async def query_pdfs(payload: Query):
    RAG = RAGMultiModalModel.from_index("query_docs", device='cpu')
    results = RAG.search(payload.message, k=2, return_base64_results=True)
    logger.info(f"Number of images returned - {len(results)}")
    response_message = await model_response(vision_openai_client, INTERNAL_VISION_MODEL, payload.message,
                                            results[0]['base64'], results[1]['base64'])
async def model_response(client, model, question, encoded_image_1, encoded_image_2):
    image_response = client.chat.completions.create(
        model=model,
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image;base64,{encoded_image_1}"}
                    },
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image;base64,{encoded_image_2}"}
                    }
                ],
            }
        ],
        max_tokens=7000,
        temperature=0,
    )
    return image_response
The model's response is an incoherent jumble of repeated words. It looks something like this:
"message": "[' content page page page is page page page content page Page content tasks page page following page page page page pages page page several several several document page page page page page pá page pageesenym",
Not sure if this is the right place to post this, but I am desperate.