
Incorrect list nesting in apply_chat_template causes prompts to be ignored in WISE Multimodal implementation #649

@boilerchun


Describe the bug

Thanks for the work! If I understand correctly, I may have found a bug in the WISE multimodal implementation.

I encountered a logic error in the code where the apply_chat_template function is called:

```python
"role": "user",
"content": [
    [{"type": "image"}] * num_images + [{"type": "text", "text": p}]
],
```

When constructing the content field for the user message, the code creates a double-nested list structure (a list inside a list: [[...]]) instead of a flat list of dictionaries.

Because the Hugging Face processor.apply_chat_template expects content to be a flat list of content items (e.g., [{"type": "image"}, {"type": "text"}]), the extra nesting is not handled by the chat template: the only item it sees is a list, not a dict with a "type" key. As a result, the text prompt is not included in the generated prompt_ids, and the model receives empty or malformed input.
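To make the failure mode concrete, here is a toy, pure-Python stand-in for a chat template. It is a hypothetical simplification: `toy_render` and the `<|user|>` / `<image>` markers are made up for illustration and are not taken from any real processor. It assumes the template loops over `message["content"]` and dispatches on each item's `"type"` key, which is how typical multimodal chat templates are written. In this toy model the inner list matches neither branch, so the image placeholders are dropped along with the text; what a real template does with the inner list depends on that template.

```python
from typing import Any


def toy_render(messages: list[dict[str, Any]], image_token: str = "<image>") -> str:
    """Hypothetical stand-in for a multimodal chat template (not the real one)."""
    parts: list[str] = []
    for message in messages:
        parts.append(f"<|{message['role']}|>")
        for item in message["content"]:
            # In a typical Jinja template, item["type"] on a list evaluates to
            # undefined; we model that as None so neither branch below fires.
            item_type = item.get("type") if isinstance(item, dict) else None
            if item_type == "image":
                parts.append(image_token)
            elif item_type == "text":
                parts.append(item["text"])
    return "".join(parts)


num_images, p = 2, "Describe the scene."

# Buggy structure: content is a list containing one inner list.
nested = [{"role": "user",
           "content": [[{"type": "image"}] * num_images + [{"type": "text", "text": p}]]}]
# Fixed structure: content is a flat list of dicts.
flat = [{"role": "user",
         "content": [{"type": "image"}] * num_images + [{"type": "text", "text": p}]}]

print(toy_render(nested))  # <|user|>
print(toy_render(flat))    # <|user|><image><image>Describe the scene.
```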

To Reproduce

```python
# ... inside the apply_chat_template call ...
{
    "role": "user",
    # ❌ BUG: Double nesting [ [ ... ] ]
    "content": [
        [{"type": "image"}] * num_images + [{"type": "text", "text": p}]
    ],
},
```

```python
# ... inside the apply_chat_template call ...
{
    "role": "user",
    # ✅ FIX: Single list [ ... ]
    "content": [{"type": "image"}] * num_images + [{"type": "text", "text": p}]
},
```
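If this structure is built in more than one place, a small helper can keep the content list flat by construction. This is a sketch only: `build_user_message` is a hypothetical name, not part of the WISE code base, and `p` / `num_images` simply mirror the variables in the snippet above.

```python
from typing import Any


def build_user_message(p: str, num_images: int) -> dict[str, Any]:
    """Hypothetical helper: one user turn with num_images image placeholders, then the text prompt."""
    if num_images < 0:
        raise ValueError("num_images must be non-negative")
    # Build a flat list of dicts; nothing is wrapped in an extra outer list.
    content: list[dict[str, Any]] = [{"type": "image"} for _ in range(num_images)]
    content.append({"type": "text", "text": p})
    return {"role": "user", "content": content}


prompts = ["A red cube on a table.", "Two cats sleeping."]
# One single-turn conversation per prompt.
conversations = [[build_user_message(p, num_images=1)] for p in prompts]
```

The list comprehension is a small design choice: `[{"type": "image"}] * num_images` repeats a reference to the same dict object, which is harmless as long as nothing mutates it, whereas the comprehension creates independent dicts.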

Expected behavior

The content field should be a single, flat list containing the image placeholders and the text dictionary. In my testing, whenever content was not a single flat list, the resulting prompt_ids did not include the actual prompt text.
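A cheap guard against this class of bug is to validate the message structure before calling the processor. The sketch below uses a hypothetical function name, `check_flat_content`, and raises if any content entry is not a dict with a `"type"` key. As a complementary end-to-end check, calling `processor.apply_chat_template(..., tokenize=False)` should return the rendered prompt as a string, so asserting that `p` appears in it confirms the prompt text survived.

```python
from typing import Any


def check_flat_content(messages: list[dict[str, Any]]) -> None:
    """Hypothetical sanity check: every content item must be a dict with a "type" key."""
    for i, message in enumerate(messages):
        content = message.get("content")
        if isinstance(content, str):
            continue  # many chat templates also accept plain-string content
        if not isinstance(content, list):
            raise TypeError(f"message {i}: content must be a str or a list")
        for j, item in enumerate(content):
            if not isinstance(item, dict) or "type" not in item:
                raise TypeError(
                    f"message {i}, item {j}: expected a dict with a 'type' key, "
                    f"got {type(item).__name__}"
                )


flat = [{"role": "user", "content": [{"type": "image"}, {"type": "text", "text": "hi"}]}]
nested = [{"role": "user", "content": [[{"type": "image"}, {"type": "text", "text": "hi"}]]}]

check_flat_content(flat)  # passes silently
try:
    check_flat_content(nested)
except TypeError as err:
    print(err)  # message 0, item 0: expected a dict with a 'type' key, got list
```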

Metadata

Assignees: No one assigned

Labels: bug (Something isn't working)

Type: No type

Projects: No projects

Milestone: No milestone

Relationships: None yet

Development: No branches or pull requests
