Description
Describe the bug
Thanks for the work! If I understand correctly, I may have found a bug in the WISE multimodal implementation.
There is a logic error in how the messages passed to apply_chat_template are constructed.
EasyEdit/easyeditor/models/wise/utils.py
Lines 313 to 316 in a642707
    "role": "user",
    "content": [
        [{"type": "image"}] * num_images + [{"type": "text", "text": p}]
    ],
When constructing the content field for the user message, the code creates a double-nested list structure (a list inside a list: [[...]]) instead of a flat list of dictionaries.
The Hugging Face processor.apply_chat_template expects a flat list of content items (e.g., [{"type": "image"}, {"type": "text"}]). With the extra nesting, the only element of content is itself a list rather than a dict with a "type" key, so the chat template appears to skip it. As a result, the text prompt is missing from the generated prompt_ids, and the model receives empty or malformed inputs.
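To illustrate the mechanism without loading a model, here is a minimal sketch. `build_content` and `render_text_parts` are hypothetical helpers written for this issue; `render_text_parts` is only a simplified stand-in for a chat template loop that keeps items whose "type" is "text", not the real processor, and the prompt string is made up.

```python
def build_content(num_images, prompt, nested):
    """Build the user content list, optionally with the extra nesting from utils.py."""
    items = [{"type": "image"}] * num_images + [{"type": "text", "text": prompt}]
    return [items] if nested else items


def render_text_parts(content):
    """Hypothetical stand-in for a chat template: collect text from dict items of type 'text'."""
    parts = []
    for item in content:
        # A nested list has no "type" key, so it is skipped entirely.
        if isinstance(item, dict) and item.get("type") == "text":
            parts.append(item["text"])
    return "".join(parts)


nested = build_content(2, "What is shown?", nested=True)
flat = build_content(2, "What is shown?", nested=False)

nested_text = render_text_parts(nested)
flat_text = render_text_parts(flat)

print(repr(nested_text))  # ''
print(repr(flat_text))    # 'What is shown?'
```

Under these assumptions, the nested form has a single element (a list), so the text never reaches the rendered prompt, while the flat form yields the prompt text as expected.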
To Reproduce
# ... inside the apply_chat_template call ...
{
    "role": "user",
    # ❌ BUG: Double nesting [ [ ... ] ]
    "content": [
        [{"type": "image"}] * num_images + [{"type": "text", "text": p}]
    ],
},
# ... inside the apply_chat_template call ...
{
    "role": "user",
    # ✅ FIX: Single list [ ... ]
    "content": [{"type": "image"}] * num_images + [{"type": "text", "text": p}]
},
Expected behavior
The content field should be a single, flat list containing the image placeholders and the text dictionary. In my tests, when content is not a flat list, the resulting prompt_ids do not include the text of the prompt.
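A small guard could catch this class of mistake before the messages reach the processor. The sketch below is only a suggestion: `check_flat_content` is a hypothetical helper, not part of EasyEdit or Transformers, and the values of `num_images` and `p` are placeholders.

```python
def check_flat_content(messages):
    """Raise ValueError if a message's content list holds anything but dicts with a 'type' key."""
    for i, msg in enumerate(messages):
        content = msg.get("content")
        if isinstance(content, str):
            continue  # plain-string content has nothing to flatten
        for j, item in enumerate(content):
            if not (isinstance(item, dict) and "type" in item):
                raise ValueError(
                    f"messages[{i}]['content'][{j}] is {type(item).__name__}, "
                    "expected a dict with a 'type' key"
                )


num_images = 1               # placeholder value
p = "Describe the image."    # placeholder prompt

good = [{"role": "user",
         "content": [{"type": "image"}] * num_images + [{"type": "text", "text": p}]}]
bad = [{"role": "user",
        "content": [[{"type": "image"}] * num_images + [{"type": "text", "text": p}]]}]

check_flat_content(good)  # passes silently

try:
    check_flat_content(bad)
    bad_rejected = False
except ValueError as err:
    bad_rejected = True
    print(err)
```

Calling such a check just before apply_chat_template would turn the silent loss of the prompt text into an explicit error.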