Should video_metadatas be video_metadata in agent_loop.py for Qwen3 VL? #5021

@Solus-sano

Description

Hi, I noticed that in verl/experimental/agent_loop/agent_loop.py, the processor is called with video_metadatas (plural form) as the parameter name on lines 290 and 661.

However, looking at the official Qwen3-VL repository, it seems the parameter name used is video_metadata (singular form):

processor(text=text, images=images, videos=videos, video_metadata=video_metadatas, return_tensors="pt", do_resize=False, **video_kwargs)

Is this a typo, or is there a specific reason for using the plural form here?

Affected Code

Line 290:

model_inputs = self.processor(
    text=[raw_prompt],
    images=images,
    videos=videos,
    video_metadatas=video_metadatas,
    ...
)

Line 661:

multi_modal_inputs = self.processor(
    text=[current_text],
    images=images,
    videos=videos,
    video_metadatas=video_metadatas,
    ...
)
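One way to check a mismatch like this is to inspect the callable's signature. The sketch below is a minimal illustration and does not use the real Qwen3-VL processor: `fake_processor` is a hypothetical stand-in whose signature is assumed for the example, and `accepts_keyword` and `has_var_keyword` are helper names introduced here. It shows why a misspelled keyword can go unnoticed when a callable also takes `**kwargs`: the call succeeds, but the value never reaches the intended parameter.

```python
import inspect


def accepts_keyword(func, name):
    """Return True if `func` declares `name` as an explicit keyword parameter."""
    param = inspect.signature(func).parameters.get(name)
    return param is not None and param.kind in (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )


def has_var_keyword(func):
    """Return True if `func` takes **kwargs, which can absorb unknown names."""
    return any(
        p.kind is inspect.Parameter.VAR_KEYWORD
        for p in inspect.signature(func).parameters.values()
    )


# Hypothetical stand-in, NOT the real processor: the signature is an assumption.
def fake_processor(text=None, images=None, videos=None, video_metadata=None, **kwargs):
    # Report which value was bound and which extra keywords were absorbed.
    return {"video_metadata": video_metadata, "ignored": sorted(kwargs)}


if __name__ == "__main__":
    for name in ("video_metadata", "video_metadatas"):
        print(name, "declared explicitly:", accepts_keyword(fake_processor, name))
    print("takes **kwargs:", has_var_keyword(fake_processor))
    # The plural keyword is accepted without error but lands in **kwargs.
    print(fake_processor(video_metadatas=[{"fps": 2.0}]))
```

If the real processor forwards its keyword arguments through `**kwargs`, signature inspection alone will not settle the question. In that case the accepted names would have to be checked against the library's documented keyword set.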

References

Thanks for your time!
