Description
Hi, I noticed that in verl/experimental/agent_loop/agent_loop.py, the processor is called with the keyword argument video_metadatas (plural form) on lines 290 and 661.
However, the official Qwen3-VL repository appears to use video_metadata (singular form) as the parameter name:
processor(text=text, images=images, videos=videos, video_metadata=video_metadatas, return_tensors="pt", do_resize=False, **video_kwargs)
I'm wondering whether this is a typo, or whether there is a specific reason for using the plural form here.
Affected Code
Line 290:
model_inputs = self.processor(
    text=[raw_prompt],
    images=images,
    videos=videos,
    video_metadatas=video_metadatas,
    ...
)
Line 661:
multi_modal_inputs = self.processor(
    text=[current_text],
    images=images,
    videos=videos,
    video_metadatas=video_metadatas,
    ...
)
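I haven't confirmed how the actual processor handles an unrecognized keyword, so the following is only a minimal hypothetical sketch of why a plural/singular mismatch might not raise any error. toy_processor is made up for illustration and is not the transformers, Qwen3-VL, or verl API; it assumes a processor that accepts **kwargs and only reads the video_metadata key.

```python
from typing import Any, Optional


def toy_processor(
    text: list[str],
    images: Optional[list[Any]] = None,
    videos: Optional[list[Any]] = None,
    **kwargs: Any,
) -> dict[str, Any]:
    """Hypothetical processor, NOT the real transformers/Qwen3-VL code.

    It accepts arbitrary keyword arguments and only reads the ones it
    knows about, so a misspelled key is dropped without raising an error.
    """
    video_metadata = kwargs.get("video_metadata")  # only the singular key is read
    return {
        "text": text,
        "num_videos": len(videos or []),
        "used_video_metadata": video_metadata is not None,
    }


meta = [{"fps": 2.0, "total_num_frames": 16}]

ok = toy_processor(["<video>"], videos=["clip"], video_metadata=meta)
typo = toy_processor(["<video>"], videos=["clip"], video_metadatas=meta)

print(ok["used_video_metadata"])    # True
print(typo["used_video_metadata"])  # False: no error, metadata silently ignored
```

If the real processor behaves like this toy, the plural form would not fail loudly; the video metadata would simply not be used, which is why it seems worth confirming which name the processor actually reads.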
Thanks for your time!