Factorized Learning for Temporally Grounded Video-Language Models

Wenzheng Zeng, Difei Gao, Mike Zheng Shou, Hwee Tou Ng

National University of Singapore

ICCV 2025

This repository contains the official implementation of the ICCV 2025 paper "Factorized Learning for Temporally Grounded Video-Language Models".

πŸ”† Highlights

  • Model: We propose a new framework $D^2\mathrm{VLM}$, where we decompose the generation objective into a "grounding then answering with evidence referencing" paradigm and introduce evidence tokens to emphasize explicit event-level visual semantic capture.
  • Training Algorithm: We introduce Factorized Preference Optimization (FPO), which explicitly addresses both temporal grounding and textual response quality. A factorized data synthesis approach is also designed to support FPO (a minimal illustrative sketch follows this list).
  • Performance: Our method consistently outperforms SOTA methods across various tasks.
  • Open Source: The camera-ready paper and the source code will be released soon.
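
Since the official code has not been released yet, the following is only a minimal sketch of how a factorized preference objective of this kind could look, assuming FPO combines two DPO-style preference terms: one restricted to the temporal-grounding (evidence) tokens and one restricted to the textual answer tokens. All names here (`dpo_term`, `fpo_loss`, `lambda_grounding`, `lambda_response`, `beta`) are hypothetical and are not the paper's API.

```python
# Hypothetical sketch only; NOT the released FPO implementation.
import torch
import torch.nn.functional as F


def dpo_term(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO-style preference term on summed log-probabilities of one token span."""
    margin = (policy_chosen - policy_rejected) - (ref_chosen - ref_rejected)
    return -F.logsigmoid(beta * margin).mean()


def fpo_loss(grounding_logps, response_logps,
             lambda_grounding=1.0, lambda_response=1.0):
    """Factorized objective: one preference term over the temporal-grounding
    (evidence) tokens and one over the textual answer tokens.

    Each argument is a dict of summed log-probabilities under the policy and a
    frozen reference model for the preferred ('chosen') and dispreferred
    ('rejected') samples:
    {'policy_chosen', 'policy_rejected', 'ref_chosen', 'ref_rejected'}.
    """
    loss_grounding = dpo_term(**grounding_logps)
    loss_response = dpo_term(**response_logps)
    return lambda_grounding * loss_grounding + lambda_response * loss_response


# Example with dummy log-probabilities for a batch of 2 preference pairs.
keys = ("policy_chosen", "policy_rejected", "ref_chosen", "ref_rejected")
grounding = {k: torch.randn(2) for k in keys}
response = {k: torch.randn(2) for k in keys}
print(fpo_loss(grounding, response))
```

The weighting between the two terms and the exact token spans each term covers are design choices; the released code may implement them differently.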

πŸŽ“ Citation

If you find our work useful in your research, please consider citing our paper:

@inproceedings{d2vlm,
  title={Factorized Learning for Temporally Grounded Video-Language Models},
  author={Zeng, Wenzheng and Gao, Difei and Shou, Mike Zheng and Ng, Hwee Tou},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  year={2025}
}
