Repository for the paper Video Question Answering with Phrases via Semantic Roles

Video-QAP (NAACL21)

Video Question Answering with Phrases via Semantic Roles
Arka Sadhu, Kan Chen, Ram Nevatia
NAACL 2021

Video Question Answering has been studied through the lens of N-way phrase classification. While this eases evaluation, it severely limits its application in the wild. Here, we require the model to generate the answer and we propose a novel evaluation metric using relative scoring and contrastive scoring. We further create ActivityNet-SRL-QA and Charades-SRL-QA.

Quick Start

  1. Clone repo:

    git clone https://github.com/TheShadow29/Video-QAP
    cd Video-QAP
    export ROOT=$(pwd)
    
  2. Set up a new conda environment using the provided vidqap_env.yml file. Please refer to Miniconda for details on installing conda.

    MINICONDA_ROOT=[to your Miniconda/Anaconda root directory]
    conda env create -f vidqap_env.yml --prefix $MINICONDA_ROOT/envs/vidqap_pyt
    conda activate vidqap_pyt
    
  3. See INSTALL.md for instructions on installing fairseq.

  4. To download the ActivityNet-SRL-QA and Charades-SRL-QA datasets, see DATA.md.

Training

  1. Configuration files are inside configs.
    cd $ROOT
    python code/main_dist.py "vogqap_asrlqa" --ds_to_use='asrl_qa' --mdl.name='vog_qa' --train.bs=4 --train.epochs=10 --train.lr=1e-4
    
    Set --mdl.name to one of the models: lqa, mtx_qa, butd_qa, vog_qa.
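
To train each listed model under the same settings, the command above can be generated once per model. The following is a minimal sketch and is not part of the repository: the helper `build_train_cmd` is hypothetical, and the `<model>_<dataset>` experiment name rests on the assumption that the first positional argument is a free-form experiment identifier. Only the flags shown in the command above are used.

```python
import shlex

# Model names listed in this README.
MODELS = ["lqa", "mtx_qa", "butd_qa", "vog_qa"]


def build_train_cmd(model, dataset="asrl_qa", bs=4, epochs=10, lr="1e-4"):
    """Return the training command as an argument list (hypothetical helper)."""
    if model not in MODELS:
        raise ValueError(f"unknown model {model!r}; expected one of {MODELS}")
    # Assumption: the first positional argument is a free-form experiment name.
    exp_name = f"{model}_{dataset}"
    return [
        "python", "code/main_dist.py", exp_name,
        f"--ds_to_use={dataset}",
        f"--mdl.name={model}",
        f"--train.bs={bs}",
        f"--train.epochs={epochs}",
        f"--train.lr={lr}",
    ]


if __name__ == "__main__":
    # Print one shell-ready command per model; run them from $ROOT.
    for name in MODELS:
        print(shlex.join(build_train_cmd(name)))
```

Returning an argument list rather than a single string keeps quoting explicit; the list can be printed with `shlex.join` or passed directly to `subprocess.run`.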

Evaluation

  1. The main evaluation file is vidqa_code/eval_fn_vidqap.py. It can also be used as a stand-alone script for a separate dataset.

    cd $ROOT
    python vidqa_code/eval_fn_vidqap.py --pred_file=... --ds_to_use='asrl_qa' --split_type='valid' --met_keys='meteor,rouge,bert_score'
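
When evaluating several prediction files or metric subsets, the evaluation command above can be assembled programmatically. This is a minimal sketch under stated assumptions: `build_eval_cmd` is a hypothetical helper, not part of the repository, and the prediction file path must be supplied by the caller because the README leaves it unspecified. Only the flags from the command above are used.

```python
def build_eval_cmd(pred_file, ds_to_use="asrl_qa", split_type="valid",
                   met_keys=("meteor", "rouge", "bert_score")):
    """Return the evaluation command as an argument list (hypothetical helper)."""
    if not met_keys:
        raise ValueError("at least one metric key is required")
    return [
        "python", "vidqa_code/eval_fn_vidqap.py",
        f"--pred_file={pred_file}",
        f"--ds_to_use={ds_to_use}",
        f"--split_type={split_type}",
        # Metric keys are passed as one comma-separated string, as in the README command.
        f"--met_keys={','.join(met_keys)}",
    ]
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from `$ROOT`.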

ToDo:

  • Add more documentation on how to run the models
  • Add pre-trained model weights.
  • Support dataset creation for new caption dataset.

Acknowledgements:

We thank:

  1. @LuoweiZhou for their codebase on GVD (https://github.com/facebookresearch/grounded-video-description), along with the extracted features for ActivityNet.
  2. @antoine77340 for their codebase on S3D pretrained on Howto100M (https://github.com/antoine77340/S3D_HowTo100M), used for feature extraction on Charades.
  3. allennlp for providing a demo and a pre-trained model for SRL.
  4. fairseq for its sequence generation implementation and transformer encoder-decoder models.

Citation

@inproceedings{Sadhu2021VideoQA,
  title={Video Question Answering with Phrases via Semantic Roles},
  author={Arka Sadhu and Kan Chen and R. Nevatia},
  booktitle={NAACL},
  year={2021}
}
