In this repository we have tested three VQA (Visual Question Answering) models on the ImageCLEF 2019 dataset. Two of them are built on top of Facebook AI Research's Multi-Modal Framework (MMF).
| Model Name | Accuracy | Number of Epochs |
|---|---|---|
| Hierarchical Question-Image Co-attention | 48.32% | 42 |
| MMF Transformer | 51.76% | 30 |
| MMBT | 86.78% | 30 |
Download the dataset from here and place it in a directory named `/dataset/med-vqa-data/` inside the directory where this repository is cloned.
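As a quick sanity check before training, the minimal sketch below verifies that the data directory exists at the location described above (interpreted relative to the repository root); it makes no assumptions about the file names inside the archive.

```python
# Sanity check for the expected dataset location -- a sketch only; the exact
# contents of the VQA-Med archive are not assumed here.
from pathlib import Path

DATA_DIR = Path("dataset/med-vqa-data")  # relative to the repository root

if not DATA_DIR.is_dir():
    raise FileNotFoundError(
        f"Expected the VQA-Med data at {DATA_DIR.resolve()}; "
        "download the dataset and extract it there before training."
    )

print(f"Found {sum(1 for _ in DATA_DIR.rglob('*'))} entries under {DATA_DIR}")
```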
Train the MMF Transformer model:

mmf_run config=projects/hateful_memes/configs/mmf_transformer/defaults.yaml model=mmf_transformer dataset=hateful_memes training.checkpoint_interval=100 training.max_updates=3000

Train the MMBT model:

mmf_run config=projects/hateful_memes/configs/mmbt/defaults.yaml model=mmbt dataset=hateful_memes training.checkpoint_interval=100 training.max_updates=3000
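If you prefer to launch both MMF trainings from a script, the sketch below wraps the exact commands above with the standard library; it assumes MMF is installed so that `mmf_run` is on your PATH, and the config/dataset names are copied verbatim from the commands, not invented here.

```python
# Convenience launcher for the two MMF trainings above (a sketch; it assumes
# `mmf_run` is available on PATH, i.e. MMF is installed in the environment).
import shlex
import subprocess

RUNS = {
    "mmf_transformer": "projects/hateful_memes/configs/mmf_transformer/defaults.yaml",
    "mmbt": "projects/hateful_memes/configs/mmbt/defaults.yaml",
}

for model, config in RUNS.items():
    cmd = (
        f"mmf_run config={config} model={model} dataset=hateful_memes "
        "training.checkpoint_interval=100 training.max_updates=3000"
    )
    print(f"Running: {cmd}")
    subprocess.run(shlex.split(cmd), check=True)  # abort if a run fails
```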
Train the Hierarchical Question-Image Co-attention model:

cd hierarchical
python main.py
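For reference, the sketch below illustrates the parallel co-attention mechanism from the "Hierarchical Question-Image Co-Attention for Visual Question Answering" paper (Lu et al., 2016) that this model is based on. It is an illustrative PyTorch module with assumed feature dimensions, showing co-attention at a single level only; it is not the code in `hierarchical/main.py`.

```python
# Illustrative parallel co-attention (Lu et al., 2016) -- a sketch, not the
# repository's implementation. Feature dimensions are assumptions.
import torch
import torch.nn as nn


class ParallelCoAttention(nn.Module):
    def __init__(self, feat_dim: int = 512, hidden_dim: int = 256):
        super().__init__()
        self.W_b = nn.Parameter(torch.randn(feat_dim, feat_dim) * 0.01)  # affinity weights
        self.W_v = nn.Linear(feat_dim, hidden_dim, bias=False)
        self.W_q = nn.Linear(feat_dim, hidden_dim, bias=False)
        self.w_hv = nn.Linear(hidden_dim, 1, bias=False)
        self.w_hq = nn.Linear(hidden_dim, 1, bias=False)

    def forward(self, V: torch.Tensor, Q: torch.Tensor):
        # V: (batch, num_regions, feat_dim) image features
        # Q: (batch, num_words,   feat_dim) question features
        C = torch.tanh(Q @ self.W_b @ V.transpose(1, 2))                # (B, T, N) affinity
        Hv = torch.tanh(self.W_v(V).transpose(1, 2)
                        + self.W_q(Q).transpose(1, 2) @ C)              # (B, k, N)
        Hq = torch.tanh(self.W_q(Q).transpose(1, 2)
                        + self.W_v(V).transpose(1, 2) @ C.transpose(1, 2))  # (B, k, T)
        a_v = torch.softmax(self.w_hv(Hv.transpose(1, 2)), dim=1)       # (B, N, 1) region attention
        a_q = torch.softmax(self.w_hq(Hq.transpose(1, 2)), dim=1)       # (B, T, 1) word attention
        v_hat = (a_v * V).sum(dim=1)                                    # (B, feat_dim)
        q_hat = (a_q * Q).sum(dim=1)                                    # (B, feat_dim)
        return v_hat, q_hat


# Example usage with random features
coattn = ParallelCoAttention()
v, q = coattn(torch.randn(2, 36, 512), torch.randn(2, 20, 512))
print(v.shape, q.shape)  # torch.Size([2, 512]) torch.Size([2, 512])
```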
The dataset used for training the models is the VQA-Med dataset from the "ImageCLEF 2019: Visual Question Answering in Medical Domain" competition. A few plots of dataset statistics are shown below.
Distribution of the type of questions in the dataset.
Plot of the frequency of words in the answers.