Exploring SQuAD with different model architectures

The Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset. In this repository, I aim to reimplement state-of-the-art network architectures that have been successful on SQuAD. The current rankings can be found here.

Models implemented

I have currently implemented 3 models:

Standard Seq2Seq with BiLSTMs for both the encoder and the decoder, used as the baseline
Seq2Seq with an added attention mechanism
Bi-Directional Attention Flow (BiDAF) (incomplete: the CNNs for the character-level embeddings still need to be added)
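To make the attention mechanism in the second model more concrete, here is a minimal sketch of Luong-style dot-product attention in plain Python. It is an illustration only, not the code from this repo: the function names are hypothetical, it assumes the simplest "dot" scoring variant, and it works on Python lists rather than TensorFlow tensors.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1 (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot_attention(decoder_state, encoder_states):
    """Hypothetical helper: dot-product attention over encoder states.

    decoder_state:  list of floats, the current decoder hidden state
    encoder_states: list of lists of floats, one hidden state per source token
    Returns (weights, context), where context is the weighted sum of
    the encoder states.
    """
    # Score each encoder state by its dot product with the decoder state.
    scores = [
        sum(d * h for d, h in zip(decoder_state, state))
        for state in encoder_states
    ]
    weights = softmax(scores)
    dim = len(decoder_state)
    # Context vector: attention-weighted average of the encoder states.
    context = [
        sum(w * state[i] for w, state in zip(weights, encoder_states))
        for i in range(dim)
    ]
    return weights, context


weights, context = dot_attention([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
print(weights, context)
```

The encoder state that is most similar to the decoder state receives the largest weight, so the context vector leans towards the source tokens that are most relevant at the current decoding step.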

Prerequisites

Anaconda, to install the packages
TensorFlow v1.1 (preferably with GPU support if you want to train the models yourself). I had to use TensorFlow v1.1 because GPU support for Macs was dropped in v1.2.

pip install 'tensorflow-gpu==1.1.0'

Installation

Create and set up the conda environment

conda create -n squad python=3

Activate the created environment

source activate squad

Install TensorFlow

pip install 'tensorflow-gpu==1.1.0'

Install Matplotlib

conda install matplotlib

Install Jupyter notebook (if you want to run the Notebook to see a demo of the trained neural network)

conda install notebook ipykernel

Create the kernel to be used by the notebook

ipython kernel install --user --display-name squad --name squad

Getting started (for training)

Download this repo
Download the dataset from here and move the data folder to the root of the project repo
To train a model, cd to the root folder of the project repo and type:

python train.py --model Baseline --train_dir path/to/save/results

Note that the only two required flags are --model and --train_dir. The other flags have default values, which can be found in the train.py file.

Another important flag is --eval_num, which specifies how often the model is saved.
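As a rough illustration of the flag layout described above, the sketch below builds an equivalent command-line parser with Python's argparse. Only the flag names --model, --train_dir and --eval_num come from this README. The use of argparse, the help texts and the default value for --eval_num are assumptions made for the example, so check train.py for the actual definitions.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags named in this README."""
    parser = argparse.ArgumentParser(
        description="Train a SQuAD model (illustrative sketch)"
    )
    # The two required flags.
    parser.add_argument("--model", required=True,
                        help="Name of the model to train, e.g. Baseline")
    parser.add_argument("--train_dir", required=True,
                        help="Directory in which results are saved")
    # Optional flag; the default of 250 is a placeholder, not the real value.
    parser.add_argument("--eval_num", type=int, default=250,
                        help="How often the model is saved")
    return parser


# Parse an explicit argument list so the example runs without a shell.
args = build_parser().parse_args(
    ["--model", "Baseline", "--train_dir", "path/to/save/results"]
)
print(args.model, args.train_dir, args.eval_num)
```

Leaving out either required flag makes argparse print a usage message and exit, which matches the behaviour you would expect from a script with two mandatory flags.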

Getting started (for notebook)

Go to the root folder and type

jupyter notebook

Then click on LuongAttention.ipynb and run through the notebook. It loads the model, picks a validation sample from the dataset and predicts an answer for it, which is then compared to the correct answer.

Results

In all F1 and EM graphs, red shows the results on the validation set and blue shows the results on the training set.
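For readers unfamiliar with the two metrics, the sketch below shows how EM (exact match) and token-level F1 are commonly computed for SQuAD-style answers. It follows the general approach of the official evaluation script (lowercasing, stripping punctuation and articles, then comparing tokens), but it is a simplified illustration with hypothetical function names and is not necessarily the code used to produce the graphs in this repo.

```python
import re
import string
from collections import Counter


def normalize_answer(text):
    """Lowercase, drop punctuation and articles, collapse whitespace."""
    text = text.lower()
    punctuation = set(string.punctuation)
    text = "".join(ch for ch in text if ch not in punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, truth):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(truth))


def f1_score(prediction, truth):
    """Harmonic mean of token-level precision and recall."""
    pred_tokens = normalize_answer(prediction).split()
    truth_tokens = normalize_answer(truth).split()
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(truth_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(truth_tokens)
    return 2 * precision * recall / (precision + recall)


print(exact_match("The Eiffel Tower", "eiffel tower"))
print(f1_score("in Paris France", "Paris"))
```

EM only rewards answers that match exactly after normalization, while F1 gives partial credit for overlapping tokens, which is why the F1 curves sit above the EM curves.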

Baseline model

Batch size used: 128

Luong's Attention model

Batch size used: 256

The Attention model achieves ~71% F1 and ~58% EM on the validation set. This is comparable to the Match-LSTM model by Singapore Management University.

Bi-directional Attention flow model

Batch size used: 24

The BiDAF model achieves scores similar to those of the Attention model, although a better score should be achievable once the character-level CNN embedding layer is implemented.
