Environment & Data

git checkout 3543e65287e51a42d7abf9fecaf6f4f881743475

# Set up environment
pip install virtualenv
virtualenv -p python3.5 birch_env
source birch_env/bin/activate

# Install dependencies
pip install Cython  # jnius dependency
pip install -r requirements.txt

git clone https://github.com/NVIDIA/apex
cd apex && pip install -v --no-cache-dir . && cd ..

# Set up Anserini
git clone https://github.com/castorini/anserini.git
cd anserini && mvn clean package appassembler:assemble
cd eval && tar xvfz trec_eval.9.0.4.tar.gz && cd trec_eval.9.0.4 && make && cd ../../..

# Download data and models
wget https://zenodo.org/record/3269890/files/birch_data.tar.gz
tar -xzvf birch_data.tar.gz

Dataset

python src/robust04_cv.py --anserini_path <path/to/anserini> --index_path <path/to/index> --cv_fold <2, 5>

This step retrieves documents to depth 1000 for each query and splits them into sentences to generate the fold data. Alternatively, you may skip to the next step and use the downloaded data under data/datasets.
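As a rough illustration of the kind of data this step produces, the sketch below splits a document into sentences and assigns queries to folds. It is a minimal sketch and not the logic in src/robust04_cv.py: the regex-based splitter, the round-robin fold assignment, and the names split_sentences and assign_folds are all assumptions made for illustration. The folds actually used follow the settings of the papers referenced in the results.

```python
import re


def split_sentences(text):
    """Naive splitter (an assumption): break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def assign_folds(query_ids, num_folds):
    """Hypothetical round-robin assignment of sorted query ids to folds."""
    folds = [[] for _ in range(num_folds)]
    for i, qid in enumerate(sorted(query_ids)):
        folds[i % num_folds].append(qid)
    return folds


sentences = split_sentences("BM25 retrieves candidates. BERT scores each sentence! Is it slow?")
folds = assign_folds(["301", "302", "303", "304", "305"], 2)
```

Here sentences holds three strings and folds holds two lists of query ids; a real pipeline would likely use a proper sentence tokenizer rather than a regex.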

Training

python src/main.py --mode training --collection mb --qrels_file qrels.microblog.txt --batch_size <batch_size> --eval_steps <eval_steps> --learning_rate <learning_rate> --num_train_epochs <num_train_epochs> --device cuda

Inference

python src/main.py --mode inference --experiment <qa_2cv, mb_2cv, qa_5cv, mb_5cv> --collection <robust04_2cv, robust04_5cv> --model_path <models/saved.mb_3, models/saved.qa_2> --load_trained --batch_size <batch_size> --device cuda

Note that this step takes a long time. If you don't want to run inference with the pretrained models yourself, you may skip to the next step and evaluate with our predictions under data/predictions.

Evaluation

BM25+RM3 Baseline

./eval_scripts/baseline.sh <path/to/anserini> <path/to/index> <2, 5>

Sentence Evidence

Compute document score

Set the last argument to True to tune the hyperparameters first, or to False to use the default hyperparameters.

./eval_scripts/test.sh <qa_2cv, mb_2cv, qa_5cv, mb_5cv> <2, 5> <path/to/anserini> <True, False>
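The 1S, 2S, and 3S rows in the results appear to refer to how many top-scoring sentences contribute to each document score. The sketch below shows one way such a score could be computed, assuming a linear interpolation between the retrieval score and a weighted sum of the top sentence scores. The function name combine_scores, the parameter names alpha and weights, and the example values are illustrative assumptions, not the defaults used by test.sh.

```python
def combine_scores(doc_score, sentence_scores, alpha, weights):
    """Interpolate a retrieval score with the top-n sentence scores.

    alpha and weights are hypothetical names for the tunable hyperparameters;
    n is taken from the number of weights supplied.
    """
    top = sorted(sentence_scores, reverse=True)[:len(weights)]
    evidence = sum(w * s for w, s in zip(weights, top))
    return alpha * doc_score + (1 - alpha) * evidence


# Two-sentence example with made-up scores.
score = combine_scores(doc_score=10.0, sentence_scores=[0.2, 0.9, 0.5],
                       alpha=0.5, weights=[1.0, 0.5])
```

Under this assumption, tuning amounts to searching over alpha and the weights on the training folds and applying the best setting to the held-out fold.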

Evaluate with trec_eval

./eval_scripts/eval.sh <bm25+rm3_2cv, qa_2cv, mb_2cv, bm25+rm3_5cv, qa_5cv, mb_5cv> <path/to/anserini> qrels.robust2004.txt
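The results in this document report AP and P@20. For readers unfamiliar with these metrics, the sketch below computes both for a single query. It is a simplified illustration of the standard definitions rather than a substitute for trec_eval, and the function names and toy ranking are made up for the example. The reported AP is the mean of the per-query values over all queries.

```python
def average_precision(ranked_doc_ids, relevant_ids):
    """AP for one query: sum of precision at each relevant hit, divided by the number of relevant docs."""
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_doc_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant_ids) if relevant_ids else 0.0


def precision_at_k(ranked_doc_ids, relevant_ids, k=20):
    """Fraction of the top-k ranked docs that are relevant (always divides by k)."""
    return sum(1 for d in ranked_doc_ids[:k] if d in relevant_ids) / k


ranking = ["d3", "d1", "d7", "d2"]   # toy ranked list for one query
relevant = {"d1", "d2", "d9"}        # toy relevance judgments
ap = average_precision(ranking, relevant)
p_at_2 = precision_at_k(ranking, relevant, k=2)
```

Note that a relevant document missing from the ranking (d9 here) still counts in the AP denominator, which is why retrieving to depth 1000 matters for this metric.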

Result on Robust04

"Paper 1" based on two-fold CV:

Model	AP	P@20
Paper 1 (two fold)	0.2971	0.3948
BM25+RM3 (Anserini)	0.2987	0.3871
1S: BERT(QA)	0.3014	0.3928
2S: BERT(QA)	0.3003	0.3948
3S: BERT(QA)	0.3003	0.3948
1S: BERT(MB)	0.3241	0.4217
2S: BERT(MB)	0.3240	0.4209
3S: BERT(MB)	0.3244	0.4219

"Paper 2" based on five-fold CV:

Model	AP	P@20
Paper 2 (five fold)	0.272	0.386
BM25+RM3 (Anserini)	0.3033	0.3974
1S: BERT(QA)	0.3102	0.4068
2S: BERT(QA)	0.3090	0.4064
3S: BERT(QA)	0.3090	0.4064
1S: BERT(MB)	0.3266	0.4245
2S: BERT(MB)	0.3278	0.4267
3S: BERT(MB)	0.3278	0.4287

See this paper for the exact fold settings.
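To put the tables in perspective, the relative AP improvement of the best sentence-evidence run over the BM25+RM3 baseline can be worked out directly from the numbers above. The helper name relative_gain is made up for this example; the AP values are copied from the two tables.

```python
def relative_gain(score, baseline):
    """Relative improvement of score over baseline, as a fraction."""
    return (score - baseline) / baseline


# AP values copied from the tables above.
gain_2cv = relative_gain(0.3244, 0.2987)  # 3S: BERT(MB) vs BM25+RM3, two-fold
gain_5cv = relative_gain(0.3278, 0.3033)  # 3S: BERT(MB) vs BM25+RM3, five-fold
```

This works out to a relative AP gain of roughly 8.6% for the two-fold setting and roughly 8.1% for the five-fold setting.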

Replication Log

Results replicated by @emmileaf on 2019-06-10 (commit cc42b60)

How do I cite this work?

@article{yang2019simple,
  title={Simple Applications of BERT for Ad Hoc Document Retrieval},
  author={Yang, Wei and Zhang, Haotian and Lin, Jimmy},
  journal={arXiv preprint arXiv:1903.10972},
  year={2019}
}
