Knowledge Graph Language Model

This repo contains an implementation of the Knowledge Graph Language Model (KGLM) described in "Barack's Wife Hillary: Using Knowledge Graphs for Fact-Aware Language Modeling" by Robert L. Logan IV, Nelson F. Liu, Matthew E. Peters, Matt Gardner and Sameer Singh, ACL 2019 [arXiv].

Setup

You will need Python 3.5+. Dependencies can be installed by running:

pip install -r requirements.txt

Data

KGLM is trained on the Linked WikiText-2 dataset, which can be downloaded from https://rloganiv.github.io/linked-wikitext-2.

Additionally, you will need embeddings for entities/relations in the Wikidata knowledge graph, as well as access to the knowledge graph itself (in order to look up entity aliases/related entities). For convenience, we provide pre-trained embeddings and pickled dictionaries containing the relevant portions of Wikidata here.
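As a rough illustration of how a downloaded pickled dictionary could be inspected, the sketch below loads a pickle file and checks that it holds a dictionary. The helper name `load_pickled_dict` is hypothetical, and the key/value layout of the provided files is not described here, so this is a sketch under those assumptions and not the loading code used by `kglm`.

```python
import pickle
from pathlib import Path


def load_pickled_dict(path):
    """Load a pickle file and verify that it contains a dict.

    Hypothetical helper for inspection only. Only unpickle files from a
    source you trust, since unpickling can execute arbitrary code.
    """
    with Path(path).open("rb") as f:
        obj = pickle.load(f)
    if not isinstance(obj, dict):
        raise TypeError(f"expected a dict in {path}, got {type(obj).__name__}")
    return obj
```

Calling `load_pickled_dict` on one of the downloaded files and printing `len(...)` of the result is a quick way to confirm the file was fetched completely before starting a training run.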

Training

To train the model, run:

allennlp train [path to config] -s [path to save checkpoint to] --include-package kglm

Example model configurations are provided in the experiments directory.

Perplexity Evaluation

To estimate the perplexity of a trained model on held-out data, run:

python -m kglm.run evaluate-perplexity \
    [model_archive_file] \
    [sampler_archive_file] \
    [input_file]

where:

model_archive_file - Trained (generative) model checkpoint. This is the model whose perplexity will be evaluated.
sampler_archive_file - Trained (discriminative) model checkpoint. This is the model used to create annotations during importance sampling. See Section 4 of the paper for more details about importance sampling.
input_file - Path to dataset to measure perplexity on.
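The perplexity estimate relies on importance sampling: annotations are drawn from the discriminative sampler and reweighted by the generative model. A minimal sketch of that arithmetic follows, assuming the standard importance-sampling estimate of the marginal likelihood, p(x) ≈ (1/K) Σ_k p(x, z_k) / q(z_k | x). The function names and the list-of-log-probabilities inputs are illustrative assumptions and are not taken from `kglm.run`.

```python
import math


def log_mean_exp(values):
    """Numerically stable log of the mean of exp(v) over the given values."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def importance_sampled_perplexity(log_joint, log_proposal, num_tokens):
    """Estimate perplexity from K sampled annotations (illustrative only).

    log_joint[k]    -- log p(x, z_k) under the generative model
    log_proposal[k] -- log q(z_k | x) under the sampler
    num_tokens      -- number of tokens in x
    """
    if len(log_joint) != len(log_proposal):
        raise ValueError("need one proposal log-probability per sample")
    # Log importance weights: log p(x, z_k) - log q(z_k | x).
    log_weights = [lp - lq for lp, lq in zip(log_joint, log_proposal)]
    # Log of the averaged weights approximates log p(x).
    log_marginal = log_mean_exp(log_weights)
    # Perplexity is the exponentiated negative per-token log-likelihood.
    return math.exp(-log_marginal / num_tokens)
```

Working in log space with the max-subtraction trick keeps the average of very small probabilities from underflowing, which matters for sequences of more than a few tokens.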

Sentence Completion

To perform sentence completion experiments, run:

allennlp predict --predictor cloze [model_archive_file] [input_file]

where:

model_archive_file - Trained (generative) model checkpoint. This is the model used to complete the sentences.
input_file - Path to the dataset of sentences to complete.
