
MariaSL/kglm-model


Knowledge Graph Language Model

This repo contains an implementation of the KGLM model described in "Barack's Wife Hillary: Using Knowledge Graphs for Fact-Aware Language Modeling", Robert L. Logan IV, Nelson F. Liu, Matthew E. Peters, Matt Gardner and Sameer Singh, ACL 2019 [arXiv].

Setup

You will need Python 3.5+. Dependencies can be installed by running:

pip install -r requirements.txt

Data

KGLM is trained on the Linked WikiText-2 dataset which can be downloaded at https://rloganiv.github.io/linked-wikitext-2.

You will also need embeddings for the entities and relations in the Wikidata knowledge graph, as well as access to the knowledge graph itself (to look up entity aliases and related entities). For convenience, we provide pre-trained embeddings and pickled dictionaries containing the relevant portions of Wikidata here.
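A minimal sketch of loading one of the pickled Wikidata dictionaries and using it for alias lookup. It assumes the pickle maps Wikidata entity IDs (such as `Q76`) to lists of alias strings; that layout and the helper names `load_alias_dict` and `lookup_aliases` are hypothetical, so check the downloaded files for their actual names and structure.

```python
import pickle
from pathlib import Path
from typing import Dict, List


def load_alias_dict(path: Path) -> Dict[str, List[str]]:
    """Load a pickled dictionary from disk.

    Hypothetical layout: {"Q76": ["Barack Obama", "Obama"], ...}.
    Only unpickle files from a source you trust, because unpickling
    can execute arbitrary code.
    """
    with path.open("rb") as f:
        data = pickle.load(f)
    if not isinstance(data, dict):
        raise TypeError(f"expected a dict in {path}, got {type(data).__name__}")
    return data


def lookup_aliases(alias_dict: Dict[str, List[str]], entity_id: str) -> List[str]:
    """Return the aliases of an entity, or an empty list if the ID is unknown."""
    return alias_dict.get(entity_id, [])
```

Returning an empty list for unknown IDs lets callers iterate over the result without a separate membership check.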

Training

To train the model, run:

allennlp train [path to config] -s [path to save checkpoint to] --include-package kglm

Example model configurations are provided in the experiments directory.
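For scripted runs, the training command can be assembled in Python and handed to `subprocess.run`. This is a minimal sketch: the helper names `build_train_command` and `train` are hypothetical, and the only flags used are the ones in the command above. `--include-package kglm` is what makes AllenNLP import the `kglm` package so that its custom components are registered before training starts.

```python
import subprocess
from typing import List


def build_train_command(config_path: str, serialization_dir: str) -> List[str]:
    """Build the `allennlp train` invocation as an argument list."""
    return [
        "allennlp", "train", config_path,
        "-s", serialization_dir,           # where checkpoints are saved
        "--include-package", "kglm",       # register kglm's models/readers
    ]


def train(config_path: str, serialization_dir: str) -> None:
    """Run training; raises CalledProcessError if allennlp exits non-zero."""
    subprocess.run(build_train_command(config_path, serialization_dir), check=True)
```

Passing an argument list rather than a single shell string avoids quoting problems with paths that contain spaces.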

Perplexity Evaluation

To estimate the perplexity of a trained model on held-out data, run:

python -m kglm.run evaluate-perplexity \
    [model_archive_file] \
    [sampler_archive_file] \
    [input_file]

where:

  • model_archive_file - Trained (generative) model checkpoint. This is the model whose perplexity will be evaluated.
  • sampler_archive_file - Trained (discriminative) model checkpoint. This is the model used to create annotations during importance sampling. See Section 4 of the paper for more details about importance sampling.
  • input_file - Path to the dataset to measure perplexity on.
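The role of the two checkpoints can be illustrated with the importance-sampling estimate of the marginal likelihood. The generative model scores a token sequence x jointly with its annotations e, and p(x) = Σ_e p(x, e) is intractable, so annotations e_1, ..., e_N are drawn from the discriminative sampler q(e | x) and p(x) ≈ (1/N) Σ_i p(x, e_i) / q(e_i | x). The sketch below assumes you already have the per-sample log-probabilities; the function names are hypothetical and this is not the repository's implementation.

```python
import math
from typing import Sequence


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def importance_sampled_perplexity(
    log_p_joint: Sequence[float],
    log_q: Sequence[float],
    num_tokens: int,
) -> float:
    """Estimate perplexity of one sequence from N annotation samples.

    log_p_joint[i] = log p(x, e_i) under the generative model
    log_q[i]       = log q(e_i | x) under the discriminative sampler
    p(x) is estimated as (1/N) * sum_i p(x, e_i) / q(e_i | x),
    computed in log space to avoid underflow.
    """
    if len(log_p_joint) != len(log_q) or not log_p_joint:
        raise ValueError("need the same, non-zero number of joint and proposal scores")
    log_weights = [lp - lq for lp, lq in zip(log_p_joint, log_q)]
    log_p_x = logsumexp(log_weights) - math.log(len(log_weights))
    # Perplexity = exp(-log p(x) / number of tokens).
    return math.exp(-log_p_x / num_tokens)
```

For a whole dataset, sum the estimated log p(x) over documents and divide by the total token count before exponentiating. Because the log of an unbiased estimate is biased downward in expectation, the estimated perplexity tends to be slightly too high, and drawing more samples reduces this gap.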

Sentence Completion

To perform sentence completion experiments, run:

allennlp predict --predictor cloze [model_archive_file] [input_file]

where:

  • model_archive_file - Trained (generative) model checkpoint. This is the model used to complete the sentences.
  • input_file - Path to the dataset containing the sentences to complete.
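A minimal sketch for scripting the sentence completion run and saving predictions with AllenNLP's `--output-file` option. The helper names are hypothetical. The sketch also adds `--include-package kglm`, which the command above omits; this is an assumption, made because `cloze` is not a built-in AllenNLP predictor and must be registered by importing the `kglm` package.

```python
import subprocess
from typing import List, Optional


def build_cloze_command(
    model_archive_file: str,
    input_file: str,
    output_file: Optional[str] = None,
) -> List[str]:
    """Build an `allennlp predict` call that uses the cloze predictor."""
    cmd = [
        "allennlp", "predict",
        "--predictor", "cloze",
        # Assumption: needed so AllenNLP can find the custom "cloze" predictor.
        "--include-package", "kglm",
        model_archive_file, input_file,
    ]
    if output_file is not None:
        # Save predictions to output_file.
        cmd += ["--output-file", output_file]
    return cmd


def run_cloze(model_archive_file: str, input_file: str, output_file: str) -> None:
    """Run prediction; raises CalledProcessError if allennlp exits non-zero."""
    subprocess.run(build_cloze_command(model_archive_file, input_file, output_file), check=True)
```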
