Accompanying repository for our EMNLP 2017 paper (full paper). It contains the code to replicate the experiments and the pre-trained models for sentence-level relation extraction.
This repository contains experimental software and is published for the sole purpose of giving additional background details on the respective publication.
Please use the following citation:
@inproceedings{TUD-CS-2017-0119,
title = {Context-Aware Representations for Knowledge Base Relation Extraction},
author = {Sorokin, Daniil and Gurevych, Iryna},
booktitle = {Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
pages = {(to appear)},
year = {2017},
location = {Copenhagen, Denmark},
}
We demonstrate that for sentence-level relation extraction it is beneficial to consider other relations in the sentential context while predicting the target relation. Our architecture uses an LSTM-based encoder to jointly learn representations for all relations in a single sentence. We combine the context representations with an attention mechanism to make the final prediction. We use the Wikidata knowledge base to construct a dataset of multiple relations per sentence and to evaluate our approach. Compared to a baseline system, our method results in an average error reduction of 24% on a held-out set of relations.
Please refer to the paper for more details.
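The idea of weighting context relations with attention can be sketched in a few lines of plain Python. This is only an illustration, not the repository's Keras code (see relation_extraction/core/keras_models.py for the actual models). The bilinear scoring function, the weight matrix, the concatenation step, and all function names below are assumptions made for the example.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def bilinear_score(context, weight_matrix, target):
    """Score one context relation against the target: context^T * A * target."""
    projected = [sum(a * t for a, t in zip(row, target)) for row in weight_matrix]
    return sum(c * p for c, p in zip(context, projected))


def combine_with_context(target, contexts, weight_matrix):
    """Concatenate the target representation with an attention-weighted
    sum of the context relation representations (hypothetical helper)."""
    if not contexts:
        # No other relations in the sentence: use a zero context summary.
        return list(target) + [0.0] * len(target), []
    scores = [bilinear_score(c, weight_matrix, target) for c in contexts]
    weights = softmax(scores)
    summary = [
        sum(w * c[i] for w, c in zip(weights, contexts))
        for i in range(len(target))
    ]
    return list(target) + summary, weights


if __name__ == "__main__":
    target = [1.0, 0.0]
    contexts = [[1.0, 0.0], [0.0, 1.0]]
    identity = [[1.0, 0.0], [0.0, 1.0]]
    combined, weights = combine_with_context(target, contexts, identity)
    print(combined, weights)
```

In this toy input the first context relation is more similar to the target, so it receives the larger attention weight.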
The dataset described in the paper can be found here:
If you have any questions regarding the code, please don't hesitate to contact the authors or report an issue.
- Daniil Sorokin, [email protected]
- https://www.informatik.tu-darmstadt.de/ukp/
- https://www.tu-darmstadt.de
You can try out the model on single sentences in our demo:
http://semanticparsing.ukp.informatik.tu-darmstadt.de:5000/relation-extraction/
relation_extraction/
├── eval.py
├── model-train-and-test.py
├── notebooks
├── optimization_space.py
├── core
│ ├── parser.py
│ ├── embeddings.py
│ ├── entity_extraction.py
│ └── keras_models.py
├── relextserver
│ └── server.py
├── graph
│ ├── graph_utils.py
│ ├── io.py
│ └── vis_utils.py
├── stanford_tag_dataset.py
└── evaluation
└── metrics.py
resources/
├── properties-with-labels.txt
└── property_blacklist.txt
File | Description |
---|---|
relation_extraction/ | Main Python module |
relation_extraction/core | Models for joint relation extraction |
relation_extraction/relextserver | The code for the web demo. |
relation_extraction/graph | IO and processing for relation graphs |
relation_extraction/evaluation | Evaluation metrics |
resources/ | Necessary resources |
data/curves/ | The precision-recall curves for each model on the held out data |
- We recommend that you set up a new virtual environment first: http://docs.python-guide.org/en/latest/dev/virtualenvs/
- Check out the repository and run:
pip3 install -r requirements.txt
- Set the Keras (deep learning library) backend to TensorFlow with the following command:
export KERAS_BACKEND=tensorflow
You can also permanently change the Keras backend (read more: https://keras.io/backend/). Note that to reproduce the experiments in the paper you have to use Theano as the backend instead.
- Download the data if you want to replicate the experiments from the paper. Extract the archive inside emnlp2017-relation-extraction/data/wikipedia-wikidata/. The data was preprocessed using Stanford CoreNLP 3.7.0 models. See stanford_tag_dataset.py for more information.
- Download the GloVe embeddings (glove.6B.zip) and put them into the folder emnlp2017-relation-extraction/resources/glove/. You can change the path to the word embeddings in the model_params.json file if needed.
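As a rough check that the backend and embedding steps above worked, the following Python sketch sets the Keras backend from code and reads a GloVe text file. It does not use the repository's own loader in relation_extraction/core/embeddings.py. The function names are hypothetical, and the file name glove.6B.50d.txt in the comment is only a placeholder for whichever file from glove.6B.zip you use.

```python
import os
import tempfile


def select_keras_backend(name):
    """Set the KERAS_BACKEND variable. This has to happen before
    `import keras`, because Keras reads the variable at import time."""
    os.environ["KERAS_BACKEND"] = name
    return os.environ["KERAS_BACKEND"]


def load_glove(path, limit=None):
    """Read GloVe vectors from a text file with one entry per line:
    the word followed by its space-separated float components."""
    vectors = {}
    with open(path, encoding="utf-8") as handle:
        for line_number, line in enumerate(handle):
            if limit is not None and line_number >= limit:
                break
            parts = line.rstrip("\n").split(" ")
            if len(parts) < 2:
                continue  # skip empty or malformed lines
            vectors[parts[0]] = [float(value) for value in parts[1:]]
    return vectors


if __name__ == "__main__":
    select_keras_backend("tensorflow")
    # A tiny stand-in file keeps the example runnable without the download.
    # With the real data, pass a path such as
    # ../resources/glove/glove.6B.50d.txt (placeholder file name).
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False,
                                     encoding="utf-8") as sample:
        sample.write("the 0.1 0.2\nof 0.3 0.4\n")
    vectors = load_glove(sample.name)
    print(len(vectors), "vectors of dimension", len(vectors["the"]))
    os.remove(sample.name)
```

The `limit` argument is handy for a quick sanity check, since the full GloVe files are large.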
- You can download the models that were used in the experiments here
- See Using pre-trained models.ipynb for a detailed example of how to use the pre-trained models in your code.
To reproduce the experiments, please refer to the version of the code that was published with the paper: tag emnlp17.
In any other case, we recommend using the most recent version.
- Complete the setup above.
- Run python model_train.py in emnlp2017-relation-extraction/relation_extraction/ to see the list of parameters.
- If you put the data into the default folders, you can train the ContextWeighted model with the following command:
python model_train.py model_ContextWeighted train ../data/wikipedia-wikidata/enwiki-20160501/semantic-graphs-filtered-training.02_06.json ../data/wikipedia-wikidata/enwiki-20160501/semantic-graphs-filtered-validation.02_06.json
- Run the following command to compute the precision-recall curves:
python precision_recall_curves.py model_ContextWeighted ../data/wikipedia-wikidata/enwiki-20160501/semantic-graphs-filtered-held-out.02_06.json
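For readers unfamiliar with precision-recall curves, the sketch below shows the generic computation for a ranked list of predictions. It is not the code of precision_recall_curves.py, whose details may differ (for example, in how predictions without a relation are handled). The function name and the binary correct/incorrect labels are assumptions made for the example.

```python
def precision_recall_points(scores, labels):
    """Return (precision, recall) after each prediction, going from the
    most confident prediction to the least confident one.

    scores: confidence of each prediction
    labels: 1 if the prediction is correct, 0 otherwise
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    total_positive = sum(labels)
    points = []
    true_positives = 0
    for rank, (_, label) in enumerate(ranked, start=1):
        true_positives += label
        precision = true_positives / rank
        recall = true_positives / total_positive if total_positive else 0.0
        points.append((precision, recall))
    return points


if __name__ == "__main__":
    for precision, recall in precision_recall_points(
            [0.9, 0.8, 0.7, 0.6], [1, 0, 1, 1]):
        print(round(precision, 3), round(recall, 3))
```

Each point corresponds to one confidence threshold, so plotting the points in order gives the curve.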
- The web demo code is provided for information only. It is not meant to be run elsewhere.
- Python 3.4
- Keras 2.1.5
- TensorFlow 1.6.0
- See requirements.txt for library requirements.
- Apache License Version 2.0