DynaSent is an English-language benchmark task for ternary (positive/negative/neutral) sentiment analysis.
Christopher Potts, Zhengxuan Wu, Atticus Geiger, and Douwe Kiela. 2020. DynaSent: A dynamic benchmark for sentiment analysis. Ms., Stanford University and Facebook AI Research.
@article{potts-etal-2020-dynasent,
title={{DynaSent}: A Dynamic Benchmark for Sentiment Analysis},
author={Potts, Christopher and Wu, Zhengxuan and Geiger, Atticus and Kiela, Douwe},
journal={arXiv preprint arXiv:2012.15349},
url={https://arxiv.org/abs/2012.15349},
year={2020}}
The dataset is dynasent-v1.1.zip, which is included in this repository. v1.1 differs from v1 only in that v1.1 has proper unique ids for Round 1 and corrects a bug that led to some non-unique ids in Round 2. There are no changes to the examples or other metadata.
The dataset consists of two rounds, each with a train/dev/test split:
dynasent-v1.1-round01-yelp-train.jsonl
dynasent-v1.1-round01-yelp-dev.jsonl
dynasent-v1.1-round01-yelp-test.jsonl
dynasent-v1.1-round02-dynabench-train.jsonl
dynasent-v1.1-round02-dynabench-dev.jsonl
dynasent-v1.1-round02-dynabench-test.jsonl
The dataset also contains a version of the Stanford Sentiment Treebank dev set in our format with labels from our validation task:
sst-dev-validated.jsonl
This function can be used to load any subset of the files:
import json

def load_dataset(*src_filenames, labels=None):
    """Load one or more DynaSent JSONL files, keeping only examples whose
    'gold_label' is in `labels` (or all examples if `labels` is None)."""
    data = []
    for filename in src_filenames:
        with open(filename) as f:
            for line in f:
                d = json.loads(line)
                if labels is None or d['gold_label'] in labels:
                    data.append(d)
    return data
For example, to create a Round 1 train set restricting to examples with ternary gold labels:
import os
r1_train_filename = os.path.join('dynasent-v1.1', 'dynasent-v1.1-round01-yelp-train.jsonl')
ternary_labels = ('positive', 'negative', 'neutral')
r1_train = load_dataset(r1_train_filename, labels=ternary_labels)
X_train, y_train = zip(*[(d['sentence'], d['gold_label']) for d in r1_train])
DynaSent rounds can also be accessed directly using the Hugging Face Datasets library:

# Make sure you have the Datasets library installed:
#
#   pip install datasets

from datasets import load_dataset

r1_dataset = load_dataset("dynabench/dynasent", "dynabench.dynasent.r1.all")
r2_dataset = load_dataset("dynabench/dynasent", "dynabench.dynasent.r2.all")
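For example, one can restrict the Hugging Face splits to examples with ternary gold labels using the library's filter method. This is a minimal sketch, assuming the Hugging Face version exposes the same 'gold_label' field as the JSONL files:

# A minimal sketch, assuming the Hugging Face datasets expose the same
# 'gold_label' field as the JSONL distribution.
ternary_labels = ("positive", "negative", "neutral")

r1_ternary = r1_dataset.filter(lambda d: d["gold_label"] in ternary_labels)
print(r1_ternary)  # filtered train/validation/test splits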
Round 1 examples are structured as follows:

{'hit_ids': ['y5238'],
'sentence': 'Roto-Rooter is always good when you need someone right away.',
'indices_into_review_text': [0, 60],
'model_0_label': 'positive',
'model_0_probs': {'negative': 0.01173639390617609,
'positive': 0.7473671436309814,
'neutral': 0.24089649319648743},
'text_id': 'r1-0000001',
'review_id': 'IDHkeGo-nxhqX4Exkdr08A',
'review_rating': 1,
'label_distribution': {'positive': ['w130', 'w186', 'w207', 'w264', 'w54'],
'negative': [],
'neutral': [],
'mixed': []},
'gold_label': 'positive'}
Details:
'hit_ids': List of Amazon Mechanical Turk Human Intelligence Tasks (HITs) in which this example appeared during validation. The values are anonymized but used consistently throughout the dataset.
'sentence': The example text.
'indices_into_review_text': Indices of 'sentence' into the original review text in the Yelp Academic Dataset.
'model_0_label': Prediction of Model 0 as described in the paper. The possible values are 'positive', 'negative', and 'neutral'.
'model_0_probs': Probability distribution predicted by Model 0. The keys are ('positive', 'negative', 'neutral') and the values are floats.
'text_id': Unique identifier for this entry.
'review_id': Review-level identifier for the review from the Yelp Academic Dataset containing 'sentence'.
'review_rating': Review-level star-rating for the review containing 'sentence' in the Yelp Academic Dataset. The possible values are 1, 2, 3, 4, and 5.
'label_distribution': Response distribution from the MTurk validation task. The keys are ('positive', 'negative', 'neutral', 'mixed') and the values are lists of anonymized MTurk ids, which are used consistently throughout the dataset.
'gold_label': The label chosen by at least three of the five workers if there is one (possible values: 'positive', 'negative', 'neutral', and 'mixed'), else None.
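To illustrate how 'gold_label' relates to 'label_distribution', the following sketch recomputes the gold label from the response distribution (the helper name infer_gold_label is ours, not part of this repository):

from collections import Counter

def infer_gold_label(label_distribution):
    # Return the label chosen by at least three of the five validators,
    # else None. This mirrors the 'gold_label' definition above.
    counts = Counter({label: len(workers)
                      for label, workers in label_distribution.items()})
    label, count = counts.most_common(1)[0]
    return label if count >= 3 else None

# For the Round 1 example shown above:
# infer_gold_label(d['label_distribution']) == d['gold_label']  # 'positive'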
Here is some code one could use to augment a dataset, as loaded by load_dataset, with a field giving the full review text from the Yelp Academic Dataset:

import json

def index_yelp_reviews(yelp_src_filename='yelp_academic_dataset_review.json'):
    """Map Yelp review ids to their full review texts."""
    index = {}
    with open(yelp_src_filename) as f:
        for line in f:
            d = json.loads(line)
            index[d['review_id']] = d['text']
    return index

yelp_index = index_yelp_reviews()

def add_review_text_round1(dataset, yelp_index):
    for d in dataset:
        review_text = yelp_index[d['review_id']]
        # Check that we can find the sentence as expected:
        start, end = d['indices_into_review_text']
        assert review_text[start: end] == d['sentence']
        d['review_text'] = review_text
    return dataset
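A usage sketch, assuming the Yelp Academic Dataset review file is available locally and r1_train was loaded with load_dataset as shown above:

# Attach the full review text to every Round 1 train example (assumes
# r1_train and yelp_index are defined as above).
r1_train = add_review_text_round1(r1_train, yelp_index)
print(r1_train[0]['review_text'][:60])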
Round 2 examples are structured as follows:

{'hit_ids': ['y22661'],
'sentence': "We enjoyed our first and last meal in Toronto at Bombay Palace, and I can't think of a better way to book our journey.",
'sentence_author': 'w250',
'has_prompt': True,
'prompt_data': {'indices_into_review_text': [2093, 2213],
'review_rating': 5,
'prompt_sentence': "Our first and last meals in Toronto were enjoyed at Bombay Palace and I can't think of a better way to bookend our trip.",
'review_id': 'Krm4kSIb06BDHternF4_pA'},
'model_1_label': 'positive',
'model_1_probs': {'negative': 0.29140257835388184,
'positive': 0.6788994669914246,
'neutral': 0.029697999358177185},
'text_id': 'r2-0000001',
'label_distribution': {'positive': ['w43', 'w26', 'w155', 'w23'],
'negative': [],
'neutral': [],
'mixed': ['w174']},
'gold_label': 'positive'}
Details:
'hit_ids': List of Amazon Mechanical Turk Human Intelligence Tasks (HITs) in which this example appeared during validation. The values are anonymized but used consistently throughout the dataset.
'sentence': The example text.
'sentence_author': Anonymized MTurk id of the worker who wrote 'sentence'. These ids are from the same family as those used in 'label_distribution', but this id is never one of the ids in 'label_distribution' for this example.
'has_prompt': True if 'sentence' was written with a prompt, else False.
'prompt_data': None if 'has_prompt' is False, else a dict with the following fields:
    'indices_into_review_text': Indices of 'prompt_sentence' into the original review text in the Yelp Academic Dataset.
    'review_rating': Review-level star-rating for the review containing 'prompt_sentence' in the Yelp Academic Dataset.
    'prompt_sentence': The prompt text.
    'review_id': Review-level identifier for the review from the Yelp Academic Dataset containing 'prompt_sentence'.
'model_1_label': Prediction of Model 1 as described in the paper. The possible values are 'positive', 'negative', and 'neutral'.
'model_1_probs': Probability distribution predicted by Model 1. The keys are ('positive', 'negative', 'neutral') and the values are floats.
'text_id': Unique identifier for this entry.
'label_distribution': Response distribution from the MTurk validation task. The keys are ('positive', 'negative', 'neutral', 'mixed') and the values are lists of anonymized MTurk ids, which are used consistently throughout the dataset.
'gold_label': The label chosen by at least three of the five workers if there is one (possible values: 'positive', 'negative', 'neutral', and 'mixed'), else None.
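Since only some Round 2 examples were written with a prompt, it can be useful to separate the two cases. A minimal sketch, assuming load_dataset and ternary_labels from above:

import os

r2_train_filename = os.path.join('dynasent-v1.1', 'dynasent-v1.1-round02-dynabench-train.jsonl')
r2_train = load_dataset(r2_train_filename, labels=ternary_labels)

# Split by whether the sentence was written with a prompt.
with_prompt = [d for d in r2_train if d['has_prompt']]
without_prompt = [d for d in r2_train if not d['has_prompt']]
print(len(with_prompt), len(without_prompt))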
To add the review texts to the 'prompt_data' field, one can extend the Round 1 code above with the following function:
def add_review_text_round2(dataset, yelp_index):
    for d in dataset:
        if d['has_prompt']:
            prompt_data = d['prompt_data']
            review_text = yelp_index[prompt_data['review_id']]
            # Check that we can find the sentence as expected:
            start, end = prompt_data['indices_into_review_text']
            assert review_text[start: end] == prompt_data['prompt_sentence']
            prompt_data['review_text'] = review_text
    return dataset
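A usage sketch, assuming r2_train and yelp_index are defined as above:

# Attach the full review text to the 'prompt_data' of prompted Round 2 examples.
r2_train = add_review_text_round2(r2_train, yelp_index)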
The validated SST dev-set examples are structured as follows:

{'hit_ids': ['s20533'],
'sentence': '-LRB- A -RRB- n utterly charming and hilarious film that reminded me of the best of the Disney comedies from the 60s.',
'tree': '(4 (2 (1 -LRB-) (2 (2 A) (3 -RRB-))) (4 (4 (2 n) (4 (3 (2 utterly) (4 (3 (4 charming) (2 and)) (4 hilarious))) (3 (2 film) (3 (2 that) (4 (4 (2 (2 reminded) (3 me)) (4 (2 of) (4 (4 (2 the) (4 best)) (2 (2 of) (3 (2 the) (3 (3 Disney) (2 comedies))))))) (2 (2 from) (2 (2 the) (2 60s)))))))) (2 .)))',
'text_id': 'sst-dev-validate-0000437',
'sst_label': '4',
'label_distribution': {'positive': ['w207', 'w3', 'w840', 'w135', 'w26'],
'negative': [],
'neutral': [],
'mixed': []},
'gold_label': 'positive'}
Details:
'hit_ids': List of Amazon Mechanical Turk Human Intelligence Tasks (HITs) in which this example appeared during validation. The values are anonymized but used consistently throughout the dataset.
'sentence': The example text.
'tree': The parse tree for the example as given in the SST distribution.
'text_id': A new identifier for this example.
'sst_label': The root-node label from the SST. The possible values are '0', '1', '2', '3', and '4'.
'label_distribution': Response distribution from the MTurk validation task. The keys are ('positive', 'negative', 'neutral', 'mixed') and the values are lists of anonymized MTurk ids, which are used consistently throughout the dataset.
'gold_label': The label chosen by at least three of the five workers if there is one (possible values: 'positive', 'negative', 'neutral', and 'mixed'), else None.
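As an illustration, one might compare the original SST root labels with the crowdsourced ternary labels. This sketch uses the JSONL load_dataset function defined earlier (not the Hugging Face one) and a common coarse mapping from 5-way SST labels to ternary labels; the mapping is our illustration, not part of the dataset:

# Adjust the path to wherever sst-dev-validated.jsonl lives in your checkout.
sst_dev = load_dataset('sst-dev-validated.jsonl', labels=ternary_labels)

# A common coarse mapping from SST 5-way root labels to ternary labels.
coarse_map = {'0': 'negative', '1': 'negative', '2': 'neutral',
              '3': 'positive', '4': 'positive'}

agreement = sum(coarse_map[d['sst_label']] == d['gold_label'] for d in sst_dev)
print(f"{agreement}/{len(sst_dev)} examples agree with the coarse SST mapping")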
Model 0 and Model 1 from the paper are available here:
https://drive.google.com/drive/folders/1dpKrjNJfAILUQcJPAFc5YOXUT51VEjKQ?usp=sharing
This repository includes a Python module dynasent_models.py
that provides a Hugging Face-based wrapper around these (PyTorch) models. Simple examples:
import os
from dynasent_models import DynaSentModel
# `dynasent_model0` should be downloaded from the above Google Drive link and
# placed in the `models` directory. `dynasent_model1` works the same way.
model = DynaSentModel(os.path.join('models', 'dynasent_model0.bin'))
examples = [
    "superb",
    "They said the experience would be amazing, and they were right!",
    "They said the experience would be amazing, and they were wrong!"]

model.predict(examples)
This should return the list ['positive', 'positive', 'negative'].

The predict_proba method provides access to the predicted distribution over the class labels; see the demo at the bottom of dynasent_models.py for details.
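A minimal usage sketch (the exact return format is documented in that demo):

# predict_proba pairs with predict; see dynasent_models.py for the exact
# structure of the returned distributions.
probs = model.predict_proba(examples)
print(probs[0])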
The following code uses load_dataset from above to reproduce the paper's Round 2 dev-set report for Model 0:
import os
from sklearn.metrics import classification_report
from dynasent_models import DynaSentModel
dev_filename = os.path.join('dynasent-v1.1', 'dynasent-v1.1-round02-dynabench-dev.jsonl')
dev = load_dataset(dev_filename)
X_dev, y_dev = zip(*[(d['sentence'], d['gold_label']) for d in dev])
model = DynaSentModel(os.path.join('models', 'dynasent_model0.bin'))
preds = model.predict(X_dev)
print(classification_report(y_dev, preds, digits=3))
For a fuller report on these models, see our paper and our model card.
The following notebooks reproduce the dataset statistics, figures, and random example selections from the paper:
analyses_comparative.ipynb
analysis_round1.ipynb
analysis_round2.ipynb
analysis_sst_dev_revalidate.ipynb
The Python module dynasent_utils.py
contains functions that support those notebooks, and dynasent.mplstyle
helps with styling the plots.
The Datasheet for our dataset and the Model Card for our models are included in this repository.
The module test_dataset.py
contains PyTest tests for the dataset. To use it, run
py.test -vv test_dataset.py
in the root directory of this repository.
The file validation-hit-contents.html contains the HTML/JavaScript used in the validation task. It can be used directly on Amazon Mechanical Turk by pasting its contents into the standard HIT creation window.
DynaSent is released under a Creative Commons Attribution 4.0 International license.