empathic-stories

Latest model checkpoint available here:

The model checkpoint is available for download through Google Drive

Sample code for setting up a server to access the model remotely

Sample usage:

from numpy import dot
from numpy.linalg import norm

from trainer import EmpathicSimilarityModel

# load the fine-tuned BART model from a Lightning checkpoint
model_BART = EmpathicSimilarityModel.load_from_checkpoint(
    "./lightning_logs/version_164/checkpoints/epoch=15-step=752.ckpt",
    model="BART", pooling="MEAN", bin=False, losses="MSE", use_pretrained=False,
)

# get embeddings
story1 = "this is my story"
e1 = model_BART(story1).detach().numpy().reshape(-1)

story2 = "this is my story 2"
e2 = model_BART(story2).detach().numpy().reshape(-1)

def get_cosine_similarity(a, b):
    cos_sim = dot(a, b)/(norm(a)*norm(b))
    return cos_sim

print(get_cosine_similarity(e1, e2))
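
When more than two stories are compared, the same cosine similarity can be computed for every pair at once. The following is a minimal sketch, not part of the repository: `pairwise_cosine_similarity` is a hypothetical helper, and the small hand-written vectors stand in for real embeddings such as those produced by `model_BART` above.

```python
import numpy as np


def pairwise_cosine_similarity(embeddings: np.ndarray) -> np.ndarray:
    """Return the (n, n) matrix of cosine similarities between rows."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    # Clip the norms so an all-zero row does not cause a division by zero.
    unit = embeddings / np.clip(norms, 1e-12, None)
    return unit @ unit.T


# Stand-in vectors (assumption): in practice each row would be one story
# embedding, e.g. model_BART(story).detach().numpy().reshape(-1).
embeddings = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [1.0, 1.0, 0.0],
])

similarity = pairwise_cosine_similarity(embeddings)

# Find the story most similar to story 0, excluding story 0 itself.
scores = similarity[0].copy()
scores[0] = -np.inf
best_match = int(np.argmax(scores))
print(best_match)
```

Normalizing each row once and taking a single matrix product avoids recomputing the norms for every pair.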

File Structure

/annotation contains all MTurk annotation templates
/data contains all data folders for train, dev, test sets
/models contains all lightning modules and our pretrained BART model
- EmpathicSimilarityModel takes in a story pair (2 stories) and fine-tunes on the empathic similarity score
- EmpathicSummaryModel takes in a single story and fine-tunes on empathy reasons (main event + emotion description + moral)
/config contains yaml config files for different model training settings
/user_study contains the frontend and server side code for our user study interface
dataset.py contains the dataloaders
special_tokens.py contains definitions of special tokens
trainer.py contains training code and input of config files for different model training settings
utils.py contains extra model utilities
evaluator.py contains an evaluation class to compute all evaluation metrics

Dataset Overview

Stories

Data Source: which data source the story came from
story: raw text of the story
story_formatted: the story formatted with breaks
story_summary: summary of the story generated by ChatGPT
comments: (if pulled from social media) top-level comments on the story
url: (if pulled from social media) the original URL of the story
post_id: (if pulled from social media) the original ID of the story
post_time: (if pulled from social media) the time the story was posted
post_score: (if pulled from Reddit) the score of the post
toxicity_score: toxicity score rated by Detoxify
WorkerId: worker ID of annotator
LifetimeApprovalRate: annotator's lifetime approval rate
AcceptTime: when the annotator accepted the HIT
SubmitTime: when the annotator submitted the HIT
WorkTimeInSeconds: how long the annotator took for the HIT
Age: annotator age
Gender: annotator gender
Race: annotator race
Arousal: annotator's arousal before the task (1-10)
Valence: annotator's valence before the task (1-10)
Main Event: main event of the story as annotated by a human annotator
Emotion Description: emotion of the story as annotated by a human annotator
Moral: moral of the story as annotated by a human annotator
Empathy Reasons: reasons why people may empathize with the story as annotated by a human annotator
Main Event (gpt3.5): main event of the story as annotated by ChatGPT
Emotion Description (gpt3.5): emotion of the story as annotated by ChatGPT
Moral (gpt3.5): moral of the story as annotated by ChatGPT
Empathy Reasons (gpt3.5): reasons why people may empathize with the story as annotated by ChatGPT
Empathizable: how generally "empathizable" the story is
Well-Written: how well-written the story is
fake_score: how likely the post is written by AI tools, as predicted by the Writer AI Content Detector
num_sentences: number of sentences in the story
num_words: number of words in the story
num_sentences_event: number of sentences in the event
num_words_event: number of words in the event
num_sentences_emotion: number of sentences in the emotion
num_words_emotion: number of words in the emotion
num_sentences_moral: number of sentences in the moral
num_words_moral: number of words in the moral
num_sentences_empathy_reasons: number of sentences in the empathy reasons
num_words_empathy_reasons: number of words in the empathy reasons
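
The columns above can be used to filter stories before training or analysis. The following is a minimal sketch using only the Python standard library. The rows in `TOY_CSV` are invented for illustration and are not taken from the dataset, `load_low_toxicity_stories` is a hypothetical helper, and the 0.5 threshold assumes that `toxicity_score` is a value between 0 and 1.

```python
import csv
import io

# Toy rows for illustration only (assumption): not real dataset content.
TOY_CSV = """story,toxicity_score,num_words
"I moved to a new city and felt alone.",0.01,9
"An invented example with hostile wording.",0.92,6
"My sister helped me through a hard year.",0.03,8
"""


def load_low_toxicity_stories(handle, max_toxicity=0.5):
    """Return rows whose toxicity_score is at or below max_toxicity."""
    rows = []
    for row in csv.DictReader(handle):
        if float(row["toxicity_score"]) <= max_toxicity:
            # CSV values are read as strings, so convert the count.
            row["num_words"] = int(row["num_words"])
            rows.append(row)
    return rows


stories = load_low_toxicity_stories(io.StringIO(TOY_CSV))
mean_words = sum(row["num_words"] for row in stories) / len(stories)
print(len(stories), mean_words)
```

To read a real file, the `io.StringIO` object would be replaced with an open file handle for a stories file in /data.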

Story Pairs

pairs: pair ID (matches with story file index)
binned: which sampled bin the pair belongs to (based on SBERT sampling)
story_A: first story in story pair
story_B: second story in story pair
story_A_summary: summary of first story in story pair
story_B_summary: summary of second story in story pair
Empathic Similarity (gpt3.5): empathic similarity score as rated by ChatGPT
Empathic Similarity Binned (gpt3.5): binned empathic similarity score as rated by ChatGPT
Empathic Similarity Reasons (gpt3.5): reasons why two stories are empathically similar as rated by ChatGPT
similarity_empathy_human_AGG: empathic similarity score as rated by human annotators
similarity_event_human_AGG: event similarity score as rated by human annotators
similarity_emotion_human_AGG: emotion similarity score as rated by human annotators
similarity_moral_human_AGG: moral similarity score as rated by human annotators
