The model checkpoint is available for download through Google Drive.
Sample code is provided for setting up a server to access the model remotely.
Example usage:
from trainer import EmpathicSimilarityModel
from numpy import dot
from numpy.linalg import norm

# Load the fine-tuned BART checkpoint
model_BART = EmpathicSimilarityModel.load_from_checkpoint(
    "./lightning_logs/version_164/checkpoints/epoch=15-step=752.ckpt",
    model="BART", pooling="MEAN", bin=False, losses="MSE", use_pretrained=False,
)

# Get embeddings as flat 1-D NumPy arrays
story1 = "this is my story"
e1 = model_BART(story1).detach().numpy().reshape(-1)
story2 = "this is my story 2"
e2 = model_BART(story2).detach().numpy().reshape(-1)

def get_cosine_similarity(a, b):
    # Cosine similarity between two 1-D vectors
    cos_sim = dot(a, b) / (norm(a) * norm(b))
    return cos_sim

print(get_cosine_similarity(e1, e2))
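Beyond comparing a single pair, the same cosine similarity can be used to rank several candidate stories against a query story. The following is a minimal sketch in pure Python, not part of the repository: `rank_by_similarity` is a hypothetical helper, and the toy 3-dimensional vectors stand in for real embeddings returned by the model.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank_by_similarity(query, candidates):
    """Return (name, score) pairs sorted from most to least similar to the query."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Toy vectors standing in for model embeddings (hypothetical values).
query = [1.0, 0.0, 1.0]
candidates = {
    "story_a": [1.0, 0.0, 1.0],
    "story_b": [0.0, 1.0, 0.0],
    "story_c": [1.0, 1.0, 0.0],
}
ranking = rank_by_similarity(query, candidates)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

With real embeddings, each candidate vector would be produced the same way as `e1` and `e2` above. The zero-norm guard is an added safety check, not something the original snippet does.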
/annotation: contains all MTurk annotation templates
/data: contains all data folders for train, dev, test sets
/models: contains all lightning modules and our pretrained BART model
    EmpathicSimilarityModel: takes in a story pair (2 stories) and fine-tunes on empathic similarity score
    EmpathicSummaryModel: takes in a single story and fine-tunes on empathy reasons (main event + emotion description + moral)
/config: contains yaml config files for different model training settings
/user_study: contains the frontend and server side code for our user study interface
dataset.py: contains the dataloaders
special_tokens.py: definitions of special tokens
trainer.py: contains training code and input of config files for different model training settings
utils.py: contains extra model utilities
evaluator.py: contains an evaluation class to compute all evaluation metrics
Data Source: which data source the story came from
story: raw text of the story
story_formatted: the story formatted with breaks
story_summary: ChatGPT summarized story
comments: (if pulled from social media), top level comments to the story
url: (if pulled from social media), the original url of the story
post_id: (if pulled from social media), the original id of the story
post_time: (if pulled from social media), the time the story was posted
post_score: (if pulled from Reddit), the score of the post
toxicity_score: toxicity score rated by Detoxify
WorkerId: worker ID of annotator
LifetimeApprovalRate: annotator's lifetime approval rate
AcceptTime: when the annotator accepted the HIT
SubmitTime: when the annotator submitted the HIT
WorkTimeInSeconds: how long the annotator took for the HIT
Age: annotator age
Gender: annotator gender
Race: annotator race
Arousal: annotator's arousal before the task (1-10)
Valence: annotator's valence before the task (1-10)
Main Event: main event of the story as rated by human annotator
Emotion Description: emotion of the story as rated by human annotator
Moral: moral of the story as rated by human annotator
Empathy Reasons: reasons why people may empathize with the story as rated by human annotator
Main Event (gpt3.5): main event of the story as rated by ChatGPT
Emotion Description (gpt3.5): emotion of the story as rated by ChatGPT
Moral (gpt3.5): moral of the story as rated by ChatGPT
Empathy Reasons (gpt3.5): reasons why people may empathize with the story as rated by ChatGPT
Empathizable: how generally "empathizable" the story is
Well-Written: how well-written the story is
fake_score: how likely the post is written by AI tools, as predicted by the Writer AI Content Detector
num_sentences: number of sentences in the story
num_words: number of words in the story
num_sentences_event: number of sentences in the event
num_words_event: number of words in the event
num_sentences_emotion: number of sentences in the emotion
num_words_emotion: number of words in the emotion
num_sentences_moral: number of sentences in the moral
num_words_moral: number of words in the moral
num_sentences_empathy_reasons: number of sentences in the empathy reasons
num_words_empathy_reasons: number of words in the empathy reasons
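As an illustration of how these story fields might be consumed, the sketch below reads records and keeps only those at or below a `toxicity_score` cutoff. It rests on several assumptions not stated in this README: that the data can be read as CSV, that a cutoff of 0.5 is meaningful, and that the two sample rows (invented for the example) resemble real records. `load_low_toxicity` is a hypothetical helper.

```python
import csv
import io

# Hypothetical two-row sample using a few of the columns described above;
# the real file name, format, and values are not specified in this README.
SAMPLE = """story,toxicity_score,num_words
"I moved to a new city last year.",0.02,8
"An example of a story that scored high.",0.91,8
"""

def load_low_toxicity(handle, max_toxicity=0.5):
    """Return rows whose toxicity_score is at or below max_toxicity (an assumed cutoff)."""
    rows = []
    for row in csv.DictReader(handle):
        if float(row["toxicity_score"]) <= max_toxicity:
            rows.append(row)
    return rows

kept = load_low_toxicity(io.StringIO(SAMPLE))
print(len(kept), kept[0]["story"])
```

To run this against a real file, the `io.StringIO(SAMPLE)` argument would be replaced by an opened file handle. Note that `csv.DictReader` returns every value as a string, so numeric columns need explicit conversion.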
pairs: pair ID (matches with story file index)
binned: which sampled bin the pair belongs to (based on SBERT sampling)
story_A: first story in story pair
story_B: second story in story pair
story_A_summary: summary of first story in story pair
story_B_summary: summary of second story in story pair
Empathic Similarity (gpt3.5): empathic similarity score as rated by ChatGPT
Empathic Similarity Binned (gpt3.5): binned empathic similarity score as rated by ChatGPT
Empathic Similarity Reasons (gpt3.5): reasons why two stories are empathically similar as rated by ChatGPT
similarity_empathy_human_AGG: empathic similarity score as rated by human annotators
similarity_event_human_AGG: event similarity score as rated by human annotators
similarity_emotion_human_AGG: emotion similarity score as rated by human annotators
similarity_moral_human_AGG: moral similarity score as rated by human annotators
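One plausible use of the pair-level human scores is to check how well model similarities track `similarity_empathy_human_AGG`. The sketch below computes a Pearson correlation in pure Python. It is an illustration only: this README does not say which metrics `evaluator.py` computes, and the `predicted` and `human` values are hypothetical, as is the scale of the human scores.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical values: model cosine similarities for four story pairs and
# made-up human scores standing in for similarity_empathy_human_AGG.
predicted = [0.10, 0.40, 0.35, 0.80]
human = [1.0, 2.0, 2.5, 4.0]
r = pearson(predicted, human)
print(f"Pearson r = {r:.3f}")
```

The same function could be applied to the event, emotion, and moral similarity columns to compare how each aspect relates to the model's output.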