Skip to content

Latest commit

 

History

History
424 lines (371 loc) · 25.1 KB

README.md

File metadata and controls

424 lines (371 loc) · 25.1 KB

Capitol Letters: Detecting Differential Attitudes in Legislator Communications Using NLP

Repo contains code/figures created for my master's thesis which looked at legislator comunication through one-minute pseeches from the House floor and on Twitter.

Description

This repo contains code necessary for end-to-end workflow for scraping and analyzing one-minute speech data using gpt-3.5-turbo.

First, I provide code to collect all links to C-SPAN videos which mention a given phrase, in this analysis that phrase is "one-minute" (scrape 117th one minute speeches.ipynb). I also provide the code I used to isolate just the one-minute speeches from the transcript data, since the transcripts contains other House session communication (clean_transcript_data_11*.ipynb). The result fo this cleaning can be found in the folder, One Minute Congresional Speech Data. After cleaning the transcript data the goal is to then score these speeches for valece, arousal, confidence, empathy and smypathy. This is done using open ai's gpt-3.5-turbo (openapi_getVACES_115.ipynb). Similar scoring is done for 116th and 117th congresses. The resulting scores are availible in the VACES output directory, which has each spceech, the speaker, the party identification of the speaker as well as it's score for each VACES measure.

The goal of this project was to explore differntial emotion by Members of Congress, so KS-testing is done on the distributions by party to determine if the distrubtions differ (ks-testing VAC_11*.ipynb). Figures of the dsitributions are also created and combined in 'make kde-figure-115-116-117.ipynb.'

One could use this code to expand the work done here to analyze House Session transcripts on any topic.

Components of repo

  • scraping and cleaning code
    • scrape 117th one minute speeches.ipynb, program which uses selenium to scrape the links to all CSPAN house sessions which mention one-minute as well as scraping the transcripts of those House sessions using link to the CSPAN video. Similar collection was done for 115th and 116th Congresses.
    • clean_transcript_data_115.ipynb, code which isolates just one-minute speeches from House session transcripts for 115th Congress
    • clean_transcript_data_116.ipynb, code which isolates just one-minute speeches from House session transcripts for 116th Congress
    • clean_transcript_data_117.ipynb, code which isolates just one-minute speeches from House session transcripts for 117th Congress
  • One Minute Congresional Speech Data * cleaned_transcript_data_115th.csv, cleaned_transcript_data_116th.csv, cleaned_transcript_data_117th.csv resulting output from cleaning pipeline above.
  • open ai code
    • openapi_getVACES_115.ipynb, code which scores one-minute speeches using gpt-3.5-turbo
  • VACES output
    • Contains collected one-minute speeches for 115th, 116th and 117th Congresses as well as their Valence, Arousal, Confidence, Empathy and Sympathy which was scored using OpenAI gpt-3.5-turbo in openapi_getVACES_115.ipynb
  • figure and ks code
    • kde figures and ks testing
      • ks-testing VAC_115.ipynb, ks testing for 115th Congress and figure generation
      • ks-testing VAC_116.ipynb, ks testing for 116th Congress and figure generation
      • ks-testing VAC_117.ipynb, ks testing for 117th Congress and figure generation
      • make kde-figure-115-116-117.ipynb, combine figures into one.
    • sentiment figures
      • cleaning_and_sentiment_scoring_116.ipynb, cleans Tweets and scores sentiment for 116th Congress
      • cleaning_and_sentiment_scoring_117.ipynb, cleans Tweets and scores sentiment for 117th Congress

Twitter Analysis Replication

I can't provide the raw tweet data due to the Twitter Academic API usage agreement. Instead, I provide a file which contains the date of the tweet, the sentiment score of a tweet as scored by VADER, and the corresponding political party of the tweets author. This is found in the folder CapitolLetters/twitter sentiment data, tweets_115.csv, tweets_116.csv, and tweets_117.csv. This then is enough to replicate the figures found in my thesis, i.e.

figure and ks code/sentiment figures/cleaning_and_sentiment_scoring_115.ipynb

figure and ks code/sentiment figures/cleaning_and_sentiment_scoring_116.ipynb

figure and ks code/sentiment figures/cleaning_and_sentiment_scoring_117.ipynb

VACES Analysis Replication

scraping data

The data was scraped from the C-SPAN website, using selenium.

To capture the transcript you'll need a C-SPAN account (free to make). Then navigate to their API page and use the GET \mentions feature.

Then appropriate min date, max date, search term ("one minute speeches" for my use case), as well as session type= 25 (House sessions). This will output a json with links to all videos which mention the search term in the given range. See scrape 117th one minute speeches.ipynb for code to capture these links and the corresponding transcripts.

One you have the transcraipts from here, for me, the next step was to isolate the one-minute speeches data, see clean_transcript_data_115.ipynb for example.

The resulting, cleaned, one minute speech data can be found

  • cleaned_transcript_data_115th.csv
  • cleaned_transcript_data_116th.csv
  • cleaned_transcript_data_117th.csv

using open ai to score data

From here, will score one-minute speeches using gpt-3.5-turbo, see openapi_getVACES_115.ipynb for full example.

Most notably, you will need your own API key to be able to replicate this analysis.

Essentially, need to pass it the text you want to score, the definition of the emotion dimention, as well as the dimensions name.

def label_text_using_gpt(text, defn, name):
    chat_completion = client.chat.completions.create(
        messages=[
        {
            "role": "system", "content": "You are a helpful assistant for labelling text data."
        }, {
            "role": "user", "content": f"The following is part of a congressional speech. Given this definition of '{name}': '{defn}' and this text: '{text}', provide a score for the '{name}' of the text between -1 and 1. Please give me ONLY a number between -1 and 1 as your response."
         }
        ],
        model="gpt-3.5-turbo",
        )
    return chat_completion.choices[0].message.content

Example input:

valence= "the degree of pleasure/positivity i.e. valence is the positive--negative or pleasure--displeasure dimension, where high valence (1) indicates the content of text is all positive or pleasurable and low valence (-1) indicates the content of the text is not positive or pleasurable" 

text_ex = "My name is Cara Nix, I like ice cream and cookies and Taylor Swift" 

result= safe_label_text_using_gpt_v2(text_ex, valence, "valence")

The result is then a score for the valence of the text!

The scores for valence, arousal, confidence, empathy, and sympathy for the one-minute speeches for the 115-117th Congresses is availible VACES output/speeches_VACES_115.csv, VACES output/speeches_VAC_116.csv, VACES output/speeches_ES_116.csv, and VACES output/speeches_VACES_117.csv

analyzing results

After passing in all the one-minute speeches and getting scores for valence, arousal, confidence, empathy, and sympathy for 115-117th Congresses it is now time to analyze these scores.

path to folder: figure and ks code/kde figures and ks testing

  • ks-testing VAC_115.ipynb
  • ks-testing VAC_116.ipynb
  • ks-testing VAC_117.ipynb

Then to combine these figures, use make kde-figure-115-116-117.ipynb.

Authors

Contributors names and contact info

ex. Cara Nix, [email protected]

Version History

Acknowledgments

Inspiration, code snippets, etc.

Packages in conda enviornment w/ versions

absl-py                   2.1.0                    pypi_0    pypi
aiofiles                  22.1.0          py311hca03da5_0  
aiosqlite                 0.18.0          py311hca03da5_0  
annotated-types           0.6.0                    pypi_0    pypi
anyio                     3.5.0           py311hca03da5_0  
appnope                   0.1.2           py311hca03da5_1001  
argon2-cffi               21.3.0             pyhd3eb1b0_0  
argon2-cffi-bindings      21.2.0          py311h80987f9_0  
asttokens                 2.0.5              pyhd3eb1b0_0  
astunparse                1.6.3                    pypi_0    pypi
attrs                     22.1.0          py311hca03da5_0  
babel                     2.11.0          py311hca03da5_0  
backcall                  0.2.0              pyhd3eb1b0_0  
beautifulsoup4            4.12.2          py311hca03da5_0  
bleach                    4.1.0              pyhd3eb1b0_0  
blis                      0.7.11                   pypi_0    pypi
bokeh                     3.3.4                    pypi_0    pypi
brotli                    1.1.0                    pypi_0    pypi
brotlipy                  0.7.0           py311h80987f9_1002  
bzip2                     1.0.8                h620ffc9_4  
ca-certificates           2023.08.22           hca03da5_0  
cachetools                5.3.2                    pypi_0    pypi
catalogue                 2.0.10                   pypi_0    pypi
certifi                   2023.7.22       py311hca03da5_0  
cffi                      1.15.1          py311h80987f9_3  
charset-normalizer        2.0.4              pyhd3eb1b0_0  
clean-text                0.6.0                    pypi_0    pypi
cleantext                 1.1.4                    pypi_0    pypi
click                     8.1.7                    pypi_0    pypi
cloudpathlib              0.16.0                   pypi_0    pypi
cloudpickle               3.0.0                    pypi_0    pypi
colorcet                  3.0.1                    pypi_0    pypi
comm                      0.1.2           py311hca03da5_0  
confection                0.1.4                    pypi_0    pypi
contourpy                 1.2.0                    pypi_0    pypi
cryptography              41.0.3          py311hd4332d6_0  
cycler                    0.12.1                   pypi_0    pypi
cymem                     2.0.8                    pypi_0    pypi
cyrus-sasl                2.1.28               h9131b1a_1  
dask                      2024.1.1                 pypi_0    pypi
datashader                0.16.0                   pypi_0    pypi
debugpy                   1.6.7           py311h313beb8_0  
decorator                 5.1.1              pyhd3eb1b0_0  
defusedxml                0.7.1              pyhd3eb1b0_0  
emoji                     1.7.0                    pypi_0    pypi
entrypoints               0.4             py311hca03da5_0  
exceptiongroup            1.1.3                    pypi_0    pypi
executing                 0.8.3              pyhd3eb1b0_0  
ffmpeg                    1.4                      pypi_0    pypi
flatbuffers               23.5.26                  pypi_0    pypi
fonttools                 4.47.2                   pypi_0    pypi
fsspec                    2024.2.0                 pypi_0    pypi
ftfy                      6.1.3                    pypi_0    pypi
gast                      0.5.4                    pypi_0    pypi
gettext                   0.21.0               h13f89a0_1  
giflib                    5.2.1                h80987f9_3  
glib                      2.69.1               h514c7bf_2  
google-auth               2.27.0                   pypi_0    pypi
google-auth-oauthlib      1.2.0                    pypi_0    pypi
google-pasta              0.2.0                    pypi_0    pypi
grpcio                    1.60.1                   pypi_0    pypi
gst-plugins-base          1.14.1               h313beb8_1  
gstreamer                 1.14.1               h80987f9_1  
h11                       0.14.0                   pypi_0    pypi
h5py                      3.10.0                   pypi_0    pypi
holoviews                 1.18.2                   pypi_0    pypi
icu                       68.1                 hc377ac9_0  
idna                      3.4             py311hca03da5_0  
imageio                   2.33.1                   pypi_0    pypi
importlib-metadata        7.0.1                    pypi_0    pypi
ipykernel                 6.25.0          py311hb6e6a13_0  
ipython                   8.15.0          py311hca03da5_0  
ipython_genutils          0.2.0              pyhd3eb1b0_1  
ipywidgets                8.0.4           py311hca03da5_0  
jedi                      0.18.1          py311hca03da5_1  
jinja2                    3.1.2           py311hca03da5_0  
joblib                    1.3.2                    pypi_0    pypi
jpeg                      9e                   h80987f9_1  
json5                     0.9.6              pyhd3eb1b0_0  
jsonschema                4.17.3          py311hca03da5_0  
jupyter                   1.0.0           py311hca03da5_8  
jupyter_client            7.4.9           py311hca03da5_0  
jupyter_console           6.6.3           py311hca03da5_0  
jupyter_core              5.3.0           py311hca03da5_0  
jupyter_events            0.6.3           py311hca03da5_0  
jupyter_server            1.23.4          py311hca03da5_0  
jupyter_server_fileid     0.9.0           py311hca03da5_0  
jupyter_server_ydoc       0.8.0           py311hca03da5_1  
jupyter_ydoc              0.2.4           py311hca03da5_0  
jupyterlab                3.6.3           py311hca03da5_0  
jupyterlab_pygments       0.1.2                      py_0  
jupyterlab_server         2.22.0          py311hca03da5_0  
jupyterlab_widgets        3.0.5           py311hca03da5_0  
keras                     2.15.0                   pypi_0    pypi
kiwisolver                1.4.5                    pypi_0    pypi
krb5                      1.20.1               hf3e1bf2_1  
langcodes                 3.3.0                    pypi_0    pypi
lazy-loader               0.3                      pypi_0    pypi
lerc                      3.0                  hc377ac9_0  
libclang                  16.0.6                   pypi_0    pypi
libclang13                14.0.6          default_h24352ff_1  
libcxx                    14.0.6               h848a8c0_0  
libdeflate                1.17                 h80987f9_0  
libedit                   3.1.20221030         h80987f9_0  
libffi                    3.4.4                hca03da5_0  
libiconv                  1.16                 h1a28f6b_2  
libllvm14                 14.0.6               h7ec7a93_3  
libpng                    1.6.39               h80987f9_0  
libpq                     12.15                h02f6b3c_1  
libsodium                 1.0.18               h1a28f6b_0  
libtiff                   4.5.1                h313beb8_0  
libwebp                   1.3.2                ha3663a8_0  
libwebp-base              1.3.2                h80987f9_0  
libxml2                   2.10.4               h372ba2a_0  
libxslt                   1.1.37               habca612_0  
linkify-it-py             2.0.3                    pypi_0    pypi
llvm-openmp               14.0.6               hc6e5704_0  
llvmlite                  0.42.0                   pypi_0    pypi
locket                    1.0.0                    pypi_0    pypi
lxml                      4.9.3           py311h50ffb84_0  
lz4-c                     1.9.4                h313beb8_0  
markdown                  3.5.2                    pypi_0    pypi
markdown-it-py            3.0.0                    pypi_0    pypi
markupsafe                2.1.1           py311h80987f9_0  
matplotlib                3.8.2                    pypi_0    pypi
matplotlib-inline         0.1.6           py311hca03da5_0  
mdit-py-plugins           0.4.0                    pypi_0    pypi
mdurl                     0.1.2                    pypi_0    pypi
mistune                   0.8.4           py311h80987f9_1000  
ml-dtypes                 0.2.0                    pypi_0    pypi
multipledispatch          1.0.0                    pypi_0    pypi
murmurhash                1.0.10                   pypi_0    pypi
mutagen                   1.47.0                   pypi_0    pypi
mysql                     5.7.24               ha71a6ea_2  
nbclassic                 0.5.5           py311hca03da5_0  
nbclient                  0.5.13          py311hca03da5_0  
nbconvert                 6.5.4           py311hca03da5_0  
nbformat                  5.9.2           py311hca03da5_0  
ncurses                   6.4                  h313beb8_0  
nest-asyncio              1.5.6           py311hca03da5_0  
networkx                  3.2.1                    pypi_0    pypi
nltk                      3.8.1                    pypi_0    pypi
notebook                  6.5.4           py311hca03da5_1  
notebook-shim             0.2.2           py311hca03da5_0  
numba                     0.59.0                   pypi_0    pypi
numpy                     1.26.0                   pypi_0    pypi
oauthlib                  3.2.2                    pypi_0    pypi
openssl                   3.0.10               h1a28f6b_2  
opt-einsum                3.3.0                    pypi_0    pypi
outcome                   1.2.0                    pypi_0    pypi
packaging                 23.1            py311hca03da5_0  
pandas                    2.1.0                    pypi_0    pypi
pandocfilters             1.5.0              pyhd3eb1b0_0  
panel                     1.3.8                    pypi_0    pypi
param                     2.0.2                    pypi_0    pypi
parso                     0.8.3              pyhd3eb1b0_0  
partd                     1.4.1                    pypi_0    pypi
pcre                      8.45                 hc377ac9_0  
pexpect                   4.8.0              pyhd3eb1b0_3  
pickleshare               0.7.5           pyhd3eb1b0_1003  
pillow                    10.2.0                   pypi_0    pypi
pip                       23.2.1          py311hca03da5_0  
platformdirs              3.10.0          py311hca03da5_0  
ply                       3.11            py311hca03da5_0  
preshed                   3.0.9                    pypi_0    pypi
prometheus_client         0.14.1          py311hca03da5_0  
prompt-toolkit            3.0.36          py311hca03da5_0  
prompt_toolkit            3.0.36               hd3eb1b0_0  
protobuf                  4.23.4                   pypi_0    pypi
psutil                    5.9.0           py311h80987f9_0  
ptyprocess                0.7.0              pyhd3eb1b0_2  
pure_eval                 0.2.2              pyhd3eb1b0_0  
pyasn1                    0.5.1                    pypi_0    pypi
pyasn1-modules            0.3.0                    pypi_0    pypi
pycparser                 2.21               pyhd3eb1b0_0  
pycryptodomex             3.19.0                   pypi_0    pypi
pyct                      0.5.0                    pypi_0    pypi
pydantic                  2.5.3                    pypi_0    pypi
pydantic-core             2.14.6                   pypi_0    pypi
pygments                  2.15.1          py311hca03da5_1  
pynndescent               0.5.11                   pypi_0    pypi
pyopenssl                 23.2.0          py311hca03da5_0  
pyparsing                 3.1.1                    pypi_0    pypi
pyqt                      5.15.7          py311h313beb8_0  
pyqt5-sip                 12.11.0         py311h313beb8_0  
pyrsistent                0.18.0          py311h80987f9_0  
pysocks                   1.7.1           py311hca03da5_0  
python                    3.11.5               hb885b13_0  
python-dateutil           2.8.2              pyhd3eb1b0_0  
python-fastjsonschema     2.16.2          py311hca03da5_0  
python-json-logger        2.0.7           py311hca03da5_0  
pytz                      2023.3.post1    py311hca03da5_0  
pyviz-comms               3.0.1                    pypi_0    pypi
pyyaml                    6.0             py311h80987f9_1  
pyzmq                     23.2.0          py311h313beb8_0  
qt-main                   5.15.2               h9b4df51_9  
qt-webengine              5.15.9               h2903aaf_7  
qtconsole                 5.4.2           py311hca03da5_0  
qtpy                      2.2.0           py311hca03da5_0  
qtwebkit                  5.212                h19f419d_5  
readline                  8.2                  h1a28f6b_0  
regex                     2023.12.25               pypi_0    pypi
requests                  2.31.0          py311hca03da5_0  
requests-oauthlib         1.3.1                    pypi_0    pypi
rfc3339-validator         0.1.4           py311hca03da5_0  
rfc3986-validator         0.1.1           py311hca03da5_0  
rsa                       4.9                      pypi_0    pypi
scikit-image              0.22.0                   pypi_0    pypi
scikit-learn              1.4.0                    pypi_0    pypi
scipy                     1.12.0                   pypi_0    pypi
seaborn                   0.13.1                   pypi_0    pypi
selenium                  4.12.0                   pypi_0    pypi
send2trash                1.8.0              pyhd3eb1b0_1  
setuptools                68.0.0          py311hca03da5_0  
sip                       6.6.2           py311h313beb8_0  
six                       1.16.0             pyhd3eb1b0_1  
smart-open                6.4.0                    pypi_0    pypi
sniffio                   1.2.0           py311hca03da5_1  
sortedcontainers          2.4.0                    pypi_0    pypi
soupsieve                 2.4             py311hca03da5_0  
spacy                     3.7.2                    pypi_0    pypi
spacy-legacy              3.0.12                   pypi_0    pypi
spacy-loggers             1.0.5                    pypi_0    pypi
sqlite                    3.41.2               h80987f9_0  
srsly                     2.4.8                    pypi_0    pypi
stack_data                0.2.0              pyhd3eb1b0_0  
tensorboard               2.15.1                   pypi_0    pypi
tensorboard-data-server   0.7.2                    pypi_0    pypi
tensorflow                2.15.0                   pypi_0    pypi
tensorflow-estimator      2.15.0                   pypi_0    pypi
tensorflow-hub            0.16.1                   pypi_0    pypi
tensorflow-io-gcs-filesystem 0.36.0                   pypi_0    pypi
tensorflow-macos          2.15.0                   pypi_0    pypi
termcolor                 2.4.0                    pypi_0    pypi
terminado                 0.17.1          py311hca03da5_0  
tf-keras                  2.15.0                   pypi_0    pypi
thinc                     8.2.2                    pypi_0    pypi
threadpoolctl             3.2.0                    pypi_0    pypi
tifffile                  2024.1.30                pypi_0    pypi
tinycss2                  1.2.1           py311hca03da5_0  
tk                        8.6.12               hb8d0fd4_0  
toml                      0.10.2             pyhd3eb1b0_0  
toolz                     0.12.1                   pypi_0    pypi
tornado                   6.3.2           py311h80987f9_0  
tqdm                      4.66.1                   pypi_0    pypi
traitlets                 5.7.1           py311hca03da5_0  
trio                      0.22.2                   pypi_0    pypi
trio-websocket            0.10.4                   pypi_0    pypi
typer                     0.9.0                    pypi_0    pypi
typing-extensions         4.7.1           py311hca03da5_0  
typing_extensions         4.7.1           py311hca03da5_0  
tzdata                    2023.3                   pypi_0    pypi
uc-micro-py               1.0.2                    pypi_0    pypi
umap                      0.1.1                    pypi_0    pypi
umap-learn                0.5.5                    pypi_0    pypi
undetected-chromedriver   3.5.3                    pypi_0    pypi
urllib3                   1.26.16         py311hca03da5_0  
vadersentiment            3.3.2                    pypi_0    pypi
vectorizers               0.2                      pypi_0    pypi
wasabi                    1.1.2                    pypi_0    pypi
wcwidth                   0.2.13                   pypi_0    pypi
weasel                    0.3.4                    pypi_0    pypi
webencodings              0.5.1           py311hca03da5_1  
websocket-client          0.58.0          py311hca03da5_4  
websockets                11.0.3                   pypi_0    pypi
werkzeug                  3.0.1                    pypi_0    pypi
wheel                     0.38.4          py311hca03da5_0  
widgetsnbextension        4.0.5           py311hca03da5_0  
wrapt                     1.14.1                   pypi_0    pypi
wsproto                   1.2.0                    pypi_0    pypi
xarray                    2024.1.1                 pypi_0    pypi
xyzservices               2023.10.1                pypi_0    pypi
xz                        5.4.2                h80987f9_0  
y-py                      0.5.9           py311ha6e5c4f_0  
yaml                      0.2.5                h1a28f6b_0  
youtube-dl                2021.12.17               pypi_0    pypi
ypy-websocket             0.8.2           py311hca03da5_0  
yt-dlp                    2023.10.7                pypi_0    pypi
zeromq                    4.3.4                hc377ac9_0  
zipp                      3.17.0                   pypi_0    pypi
zlib                      1.2.13               h5a0b063_0  
zstd                      1.5.5                hd90d995_0