TwitterSentimentBenchmarkDataAnalysis

Analysis of Twitter sentiment analysis benchmark datasets, as described in the paper: Shubhanshu Mishra and Jana Diesner. 2018. Detecting the Correlation between Sentiment and User-level as well as Text-Level Meta-data from Benchmark Corpora. In Proceedings of the 29th on Hypertext and Social Media (HT '18). ACM, New York, NY, USA, 2-10. DOI: https://doi.org/10.1145/3209542.3209562

If you plan to use this analysis, please cite both of the following items:

@inproceedings{Mishra2018,
  doi = {10.1145/3209542.3209562},
  url = {https://doi.org/10.1145/3209542.3209562},
  year  = {2018},
  publisher = {{ACM} Press},
  author = {Shubhanshu Mishra and Jana Diesner},
  title = {Detecting the Correlation between Sentiment and User-level as well as Text-Level Meta-data from Benchmark Corpora},
  booktitle = {Proceedings of the 29th on Hypertext and Social Media  - {HT} {\textquotesingle}18}
}

@misc{shubhanshu_mishra_2018_1308462,
  author       = {Shubhanshu Mishra},
  title        = {Twitter sentiment benchmark data analysis},
  month        = jul,
  year         = 2018,
  doi          = {10.5281/zenodo.1308462},
  url          = {https://doi.org/10.5281/zenodo.1308462}
}

Download the data with training, validation, and test splits

The training, validation, and test splits used in the paper are stored in data_with_train_dev_test_split.txt.gz, which you can download from the data folder:

$ ls -ltrh data/
total 11M
-rw-rw-r-- 1 smishra8 is-sailgroup 5.1M May 16 04:26 joined_data_all.txt.gz
-rw-rw-r-- 1 smishra8 is-sailgroup 5.1M May 16 04:48 data_with_train_dev_test_split.txt.gz
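
To check a download, the compressed file can be inspected without unpacking it. The snippet below is a minimal sketch that prints the first few raw lines; the helper name `peek` is hypothetical, and the code makes no assumption about the column layout, which this README does not describe.

```python
import gzip
import os
from itertools import islice

# Path taken from the listing above; adjust it if the repository lives elsewhere.
DATA_PATH = "data/data_with_train_dev_test_split.txt.gz"


def peek(path, n=5):
    """Return the first n lines of a gzip-compressed text file (hypothetical helper)."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        return [line.rstrip("\n") for line in islice(handle, n)]


# Only read the file if it has actually been downloaded.
if os.path.exists(DATA_PATH):
    for line in peek(DATA_PATH):
        print(line)
```

Reading in text mode with `errors="replace"` keeps the inspection from failing on any unexpected bytes in tweet text.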

The file was created as follows:

cd data && gunzip joined_data_all.txt.gz
python create_data_splits.py
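
The script create_data_splits.py in this repository is the authoritative source for how the splits were made. For readers unfamiliar with the general idea, the following is only an illustrative sketch of a seeded random train/dev/test assignment. The function name, the split fractions, and the seed are all assumptions and are not taken from the paper or the script.

```python
import random


def assign_splits(n_rows, dev_frac=0.1, test_frac=0.1, seed=0):
    """Return one 'train'/'dev'/'test' label per row.

    Illustrative only: the fractions and seed are assumed values,
    not the ones used by create_data_splits.py.
    """
    rng = random.Random(seed)          # fixed seed makes the split reproducible
    indices = list(range(n_rows))
    rng.shuffle(indices)

    n_dev = int(n_rows * dev_frac)
    n_test = int(n_rows * test_frac)

    labels = ["train"] * n_rows        # everything defaults to train
    for i in indices[:n_dev]:
        labels[i] = "dev"
    for i in indices[n_dev:n_dev + n_test]:
        labels[i] = "test"
    return labels
```

Calling `assign_splits(100)` twice returns the same labels, which is the property that lets a published split be regenerated later.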

Data sources:

Detecting the correlation between sentiment and user-level as well as text-level meta-data from benchmark corpora

Code for this analysis can be found in the following files:

Code is released under the GNU General Public License v3.0.