Commit ee2f022

Updated README with data and split instructions
1 parent f18101c commit ee2f022

1 file changed: +20 -0 lines changed

README.md

Lines changed: 20 additions & 0 deletions
@@ -25,6 +25,26 @@ If you plan to use this analysis please cite the following items:
 }
 ```
 
+## Download the data with training, validation, and test splits
+
+You can use the training, validation, and test splits `data_with_train_dev_test_split.txt.gz` as used in the paper by downloading the data in the data folder:
+
+```
+$ ls -ltrh data/
+total 11M
+-rw-rw-r-- 1 smishra8 is-sailgroup 5.1M May 16 04:26 joined_data_all.txt.gz
+-rw-rw-r-- 1 smishra8 is-sailgroup 5.1M May 16 04:48 data_with_train_dev_test_split.txt.gz
+```
+
+The file was created as follows:
+
+```bash
+cd data && gunzip joined_data_all.txt.gz
+python create_data_splits.py
+```
+
+
+
 ## Data sources:
 * SemEval - http://alt.qcri.org/semeval2017/task4/
 * Airline - https://www.kaggle.com/crowdflower/twitter-airline-sentiment
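
The README added here points readers at `data_with_train_dev_test_split.txt.gz` but does not describe its layout. A quick way to inspect it is to stream the archive instead of unpacking it, since plain `gunzip` replaces the `.gz` file with the decompressed one. The sketch below is only an illustration: `peek_gz` is a hypothetical helper name, not part of the repository, and it makes no assumption about the file's columns beyond it being line-oriented text.

```shell
#!/bin/sh
# peek_gz FILE [N]: hypothetical helper (not part of the repository).
# Prints the line count and the first N lines (default 5) of a
# gzip-compressed text file without writing a decompressed copy to disk.
peek_gz() {
    file=$1
    n=${2:-5}
    # gzip -dc streams the decompressed bytes to stdout and leaves FILE intact.
    printf 'lines: %s\n' "$(gzip -dc "$file" | wc -l | tr -d ' ')"
    gzip -dc "$file" | head -n "$n"
}

# Usage, assuming the repository root as the working directory:
# peek_gz data/data_with_train_dev_test_split.txt.gz 3
```

Streaming keeps the compressed file in place, so the `ls` listing shown in the README stays valid after inspection.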
