==============================================================================
Linking to Yelp dataset (via a symlink)
==============================================================================
$ ln -s $HOME/Dropbox/sentiment-data/yelp/ yelp
==============================================================================
Toolkits
==============================================================================
Oliver Mason's Qtag program [http://phrasys.net/uob/om/software]
==============================================================================
Setting up the Maximum Entropy Modeling Toolkit for Python and C++
==============================================================================
Main page [http://homepages.inf.ed.ac.uk/lzhang10/maxent_toolkit.html]
Source [https://github.com/lzhang10/maxent]
Wonderful documentation, except for the missing Python API reference [http://homepages.inf.ed.ac.uk/lzhang10/software/maxent/manual.pdf]
DEPENDENCIES
zlib [http://www.techsww.com/tutorials/libraries/zlib/installation/installing_zlib_on_ubuntu_linux.php]
libboost [apt-get]
jam [apt-get]
Important points
* L-BFGS is the toolkit's default parameter estimation method.
==============================================================================
Preprocess movie data
==============================================================================
Use Qtag with the "underscore" and "process all files in directory" options
$ java -jar qtag.jar
Move the POS-tagged data into its own directory for further processing
$ mv pos/tagged/ pos_tagged
$ mv neg/tagged/ neg_tagged
Tag data with position
$ python position_tagger.py -d pos
$ python position_tagger.py -d neg
Keep only the adjectives
$ python adjectives_filter.py -d neg
$ python adjectives_filter.py -d pos
Keep only the verbs
$ python verb_filter.py -d pos
$ python verb_filter.py -d neg
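The filtering steps above can be pictured with a minimal sketch. This is not
the repository's adjectives_filter.py or verb_filter.py; it assumes the
"underscore" option produces tokens of the form word_TAG, and that adjective
tags start with "JJ" and verb tags with "VB". The function name keep_tagged,
the tag prefixes, and the sample sentence are all hypothetical.

```python
# Hypothetical sketch: NOT the repository's adjectives_filter.py.
# Assumption: each token looks like "word_TAG" (underscore-separated).
# Assumption: adjective tags start with "JJ", verb tags with "VB".


def keep_tagged(line, tag_prefix="JJ"):
    """Return the tokens of one tagged line whose tag starts with tag_prefix."""
    kept = []
    for token in line.split():
        # rpartition splits on the LAST underscore, so a word that itself
        # contains an underscore keeps its full text.
        word, sep, tag = token.rpartition("_")
        if sep and tag.startswith(tag_prefix):
            kept.append(token)
    return " ".join(kept)


# Illustrative input only; the tags are made up for this example.
sample = "the_DT acting_NN was_VBD truly_RB wonderful_JJ and_CC funnier_JJR"
print(keep_tagged(sample))                    # adjectives only
print(keep_tagged(sample, tag_prefix="VB"))   # verbs only
```

Matching on a tag prefix rather than an exact tag keeps comparative and
superlative forms together with the base form, which is one plausible reading
of "only adjectives".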
==============================================================================
Preprocess Yelp data
==============================================================================
Reformat the Yelp data to match the movie data, and limit it to 1000 reviews
per star rating
$ python preprocess_yelp.py -d yelp/default/1star_limited
$ python preprocess_yelp.py -d yelp/default/2star_limited
$ python preprocess_yelp.py -d yelp/default/3star_limited
$ python preprocess_yelp.py -d yelp/default/4star_limited
$ python preprocess_yelp.py -d yelp/default/5star_limited
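As a rough illustration of the "limit to 1000 per star rating" idea, the
sketch below caps the number of items kept per rating. It is not taken from
preprocess_yelp.py; the (stars, text) pair layout and the function name
limit_per_rating are assumptions made for this example.

```python
# Hypothetical sketch: NOT the repository's preprocess_yelp.py.
# Assumption: reviews arrive as (stars, text) pairs.
from collections import defaultdict


def limit_per_rating(reviews, limit=1000):
    """Keep at most `limit` reviews per star rating, preserving input order."""
    counts = defaultdict(int)
    kept = []
    for stars, text in reviews:
        if counts[stars] < limit:
            counts[stars] += 1
            kept.append((stars, text))
    return kept


# Tiny made-up input, capped at 2 per rating so the effect is visible.
reviews = [(5, "great"), (5, "loved it"), (1, "awful"), (5, "superb")]
print(limit_per_rating(reviews, limit=2))
```

Taking the first items seen per rating is the simplest choice; random sampling
would be an alternative if the input order is not meaningful.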
Use Qtag with the "underscore" and "process all files in directory" options
$ java -jar qtag.jar
Move the POS-tagged data into its own directory for further processing
$ mv 1star_limited/tagged/ 1star_limited_tagged
$ mv 2star_limited/tagged/ 2star_limited_tagged
$ mv 3star_limited/tagged/ 3star_limited_tagged
$ mv 4star_limited/tagged/ 4star_limited_tagged
$ mv 5star_limited/tagged/ 5star_limited_tagged
Tag data with position
$ python position_tagger.py -d yelp/default/1star_limited
$ python position_tagger.py -d yelp/default/2star_limited
$ python position_tagger.py -d yelp/default/3star_limited
$ python position_tagger.py -d yelp/default/4star_limited
$ python position_tagger.py -d yelp/default/5star_limited
Keep only the adjectives
$ python adjectives_filter.py -d yelp/default/1star_limited
$ python adjectives_filter.py -d yelp/default/2star_limited
$ python adjectives_filter.py -d yelp/default/3star_limited
$ python adjectives_filter.py -d yelp/default/4star_limited
$ python adjectives_filter.py -d yelp/default/5star_limited
Keep only the verbs
$ python verb_filter.py -d yelp/default/1star_limited
$ python verb_filter.py -d yelp/default/2star_limited
$ python verb_filter.py -d yelp/default/3star_limited
$ python verb_filter.py -d yelp/default/4star_limited
$ python verb_filter.py -d yelp/default/5star_limited
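The per-star commands in this section repeat the same three scripts over five
directories, so they could be generated with a small loop. The sketch below
only builds and prints the command lines listed above; the helper names
yelp_commands and run_all are hypothetical, and it assumes the scripts are run
from the repository root with the yelp symlink in place.

```python
# Hypothetical helper: builds the same command lines listed in this section.
# Assumption: run from the repository root, with the "yelp" symlink present.
import subprocess

# Same order as the README: position tagging, then adjectives, then verbs.
SCRIPTS = ["position_tagger.py", "adjectives_filter.py", "verb_filter.py"]


def yelp_commands(base="yelp/default", stars=range(1, 6)):
    """Return one argv list per (script, star rating) pair."""
    commands = []
    for script in SCRIPTS:
        for n in stars:
            commands.append(["python", script, "-d", f"{base}/{n}star_limited"])
    return commands


def run_all(commands):
    """Run each command in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


# Print the commands instead of running them, so this is safe to try first.
for cmd in yelp_commands():
    print("$ " + " ".join(cmd))
```

Calling run_all(yelp_commands()) would execute the steps in sequence. The Qtag
step and the mv step are left out because Qtag is run with options chosen at
tagging time.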