easySentimentAnalyseR
is a bag-of-words-based text mining template. It can be used to do quick and simple sentiment analysis on Twitter data.
To fetch tweets from Twitter reliably, you should register a developer account through the Twitter Apps portal.
1. You must have a valid regular Twitter user account with a phone number or email address.
2. Log on to the Twitter app portal.
3. Create a Twitter application.
4. Go to the Keys and Access Tokens panel and record the credential information of your application:
   4.1 Consumer Key
   4.2 Consumer Secret
   4.3 Access Token
   4.4 Access Token Secret
The video below walks through registering such an account; alternatively, you can read Twitter's official instructions.
<iframe width="560" height="315" src="https://www.youtube.com/embed/CVz1MjqTXMg" frameborder="0" allowfullscreen></iframe>
After you have your own account and application, use that information to fill in the variable slots in easySentimentAnalyseR_Setup_Twitter.r.
# get the credential information of your Twitter developer account
# it is highly recommended to store these as OS environment variables
# rather than hard-coding them in the script
key <- Sys.getenv("twitter_developer_customer_key")
secret <- Sys.getenv("twitter_developer_customer_secret")
access_token <- Sys.getenv("twitter_developer_access_token")
access_token_secret <- Sys.getenv("twitter_developer_access_token_secret")
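If those environment variables are not set yet, one simple way (my suggestion, not something the template requires) is to put them in your ~/.Renviron file so R loads them at startup; the names must match the Sys.getenv() calls above:
# ~/.Renviron -- replace the placeholder values with your real credentials
twitter_developer_customer_key=YOUR_CONSUMER_KEY
twitter_developer_customer_secret=YOUR_CONSUMER_SECRET
twitter_developer_access_token=YOUR_ACCESS_TOKEN
twitter_developer_access_token_secret=YOUR_ACCESS_TOKEN_SECRET
Restart your R session afterwards so the new values are visible.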
And then, do the authentication:
# do authentication
setup_twitter_oauth(consumer_key    = key,
                    consumer_secret = secret,
                    access_token    = access_token,
                    access_secret   = access_token_secret)
When you see output like the following:
[1] "Using direct authentication"
it means you've successfully authenticated.
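As a quick sanity check (my own suggestion, not part of the template), you can try a tiny search right away; if the credentials were wrong, searchTwitter() would fail with an authorization error:
# fetch a handful of tweets to confirm the OAuth setup works
test_tweets <- searchTwitter("rstats", n = 5, lang = "en")
length(test_tweets) # should be 5 (or fewer if the topic is quiet)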
Open easySentimentAnalyseR_BoW_Twitter.r, and set your own search_str, num_twts and other parameters (see ?searchTwitter), for example:
search_str = "the_topic_you_wanna_search"
num_twts = num_of_tweets # the number of tweets you want to fetch
lang = "en" # language code, see https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes
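With those parameters set, the fetching step in the template boils down to something like this (a sketch; the exact code in the script may differ, but tweets_list is the object the cleaning code below works on):
# search Twitter and extract the plain text of each status object
tweets <- searchTwitter(search_str, n = num_twts, lang = lang)
tweets_list <- sapply(tweets, function(t) t$getText())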
As a template, this script already sets up the general workflow for you, so you can focus on the analysis itself; this is especially handy if you are not a programmer but still want to do some simple and quick sentiment analysis on a certain topic.
If that matches your goal, the real analysis starts here. Depending on the context and scenario, you may need to adjust the general cleaning process, for example by supplying a customized stopword list or bringing in other elements such as named entities or part-of-speech tags. Named entities and POS tags rarely appear in a pure bag-of-words approach, but experimenting with them is a worthwhile exercise in trial and error, so feel free to try out your own ideas.
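For instance, the stopwords vector used by the cleaning code below could be built like this (a sketch, assuming tm's built-in English list as the base; the extra words are just examples of common Twitter noise):
library(tm)      # stopwords(), VCorpus(), tm_map(), ...
library(stringr) # str_to_lower(), str_replace_all()

# start from tm's English stopword list and append your own noise words;
# note this variable shadows tm::stopwords(), matching how the code below refers to it
stopwords <- c(stopwords("en"), "amp", "via", "rt")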
# create a copy of your corpus; it serves as the dictionary for stemCompletion() below
# the cleaning steps should be decided based on your actual search conditions
copy_tweets_corpus <- VCorpus(VectorSource(tweets_list)) %>%
  tm_map(str_to_lower) %>%      # turn all characters to lower case
  tm_map(removeNumbers) %>%     # remove numbers
  tm_map(removePunctuation) %>% # remove punctuation
  tm_map(function(x) {
    # remove stop words, the search term itself (minus any leading "#"), and "RT"
    removeWords(x, c(stopwords, str_replace_all(str_to_lower(search_str), "^#", ""), "RT")) %>%
      str_replace_all(pattern = "[^a-zA-Z\\s'-]", replacement = "")
  }) %>%
  tm_map(PlainTextDocument)     # turn the cleaned text back into plain text documents
# clean the corpus based on your actual search conditions
tweets_corpus <- VCorpus(VectorSource(tweets_list)) %>% # generate a volatile corpus (you can also try a "Permanent" or "Simple" one)
  tm_map(str_to_lower) %>%      # turn all characters to lower case
  tm_map(removeNumbers) %>%     # remove numbers
  tm_map(removePunctuation) %>% # remove punctuation
  tm_map(function(x) {
    # remove stop words, the search term itself (minus any leading "#"), and "RT"
    removeWords(x, c(stopwords, str_replace_all(str_to_lower(search_str), "^#", ""), "RT")) %>%
      str_replace_all(pattern = "[^a-zA-Z\\s'-]", replacement = "")
  }) %>%
  tm_map(stemDocument) %>%      # do stemming; it sometimes gives confusing results, so use it based on your actual requirements
  tm_map(function(x) {
    # complete each stem back to a full word, using the unstemmed copy as the dictionary
    tmp <- strsplit(x = x, split = " ")[[1]]
    tmp <- tmp[tmp != ""]
    paste(stemCompletion(tmp, dictionary = copy_tweets_corpus), collapse = " ")
  }) %>%
  tm_map(stripWhitespace) %>%   # clean up extra white space
  tm_map(PlainTextDocument)     # turn the corpus into plain text documents
After cleaning the corpus, you can generate a word cloud and calculate sentiment scores for the search topic by adjusting a few parameters.
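A minimal sketch of those two steps might look like the following (assuming the wordcloud, RColorBrewer, and qdap packages listed in the session info below; all parameter values are just examples):
# build a term-document matrix and plot the most frequent terms
tdm  <- TermDocumentMatrix(tweets_corpus)
freq <- sort(rowSums(as.matrix(tdm)), decreasing = TRUE)
wordcloud(words = names(freq), freq = freq,
          max.words = 100, colors = brewer.pal(8, "Dark2"))

# score the sentiment of the cleaned tweets with qdap's polarity()
pol <- polarity(sapply(tweets_corpus, as.character))
pol$group$ave.polarity # average polarity across all fetched tweets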
Here is an example that I posted on RPubs for testing purposes.
Also, here is some information about my own system:
> sessionInfo()
R version 3.3.3 (2017-03-06)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] ggplot2_2.2.1 data.table_1.10.4-2 rJava_0.9-9 openNLP_0.2-6
[5] qdap_2.2.8 qdapTools_1.3.3 qdapRegex_0.7.2 qdapDictionaries_1.0.6
[9] dendextend_1.5.2 SnowballC_0.5.1 tmcn_0.2-8 wordcloud_2.5
[13] RColorBrewer_1.1-2 stringr_1.2.0 magrittr_1.5 httr_1.3.1
[17] twitteR_1.1.9 XML_3.98-1.9 tm_0.7-1 NLP_0.1-11
[21] RevoUtilsMath_10.0.0 RevoUtils_10.0.3 RevoMods_11.0.0 MicrosoftML_1.3.0
[25] mrsdeploy_1.1.0 RevoScaleR_9.1.0 lattice_0.20-35 rpart_4.1-11
[29] curl_2.7 jsonlite_1.5
loaded via a namespace (and not attached):
[1] viridis_0.4.0 gender_0.5.1 bit64_0.9-7 viridisLite_0.2.0
[5] foreach_1.4.3 gtools_3.5.0 assertthat_0.2.0 stats4_3.3.3
[9] xlsxjars_0.6.1 mrupdate_1.0.1 yaml_2.1.14 robustbase_0.92-7
[13] slam_0.1-40 glue_1.1.1 chron_2.3-51 colorspace_1.3-2
[17] plyr_1.8.4 pkgconfig_2.0.1 mvtnorm_1.0-6 scales_0.5.0
[21] gdata_2.18.0 whisker_0.3-2 openssl_0.9.7 tibble_1.3.4
[25] reports_0.1.4 nnet_7.3-12 lazyeval_0.2.0 mclust_5.3
[29] MASS_7.3-47 class_7.3-14 tools_3.3.3 CompatibilityAPI_1.1.0
[33] trimcluster_0.1-2 xlsx_0.5.7 kernlab_0.9-25 munsell_0.4.3
[37] plotrix_3.6-6 cluster_2.0.6 fpc_2.1-10 bindrcpp_0.2
[41] rlang_0.1.2 grid_3.3.3 RCurl_1.95-4.8 iterators_1.0.8
[45] rjson_0.2.15 igraph_1.1.2 labeling_0.3 bitops_1.0-6
[49] venneuler_1.1-0 gtable_0.2.0 codetools_0.2-15 flexmix_2.3-14
[53] DBI_0.7 reshape2_1.4.2 R6_2.2.2 gridExtra_2.3
[57] prabclus_2.2-6 dplyr_0.7.4 bit_1.1-12 openNLPdata_1.5.3-3
[61] bindr_0.1 modeltools_0.2-21 stringi_1.1.5 parallel_3.3.3
[65] Rcpp_0.12.13 DEoptimR_1.0-8 diptest_0.75-7