Beyond Words - predicting user decision with text data

Executive Summary

Software as a service (SaaS) is a major sector of the cloud computing business. To thrive in this competitive market, growing the user base is a crucial driver of business. Predicting and understanding customer decisions is imperative for helping a company adapt its service in time to meet users’ needs.
Sentiment and text analysis of user communication data can be an effective way to gauge user experience and satisfaction. An algorithmic approach based on user text data was developed to predict when a user is about to subscribe or unsubscribe. The client of this consulting project is a startup specializing in platforms that let content creators build their own mobile apps. The data come from in-app user communication.
60 features were extracted from the text data across different time periods, including sentiment, number of characters, number of words, etc. Machine learning models such as Random Forest and XGBoost were trained on these features to predict user decisions. Specifically, the model can 1) forecast users at high risk of churning 4 weeks in advance with 0.87 AUC and 2) estimate user lifecycle, which was corroborated by additional time-series analysis. This gives my client valuable time to take action (e.g. sending out targeted surveys and in-app perks), and the model can also be used to evaluate how well these strategies perform. Therefore, this machine learning approach can help my client grow their premium user base through prediction and evaluation.
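
For readers unfamiliar with the AUC figure quoted above, here is a minimal sketch of what the metric measures: the probability that a randomly chosen churner receives a higher risk score than a randomly chosen non-churner. The function name `roc_auc` and the toy inputs are illustrative assumptions, not code or data from this project.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: iterable of 0/1 (1 = churned); scores: predicted risk per user.
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (made-up scores): 3 of the 4 positive/negative pairs are ordered correctly.
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
```

The pairwise loop is quadratic, which is fine for illustration; library implementations compute the same quantity from sorted ranks.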

Key Procedures

Preprocessing text data for machine to read
- Convert emoji and emoticons with the emoji and emot packages, respectively.
- Note: although emot can also process emoji, its emoji database is not as comprehensive as that of the emoji package.
Choosing the right natural language processing (NLP) models
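
As a rough illustration of the conversion step above, the sketch below replaces emoji and emoticons with plain-text tokens. It deliberately uses only the Python standard library as a stand-in for the emoji and emot packages: emoji names come from `unicodedata`, and `EMOTICONS` is a tiny hypothetical lookup table, far smaller than what emot ships. The function names are assumptions made for this example.

```python
import re
import unicodedata

# Hypothetical mini table for illustration only.
EMOTICONS = {":)": "happy_face", ":(": "sad_face", ":D": "laughing_face", ";)": "wink"}

# Match longer emoticons first so overlapping patterns resolve predictably.
_EMOTICON_RE = re.compile(
    "|".join(re.escape(e) for e in sorted(EMOTICONS, key=len, reverse=True))
)


def convert_emoticons(text):
    """Replace each known ASCII emoticon with a descriptive token."""
    return _EMOTICON_RE.sub(lambda m: f" {EMOTICONS[m.group(0)]} ", text)


def convert_emoji(text):
    """Replace symbol characters (Unicode category 'So') with their Unicode names.

    Category 'So' covers most pictographic emoji but also symbols such as the
    copyright sign, so this is an approximation of a real emoji database.
    """
    out = []
    for ch in text:
        name = unicodedata.name(ch, "") if unicodedata.category(ch) == "So" else ""
        out.append(f" {name.lower().replace(' ', '_')} " if name else ch)
    return "".join(out)


def normalize(text):
    """Convert emoticons and emoji, then collapse repeated whitespace."""
    return " ".join(convert_emoji(convert_emoticons(text)).split())


print(normalize("Love it \U0001F600 :)"))  # Love it grinning_face happy_face
```

Turning symbols into words lets a text-based sentiment model see the signal they carry instead of dropping them as unknown characters.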
- Test unsupervised (lexicon-based) NLP: TextBlob and VADER
- Test supervised NLP: off-the-shelf pretrained BERT (state-of-the-art)
- Highly skewed data: user text was overwhelmingly positive and supportive, which made existing unsupervised models and off-the-shelf supervised models unsuitable.
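
One way to put a number on the skew noted above is to compare any model against a majority-class baseline: if most texts are positive, always predicting "positive" already scores high accuracy. The sketch below shows that check; the helper names and the toy label list are illustrative assumptions, not the project's data.

```python
from collections import Counter


def class_distribution(labels):
    """Return the share of each label in the data."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


def majority_baseline_accuracy(labels):
    """Accuracy of a model that always predicts the most common label."""
    return max(class_distribution(labels).values())


# Made-up labels for illustration: 8 positive, 1 neutral, 1 negative.
toy_labels = ["positive"] * 8 + ["neutral"] + ["negative"]
print(class_distribution(toy_labels))
print(majority_baseline_accuracy(toy_labels))  # 0.8
```

A sentiment model is only informative on such data if it clearly beats this baseline, which is why per-class metrics are worth checking alongside accuracy.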
Tuning BERT model with proper labelling
- Create two types of labels for each text: Tone (positive/neutral/negative) and Content (rich/partial/none)
- Fine-tune two BERT models through ktrain, one for each label class
- Achieved accuracy scores of 0.85 and 0.78 for Tone and Content, respectively
- Note: another approach is to merge the two label classes into one (2x3) and train a single model (less costly, but prediction is weakened: accuracy score of 0.67 due to data imbalance)
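
To see why merging the two label classes can hurt on imbalanced data, note that three Tone values and three Content values give up to nine combined classes, so each class is left with fewer examples. The sketch below builds merged labels and counts how many combined classes end up sparse. The `tone|content` label format, the helper names, and the toy pairs are assumptions made for illustration.

```python
from collections import Counter
from itertools import product

TONES = ("positive", "neutral", "negative")
CONTENTS = ("rich", "partial", "none")


def merge_labels(tone, content):
    """Combine one Tone label and one Content label into a single class name."""
    if tone not in TONES or content not in CONTENTS:
        raise ValueError(f"unknown label pair: {tone!r}, {content!r}")
    return f"{tone}|{content}"


def merged_class_counts(pairs):
    """Count examples per merged class, including classes with no examples."""
    counts = Counter(merge_labels(t, c) for t, c in pairs)
    return {
        merge_labels(t, c): counts.get(merge_labels(t, c), 0)
        for t, c in product(TONES, CONTENTS)
    }


# Made-up labelled pairs for illustration.
toy_pairs = [("positive", "rich"), ("positive", "rich"), ("neutral", "none")]
counts = merged_class_counts(toy_pairs)
print(len(counts))                              # 9
print(sum(1 for n in counts.values() if n == 0))  # 7
```

Training one model per label class avoids this fragmentation at the cost of fine-tuning twice.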
Predicting user churn and bounce
- Only use text data generated before user decisions
- Extract text features for each user, including the number of words, characters, and texts in different time periods
- Combine text features and sentiment features (60 features)
- Apply classification models and a stacking ensemble (KNN, RF, and XGB combined by Logistic Regression)
- Achieved 0.89 and 0.76 accuracy for churn and bounce, respectively.
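
The first two steps above (using only text written before the decision, and counting words, characters, and texts per time period) can be sketched as follows. The weekly window length, the number of windows, the feature names, and the `(timestamp, text)` input format are all assumptions for illustration; the project's actual periods and its 60 features are not reproduced here.

```python
from datetime import datetime, timedelta


def window_text_features(messages, decision_time, n_windows=4, window_days=7):
    """Count texts, words, and characters per window before a user's decision.

    messages: list of (timestamp, text) tuples for one user.
    Window 1 is the period right before decision_time, window 2 the one
    before that, and so on. Messages at or after decision_time are ignored,
    so no information from after the decision leaks into the features.
    """
    features = {}
    for i in range(n_windows):
        end = decision_time - timedelta(days=i * window_days)
        start = end - timedelta(days=window_days)
        texts = [text for ts, text in messages if start <= ts < end]
        prefix = f"w{i + 1}"
        features[f"{prefix}_n_texts"] = len(texts)
        features[f"{prefix}_n_words"] = sum(len(t.split()) for t in texts)
        features[f"{prefix}_n_chars"] = sum(len(t) for t in texts)
    return features


# Made-up messages for illustration; the last one is after the decision.
decision = datetime(2020, 10, 1)
toy_messages = [
    (datetime(2020, 9, 30), "great app"),
    (datetime(2020, 9, 20), "love it so much"),
    (datetime(2020, 10, 2), "cancel"),
]
print(window_text_features(toy_messages, decision))
```

Sentiment features per window could be added to the same dictionary before it is passed to a classifier.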
Takeaways
- Strong correlation between text and sentiment features
  - Text meta features are good enough to predict user decisions (easy to scale up for big data)
- User engagement level is a key indicator of user decision
  - The model can predict user churn 4 weeks before the user decision
  - Premium users have a lifetime of 3-4 months
- With more data
  - Real-time prediction and evaluation become possible with a sliding window approach
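
A sliding window approach like the one mentioned above could look roughly like this: as each new week of activity arrives, the most recent fixed-size window is emitted so a model can be re-scored on it. The window size of 4 and the use of weekly message counts as input are assumptions for the example.

```python
from collections import deque


def sliding_windows(weekly_values, size=4):
    """Yield (week_index, window) once enough weeks have been observed.

    weekly_values: iterable of per-week measurements for one user,
    for example message counts. Each window holds the latest `size` weeks.
    """
    window = deque(maxlen=size)  # the oldest week drops out automatically
    for week, value in enumerate(weekly_values):
        window.append(value)
        if len(window) == size:
            yield week, tuple(window)


# Made-up weekly message counts for illustration.
for week, window in sliding_windows([3, 5, 0, 2, 1], size=4):
    print(week, window)
# 3 (3, 5, 0, 2)
# 4 (5, 0, 2, 1)
```

Because the generator consumes one value at a time, it works equally well on a stored history or on a live stream of weekly aggregates.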

Presentation: YouTube and slides

Examples

Click to show an example of Emoji and Emoticon Conversion

Click to show the Sanity Check of sentiment analysis by different NLP models

NLP Models Performance Comparison (OTS: off-the-shelf)

Last update 2020/11/05

Created 2020/10/08

Current Page
Return to My GitHub

>>>>>> CC BY 4.0 <<<<<<
