Big data application of Machine Learning concepts for sentiment classification of US Airlines tweets. The focus is on the usage of pyspark libraries (ml-lib) on big data to solve a problem using Machine Learning algorithms and not about the choice of algorithm used in the ML model creation. It also involves data pre-processing using NLP techniques, cross-validation and parameter-grid builder.
Input and output files used have been attached in the repositories, the urls used are only for the sake of usage in the databricks cluster.