- Setting up Apache Maven for the Java project (user interface and MapReduce functions)
- Setting up the GitHub repository workflow
- Setting up GitHub Actions for automation
- Creating a web crawler in Python using the Tweepy library to fetch tweets matching given parameters
- Creating an HDFS cluster for MapReduce and programming Hadoop MapReduce in Java
- Setting up Hadoop Core and creating the JobTracker and TaskTrackers for the project
- Implementing MapReduce on HDFS in Java to count the frequency of significant words (from the data dictionary) in each tweet
- Configuring Apache Maven with the MapReduce code and installing the Apache Hadoop JAR dependency
- Configuring the MapReduce code in GitHub Actions for automation
- Automating the Big Data pipeline up to MapReduce using GitHub Actions
- Writing a Java program that runs MapReduce over the JSON file extracted by the crawler to find the frequency of significant words (textual analysis)
- Data Classification - creating a multi-class data dictionary for sentiment analysis, currently at the word level (it may later be extended to phrases and sentences for improved accuracy)
- Data Prediction - using the KNN algorithm in Python to find the relation between tweets and their sentiments
- Data Visualization - using the Python matplotlib library to implement visualization
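The word-frequency step above is implemented in Java on Hadoop, but its map/reduce logic can be sketched in plain Python. The `SIGNIFICANT_WORDS` set below is a hypothetical stand-in for the project's data dictionary:

```python
from collections import Counter
from itertools import chain

# Hypothetical stand-in for the project's data dictionary of significant words.
SIGNIFICANT_WORDS = {"storm", "rain", "flood", "sunny"}

def map_phase(tweet):
    """Mapper: emit a (word, 1) pair for each significant word in one tweet."""
    for word in tweet.lower().split():
        if word in SIGNIFICANT_WORDS:
            yield (word, 1)

def reduce_phase(pairs):
    """Reducer: sum the counts for each word across all mapper outputs."""
    counts = Counter()
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

tweets = ["Heavy rain and flood warnings", "Sunny today, rain tomorrow"]
freq = reduce_phase(chain.from_iterable(map_phase(t) for t in tweets))
# freq == {"rain": 2, "flood": 1, "sunny": 1}
```

In the real Java job the mapper and reducer run on separate Hadoop nodes and the framework performs the shuffle between the two phases.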
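The multi-class data dictionary described above might look like the following sketch; the word-to-class entries here are invented placeholders, not the project's actual dictionary:

```python
from collections import Counter

# Hypothetical multi-class dictionary: each significant word maps to a
# sentiment class rather than a binary positive/negative label.
DATA_DICTIONARY = {
    "love": "positive",
    "great": "positive",
    "okay": "neutral",
    "hate": "negative",
    "awful": "negative",
}

def classify_tweet(tweet):
    """Assign the sentiment class that the most dictionary words vote for."""
    votes = Counter(
        DATA_DICTIONARY[w] for w in tweet.lower().split() if w in DATA_DICTIONARY
    )
    # Tweets containing no dictionary words fall back to the neutral class.
    return votes.most_common(1)[0][0] if votes else "neutral"

classify_tweet("I love this great weather")  # -> "positive"
```

Extending the dictionary keys from single words to phrases or sentences, as the roadmap suggests, would only change the lookup step, not the voting logic.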
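The KNN prediction step could be sketched as below. The toy feature vectors and labels are illustrative assumptions; in the pipeline the features would come from the MapReduce word-frequency output:

```python
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (feature_vector, label) pairs; Euclidean distance.
    """
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    nearest = sorted(train, key=lambda pair: dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy data: (positive-word count, negative-word count) per tweet.
train = [((3, 0), "positive"), ((2, 1), "positive"),
         ((0, 3), "negative"), ((1, 2), "negative")]
knn_predict(train, (3, 1))  # -> "positive"
```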
- pom.xml - Apache Maven setup
- helloworld.java - basic Java project setup
- maven.yml - GitHub Actions setup
- crawler.py - web crawler in Python that extracts Twitter data for specific hashtags
- info.csv - data file produced as output by the crawler, to be sent to the HDFS core for processing
- MapReduce functionalities in Java
  - Convolutional Neural Networks
  - Decision Tree
  - SVM
  - Pre-Processing
  - Random Forests
  - Naive Bayes
  - XGBoost
- matplotlib.py - data visualization using matplotlib in Python
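crawler.py's flow can be sketched as below. The Tweepy fetch needs real API credentials, so it is kept behind a guard; the column names (`id`, `text`, `created_at`) and the Tweepy 4.x API names are assumptions for illustration, not the project's exact layout:

```python
import csv
import io

def tweets_to_csv(tweets):
    """Serialize fetched tweets into the info.csv format (assumed columns)."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["id", "text", "created_at"])
    for t in tweets:
        writer.writerow([t["id"], t["text"], t["created_at"]])
    return buf.getvalue()

def fetch_tweets(hashtag, limit=100):
    """Fetch tweets with Tweepy; requires real credentials to run."""
    import tweepy  # imported lazily so the CSV helper works without it
    auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
    auth.set_access_token("ACCESS_TOKEN", "ACCESS_SECRET")
    api = tweepy.API(auth)
    # search_tweets is the Tweepy 4.x name; older versions call it search.
    return [{"id": s.id, "text": s.text, "created_at": s.created_at}
            for s in tweepy.Cursor(api.search_tweets, q=hashtag).items(limit)]

if __name__ == "__main__":
    with open("info.csv", "w", newline="") as f:
        f.write(tweets_to_csv(fetch_tweets("#storms")))
```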
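The visualization script could follow this sketch; the sentiment labels are placeholder data, and plotting is wrapped in a function so the aggregation step can be reused:

```python
from collections import Counter

def sentiment_counts(labels):
    """Aggregate per-tweet sentiment labels into class counts for plotting."""
    return Counter(labels)

def plot_sentiments(labels, out_path="sentiments.png"):
    """Save a bar chart of sentiment counts (assumes matplotlib is installed)."""
    import matplotlib
    matplotlib.use("Agg")  # headless backend, so no display is needed
    import matplotlib.pyplot as plt
    counts = sentiment_counts(labels)
    plt.bar(list(counts.keys()), list(counts.values()))
    plt.xlabel("Sentiment class")
    plt.ylabel("Number of tweets")
    plt.title("Tweet sentiment distribution")
    plt.savefig(out_path)
    plt.close()

labels = ["positive", "negative", "positive", "neutral"]
# plot_sentiments(labels) would write sentiments.png
```

One caveat: a script named matplotlib.py shadows the matplotlib package when run from its own directory, so renaming it (e.g. to visualize.py) avoids the import clash.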
- Hadoop Setup

This is an open-source project, open to everyone. Please follow the contribution guidelines.

MIT License, copyright Storms In Brewing (2019-2020).