Project in Distributed Data Analysis & Mining

U.S. Air Pollution - Data Analysis in Apache Spark

The goal of the project is to use a distributed, parallel environment to analyze, cluster, and classify surveys of U.S. air pollution levels recorded from 2000 to 2016.