The project aims to evaluate and select the best available supervised learning algorithm for accurately modeling individuals' income, using data collected from the 1994 U.S. Census (approximately 32,000 data points).

athlatif/FindingDonorsProject

Project: Finding Donors for CharityML

Part of Udacity Data Scientist Nanodegree

About

The project aims to evaluate and select the best available supervised learning algorithm for accurately modeling individuals' income, using data collected from the 1994 U.S. Census. The chosen model should also accurately predict whether an individual makes more than $50,000. Understanding an individual's income can help a non-profit decide how large a donation to request.

Based on accuracy, F-score, and training time, the best model is the Random Forest Classifier (RFC). Because this is a classification problem, a random forest is a good fit: it trains quickly, and its results are easy to communicate to stakeholders.
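The comparison criteria above (accuracy, F-score, and training time) can be sketched with a small helper. This is only an illustration, not the notebook's actual code: the `evaluate` function, the synthetic data from `make_classification`, and the choice of `beta=0.5` for the F-score are all assumptions made for the example.

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, fbeta_score
from sklearn.model_selection import train_test_split


def evaluate(model, X_train, y_train, X_test, y_test, beta=0.5):
    """Fit a model, then report training time, accuracy, and F-beta on the test set.

    `evaluate` and the default beta=0.5 are hypothetical choices for this sketch.
    """
    start = time.perf_counter()
    model.fit(X_train, y_train)
    train_time = time.perf_counter() - start

    predictions = model.predict(X_test)
    return {
        "train_time": train_time,
        "accuracy": accuracy_score(y_test, predictions),
        "f_score": fbeta_score(y_test, predictions, beta=beta),
    }


# Synthetic stand-in for the census data (assumption: not the real dataset).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

results = evaluate(
    RandomForestClassifier(n_estimators=50, random_state=0),
    X_train, y_train, X_test, y_test,
)
print(results)
```

Calling `evaluate` with several candidate classifiers and comparing the returned dictionaries gives a side-by-side view of the three criteria.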

Steps

  1. Preprocessing
  • Transforming Skewed Continuous Features
  • Normalizing Numerical Features
  • One-hot Encoding
  2. Implement performance metrics to evaluate the potential algorithms
  3. Choosing the Best Model & Model Tuning
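As a rough sketch of the three preprocessing steps listed above, the snippet below applies a log transform to a skewed column, scales the numerical columns to the [0, 1] range, and one-hot encodes a categorical column. The toy rows, the column names, and the choice of `np.log1p` and `MinMaxScaler` are assumptions for illustration; the notebook may use different columns or transforms.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Toy rows with hypothetical column names (not the real census file).
data = pd.DataFrame({
    "age": [25.0, 38.0, 52.0, 41.0],
    "capital_gain": [0.0, 14084.0, 0.0, 5178.0],
    "workclass": ["Private", "State-gov", "Private", "Self-emp"],
})

skewed = ["capital_gain"]            # assumed to be heavily right-skewed
numerical = ["age", "capital_gain"]  # continuous columns to normalize

features = data.copy()

# 1. Transform skewed continuous features; log1p handles zeros safely.
features[skewed] = np.log1p(features[skewed])

# 2. Normalize numerical features to the [0, 1] range.
features[numerical] = MinMaxScaler().fit_transform(features[numerical])

# 3. One-hot encode the remaining categorical columns.
features = pd.get_dummies(features)

print(features.columns.tolist())
```

The log transform is applied before scaling so that a few very large values do not compress the rest of the column toward zero.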

Install

This project requires Python 3.x and the following Python libraries installed:

You will also need software installed to run an IPython (Jupyter) Notebook.

Data

The modified census dataset consists of approximately 32,000 data points, with each datapoint having 13 features. This dataset is a modified version of the dataset published in the paper "Scaling Up the Accuracy of Naive-Bayes Classifiers: a Decision-Tree Hybrid", by Ron Kohavi. You may find this paper online, with the original dataset hosted on UCI.
