
JasonDahl/project-showcase


Jason's Portfolio

Welcome to my portfolio. Read on for a guided tour of some of my data projects.


TDI Data Science Fellowship

In 2023 I completed The Data Incubator's (TDI) competitive Data Science Fellowship, an immersive training program that prepares people with strong quantitative and analytical backgrounds, often with advanced degrees in fields like physics, mathematics, computer science, or engineering, for careers in data science. The curriculum covered machine learning, statistical analysis, data visualization, and big data technologies, among other topics. Fellowship work centered on hands-on projects that solve real-world problems. Modules included:

  • Data wrangling with NumPy and Pandas
  • Data analysis with SQL
  • Machine learning with scikit-learn
  • Natural Language Processing (NLP) and time series analysis
  • Distributed computing with Spark (RDD and DataFrame/MLlib)
  • Deep learning with Keras/TensorFlow
  • A rigorous capstone project

Data Engineering Zoomcamp

DEZoomcamp is an educational initiative by DataTalksClub focused on teaching data engineering. Its curriculum includes several coding projects.

| Project Link | Area | Project Description | Tools |
| --- | --- | --- | --- |
| ETL with PostgreSQL, Docker, and Terraform | ETL, containers and orchestration, resource provisioning | Demonstrated essential data engineering techniques using Docker. ETL development in a Jupyter notebook with Pandas for data manipulation and SQLAlchemy for database creation and exploration. Container orchestration with Docker-Compose. Resource provisioning on GCP with Terraform. | Google Cloud Platform, Pandas, SQLAlchemy, Jupyter, Docker, PostgreSQL, pgAdmin, Docker-Compose, Terraform |

MLZoomcamp

MLZoomcamp is an educational initiative by DataTalksClub focused on teaching machine learning and data science. Its curriculum includes several coding projects.

| Project Link | Area | Project Description | Libraries |
| --- | --- | --- | --- |
| Exploratory Data Analysis and Linear Regression | Data Analysis & Linear Regression | Demonstrated essential data analysis techniques and linear regression using Python in a Jupyter notebook. Emphasized Pandas and NumPy for data manipulation, exploratory analysis, and linear regression implementation. Calculated linear regression weights through matrix inversion. | Pandas, NumPy |
| Machine Learning for Regression | Machine Learning & Regression Analysis | Built a regression model predicting housing prices using the California Housing Prices dataset. Utilized Pandas, NumPy, Matplotlib, and Seaborn for data handling, visualization, and regression analysis. Regression by matrix inversion. | Pandas, NumPy, Matplotlib, Seaborn |
| Classification with scikit-learn | Machine Learning & Classification | Focused on transforming a car price dataset into a classification problem, predicting whether a car's price is above its mean value ('above_average'). Utilized scikit-learn for classification modeling, NumPy, Pandas, Matplotlib, and Seaborn for data handling, visualization, and preprocessing. | scikit-learn, NumPy, Pandas, Matplotlib, Seaborn |
| Model Evaluation | Machine Learning & Evaluation | Focused on model evaluation techniques: ROC-AUC, precision, recall, and F1 score. Transformed a car price dataset into a binary classification problem, predicting whether a car's price is above its mean value ('above_average'). Utilized scikit-learn for data prep, exploratory analysis, logistic regression, cross-validation, and hyperparameter tuning. | scikit-learn, NumPy, Pandas, Matplotlib, Seaborn |
| Model Deployment with Flask, Gunicorn, and Docker | Deployment & Machine Learning | Focused on deploying a pre-trained ML model using Flask and containerizing it with Docker. Created a web service incorporating a Scikit-Learn model for predicting credit probabilities for clients. Utilized Flask, Gunicorn, and Docker for deployment. | Pipenv, Scikit-Learn, Pickle, Flask, Gunicorn, Docker |
| Decision Trees and Ensemble Learning | Machine Learning & Regression Analysis | Analyzed the California Housing Prices dataset from Kaggle, focusing on predicting 'median_house_value.' Explored Decision Trees, Random Forests, and XGBoost for ensemble-based regression analysis. Implemented hyperparameter tuning for improved model performance. | pandas, numpy, matplotlib, seaborn, scikit-learn, XGBoost |
| Image Classification with TensorFlow and Keras | Deep Learning & Computer Vision | Built an image classification model to differentiate between bees and wasps using TensorFlow and Keras. Leveraged the "Bee or Wasp?" dataset from Kaggle and implemented Convolutional Neural Networks (CNNs) for the classification task. Explored data augmentation through image transformations. | numpy, pandas, matplotlib, seaborn, TensorFlow, Keras |
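
Two of the projects above compute linear regression weights through matrix inversion. As a rough illustration of that idea (not the notebooks' actual code), the sketch below solves the normal equation with NumPy. The function name `fit_linear_regression` and the toy data are hypothetical and chosen only for this example.

```python
import numpy as np


def fit_linear_regression(X, y):
    """Return (bias, weights) from the normal equation w = (X^T X)^-1 X^T y.

    Hypothetical helper for illustration; not taken from the project notebooks.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first weight acts as the bias term.
    X_b = np.column_stack([np.ones(X.shape[0]), X])
    gram = X_b.T @ X_b
    # Explicit inversion mirrors the "matrix inversion" wording above.
    w = np.linalg.inv(gram) @ X_b.T @ y
    return w[0], w[1:]


# Made-up toy data generated from y = 1 + 2*x1 + 3*x2 (no noise).
X = np.array([[1.0, 2.0], [2.0, 0.0], [3.0, 1.0], [4.0, 5.0], [0.0, 1.0]])
y = 1.0 + 2.0 * X[:, 0] + 3.0 * X[:, 1]

bias, weights = fit_linear_regression(X, y)
print(bias, weights)
```

On real data the Gram matrix can be close to singular, so adding a small ridge term to its diagonal or calling `np.linalg.lstsq` is usually a safer choice than a plain inverse.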
