Jason's Portfolio

Welcome to my portfolio. Read on for a guided tour of some of my data projects.

TDI Data Science Fellowship

In 2023 I completed The Data Incubator's (TDI) competitive Data Science Fellowship, an immersive training program that prepares people with strong quantitative and analytical backgrounds, often with advanced degrees in fields like physics, mathematics, computer science, or engineering, for careers in data science. The curriculum covered machine learning, statistical analysis, data visualization, and big data technologies, among other topics. Fellowship work centered on hands-on projects that applied these skills to real-world problems. Modules included:

Data wrangling with NumPy and Pandas
Data analysis with SQL
Machine learning with scikit-learn
Natural Language Processing (NLP) and time series analysis
Distributed computing with Spark (RDD and DataFrame/MLlib)
Deep learning with Keras/TensorFlow
A rigorous capstone project

Data Engineering Zoomcamp

DEZoomcamp is an educational initiative by DataTalksClub focused on teaching data engineering. Its curriculum includes several coding projects.

Project Link	Area	Project Description	Tools
ETL with PostgreSQL, Docker, and Terraform	ETL, containers and orchestration, resource provisioning	Demonstrated essential data engineering techniques using Docker. Developed the ETL in a Jupyter notebook, with Pandas for data manipulation and SQLAlchemy for database creation and exploration. Orchestrated containers with Docker Compose. Provisioned resources on GCP with Terraform.	Google Cloud Platform, Pandas, SQLAlchemy, Jupyter, Docker, PostgreSQL, pgAdmin, Docker Compose, Terraform

MLZoomcamp

MLZoomcamp is an educational initiative by DataTalksClub focused on teaching machine learning and data science. Its curriculum includes several coding projects.

Project Link	Area	Project Description	Libraries
Exploratory Data Analysis and Linear Regression	Data Analysis & Linear Regression	Demonstrated essential data analysis techniques and linear regression using Python in a Jupyter notebook. Emphasized Pandas and NumPy for data manipulation, exploratory analysis, and linear regression implementation. Calculated linear regression weights through matrix inversion.	Pandas, NumPy
Machine Learning for Regression	Machine Learning & Regression Analysis	Built a regression model that predicts housing prices from the California Housing Prices dataset. Used Pandas, NumPy, Matplotlib, and Seaborn for data handling, visualization, and regression analysis. Computed the regression weights by matrix inversion.	Pandas, NumPy, Matplotlib, Seaborn
Classification with scikit-learn	Machine Learning & Classification	Transformed a car price dataset into a classification problem, predicting whether a car's price is above its mean value ('above_average'). Used scikit-learn for classification modeling and NumPy, Pandas, Matplotlib, and Seaborn for data handling, visualization, and preprocessing.	scikit-learn, NumPy, Pandas, Matplotlib, Seaborn
Model Evaluation	Machine Learning & Evaluation	Focused on model evaluation techniques: ROC-AUC, precision, recall, and F1 score. Transformed a car price dataset into a binary classification problem, predicting whether a car's price is above its mean value ('above_average'). Used scikit-learn for data prep, exploratory analysis, logistic regression, cross-validation, and hyperparameter tuning.	scikit-learn, NumPy, Pandas, Matplotlib, Seaborn
Model Deployment with Flask, Gunicorn, and Docker	Deployment & Machine Learning	Deployed a pre-trained ML model as a web service and containerized it with Docker. The service wraps a scikit-learn model that predicts credit probabilities for clients. Used Flask, Gunicorn, and Docker for deployment.	Pipenv, scikit-learn, Pickle, Flask, Gunicorn, Docker
Decision Trees and Ensemble Learning	Machine Learning & Regression Analysis	Analyzed the California Housing Prices dataset from Kaggle, predicting 'median_house_value'. Explored Decision Trees, Random Forests, and XGBoost for tree-based and ensemble regression. Tuned hyperparameters to improve model performance.	Pandas, NumPy, Matplotlib, Seaborn, scikit-learn, XGBoost
Image Classification with TensorFlow and Keras	Deep Learning & Computer Vision	Built an image classification model that distinguishes bees from wasps using TensorFlow and Keras. Used the "Bee or Wasp?" dataset from Kaggle and implemented Convolutional Neural Networks (CNNs) for the classification task. Explored data augmentation through image transformations.	NumPy, Pandas, Matplotlib, Seaborn, TensorFlow, Keras
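
Two of the projects above compute linear regression weights through matrix inversion. A minimal sketch of that idea follows, assuming the usual normal-equation form w = (XᵀX)⁻¹Xᵀy. The function name `fit_linear_regression` and the toy data are hypothetical, made up for illustration, and are not taken from the project notebooks.

```python
import numpy as np


def fit_linear_regression(X, y):
    """Fit ordinary least squares via the normal equation.

    Hypothetical helper for illustration; returns (bias, weights).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)

    # Prepend a column of ones so the first weight acts as the bias term.
    ones = np.ones((X.shape[0], 1))
    X_with_bias = np.hstack([ones, X])

    # Normal equation: w = (X^T X)^-1 X^T y
    gram = X_with_bias.T @ X_with_bias
    w_full = np.linalg.inv(gram) @ X_with_bias.T @ y

    return w_full[0], w_full[1:]


# Toy data (invented): generated exactly from y = 1 + 2*x1 + 3*x2, no noise.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
y = 1.0 + 2.0 * X[:, 0] + 3.0 * X[:, 1]

bias, weights = fit_linear_regression(X, y)
print(bias, weights)
```

Explicit inversion mirrors the description in the table. In practice, `np.linalg.solve(gram, X_with_bias.T @ y)` or `np.linalg.lstsq` is usually preferred because it avoids forming the inverse and is more numerically stable.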
