
🚀 Exciting News! 🏆

Finished in the Top 5% of the Rainfall Prediction Kaggle competition! 🌧️

Write-up coming soon – stay tuned! ⭐


👋 Hello! I'm Autumn 🎓, a recent graduate of the Georgia Tech Master's in Analytics program 🐝. As a data scientist, I have an unending curiosity for uncovering patterns in data, transforming insights into action 💡, and communicating results in a way that drives meaningful impact 🤖.


🎒 What’s in the Bag? Breaking Down Price Prediction with ML

GitHub Repo

This project focuses on predicting product prices in the Backpack Price Prediction Kaggle competition. Rather than applying a basic regression model, the pipeline leverages feature engineering, real-world intuition, and model optimization to improve predictive accuracy in a noisy commercial dataset.


🔹 Key Highlights:

📌 Feature Engineering: Constructed product-specific features such as weight-to-compartment interactions, log transformations for skewed fields, and multi-way categorical combinations (e.g., brand + material + size).
📌 Modeling: Benchmarked XGBoost, LightGBM, and CatBoost, incorporating Optuna for tuning and using a stacked ensemble with Ridge regression for final predictions.
📌 Performance Metrics: Evaluated using RMSE on both notebook and Kaggle leaderboard submissions to track generalization.
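The stacking approach above can be sketched in a few lines. This is a minimal illustration, not the project's actual pipeline: scikit-learn's gradient boosting and random forest stand in for the XGBoost/LightGBM/CatBoost base models, the data is synthetic, and Ridge serves as the meta-learner as described.

```python
# Sketch of a stacked ensemble with a Ridge meta-learner. The base models
# here are stand-ins for the tuned XGBoost/LightGBM/CatBoost models used
# in the actual project; the data is synthetic.
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor, StackingRegressor
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

stack = StackingRegressor(
    estimators=[
        ("gbr", GradientBoostingRegressor(random_state=0)),
        ("rf", RandomForestRegressor(n_estimators=100, random_state=0)),
    ],
    final_estimator=Ridge(alpha=1.0),  # linear meta-learner blends base predictions
    cv=5,  # out-of-fold base predictions avoid leaking training targets
)
stack.fit(X_train, y_train)
rmse = mean_squared_error(y_test, stack.predict(X_test)) ** 0.5
print(f"stacked RMSE: {rmse:.2f}")
```

The `cv=5` argument is the important detail: the meta-learner is trained on out-of-fold predictions, so it never sees a base model's prediction on data that model was fit on.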


📊 Technologies Used:

  • Python 🐍 (Pandas, NumPy, Scikit-learn)
  • Winning Model: Stacked Ensemble (XGBoost + LightGBM + CatBoost)
  • Feature Engineering & Preprocessing (One-hot encoding, interaction terms, outlier removal)
  • Hyperparameter Tuning (Optuna, Cross-Validation)
  • GitHub for Version Control 🛠

🔬 Innovative Methods Used:
While many tabular models focus solely on boosting performance, this project highlights the value of domain-aware feature construction and rigorous evaluation across multiple modeling pipelines. A shared preprocessing module ensured fairness across models and streamlined experimentation.
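The feature-construction ideas above can be illustrated with a small pandas sketch. Column names (`Weight`, `Compartments`, `Brand`, etc.) are illustrative, not the competition's actual schema:

```python
# Hedged sketch of the feature construction described above: a numeric
# interaction, a log transform for a skewed field, and a multi-way
# categorical combination. Column names are illustrative only.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Weight": [1.2, 3.4, 0.8],
    "Compartments": [2, 5, 1],
    "Brand": ["Acme", "Trail", "Acme"],
    "Material": ["Nylon", "Canvas", "Leather"],
    "Size": ["M", "L", "S"],
})

# Numeric interaction: weight per compartment
df["weight_per_compartment"] = df["Weight"] / df["Compartments"]
# log1p tames right-skewed distributions without breaking on zeros
df["log_weight"] = np.log1p(df["Weight"])
# Multi-way categorical combination (brand + material + size)
df["brand_material_size"] = df["Brand"] + "_" + df["Material"] + "_" + df["Size"]
print(df[["weight_per_compartment", "log_weight", "brand_material_size"]])
```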

🔗 Check out the full write-up in the repository!


🏥 Predicting Cirrhosis Patient Outcomes with Multi-Class Classification

GitHub Repo

This project focuses on predicting patient outcomes in the Cirrhosis Outcome Prediction Kaggle competition. Instead of applying a basic classification model, I utilized feature engineering, domain knowledge, and model optimization techniques to improve multi-class prediction accuracy.

🔹 Key Highlights:

📌 Feature Engineering: Created domain-specific features like bilirubin-to-albumin ratio, log transformations for skewed features, and binary indicators for critical thresholds.
📌 Modeling: Compared XGBoost, LightGBM, and CatBoost, fine-tuning hyperparameters and using stacking ensembles for performance gains.
📌 Performance Metrics: Evaluated using multi-class log loss and cross-validation to ensure model generalization.
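The evaluation setup above can be sketched as follows. The classifier and synthetic data are stand-ins for the project's tuned XGBoost model and the Kaggle dataset; the point is how multi-class log loss is scored under stratified K-fold cross-validation:

```python
# Sketch of multi-class log loss under stratified K-fold CV. The model
# and data are stand-ins for the project's tuned XGBoost pipeline.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import log_loss
from sklearn.model_selection import StratifiedKFold

X, y = make_classification(n_samples=300, n_classes=3, n_informative=6, random_state=0)
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
losses = []
for train_idx, val_idx in skf.split(X, y):
    clf = GradientBoostingClassifier(random_state=0)
    clf.fit(X[train_idx], y[train_idx])
    # log loss scores the full predicted probability distribution,
    # not just the argmax class, so overconfident mistakes are punished
    losses.append(log_loss(y[val_idx], clf.predict_proba(X[val_idx])))
print(f"mean CV log loss: {np.mean(losses):.3f}")
```

Stratification keeps each fold's class balance close to the full dataset's, which matters when outcome classes are imbalanced, as they typically are in clinical data.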

📊 Technologies Used:

  • Python 🐍 (Pandas, NumPy, Scikit-learn)
  • Winning Model: XGBoost
  • Feature Engineering & Data Preprocessing (One-hot encoding, ratio calculations, outlier removal)
  • Hyperparameter Tuning (Randomized Search, Stratified K-Fold Validation)
  • GitHub for Version Control 🛠

🔬 Innovative Methods Used:
Many classification models for medical datasets rely on direct correlations or minimal preprocessing. This project takes a more data-driven and clinical approach, engineering features that reflect real-world liver disease progression. This improves both interpretability and predictive power.

🔗 Check out the full write-up in the repository!


🏠 A Different Approach to Feature Engineering for Predicting House Prices in Ames

GitHub Repo

This project is an in-depth analysis of the Ames Housing dataset, where I applied machine learning models to predict house sale prices. Instead of merely running standard models, I leveraged feature engineering, domain knowledge, and advanced model comparison techniques to improve prediction accuracy.

🔹 Key Highlights:

  • 📌 Feature Engineering: Grouped related features to enhance predictive power
  • 📌 Modeling: Compared Decision Trees, Random Forests, Gradient Boosting, and Linear Regression
  • 📌 Performance Metrics: Evaluated RMSE and R² to measure model effectiveness
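The two metrics above can be computed in a short sketch; the data and model here are synthetic stand-ins:

```python
# Minimal sketch of the two metrics used above (RMSE and R²) on a
# simple linear-regression fit over synthetic data.
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=200, n_features=5, noise=5.0, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)
model = LinearRegression().fit(X_tr, y_tr)
pred = model.predict(X_te)
rmse = mean_squared_error(y_te, pred) ** 0.5  # same units as the target; penalizes large errors
r2 = r2_score(y_te, pred)                     # share of target variance explained
print(f"RMSE={rmse:.2f}, R²={r2:.3f}")
```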

📊 Technologies Used:

  • Python 🐍 (Pandas, NumPy, Scikit-learn, Matplotlib, Seaborn)
  • Machine Learning Models (Linear Regression, Gradient Boosting, Random Forest, Decision Trees)
  • Feature Engineering & Data Preprocessing
  • GitHub for Version Control 🛠

📢 🔬 Innovative Methods Used: Most approaches to this dataset focus on either raw correlations or brute-force feature selection. My approach leverages real-estate knowledge to construct meaningful categories (e.g., grouping porch types, analyzing basement features separately), which led to better model interpretability and, in some cases, stronger predictions; the write-up explains where and why.
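The porch-grouping idea can be sketched in a few lines: the Ames dataset spreads porch square footage across several columns, which collapse naturally into a total area plus a has-porch indicator. Column names follow the public Ames schema:

```python
# Hedged sketch of grouping the Ames porch columns into a total square
# footage and a binary indicator. Values here are made up for illustration.
import pandas as pd

df = pd.DataFrame({
    "OpenPorchSF": [60, 0, 0],
    "EnclosedPorch": [0, 112, 0],
    "3SsnPorch": [0, 0, 0],
    "ScreenPorch": [0, 0, 0],
})
porch_cols = ["OpenPorchSF", "EnclosedPorch", "3SsnPorch", "ScreenPorch"]
df["TotalPorchSF"] = df[porch_cols].sum(axis=1)            # combined porch area
df["HasPorch"] = (df["TotalPorchSF"] > 0).astype(int)      # any porch at all?
print(df[["TotalPorchSF", "HasPorch"]])
```

Collapsing four sparse columns into two dense ones gives tree models fewer near-empty splits to consider and makes the resulting feature importances easier to read.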

🔗 Check out the full write-up in the repository!


🎯 Medley Relay Optimization

GitHub Repo

This project tackles the challenge of optimizing a medley relay lineup, where swimmers often excel in multiple strokes, creating trade-offs in event selection. Instead of guessing or manually shuffling times, I developed an Excel Solver-based optimization model to automatically determine the fastest possible relay combination.
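The core idea can be sketched as a pure-Python analogue of the Excel Solver model: enumerate every assignment of four swimmers to the four medley strokes and keep the fastest total. Swimmer names and times below are made up for illustration:

```python
# Brute-force analogue of the Excel Solver optimization: with 4 swimmers
# and 4 strokes there are only 4! = 24 lineups, so enumeration is exact.
# Names and times are illustrative, not real data from the project.
from itertools import permutations

strokes = ["back", "breast", "fly", "free"]
# seconds per leg for each swimmer on each stroke
times = {
    "Ava": {"back": 31.2, "breast": 36.0, "fly": 30.1, "free": 27.5},
    "Bea": {"back": 30.5, "breast": 34.8, "fly": 31.9, "free": 28.0},
    "Cam": {"back": 33.0, "breast": 33.9, "fly": 32.4, "free": 27.9},
    "Dee": {"back": 32.1, "breast": 35.5, "fly": 30.8, "free": 26.9},
}

best = min(
    permutations(times),  # every ordering of the 4 swimmers over the strokes
    key=lambda lineup: sum(times[s][st] for s, st in zip(lineup, strokes)),
)
total = sum(times[s][st] for s, st in zip(best, strokes))
print(dict(zip(strokes, best)), f"total: {total:.1f}s")
```

For larger rosters this becomes a classic assignment problem, which Solver (or `scipy.optimize.linear_sum_assignment`) handles without enumerating every permutation.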

🔗 Check out the full write-up in the repository!


🚀 ML Decoded

GitHub Repo


Machine learning terms with simple, intuitive explanations.

🔗 View the repository: ML Decoded on GitHub


🚀 Technical Skills

🤖 Machine Learning & Predictive Modeling

  • Developing and optimizing models using:
    • Linear Regression, Decision Trees, Random Forests
    • Gradient Boosting (LightGBM, XGBoost, CatBoost)
    • Support Vector Machines (SVM), Neural Networks (TensorFlow, PyTorch)
    • Clustering (K-Means, DBSCAN), Principal Component Analysis (PCA)

🧠 Data Analysis & Feature Engineering

  • Data wrangling, preprocessing, and feature engineering with:
    • Pandas, NumPy, Scikit-learn, Statsmodels
    • Handling missing values, scaling, encoding categorical variables
    • Engineering domain-specific features to enhance model performance
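The preprocessing tasks above typically fit into one scikit-learn pipeline. This is a generic sketch with illustrative columns, not code from any specific project:

```python
# Sketch of a standard preprocessing pipeline: impute missing values,
# scale numerics, and one-hot encode categoricals in one transformer.
# Column names are illustrative.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

df = pd.DataFrame({
    "age": [25.0, None, 40.0],
    "income": [50_000.0, 62_000.0, None],
    "city": ["SF", "ATL", "SF"],
})

pre = ColumnTransformer([
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),  # fill gaps with the median
        ("scale", StandardScaler()),                   # zero mean, unit variance
    ]), ["age", "income"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
])
X = pre.fit_transform(df)
print(X.shape)  # 2 scaled numerics + 2 one-hot city columns
```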

📊 Data Visualization & Storytelling

  • Communicating insights using:
    • Matplotlib, Seaborn, Plotly, Tableau
    • Creating interactive and high-impact visualizations for stakeholder engagement

💾 Big Data & Scalable Computing

  • Working with large-scale datasets using:
    • Amazon S3, Google BigQuery, Apache Spark, SQL
    • Optimizing storage and query performance for large datasets

📈 Business Intelligence & Data-Driven Strategy

  • Applying data science for:
    • Forecasting, market analysis, and strategic decision-making
    • Business intelligence tools: Power BI, Looker
    • Automating reporting and dashboarding solutions

🎓 Education

  • Master of Science in Analytics 🐝
    Georgia Institute of Technology 🌐

  • Web Development Professional Certificate 📜
    University of California, Davis 🌟

  • Bachelor of Science in Business Finance 💹
    California State University Sacramento 🌳


📚 Bookshelf:


🌐 Let's Connect!


