Finished in the Top 5% of the Rainfall Prediction Kaggle competition! 🌧️
Write-up coming soon – stay tuned! ⭐
👋 Hello! I'm Autumn 🎓, a recent graduate of the Georgia Tech Analytics Master's Program 🐝. As a data scientist, I have an unending curiosity for uncovering patterns in data, transforming insights into action 💡, and communicating results in a way that drives meaningful impact 🤖.
This project focuses on predicting product prices in the Backpack Price Prediction Kaggle competition. Rather than applying a basic regression model, the pipeline leverages feature engineering, real-world intuition, and model optimization to improve predictive accuracy in a noisy commercial dataset.
🔹 Key Highlights:
📌 Feature Engineering: Constructed product-specific features such as weight-to-compartment interactions, log transformations for skewed fields, and multi-way categorical combinations (e.g., brand + material + size).
📌 Modeling: Benchmarked XGBoost, LightGBM, and CatBoost, incorporating Optuna for tuning and using a stacked ensemble with Ridge regression for final predictions.
📌 Performance Metrics: Evaluated using RMSE on both notebook and Kaggle leaderboard submissions to track generalization.
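The interaction and transformation features above can be sketched in a few lines of Pandas. The column names follow the Kaggle backpack dataset, but the exact recipe here is illustrative, not the project's production code:

```python
# Hedged sketch of the backpack feature engineering described above.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Weight Capacity (kg)": [12.5, 25.0, 8.3],
    "Compartments": [3, 7, 1],
    "Brand": ["Jansport", "Nike", "Adidas"],
    "Material": ["Canvas", "Nylon", "Leather"],
    "Size": ["Medium", "Large", "Small"],
})

# Numeric interaction: weight capacity crossed with compartment count
df["weight_x_comp"] = df["Weight Capacity (kg)"] * df["Compartments"]

# Log transform to tame a right-skewed numeric field
df["log_weight"] = np.log1p(df["Weight Capacity (kg)"])

# Multi-way categorical combination (later one-hot or target encoded)
df["brand_mat_size"] = df["Brand"] + "_" + df["Material"] + "_" + df["Size"]
print(df["brand_mat_size"].tolist())
```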
📊 Technologies Used:
- Python 🐍 (Pandas, NumPy, Scikit-learn)
- Winning Model: Stacked Ensemble (XGBoost + LightGBM + CatBoost)
- Feature Engineering & Preprocessing (One-hot encoding, interaction terms, outlier removal)
- Hyperparameter Tuning (Optuna, Cross-Validation)
- GitHub for Version Control 🛠
🔬 Innovative Methods Used:
While many tabular models focus solely on boosting performance, this project highlights the value of domain-aware feature construction and rigorous evaluation across multiple modeling pipelines. A shared preprocessing module ensured fairness across models and streamlined experimentation.
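The stacked-ensemble setup above can be sketched with scikit-learn's `StackingRegressor`. The actual project stacks XGBoost, LightGBM, and CatBoost; here two scikit-learn regressors stand in so the example stays dependency-light, with Ridge as the meta-learner and RMSE as the score, matching the competition metric:

```python
# Minimal stacking sketch: base learners -> Ridge meta-learner -> CV RMSE.
from sklearn.datasets import make_regression
from sklearn.ensemble import (GradientBoostingRegressor, RandomForestRegressor,
                              StackingRegressor)
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=42)

# Base learners (swap in XGBRegressor / LGBMRegressor / CatBoostRegressor in practice)
base_learners = [
    ("gbr", GradientBoostingRegressor(random_state=42)),
    ("rf", RandomForestRegressor(n_estimators=50, random_state=42)),
]

# Ridge blends the base models' out-of-fold predictions
stack = StackingRegressor(estimators=base_learners,
                          final_estimator=Ridge(alpha=1.0), cv=5)

# Score with RMSE, the competition metric
rmse = -cross_val_score(stack, X, y, cv=3,
                        scoring="neg_root_mean_squared_error").mean()
print(f"CV RMSE: {rmse:.2f}")
```

In practice each base learner's hyperparameters would be tuned (e.g., with Optuna) before stacking.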
🔗 Check out the full write-up in the repository!
This project focuses on predicting patient outcomes in the Cirrhosis Outcome Prediction Kaggle competition. Instead of applying a basic classification model, I utilized feature engineering, domain knowledge, and model optimization techniques to improve multi-class prediction accuracy.
🔹 Key Highlights:
📌 Feature Engineering: Created domain-specific features like bilirubin-to-albumin ratio, log transformations for skewed features, and binary indicators for critical thresholds.
📌 Modeling: Compared XGBoost, LightGBM, and CatBoost, fine-tuning hyperparameters and using stacking ensembles for performance gains.
📌 Performance Metrics: Evaluated using multi-class log loss and cross-validation to ensure model generalization.
📊 Technologies Used:
- Python 🐍 (Pandas, NumPy, Scikit-learn)
- Winning Model: XGBoost
- Feature Engineering & Data Preprocessing (One-hot encoding, ratio calculations, outlier removal)
- Hyperparameter Tuning (Randomized Search, Stratified K-Fold Validation)
- GitHub for Version Control 🛠
🔬 Innovative Methods Used:
Many classification models for medical datasets rely on direct correlations or minimal preprocessing. This project takes a more data-driven and clinical approach, engineering features that reflect real-world liver disease progression. This improves both interpretability and predictive power.
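The clinical feature construction described above can be sketched as follows. The column names and the 1.2 mg/dL cutoff are illustrative assumptions, not values taken from the project code:

```python
# Hedged sketch of the cirrhosis feature engineering: ratio, log, threshold.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Bilirubin": [0.8, 3.5, 14.2],   # mg/dL
    "Albumin": [4.1, 3.2, 2.6],      # g/dL
})

# Ratio feature: rising bilirubin with falling albumin tracks progression
df["bili_alb_ratio"] = df["Bilirubin"] / df["Albumin"]

# Log transform to tame the right-skewed bilirubin distribution
df["log_bilirubin"] = np.log1p(df["Bilirubin"])

# Binary indicator for a critical threshold (illustrative cutoff)
df["bili_high"] = (df["Bilirubin"] > 1.2).astype(int)

print(df.round(3))
```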
🔗 Check out the full write-up in the repository!
This project is an in-depth analysis of the Ames Housing dataset, where I applied machine learning models to predict house sale prices. Instead of merely running standard models, I leveraged feature engineering, domain knowledge, and advanced model comparison techniques to improve prediction accuracy.
🔹 Key Highlights:
- 📌 Feature Engineering: Grouped related features to enhance predictive power
- 📌 Modeling: Compared Decision Trees, Random Forests, Gradient Boosting, and Linear Regression
- 📌 Performance Metrics: Evaluated RMSE and R² to measure model effectiveness
📊 Technologies Used:
- Python 🐍 (Pandas, NumPy, Scikit-learn, Matplotlib, Seaborn)
- Machine Learning Models (Linear Regression, Gradient Boosting, Random Forest, Decision Trees)
- Feature Engineering & Data Preprocessing
- GitHub for Version Control 🛠
🔬 Innovative Methods Used: Most approaches to this dataset focus on either raw correlations or brute-force feature selection. My approach leverages real-estate knowledge to construct meaningful categories (e.g., grouping porch types, analyzing basement features separately), which led to better model interpretability and, in some cases, stronger predictions. The write-up explains where and why.
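The porch-grouping idea can be sketched in a few lines: Ames has several porch columns that collapse naturally into one combined area feature plus a has/has-not flag. The column names follow the public Ames dataset; the values below are made up:

```python
# Illustrative sketch of grouping related Ames porch features.
import pandas as pd

df = pd.DataFrame({
    "OpenPorchSF": [61, 0, 0],
    "EnclosedPorch": [0, 0, 0],
    "3SsnPorch": [0, 0, 0],
    "ScreenPorch": [0, 0, 120],
})

porch_cols = ["OpenPorchSF", "EnclosedPorch", "3SsnPorch", "ScreenPorch"]
df["TotalPorchSF"] = df[porch_cols].sum(axis=1)        # one combined area feature
df["HasPorch"] = (df["TotalPorchSF"] > 0).astype(int)  # binary indicator
print(df[["TotalPorchSF", "HasPorch"]])
```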
🔗 Check out the full write-up in the repository!
This project tackles the challenge of optimizing a medley relay lineup, where swimmers often excel in multiple strokes, creating trade-offs in event selection. Instead of guessing or manually shuffling times, I developed an Excel Solver-based optimization model to automatically determine the fastest possible relay combination.
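The Excel Solver model is, at its core, an assignment problem: pick one swimmer per stroke so the summed times are minimal. With four swimmers there are only 4! = 24 possible lineups, so a brute-force Python analogue fits in a few lines. All names and times below are made up for illustration:

```python
# Brute-force sketch of the medley relay assignment problem.
from itertools import permutations

strokes = ["Back", "Breast", "Fly", "Free"]
# times[swimmer] = best times (seconds) per stroke, in the order above
times = {
    "Ava": [31.2, 36.8, 29.9, 26.5],
    "Ben": [30.5, 34.1, 31.0, 27.2],
    "Cal": [33.0, 35.5, 30.2, 25.9],
    "Dee": [29.8, 37.2, 32.4, 27.0],
}

swimmers = list(times)
# Try every lineup; order[i] is the swimmer assigned to strokes[i]
best_total, best_lineup = min(
    (sum(times[s][i] for i, s in enumerate(order)), order)
    for order in permutations(swimmers)
)
for stroke, swimmer in zip(strokes, best_lineup):
    print(f"{stroke}: {swimmer}")
print(f"Fastest relay: {best_total:.1f}s")
```

Note how the optimum is not simply "fastest swimmer per stroke in isolation": each swimmer can fill only one leg, which is exactly the trade-off Solver resolves at scale.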
🔗 Check out the full write-up in the repository!
Machine learning terms with simple, intuitive explanations.
🔗 View the repository: ML Decoded on GitHub
- Developing and optimizing models using:
  - Linear Regression, Decision Trees, Random Forests
  - Gradient Boosting (LightGBM, XGBoost, CatBoost)
  - Support Vector Machines (SVM), Neural Networks (TensorFlow, PyTorch)
  - Clustering (K-Means, DBSCAN), Principal Component Analysis (PCA)
- Data wrangling, preprocessing, and feature engineering with:
  - Pandas, NumPy, Scikit-learn, Statsmodels
  - Handling missing values, scaling, encoding categorical variables
  - Engineering domain-specific features to enhance model performance
- Communicating insights using:
  - Matplotlib, Seaborn, Plotly, Tableau
  - Creating interactive and high-impact visualizations for stakeholder engagement
- Working with large-scale datasets using:
  - Amazon S3, Google BigQuery, Apache Spark, SQL
  - Optimizing storage and query performance for large datasets
- Applying data science for:
  - Forecasting, market analysis, and strategic decision-making
  - Business intelligence tools: Power BI, Looker
  - Automating reporting and dashboarding solutions
- Master of Science in Analytics 🐝
  Georgia Institute of Technology 🌐
- Web Development Professional Certificate 📜
  University of California, Davis 🌟
- Bachelor of Science in Business Finance 💹
  California State University Sacramento 🌳
- Probabilistic Machine Learning for Finance and Investing: A Primer to Generative AI with Python
👀 👆 I am listed in the Acknowledgements section 😉
- LinkedIn: https://linkedin.com/in/autumnpeters 🔗