
🚀 Exciting News! 🏆

Finished in the Top 5% of the Rainfall Prediction Kaggle competition! 🌧️

Write-up coming soon – stay tuned! ⭐


👋 Hello! I'm Autumn 🎓, a recent graduate of the Georgia Tech Master's in Analytics program 🐝. As a data scientist, I have an unending curiosity for uncovering patterns in data, transforming insights into action 💡, and communicating results in a way that drives meaningful impact 🤖.


🎒 What’s in the Bag? Breaking Down Price Prediction with ML

GitHub Repo

This project focuses on predicting product prices in the Backpack Price Prediction Kaggle competition. Rather than applying a basic regression model, the pipeline leverages feature engineering, real-world intuition, and model optimization to improve predictive accuracy in a noisy commercial dataset.


🔹 Key Highlights:

📌 Feature Engineering: Constructed product-specific features such as weight-to-compartment interactions, log transformations for skewed fields, and multi-way categorical combinations (e.g., brand + material + size).
📌 Modeling: Benchmarked XGBoost, LightGBM, and CatBoost, incorporating Optuna for tuning and using a stacked ensemble with Ridge regression for final predictions.
📌 Performance Metrics: Evaluated using RMSE on both notebook and Kaggle leaderboard submissions to track generalization.
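The stacking approach above can be sketched in a few lines. This is a minimal illustration, not the project's actual pipeline: scikit-learn's gradient boosting and random forest stand in for the XGBoost/LightGBM/CatBoost base models, the data is synthetic, and Ridge serves as the meta-learner as described.

```python
# Sketch of a stacked ensemble with a Ridge meta-learner. The base models
# here are stand-ins for the tuned XGBoost/LightGBM/CatBoost models used
# in the actual project; the data is synthetic.
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor, StackingRegressor
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

stack = StackingRegressor(
    estimators=[
        ("gbr", GradientBoostingRegressor(random_state=0)),
        ("rf", RandomForestRegressor(n_estimators=100, random_state=0)),
    ],
    final_estimator=Ridge(alpha=1.0),  # linear meta-learner blends base predictions
    cv=5,  # out-of-fold base predictions avoid leaking training targets
)
stack.fit(X_train, y_train)
rmse = mean_squared_error(y_test, stack.predict(X_test)) ** 0.5
print(f"stacked RMSE: {rmse:.2f}")
```

The `cv=5` argument is the important detail: the meta-learner is trained on out-of-fold predictions, so it never sees a base model's prediction on data that model was fit on.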


📊 Technologies Used:

  • Python 🐍 (Pandas, NumPy, Scikit-learn)
  • Winning Model: Stacked Ensemble (XGBoost + LightGBM + CatBoost)
  • Feature Engineering & Preprocessing (One-hot encoding, interaction terms, outlier removal)
  • Hyperparameter Tuning (Optuna, Cross-Validation)
  • GitHub for Version Control 🛠

🔬 Innovative Methods Used:
While many tabular models focus solely on boosting performance, this project highlights the value of domain-aware feature construction and rigorous evaluation across multiple modeling pipelines. A shared preprocessing module ensured fairness across models and streamlined experimentation.
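The feature-construction ideas above can be illustrated with a small pandas sketch. Column names (`Weight`, `Compartments`, `Brand`, etc.) are illustrative, not the competition's actual schema:

```python
# Hedged sketch of the feature construction described above: a numeric
# interaction, a log transform for a skewed field, and a multi-way
# categorical combination. Column names are illustrative only.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Weight": [1.2, 3.4, 0.8],
    "Compartments": [2, 5, 1],
    "Brand": ["Acme", "Trail", "Acme"],
    "Material": ["Nylon", "Canvas", "Leather"],
    "Size": ["M", "L", "S"],
})

# Numeric interaction: weight per compartment
df["weight_per_compartment"] = df["Weight"] / df["Compartments"]
# log1p tames right-skewed distributions without breaking on zeros
df["log_weight"] = np.log1p(df["Weight"])
# Multi-way categorical combination (brand + material + size)
df["brand_material_size"] = df["Brand"] + "_" + df["Material"] + "_" + df["Size"]
print(df[["weight_per_compartment", "log_weight", "brand_material_size"]])
```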

🔗 Check out the full write-up in the repository!


🏥 Predicting Cirrhosis Patient Outcomes with Multi-Class Classification

GitHub Repo

This project focuses on predicting patient outcomes in the Cirrhosis Outcome Prediction Kaggle competition. Instead of applying a basic classification model, I utilized feature engineering, domain knowledge, and model optimization techniques to improve multi-class prediction accuracy.

🔹 Key Highlights:

📌 Feature Engineering: Created domain-specific features like bilirubin-to-albumin ratio, log transformations for skewed features, and binary indicators for critical thresholds.
📌 Modeling: Compared XGBoost, LightGBM, and CatBoost, fine-tuning hyperparameters and using stacking ensembles for performance gains.
📌 Performance Metrics: Evaluated using multi-class log loss and cross-validation to ensure model generalization.
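The evaluation setup above can be sketched as follows. The classifier and synthetic data are stand-ins for the project's tuned XGBoost model and the Kaggle dataset; the point is how multi-class log loss is scored under stratified K-fold cross-validation:

```python
# Sketch of multi-class log loss under stratified K-fold CV. The model
# and data are stand-ins for the project's tuned XGBoost pipeline.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import log_loss
from sklearn.model_selection import StratifiedKFold

X, y = make_classification(n_samples=300, n_classes=3, n_informative=6, random_state=0)
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
losses = []
for train_idx, val_idx in skf.split(X, y):
    clf = GradientBoostingClassifier(random_state=0)
    clf.fit(X[train_idx], y[train_idx])
    # log loss scores the full predicted probability distribution,
    # not just the argmax class, so overconfident mistakes are punished
    losses.append(log_loss(y[val_idx], clf.predict_proba(X[val_idx])))
print(f"mean CV log loss: {np.mean(losses):.3f}")
```

Stratification keeps each fold's class balance close to the full dataset's, which matters when outcome classes are imbalanced, as they typically are in clinical data.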

📊 Technologies Used:

  • Python 🐍 (Pandas, NumPy, Scikit-learn)
  • Winning Model: XGBoost
  • Feature Engineering & Data Preprocessing (One-hot encoding, ratio calculations, outlier removal)
  • Hyperparameter Tuning (Randomized Search, Stratified K-Fold Validation)
  • GitHub for Version Control 🛠

🔬 Innovative Methods Used:
Many classification models for medical datasets rely on direct correlations or minimal preprocessing. This project takes a more data-driven and clinical approach, engineering features that reflect real-world liver disease progression. This improves both interpretability and predictive power.

🔗 Check out the full write-up in the repository!


🏠 A Different Approach to Feature Engineering for Predicting House Prices in Ames

GitHub Repo

This project is an in-depth analysis of the Ames Housing dataset, where I applied machine learning models to predict house sale prices. Instead of merely running standard models, I leveraged feature engineering, domain knowledge, and advanced model comparison techniques to improve prediction accuracy.

🔹 Key Highlights:

  • 📌 Feature Engineering: Grouped related features to enhance predictive power
  • 📌 Modeling: Compared Decision Trees, Random Forests, Gradient Boosting, and Linear Regression
  • 📌 Performance Metrics: Evaluated RMSE and R² to measure model effectiveness
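The two metrics above can be computed in a short sketch; the data and model here are synthetic stand-ins:

```python
# Minimal sketch of the two metrics used above (RMSE and R²) on a
# simple linear-regression fit over synthetic data.
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=200, n_features=5, noise=5.0, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)
model = LinearRegression().fit(X_tr, y_tr)
pred = model.predict(X_te)
rmse = mean_squared_error(y_te, pred) ** 0.5  # same units as the target; penalizes large errors
r2 = r2_score(y_te, pred)                     # share of target variance explained
print(f"RMSE={rmse:.2f}, R²={r2:.3f}")
```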

📊 Technologies Used:

  • Python 🐍 (Pandas, NumPy, Scikit-learn, Matplotlib, Seaborn)
  • Machine Learning Models (Linear Regression, Gradient Boosting, Random Forest, Decision Trees)
  • Feature Engineering & Data Preprocessing
  • GitHub for Version Control 🛠

📢 🔬 Innovative Methods Used: Most approaches to this dataset focus on either raw correlations or brute-force feature selection. My approach leverages real-estate knowledge to construct meaningful categories (e.g., grouping porch types, analyzing basement features separately), which led to better model interpretability and, in some cases, stronger predictions; the write-up explains where and why.
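The porch-grouping idea can be sketched in a few lines: the Ames dataset spreads porch square footage across several columns, which collapse naturally into a total area plus a has-porch indicator. Column names follow the public Ames schema:

```python
# Hedged sketch of grouping the Ames porch columns into a total square
# footage and a binary indicator. Values here are made up for illustration.
import pandas as pd

df = pd.DataFrame({
    "OpenPorchSF": [60, 0, 0],
    "EnclosedPorch": [0, 112, 0],
    "3SsnPorch": [0, 0, 0],
    "ScreenPorch": [0, 0, 0],
})
porch_cols = ["OpenPorchSF", "EnclosedPorch", "3SsnPorch", "ScreenPorch"]
df["TotalPorchSF"] = df[porch_cols].sum(axis=1)            # combined porch area
df["HasPorch"] = (df["TotalPorchSF"] > 0).astype(int)      # any porch at all?
print(df[["TotalPorchSF", "HasPorch"]])
```

Collapsing four sparse columns into two dense ones gives tree models fewer near-empty splits to consider and makes the resulting feature importances easier to read.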

🔗 Check out the full write-up in the repository!


🎯 Medley Relay Optimization

GitHub Repo

This project tackles the challenge of optimizing a medley relay lineup, where swimmers often excel in multiple strokes, creating trade-offs in event selection. Instead of guessing or manually shuffling times, I developed an Excel Solver-based optimization model to automatically determine the fastest possible relay combination.
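The core idea can be sketched as a pure-Python analogue of the Excel Solver model: enumerate every assignment of four swimmers to the four medley strokes and keep the fastest total. Swimmer names and times below are made up for illustration:

```python
# Brute-force analogue of the Excel Solver optimization: with 4 swimmers
# and 4 strokes there are only 4! = 24 lineups, so enumeration is exact.
# Names and times are illustrative, not real data from the project.
from itertools import permutations

strokes = ["back", "breast", "fly", "free"]
# seconds per leg for each swimmer on each stroke
times = {
    "Ava": {"back": 31.2, "breast": 36.0, "fly": 30.1, "free": 27.5},
    "Bea": {"back": 30.5, "breast": 34.8, "fly": 31.9, "free": 28.0},
    "Cam": {"back": 33.0, "breast": 33.9, "fly": 32.4, "free": 27.9},
    "Dee": {"back": 32.1, "breast": 35.5, "fly": 30.8, "free": 26.9},
}

best = min(
    permutations(times),  # every ordering of the 4 swimmers over the strokes
    key=lambda lineup: sum(times[s][st] for s, st in zip(lineup, strokes)),
)
total = sum(times[s][st] for s, st in zip(best, strokes))
print(dict(zip(strokes, best)), f"total: {total:.1f}s")
```

For larger rosters this becomes a classic assignment problem, which Solver (or `scipy.optimize.linear_sum_assignment`) handles without enumerating every permutation.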

🔗 Check out the full write-up in the repository!


🚀 ML Decoded

GitHub Repo


Machine learning terms with simple, intuitive explanations.

🔗 View the repository: ML Decoded on GitHub


🚀 Technical Skills

🤖 Machine Learning & Predictive Modeling

  • Developing and optimizing models using:
    • Linear Regression, Decision Trees, Random Forests
    • Gradient Boosting (LightGBM, XGBoost, CatBoost)
    • Support Vector Machines (SVM), Neural Networks (TensorFlow, PyTorch)
    • Clustering (K-Means, DBSCAN), Principal Component Analysis (PCA)

🧠 Data Analysis & Feature Engineering

  • Data wrangling, preprocessing, and feature engineering with:
    • Pandas, NumPy, Scikit-learn, Statsmodels
    • Handling missing values, scaling, encoding categorical variables
    • Engineering domain-specific features to enhance model performance
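The preprocessing tasks above typically fit into one scikit-learn pipeline. This is a generic sketch with illustrative columns, not code from any specific project:

```python
# Sketch of a standard preprocessing pipeline: impute missing values,
# scale numerics, and one-hot encode categoricals in one transformer.
# Column names are illustrative.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

df = pd.DataFrame({
    "age": [25.0, None, 40.0],
    "income": [50_000.0, 62_000.0, None],
    "city": ["SF", "ATL", "SF"],
})

pre = ColumnTransformer([
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),  # fill gaps with the median
        ("scale", StandardScaler()),                   # zero mean, unit variance
    ]), ["age", "income"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
])
X = pre.fit_transform(df)
print(X.shape)  # 2 scaled numerics + 2 one-hot city columns
```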

📊 Data Visualization & Storytelling

  • Communicating insights using:
    • Matplotlib, Seaborn, Plotly, Tableau
    • Creating interactive and high-impact visualizations for stakeholder engagement

💾 Big Data & Scalable Computing

  • Working with large-scale datasets using:
    • Amazon S3, Google BigQuery, Apache Spark, SQL
    • Optimizing storage and query performance for large datasets

📈 Business Intelligence & Data-Driven Strategy

  • Applying data science for:
    • Forecasting, market analysis, and strategic decision-making
    • Business intelligence tools: Power BI, Looker
    • Automating reporting and dashboarding solutions

🎓 Education

  • Master of Science in Analytics 🐝
    Georgia Institute of Technology 🌐

  • Web Development Professional Certificate 📜
    University of California, Davis 🌟

  • Bachelor of Science in Business Finance 💹
    California State University Sacramento 🌳


📚 Bookshelf:


🌐 Let's Connect!


