Role: Machine Learning Engineer (Intern) | Company: meleap Inc. (Shanghai) | Tech Stack: Python, Scikit-learn, Pandas, NumPy, Matplotlib, Seaborn
Due to a Non-Disclosure Agreement (NDA), the raw source code, datasets, and specific internal balancing parameters cannot be shared publicly. This repository serves as a high-level overview of the system architecture, methodologies, and outcomes achieved during my tenure.
This portfolio entry outlines the analytics engine developed during my internship at meleap Inc. for HADO, a global Augmented Reality (AR) techno-sport.
The project analyzed high-dimensional telemetry data to solve three critical business challenges:
- Game Balancing: Identifying "meta" hero archetypes and ensuring competitive fairness.
- Player Categorization: Algorithmic classification of user demographics (e.g., "Pro Child" vs. "Coach") using gameplay metrics.
- Performance Benchmarking: Real-time generation of "S+ through F" rankings to provide players with actionable feedback.
This system processed data from a global player base, using unsupervised learning models to benchmark player agility and uphold competitive integrity.
Script Reference: hero_meta_clustering.py, hero_3d_visualization.py
To understand player behavior, I engineered an unsupervised learning model using K-Means Clustering and Principal Component Analysis (PCA). This module analyzed 4D hero statistics (Attack, Defense, Healing, Control) to identify playstyle clusters.
- Dimensionality Reduction: Applied PCA to visualize high-dimensional telemetry, revealing distinct hero archetypes.
- Meta Visualization: Created 3D/4D visualizations to audit the game for "overpowered" strategies.
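Below is a minimal sketch of how such a clustering pipeline can be assembled with Scikit-learn. The column names (`attack`, `defense`, `healing`, `control`), the cluster count, and the CSV path are illustrative assumptions based on the description above; the actual `hero_meta_clustering.py` and `hero_3d_visualization.py` scripts remain under NDA.

```python
# Minimal sketch of the archetype-clustering approach described above.
# Column names, cluster count, and file paths are illustrative assumptions.
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

STAT_COLUMNS = ["attack", "defense", "healing", "control"]  # 4D hero statistics

def cluster_hero_archetypes(df: pd.DataFrame, n_clusters: int = 4):
    """Scale the 4D hero stats, project with PCA, and label playstyle clusters."""
    X = StandardScaler().fit_transform(df[STAT_COLUMNS])

    # PCA is used here purely to inspect and visualize the cluster structure.
    pca = PCA(n_components=2)
    X_2d = pca.fit_transform(X)

    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=42)
    labels = km.fit_predict(X)

    print(f"explained variance: {pca.explained_variance_ratio_.sum():.2%}")
    print(f"silhouette score:   {silhouette_score(X, labels):.3f}")
    return X_2d, labels

if __name__ == "__main__":
    heroes = pd.read_csv("hero_stats.csv")  # hypothetical telemetry export
    coords, archetypes = cluster_hero_archetypes(heroes)
```

The 2D projection shown here generalizes to the 3D/4D Matplotlib/Seaborn views used for meta auditing by raising `n_components` and mapping the fourth statistic to color or marker size.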
Script Reference: demographic_classification_model.py, statistical_fairness_validator.py
This module validated the use of "K-Value" (a proprietary performance metric) as a predictor for player demographics.
- Statistical Validation: Performed ANOVA (F=8.1788, p<0.0001) and Kruskal-Wallis tests to confirm that hero selection significantly impacts K-Value performance.
- Demographic Classification: Developed decision boundaries using Random Forest and Decision Tree classifiers to categorize users into groups based on telemetry ranges (sketched after the key findings below).
- Key Findings:
- Coaches: Highest mean K-Value (~0.040), significantly outperforming other groups.
- Pro Children: Distinct performance tier (~0.030) compared to general users.
- General Users: Baseline K-Value (~0.022).
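The sketch below illustrates how this kind of validation and classification step can be wired together, showing only the Random Forest variant. The `demographic` and `k_value` column names, the train/test split, and the use of SciPy for the hypothesis tests are assumptions for illustration; the real feature set, thresholds, and reported statistics (e.g., F=8.1788) come from the NDA-protected scripts.

```python
# Hedged sketch of the statistical validation and demographic-classification step.
# Column names and split parameters are assumptions, not the production setup.
import pandas as pd
from scipy.stats import f_oneway, kruskal            # hypothesis tests (assumed via SciPy)
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

def validate_k_value(df: pd.DataFrame) -> None:
    """Check whether K-Value differs significantly across demographic groups."""
    groups = [g["k_value"].values for _, g in df.groupby("demographic")]
    f_stat, p_anova = f_oneway(*groups)               # parametric one-way ANOVA
    h_stat, p_kw = kruskal(*groups)                   # non-parametric cross-check
    print(f"ANOVA:          F={f_stat:.4f}, p={p_anova:.2e}")
    print(f"Kruskal-Wallis: H={h_stat:.4f}, p={p_kw:.2e}")

def fit_demographic_classifier(df: pd.DataFrame, feature_cols: list):
    """Learn decision boundaries mapping telemetry ranges to demographic labels."""
    X_train, X_test, y_train, y_test = train_test_split(
        df[feature_cols], df["demographic"], test_size=0.2,
        stratify=df["demographic"], random_state=42)
    clf = RandomForestClassifier(n_estimators=200, random_state=42)
    clf.fit(X_train, y_train)
    print(classification_report(y_test, clf.predict(X_test)))
    return clf
```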
Script Reference: realtime_scoring_engine.py, baseline_etl_processor.py
A production-ready evaluation engine that provides real-time feedback.
- ETL Pipeline: Cleans raw CSV match logs and generates JSON lookup tables for fast retrieval.
- Scoring System: Benchmarks individual matches against global percentiles, assigning roles (e.g., "Apex Predator", "Guardian") and ranks (S+ to F).
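A hedged sketch of the percentile-lookup and ranking logic is shown below. The rank cut-offs, file paths, and the `k_value` metric name are illustrative placeholders; the production thresholds and role-assignment rules in `realtime_scoring_engine.py` are tuned internally and not public.

```python
# Minimal sketch of the percentile-based ranking logic. Cut-offs and paths
# are illustrative assumptions, not the tuned production values.
import json
import pandas as pd

RANK_CUTOFFS = [   # (minimum percentile, rank) - assumed example thresholds
    (95, "S+"), (85, "S"), (70, "A"), (50, "B"), (30, "C"), (15, "D"), (0, "F"),
]

def build_percentile_lookup(csv_path: str, json_path: str, metric: str = "k_value") -> None:
    """ETL step: clean raw match logs and persist global percentiles as a JSON lookup."""
    logs = pd.read_csv(csv_path).dropna(subset=[metric])
    lookup = {str(p): float(logs[metric].quantile(p / 100)) for p in range(0, 101)}
    with open(json_path, "w") as fh:
        json.dump(lookup, fh)

def rank_match(score: float, json_path: str) -> str:
    """Benchmark a single match score against the precomputed global percentiles."""
    with open(json_path) as fh:
        lookup = json.load(fh)
    percentile = max((int(p) for p, v in lookup.items() if score >= v), default=0)
    for cutoff, rank in RANK_CUTOFFS:
        if percentile >= cutoff:
            return rank
    return "F"
```

Precomputing the percentile table as JSON keeps the per-match scoring path to a single file read and dictionary scan, which is what makes real-time feedback feasible.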
Script Reference: data_sanitation_pipeline.py
Implemented rigorous cleaning protocols to handle noise in accelerometer data and gameplay logs.
- Outlier Detection: Utilized Interquartile Range (IQR) and Isolation Forest methods to remove anomalous data points caused by sensor noise or bugged matches.
- Logic Checks: Filtered impossible values to ensure training data quality.
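Below is a minimal sketch of the two outlier-removal techniques named above, assuming a Pandas DataFrame of match telemetry; the contamination rate, column names, and the order in which the filters are chained are assumptions rather than the production configuration of `data_sanitation_pipeline.py`.

```python
# Hedged sketch of the outlier-removal step. Contamination rate and column
# names are assumptions; the production pipeline is NDA-bound.
import pandas as pd
from sklearn.ensemble import IsolationForest

def iqr_filter(df: pd.DataFrame, column: str, k: float = 1.5) -> pd.DataFrame:
    """Drop rows whose value falls outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = df[column].quantile([0.25, 0.75])
    iqr = q3 - q1
    return df[df[column].between(q1 - k * iqr, q3 + k * iqr)]

def isolation_forest_filter(df: pd.DataFrame, feature_cols: list,
                            contamination: float = 0.01) -> pd.DataFrame:
    """Flag multivariate anomalies (e.g., bugged matches, sensor glitches) and drop them."""
    mask = IsolationForest(contamination=contamination,
                           random_state=42).fit_predict(df[feature_cols])
    return df[mask == 1]   # 1 = inlier, -1 = outlier
```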
- Production Deployment: The clustering and scoring models were integrated into the backend, enabling automated skill assessment for users.
- Data-Driven Balancing: Provided the development team with quantitative evidence (ANOVA results) to adjust hero parameters, ensuring a fair experience for the player base.
- Automated Coaching: The system successfully distinguishes between "Pro Child" and "Amateur" playstyles, allowing for targeted algorithmic coaching.