This space is a work in progress, and honestly, that's part of the fun. I started this portfolio to share my journey as I learn, build, and sometimes stumble while exploring various technologies and techniques. Whether you're here to see what I can do or are just curious about data, I hope you find something here that feels real and useful.
The Metro project is my hands-on exploration of public transit data. Here, I've built ETL pipelines to ingest, clean, and transform raw train and station data, then modeled it for analytics and reporting. I use tools like Apache Spark and Delta Lake to process large datasets, and I plan to use Power BI to visualise patronage trends and operational insights. This project is a practical showcase of how I approach real-world data engineering challenges, from wrangling CSVs to orchestrating workflows and surfacing insights for decision-makers.
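As a rough illustration of the ingest, clean, and model flow described above, here is a minimal plain-Python sketch. It is not the actual Metro pipeline code, which runs on Spark and Delta Lake. The sample data, column names (`station`, `date`, `boardings`), and function names are all hypothetical and chosen only to show the shape of the three stages.

```python
import csv
import io
from collections import defaultdict

# Hypothetical raw extract: column names and values are invented for illustration.
RAW_CSV = """station,date,boardings
Central,2024-01-01,120
central ,2024-01-01,30
Northside,2024-01-01,
Northside,2024-01-02,45
"""


def ingest(text):
    """Bronze: read raw rows as-is, keeping every value as a string."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows):
    """Silver: normalise station names, cast counts, drop rows with no count."""
    cleaned = []
    for row in rows:
        count = row["boardings"].strip()
        if not count:
            continue  # skip records with a missing boardings value
        cleaned.append({
            "station": row["station"].strip().title(),
            "date": row["date"].strip(),
            "boardings": int(count),
        })
    return cleaned


def aggregate(rows):
    """Gold: total boardings per station, ready for reporting."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["station"]] += row["boardings"]
    return dict(totals)


if __name__ == "__main__":
    print(aggregate(clean(ingest(RAW_CSV))))
```

Keeping each stage as its own function means the same structure could carry over to a Spark job, with each function swapped for a DataFrame transformation.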
Portfolio Items:
- Metro: Public transit data engineering and analytics (ETL, Spark, Delta Lake, Power BI)
- Bronze/Silver/Gold Data Profiling: Data quality and profiling notebooks
- Data Pipeline Scripts: Modular Python scripts for data curation and transformation
- Config Management: Environment and configuration templates for reproducible workflows
- Visit my LinkedIn profile
- Check out my other projects on my GitHub profile
Thanks again for visiting. I'll keep updating this as I grow, so check back for new projects, stories, and maybe a few lessons learned the hard way.