Data Engineer | AI & Analytics | MS in Information Systems @ Northeastern University
Welcome to my GitHub! I'm a data engineer passionate about building scalable pipelines, analytics systems, and AI-driven applications. I love working at the intersection of data engineering, ML, and cloud, turning raw information into reliable, actionable, and intelligent systems.
- I'm currently pursuing my Master's in Information Systems at Northeastern University, Boston (Dec 2025)
- Previously worked as:
  - Data Analyst Co-op at Boehringer Ingelheim - Built Tableau dashboards, automated data pipelines, and improved data quality across large-scale healthcare campaigns.
  - Data Engineer at LTIMindtree - Engineered scalable data pipelines and optimized analytics workflows to deliver reliable insights.
- Currently exploring GenAI, Python agents, RAG systems, and healthcare ML
- Open to collaborating on data engineering, AI/ML, BI, and research projects
- Passionate about building scalable systems and making data accessible, reliable, and impactful
Here are some of my key projects hosted on GitHub:
- AI Healthcare System
  RAG-powered medication query engine integrating medical data ingestion, vector search, and chatbot interaction
  LLAMA • RAG • LangChain • Streamlit • Pinecone • FastAPI • Snowflake • Python • SQL • BeautifulSoup
- Mindaid
  LLM-powered mental health assistant with ML models, RAG search, and a Streamlit app for personalized counseling support
  Falcon-7B • RAG • Streamlit • Pinecone • Docker
- Food Inspection Analysis
  End-to-end BI solution with ETL pipelines, dimensional modeling, and Tableau dashboards
  Azure Data Factory • Snowflake • Dimensional Modeling • Tableau • Python • SQL • Alteryx
- Optimizing Returns & Refunds in Supply Chain
  End-to-end OLTP system automating returns, refunds, customer reliability scoring, and exception handling
  OLTP • PL/SQL • ERD/DFD • Oracle Database • Supply Chain Systems • Normalization
- DBT Commercial Analytics Data Model
  Data quality framework for commercial analytics using schema validation, tests, and CI/CD automation
  DBT • Snowflake • Data Modeling • GitHub Actions
- Tableau Data Visualization Portfolio
  Collection of interactive dashboards covering retail, public safety, and food compliance analytics, showcasing end-to-end data storytelling and insight generation
  Tableau • Data Visualization • Analytics • Storytelling
- US Accident Prediction
  ML model predicting accident severity using traffic, weather, and road condition features
  EDA • ML Models • Python
- Sentiment Analysis using LSTM
  Deep learning model classifying Amazon customer reviews using LSTM and distributed training
  NLP • LSTM • Distributed Training (DDP) • PyTorch
Python • SQL • PySpark • Scala • Java • TypeScript
Snowflake • Redshift • BigQuery • DynamoDB • Delta Lake
Snowflake • Databricks • Airflow • Kafka • DBT • PySpark • Flink • Lambda • Azure Data Factory • Alteryx • Docker
PyTorch • TensorFlow • Scikit-learn • LangChain • RAG • NLP • Transformers • OpenAI APIs
Tableau • Power BI • Plotly • Streamlit
AWS • Azure • GCP
- Tableau Certified Desktop Specialist
- AWS Certified Data Engineer Associate
- Snowflake SnowPro Core Certification
Feel free to explore my repositories, and let's connect to collaborate on data engineering, AI, and impactful analytics.
