RAAHUL-tech/Real-Time-Fraud-Detection-Pipeline

💳 Real-Time Fraud Detection Pipeline

An end-to-end MLOps project for detecting credit card fraud in real time using modern ML engineering best practices.


Project Overview

This project simulates a real-time fraud detection pipeline using AWS services, MLOps tooling, and robust monitoring. It includes:

  • ⚙️ DVC for data versioning
  • 🧪 MLflow & 🪄 Weights & Biases for experiment tracking
  • 🧠 ONNX for portable model inference
  • 🧬 Hydra for configuration management
  • ⚓ GitHub Actions for CI/CD
  • 🐍 uv for Python environment management
  • 📊 Evidently AI for data drift monitoring
  • 🛠️ Lambda + SQS for serverless real-time inference

Tools & Libraries Used

| Category | Tool/Library |
|---|---|
| Data Versioning | DVC, AWS S3 |
| Model Registry | MLflow, ONNX |
| Experiment Tracking | Weights & Biases, Hydra, Matplotlib |
| Model Training | scikit-learn, PyTorch Lightning, SMOTE |
| Data Monitoring | Evidently AI, CloudWatch |
| Inference | AWS Lambda, ONNX Runtime, SQS |
| CI/CD | GitHub Actions |
| Environment Mgmt | uv, pipx |

⚙️ Setup & Installation

Python Environment with uv

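pipx install uv
uv venv .venv
# Either install the packages directly into the virtual environment...
uv pip install pytorch-lightning torch pandas scikit-learn numpy hydra-core matplotlib seaborn dvc wandb onnx onnxruntime kaggle
# ...or, in a project with a pyproject.toml, record them as project dependencies
uv add pytorch-lightning torch pandas scikit-learn numpy hydra-core matplotlib seaborn dvc wandb onnx onnxruntime kaggle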

Download Dataset from Kaggle

kaggle datasets download -d mlg-ulb/creditcardfraud -p data/raw/ --unzip

Data Preprocessing

uv run src/preprocessing.py
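
The contents of src/preprocessing.py are not shown here. As a rough illustration of one step such a script usually needs on a heavily imbalanced dataset, the sketch below does a stratified train/test split using only the standard library. The function name, the 80/20 ratio, and the seed are assumptions; `Class` is the label column in the Kaggle creditcard.csv. In practice, scikit-learn's `train_test_split(..., stratify=y)` does the same job.

```python
import random
from collections import defaultdict


def stratified_split(rows, label_key="Class", test_fraction=0.2, seed=42):
    """Split rows into train/test while keeping each class's share similar.

    `rows` is a list of dicts (for example from csv.DictReader).
    The ratio and seed are illustrative defaults, not the project's settings.
    """
    rng = random.Random(seed)

    # Group rows by label so each class is split on its own.
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[label_key]].append(row)

    train, test = [], []
    for members in by_class.values():
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test
```

Splitting per class keeps the rare fraud rows present in both sets, which a plain random split does not guarantee.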

Data Versioning with DVC

dvc init

dvc remote add -d s3remote s3://<your-bucket-name>/dvcstore
dvc remote modify s3remote endpointurl https://s3.us-west-1.amazonaws.com
dvc remote modify s3remote region us-west-1

dvc add data/raw/creditcard.csv
dvc add data/processed/X_train.csv
dvc add data/processed/X_test.csv
dvc add data/processed/Y_train.csv
dvc add data/processed/Y_test.csv

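# dvc add writes each .dvc pointer file and a .gitignore next to the data it tracks
# (also add dvc.yaml and dvc.lock if the project defines pipeline stages)
git add .dvc/config .dvcignore data/raw/.gitignore data/raw/*.dvc data/processed/.gitignore data/processed/*.dvc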
git commit -m "Added data files tracked with DVC"
dvc push

Training

Default Training (no sweep)

uv run src/train.py

You can also run the training script with Docker:

docker build -t fraud-detection-app:latest .
docker run fraud-detection-app:latest

With Weights & Biases Sweep

wandb sweep sweep.yaml
# wandb sweep prints the sweep ID; pass it to the agent
wandb agent <entity>/<project>/<sweep-id>
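
The same sweep can also be launched from Python. The sketch below is an assumption-laden illustration: the contents of sweep.yaml are not shown in this README, so the metric name, the hyperparameter names and ranges, and the project name are all hypothetical. Only `wandb.sweep` and `wandb.agent` are real W&B calls.

```python
# Hypothetical sweep definition; the real sweep.yaml may differ.
sweep_config = {
    "method": "bayes",  # W&B also supports "grid" and "random"
    "metric": {"name": "val_auc", "goal": "maximize"},  # assumed metric name
    "parameters": {
        "learning_rate": {
            "distribution": "log_uniform_values",
            "min": 1e-4,
            "max": 1e-2,
        },
        "batch_size": {"values": [256, 512, 1024]},
    },
}


def launch_sweep(train_fn, project="fraud-detection", count=10):
    """Register the sweep and run `count` trials of `train_fn` in this process."""
    import wandb  # imported lazily so the config can be inspected without wandb

    sweep_id = wandb.sweep(sweep=sweep_config, project=project)
    wandb.agent(sweep_id, function=train_fn, count=count)
    return sweep_id
```

Here `train_fn` stands for a zero-argument training function that calls `wandb.init()` and reads its hyperparameters from `wandb.config`.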

Track Experiments

MLflow UI

mlflow ui

Weights & Biases Dashboard

The dashboard includes training loss, AUC, hyperparameter optimization results, and model comparisons.
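
For readers unfamiliar with AUC, the sketch below shows how ROC AUC can be computed from labels and scores using the rank-sum (Mann-Whitney) formulation. It is a standard-library illustration only, not the project's metric code; a training script would more likely call `sklearn.metrics.roc_auc_score`.

```python
def roc_auc(labels, scores):
    """ROC AUC via the rank-sum formula, with tied scores given average ranks.

    `labels` holds 0/1 values (1 = fraud); `scores` holds model outputs where
    a higher value means "more likely fraud".
    """
    pairs = sorted(zip(scores, labels))
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC is undefined when only one class is present")

    rank_sum = 0.0
    i = 0
    while i < len(pairs):
        # Find the run of tied scores starting at position i.
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        rank_sum += avg_rank * sum(1 for k in range(i, j) if pairs[k][1] == 1)
        i = j

    # Subtract the smallest possible positive rank sum, then normalise.
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```

The result is the fraction of (fraud, non-fraud) pairs in which the fraud case receives the higher score, so it does not depend on a decision threshold or on how rare fraud is.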

Real-Time Data Simulation

  • Simulates credit card transactions
  • Pushes JSON payloads to an SQS queue
  • Triggers AWS Lambda for real-time fraud prediction

python realtime_data_simulation.py
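
The flow above can be sketched end to end as follows. This is an illustration under stated assumptions, not the repository's code: the function names, the `model.onnx` path, the random value distributions, and the treatment of the model's first output as a single fraud score are all hypothetical. The feature order follows the Kaggle creditcard.csv header, and the `Records`/`body` layout is the standard shape of an SQS-triggered Lambda event.

```python
import json
import random

# Column order assumed from the Kaggle creditcard.csv header (label excluded).
FEATURES = ["Time"] + [f"V{i}" for i in range(1, 29)] + ["Amount"]


def make_transaction(rng):
    """Build one synthetic transaction; the distributions are placeholders."""
    tx = {name: rng.gauss(0.0, 1.0) for name in FEATURES}
    tx["Time"] = rng.uniform(0, 172_800)
    tx["Amount"] = round(rng.expovariate(1 / 100.0), 2)
    return tx


def send_to_sqs(tx, queue_url):
    """Producer side: push one transaction to SQS as a JSON message."""
    import boto3  # lazy import so the rest of the sketch runs without AWS

    boto3.client("sqs").send_message(QueueUrl=queue_url, MessageBody=json.dumps(tx))


def load_onnx_predictor(model_path="model.onnx"):  # hypothetical path
    """Return a function mapping one feature row to a float score."""
    import numpy as np
    import onnxruntime as ort

    session = ort.InferenceSession(model_path)
    input_name = session.get_inputs()[0].name

    def predict(row):
        batch = np.asarray([row], dtype=np.float32)
        output = session.run(None, {input_name: batch})[0]
        # Assumes the first output holds one score per row.
        return float(np.ravel(output)[0])

    return predict


def handler(event, context, predict=None):
    """Consumer side: Lambda entry point for an SQS-triggered invocation."""
    if predict is None:
        predict = load_onnx_predictor()
    scores = []
    for record in event["Records"]:
        tx = json.loads(record["body"])
        row = [float(tx[name]) for name in FEATURES]
        score = predict(row)
        # Anything printed by a Lambda function ends up in CloudWatch Logs.
        print(json.dumps({"messageId": record.get("messageId"), "fraud_score": score}))
        scores.append(score)
    return {"scores": scores}
```

Passing `predict` as an optional argument lets the handler be exercised locally with a stub, for example `handler({"Records": [{"body": json.dumps(tx)}]}, None, predict=lambda row: 0.5)`, while Lambda itself calls it with only `event` and `context`.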

Monitoring & Alerting

  • Lambda logs → CloudWatch
  • Open results/drift_report.html in a web browser to inspect data drift in the real-time data.
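
To give a feel for what a drift report measures, the sketch below computes the Population Stability Index (PSI) between a reference sample (for example a training feature) and a current sample (for example the same feature from live traffic). This is a hand-rolled illustration, not Evidently's API or the method behind drift_report.html; the bin count and the smoothing constant are arbitrary choices.

```python
import math


def psi(reference, current, bins=10, eps=1e-6):
    """Population Stability Index of `current` relative to `reference`.

    Both inputs are sequences of numbers for a single feature. Bin edges are
    taken from the reference sample; out-of-range values fall into the first
    or last bin.
    """
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # guard against a constant feature

    def histogram(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # Floor empty bins at eps so the logarithm below stays finite.
        return [max(c / len(values), eps) for c in counts]

    ref, cur = histogram(reference), histogram(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))
```

A PSI of 0 means the binned distributions match; larger values indicate a bigger shift. Any alert threshold should be tuned per feature rather than taken as fixed.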

👤 Author

Raahul Krishna Durairaju
Machine Learning & MLOps Practitioner | MS CS @ Cal State Fullerton

🔗 LinkedIn • GitHub
