End-to-end credit-risk classification model using XGBoost and scikit-learn. Includes EDA, feature engineering, model evaluation, and cross-validation pipeline.

gilml/credit-classification-xgboost

Credit Classification with XGBoost

This project builds and evaluates a credit-risk classification model using XGBoost and scikit-learn.
The goal is to predict whether a loan applicant is likely to be a good or bad credit risk, based on structured features from the classic German Credit dataset.


📋 Project Overview

Pipeline summary:

  1. EDA: dataset inspection, class balance, and feature typing.
  2. Feature Engineering: numeric / categorical separation, encoding, and data preparation.
  3. Modeling: baseline XGBoost training with log-loss objective.
  4. Evaluation: accuracy, ROC-AUC, confusion matrix, and class-wise metrics.
  5. Hyperparameter Tuning: randomized search with stratified 5-fold CV.
  6. Threshold Optimization: F1/precision-recall trade-off for class 0 (bad credit).
  7. Model Export: final tuned model serialized with joblib.
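
Step 6 can be sketched as a simple threshold sweep. This is a minimal illustration rather than the notebook's actual code: the function name, the threshold grid, and the toy scores are all hypothetical. It assumes the class coding above (1 = good credit, 0 = bad credit) and flags an applicant as bad credit when the predicted probability of good credit falls below the threshold.

```python
import numpy as np


def best_threshold_for_bad_class(y_true, p_good, thresholds=None):
    """Return (threshold, f1, precision, recall) maximizing F1 for class 0.

    y_true: 1 = good credit, 0 = bad credit.
    p_good: predicted probability of class 1, e.g. predict_proba(X)[:, 1].
    An applicant is flagged as bad credit when p_good < threshold.
    """
    y_true = np.asarray(y_true)
    p_good = np.asarray(p_good, dtype=float)
    if thresholds is None:
        thresholds = np.linspace(0.05, 0.95, 19)  # hypothetical grid, step 0.05

    best = (None, -1.0, 0.0, 0.0)
    for t in thresholds:
        pred_bad = p_good < t
        tp = np.sum(pred_bad & (y_true == 0))   # bad credit correctly flagged
        fp = np.sum(pred_bad & (y_true == 1))   # good credit wrongly flagged
        fn = np.sum(~pred_bad & (y_true == 0))  # bad credit missed
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        if f1 > best[1]:  # strict comparison keeps the lowest threshold on ties
            best = (float(t), float(f1), float(precision), float(recall))
    return best


# Toy labels and scores, invented for illustration only.
y_true = [0, 0, 0, 1, 1, 1, 1, 1]
p_good = [0.22, 0.47, 0.63, 0.58, 0.72, 0.81, 0.93, 0.61]
threshold, f1, precision, recall = best_threshold_for_bad_class(y_true, p_good)
print(f"threshold={threshold:.2f} f1={f1:.2f} "
      f"precision={precision:.2f} recall={recall:.2f}")
```

Raising the threshold catches more bad-credit applicants (higher recall for class 0) but rejects more good ones (lower precision), which is the trade-off a loan approval policy has to settle.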

📁 Repository Structure

credit-classification-xgboost/
├── data/                     # Input data (not versioned)
│   └── german_credit_cleaned.csv
├── notebooks/
│   └── 01_eda.ipynb          # Main Jupyter workflow
├── models/
│   └── xgb_best_model.joblib # Trained XGBoost model
├── src/
│   └── credit_risk/          # Future Python package (helpers, pipelines)
├── scripts/                  # Automation or CLI utilities
├── reports/                  # Figures, plots, analysis outputs
│   └── figures/
├── docs/                     # Optional documentation
├── requirements.txt          # Environment dependencies
└── README.md

⚙️ Environment

Python 3.13 (virtual environment)

Install dependencies:

pip install -r requirements.txt

🚀 Quick Start

Run the main notebook:

jupyter notebook notebooks/01_eda.ipynb

Outputs:

  • Baseline + tuned XGBoost metrics
  • ROC curve and threshold analysis
  • Serialized model in models/xgb_best_model.joblib
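
Reusing the serialized model outside the notebook could look roughly like the sketch below. The helper name score_applicants is hypothetical, and because the feature layout expected by models/xgb_best_model.joblib is not documented here, the example saves and reloads a small scikit-learn LogisticRegression as a stand-in (not the project's XGBoost model) in a temporary directory.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression


def score_applicants(model_path, X):
    """Load a joblib-serialized classifier and return P(good credit) per row."""
    model = joblib.load(model_path)
    # Column 1 of predict_proba is class 1 (good credit) under the 0/1 coding.
    return model.predict_proba(X)[:, 1]


# Stand-in model and synthetic features so the sketch runs without the
# repository's real artifact or dataset (both are assumptions here).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
stand_in = LogisticRegression().fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "xgb_best_model.joblib"
    joblib.dump(stand_in, path)               # same call used for model export
    p_good = score_applicants(path, X[:5])    # reload and score five rows

print(p_good.round(3))
```

joblib files are pickle-based, so load them only from trusted sources and with library versions compatible with the ones used to save the model.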

📝 Notes

  • StratifiedKFold ensures balanced folds for the binary target.
  • Randomized search (40 iterations) achieved ROC-AUC ≈ 0.79.
  • Decision threshold tuning improved interpretability for loan approval policy.
  • The notebook is modular: each markdown section corresponds to a distinct training phase.
