This project analyzes and models real-life insurance claims data using various machine learning approaches such as XGBoost (Classification and Regression) and Tweedie Regression.
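Tweedie regression with a power parameter 1 < p < 2 treats the response as a compound Poisson-gamma variable. This suits claims data with many exact zeros. The sketch below shows, in plain Python, the unit Tweedie deviance that such a model is typically fitted and scored on. The function names and the default p = 1.5 are illustrative assumptions, not code from this project's `utils`. scikit-learn exposes the same metric as `sklearn.metrics.mean_tweedie_deviance`, and XGBoost offers a Tweedie loss via `objective="reg:tweedie"` with `tweedie_variance_power`. Whether the notebooks use exactly these settings is an assumption.

```python
def tweedie_unit_deviance(y: float, mu: float, p: float = 1.5) -> float:
    """Unit Tweedie deviance for 1 < p < 2 (compound Poisson-gamma).

    y  : observed value, y >= 0 (e.g. a claim count or amount)
    mu : predicted mean, mu > 0
    p  : Tweedie power parameter (1.5 is an illustrative default)
    """
    if not 1.0 < p < 2.0:
        raise ValueError("this sketch only covers 1 < p < 2")
    if y < 0 or mu <= 0:
        raise ValueError("require y >= 0 and mu > 0")
    # Since 2 - p > 0, y ** (2 - p) is 0 when y == 0, so zero claims need no special case
    term_y = y ** (2 - p) / ((1 - p) * (2 - p))
    term_cross = y * mu ** (1 - p) / (1 - p)
    term_mu = mu ** (2 - p) / (2 - p)
    return 2.0 * (term_y - term_cross + term_mu)


def mean_tweedie_deviance(ys, mus, p: float = 1.5) -> float:
    """Average unit deviance over paired observations and predictions."""
    pairs = list(zip(ys, mus))
    if not pairs:
        raise ValueError("need at least one observation")
    return sum(tweedie_unit_deviance(y, mu, p) for y, mu in pairs) / len(pairs)


# A policy with no claims, predicted at mean 1.0, contributes a positive deviance.
print(tweedie_unit_deviance(0.0, 1.0))  # 4.0
```

A perfect prediction (mu equal to y) gives a deviance of zero up to floating-point rounding. Lower mean deviance on held-out data indicates a better-calibrated model.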
.
├── datasets/
│ ├── balanced_dataset.csv
│ └── freMTPL2freq.csv
├── models&results/
│ ├── TweedieReg
│ ├── XGBclass
│ └── XGBreg
├── utils/
│ ├── data_utils.py
│ ├── tweedie_utils.py
│ ├── XGB_class_utils.py
│ └── XGB_reg_utils.py
├── 1_data_analysis.ipynb
├── 2_XGBoost_class.ipynb
├── 2_XGBoost_reg.ipynb
├── 3_Tweedie.ipynb
├── 4_Interpretation.ipynb
├── LICENSE
├── README.md
└── requirements.txt
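`freMTPL2freq.csv` is the French motor third-party liability claim-frequency dataset. In its commonly distributed form, each row is a policy with a claim count (`ClaimNb`) and the fraction of a year it was insured (`Exposure`). Assuming those column names, the sketch below computes the exposure-weighted portfolio claim frequency, a natural baseline for the frequency models. It uses the standard-library `csv` module so it runs without pandas. The helper name `portfolio_claim_frequency` is hypothetical and is not part of `utils/data_utils.py`.

```python
import csv
import io
from typing import Iterable, Mapping


def portfolio_claim_frequency(rows: Iterable[Mapping[str, str]]) -> float:
    """Total claims divided by total exposure (claims per policy-year).

    Assumes freMTPL2freq-style columns 'ClaimNb' and 'Exposure'.
    """
    total_claims = 0.0
    total_exposure = 0.0
    for row in rows:
        total_claims += float(row["ClaimNb"])
        total_exposure += float(row["Exposure"])
    if total_exposure <= 0:
        raise ValueError("total exposure must be positive")
    return total_claims / total_exposure


# Tiny in-memory sample with the assumed header; for the real file use
# open("datasets/freMTPL2freq.csv", newline="") instead of io.StringIO.
sample = io.StringIO(
    "IDpol,ClaimNb,Exposure\n"
    "1,1,0.25\n"
    "2,0,0.75\n"
    "3,2,1.0\n"
)
print(portfolio_claim_frequency(csv.DictReader(sample)))  # 1.5
```

Dividing total claims by total exposure, rather than averaging per-policy ratios, keeps short-exposure policies from dominating the estimate.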
These instructions will help you set up the project locally.
- Clone the repository:
    git clone https://github.com/EdinZiga/InsuranceModeling
    cd InsuranceModeling
- Create a virtual environment:
    python -m venv .venv
- Activate it:
  - On Windows:
    .venv\Scripts\activate
  - On macOS/Linux:
    source .venv/bin/activate
- Install the dependencies:
    pip install --upgrade pip
    pip install -r requirements.txt
- Launch Jupyter Lab:
    jupyter lab

If you prefer using Docker instead of setting up a local Python environment, follow these steps:
    docker build -t insurance-modeling .
    docker run -p 8888:8888 -v $(pwd):/app insurance-modeling
On Windows, use %cd% in CMD or ${PWD} in PowerShell instead of $(pwd):
    docker run -p 8888:8888 -v %cd%:/app insurance-modeling
    docker run -p 8888:8888 -v ${PWD}:/app insurance-modeling
Once the container starts, it will output a URL containing a token. Open that URL in your browser to access Jupyter Lab.
Once Jupyter Lab is running in your browser:
- Open 1_data_analysis.ipynb (in the repository root) and run through each cell step by step to explore the data and run the models.
- Follow the notebooks in order:
  - 1_data_analysis.ipynb
  - 2_XGBoost_class.ipynb
  - 2_XGBoost_reg.ipynb
  - 3_Tweedie.ipynb
  - 4_Interpretation.ipynb
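The notebook order above can also be executed headlessly. The sketch below assumes it is run from the repository root with `jupyter nbconvert` available, which is normally installed alongside Jupyter Lab. It writes executed copies to a separate folder so the saved representative outputs in the original notebooks are not overwritten. The folder name `executed` and the `--run` switch are illustrative choices, not part of the project.

```python
import subprocess
import sys
from typing import List

# Execution order taken from this README.
NOTEBOOKS = [
    "1_data_analysis.ipynb",
    "2_XGBoost_class.ipynb",
    "2_XGBoost_reg.ipynb",
    "3_Tweedie.ipynb",
    "4_Interpretation.ipynb",
]


def build_commands(output_dir: str = "executed") -> List[List[str]]:
    """One nbconvert command per notebook; executed copies go to output_dir."""
    return [
        [
            "jupyter", "nbconvert",
            "--to", "notebook",
            "--execute",
            "--ExecutePreprocessor.timeout=-1",  # disable the per-cell timeout; training can be slow
            "--output-dir", output_dir,
            nb,
        ]
        for nb in NOTEBOOKS
    ]


def run_all(dry_run: bool = True) -> None:
    """Print each command; actually execute it only when dry_run is False."""
    for cmd in build_commands():
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops at the first notebook that fails
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run by default; pass --run to execute the notebooks.
    run_all(dry_run="--run" not in sys.argv)
```

Running the notebooks sequentially in separate kernels mirrors the manual workflow. Any notebook that depends on files written by an earlier one will still find them, because the working directory is the same.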
Main dependencies include:
- numpy
- pandas
- scikit-learn
- xgboost
- matplotlib
- seaborn
- statsmodels
- jupyterlab
- scipy
- shap
- scikit-optimize
- tqdm
All required packages are listed in requirements.txt.
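A few of these packages are imported under a different name than the one used with pip (for example `scikit-learn` is imported as `sklearn` and `scikit-optimize` as `skopt`). The sketch below checks which of the main dependencies can be imported in the active environment without actually importing them. The mapping and function name are illustrative and not part of the project.

```python
import importlib.util
from typing import Dict, List

# pip package name -> import name (they differ for a few packages)
IMPORT_NAMES: Dict[str, str] = {
    "numpy": "numpy",
    "pandas": "pandas",
    "scikit-learn": "sklearn",
    "xgboost": "xgboost",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "statsmodels": "statsmodels",
    "jupyterlab": "jupyterlab",
    "scipy": "scipy",
    "shap": "shap",
    "scikit-optimize": "skopt",
    "tqdm": "tqdm",
}


def missing_packages(mapping: Dict[str, str] = IMPORT_NAMES) -> List[str]:
    """Return pip names whose top-level module cannot be found (nothing is imported)."""
    return [pkg for pkg, mod in mapping.items() if importlib.util.find_spec(mod) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing:", ", ".join(missing))
        print("Install them with: pip install -r requirements.txt")
    else:
        print("All main dependencies are importable.")
```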
- Ensure Python 3.8+ is installed.
- The datasets are available in the datasets folder.
- Everything has been pre-run, so representative outputs are saved in the notebooks.
- The utils folder and the .py files inside it contain all the functions used in the notebooks.
- All models and results are saved under the models&results folder, in the subfolder for the chosen model.
- The best runs I achieved are presented at the start of each notebook.
- Enjoy!
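The Python 3.8+ requirement and the virtual environment from the setup steps can be verified from the active interpreter, for example in a notebook cell, before running the notebooks. This is a minimal sketch; the function names are illustrative and not part of the project's utils.

```python
import sys

MIN_VERSION = (3, 8)  # from the note above: Python 3.8+


def python_ok(version_info=sys.version_info, minimum=MIN_VERSION) -> bool:
    """True if the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


def in_virtualenv() -> bool:
    """True inside a venv, where sys.prefix differs from the base installation's prefix."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    status = "OK" if python_ok() else "too old, need 3.8+"
    print(f"Python {sys.version.split()[0]}: {status}")
    print("Virtual environment active:", in_virtualenv())
```

Inside the Docker container the virtual-environment check will normally report False. That is expected, since the container provides its own isolated Python.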
This project is licensed under the MIT License.