
Insurance Claims Modeling Project

This project analyzes and models real-world insurance claims data using several machine learning approaches: XGBoost classification, XGBoost regression, and Tweedie regression.
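To give a flavor of the two modeling styles before diving into the notebooks, here is a minimal, self-contained sketch on synthetic data. Nothing below is taken from the repository's own code; the features, target, and parameter choices (e.g., var_power=1.5) are purely illustrative:

import numpy as np
import statsmodels.api as sm
from xgboost import XGBRegressor

# Synthetic stand-in for claims data: most policies have zero claims,
# the rest have a positive, right-skewed claim cost.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))                      # illustrative risk features
y = np.where(rng.random(1000) < 0.9, 0.0,
             rng.gamma(2.0, 500.0, size=1000))      # zero-inflated target

# Tweedie GLM: a variance power between 1 and 2 models the point mass
# at zero plus the continuous positive tail in a single distribution.
glm = sm.GLM(y, sm.add_constant(X),
             family=sm.families.Tweedie(var_power=1.5)).fit()

# XGBoost with the Tweedie objective targets the same shape,
# but with gradient-boosted trees instead of a linear predictor.
booster = XGBRegressor(objective="reg:tweedie",
                       tweedie_variance_power=1.5,
                       n_estimators=200).fit(X, y)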


Project Structure

.
├── datasets/
│   ├── balanced_dataset.csv
│   └── freMTPL2freq.csv
├── models&results/
│   ├── TweedieReg
│   ├── XGBclass
│   └── XGBreg
├── utils/
│   ├── data_utils.py
│   ├── tweedie_utils.py
│   ├── XGB_class_utils.py
│   └── XGB_reg_utils.py
├── 1_data_analysis.ipynb
├── 2_XGBoost_class.ipynb
├── 2_XGBoost_reg.ipynb
├── 3_Tweedie.ipynb
├── 4_Interpretation.ipynb
├── LICENSE
├── README.md
└── requirements.txt

Setup Instructions

These instructions will help you set up the project locally.

1. Clone the Repository

git clone https://github.com/EdinZiga/InsuranceModeling
cd InsuranceModeling

2. Create a Virtual Environment

python -m venv .venv

3. Activate the Environment

  • On Windows:
.venv\Scripts\activate
  • On macOS/Linux:
source .venv/bin/activate

4. Install Dependencies

pip install --upgrade pip
pip install -r requirements.txt

5. Launch Jupyter Lab

jupyter lab

6. Run with Docker (Alternative Setup)

If you prefer using Docker instead of setting up a local Python environment, follow these steps:

Build the Docker Image

docker build -t insurance-modeling .
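Note: the project tree above does not list a Dockerfile, which the build command requires. If your checkout is missing one, a minimal sketch along these lines should work (the base image, working directory, and Jupyter flags here are assumptions, not the project's own configuration):

FROM python:3.10-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8888
# Bind to all interfaces so the host can reach Jupyter through the published port.
CMD ["jupyter", "lab", "--ip=0.0.0.0", "--no-browser", "--allow-root"]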

Run the Container

docker run -p 8888:8888 -v $(pwd):/app insurance-modeling

On Windows (CMD), use:

docker run -p 8888:8888 -v %cd%:/app insurance-modeling

In PowerShell, use ${PWD} in place of %cd%.

Access Jupyter Lab

Once the container starts, it will output a URL containing a token. Open that URL in your browser to access Jupyter Lab.


Usage

Once Jupyter Lab is running in your browser:

  1. Open the file: 1_data_analysis.ipynb (the notebooks sit in the repository root, as shown in the project structure above).
  2. Run through each cell step by step to explore the data and run the models.
  3. Follow the notebooks in order (a non-interactive alternative is sketched after this list):
    • 1_data_analysis.ipynb
    • 2_XGBoost_class.ipynb
    • 2_XGBoost_reg.ipynb
    • 3_Tweedie.ipynb
    • 4_Interpretation.ipynb
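If you prefer to regenerate all outputs non-interactively instead of stepping through cells, nbconvert (installed alongside JupyterLab) can execute the notebooks in order. A sketch, assuming it is run from the repository root:

for nb in 1_data_analysis 2_XGBoost_class 2_XGBoost_reg 3_Tweedie 4_Interpretation; do
    jupyter nbconvert --to notebook --execute --inplace "$nb.ipynb"
done

This is a bash loop; on Windows, run the jupyter nbconvert command once per notebook instead.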

Dependencies

Main dependencies include:

  • numpy
  • pandas
  • scikit-learn
  • xgboost
  • matplotlib
  • seaborn
  • statsmodels
  • jupyterlab
  • scipy
  • shap
  • scikit-optimize
  • tqdm

All required packages are listed in requirements.txt.


Notes

  • Ensure Python 3.8+ is installed.
  • The datasets are available in the datasets/ folder.
  • All notebooks come pre-run, so representative outputs are already saved.
  • The utils/ folder contains the .py modules with all functions used in the notebooks.
  • All models and results are saved under the models&results/ folder, one subfolder per model.
  • The best runs achieved so far are presented at the start of each notebook.
  • Enjoy!

License

This project is licensed under the MIT License.
