This project uses machine learning to predict the likelihood that a patient will develop heart disease or cardiovascular disease. It covers the following steps:
1. Importing necessary libraries
2. Loading and cleaning the data (heart.csv and cardio_train.csv)
3. Preprocessing and feature selection
4. Exploratory Data Analysis (EDA)
5. Model selection and training
6. Hyperparameter tuning
7. Evaluation using a classification report
8. Model saving and testing on new datasets
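The steps above can be sketched end to end as follows. This is a minimal illustration rather than the notebooks' actual code: it uses synthetic data from make_classification in place of the CSV files, a logistic regression as a stand-in model, a deliberately tiny parameter grid, and a hypothetical output file name (heart_model_sketch.joblib).

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real data (assumption: the notebooks load
# heart.csv / cardio_train.csv at this point instead).
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Hold out a stratified test split for the final evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Preprocessing and model live in one pipeline, so the scaler is fit
# only on the training folds during cross-validation.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("model", LogisticRegression(max_iter=1000)),
])

# Hyperparameter tuning over a small, illustrative grid.
search = GridSearchCV(pipeline, {"model__C": [0.1, 1.0, 10.0]}, cv=3, scoring="f1")
search.fit(X_train, y_train)

# Evaluation with a classification report on the held-out split.
y_pred = search.predict(X_test)
report = classification_report(y_test, y_pred)
print(report)

# Save the tuned model, then reload it as a later session would.
model_path = os.path.join(tempfile.gettempdir(), "heart_model_sketch.joblib")
joblib.dump(search.best_estimator_, model_path)
reloaded = joblib.load(model_path)
```

Swapping LogisticRegression for another estimator, such as xgboost's XGBClassifier, should only require changing the "model" step and the grid keys.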
This project requires the following packages:
pandas
numpy
matplotlib
seaborn
scikit-learn (imported as sklearn)
xgboost
You can install these packages by running the command:
conda install --file requirements.txt
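After installing, a quick check like the one below can confirm that each package is importable. It is only a sketch using the standard library; the helper name missing_packages is illustrative and not part of the project.

```python
import importlib.util

# Import names for the packages listed above; scikit-learn is imported as "sklearn".
REQUIRED = ["pandas", "numpy", "matplotlib", "seaborn", "sklearn", "xgboost"]


def missing_packages(names):
    """Return the import names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
print("Missing:", ", ".join(missing) if missing else "none")
```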
The datasets used in this project can be found on Kaggle:
1. heart.csv: https://www.kaggle.com/datasets/rashikrahmanpritom/heart-attack-analysis-prediction-dataset
2. cardio_train.csv: https://www.kaggle.com/datasets/sulianova/cardiovascular-disease-dataset?resource=download
Please note that the copies of the datasets used in this project may have been preprocessed and cleaned, so they may differ from the raw Kaggle downloads.
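A loading and cleaning step might look like the sketch below. The helper load_and_clean, the sample rows, and the cleaning rules (dropping duplicate rows and rows with missing values) are assumptions for illustration; the notebooks may clean the data differently. The separator is left as a parameter because the Kaggle copy of cardio_train.csv appears to be semicolon-delimited, which is worth verifying against your own download.

```python
import io

import pandas as pd


def load_and_clean(source, sep=","):
    """Read a CSV, then drop exact duplicate rows and rows with missing values."""
    df = pd.read_csv(source, sep=sep)
    df = df.drop_duplicates().dropna()
    return df.reset_index(drop=True)


# Tiny in-memory sample standing in for a semicolon-delimited file.
# It contains one duplicated row and one row with a missing age.
sample = io.StringIO("id;age;cardio\n0;50;0\n1;55;1\n1;55;1\n2;;0\n")
cleaned = load_and_clean(sample, sep=";")
print(cleaned.shape)
```

With the real files, the call would be something like load_and_clean("heart.csv") or load_and_clean("cardio_train.csv", sep=";"), assuming the files sit in the working directory.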
The project is divided into three notebooks:
heart.ipynb
cardio.ipynb
supervised_learning.ipynb
heart.ipynb analyzes heart.csv, and cardio.ipynb analyzes cardio_train.csv. supervised_learning.ipynb serves as a guide and reference for the supervised learning concepts and techniques used in the other two notebooks.
The notebooks are written in Python and require Jupyter Notebook to run.
To run the project:
1. Clone the repository
2. Install the required packages
3. Run the Jupyter Notebook
4. Run the cells in the notebooks
Please note that some cells may take a while to run, especially those that train models or perform hyperparameter tuning.