GitHub - Gos2two/random-M81: Logistic regression model for predicting osteoporosis risk based on clinical and lifestyle variables. Includes data preprocessing, performance evaluation (ROC, Precision-Recall curves), and feature importance analysis. Developed in R for clarity and reproducibility.

forked from yass-78/Osteoporosis-Risk-Prediction

# Osteoporosis Prediction Project

This project builds an osteoporosis risk prediction system in R. A Logistic Regression model predicts risk from clinical and lifestyle factors, and a Random Forest model is used to rank feature importance. The sections below describe the key scripts and how they fit together.

## Documents Overview

**Treball_Final_Bioinfo.R**
This script contains the entire process of training and evaluating a Logistic Regression model for osteoporosis prediction. The key steps include:

1. Package Installation

2. Dataset Preparation:
- Loads the `osteoporosis.csv` dataset, removes irrelevant columns, handles missing data, and converts categorical variables into factors.

3. Model Training:
- Trains a Logistic Regression model on 70% of the dataset (the training set) and evaluates it with a confusion matrix, ROC curve, and Precision-Recall curve.

4. Feature Importance Evaluation:
- A Random Forest model is trained to identify the most influential features for predicting osteoporosis.

5. Model Saving:
- Saves the trained Logistic Regression and Random Forest models for future use.
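
The steps above can be sketched end to end as follows. This is a minimal illustration, not the contents of `Treball_Final_Bioinfo.R`: the column names (`Id`, `Age`, `Gender`, `Family.History`, `Osteoporosis`), the synthetic data, the 0.5 cutoff, and the output file name are all assumptions made so the example runs without the real dataset. It uses base R `glm()` and only calls `pROC` and `randomForest` if they are installed.

```r
# Hypothetical sketch of the training workflow. Column names and the
# synthetic data are assumptions; replace them with read.csv("osteoporosis.csv").
set.seed(42)
n <- 400
df <- data.frame(
  Id = seq_len(n),
  Age = sample(20:90, n, replace = TRUE),
  Gender = sample(c("Female", "Male"), n, replace = TRUE),
  Family.History = sample(c("Yes", "No"), n, replace = TRUE),
  stringsAsFactors = FALSE
)
# Made-up outcome for the demo only: risk rises with age and family history.
p <- plogis(-4 + 0.07 * df$Age + 0.5 * (df$Family.History == "Yes"))
df$Osteoporosis <- ifelse(runif(n) < p, "Yes", "No")

# 2. Dataset preparation: drop an irrelevant column, remove rows with
#    missing values, convert categorical variables to factors.
df$Id <- NULL
df <- na.omit(df)
cat_cols <- c("Gender", "Family.History")
df[cat_cols] <- lapply(df[cat_cols], factor)
df$Osteoporosis <- factor(df$Osteoporosis, levels = c("No", "Yes"))

# 3. Model training on a 70% split, evaluation on the held-out rows.
train_idx <- sample(nrow(df), size = floor(0.7 * nrow(df)))
train <- df[train_idx, ]
test <- df[-train_idx, ]
fit <- glm(Osteoporosis ~ ., data = train, family = binomial)

prob <- predict(fit, newdata = test, type = "response")  # P(Osteoporosis = "Yes")
pred <- factor(ifelse(prob > 0.5, "Yes", "No"), levels = c("No", "Yes"))
conf_mat <- table(Predicted = pred, Actual = test$Osteoporosis)
print(conf_mat)

# ROC curve and AUC, only if pROC is available.
if (requireNamespace("pROC", quietly = TRUE)) {
  roc_obj <- pROC::roc(response = test$Osteoporosis, predictor = prob, quiet = TRUE)
  print(pROC::auc(roc_obj))
}

# 4. Feature importance from a Random Forest, only if randomForest is available.
if (requireNamespace("randomForest", quietly = TRUE)) {
  rf <- randomForest::randomForest(Osteoporosis ~ ., data = train, importance = TRUE)
  print(randomForest::importance(rf))
}

# 5. Save the fitted model (the file name is an assumption).
model_file <- file.path(tempdir(), "logistic_model.rds")
saveRDS(fit, model_file)
```

Because the response is a factor with levels `No` and `Yes`, `glm()` treats `Yes` as the event, so `prob` is the predicted probability of osteoporosis.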

**API.R**
This script sets up a **REST API** using the plumber package to provide osteoporosis predictions based on the trained Logistic Regression model. The key steps are:

1. Library Loading

2. API Endpoint Definition:
- Defines the `predict_logistic` endpoint, which accepts user inputs such as gender, age, family history, and physical activity, and returns the predicted probability of osteoporosis together with a Yes/No prediction.

3. Input Format:
- Accepts both categorical (e.g., gender, family history) and numeric (e.g., age) input variables for making predictions.

4. Prediction:
- Uses the trained Logistic Regression model to make predictions based on the provided inputs.
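
As a rough illustration of the endpoint described above, the sketch below shows one way a `plumber` file could be laid out. It is not the repository's `API.R`: the parameter names, the use of a POST route, the model file name, and the 0.5 cutoff are assumptions. The prediction logic lives in a hypothetical helper, `predict_risk()`, so it can be exercised without starting a server.

```r
# Hypothetical plumber file; input names, model path, and cutoff are assumptions.

# Helper: build a one-row data frame from the request inputs and score it.
# Factor levels are taken from the fitted model so new inputs line up with it.
predict_risk <- function(model, age, gender, family_history, cutoff = 0.5) {
  new_case <- data.frame(
    Age = as.numeric(age),
    Gender = factor(gender, levels = model$xlevels$Gender),
    Family.History = factor(family_history, levels = model$xlevels$Family.History)
  )
  prob <- unname(predict(model, newdata = new_case, type = "response"))
  list(probability = prob, prediction = ifelse(prob > cutoff, "Yes", "No"))
}

# Load the saved model if it exists (assumed file name).
model_path <- "logistic_model.rds"
model <- if (file.exists(model_path)) readRDS(model_path) else NULL

#* Return the osteoporosis probability and a Yes/No prediction
#* @param age Age in years
#* @param gender "Female" or "Male"
#* @param family_history "Yes" or "No"
#* @post /predict_logistic
function(age, gender, family_history) {
  if (is.null(model)) stop("Model file not found: ", model_path)
  predict_risk(model, age, gender, family_history)
}
```

Request values typically arrive as strings, which is why the helper converts `age` with `as.numeric()` and rebuilds the factors explicitly.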

**RUN_API.R**
This script runs the API defined in `API.R` on a local server.
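
A launcher of this kind is usually only a few lines. The sketch below assumes `API.R` is in the working directory and wraps the calls in a hypothetical `start_api()` function; the host value is also an assumption. `plumber::plumb()` parses the annotated file and `$run()` starts a blocking server.

```r
# Hypothetical launcher; start_api() and the host default are assumptions.
start_api <- function(file = "API.R", host = "127.0.0.1", port = 8000) {
  if (!requireNamespace("plumber", quietly = TRUE)) {
    stop("Package 'plumber' is required to run the API.")
  }
  api <- plumber::plumb(file)       # build the router from the annotated file
  api$run(host = host, port = port) # blocks until the server is stopped
}
```

Calling `start_api()` at the end of the script starts the server on port 8000.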

---

## How to Run the Project

1. **Train the model**:
- Run `Treball_Final_Bioinfo.R` to train the Logistic Regression model. Make sure the dataset (`osteoporosis.csv`) is located at the path specified in the script.

2. **Run the API**:
- Run `API.R` to load the trained Logistic Regression model and set up the API with the `predict_logistic` endpoint.

3. **Start the server**:
- Run `RUN_API.R` to start the server and make the API available on port 8000.

---

## Required Packages

The following R packages are required to run the project:

- `plumber`: For creating REST APIs in R.
- `caret`: For building predictive models.
- `dplyr`: For data manipulation.
- `PRROC`: For Precision-Recall curve creation.
- `pROC`: For ROC curve creation and evaluation.
- `smotefamily`: For handling imbalanced data through SMOTE.
- `randomForest`: For building Random Forest models.
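
A quick way to check which of these packages are missing before running the scripts is sketched below. It only reports what is missing; the install line is left commented out so that nothing is installed without intent.

```r
# Packages listed in this README.
required <- c("plumber", "caret", "dplyr", "PRROC", "pROC",
              "smotefamily", "randomForest")

# requireNamespace() returns TRUE when a package can be loaded.
is_installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required[!is_installed]

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install from CRAN
} else {
  message("All required packages are installed.")
}
```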