GitHub - lixianphys/MRImaster: AI-powered Medical Imaging Assistant

MRIMaster: AI-powered medical imaging classifier and segmenter

For App Users

How to run the app

Clone this repo

git clone git@github.com:lixianphys/MRImaster.git
cd MRImaster
git checkout published
mkdir models

Download the model weights folder deployed_models, which contains cnn_model.pt and unet_model.pt.

Place the deployed_models folder under models.

Setup Environment (Linux or WSL)

python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Launch the App

streamlit run app.py

Take a look at the app

cnn model (Prediction+Grad-CAM)

unet3d model (Slice,Modality,Segmentation)

For developers

Datasets

BraTS dataset - Task01 Brain Tumor (unet3d model): BraTS 2017 (segmentation of gliomas, covering tumour and oedema, in brain images). This 4D image dataset contains brain MR images together with segmentation masks. All images and masks are provided in .nii.gz format, with 4 channels (FLAIR, T1w, T1gd and T2w) per image. Masks are categorical with four classes: background, edema, non-enhancing tumor and enhancing tumour.

Kaggle dataset - brain-tumor-classification-mri (cnn model): This dataset contains Training and Testing folders. Each folder has four subfolders, which contain MRIs of the respective tumor classes (Glioma, Meningioma, Pituitary and No Tumor).

Data Preprocessing

Downloading the medium-sized, well-structured Kaggle dataset is straightforward with src.preprocess.kaggledata.KaggleDataPipe. When dealing with a large volume of .nii.gz or .nii files (a single file can exceed 100 MB), it is worth reducing the loading time during each training epoch. For this purpose, see the caching and reloading design of src.preprocess.nifti.LazyLoadingNiftiDataset. For evaluation and inference, by contrast, this caching mechanism slows the process down, so a normal loading process is used instead, encapsulated in src.preprocess.nifti.NormalLoadingNiftiDataset.
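The idea of loading files lazily and reusing them across epochs can be illustrated with a minimal sketch. This is not the implementation of src.preprocess.nifti.LazyLoadingNiftiDataset: the class name LazyCachedDataset, the loader callback and the max_cached limit are hypothetical, and a stand-in loader replaces real NIfTI reading so the example runs without extra dependencies.

```python
from collections import OrderedDict
from typing import Any, Callable, List


class LazyCachedDataset:
    """Load each file on first access and keep the most recent ones in memory."""

    def __init__(self, paths: List[str], loader: Callable[[str], Any], max_cached: int = 8):
        self.paths = list(paths)
        self.loader = loader          # hypothetical: reads one file from disk
        self.max_cached = max_cached  # upper bound on items held in memory
        self._cache: "OrderedDict[int, Any]" = OrderedDict()

    def __len__(self) -> int:
        return len(self.paths)

    def __getitem__(self, index: int) -> Any:
        if index in self._cache:
            # Cache hit: mark as most recently used and skip the disk read.
            self._cache.move_to_end(index)
            return self._cache[index]
        item = self.loader(self.paths[index])  # the slow step
        self._cache[index] = item
        if len(self._cache) > self.max_cached:
            self._cache.popitem(last=False)  # evict the least recently used item
        return item


# Stand-in loader that records how often the "disk" is touched.
reads: List[str] = []


def fake_loader(path: str) -> str:
    reads.append(path)
    return f"volume from {path}"


dataset = LazyCachedDataset(["a.nii.gz", "b.nii.gz"], fake_loader, max_cached=2)
for _epoch in range(3):
    for i in range(len(dataset)):
        dataset[i]
print(len(reads))  # 2: each file is read once, later epochs hit the cache
```

In a single evaluation pass every file is read only once anyway, so a cache of this kind adds bookkeeping without saving any reads.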

Load Kaggle Datasets

Configure your Kaggle API credentials in .env:

# Kaggle Information
KAGGLE_USERNAME = ''
KAGGLE_KEY = ''

Alternatively, set the environment variables (for example, export KAGGLE_USERNAME=XXX, and likewise for KAGGLE_KEY), then run

python3 scripts/load_kaggle_data.py

If everything goes well, you will see:

Authentication to Kaggle successful!
Dataset URL: https://www.kaggle.com/datasets/sartajbhuvaji/brain-tumor-classification-mri
File download successful!
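If authentication fails, a quick check that both credentials are visible to Python can help narrow the problem down. The sketch below only inspects environment variables; the function name missing_kaggle_credentials is hypothetical, and whether scripts/load_kaggle_data.py reads .env itself is not stated here, so values kept only in .env may not show up in this check.

```python
import os
from typing import List, Mapping

REQUIRED_VARS = ("KAGGLE_USERNAME", "KAGGLE_KEY")


def missing_kaggle_credentials(env: Mapping[str, str]) -> List[str]:
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


missing = missing_kaggle_credentials(os.environ)
if missing:
    print("Missing Kaggle credentials:", ", ".join(missing))
else:
    print("Kaggle credentials found in the environment.")
```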

Model

To adapt the models to more specific uses, some hyperparameters, such as the number of classes, can be modified directly in the model block of the config files config/cnn.yaml and config/unet.yaml. Below are the default models for each type:

cnn model: a 4-layer convolutional neural network for the classification task. Input has shape (C=3, H=256, W=256). Output is a prediction over 4 classes.
unet3d model: a U-Net-shaped network for the segmentation task. Input has shape (batch_size, C=4, H=128, W=128, D=128).

Training

Edit the train block in config files.

python scripts/train_model.py --model [cnn or unet3d] --config [path_to_config_file] --use_mlflow

This command line, together with the config files for the different models (cnn.yaml and unet.yaml), provides easy and flexible access to model training.

Additionally, adding --use_mlflow logs the experiment, parameters, metrics and artifacts to an MLflow server. Make sure that you have already spun one up, like this:

mlflow server --backend-store-uri sqlite:///mlflow.db --default-artifact-root ./mlruns --host 0.0.0.0 --port 5000
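For readers who want to wrap or extend the training command, its three flags (--model, --config, --use_mlflow) could be parsed along the following lines. This is a sketch, not the contents of scripts/train_model.py: only the flag names and the two model choices come from this README, while the help strings and the required settings are assumptions.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Train a model from a YAML config (sketch).")
    parser.add_argument("--model", choices=["cnn", "unet3d"], required=True,
                        help="which model type to train")
    parser.add_argument("--config", required=True,
                        help="path to the YAML config file")
    # store_true makes the flag optional: absent means no MLflow logging.
    parser.add_argument("--use_mlflow", action="store_true",
                        help="log the run to an MLflow server")
    return parser


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    return build_parser().parse_args(argv)


args = parse_args(["--model", "cnn", "--config", "config/cnn.yaml", "--use_mlflow"])
print(args.model, args.config, args.use_mlflow)
```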

Batch Training via List in YAML file

model: # This block contains type and hyperparameters of the model.
  type: "cnn"
  shape_in: (3,256,256)  # default: [3,256,256]  
  num_classes: 4  # default: 4
  initial_filters: [8,16,32] # default: 8
  num_fc1: 100  # default: 100
  dropout_rate: 0.25  # default: 0.25

train:
  load:
    train_ratio: [0.7,0.8]
    save_path: "checkpoints/cnn_model/tunecnn"
    checkpoint_name: "#"

This YAML file generates $3\times 2 = 6$ combinations of initial_filters from [8,16,32] and train_ratio from [0.7,0.8]. To save a checkpoint for each combination, use # for checkpoint_name. Otherwise, fill in <checkpoint_name>.pt.
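The expansion of list-valued entries into individual runs can be sketched with itertools.product. This illustrates the idea only and is not the project's own code: the helper names find_list_values and expand_grid are hypothetical, and the sketch treats every list in the config as a sweep, which a real implementation would need to refine for parameters that are lists by nature.

```python
import copy
import itertools
from typing import Any, Dict, Iterator, List, Tuple

KeyPath = Tuple[str, ...]


def find_list_values(block: Dict[str, Any], prefix: KeyPath = ()) -> List[Tuple[KeyPath, List[Any]]]:
    """Collect (key path, list) pairs for every list-valued entry in a nested dict."""
    found: List[Tuple[KeyPath, List[Any]]] = []
    for key, value in block.items():
        if isinstance(value, dict):
            found.extend(find_list_values(value, prefix + (key,)))
        elif isinstance(value, list):
            found.append((prefix + (key,), value))
    return found


def expand_grid(config: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield one config per combination of the list-valued entries."""
    swept = find_list_values(config)
    paths = [path for path, _ in swept]
    for combo in itertools.product(*(values for _, values in swept)):
        variant = copy.deepcopy(config)  # leave the original config untouched
        for path, value in zip(paths, combo):
            node = variant
            for key in path[:-1]:
                node = node[key]
            node[path[-1]] = value
        yield variant


config = {
    "model": {"type": "cnn", "initial_filters": [8, 16, 32]},
    "train": {"load": {"train_ratio": [0.7, 0.8]}},
}
variants = list(expand_grid(config))
print(len(variants))  # 6
```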

Validation

Evaluating the trained model on a fresh dataset (one not yet seen by the model) quickly gives a good sense of how well the model can perform in real-world settings. After evaluation, a report in .md format is generated, summarizing the performance (confusion matrix, classification report, IoU score, Dice score, etc.). Edit the eval block in the config files.

python scripts/eval_model.py --model [cnn or unet3d] --config [path_to_config_file]
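For reference, the two segmentation scores named in the report follow the standard definitions Dice = 2|A∩B| / (|A| + |B|) and IoU = |A∩B| / |A∪B|, computed per class. The sketch below works on flattened label lists in pure Python. It is not the code behind scripts/eval_model.py: the function name dice_and_iou, the label indices and the handling of an absent class are assumptions.

```python
from typing import Sequence, Tuple


def dice_and_iou(pred: Sequence[int], target: Sequence[int], label: int) -> Tuple[float, float]:
    """Dice and IoU for one class over two flattened label masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    pred_hits = sum(1 for p in pred if p == label)
    target_hits = sum(1 for t in target if t == label)
    overlap = sum(1 for p, t in zip(pred, target) if p == label and t == label)
    union = pred_hits + target_hits - overlap
    if union == 0:
        # Class absent from both masks: treated here as a perfect match (a convention).
        return 1.0, 1.0
    dice = 2 * overlap / (pred_hits + target_hits)
    iou = overlap / union
    return dice, iou


# Assumed label order: 0 background, 1 edema,
# 2 non-enhancing tumor, 3 enhancing tumour.
target = [0, 1, 1, 2, 3, 3]
pred = [0, 1, 2, 2, 3, 0]
for label in range(4):
    dice, iou = dice_and_iou(pred, target, label)
    print(label, round(dice, 3), round(iou, 3))
```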

Inference

Edit the deploy block in config files.

python scripts/pred_model.py --model [cnn or unet3d] --config [path_to_config_file]

For the cnn model, the prediction result is displayed directly, while the unet3d model writes a mask of predicted labels to the path specified by ['deploy']['output'].

In the previous single-modal version (app_v0.py), the FastAPI framework was used to deploy inference locally. Here Streamlit is adopted to deploy the multi-modal inference (app.py), configured by the deploy block. For more details about this app, see the For App Users section above.

Configuration

The configuration file is written in YAML format that contains blocks and subblocks. It is recommended to create a config file for each individual model and place these files under the config folder.

# cnn_config.yaml
model:
  type: "cnn"
  shape_in: (3,256,256)  # default: [C=3,W=256,H=256]  
  num_classes: 4  # default: 4
  initial_filters: 4 # default: 8
  num_fc1: 100  # default: 100
  dropout_rate: 0  # default: 0.25

train:
  skip_loading: true # set true if you want to use the existing data in output folder.
  data:
    dataset: "data/raw_data/brain-tumor-classification-mri/Testing" # raw data
    # !! if skip_loading == False, run this training WILL DELETE THIS 'output' DIRECTORY TO REMOVE DATA FROM PREVIOUS TRAINING WITH YOUR PERMISSION.
    output: "data/processed_data/brain-tumor-classification-mri" 
    train_set: "train" # path to train_set: output/train_set 
    val_set: "val" # path to val_set: output/valset
  load: # if skip_loading == True, this subblock is ignored.
    train_ratio: 0.8 # default: 0.8 split folders into train and val sets by this ratio
    image_size: (256,256) # default: [H=256,W=256] transform to this image size. 
  mlflow:
    enabled: true
    uri: "http://localhost:5000"
    experiment: "fineTuneCNN"
  batch_size: 64 # default: 64
  epochs: 3
  learning_rate: 3e-4 # default: 3e-4
  verbose: true
  device: 'cpu' # most commonly “cpu” or “cuda”, but also potentially “mps”, “xpu”, “xla” or “meta”.
  save_path: "checkpoints/cnn_model/tunecnn"
  checkpoint_name: "checkpoint.pt"

eval:
  model: "checkpoints/saved_models/cnn_model.pt"
  image_size: (256,256)
  batch_size: 64 # default: 64
  data: "data/raw_data/brain-tumor-classification-mri/Testing"
  device: 'cpu'
  report: "output/test.md"

deploy:
  model: "checkpoints/deployed_models/cnn_model.pt"
  input: "data/processed_data/brain-tumor-classification-mri/train/glioma_tumor/image.jpg"  # Path to the input image
  device: 'cpu'

This configuration file should contain four blocks: model, train, eval and deploy for the entire AI-model pipeline.
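A small check for the four required blocks can catch an incomplete config before a long run starts. The sketch below operates on an already parsed dict; the function name missing_blocks is hypothetical, and loading the file (for example with PyYAML's yaml.safe_load) is assumed rather than taken from the project's code.

```python
from typing import Any, Dict, List

REQUIRED_BLOCKS = ("model", "train", "eval", "deploy")


def missing_blocks(config: Dict[str, Any]) -> List[str]:
    """Return the top-level blocks that are absent or not mappings."""
    return [name for name in REQUIRED_BLOCKS if not isinstance(config.get(name), dict)]


# Stand-in for a parsed YAML file with the deploy block left out.
config = {
    "model": {"type": "cnn", "num_classes": 4},
    "train": {"epochs": 3, "batch_size": 64},
    "eval": {"report": "output/test.md"},
}
print(missing_blocks(config))  # ['deploy']
```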

Disclaimer

This dataset contains medical images intended solely for research, educational, and informational purposes.

Features to add

License

This project is licensed under the MIT License.
