lixianphys/MRImaster

MRIMaster: AI-powered medical imaging classifier and segmenter

License: MIT

Table of Contents

For App Users

  • How to run the app
  • Take a look at the app

For Developers

  • Datasets
  • Data Preprocessing
  • Model
  • Training
  • Validation
  • Inference
  • Configuration
  • Disclaimer
  • Features to add

For App Users

How to run the app

Clone this repo

git clone git@github.com:lixianphys/MRImaster.git
cd MRImaster
git checkout published
mkdir models

Download the model weights: the deployed_models folder, which contains cnn_model.pt and unet_model.pt.

Place the deployed_models folder under models.

Setup Environment (Linux or WSL)

python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Launch the App

streamlit run app.py

Take a look at the app

cnn model (Prediction + Grad-CAM)

[Image 1] [Image 2]

unet3d model (Slice, Modality, Segmentation)

[Image 1] [Image 2]

For Developers

Datasets

BraTS dataset - Task01 Brain Tumor (unet3d model): BraTS 2017 (segmentation of gliomas, covering tumor and edema, in brain images). This 4D image dataset contains brain MR images together with segmentation masks. All images and masks are provided in .nii.gz format, with 4 channels (FLAIR, T1w, T1gd and T2w) per image. Masks are categorical with four classes: background, edema, non-enhancing tumor and enhancing tumor.

Kaggle dataset - brain-tumor-classification-mri (cnn model): This dataset contains Training and Testing folders. Each folder has four subfolders, which contain MRI scans of the respective classes (Glioma, Meningioma, Pituitary and No Tumor).

Data Preprocessing

Downloading the medium-sized, well-structured Kaggle dataset is straightforward with src.preprocess.kaggledata.KaggleDataPipe. When dealing with a large volume of .nii.gz or .nii files (a single file can exceed 100 MB), it is worth reducing the loading time in each training epoch; see the caching and reloading design of src.preprocess.nifti.LazyLoadingNiftiDataset. For evaluation and inference, by contrast, this caching mechanism slows the process down, so we use the plain loading process encapsulated in src.preprocess.nifti.NormalLoadingNiftiDataset.
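The caching idea can be illustrated with a minimal sketch. This is not the repository's LazyLoadingNiftiDataset; the class name LazyCachedDataset, the load_fn parameter and the in-memory least-recently-used policy are all assumptions made for illustration, and the real class may cache differently (for example on disk).

```python
from collections import OrderedDict
from typing import Any, Callable, Sequence


class LazyCachedDataset:
    """Hypothetical sketch: load items on first access, keep recent ones in memory."""

    def __init__(self, paths: Sequence[str], load_fn: Callable[[str], Any], max_cached: int = 8):
        self.paths = list(paths)
        self.load_fn = load_fn        # e.g. a function that reads one .nii.gz file
        self.max_cached = max_cached  # upper bound on items held in memory
        self._cache: "OrderedDict[int, Any]" = OrderedDict()

    def __len__(self) -> int:
        return len(self.paths)

    def __getitem__(self, index: int) -> Any:
        if index in self._cache:
            self._cache.move_to_end(index)  # mark as most recently used
            return self._cache[index]
        item = self.load_fn(self.paths[index])  # expensive read only on a cache miss
        self._cache[index] = item
        if len(self._cache) > self.max_cached:
            self._cache.popitem(last=False)  # evict the least recently used item
        return item
```

In a training loop that revisits the same files every epoch, repeated accesses are served from memory as long as the working set fits within max_cached. For a single pass over the data, as in evaluation, the cache gives no benefit.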

Load Kaggle Datasets

Configure your Kaggle API credentials in .env:

# Kaggle Information
KAGGLE_USERNAME = ''
KAGGLE_KEY = ''

Or set the environment variables directly (for example, export KAGGLE_USERNAME=XXX and export KAGGLE_KEY=XXX), then run

python3 scripts/load_kaggle_data.py

If everything goes well, you will see:

Authentication to Kaggle successful!
Dataset URL: https://www.kaggle.com/datasets/sartajbhuvaji/brain-tumor-classification-mri
File download successful!
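If authentication fails, a quick check that both variables are visible to Python can help. The helper below is a hypothetical sketch and not part of scripts/load_kaggle_data.py; it only inspects the two variable names shown above.

```python
import os
from typing import List, Mapping

REQUIRED_KEYS = ("KAGGLE_USERNAME", "KAGGLE_KEY")


def missing_kaggle_credentials(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names of required Kaggle variables that are unset or empty."""
    return [key for key in REQUIRED_KEYS if not env.get(key, "").strip()]


if __name__ == "__main__":
    missing = missing_kaggle_credentials()
    if missing:
        print("Missing Kaggle credentials:", ", ".join(missing))
    else:
        print("Kaggle credentials found.")
```

Note that this reads the process environment only. Values stored in .env become visible here only after something has loaded that file into the environment.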

Model

To adapt the models to more specific uses, some hyperparameters, such as the number of classes, can be modified directly in the model block of the config files config/cnn.yaml and config/unet.yaml. Below are the default models for each type:

  • cnn model: 4-layer convolutional neural network for the classification task. Input has shape (C=3, H=256, W=256). Output is a prediction over 4 classes.
  • unet3d model: U-Net architecture for the segmentation task. Input has shape (batch_size, C=4, H=128, W=128, D=128).

Training

Edit the train block in config files.

python scripts/train_model.py --model [cnn or unet3d] --config [path_to_config_file] --use_mlflow

This command line, together with the config files for the different models (cnn.yaml and unet.yaml), provides easy and flexible access to model training.

Additionally, adding --use_mlflow logs the experiment, parameters, metrics and artifacts to an MLflow server. Make sure you have already spun one up, for example:

mlflow server --backend-store-uri sqlite:///mlflow.db --default-artifact-root ./mlruns --host 0.0.0.0 --port 5000

Batch Training via List in YAML file

model: # This block contains type and hyperparameters of the model.
  type: "cnn"
  shape_in: (3,256,256)  # default: [3,256,256]  
  num_classes: 4  # default: 4
  initial_filters: [8,16,32] # default: 8
  num_fc1: 100  # default: 100
  dropout_rate: 0.25  # default: 0.25

train:
  load:
    train_ratio: [0.7,0.8]
  save_path: "checkpoints/cnn_model/tunecnn"
  checkpoint_name: "#"

This YAML file generates $3\times 2 = 6$ combinations of initial_filters ([8,16,32]) and train_ratio ([0.7,0.8]). To save a checkpoint for each combination, use # for checkpoint_name. Otherwise, set it to <checkpoint_name>.pt.
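How list-valued entries turn into a grid of runs can be sketched with itertools.product. This is an illustration under assumptions, not the logic in scripts/train_model.py: the helper names find_list_values and expand_grid are hypothetical, and the sketch treats every list in the config as a sweep, which would be wrong for a parameter whose value is legitimately a list.

```python
import copy
from itertools import product
from typing import Any, Dict, Iterator, List, Tuple

Path = Tuple[str, ...]


def find_list_values(node: Dict[str, Any], prefix: Path = ()) -> List[Tuple[Path, list]]:
    """Collect (path, values) for every list-valued entry in a nested dict."""
    found: List[Tuple[Path, list]] = []
    for key, value in node.items():
        path = prefix + (key,)
        if isinstance(value, dict):
            found.extend(find_list_values(value, path))
        elif isinstance(value, list):
            found.append((path, value))
    return found


def expand_grid(config: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield one config per combination of the list-valued entries."""
    swept = find_list_values(config)
    paths = [path for path, _ in swept]
    for combo in product(*(values for _, values in swept)):
        run = copy.deepcopy(config)  # leave the original config untouched
        for path, value in zip(paths, combo):
            target = run
            for key in path[:-1]:
                target = target[key]
            target[path[-1]] = value  # replace the list with one scalar choice
        yield run


config = {
    "model": {"type": "cnn", "num_classes": 4, "initial_filters": [8, 16, 32]},
    "train": {"load": {"train_ratio": [0.7, 0.8]}},
}
runs = list(expand_grid(config))
print(len(runs))  # 6
```

Each yielded dict is a complete single-run config, so a per-combination checkpoint name can be derived from the chosen values.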

Validation

Evaluating the trained model on a fresh dataset (one the model has not seen) quickly indicates how well it may perform in real-world settings. After evaluation, a report in .md format is generated, summarizing the performance (confusion matrix, classification report, IoU score, Dice score, etc.). Edit the eval block in the config files.

python scripts/eval_model.py --model [cnn or unet3d] --config [path_to_config_file]
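For readers unfamiliar with the two segmentation metrics in the report, the sketch below computes per-class IoU and Dice on flattened label sequences. It is a plain-Python illustration, not the code in scripts/eval_model.py; the function name iou_and_dice and the convention of returning 1.0 when a class is absent from both inputs are assumptions.

```python
from typing import Sequence, Tuple


def iou_and_dice(pred: Sequence[int], target: Sequence[int], label: int) -> Tuple[float, float]:
    """Return (IoU, Dice) for one class label over flattened voxel labels."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    intersection = sum(1 for p, t in zip(pred, target) if p == label and t == label)
    pred_count = sum(1 for p in pred if p == label)
    target_count = sum(1 for t in target if t == label)
    union = pred_count + target_count - intersection
    if union == 0:
        # Assumed convention: a class absent from both masks counts as perfect agreement.
        return 1.0, 1.0
    iou = intersection / union
    dice = 2 * intersection / (pred_count + target_count)
    return iou, dice


pred = [0, 1, 1, 2, 2, 0]
target = [0, 1, 2, 2, 2, 0]
iou, dice = iou_and_dice(pred, target, label=2)
print(round(iou, 3), round(dice, 3))  # 0.667 0.8
```

The two scores are related by Dice = 2 * IoU / (1 + IoU), so they rank predictions identically for a single class; Dice is always at least as large as IoU.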

Inference

Edit the deploy block in config files.

python scripts/pred_model.py --model [cnn or unet3d] --config [path_to_config_file]

For the cnn model, the prediction result is displayed directly, while the unet3d model writes a mask of predicted labels to the path specified by ['deploy']['output'].

In the previous single-modal version (app_v0.py), the FastAPI framework was used to deploy inference locally. Here we adopt Streamlit to deploy multi-modal inference (app.py), configured by the deploy block. For more details about the app, see the For App Users section above.

Configuration

The configuration file is written in YAML format that contains blocks and subblocks. It is recommended to create a config file for each individual model and place these files under the config folder.

# cnn_config.yaml
model:
  type: "cnn"
  shape_in: (3,256,256)  # default: [C=3,H=256,W=256]
  num_classes: 4  # default: 4
  initial_filters: 4 # default: 8
  num_fc1: 100  # default: 100
  dropout_rate: 0  # default: 0.25

train:
  skip_loading: true # set true if you want to use the existing data in output folder.
  data:
    dataset: "data/raw_data/brain-tumor-classification-mri/Testing" # raw data
    # !! If skip_loading is false, running this training WILL DELETE the 'output' directory (with your permission) to remove data from previous training.
    output: "data/processed_data/brain-tumor-classification-mri"
    train_set: "train" # path to train set: output/train
    val_set: "val" # path to val set: output/val
  load: # if skip_loading == True, this subblock is ignored.
    train_ratio: 0.8 # default: 0.8 split folders into train and val sets by this ratio
    image_size: (256,256) # default: [H=256,W=256] transform to this image size. 
  mlflow:
    enabled: true
    uri: "http://localhost:5000"
    experiment: "fineTuneCNN"
  batch_size: 64 # default: 64
  epochs: 3
  learning_rate: 3e-4 # default: 3e-4
  verbose: true
  device: 'cpu' # most commonly “cpu” or “cuda”, but also potentially “mps”, “xpu”, “xla” or “meta”.
  save_path: "checkpoints/cnn_model/tunecnn"
  checkpoint_name: "checkpoint.pt"

eval:
  model: "checkpoints/saved_models/cnn_model.pt"
  image_size: (256,256)
  batch_size: 64 # default: 64
  data: "data/raw_data/brain-tumor-classification-mri/Testing"
  device: 'cpu'
  report: "output/test.md"

deploy:
  model: "checkpoints/deployed_models/cnn_model.pt"
  input: "data/processed_data/brain-tumor-classification-mri/train/glioma_tumor/image.jpg"  # Path to the input image
  device: 'cpu'

This configuration file should contain four blocks: model, train, eval and deploy for the entire AI-model pipeline.
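A small check of a loaded config can catch a missing block early. The sketch below works on an already-parsed dict, for example the result of a YAML loader; the helper names missing_blocks and parse_shape are hypothetical and are not taken from this repository. One detail worth knowing: YAML has no tuple syntax, so a value written as (3,256,256) is loaded as the string "(3,256,256)". Converting it with ast.literal_eval is one option, and how the repository itself converts it is not shown here.

```python
import ast
from typing import Any, Dict, List, Tuple

REQUIRED_BLOCKS = ("model", "train", "eval", "deploy")


def missing_blocks(config: Dict[str, Any]) -> List[str]:
    """Return the top-level blocks that the config does not define."""
    return [name for name in REQUIRED_BLOCKS if name not in config]


def parse_shape(value: Any) -> Tuple[int, ...]:
    """Turn "(3,256,256)" or [3, 256, 256] into a tuple of ints."""
    if isinstance(value, str):
        value = ast.literal_eval(value)  # safely evaluates literals only
    return tuple(int(v) for v in value)


config = {
    "model": {"type": "cnn", "shape_in": "(3,256,256)"},
    "train": {"epochs": 3},
}
print(missing_blocks(config))                    # ['eval', 'deploy']
print(parse_shape(config["model"]["shape_in"]))  # (3, 256, 256)
```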

Disclaimer

This dataset contains medical images intended solely for research, educational, and informational purposes.

Features to add

  • Switch between models for different classification tasks
  • Data pipeline for additional datasets beyond Kaggle, e.g., TCIA API
  • Object detection for identifying and measuring tumor size
  • CNN model inference for a folder of 2D images
  • UNet3D model inference for a folder of nii or nii.gz images
  • Generalize UNet3D into UNet with a dim parameter to switch to a 1D or 2D model.
  • Option to add validation during each epoch for UNet model.
  • Option to save the best model for each epoch
  • Better saving and loading model checkpoints. Check out monai.engines.SupervisedTrainer and monai.handlers.
  • Deterministic training support
  • Integrate UNETR model for 3D segmentation.
  • Option to exclude the background class (class 0) in UNet training, since the background may dominate the loss calculation and lead the network to optimise by simply ignoring small segmentation classes.
  • Logging module

License

This project is licensed under the MIT License.