How to run the app
Clone this repo
git clone git@github.com:lixianphys/MRImaster.git
cd MRImaster
git checkout published
mkdir models
Download the model weights (deployed_models/cnn_model.pt and deployed_models/unet_model.pt) and place the deployed_models folder under models.
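A quick way to confirm the weights ended up where the steps above put them is a small check like the one below. It is a hypothetical helper, not part of the repo. It assumes the models/deployed_models/ layout from this section. If your deploy block points elsewhere, such as the checkpoints/deployed_models path in the example config, pass that folder instead.

```python
from pathlib import Path
from typing import List

# File names from the download step above.
EXPECTED_WEIGHTS = ["cnn_model.pt", "unet_model.pt"]


def missing_weights(weights_dir: str = "models/deployed_models") -> List[str]:
    """Return the expected weight files that are not present in weights_dir."""
    folder = Path(weights_dir)
    return [name for name in EXPECTED_WEIGHTS if not (folder / name).is_file()]


if __name__ == "__main__":
    missing = missing_weights()
    print("Missing:", ", ".join(missing) if missing else "none")
```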
Set up the environment (Linux or WSL)
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
Launch the App
streamlit run app.py
Take a look at the app
cnn model (Prediction + Grad-CAM)
unet3d model (Slice, Modality, Segmentation)
For Developers
Datasets
BraTS dataset - Task01 Brain Tumor (unet3d model)
BraTS 2017 (segmentation of glioma tumour and oedema in brain images)
This 4D image dataset contains brain MR images together with segmentation masks. All images and masks are provided in .nii.gz format. Each image has 4 channels (FLAIR, T1w, T1gd and T2w). Masks are categorical with four classes: background, edema, non-enhancing tumor and enhancing tumor.
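Before training on the BraTS masks, it helps to check the class balance. The sketch below assumes the channel and label indices follow the order in which they are listed above, which is also the order used in the Medical Segmentation Decathlon Task01 metadata. It works on an already flattened mask. Loading the .nii.gz file, for example with nibabel, is only hinted at in a comment and is not executed.

```python
from collections import Counter
from typing import Dict, Iterable

# Assumed index orders, following the lists in the dataset description above.
MODALITIES = ["FLAIR", "T1w", "T1gd", "T2w"]  # image channel index -> modality
LABELS = {0: "background", 1: "edema", 2: "non-enhancing tumor", 3: "enhancing tumor"}


def class_voxel_counts(mask_values: Iterable[int]) -> Dict[str, int]:
    """Count voxels per class in a flattened categorical mask."""
    counts = Counter(mask_values)
    return {name: counts.get(idx, 0) for idx, name in LABELS.items()}


if __name__ == "__main__":
    # Toy flattened mask; a real one could come from something like
    # nibabel.load(path).get_fdata().astype(int).ravel() (not run here).
    toy_mask = [0, 0, 0, 0, 1, 1, 2, 3]
    print(class_voxel_counts(toy_mask))
    # {'background': 4, 'edema': 2, 'non-enhancing tumor': 1, 'enhancing tumor': 1}
```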
Kaggle dataset - brain-tumor-classification-mri (cnn model)
This dataset contains Training and Testing folders. Each folder has four subfolders containing MRIs of the respective tumor classes (Glioma, Meningioma, Pituitary and No Tumor).
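Because the Kaggle dataset uses one subfolder per class, you can sanity-check a download by counting images per subfolder. Below is a minimal sketch. The image suffixes are an assumption. The class names are read from disk rather than hard-coded, apart from glioma_tumor, which appears in the example config. The root path comes from the config example later in this README.

```python
from pathlib import Path
from typing import Dict

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # assumed image formats


def count_images_per_class(split_dir: str) -> Dict[str, int]:
    """Map each class subfolder (e.g. glioma_tumor) to its number of images."""
    counts = {}
    for class_dir in sorted(p for p in Path(split_dir).iterdir() if p.is_dir()):
        counts[class_dir.name] = sum(
            1 for f in class_dir.iterdir() if f.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts


if __name__ == "__main__":
    root = Path("data/raw_data/brain-tumor-classification-mri")
    for split in ("Training", "Testing"):
        if (root / split).is_dir():
            print(split, count_images_per_class(str(root / split)))
```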
Data Preprocessing
Downloading a medium-sized, well-structured Kaggle dataset is straightforward with src.preprocess.kaggledata.KaggleDataPipe. When dealing with a large volume of .nii.gz or .nii files (a single file can exceed 100 MB), it is worth reducing the loading time in each training epoch. For this purpose, see how src.preprocess.nifti.LazyLoadingNiftiDataset implements caching and reloading. For evaluation and inference, however, this caching mechanism would slow the process down, so we use the plain loading process encapsulated in src.preprocess.nifti.NormalLoadingNiftiDataset.
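The trade-off between cached and plain loading can be illustrated with a small cache-on-first-access dataset. This is a hypothetical sketch of the general idea, not the project's LazyLoadingNiftiDataset. Any expensive loader can be plugged in; a real one might call nibabel, but that is an assumption and is not run here. The class follows the map-style dataset protocol (`__len__`/`__getitem__`), so it could subclass torch.utils.data.Dataset.

```python
import os
import pickle
import tempfile
from typing import Any, Callable, List


class CachedVolumeDataset:
    """Hypothetical cache-on-first-access dataset (illustration only)."""

    def __init__(self, paths: List[str], loader: Callable[[str], Any], cache_dir: str):
        self.paths = paths
        # Expensive decoder, e.g. lambda p: nibabel.load(p).get_fdata() (assumption).
        self.loader = loader
        self.cache_dir = cache_dir
        self.loader_calls = 0  # how often the expensive loader actually ran
        os.makedirs(cache_dir, exist_ok=True)

    def __len__(self) -> int:
        return len(self.paths)

    def __getitem__(self, idx: int) -> Any:
        cache_path = os.path.join(self.cache_dir, f"item_{idx}.pkl")
        if os.path.exists(cache_path):
            # Later epochs: reload the already decoded item from the cache.
            with open(cache_path, "rb") as f:
                return pickle.load(f)
        # First epoch: decode the original file, then write it to the cache.
        item = self.loader(self.paths[idx])
        self.loader_calls += 1
        with open(cache_path, "wb") as f:
            pickle.dump(item, f)
        return item


if __name__ == "__main__":
    def fake_loader(path: str) -> List[int]:
        return [len(path)] * 4  # stands in for a decoded volume

    with tempfile.TemporaryDirectory() as tmp:
        ds = CachedVolumeDataset(["a.nii.gz", "b.nii.gz"], fake_loader, tmp)
        for _ in range(3):  # three "epochs"
            batch = [ds[i] for i in range(len(ds))]
        print(ds.loader_calls)  # 2: each file is decoded only once
```

A single pass, as in evaluation or inference, makes every access a cache miss. The extra write then brings no benefit, which matches the reason given above for using plain loading there.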
Configure your Kaggle API credentials in .env:
# Kaggle Information
KAGGLE_USERNAME = ''
KAGGLE_KEY = ''
or set the environment variables directly (export KAGGLE_USERNAME=XXX and export KAGGLE_KEY=XXX), then run:
python3 scripts/load_kaggle_data.py
If everything goes well, you will see:
Authentication to Kaggle successful!
Dataset URL: https://www.kaggle.com/datasets/sartajbhuvaji/brain-tumor-classification-mri
File download successful!
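If authentication fails, first check that both credentials actually reach the environment. The official Kaggle API client reads KAGGLE_USERNAME and KAGGLE_KEY from it. Below is a stdlib-only sketch of loading a simple .env file like the one above. This is an assumption for illustration: the project may load .env differently, for example with python-dotenv.

```python
import os
from typing import Dict


def load_env_file(path: str = ".env") -> Dict[str, str]:
    """Parse simple KEY = 'value' lines and export them to os.environ.

    Hypothetical minimal parser: no multi-line values, 'export' prefixes
    or variable expansion.
    """
    values: Dict[str, str] = {}
    with open(path, encoding="utf-8") as f:
        for raw in f:
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue  # skip blanks, comments and malformed lines
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip().strip("'\"")
    for key, value in values.items():
        os.environ.setdefault(key, value)  # keep variables that were already exported
    return values


def kaggle_credentials_present() -> bool:
    """True if both Kaggle variables are set and non-empty."""
    return all(os.environ.get(k) for k in ("KAGGLE_USERNAME", "KAGGLE_KEY"))
```

Note that the template values `''` parse to empty strings, so `kaggle_credentials_present()` stays False until real credentials are filled in.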
Model
To adapt the models to more specific uses, hyperparameters such as the number of classes can be modified directly in the model block of the config files config/cnn.yaml and config/unet.yaml. The default models for each type are:
- cnn model: a 4-layer convolutional neural network for classification. Input shape: (C=3, H=256, W=256). Output: a prediction over 4 classes.
- unet3d model: a U-Net-shaped network for segmentation. Input shape: (batch_size, C=4, H=128, W=128, D=128).
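The input shapes above explain why the 3D data needs more care than the 2D images. The arithmetic below assumes float32 inputs (4 bytes per value) and counts only the input tensors, not activations or weights.

```python
from math import prod

BYTES_PER_FLOAT32 = 4  # assumption: inputs are stored as float32


def input_bytes(shape, batch_size=1, bytes_per_value=BYTES_PER_FLOAT32):
    """Memory needed to hold a batch of inputs with the given per-sample shape."""
    return batch_size * prod(shape) * bytes_per_value


CNN_INPUT = (3, 256, 256)        # (C, H, W)
UNET_INPUT = (4, 128, 128, 128)  # (C, H, W, D), per sample

if __name__ == "__main__":
    mib = 1024 ** 2
    print(input_bytes(CNN_INPUT) / mib)   # 0.75
    print(input_bytes(UNET_INPUT) / mib)  # 32.0
```

One unet3d sample is therefore about 43 times larger than one cnn image. A 64-image cnn batch (48 MiB) is only slightly larger than two unet3d samples (64 MiB).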
Training
Edit the train block in the config file, then run:
python scripts/train_model.py --model [cnn or unet3d] --config [path_to_config_file] --use_mlflow
This command, together with a config file for each model (config/cnn.yaml and config/unet.yaml), provides an easy and flexible way to train your model.
Adding --use_mlflow additionally logs the experiment, parameters, metrics and artifacts to an MLflow server. Make sure you have already started one, for example:
mlflow server --backend-store-uri sqlite:///mlflow.db --default-artifact-root ./mlruns --host 0.0.0.0 --port 5000
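MLflow's `mlflow.log_params` takes a flat mapping of parameter names to values, while the config files are nested. One way to bridge the two is to flatten the blocks into dotted keys. The helper below is a hypothetical sketch; the project may log its parameters differently. The MLflow calls appear only in comments, so the block runs without a server.

```python
from typing import Any, Dict


def flatten_config(cfg: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested config blocks into dotted keys, e.g. 'train.load.train_ratio'."""
    flat: Dict[str, Any] = {}
    for key, value in cfg.items():
        name = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten_config(value, name))  # recurse into sub-blocks
        else:
            flat[name] = value
    return flat


if __name__ == "__main__":
    cfg = {"model": {"type": "cnn", "num_classes": 4},
           "train": {"load": {"train_ratio": 0.8}, "epochs": 3}}
    params = flatten_config(cfg)
    print(params)
    # With the MLflow server above running, these could be logged via, e.g.:
    #   import mlflow
    #   mlflow.set_tracking_uri("http://localhost:5000")
    #   with mlflow.start_run():
    #       mlflow.log_params(params)
```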
For hyperparameter tuning, list several values for a parameter in the config, for example:
model: # This block contains type and hyperparameters of the model.
  type: "cnn"
  shape_in: (3,256,256) # default: [3,256,256]
  num_classes: 4 # default: 4
  initial_filters: [8,16,32] # default: 8
  num_fc1: 100 # default: 100
  dropout_rate: 0.25 # default: 0.25
train:
  load:
    train_ratio: [0.7,0.8]
  save_path: "checkpoints/cnn_model/tunecnn"
  checkpoint_name: "#"
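Because initial_filters: [8,16,32] and train_ratio: [0.7,0.8] are given as lists, this YAML file generates a training run for every combination of their values (3 × 2 = 6 runs). To save a checkpoint for each combination, set checkpoint_name to "#". Otherwise, set it to <checkpoint_name>.pt.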
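This README does not show how the listed values are expanded into runs. One common approach is a Cartesian product, sketched below with itertools.product over dotted parameter names. The flattened key names and the file-naming rule for "#" are hypothetical choices made for illustration only.

```python
import itertools
from typing import Any, Dict, List


def expand_grid(params: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return one parameter dict per combination of list-valued entries."""
    keys = list(params)
    # Scalars become single-element lists, so they appear unchanged in every run.
    choices = [v if isinstance(v, list) else [v] for v in params.values()]
    return [dict(zip(keys, combo)) for combo in itertools.product(*choices)]


def checkpoint_file(template: str, run: Dict[str, Any]) -> str:
    """Hypothetical naming rule: '#' yields one distinct file per combination."""
    if template != "#":
        return template
    return "_".join(f"{k}-{v}" for k, v in run.items()) + ".pt"


if __name__ == "__main__":
    grid = {"model.initial_filters": [8, 16, 32], "train.load.train_ratio": [0.7, 0.8]}
    for run in expand_grid(grid):
        print(run, "->", checkpoint_file("#", run))
```

With this rule, any parameter whose value is genuinely a list would also be expanded. A real implementation would need a way to tell sweep lists apart from list-valued settings.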
Validation
Evaluating the trained model on a fresh dataset (one not yet seen by the model) quickly gives a good sense of how well it may perform in real-world settings. After evaluation, a report in .md format is generated, summarizing the performance (confusion matrix, classification report, IoU score, Dice score, etc.). Edit the eval block in the config file, then run:
python scripts/eval_model.py --model [cnn or unet3d] --config [path_to_config_file]
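For reference, the two segmentation scores in the report are usually defined per class as Dice = 2|A∩B| / (|A| + |B|) and IoU = |A∩B| / |A∪B|. Here A is the set of voxels predicted as that class and B is the set of ground-truth voxels. The sketch below implements these definitions on flattened label lists. It does not reproduce the project's own evaluation code, which may rely on a library such as MONAI. Scoring a class absent from both masks as 1.0 is a convention chosen here, not a universal rule. The `include_background` switch mirrors the "remove background" idea in the feature list.

```python
from typing import Dict, List, Sequence


def dice_and_iou(pred: Sequence[int], target: Sequence[int], label: int) -> Dict[str, float]:
    """Dice and IoU for one class label on flattened masks of equal length."""
    a = {i for i, p in enumerate(pred) if p == label}
    b = {i for i, t in enumerate(target) if t == label}
    inter, union = len(a & b), len(a | b)
    if union == 0:
        # Class absent from both masks: count as a perfect match (a convention).
        return {"dice": 1.0, "iou": 1.0}
    return {"dice": 2 * inter / (len(a) + len(b)), "iou": inter / union}


def mean_dice(pred: Sequence[int], target: Sequence[int], labels: List[int],
              include_background: bool = True) -> float:
    """Average Dice over labels, optionally skipping label 0 (background)."""
    used = [lab for lab in labels if include_background or lab != 0]
    return sum(dice_and_iou(pred, target, lab)["dice"] for lab in used) / len(used)
```

For example, with pred = [0, 0, 1, 1, 0, 2] and target = [0, 0, 1, 0, 0, 2], class 1 scores Dice 2/3 and IoU 1/2.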
Inference
Edit the deploy block in the config file, then run:
python scripts/pred_model.py --model [cnn or unet3d] --config [path_to_config_file]
For the cnn model, the prediction is displayed directly, whereas the unet3d model writes a mask of predicted labels to the path specified by ['deploy']['output'].
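A mask of predicted labels is typically obtained from per-class network scores by taking, for each voxel, the class with the highest score. With PyTorch and an (N, C, H, W, D) output, that is `logits.argmax(dim=1)`. Whether the project post-processes the unet3d output exactly this way is an assumption. The dependency-free sketch below shows the idea on scores laid out as (num_classes, num_voxels).

```python
from typing import List, Sequence


def logits_to_labels(logits: Sequence[Sequence[float]]) -> List[int]:
    """Turn per-class scores of shape (num_classes, num_voxels) into one label per voxel."""
    num_classes, num_voxels = len(logits), len(logits[0])
    # For each voxel, pick the class index with the highest score (first one wins ties).
    return [
        max(range(num_classes), key=lambda c: logits[c][v])
        for v in range(num_voxels)
    ]
```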
In the previous single-modal version (app_v0.py), the FastAPI framework was used to deploy inference locally. Here we adopt Streamlit to deploy multi-modal inference (app.py), configured by the deploy block. For more details about the app, see the sections at the top of this README.
Configuration
The configuration file is written in YAML and consists of blocks and sub-blocks. We recommend creating a config file for each model and placing these files in the config folder.
# cnn_config.yaml
model:
  type: "cnn"
  shape_in: (3,256,256) # default: [C=3,H=256,W=256]
  num_classes: 4 # default: 4
  initial_filters: 4 # default: 8
  num_fc1: 100 # default: 100
  dropout_rate: 0 # default: 0.25
train:
  skip_loading: true # set to true to reuse the existing data in the output folder.
  data:
    dataset: "data/raw_data/brain-tumor-classification-mri/Testing" # raw data
    # !! if skip_loading == false, running this training WILL DELETE THIS 'output' DIRECTORY (WITH YOUR PERMISSION) TO REMOVE DATA FROM PREVIOUS TRAINING.
    output: "data/processed_data/brain-tumor-classification-mri"
    train_set: "train" # path to train_set: output/train_set
    val_set: "val" # path to val_set: output/val_set
  load: # if skip_loading == true, this sub-block is ignored.
    train_ratio: 0.8 # default: 0.8; split folders into train and val sets by this ratio
    image_size: (256,256) # default: [H=256,W=256]; transform to this image size.
  mlflow:
    enabled: true
    uri: "http://localhost:5000"
    experiment: "fineTuneCNN"
  batch_size: 64 # default: 64
  epochs: 3
  learning_rate: 3e-4 # default: 3e-4
  verbose: true
  device: 'cpu' # most commonly "cpu" or "cuda", but also potentially "mps", "xpu", "xla" or "meta".
  save_path: "checkpoints/cnn_model/tunecnn"
  checkpoint_name: "checkpoint.pt"
eval:
  model: "checkpoints/saved_models/cnn_model.pt"
  image_size: (256,256)
  batch_size: 64 # default: 64
  data: "data/raw_data/brain-tumor-classification-mri/Testing"
  device: 'cpu'
  report: "output/test.md"
deploy:
  model: "checkpoints/deployed_models/cnn_model.pt"
  input: "data/processed_data/brain-tumor-classification-mri/train/glioma_tumor/image.jpg" # Path to the input image
  device: 'cpu'
A configuration file should contain four blocks, model, train, eval and deploy, which together cover the entire AI-model pipeline.
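A loader can check for the four required blocks before starting a run. It also helps to know that PyYAML's default (YAML 1.1) resolver reads values like `(3,256,256)` and `3e-4` as strings, because `3e-4` has no decimal point. The project may already convert such values itself. The sketch below is a hypothetical, stdlib-only helper that operates on an already loaded dict. The PyYAML call appears only in a comment.

```python
import ast
from typing import Any, Dict, List, Tuple

REQUIRED_BLOCKS = ("model", "train", "eval", "deploy")


def missing_blocks(cfg: Dict[str, Any]) -> List[str]:
    """Return the top-level blocks the pipeline expects but the config lacks."""
    return [block for block in REQUIRED_BLOCKS if block not in cfg]


def parse_shape(value: Any) -> Tuple[int, ...]:
    """Turn '(3,256,256)' or [3, 256, 256] into a tuple of ints."""
    if isinstance(value, str):
        value = ast.literal_eval(value)  # safe: only parses Python literals
    return tuple(int(v) for v in value)


if __name__ == "__main__":
    # With PyYAML installed, the dict could come from e.g.:
    #   import yaml
    #   with open("config/cnn.yaml") as f:
    #       cfg = yaml.safe_load(f)
    cfg = {"model": {"shape_in": "(3,256,256)"}, "train": {"learning_rate": "3e-4"}}
    print(missing_blocks(cfg))                    # ['eval', 'deploy']
    print(parse_shape(cfg["model"]["shape_in"]))  # (3, 256, 256)
    print(float(cfg["train"]["learning_rate"]))   # 0.0003
```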
Disclaimer
This dataset contains medical images intended solely for research, educational, and informational purposes.
Features to add
- Switch between models for different classification tasks
- Data pipeline for additional datasets beyond Kaggle, e.g., TCIA API
- Object detection for identifying and measuring tumor size
- CNN model inference for a folder of 2D images
- UNet3D model inference for a folder of nii or nii.gz images
- Generalize UNet3D into UNet with a dim parameter to switch to a 1D or 2D model.
- Option to add validation during each epoch for the UNet model.
- Option to save the best model for each epoch
- Better saving and loading model checkpoints. Check out monai.engines.SupervisedTrainer and monai.handlers.
- Deterministic training support
- Integrate UNETR model for 3D segmentation.
- Option to remove the background (label 0) in UNet training, since the background class may dominate the calculation and lead the network to optimise by simply ignoring small segmentation classes.
- Logging module
This project is licensed under the MIT License.