This work presents VoCo, a new method for large-scale 3D medical image pre-training. We release a new benchmark that includes 160K volumes (42M slices) for pre-training, pre-trained models ranging from 31M to 1.2B parameters, various pre-training recipes, and implementations of 50+ downstream tasks.
Linshan Wu, Jiaxin Zhuang, and Hao Chen. "Large-Scale 3D Medical Image Pre-training with Geometric Context Priors". CVPR 2024 Extension.
- Models: 31M~1.2B params of pre-trained models.
- Downstream: implementations of 50+ tasks (segmentation, classification, registration, vision-language).
- Datasets:
  - PreCT-160K: the largest existing dataset in this field, with 160K CT volumes (42M slices)
  - VoComni: 20K volumes with pseudo labels (20 organ & tumor classes)
  - VoCovid: semi-supervised COVID segmentation
- Pre-training:
  - Fully-supervised: pre-training with labeled data
  - Self-supervised: pre-training with unlabeled data
  - Semi-supervised: pre-training with labeled and unlabeled data
  - Omni-supervised: pre-training with labeled and unlabeled data
- CVPR version
- Chinese-language explainer
- WeChat official account
We provide various models for downstream tasks. For nnUNet, please refer to the nnunet trainer.
- 'SSL_head' denotes models trained with self-supervised pre-training.
- 'Omni' denotes models trained with omni-supervised pre-training.
Model | Params | Checkpoint |
---|---|---|
VoComni_nnunet | 31M | Download |
VoCo_B_SSL_head | 53M | Download |
VoCo_L_SSL_head | 206M | Download |
VoCo_H_SSL_head | 818M | Download |
VoComni_B | 72M | Download |
VoComni_L | 290M | Download |
VoComni_H | 1.2B | Download |
We downloaded the checkpoints of previous methods from SuPreM for comparison (thanks for their great efforts!).
Summary: We spent over 10,000 GPU hours evaluating 50+ downstream tasks. SuPreM appears to be the best among previous methods. You can try these models in Downstream.
The path of pre-trained models should be organized as:
├── YOUR/DIRECTORY/OF/PRETRAINED/MODELS
    ├── VoComni_nnunet.pt
    ├── VoCo_B_SSL_head.pt
    ├── VoCo_L_SSL_head.pt
    ├── VoCo_H_SSL_head.pt
    ├── VoComni_B.pt
    ├── VoComni_L.pt
    ├── VoComni_H.pt
    ├── supervised_dodnet_unet_920.pth
    ├── supervised_clip_driven_universal_swin_unetr_2100.pth
    ├── self_supervised_unimiss_nnunet_small_5022.pth
    ├── self_supervised_nv_swin_unetr_5050.pt
    ├── self_supervised_models_genesis_unet_620.pt
    └── supervised_suprem_swinunetr_2100.pth
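Before launching a downstream run, it can help to confirm that the checkpoint directory matches the layout above. The following is a minimal sketch, not part of the repository: the helper name `missing_checkpoints` is hypothetical, and treating every listed file as required is an assumption (you may only need the checkpoints you actually plan to use).

```python
from pathlib import Path

# File names copied from the directory layout above.
EXPECTED_CHECKPOINTS = [
    "VoComni_nnunet.pt",
    "VoCo_B_SSL_head.pt",
    "VoCo_L_SSL_head.pt",
    "VoCo_H_SSL_head.pt",
    "VoComni_B.pt",
    "VoComni_L.pt",
    "VoComni_H.pt",
    "supervised_dodnet_unet_920.pth",
    "supervised_clip_driven_universal_swin_unetr_2100.pth",
    "self_supervised_unimiss_nnunet_small_5022.pth",
    "self_supervised_nv_swin_unetr_5050.pt",
    "self_supervised_models_genesis_unet_620.pt",
    "supervised_suprem_swinunetr_2100.pth",
]


def missing_checkpoints(root, expected=EXPECTED_CHECKPOINTS):
    """Return the expected file names that are not present directly under `root`."""
    root = Path(root)
    return [name for name in expected if not (root / name).is_file()]
```

Calling `missing_checkpoints("./pretrained")` returns an empty list when every file is in place, and otherwise the names that still need to be downloaded.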
import torch
import argparse
from monai.networks.nets import SwinUNETR


def load(model, model_dict):
    """Copy every checkpoint tensor whose name and shape match the model."""
    # make sure you load our checkpoints
    if "state_dict" in model_dict.keys():
        state_dict = model_dict["state_dict"]
    else:
        state_dict = model_dict
    current_model_dict = model.state_dict()
    # print the names of the parameters that will be taken from the checkpoint
    for k in current_model_dict.keys():
        if (k in state_dict.keys()) and (state_dict[k].size() == current_model_dict[k].size()):
            print(k)
    # keep the model's own initialization for missing or shape-mismatched keys
    new_state_dict = {
        k: state_dict[k] if (k in state_dict.keys()) and (state_dict[k].size() == current_model_dict[k].size()) else current_model_dict[k]
        for k in current_model_dict.keys()}
    model.load_state_dict(new_state_dict, strict=True)
    return model


parser = argparse.ArgumentParser(description="VoCo models")
parser.add_argument("--feature_size", default=48, type=int,
                    help="feature size: 48 Base (B), 96 Large (L), 192 Huge (H)")
parser.add_argument("--in_channels", default=1, type=int, help="number of input channels")
parser.add_argument("--out_channels", default=21, type=int, help="number of output channels")
parser.add_argument("--roi_x", default=96, type=int, help="roi size in x direction")
parser.add_argument("--roi_y", default=96, type=int, help="roi size in y direction")
parser.add_argument("--roi_z", default=96, type=int, help="roi size in z direction")
args = parser.parse_args()

model = SwinUNETR(img_size=(args.roi_x, args.roi_y, args.roi_z),
                  in_channels=args.in_channels,
                  out_channels=args.out_channels,
                  feature_size=args.feature_size,
                  use_v2=True)

# YOUR PATH OF PRETRAINED MODELS. MODIFY IT
pretrained_path = './pretrained/VoComni_B.pt'
model_dict = torch.load(pretrained_path, map_location=torch.device('cpu'))
model = load(model, model_dict)
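The `--feature_size` help text gives the mapping 48 (B), 96 (L), 192 (H). As a convenience, the value can be guessed from a checkpoint file name. This is only a sketch: `infer_feature_size` is a hypothetical helper, and it assumes the size letter appears as its own underscore-separated token in the name (as in `VoComni_B.pt` or `VoCo_L_SSL_head.pt`) and that both the 'SSL_head' and 'Omni' checkpoints follow the same mapping.

```python
from pathlib import Path

# Mapping taken from the --feature_size help text: 48 Base (B), 96 Large (L), 192 Huge (H).
FEATURE_SIZE_BY_VARIANT = {"B": 48, "L": 96, "H": 192}


def infer_feature_size(checkpoint_path):
    """Guess feature_size from a checkpoint name such as 'VoCo_L_SSL_head.pt'.

    Assumption: the variant letter is a standalone token between underscores.
    """
    stem = Path(checkpoint_path).stem  # file name without directory or extension
    for token in stem.split("_"):
        if token in FEATURE_SIZE_BY_VARIANT:
            return FEATURE_SIZE_BY_VARIANT[token]
    raise ValueError(f"no B/L/H variant token found in {stem!r}")
```

Names without a variant token, such as `VoComni_nnunet.pt`, raise `ValueError` rather than returning a guess.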
NOTE: "roi" is flexible and can follow your own settings. You need to adjust "in_channels" and "out_channels" for specific datasets. If "in_channels != 1" or "out_channels != 21", only the first layer or the last layer, respectively, will not be loaded; all other layers are still loaded from the checkpoint.
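To see which layers would be skipped for a given configuration, the name-and-shape condition used by `load` can be applied to plain shape dictionaries. The sketch below is illustrative only: `split_loadable_keys` is a hypothetical helper, and it works on `name -> shape tuple` mappings so it can run without a model in memory.

```python
def split_loadable_keys(model_shapes, checkpoint_shapes):
    """Partition model parameter names by whether the checkpoint can fill them.

    Both arguments map parameter name -> shape tuple. A parameter counts as
    loadable only if the name exists in the checkpoint and the shapes are equal.
    """
    loaded, skipped = [], []
    for name, shape in model_shapes.items():
        # .get returns None for a missing name, which never equals a shape tuple.
        if checkpoint_shapes.get(name) == shape:
            loaded.append(name)
        else:
            skipped.append(name)
    return loaded, skipped
```

With a PyTorch model, the inputs could be built as `{k: tuple(v.shape) for k, v in model.state_dict().items()}` and the same expression over the checkpoint's state dict; the `skipped` list then names the layers that keep their fresh initialization.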
git clone https://github.com/Luffy03/Large-Scale-Medical
cd Large-Scale-Medical
pip install torch==2.1.1 torchvision==0.16.1 torchaudio==2.1.1 --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
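After installation, the pinned versions from the `pip install` command above can be verified from Python. This is a sketch under assumptions: `check_pins` is a hypothetical helper, and it strips a local version suffix such as `+cu118` before comparing, since wheels from the CUDA index typically report versions in that form.

```python
from importlib import metadata

# Versions copied from the pip command above.
PINNED = {"torch": "2.1.1", "torchvision": "0.16.1", "torchaudio": "2.1.1"}


def check_pins(pins=PINNED, get_version=metadata.version):
    """Return {package: (expected, found)} for every pin that is not satisfied.

    `found` is None when the package is not installed at all.
    """
    problems = {}
    for package, expected in pins.items():
        try:
            found = get_version(package)
        except metadata.PackageNotFoundError:
            problems[package] = (expected, None)
            continue
        # Drop a local suffix such as "+cu118" before comparing.
        if found.split("+", 1)[0] != expected:
            problems[package] = (expected, found)
    return problems
```

An empty dictionary from `check_pins()` means all three packages match the pinned versions.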
Please refer to Acknowledgment. Download our pre-processed downstream datasets for downstream tasks.
Please refer to Downstream: 50+ downstream tasks implementations.
We are uploading our fine-tuning checkpoints to BaiduYun to ensure fair comparisons.
Please refer to Acknowledgment. Download our PreCT-160K for pre-training.
WARNING:
- Storing the original datasets requires 22.6 TB of space. Pre-training requires an extra 30 TB to cache the data; otherwise, pre-training will be very slow. Please store the data on SSDs.
- If you do not have enough space for PreCT-160K, you can try our VoComni dataset, which requires less than 10 TB.
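A quick way to check whether a target drive meets these storage figures is sketched below. The helper names are hypothetical, the figures are the 22.6 TB and 30 TB quoted above, and interpreting "TB" as decimal terabytes (10^12 bytes) is an assumption, since the text does not say whether TB or TiB is meant.

```python
import shutil

TB = 10**12       # assumption: decimal terabytes
RAW_TB = 22.6     # original PreCT-160K datasets
CACHE_TB = 30.0   # extra cache space recommended for pre-training


def storage_shortfall_tb(free_bytes, with_cache=True):
    """Return how many TB are missing, or 0.0 if the free space is sufficient."""
    required = RAW_TB + (CACHE_TB if with_cache else 0.0)
    return max(0.0, required - free_bytes / TB)


def check_path(path, with_cache=True):
    """Apply the check to the filesystem that contains `path`."""
    return storage_shortfall_tb(shutil.disk_usage(path).free, with_cache)
```

This treats the raw data and the cache as living on the same volume; if they are stored on separate drives, call `storage_shortfall_tb` with `with_cache=False` for the raw-data drive and compare the cache drive against 30 TB separately.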
Please refer to:
- Fully-supervised pre-training.
- Self-supervised pre-training.
- Semi-supervised pre-training.
- Omni-supervised pre-training.
To facilitate follow-up research, we use VoCo to generate pseudo labels on 20K volumes, covering 20 organ and tumor classes. Please refer to VoComni.
Please refer to VoCovid for semi-supervised COVID segmentation. The dataset can be downloaded from Hugging Face.
NOTE THAT we are not the authors of these datasets. Although all these datasets are publicly available for academic research, you need to cite the original works as shown in our paper. Certain datasets (e.g., WORD) require approval from their authors; you need to download these from the original links.
If you find this repo useful for your research, please consider citing the paper as follows:
@article{wu2024large,
title={Large-Scale 3D Medical Image Pre-training with Geometric Context Priors},
author={Wu, Linshan and Zhuang, Jiaxin and Chen, Hao},
journal={arXiv preprint arXiv:2410.09890},
year={2024}
}
@InProceedings{voco-v1,
author = {Wu, Linshan and Zhuang, Jiaxin and Chen, Hao},
title = {VoCo: A Simple-yet-Effective Volume Contrastive Learning Framework for 3D Medical Image Analysis},
booktitle = {CVPR},
month = {June},
year = {2024},
pages = {22873-22882}
}