In this article, we present a novel transformer-based multi-view network, MV-Swin-T, built upon the Swin Transformer architecture for mammographic image classification to fully exploit multi-view insights. Our contributions include:
- Designing a novel multi-view network entirely based on the transformer architecture, capitalizing on the benefits of transformer operations for enhanced performance.
- Introducing a novel "Multi-headed Dynamic Attention Block (MDA)" with fixed and shifted window features to enable self- and cross-view information fusion from both the CC and MLO views of the same breast.
- Addressing the challenge of effectively combining data from multiple views or modalities, especially when images may not align correctly.
- Presenting results on the publicly available CBIS-DDSM and VinDr-Mammo datasets.
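
To make the cross-view fusion idea in the contributions above more concrete, here is a minimal PyTorch sketch of window-based cross-view attention, in which tokens from one view (for example CC) act as queries and tokens from the other view (for example MLO) supply the keys and values. This is an illustration under stated assumptions and not the MDA block from the paper: the names `window_partition` and `CrossViewWindowAttention`, the tensor shapes, and the use of `nn.MultiheadAttention` are all hypothetical choices made for this example, and the shifted-window variant is omitted.

```python
import torch
import torch.nn as nn


def window_partition(x: torch.Tensor, window_size: int) -> torch.Tensor:
    """Split (B, H, W, C) feature maps into non-overlapping windows.

    Returns a tensor of shape (B * num_windows, window_size**2, C).
    Assumes H and W are divisible by window_size.
    """
    B, H, W, C = x.shape
    x = x.reshape(B, H // window_size, window_size, W // window_size, window_size, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, window_size * window_size, C)


class CrossViewWindowAttention(nn.Module):
    """Hypothetical cross-view attention over fixed windows (illustration only)."""

    def __init__(self, dim: int, num_heads: int, window_size: int):
        super().__init__()
        self.window_size = window_size
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, query_view: torch.Tensor, context_view: torch.Tensor) -> torch.Tensor:
        # Queries come from one view; keys and values come from the other view.
        q = window_partition(query_view, self.window_size)
        kv = window_partition(context_view, self.window_size)
        fused, _ = self.attn(q, kv, kv)
        return fused


# Random stand-ins for CC and MLO feature maps: (batch, height, width, channels).
cc = torch.randn(2, 8, 8, 32)
mlo = torch.randn(2, 8, 8, 32)

block = CrossViewWindowAttention(dim=32, num_heads=4, window_size=4)
cc_fused = block(cc, mlo)   # CC windows attend to the matching MLO windows
mlo_fused = block(mlo, cc)  # MLO windows attend to the matching CC windows
print(cc_fused.shape, mlo_fused.shape)
```

Passing the same tensor as both arguments reduces the sketch to ordinary windowed self-attention, which is one simple way to picture how a single block could cover both self- and cross-view fusion. Pairing windows by position assumes the two views are roughly aligned, which real CC and MLO images generally are not.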
If you find this work useful, please consider citing our paper:
@article{sarker2024mv,
title={MV-Swin-T: Mammogram Classification with Multi-view Swin Transformer},
author={Sarker, Sushmita and Sarker, Prithul and Bebis, George and Tavakkoli, Alireza},
journal={arXiv preprint arXiv:2402.16298},
year={2024}
}