This project focuses on the classification of Invasive Ductal Carcinoma (IDC) in breast histopathology image patches. Using a dataset of 50×50 pixel images of tissue samples (either healthy or IDC-positive), we compare two machine learning pipelines:
- Hybrid Approach - CNN-based feature extraction followed by traditional ML classifiers.
- End-to-End Deep Learning - Fine-tuning pre-trained CNN models directly on labeled image patches.
📂 IDC_Breast_Cancer_Classification
├── IDC Breast Cancer Classification_Report.docx # Project report
├── idc-breast-cancer-classification_Final.ipynb # Jupyter notebook (main pipeline)
├── idc-breast-cancer-classification_Final.pdf # Notebook exported as PDF
├── idc-breast-cancer-classification_Final.html # Notebook exported as HTML
├── README.md # Project documentation (this file)
- Source: Breast Histopathology Images – Kaggle
- Size: 277,524 image patches (50×50 pixels)
- Classes:
  - 0 – Healthy Tissue
  - 1 – IDC-Positive (Invasive Ductal Carcinoma)
- Balanced subset: 12,000 samples per class
- Images resized to 50×50 RGB
- Dataset split: 70% training, 30% testing
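The balanced sampling and 70/30 split listed above can be sketched with the standard library alone. This is a minimal illustration, not the notebook's code: the function name `balanced_split`, the fixed seed, and the choice to split each class separately (a stratified split) are all assumptions, and the toy label list stands in for the real patch labels.

```python
import random
from collections import defaultdict


def balanced_split(labels, per_class, test_fraction=0.3, seed=42):
    """Undersample each class to `per_class` items, then split train/test.

    Hypothetical helper: returns two lists of indices into `labels`.
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for label, indices in sorted(by_class.items()):
        if len(indices) < per_class:
            raise ValueError(f"class {label} has only {len(indices)} samples")
        # Draw without replacement so every class contributes equally.
        chosen = rng.sample(indices, per_class)
        n_test = round(per_class * test_fraction)
        test_idx.extend(chosen[:n_test])
        train_idx.extend(chosen[n_test:])

    rng.shuffle(train_idx)
    rng.shuffle(test_idx)
    return train_idx, test_idx


# Toy labels standing in for the real patch labels (0 = healthy, 1 = IDC).
toy_labels = [0] * 700 + [1] * 300
train_idx, test_idx = balanced_split(toy_labels, per_class=200)
print(len(train_idx), len(test_idx))
```

If the split is applied to the balanced subset of 12,000 samples per class, the same proportions would give 16,800 training and 7,200 test patches.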
- Used MobileNetV2 (ImageNet pretrained, top layers removed) for feature extraction
- Trained models:
  - Logistic Regression
  - Random Forest
  - K-Nearest Neighbors
  - Gradient Boosting
  - XGBoost
- Best performance: XGBoost with 79% accuracy
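The second half of the hybrid pipeline, fitting a traditional classifier on fixed CNN features, might look roughly like the sketch below. It assumes scikit-learn and uses synthetic random vectors in place of real MobileNetV2 embeddings, so the printed accuracy says nothing about the project's results. The 1280-dimensional width assumes the backbone output is globally pooled, and Logistic Regression is used here only because it is the simplest of the listed models.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for CNN embeddings (NOT real data):
# 400 "patches" x 1280 features, with a small mean shift between classes.
n_per_class, n_features = 200, 1280
features = np.vstack([
    rng.normal(0.0, 1.0, size=(n_per_class, n_features)),
    rng.normal(0.1, 1.0, size=(n_per_class, n_features)),
])
labels = np.array([0] * n_per_class + [1] * n_per_class)

# 70/30 split, stratified so both parts stay class-balanced.
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.3, stratify=labels, random_state=0
)

# Any of the listed classifiers could be dropped in here.
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)

test_accuracy = accuracy_score(y_test, clf.predict(X_test))
print(f"test accuracy on synthetic features: {test_accuracy:.3f}")
```

In the real pipeline, `features` would come from running the image patches through the frozen backbone once and caching the result, which is what makes this approach cheap to iterate on.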
- Fine-tuned CNN models:
  - MobileNetV2: 84.07% test accuracy
  - ResNet50: 84.57% test accuracy
  - EfficientNetB0: 85.40% test accuracy
- Models trained with dropout, global average pooling, and a softmax head
- Evaluation metrics: Accuracy, Confusion Matrix, ROC-AUC
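A classifier with the head described above (global average pooling, dropout, softmax) could be assembled as in the following sketch. It assumes TensorFlow/Keras, which this README does not confirm as the framework used. The function name `build_classifier`, the dropout rate, the optimizer, and the learning rate are illustrative guesses. The backbone is built with `weights=None` here so the example runs offline; passing `"imagenet"` would match the pretrained setup described above.

```python
import tensorflow as tf


def build_classifier(backbone_weights=None, dropout_rate=0.3, num_classes=2):
    """Hypothetical builder: MobileNetV2 backbone plus a small softmax head."""
    backbone = tf.keras.applications.MobileNetV2(
        input_shape=(50, 50, 3),
        include_top=False,          # drop the ImageNet classification layers
        weights=backbone_weights,   # use "imagenet" for pretrained weights
    )

    inputs = tf.keras.Input(shape=(50, 50, 3))
    # Scale pixel values from [0, 255] to [-1, 1], as MobileNetV2 expects.
    x = tf.keras.layers.Rescaling(1.0 / 127.5, offset=-1.0)(inputs)
    x = backbone(x)
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    x = tf.keras.layers.Dropout(dropout_rate)(x)
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)

    model = tf.keras.Model(inputs, outputs)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),  # assumed value
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model


model = build_classifier()
print(model.output_shape)
```

Swapping `MobileNetV2` for `ResNet50` or `EfficientNetB0` from `tf.keras.applications` would follow the same pattern, though each backbone expects its own input scaling.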
- Best ML model (XGBoost): ~79% accuracy
- Best Deep Learning model (EfficientNetB0): 85.4% accuracy
- End-to-end fine-tuning outperformed the best hybrid model by about 6 percentage points (85.4% vs. ~79% accuracy), most likely because fine-tuning adapts the feature extractor to task-specific patterns.
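For reference, the three evaluation metrics named above (accuracy, confusion matrix, ROC-AUC) can be computed from first principles as in this dependency-free sketch. The function names and the toy scores are made up for illustration; the notebook presumably relies on a library implementation instead.

```python
def confusion_matrix(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]] for binary labels 0/1."""
    matrix = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """Probability that a random positive scores above a random negative.

    Ties count as half a win. This pairwise form is O(n_pos * n_neg),
    fine for a demo but slow for large test sets.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: predicted IDC probabilities for six patches.
y_true = [0, 0, 0, 1, 1, 1]
scores = [0.1, 0.4, 0.6, 0.35, 0.8, 0.9]
y_pred = [int(s >= 0.5) for s in scores]  # threshold at 0.5

print(confusion_matrix(y_true, y_pred))
print(round(accuracy(y_true, y_pred), 3))
print(round(roc_auc(y_true, scores), 3))
```

Unlike accuracy, ROC-AUC does not depend on the 0.5 threshold, which is why it is often reported alongside accuracy for screening-style tasks.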
- Integrate with real-time pathology tools for deployment
- Evaluate generalizability on whole-slide histology images
- Gandhar Ravindra Pansare (Master’s in Data Science – Indiana University Bloomington)
- Guided by Professor Krista Li
This project is licensed under the MIT License.
This project bridges computer vision and oncology using advanced machine learning to support faster, more accurate cancer diagnostics.