This project focuses on the classification of Invasive Ductal Carcinoma (IDC) in breast histopathology image patches.

gandharpansare/IDC_Breast_Cancer_Classification

IDC Breast Cancer Classification 🧬

Overview

This project focuses on the classification of Invasive Ductal Carcinoma (IDC) in breast histopathology image patches. Using a dataset of 50×50 pixel images of tissue samples (either healthy or IDC-positive), we compare two machine learning pipelines:

  1. Hybrid Approach - CNN-based feature extraction followed by traditional ML classifiers.
  2. End-to-End Deep Learning - Fine-tuning pre-trained CNN models directly on labeled image patches.

📁 Project Files

📂 IDC_Breast_Cancer_Classification
├── IDC Breast Cancer Classification_Report.docx # Project report
├── idc-breast-cancer-classification_Final.ipynb # Jupyter notebook (main pipeline)
├── idc-breast-cancer-classification_Final.pdf # Notebook exported as PDF
├── idc-breast-cancer-classification_Final.html # Notebook exported as HTML
└── README.md # Project documentation (this file)


Dataset


Methodology

🔹 Data Preprocessing

  • Balanced subset: 12,000 samples per class
  • Images resized to 50×50 RGB
  • Dataset split: 70% training, 30% testing
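The sampling and split steps above could be sketched as follows. This is a minimal illustration, not the notebook's actual code: the function name `balanced_split`, the `(path, label)` input format, the fixed seed, and the choice to split each class separately (a stratified split) are all assumptions.

```python
import random
from collections import defaultdict


def balanced_split(paths_and_labels, per_class=12_000, test_fraction=0.30, seed=42):
    """Sample `per_class` items per label, then split each class into train/test.

    `paths_and_labels` is an iterable of (path, label) pairs (hypothetical format).
    Splitting per class keeps both sets balanced; the README does not say
    whether the original split was stratified, so this is an assumption.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for path, label in paths_and_labels:
        by_label[label].append(path)

    train, test = [], []
    for label, paths in sorted(by_label.items()):
        if len(paths) < per_class:
            raise ValueError(f"label {label!r} has only {len(paths)} samples")
        chosen = rng.sample(paths, per_class)  # sampling without replacement
        n_test = round(per_class * test_fraction)
        test += [(p, label) for p in chosen[:n_test]]
        train += [(p, label) for p in chosen[n_test:]]

    rng.shuffle(train)
    rng.shuffle(test)
    return train, test
```

With the defaults, each class contributes 8,400 training and 3,600 test patches, for 16,800 and 7,200 in total.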

🔹 Hybrid Modeling (Feature Extraction + ML Classifiers)

  • Used MobileNetV2 (ImageNet pretrained, top layers removed) for feature extraction
  • Trained models:
    • Logistic Regression
    • Random Forest
    • K-Nearest Neighbors
    • Gradient Boosting
    • XGBoost
  • Best performance: XGBoost with 79% accuracy
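A rough sketch of the hybrid pipeline, assuming TensorFlow/Keras and scikit-learn. The helper names, the use of global average pooling to flatten the feature maps, the batch size, and `max_iter` are illustrative assumptions. Logistic Regression stands in for the classifier stage here only to keep dependencies small; the best reported hybrid model was XGBoost.

```python
import numpy as np
import tensorflow as tf
from sklearn.linear_model import LogisticRegression


def build_extractor(weights="imagenet"):
    """MobileNetV2 without its classification top, used as a feature extractor.

    pooling="avg" (an assumption) reduces the final feature maps to one
    1280-dimensional vector per image.
    """
    return tf.keras.applications.MobileNetV2(
        input_shape=(50, 50, 3),
        include_top=False,
        weights=weights,
        pooling="avg",
    )


def extract_features(extractor, images):
    """Turn images of shape (n, 50, 50, 3) with values in [0, 255] into features."""
    x = tf.keras.applications.mobilenet_v2.preprocess_input(
        np.asarray(images, dtype="float32")
    )
    return extractor.predict(x, batch_size=64, verbose=0)


def fit_classifier(train_features, train_labels):
    """Fit a traditional classifier on the extracted CNN features."""
    clf = LogisticRegression(max_iter=1000)
    clf.fit(train_features, train_labels)
    return clf
```

The CNN is never updated in this setup: it runs once over the images, and only the downstream classifier is trained on the resulting vectors.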

🔹 End-to-End Deep Learning

  • Fine-tuned CNN models:
    • MobileNetV2: 84.07% test accuracy
    • ResNet50: 84.57% test accuracy
    • EfficientNetB0: 85.40% test accuracy
  • Models trained with dropout, global average pooling, and a softmax head
  • Evaluation metrics: Accuracy, Confusion Matrix, ROC-AUC
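The architecture described above (pre-trained backbone, global average pooling, dropout, softmax head) might look like this in Keras. The function name, dropout rate, learning rate, optimizer, and loss are assumptions for illustration; the README does not state the values actually used.

```python
import tensorflow as tf


def build_finetune_model(weights="imagenet", dropout_rate=0.3, learning_rate=1e-4):
    """EfficientNetB0 backbone with a two-class softmax head (hypothetical settings)."""
    base = tf.keras.applications.EfficientNetB0(
        input_shape=(50, 50, 3),
        include_top=False,
        weights=weights,
    )
    base.trainable = True  # fine-tune the backbone rather than freezing it

    inputs = tf.keras.Input(shape=(50, 50, 3))
    x = base(inputs)
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    x = tf.keras.layers.Dropout(dropout_rate)(x)
    outputs = tf.keras.layers.Dense(2, activation="softmax")(x)  # healthy vs. IDC

    model = tf.keras.Model(inputs, outputs)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
        loss="sparse_categorical_crossentropy",  # integer labels 0/1
        metrics=["accuracy"],
    )
    return model
```

Swapping in `MobileNetV2` or `ResNet50` from `tf.keras.applications` would follow the same pattern, though each backbone expects its own input preprocessing.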

Key Results

  • Best ML model (XGBoost): ~79% accuracy
  • Best Deep Learning model (EfficientNetB0): 85.4% accuracy
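  • End-to-end fine-tuning beat the best hybrid model by roughly 6 percentage points (85.4% vs. ~79%), likely because fine-tuning adapts the feature extractor to task-specific patterns, whereas the hybrid pipeline relies on ImageNet features as-is.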

Future Enhancements

  • Integrate with real-time pathology tools for deployment
  • Evaluate generalizability on whole-slide histology images

👨‍💻 Contributor

  • Gandhar Ravindra Pansare (Master’s in Data Science – Indiana University Bloomington)
  • Guided by Professor Krista Li

⚖️ License

This project is licensed under the MIT License.


This project bridges computer vision and oncology using advanced machine learning to support faster, more accurate cancer diagnostics.
