IDC Breast Cancer Classification 🧬

Overview

This project focuses on the classification of Invasive Ductal Carcinoma (IDC) in breast histopathology image patches. Using a dataset of 50×50 pixel images of tissue samples (either healthy or IDC-positive), we compare two machine learning pipelines:

- Hybrid Approach: CNN-based feature extraction followed by traditional ML classifiers.
- End-to-End Deep Learning: fine-tuning pre-trained CNN models directly on labeled image patches.

📁 Project Files

📂 IDC_Breast_Cancer_Classification
├── IDC Breast Cancer Classification_Report.docx   # Project report
├── idc-breast-cancer-classification_Final.ipynb   # Jupyter notebook (main pipeline)
├── idc-breast-cancer-classification_Final.pdf     # Notebook exported as PDF
├── idc-breast-cancer-classification_Final.html    # Notebook exported as HTML
└── README.md                                      # Project documentation (this file)

Dataset

Source: Breast Histopathology Images – Kaggle
Size: 277,524 image patches (50×50 pixels)
Classes:
- 0 – Healthy Tissue
- 1 – IDC-Positive (Invasive Ductal Carcinoma)

Methodology

🔹 Data Preprocessing

Balanced subset: 12,000 samples per class
Images resized to 50×50 RGB
Dataset split: 70% training, 30% testing
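
The balancing and splitting steps above can be sketched with the standard library alone. This is a minimal illustration, not the notebook's code: the function name `balanced_split`, the fixed seed, and the choice to split each class separately (so both splits stay balanced) are assumptions made for the example.

```python
import random
from collections import defaultdict


def balanced_split(labels, per_class=12_000, test_fraction=0.30, seed=42):
    """Undersample each class to `per_class` items, then split train/test.

    Returns two lists of indices into `labels`. Hypothetical helper;
    the per-class (stratified) split is an assumption.
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for label, indices in sorted(by_class.items()):
        if len(indices) < per_class:
            raise ValueError(f"class {label} has only {len(indices)} samples")
        # Random undersampling without replacement.
        chosen = rng.sample(indices, per_class)
        n_test = round(per_class * test_fraction)
        test_idx.extend(chosen[:n_test])
        train_idx.extend(chosen[n_test:])

    # Shuffle so the classes are interleaved within each split.
    rng.shuffle(train_idx)
    rng.shuffle(test_idx)
    return train_idx, test_idx


# Toy usage: 50 healthy and 30 IDC labels, 20 kept per class.
toy_labels = [0] * 50 + [1] * 30
train_idx, test_idx = balanced_split(toy_labels, per_class=20)
print(len(train_idx), len(test_idx))
```

With the defaults (12,000 samples per class, 30% held out), this scheme would give 16,800 training and 7,200 test patches.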

🔹 Hybrid Modeling (Feature Extraction + ML Classifiers)

Used MobileNetV2 (ImageNet pretrained, top layers removed) for feature extraction
Trained models:
- Logistic Regression
- Random Forest
- K-Nearest Neighbors
- Gradient Boosting
- XGBoost
Best performance: XGBoost with 79% accuracy
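
The hybrid pipeline can be sketched as follows, assuming TensorFlow/Keras and scikit-learn. The helper names, the use of global average pooling to flatten the feature maps, and the batch size are assumptions; Logistic Regression stands in for any of the classifiers listed above, and the others would be fitted on the same feature matrix.

```python
import numpy as np
import tensorflow as tf
from sklearn.linear_model import LogisticRegression


def build_feature_extractor(weights="imagenet"):
    """MobileNetV2 without its classification top, used as a frozen encoder.

    pooling="avg" (an assumption) reduces the final feature maps
    to one 1280-dimensional vector per image.
    """
    return tf.keras.applications.MobileNetV2(
        input_shape=(50, 50, 3),
        include_top=False,
        weights=weights,
        pooling="avg",
    )


def extract_features(extractor, images, batch_size=64):
    """Map uint8 RGB patches of shape (n, 50, 50, 3) to (n, 1280) features."""
    # MobileNetV2 expects pixel values scaled to [-1, 1].
    x = tf.keras.applications.mobilenet_v2.preprocess_input(
        images.astype("float32")
    )
    return extractor.predict(x, batch_size=batch_size, verbose=0)


def fit_hybrid(extractor, train_images, train_labels):
    """Fit a classical classifier on top of the extracted CNN features."""
    features = extract_features(extractor, train_images)
    clf = LogisticRegression(max_iter=1000)
    clf.fit(features, train_labels)
    return clf
```

At prediction time the same `extract_features` call is applied to the test patches before `clf.predict`, so the CNN weights never change during training.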

🔹 End-to-End Deep Learning

Fine-tuned CNN models:
- MobileNetV2: 84.07% test accuracy
- ResNet50: 84.57% test accuracy
- EfficientNetB0: 85.40% test accuracy
Models trained with dropout, global average pooling, and a softmax head
Evaluation metrics: Accuracy, Confusion Matrix, ROC-AUC
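
A fine-tuning model with the components named above (global average pooling, dropout, softmax head) might look like the sketch below, assuming TensorFlow/Keras. MobileNetV2 is used as the backbone for illustration; the dropout rate, learning rate, optimizer, and function name are assumptions rather than the notebook's settings. ResNet50 and EfficientNetB0 would slot in the same way, each with its own input preprocessing.

```python
import tensorflow as tf


def build_finetune_model(weights="imagenet", dropout_rate=0.3, learning_rate=1e-4):
    """Pre-trained backbone + GAP + dropout + 2-way softmax, all trainable."""
    base = tf.keras.applications.MobileNetV2(
        input_shape=(50, 50, 3), include_top=False, weights=weights
    )
    base.trainable = True  # end-to-end: backbone weights are updated too

    inputs = tf.keras.Input(shape=(50, 50, 3))
    # Scale raw pixel values from [0, 255] to [-1, 1], as MobileNetV2 expects.
    x = tf.keras.layers.Rescaling(1.0 / 127.5, offset=-1.0)(inputs)
    x = base(x)
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    x = tf.keras.layers.Dropout(dropout_rate)(x)
    outputs = tf.keras.layers.Dense(2, activation="softmax")(x)

    model = tf.keras.Model(inputs, outputs)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
        loss="sparse_categorical_crossentropy",  # integer labels 0 / 1
        metrics=["accuracy"],
    )
    return model
```

Training would then be a call such as `model.fit(train_images, train_labels, validation_split=0.1, epochs=...)`, with the epoch count left open here because the source does not state it.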

Key Results

Best ML model (XGBoost): ~79% accuracy
Best Deep Learning model (EfficientNetB0): 85.4% accuracy
End-to-end fine-tuning outperformed the best hybrid model by roughly 6 percentage points (85.4% vs. ~79% accuracy), most likely because fine-tuning adapts the feature extractor itself to task-specific patterns.
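
For reference, the three evaluation metrics behind these results can be computed from test-set predictions as sketched below. This is a plain-Python illustration with hypothetical function names, not the notebook's code; in practice a library such as scikit-learn would normally be used. The ROC-AUC here uses the pairwise-ranking definition, which is quadratic in the number of samples.

```python
def confusion_matrix(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]] for labels 0 (healthy) and 1 (IDC)."""
    matrix = [[0, 0], [0, 0]]
    for truth, pred in zip(y_true, y_pred):
        matrix[truth][pred] += 1
    return matrix


def accuracy(y_true, y_pred):
    """Fraction of patches whose predicted label matches the true label."""
    matrix = confusion_matrix(y_true, y_pred)
    return (matrix[0][0] + matrix[1][1]) / len(y_true)


def roc_auc(y_true, scores):
    """Probability that a random IDC patch scores above a random healthy one.

    Ties count as half a win. `scores` are predicted IDC probabilities.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy usage with four patches and a 0.5 decision threshold.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
y_pred = [int(s >= 0.5) for s in scores]
print(confusion_matrix(y_true, y_pred))  # [[2, 0], [1, 1]]
print(accuracy(y_true, y_pred))          # 0.75
print(roc_auc(y_true, scores))           # 0.75
```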

Future Enhancements

Integrate with real-time pathology tools for deployment
Evaluate generalizability on whole-slide histology images

👨‍💻 Contributor

Gandhar Ravindra Pansare (Master’s in Data Science – Indiana University Bloomington)
Guided by Professor Krista Li

⚖️ License

This project is licensed under the MIT License.

This project bridges computer vision and oncology using advanced machine learning to support faster, more accurate cancer diagnostics.
