An educational demonstration of how machine learning can support clinical decision-making in oncology using transparent and interpretable models.
This project demonstrates an end-to-end workflow for AI-assisted clinical decision support:
- Narrative to Structured Data - Converting unstructured clinical text into structured variables
- Predictive Modeling - Building interpretable ML models for:
  - 2-year cancer recurrence risk (Random Forest)
  - Treatment toxicity risk (Logistic Regression)
- Explainability - Using SHAP values to explain predictions at both global and patient levels
- Clinical Integration - Presenting results in clinician-friendly formats
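The README does not say how the notebook turns narrative notes into variables; rule-based parsing, manual abstraction, and language models are all plausible approaches. The minimal sketch below uses Python's `re` module to illustrate the rule-based version of the idea. The field names (`age`, `stage`, `egfr_mutant`, `pack_years`), the `StructuredNote` class, and the patterns are hypothetical, not taken from the notebook, and would miss many real-world phrasings.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class StructuredNote:
    """Hypothetical structured record extracted from one clinical note."""
    age: Optional[int]
    stage: Optional[str]
    egfr_mutant: Optional[bool]
    pack_years: Optional[float]


# Illustrative patterns only; real notes need far more robust handling
AGE_RE = re.compile(r"\b(\d{2,3})[- ]year[- ]old\b", re.IGNORECASE)
STAGE_RE = re.compile(r"\bstage\s+(IV|I{1,3})([ABC])?\b", re.IGNORECASE)
EGFR_RE = re.compile(r"\bEGFR[\s:-]+(positive|negative|mutant|wild[- ]type)\b", re.IGNORECASE)
PACK_RE = re.compile(r"\b(\d+(?:\.\d+)?)[- ]pack[- ]years?\b", re.IGNORECASE)


def extract(note: str) -> StructuredNote:
    """Pull a few structured variables out of free text; missing fields stay None."""
    age = AGE_RE.search(note)
    stage = STAGE_RE.search(note)
    egfr = EGFR_RE.search(note)
    packs = PACK_RE.search(note)
    return StructuredNote(
        age=int(age.group(1)) if age else None,
        # Combine Roman numeral and optional substage, e.g. "III" + "A"
        stage=(stage.group(1) + (stage.group(2) or "")).upper() if stage else None,
        egfr_mutant=(egfr.group(1).lower() in {"positive", "mutant"}) if egfr else None,
        pack_years=float(packs.group(1)) if packs else None,
    )


note = (
    "67-year-old male, 40 pack-year smoking history, "
    "diagnosed with stage IIIA adenocarcinoma; EGFR-positive."
)
print(extract(note))
```

Fields that are not found stay `None`, so downstream code can distinguish "not documented" from a negative finding.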
Prerequisites:

- Python 3.12 or higher
```bash
# Clone the repository
git clone <repository-url>
cd cas_demo

# Create and activate virtual environment
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Launch JupyterLab
jupyter lab
```

Then open `AI_Clinical_Oncology_Demo.ipynb` in the JupyterLab interface.
```text
cas_demo/
├── AI_Clinical_Oncology_Demo.ipynb   # Main demonstration notebook
├── data/
│   └── lung_cancer_dataset_toxicity_survival_interaction.csv   # Synthetic dataset
├── pyproject.toml                     # Project configuration
├── requirements.txt                   # Python dependencies
└── README.md                          # This file
```
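To confirm the dataset loads correctly before running the notebook, a short pandas sketch can read the CSV from the `data/` path in the layout above and report missing values. The helper names `load_cohort` and `summarize_cohort` are hypothetical and are not functions defined in the notebook.

```python
from pathlib import Path

import pandas as pd

# Path taken from the project layout; assumes the script runs from the repo root
DATA_PATH = Path("data") / "lung_cancer_dataset_toxicity_survival_interaction.csv"


def summarize_cohort(df: pd.DataFrame) -> pd.Series:
    """Print the table shape and return the fraction missing per column,
    keeping only columns with at least one missing value."""
    print(f"{len(df):,} rows x {df.shape[1]} columns")
    missing = df.isna().mean()
    return missing[missing > 0].sort_values(ascending=False)


def load_cohort(path: Path = DATA_PATH) -> pd.DataFrame:
    """Read the synthetic cohort CSV and print a missingness report."""
    df = pd.read_csv(path)
    missing = summarize_cohort(df)
    print(missing.round(3).to_string() if not missing.empty else "no missing values")
    return df
```

Calling `load_cohort()` from the repository root should report a row and column count that can be compared against the dataset description.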
| Package | Purpose |
|---|---|
| jupyterlab | Interactive notebook environment |
| pandas | Data manipulation and analysis |
| scikit-learn | Machine learning models |
| shap | Model explainability |
| matplotlib | Visualizations |
| seaborn | Statistical visualizations |
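SHAP (listed above for model explainability) attributes each prediction to per-feature contributions that add up to the model output. For a linear model such as the toxicity logistic regression, and assuming independent features, the SHAP value of feature i on the log-odds scale has a closed form, phi_i = beta_i * (x_i - E[x_i]). The pure-Python sketch below uses made-up coefficients, means, and feature names (`age`, `stage`, `prior_chemo`) to produce a patient-level breakdown and check the additivity property. It does not call the `shap` library; the Random Forest would instead need a tree-specific algorithm such as shap's `TreeExplainer`.

```python
import math


def linear_shap(coefs, intercept, x, background_means):
    """Exact SHAP values (log-odds scale) for a linear/logistic model,
    assuming independent features: phi_i = beta_i * (x_i - E[x_i])."""
    phi = [b * (xi - mu) for b, xi, mu in zip(coefs, x, background_means)]
    # Base value = model output at the average patient (linear model)
    base = intercept + sum(b * mu for b, mu in zip(coefs, background_means))
    return base, phi


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical toxicity model; coefficients and means are illustrative only
features = ["age", "stage", "prior_chemo"]
coefs = [0.05, 0.4, 0.8]
intercept = -5.0
means = [65.0, 2.5, 0.3]
patient = [72.0, 3.0, 1.0]

base, phi = linear_shap(coefs, intercept, patient, means)
logit = intercept + sum(b * xi for b, xi in zip(coefs, patient))

# Additivity: base value + sum of contributions reproduces the patient's log-odds
assert math.isclose(base + sum(phi), logit)

for name, contrib in zip(features, phi):
    print(f"{name:12s} {contrib:+.2f} log-odds")
print(f"predicted toxicity risk: {sigmoid(logit):.1%}")
```

Averaging the absolute contributions over many patients gives a global importance ranking, which complements the per-patient breakdown.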
The project uses a synthetic dataset of ~10,000 lung cancer patients with the following characteristics:
- 24 variables including demographics, tumor characteristics, molecular markers, and outcomes
- Outcomes: 2-year recurrence, 5-year survival, treatment toxicity
- Educational purposes only - does not contain real patient information
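The README does not list the exact features or hyperparameters the notebook uses. As a hedged sketch, the code below simulates a small cohort with hypothetical features (`age`, `stage`, `pack_years`, `egfr_mutant`) and hypothetical outcome mechanisms, then fits the two model types named earlier: a Random Forest for 2-year recurrence and a logistic regression for toxicity, each scored by ROC AUC on a held-out split. It uses scikit-learn from the dependency list; the helper `fit_and_score` is illustrative only.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 2000

# Hypothetical predictors; NOT the real dataset's column names
age = rng.normal(65, 10, n)
stage = rng.integers(1, 5, n)            # stage I-IV encoded as 1-4
pack_years = rng.gamma(2.0, 15.0, n)
egfr_mutant = rng.integers(0, 2, n)      # 1 = EGFR-mutant
X = np.column_stack([age, stage, pack_years, egfr_mutant])

# Hypothetical outcome mechanisms, used only to simulate labels
logit_rec = -3.0 + 0.8 * stage + 0.01 * pack_years - 0.5 * egfr_mutant
logit_tox = -6.0 + 0.07 * age + 0.2 * stage
y_rec = rng.random(n) < 1 / (1 + np.exp(-logit_rec))
y_tox = rng.random(n) < 1 / (1 + np.exp(-logit_tox))


def fit_and_score(model, X, y):
    """Fit on a stratified 75% split and return (model, held-out ROC AUC)."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.25, random_state=0, stratify=y
    )
    model.fit(X_tr, y_tr)
    return model, roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])


# 2-year recurrence: Random Forest (min_samples_leaf limits overfitting)
rf, auc_rec = fit_and_score(
    RandomForestClassifier(n_estimators=200, min_samples_leaf=20, random_state=0),
    X, y_rec,
)
# Toxicity: logistic regression on standardized features
lr, auc_tox = fit_and_score(
    make_pipeline(StandardScaler(), LogisticRegression()), X, y_tox
)
print(f"recurrence AUC: {auc_rec:.2f}  toxicity AUC: {auc_tox:.2f}")
```

One plausible rationale for this split: logistic regression keeps a single readable coefficient per feature, which suits a toxicity model clinicians may want to audit, while the forest can capture interactions (for example stage by mutation status) at the cost of needing SHAP to explain its predictions.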
This is an educational demonstration for decision-support exploration only. All predictions should be interpreted by qualified healthcare professionals in the context of the individual patient. The models are trained on synthetic data and should not be used for actual clinical decisions.
This project is for educational purposes.