- Overview
- Key Contributions
- Dataset
- Data Preprocessing
- Evaluation Strategy
- Models
- Dimensionality Reduction & Feature Selection
- Results
- Discussion
- Conclusion
This project builds and compares two classic classifiers, Support Vector Machine (SVM) and Decision Tree (DT), to predict fetal health status (Normal, Suspect, Pathological) from cardiotocogram (CTG) measurements. Class balancing and feature selection are used to improve detection of high‑risk cases.
- Balanced a 3‑class CTG dataset via manual oversampling to ensure equal representation.
- Demonstrated the effect of feature scaling, hold‑out evaluation, and stratified splitting on model performance.
- Benchmarked SVM vs. DT on accuracy and macro F1‑score; DT + Correlation + RFE achieved the best results (91.5% accuracy, 0.84 macro F1).
- Explored dimensionality reduction (PCA, LDA) and feature selection (Correlation + RFE) to optimize model performance.
- Source: Kaggle “Fetal Health Classification” (2,126 CTG records).
- Classes (original counts):
  - Normal: 1,655
  - Suspect: 295
  - Pathological: 176
- Balancing
  - Duplicated minority‑class samples to match the Normal class count (1,655 each).
- Scaling
  - Applied `StandardScaler` so all features have zero mean and unit variance.
- Train/Test Split
  - 80/20 stratified hold‑out split to preserve class ratios.
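
The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the project's actual code: the random feature matrix, the helper name `oversample_by_duplication`, the random seeds, and the choice to fit the scaler on the training portion only are all assumptions. Only the class counts, the 21-feature width, the 80/20 stratified split, and the use of `StandardScaler` come from this README.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for the CTG table: 21 features, original class counts.
counts = {1: 1655, 2: 295, 3: 176}  # 1=Normal, 2=Suspect, 3=Pathological
y = np.concatenate([np.full(n, label) for label, n in counts.items()])
X = rng.normal(loc=5.0, scale=3.0, size=(y.size, 21))


def oversample_by_duplication(X, y, rng):
    """Randomly duplicate rows of smaller classes until all classes match the largest."""
    labels, class_counts = np.unique(y, return_counts=True)
    target = class_counts.max()
    parts = []
    for label, n in zip(labels, class_counts):
        idx = np.flatnonzero(y == label)
        # Sample with replacement only the rows needed to reach the target count.
        extra = rng.choice(idx, size=target - n, replace=True)
        parts.append(np.concatenate([idx, extra]))
    keep = np.concatenate(parts)
    return X[keep], y[keep]


# 1) Balance: every class ends up with 1,655 rows.
X_bal, y_bal = oversample_by_duplication(X, y, rng)

# 2) 80/20 stratified hold-out split.
X_train, X_test, y_train, y_test = train_test_split(
    X_bal, y_bal, test_size=0.20, stratify=y_bal, random_state=0
)

# 3) Scale (assumption: statistics are learned from the training portion only).
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)
```

One design point worth keeping in mind: when duplication happens before the split, as in this ordering, copies of the same record can land on both sides of the split. Oversampling only the training portion after splitting is a common alternative.
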
- Hold‑Out Method with stratification
- Metrics:
  - Overall Accuracy
  - Macro F1‑Score (gives equal weight to each class)
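
To make the difference between the two metrics concrete, here is a small pure-Python sketch. The toy labels are made up for illustration and are not taken from the CTG data; the function names are likewise hypothetical. Macro F1 computes one F1 score per class and averages them without weighting, so a model that ignores the rare classes scores poorly even when its accuracy looks good.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example: a model that always predicts "Normal".
y_true = ["Normal"] * 8 + ["Suspect", "Pathological"]
y_pred = ["Normal"] * 10

print(f"accuracy = {accuracy(y_true, y_pred):.3f}")  # accuracy = 0.800
print(f"macro F1 = {macro_f1(y_true, y_pred):.3f}")  # macro F1 = 0.296
```
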
| Model | Key Settings |
|---|---|
| SVM | Linear kernel; C tuned via grid search |
| Decision Tree (DT) | Gini criterion; max depth tuned via cross‑validation |
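
A hedged sketch of how the two models in the table might be tuned with scikit-learn is shown below. The candidate values for `C` and `max_depth`, the 5-fold setting, the `f1_macro` scoring choice, and the synthetic data from `make_classification` are all assumptions for illustration; the README only states that `C` was tuned by grid search and the tree depth by cross-validation.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic 3-class data with 21 features, standing in for the CTG training set.
X, y = make_classification(
    n_samples=300, n_features=21, n_informative=8, n_classes=3, random_state=0
)

# Linear-kernel SVM: search over the regularization strength C.
svm_search = GridSearchCV(
    SVC(kernel="linear"),
    param_grid={"C": [0.1, 1, 10]},  # hypothetical grid
    scoring="f1_macro",
    cv=5,
)
svm_search.fit(X, y)

# Gini decision tree: pick max_depth by cross-validation.
dt_search = GridSearchCV(
    DecisionTreeClassifier(criterion="gini", random_state=0),
    param_grid={"max_depth": [3, 5, 10, None]},  # hypothetical grid
    scoring="f1_macro",
    cv=5,
)
dt_search.fit(X, y)

print("best SVM params:", svm_search.best_params_)
print("best DT params:", dt_search.best_params_)
```
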
- PCA (95% Variance)
  - Reduced 21 features → 14 components; slight drop in accuracy.
- LDA (2 Components)
  - Reduced to 2 dimensions; further drop in accuracy.
- Correlation + RFE
  - Filtered out low‑correlation features, then applied Recursive Feature Elimination to pick the top 10.
  - Best performance:
    - DT + RFE → 91.5% accuracy
    - SVM + RFE → 89.4% accuracy
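
The three approaches above could look roughly like this in scikit-learn. This is a sketch on synthetic data under several assumptions: the correlation filter is taken to mean absolute Pearson correlation with the class label, it keeps the 15 highest-ranked features (a hypothetical cut-off), and RFE is wrapped around a decision tree. The README does not specify any of these details, so component counts and selected features here will not match the reported ones.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.feature_selection import RFE
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic 3-class data with 21 features, standardized as in preprocessing.
X, y = make_classification(
    n_samples=600, n_features=21, n_informative=10, n_redundant=4,
    n_classes=3, random_state=0,
)
X = StandardScaler().fit_transform(X)

# PCA: a float n_components keeps the fewest components reaching 95% variance.
pca = PCA(n_components=0.95).fit(X)
X_pca = pca.transform(X)

# LDA: at most (n_classes - 1) = 2 discriminant axes for a 3-class problem.
lda = LinearDiscriminantAnalysis(n_components=2).fit(X, y)
X_lda = lda.transform(X)

# Correlation filter (assumption): rank features by |Pearson r| with the label
# and keep the 15 highest-ranked ones.
corr = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])])
kept = np.sort(np.argsort(corr)[-15:])

# RFE: repeatedly drop the least important feature until 10 remain.
rfe = RFE(DecisionTreeClassifier(random_state=0), n_features_to_select=10)
rfe.fit(X[:, kept], y)
selected = kept[rfe.support_]

print("PCA components kept:", X_pca.shape[1])
print("LDA output shape:", X_lda.shape)
print("Selected feature indices:", selected.tolist())
```
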
| Configuration | Accuracy | Macro F1‑Score |
|---|---|---|
| SVM (raw features) | 88.0% | 0.805 |
| DT (raw features) | 89.7% | 0.812 |
| SVM + PCA | 87.1% | 0.79 |
| DT + PCA | 87.2% | 0.79 |
| SVM + LDA | 83.6% | 0.75 |
| DT + LDA | 86.6% | 0.80 |
| SVM + Correlation + RFE | 89.4% | 0.82 |
| DT + Correlation + RFE | 91.5% | 0.84 |
- Feature selection (Correlation + RFE) outperformed dimensionality reduction, highlighting the importance of retaining informative variables rather than compressing them.
- Decision Tree matched or exceeded SVM on macro F1 in every configuration (0.84 vs. 0.82 after RFE), suggesting more even performance across the three classes.
- PCA and LDA helped visualization but removed discriminative power for this multiclass problem.
Balancing the dataset, scaling features, and selecting the most relevant CTG measurements enable a simple Decision Tree to reach 91.5% accuracy in fetal health classification, providing a lightweight, interpretable model for early risk detection.