Neo LS-SVM is a modern [Least-Squares Support Vector Machine](https://en.wikipedia.org/wiki/Least-squares_support_vector_machine) implementation in Python that offers several benefits over sklearn's classic `sklearn.svm.SVC` classifier and `sklearn.svm.SVR` regressor:

1. ⚡ Linear complexity in the number of training examples with [Orthogonal Random Features](https://arxiv.org/abs/1610.09072).
2. 🚀 Hyperparameter free: zero-cost optimization of the [regularisation parameter γ](https://en.wikipedia.org/wiki/Ridge_regression#Tikhonov_regularization) and [kernel parameter σ](https://en.wikipedia.org/wiki/Radial_basis_function_kernel).
3. 🏔️ Adds a new tertiary objective that minimizes the complexity of the prediction surface.
4. 🎁 Returns the leave-one-out residuals and error for free after fitting.
5. 🌀 Learns an affine transformation of the feature matrix to optimally separate the target's bins.
6. 🪞 Can solve the LS-SVM both in the primal and dual space.
7. 🌡️ Isotonically calibrated `predict_proba`.
8. ✅ Conformally calibrated `predict_quantiles` and `predict_interval`.
9. 🔔 Bayesian estimation of the predictive standard deviation with `predict_std`.
10. 🐼 Pandas DataFrame output when the input is a pandas DataFrame.

## Using

### Installing

First, install this package with:

```bash
pip install neo-ls-svm
```
Then fit a `NeoLSSVM` model on a training set and score it on a held-out test set:

```python
model = NeoLSSVM().fit(X_train, y_train)
model.score(X_test, y_test)  # 82.4% (compared to sklearn.svm.SVR's -11.8%)
```
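For reference, the baseline figure quoted in the comment above is sklearn's `SVR`. A minimal sketch of that comparison, assuming the same train/test split and default settings (the exact preprocessing behind the quoted numbers is an assumption):

```python
from sklearn.svm import SVR

# Baseline comparison sketch: sklearn's SVR with default settings on the same split.
# Assumes the X_train, X_test, y_train, y_test used in the example above.
baseline = SVR().fit(X_train, y_train)
baseline.score(X_test, y_test)  # Reported above as -11.8%.
```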
### Predicting quantiles

Neo LS-SVM implements conformal prediction with a Bayesian nonconformity estimate to compute quantiles and prediction intervals for both classification and regression. Example usage:

```python
from neo_ls_svm import NeoLSSVM
from pandas import get_dummies
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split

# Load a regression problem and split in train and test.
X, y = fetch_openml("ames_housing", version=1, return_X_y=True, as_frame=True, parser="auto")
X_train, X_test, y_train, y_test = train_test_split(get_dummies(X), y, test_size=50, random_state=42)
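
# Hypothetical continuation (assumed API): fit a model and predict the test set's
# sale price quantiles. The `quantiles=` keyword is an assumption about predict_quantiles' signature.
model = NeoLSSVM().fit(X_train, y_train)
ŷ_test_quantiles = model.predict_quantiles(X_test, quantiles=(0.025, 0.5, 0.975))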
```

In addition to quantile prediction, you can use `predict_interval` to compute conformally calibrated prediction intervals. Compared to quantiles, these are optimized for reliable coverage rather than quantile accuracy. Example usage:

```python
# Compute prediction intervals for the houses in the test set.
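# Hypothetical sketch (assumed API): 95% prediction intervals for the test set;
# the `coverage=` keyword is an assumption about predict_interval's signature.
ŷ_test_interval = model.predict_interval(X_test, coverage=0.95)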
```

When the input data is a pandas DataFrame, the output is also a pandas DataFrame. For example, printing the head of `ŷ_test_interval` yields:

| house_id |    0.025 |    0.975 |
|---------:|---------:|---------:|
|     1357 | 114283.0 | 245849.2 |
|     2367 |  85518.3 | 114411.4 |
|     2822 | 147165.9 | 292179.2 |
|     2126 |  81788.7 | 122838.1 |
|     1544 |  94507.1 | 284062.6 |
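The feature list above also mentions `predict_std` for a Bayesian estimate of the predictive standard deviation. A minimal sketch, assuming it follows the same call pattern as the other prediction methods (the exact signature is not shown here):

```python
# Hypothetical sketch (assumed API): Bayesian estimate of the predictive standard
# deviation for the houses in the test set.
ŷ_test_std = model.predict_std(X_test)
```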
## Benchmarks
We select all binary classification and regression datasets below 1M entries from the [AutoML Benchmark](https://arxiv.org/abs/2207.12560). Each dataset is split into 85% for training and 15% for testing. We apply `skrub.TableVectorizer` as a preprocessing step for `neo_ls_svm.NeoLSSVM` and `sklearn.svm.SVC,SVR` to vectorize the pandas DataFrame training data into a NumPy array. Models are fitted only once on each dataset, with their default settings and no hyperparameter tuning.
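As an illustration of this setup, here is a minimal sketch of one benchmark run, assuming a plain scikit-learn pipeline (the choice of the `ames_housing` task as the example dataset is an assumption):

```python
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from skrub import TableVectorizer

from neo_ls_svm import NeoLSSVM

# One example run: vectorize the pandas DataFrame with skrub's TableVectorizer,
# then fit Neo LS-SVM with its default settings and no hyperparameter tuning.
X, y = fetch_openml("ames_housing", version=1, return_X_y=True, as_frame=True, parser="auto")
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.15)  # 85%/15% split as described above.
model = make_pipeline(TableVectorizer(), NeoLSSVM())
model.fit(X_train, y_train)
model.score(X_test, y_test)
```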