Neo LS-SVM is a modern [Least-Squares Support Vector Machine](https://en.wikipedia.org/wiki/Least-squares_support_vector_machine) implementation that offers these benefits:
5. 🌀 Learns an affine transformation of the feature matrix to optimally separate the target's bins.
6. 🪞 Can solve the LS-SVM both in the primal and dual space.
7. 🌡️ Isotonically calibrated `predict_proba` based on the leave-one-out predictions.
8. 🎲 Asymmetric conformal Bayesian confidence intervals for classification and regression.

## Using

### Installing

First, install this package with:

```bash
pip install neo-ls-svm
```

### Classification and regression

Then, you can import `neo_ls_svm.NeoLSSVM` as an sklearn-compatible binary classifier and regressor. Example usage:
```python
from neo_ls_svm import NeoLSSVM
from pandas import get_dummies
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split

# Binary classification example:
X, y = fetch_openml("churn", version=3, return_X_y=True, as_frame=True, parser="auto")
X_train, X_test, y_train, y_test = train_test_split(get_dummies(X), y, test_size=0.15, random_state=42)
model = NeoLSSVM().fit(X_train, y_train)
model.score(X_test, y_test)  # 93.1% (compared to sklearn.svm.SVC's 89.6%)

# Regression example:
X, y = fetch_openml("ames_housing", version=1, return_X_y=True, as_frame=True, parser="auto")
X_train, X_test, y_train, y_test = train_test_split(get_dummies(X), y, test_size=0.15, random_state=42)
model = NeoLSSVM().fit(X_train, y_train)
model.score(X_test, y_test)  # 82.4% (compared to sklearn.svm.SVR's -11.8%)
```
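Feature 7 above mentions an isotonically calibrated `predict_proba`: a monotone map is fitted from raw decision scores to probabilities. Below is a minimal standalone sketch of isotonic calibration using scikit-learn only. The synthetic scores and labels are made up for illustration; this is not Neo LS-SVM's internal code.

```python
# Sketch of isotonic calibration: fit a monotone, [0, 1]-bounded map from
# raw decision scores to probabilities. Synthetic data for illustration only.
import numpy as np
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(42)
scores = rng.normal(size=200)  # raw (uncalibrated) decision scores
labels = (scores + rng.normal(scale=0.5, size=200) > 0.0).astype(float)

# The fitted map is nondecreasing in the score and clipped to [0, 1].
calibrator = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
probabilities = calibrator.fit_transform(scores, labels)
```

Because the calibrator is monotone, ranking by calibrated probability agrees with ranking by raw score, so calibration changes the probability estimates without changing the classifier's ordering of examples.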
### Confidence intervals

Neo LS-SVM implements conformal prediction with a Bayesian nonconformity estimate to compute confidence intervals for both classification and regression. Example usage:

```python
from neo_ls_svm import NeoLSSVM
from pandas import get_dummies
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split

# Load a regression problem and split into train and test.
X, y = fetch_openml("ames_housing", version=1, return_X_y=True, as_frame=True, parser="auto")
X_train, X_test, y_train, y_test = train_test_split(get_dummies(X), y, test_size=50, random_state=42)

# Fit a Neo LS-SVM model.
model = NeoLSSVM().fit(X_train, y_train)

# Predict the house prices and confidence intervals on the test set.
```
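To make the conformal idea concrete, here is a rough standalone sketch of plain split conformal regression intervals, built from NumPy and scikit-learn only. It uses symmetric intervals, absolute-residual nonconformity scores, and a ridge model as a hypothetical stand-in; Neo LS-SVM's actual intervals are asymmetric and use a Bayesian nonconformity estimate instead.

```python
# Split conformal regression intervals from scratch (symmetric variant).
# Not Neo LS-SVM's method: a generic illustration on synthetic data.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + rng.normal(scale=0.3, size=400)

# Hold out a calibration set the model never trains on.
X_fit, X_cal, y_fit, y_cal = train_test_split(X, y, test_size=0.5, random_state=0)
model = Ridge().fit(X_fit, y_fit)

# Nonconformity scores: absolute residuals on the calibration set.
alpha = 0.05  # target 95% coverage
residuals = np.abs(y_cal - model.predict(X_cal))
q = np.quantile(residuals, np.ceil((1 - alpha) * (len(residuals) + 1)) / len(residuals))

# Widen each point prediction by the calibrated residual quantile.
X_new = rng.normal(size=(50, 4))
pred = model.predict(X_new)
lower, upper = pred - q, pred + q
```

With exchangeable data, intervals built this way cover the true target with probability at least `1 - alpha`; the finite-sample correction `(n + 1) / n` on the quantile level is what makes that guarantee hold exactly.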
## Benchmarks

We select all binary classification and regression datasets below 1M entries from the [AutoML Benchmark](https://arxiv.org/abs/2207.12560). Each dataset is split into 85% for training and 15% for testing. We apply `skrub.TableVectorizer` as a preprocessing step for `neo_ls_svm.NeoLSSVM`, `sklearn.svm.SVC`, and `sklearn.svm.SVR` to vectorize the pandas DataFrame training data into a NumPy array. Models are fitted only once on each dataset, with their default settings and no hyperparameter tuning.