```python
import xgboost as xgb
from sklearn.metrics import (
    confusion_matrix,
    accuracy_score,
    precision_score,
    recall_score,
    roc_auc_score,
)
from sklearn.model_selection import train_test_split

# PathLike, read_pairs, Judgement, pairs_to_arrays and log are not imported
# in this snippet; they are assumed to be defined elsewhere in the project.


def train_matcher(pairs_file: PathLike) -> None:
    pairs = []
    for pair in read_pairs(pairs_file):
        # Treat "unsure" judgements as negatives so the task stays binary.
        if pair.judgement == Judgement.UNSURE:
            pair.judgement = Judgement.NEGATIVE
        pairs.append(pair)
    positive = len([p for p in pairs if p.judgement == Judgement.POSITIVE])
    negative = len([p for p in pairs if p.judgement == Judgement.NEGATIVE])
    log.info("Total pairs loaded: %d (%d pos/%d neg)", len(pairs), positive, negative)
    X, y = pairs_to_arrays(pairs)
    # No random_state is set, so the split differs from run to run.
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)
    log.info("Training model with XGBoost...")
    model = xgb.XGBClassifier(use_label_encoder=False, eval_metric="logloss")
    model.fit(X_train, y_train)

    # Evaluate hard predictions on the held-out third of the pairs.
    y_pred = model.predict(X_test)
    cnf_matrix = confusion_matrix(y_test, y_pred)
    print("Confusion matrix:\n", cnf_matrix)
    print("Accuracy:", accuracy_score(y_test, y_pred))
    print("Precision:", precision_score(y_test, y_pred))
    print("Recall:", recall_score(y_test, y_pred))

    # AUC is computed from the predicted probability of the positive class.
    y_pred_proba = model.predict_proba(X_test)[:, 1]
    auc = roc_auc_score(y_test, y_pred_proba)
    print("Area under curve:", auc)
    model.save_model("/tmp/xgboost_v1.ubj")
```
Previous code for v1 gave:
```
Confusion matrix:
 [[ 19393  15738]
 [  7085 111833]]
Accuracy: 0.851845841258301
Precision: 0.8766334041435749
Recall: 0.9404211305269177
Area under curve: 0.8888057796734737
```
while the same features and data with XGBoost give this:
```
Confusion matrix:
 [[ 27154   8261]
 [  5642 112992]]
Accuracy: 0.9097494952904596
Precision: 0.9318697269345914
Recall: 0.9524419643609757
Area under curve: 0.9312480095583613
```
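To make the comparison easier to read, here is a minimal sketch that recomputes accuracy, precision and recall from the two confusion matrices above and prints the per-metric change. It assumes scikit-learn's `[[tn, fp], [fn, tp]]` layout. The `Metrics` tuple and `metrics_from_confusion` helper are hypothetical names used only for illustration and are not part of the project. Both test sets contain 154049 pairs, but the negative counts differ (35131 vs. 35415), which suggests the two runs used different random splits, so small differences may partly be split noise.

```python
from typing import NamedTuple


class Metrics(NamedTuple):
    """Summary metrics derived from a binary confusion matrix (illustrative helper)."""

    accuracy: float
    precision: float
    recall: float


def metrics_from_confusion(tn: int, fp: int, fn: int, tp: int) -> Metrics:
    """Compute metrics assuming scikit-learn's layout [[tn, fp], [fn, tp]]."""
    total = tn + fp + fn + tp
    return Metrics(
        accuracy=(tp + tn) / total,
        precision=tp / (tp + fp),
        recall=tp / (tp + fn),
    )


# Counts copied from the two confusion matrices reported in this issue.
v1 = metrics_from_confusion(tn=19393, fp=15738, fn=7085, tp=111833)
xgb_run = metrics_from_confusion(tn=27154, fp=8261, fn=5642, tp=112992)

# Print each metric before and after, with the signed difference.
for name in Metrics._fields:
    before, after = getattr(v1, name), getattr(xgb_run, name)
    print(f"{name:>9}: {before:.4f} -> {after:.4f} ({after - before:+.4f})")
```

Most of the gain comes from the negative class: false positives drop from 15738 to 8261, which is what lifts precision.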