
Replicate CV-split results with the solution found by a fit #791

@bsaldivaremc2

Hello,

I have read and tested the approaches from these previous issues:
- Fit best model on new data in Optuna mode
- Saving mljar automl model for future use

I want to manually replicate the results obtained with a normal fit (Compete, Optuna, or another mode).
For instance, I have already run fit in Optuna mode with custom cv_indices, and the results were stored.
cv_indices has 5 elements (folds), so I want to do something like:

mean_metric = 0
for train_index, val_index in cv_indices:
    model = AutoML(...something with the path)
    model.fit(X[train_index, :], y[train_index])
    y_pred = model.predict_proba(X[val_index, :])
    mean_metric += some_metric(y[val_index], y_pred)
print(mean_metric / len(cv_indices))  # average over the 5 folds
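A minimal, self-contained sketch of the loop above, assuming a binary task scored with ROC AUC (to match `eval_metric="auc"`), plain Python lists for `X` and `y`, and a hypothetical `make_model(fold)` factory that returns a new, unfitted model for every fold. `roc_auc` is a hand-written rank-based AUC standing in for `some_metric`, and the code assumes `predict_proba` returns one `[p(class 0), p(class 1)]` row per sample, as scikit-learn-style binary classifiers do.

```python
from statistics import mean


def roc_auc(y_true, scores):
    """ROC AUC as P(random positive outranks random negative); ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def cv_mean_score(X, y, cv_indices, make_model, metric=roc_auc):
    """Fit a fresh model on each training split and average the validation scores."""
    fold_scores = []
    for fold, (train_index, val_index) in enumerate(cv_indices):
        model = make_model(fold)  # hypothetical factory: must return an UNFITTED model
        model.fit([X[i] for i in train_index], [y[i] for i in train_index])
        proba = model.predict_proba([X[i] for i in val_index])
        # Score only the positive-class probability (second column).
        positive = [row[1] for row in proba]
        fold_scores.append(metric([y[i] for i in val_index], positive))
    return mean(fold_scores), fold_scores
```

With mljar-supervised, `make_model` would presumably build a new `AutoML` object pointing at a separate, empty `results_path` for each fold, and for a pandas DataFrame `X.iloc[train_index]` would replace the list comprehensions.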

The printed value should be similar to the score reported during the fit, which I ran as follows:

import pandas as pd
from sklearn.model_selection import train_test_split
from supervised.automl import AutoML

# DATADIR, xdf (features), target and cv_indices are defined earlier (not shown)

# Initialize AutoML with custom CV
automl = AutoML(
    mode="Optuna",  # Or "Explain" / "Perform"
    ml_task="binary_classification",
    results_path=f"{DATADIR}AutoML_Optuna_Results_2",
    validation_strategy={
        "validation_type": "custom",
        "custom_cv": cv_indices
    },
    eval_metric="auc",
    optuna_time_budget=60 * 5,  # 5 minutes, in seconds
)

# Fit the model
automl.fit(xdf, target, cv=cv_indices)
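Since an inflated score can also come from the splits themselves, it may be worth sanity-checking `cv_indices` before calling `fit`. A minimal sketch, assuming `cv_indices` is a list of `(train_index, val_index)` pairs of integer row positions; `check_cv_indices` is a hypothetical helper, not part of mljar-supervised.

```python
def check_cv_indices(cv_indices, n_rows):
    """Return human-readable problems found in a list of (train_index, val_index) pairs."""
    problems = []
    seen_in_validation = set()
    for fold, (train_index, val_index) in enumerate(cv_indices):
        train, val = set(train_index), set(val_index)
        overlap = train & val
        if overlap:
            # A row used for both training and validation inflates the fold score.
            problems.append(f"fold {fold}: {len(overlap)} row(s) in both train and validation")
        out_of_range = {i for i in train | val if not 0 <= i < n_rows}
        if out_of_range:
            problems.append(f"fold {fold}: indices outside [0, {n_rows}): {sorted(out_of_range)}")
        repeated = val & seen_in_validation
        if repeated:
            # In plain k-fold CV every row is validated exactly once.
            problems.append(f"fold {fold}: {len(repeated)} validation row(s) already used in an earlier fold")
        seen_in_validation |= val
    return problems
```

An empty list means no overlap, no out-of-range positions, and no row validated twice; it says nothing about leakage through duplicated or near-duplicated rows.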

However, the score I get is higher, as if the model had already seen the validation data (i.e., as if it were already part of the training set).
I appreciate your help.
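One way this symptom can arise is scoring each validation fold with a model that was trained on all rows, for example if every loop iteration ends up reusing the same already-fitted model. The toy sketch below illustrates the effect with a hypothetical `Memorizer` model that recalls the label of any row it was trained on, scored with accuracy for simplicity; it uses no mljar-supervised API. Whether `AutoML` reloads an existing `results_path` instead of retraining is an assumption to check against the library documentation.

```python
from statistics import mean


class Memorizer:
    """Toy model: recalls the label of any row seen in fit, otherwise predicts the training majority."""

    def fit(self, X, y):
        self.memory = dict(zip(X, y))
        self.default = max(sorted(set(y)), key=y.count)  # ties go to the smallest label
        return self

    def predict(self, X):
        return [self.memory.get(x, self.default) for x in X]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_fold_accuracy(X, y, cv_indices, model_for_fold):
    """Average validation accuracy; model_for_fold(train_index) supplies the model to score."""
    scores = []
    for train_index, val_index in cv_indices:
        model = model_for_fold(train_index)
        y_pred = model.predict([X[i] for i in val_index])
        scores.append(accuracy([y[i] for i in val_index], y_pred))
    return mean(scores)


# Rows are represented by hashable IDs so the toy model can "remember" them.
X = ["a", "b", "c", "d", "e", "f"]
y = [0, 1, 0, 1, 1, 0]
cv_indices = [([2, 3, 4, 5], [0, 1]), ([0, 1, 4, 5], [2, 3]), ([0, 1, 2, 3], [4, 5])]

# Leaky: a single model fitted on ALL rows is reused to score every fold.
full_model = Memorizer().fit(X, y)
leaky = mean_fold_accuracy(X, y, cv_indices, lambda train_index: full_model)

# Honest: a fresh model is fitted on each fold's training rows only.
honest = mean_fold_accuracy(
    X, y, cv_indices,
    lambda train_index: Memorizer().fit([X[i] for i in train_index], [y[i] for i in train_index]),
)

print(leaky, honest)  # 1.0 0.5
```

The gap between the two numbers comes entirely from the validation rows being part of the leaky model's training data.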
