One thing I want to mention that I think could be quite useful is adding a way to recover the original and transformed features from OpenML.
Something like this:
df, y = repo.openml_dataframe(dataset="airplane", fold=2) # gets the raw columns from the dataset
X, y = repo.openml_transformed_features(dataset="airplane", fold=2) # gets the features as provided to the model
This would make it possible to use TabRepo to train TabPFN models (probably at a larger scale than what they currently use). It would also make it easier to train new models and add them to TabRepo.
Getting the original features is easy, but getting the transformed ones is a bit nuanced.
The transformation logic in AutoGluon could change between versions. We could either accept this and warn the user, or cache the transformed features as part of the Repo creation process while fitting the original models.
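To make the two options concrete, here is a minimal sketch of how they could be combined: return the cached features when they were saved at fit time, and otherwise recompute them and warn if the installed version differs from the one used for the original fit. Everything here is hypothetical (the function name `get_transformed_features`, the plain-dict cache, the `recompute` callback, and the version strings); none of it is existing TabRepo or AutoGluon API.

```python
import warnings
from typing import Any, Callable, Dict, Tuple

# Hypothetical cache layout: (dataset, fold) -> transformed feature matrix
FeatureCache = Dict[Tuple[str, int], Any]


def get_transformed_features(
    cache: FeatureCache,
    dataset: str,
    fold: int,
    recompute: Callable[[], Any],
    fit_version: str,
    current_version: str,
) -> Any:
    """Return the features as seen by the model, preferring the cached copy."""
    key = (dataset, fold)
    if key in cache:
        # Saved while fitting the original models, so this is exactly
        # what the model received.
        return cache[key]
    if current_version != fit_version:
        # No cached copy: the recomputed features may differ from the
        # ones used at fit time, so tell the user.
        warnings.warn(
            f"Features for {dataset!r} fold {fold} are recomputed with "
            f"version {current_version}, but the models were fit with "
            f"version {fit_version}; the result may differ."
        )
    return recompute()
```

With this shape, caching stays optional: a repo built without cached features still works, and the user is only warned in the case where the output could actually have drifted.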
Additionally, we ran the models via AutoMLBenchmark, which might have non-standard handling of the data, such as converting dtypes prior to sending to AutoGluon. I would need to double check if loading the data through OpenML is identical to loading it via AMLB.
Thanks, that makes a lot of sense!
What we could do then is support something like repo.openml_dataframe(dataset="airplane", fold=2) in the repository.
For the feature matrix, I see your point that this may change. We could perhaps just add a simple util (outside of EvaluationRepository) that casts this dataframe to a feature matrix by calling the AG featurizer. I believe this would also be valuable for quickly getting a matrix to fit a model.
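A rough sketch of what such a util could look like, assuming the featurizer is passed in and follows the usual `fit_transform` / `transform` convention. The name `to_feature_matrices` and the `Featurizer` protocol are made up for illustration; the util fits on the training split only, so nothing from the test split leaks into the transformation.

```python
from typing import Any, Protocol, Tuple


class Featurizer(Protocol):
    """Anything with fit_transform/transform (assumed interface)."""

    def fit_transform(self, X: Any) -> Any: ...

    def transform(self, X: Any) -> Any: ...


def to_feature_matrices(
    train_df: Any, test_df: Any, featurizer: Featurizer
) -> Tuple[Any, Any]:
    """Cast raw train/test frames to feature matrices."""
    # Fit on the training split only, then reuse the fitted state
    # for the test split.
    X_train = featurizer.fit_transform(train_df)
    X_test = featurizer.transform(test_df)
    return X_train, X_test


# Possible usage with AutoGluon installed (not verified against the
# exact preprocessing that TabRepo's models received):
#   from autogluon.features.generators import AutoMLPipelineFeatureGenerator
#   X_train, X_test = to_feature_matrices(
#       train_df, test_df, AutoMLPipelineFeatureGenerator()
#   )
```

Passing the featurizer in keeps the util independent of a specific AutoGluon version, which fits the concern above that the transformation logic may change.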
If we have those, we should ping TabPFN folks as this may be quite useful for their training and evaluations.
From @geoalgo: