Add a way to recover original and transformed features from OpenML #67

Innixma · 2024-07-18T17:20:03Z

One thing that I think could be quite useful is adding a way to recover the original and transformed features from OpenML.

Something like this:

df, y = repo.openml_dataframe(dataset="airplane", fold=2) # gets the raw columns from the dataset
X, y = repo.openml_transformed_features(dataset="airplane", fold=2)  # gets the features as provided to the model

This would allow using TabRepo to train TabPFN models (probably at larger scales than what they currently use). It would also make it easier to train new models and add them to TabRepo.


Innixma · 2024-07-18T17:24:24Z

A note on this:

Getting the original features is easy, but getting the transformed ones is a bit nuanced.

The transformation logic in AutoGluon could change between versions. We could either accept this and warn the user, or cache the transformed features as part of the Repo creation process while fitting the original models.
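To make the two options concrete, here is a minimal sketch of how they could be combined: cache the transformed features together with a version stamp, and warn on load if the stamp no longer matches. The function names (`save_transformed`, `load_transformed`), the JSON layout, and the idea of passing the version as a plain string are all assumptions for illustration, not existing TabRepo or AutoGluon APIs.

```python
import json
import warnings
from pathlib import Path


def save_transformed(path, columns, rows, featurizer_version):
    """Store a transformed feature matrix with the featurizer version that produced it."""
    payload = {
        "featurizer_version": featurizer_version,
        "columns": columns,
        "rows": rows,
    }
    Path(path).write_text(json.dumps(payload))


def load_transformed(path, current_version):
    """Load cached features and warn if another featurizer version produced them."""
    payload = json.loads(Path(path).read_text())
    cached_version = payload["featurizer_version"]
    if cached_version != current_version:
        warnings.warn(
            f"Cached features were produced with featurizer version {cached_version}, "
            f"but the current version is {current_version}; "
            "recomputing them now may give different features.",
            UserWarning,
        )
    return payload["columns"], payload["rows"]
```

With this layout the cached features stay the ones the original models were fit on, and the warning only tells the user that a fresh transformation might not reproduce them.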

Additionally, we ran the models via AutoMLBenchmark (AMLB), which might handle the data in non-standard ways, such as converting dtypes before sending it to AutoGluon. I would need to double-check whether loading the data through OpenML is identical to loading it via AMLB.
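One way to run that check, sketched with pandas only: given the same fold loaded through both paths, report any differences in columns, shape, dtypes, or values. `describe_mismatch` is a hypothetical helper, and how the two frames are obtained from OpenML and AMLB is deliberately left out.

```python
import pandas as pd


def describe_mismatch(df_a: pd.DataFrame, df_b: pd.DataFrame) -> list[str]:
    """Return readable differences between two frames; an empty list means they match."""
    if list(df_a.columns) != list(df_b.columns):
        return [f"columns differ: {list(df_a.columns)} vs {list(df_b.columns)}"]
    if df_a.shape != df_b.shape:
        return [f"shapes differ: {df_a.shape} vs {df_b.shape}"]

    problems = []
    for col in df_a.columns:
        dtype_a, dtype_b = str(df_a[col].dtype), str(df_b[col].dtype)
        if dtype_a != dtype_b:
            # A dtype conversion is exactly the kind of difference mentioned above.
            problems.append(f"dtype of {col!r} differs: {dtype_a} vs {dtype_b}")
        elif not df_a[col].reset_index(drop=True).equals(df_b[col].reset_index(drop=True)):
            # Series.equals treats NaNs in the same positions as equal.
            problems.append(f"values of {col!r} differ")
    return problems
```

An empty result on every dataset and fold would suggest that features recovered from OpenML match what the models saw; any dtype entries would point at conversions done upstream.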

geoalgo · 2024-07-18T18:40:48Z

Thanks, that makes a lot of sense!
What we could do then is support something like repo.openml_dataframe(dataset="airplane", fold=2) in the repository.

For the feature matrix, I see your point that this may change. We could perhaps just add a simple util (outside of EvaluationRepository) that casts this dataframe to a feature matrix by calling the AutoGluon featurizer. I believe this would also be valuable for quickly getting a matrix to fit a model.
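A minimal sketch of what such a util could look like. The name `to_feature_matrix` and the optional `featurizer` argument are hypothetical; the default assumes that AutoGluon's `AutoMLPipelineFeatureGenerator` is the featurizer meant here, which may not match exactly what was used when the original models were fit.

```python
import pandas as pd


def to_feature_matrix(df: pd.DataFrame, featurizer=None) -> pd.DataFrame:
    """Cast a raw dataframe to a model-ready feature matrix.

    `featurizer` can be any object exposing `fit_transform(X=...)`.
    If omitted, AutoGluon's default feature pipeline is used.
    """
    if featurizer is None:
        # Imported lazily so the helper can be imported without AutoGluon installed.
        from autogluon.features.generators import AutoMLPipelineFeatureGenerator

        featurizer = AutoMLPipelineFeatureGenerator()
    return featurizer.fit_transform(X=df)
```

Note that this fits the featurizer on whatever frame is passed in. For a given fold, one would presumably fit on the training split only and then call the fitted featurizer's `transform` on the test split.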

Once we have those, we should ping the TabPFN folks, as this may be quite useful for their training and evaluations.

Innixma added this to the TabRepo 2.0 milestone Jul 18, 2024

Innixma mentioned this issue Jul 18, 2024 in TabRepo 2.0 Feature Tracker #63 (open, 23 tasks)
