ValueError: could not convert string to float: 'Rural Adj' #128

Open
sbpatel2009 opened this issue Aug 15, 2023 · 2 comments

Comments

@sbpatel2009

I am attempting to fit the Preprocessor on training data that only includes categorical variables. It appears that when I pass an empty list to the num_feats parameter, the Preprocessor attempts to convert all columns to floats and returns an error.

This code:

cat_feats = ['Rural']
num_feats = []

preprocessor = Preprocessor()
preprocessor.fit(data=X, cat_feats=cat_feats, num_feats=num_feats)

...returns this error:

ValueError: could not convert string to float: 'Rural Adj'

I may be missing it, but I don't see how to handle this issue in the documentation. Does this class require there to be numerical features?
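For context, the message text is what plain Python raises when a non-numeric string is coerced to a float, which is consistent with the categorical column being handed to a numerical step. A minimal sketch, independent of auton-survival and assuming only the value 'Rural Adj' from the traceback above:

```python
# Reproduce the error message outside the library: coercing a
# categorical label to float fails with the same text.
try:
    float('Rural Adj')
except ValueError as err:
    message = str(err)

print(message)  # could not convert string to float: 'Rural Adj'
```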

@matteo4diani
Contributor

matteo4diani commented Nov 9, 2023

Hi @sbpatel2009, sorry for the late reply and thanks for contributing to auton-survival 🙂

You may already have figured it out by yourself, but the reason for that error lies in this bit of code:

if self._num_feats:
    self.scaler = scaler.fit(df[self._num_feats])
else:
    self.scaler = scaler.fit(df)

As the code shows, an empty num_feats list is falsy, so the else branch runs and the scaler is fit on the whole DataFrame: when no num_feats are provided, the preprocessor assumes all features are numerical.
I'm going to work on this and other bugs in the next week(s). If you want to contribute a fix, you are more than welcome 😄
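To make the branch quoted above concrete, here is a minimal sketch in plain Python. Both helper functions are hypothetical and are not part of auton-survival; the first mirrors the truthiness check from the snippet, and the second shows one possible fix that treats only None as "not specified". Whether None should mean "scale every column" is an assumption about the intended behavior, not something the library confirms.

```python
def columns_to_scale_current(all_columns, num_feats):
    """Mirror the quoted branch: an empty list is falsy, so it falls through to all columns."""
    if num_feats:
        return list(num_feats)
    return list(all_columns)


def columns_to_scale_fixed(all_columns, num_feats):
    """Hypothetical fix: only None means 'not specified'; [] means 'scale nothing'."""
    if num_feats is None:
        return list(all_columns)
    return list(num_feats)


all_columns = ['Rural']
print(columns_to_scale_current(all_columns, []))  # ['Rural']
print(columns_to_scale_fixed(all_columns, []))    # []
```

With the explicit None check, passing num_feats=[] would select no columns for scaling, so the caller would also need to skip the scaler fit when the selection is empty.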

One workaround is to add a dummy numerical column to the input DataFrame and drop it again after the transform:

from auton_survival.preprocessing import Preprocessor
import pandas as pd


cat_feats = ['Rural']
num_feats = ['Dummy']

X = pd.DataFrame({'Rural': ['yes', 'no', 'maybe'], 'Dummy': [0, 0, 0]})

preprocessor = Preprocessor()
X = preprocessor.fit_transform(data=X, cat_feats=cat_feats, num_feats=num_feats)

X = X.drop(columns=['Dummy'])

print(X)

@sbpatel2009
Author

sbpatel2009 commented Nov 10, 2023 via email
