-
Notifications
You must be signed in to change notification settings - Fork 60
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handling NaN values #91
Comments
Thanks for the ticket. Do you have an example/use case in mind? I'm inclined to say no, because NaN is a number type in python, thus we should translate to a number type in Excel. Additionally I'm somewhat against silently cleaning up data as the user might not expect it, but a #NUM! is rather annoying to see so I might be able to be convinced otherwise. |
I have a dataset holding numeric data is some fields (unknown in advance) with some NaN values. I need to apply pd.to_numeric(data, errors='ignore') before writing to excel. If I do not do so, excel does not treat them as numeric and sometime the data is too big to format manually in excel. I cannot get rid of NaN before passing to PyExcelerate, because empty string resets dtype to object. Would be nice to have an option to choose what to do with NaN in PyExcelerate. That would fulfill needs of different scenarios. |
Ah I see, thanks for the information. I will think of a good API to do this. We've received a couple similar feature requests in the past, but yours lays out explicitly why the current solution isn't great. I'm still not quite a fan of making the output directly configurable, but I think there might be a way to do this with a "data view" type of object that transforms NaN to empty string before PyExcelerate sees it. I'll think about this a little more and get a solution for you. |
Thank you! |
Ok I've thought about this issue more and I've struggled to come up with an ergonomic API as there are a lot of edge cases and unusual behaviors. I think this might be better done through a user-implemented transform. For example, taking a pandas.DataFrame
This can then be passed to PyExcelerate:
This actually might end up being faster than directly passing |
Adding to the answer by @kevmo314: The following adaptation solved the #NUM! issue for my dataframe (which is an idiosycratic blend of numeric, NaN, and string elements). If the above solution doesn't work for the reader (as it didn't in my case), perhaps this one will.
|
Here is my way. What I do is, get the null columns first then use the null_cols = latest_df.columns[latest_df.isna().any()]
latest_df[null_cols] = df[null_cols].fillna("") |
Alternative to @amzar96 answer , it can also be:
|
I use |
Having an option sounds reasonable, I think we can add it in |
Thank you for this great project!
Would it be better to replace NaN values with empty string instead of #NUM! ?
The text was updated successfully, but these errors were encountered: