Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handling NaN values #91

Open
ghost opened this issue Apr 15, 2020 · 10 comments
Open

Handling NaN values #91

ghost opened this issue Apr 15, 2020 · 10 comments

Comments

@ghost
Copy link

ghost commented Apr 15, 2020

Thank you for this great project!

Would it be better to replace NaN values with empty string instead of #NUM! ?

@kevmo314
Copy link
Collaborator

Thanks for the ticket. Do you have an example/use case in mind?

I'm inclined to say no, because NaN is a number type in python, thus we should translate to a number type in Excel. Additionally I'm somewhat against silently cleaning up data as the user might not expect it, but a #NUM! is rather annoying to see so I might be able to be convinced otherwise.

@ghost
Copy link
Author

ghost commented Apr 15, 2020

I have a dataset holding numeric data is some fields (unknown in advance) with some NaN values. I need to apply pd.to_numeric(data, errors='ignore') before writing to excel. If I do not do so, excel does not treat them as numeric and sometime the data is too big to format manually in excel. I cannot get rid of NaN before passing to PyExcelerate, because empty string resets dtype to object. Would be nice to have an option to choose what to do with NaN in PyExcelerate. That would fulfill needs of different scenarios.

@kevmo314
Copy link
Collaborator

Ah I see, thanks for the information. I will think of a good API to do this. We've received a couple similar feature requests in the past, but yours lays out explicitly why the current solution isn't great.

I'm still not quite a fan of making the output directly configurable, but I think there might be a way to do this with a "data view" type of object that transforms NaN to empty string before PyExcelerate sees it. I'll think about this a little more and get a solution for you.

@ghost
Copy link
Author

ghost commented Apr 15, 2020

Thank you!

@kevmo314
Copy link
Collaborator

Ok I've thought about this issue more and I've struggled to come up with an ergonomic API as there are a lot of edge cases and unusual behaviors. I think this might be better done through a user-implemented transform.

For example, taking a pandas.DataFrame df, we can strip out the NaN's by enumerating it into a list

data = df.values.tolist()
nans_stripped = [['' if math.isnan(value) for value in row] for row in data]

This can then be passed to PyExcelerate:

wb.new_sheet("sheet name", data=nans_stripped)

This actually might end up being faster than directly passing df, as PyExcelerate has some optimizations around list objects. We don't have any pandas-specific optimizations.

@arbreazeale
Copy link

Adding to the answer by @kevmo314:

The following adaptation solved the #NUM! issue for my dataframe (which is an idiosycratic blend of numeric, NaN, and string elements). If the above solution doesn't work for the reader (as it didn't in my case), perhaps this one will.

data = df.values.tolist()
strippedNans = [['' if str(val) == 'nan' else val for val in row] for row in data]
wb.new_sheet("sheet name", data=strippedNans)

@amzar96
Copy link

amzar96 commented Oct 19, 2020

Here is my way. What I do is, get the null columns first then use the .fillna() function from Pandas.

null_cols = latest_df.columns[latest_df.isna().any()]
latest_df[null_cols] = df[null_cols].fillna("")

@pd2871
Copy link

pd2871 commented Dec 15, 2022

Alternative to @amzar96 answer , it can also be:

wb = Workbook()
df_n = df.fillna("")
df_cols = [df_n.columns.tolist()] 
df_val = df_n.values.tolist()
df_data = df_cols + df_val
wb.new_sheet('Sheet 1', data=df_data)
wb.save('data.xlsx')

@TacoBel42
Copy link

I use
df.fillna('', inplace=True)
but think we should have empty cells on #NUM! values, and provide choose by argument on worksheet/workbook, like:
workbook.new_sheet(sheetname: str, empty_error_cells: bool...

@kevmo314
Copy link
Collaborator

I use df.fillna('', inplace=True) but think we should have empty cells on #NUM! values, and provide choose by argument on worksheet/workbook, like: workbook.new_sheet(sheetname: str, empty_error_cells: bool...

Having an option sounds reasonable, I think we can add it in save(filename: str, empty_error_cells: bool). Welcome to PRs for that change :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants