Handling NaN values #91

ghost · 2020-04-15T16:19:10Z

Thank you for this great project!

Would it be better to replace NaN values with empty string instead of #NUM! ?

kevmo314 · 2020-04-15T16:45:02Z

Thanks for the ticket. Do you have an example/use case in mind?

I'm inclined to say no, because NaN is a number type in python, thus we should translate to a number type in Excel. Additionally I'm somewhat against silently cleaning up data as the user might not expect it, but a #NUM! is rather annoying to see so I might be able to be convinced otherwise.

ghost · 2020-04-15T19:19:55Z

I have a dataset holding numeric data is some fields (unknown in advance) with some NaN values. I need to apply pd.to_numeric(data, errors='ignore') before writing to excel. If I do not do so, excel does not treat them as numeric and sometime the data is too big to format manually in excel. I cannot get rid of NaN before passing to PyExcelerate, because empty string resets dtype to object. Would be nice to have an option to choose what to do with NaN in PyExcelerate. That would fulfill needs of different scenarios.

kevmo314 · 2020-04-15T19:32:30Z

Ah I see, thanks for the information. I will think of a good API to do this. We've received a couple similar feature requests in the past, but yours lays out explicitly why the current solution isn't great.

I'm still not quite a fan of making the output directly configurable, but I think there might be a way to do this with a "data view" type of object that transforms NaN to empty string before PyExcelerate sees it. I'll think about this a little more and get a solution for you.

ghost · 2020-04-15T19:53:00Z

Thank you!

kevmo314 · 2020-07-18T03:07:21Z

Ok I've thought about this issue more and I've struggled to come up with an ergonomic API as there are a lot of edge cases and unusual behaviors. I think this might be better done through a user-implemented transform.

For example, taking a pandas.DataFrame df, we can strip out the NaN's by enumerating it into a list

data = df.values.tolist()
nans_stripped = [['' if math.isnan(value) for value in row] for row in data]

This can then be passed to PyExcelerate:

wb.new_sheet("sheet name", data=nans_stripped)

This actually might end up being faster than directly passing df, as PyExcelerate has some optimizations around list objects. We don't have any pandas-specific optimizations.

arbreazeale · 2020-07-23T21:44:28Z

Adding to the answer by @kevmo314:

The following adaptation solved the #NUM! issue for my dataframe (which is an idiosycratic blend of numeric, NaN, and string elements). If the above solution doesn't work for the reader (as it didn't in my case), perhaps this one will.

data = df.values.tolist()
strippedNans = [['' if str(val) == 'nan' else val for val in row] for row in data]
wb.new_sheet("sheet name", data=strippedNans)

amzar96 · 2020-10-19T12:37:40Z

Here is my way. What I do is, get the null columns first then use the .fillna() function from Pandas.

null_cols = latest_df.columns[latest_df.isna().any()]
latest_df[null_cols] = df[null_cols].fillna("")

pd2871 · 2022-12-15T10:12:22Z

Alternative to @amzar96 answer , it can also be:

wb = Workbook()
df_n = df.fillna("")
df_cols = [df_n.columns.tolist()] 
df_val = df_n.values.tolist()
df_data = df_cols + df_val
wb.new_sheet('Sheet 1', data=df_data)
wb.save('data.xlsx')

TacoBel42 · 2023-01-12T16:04:58Z

I use
df.fillna('', inplace=True)
but think we should have empty cells on #NUM! values, and provide choose by argument on worksheet/workbook, like:
workbook.new_sheet(sheetname: str, empty_error_cells: bool...

kevmo314 · 2023-01-12T16:15:15Z

I use df.fillna('', inplace=True) but think we should have empty cells on #NUM! values, and provide choose by argument on worksheet/workbook, like: workbook.new_sheet(sheetname: str, empty_error_cells: bool...

Having an option sounds reasonable, I think we can add it in save(filename: str, empty_error_cells: bool). Welcome to PRs for that change :)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Handling NaN values #91

Handling NaN values #91

ghost commented Apr 15, 2020

kevmo314 commented Apr 15, 2020

ghost commented Apr 15, 2020

kevmo314 commented Apr 15, 2020

ghost commented Apr 15, 2020

kevmo314 commented Jul 18, 2020

arbreazeale commented Jul 23, 2020

amzar96 commented Oct 19, 2020

pd2871 commented Dec 15, 2022 •

edited

Loading

TacoBel42 commented Jan 12, 2023

kevmo314 commented Jan 12, 2023

Handling NaN values #91

Handling NaN values #91

Comments

ghost commented Apr 15, 2020

kevmo314 commented Apr 15, 2020

ghost commented Apr 15, 2020

kevmo314 commented Apr 15, 2020

ghost commented Apr 15, 2020

kevmo314 commented Jul 18, 2020

arbreazeale commented Jul 23, 2020

amzar96 commented Oct 19, 2020

pd2871 commented Dec 15, 2022 • edited Loading

TacoBel42 commented Jan 12, 2023

kevmo314 commented Jan 12, 2023

pd2871 commented Dec 15, 2022 •

edited

Loading