Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memory improvements #138

Open
wants to merge 3 commits into
base: master
Choose a base branch
from
Open

Memory improvements #138

wants to merge 3 commits into from

Conversation

domoritz
Copy link
Contributor

I just made a pr for a branch from @karol-szuster because these changes seem to be a valuable improvement. Haven't tested them but wanted to document.

When csv file is longer than 1000 lines only first attempt to fetch it content
will succeed. Any subsequent attempt will return only first 1000 lines. This
patch should fix this problem but it may affect memory footprint.
@pudo
Copy link
Contributor

pudo commented Aug 6, 2015

Wow, that should almost be a bug. +1

@karol-szuster
Copy link

I have created those patches because in my project I'm very often processing very large excel and csv files (larger than 500K lines). Without those patches I was not able to parse some of them because OOM killer was killing python interpreter on machine witch has 4GB of memory. With those patches python processes consumes about 400MB of RAM.
I didn't want to created pr without unit test but if you like those patches then you can take them and they should work find. At least I haven't experienced any issues with them.

Karol Szuster added 2 commits August 25, 2015 13:01
headers_guess functions reads entire content of file into memory. This leads
to huge memory footprint when working on large excel files. This patch fixes
this problem by reading only first 1000 lines of file into memory. This should
be sufficient to find the header and saves a lot of memory.
@pwalsh
Copy link
Member

pwalsh commented Jul 20, 2016

@pudo would be nice if you have time to check this, as you expressed interest to me. If good, I can rebase and merge.

@rufuspollock
Copy link
Member

This seem super simple improvements - looking at the patches and LGTM. I guess we need to rebase against master to get these in ...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants