
@jiemakel

Without huge_tree=True, lxml parsing fails on certain responses that are only moderately large (apparently those larger than about 9.5 MB).
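As a rough illustration, here is a minimal sketch of a parser configured with both flags. It is not Sickle's actual code, and the function name parse_response is hypothetical. The ~9.5 MB threshold presumably corresponds to libxml2's default cap of 10,000,000 bytes on a single text node, which huge_tree=True lifts; treat that explanation as an assumption.

```python
from lxml import etree

# Hypothetical parser setup: recover=True (as Sickle uses) plus the proposed huge_tree=True.
# huge_tree=True disables libxml2's default safety limits, e.g. on very long text content.
PARSER = etree.XMLParser(recover=True, huge_tree=True)


def parse_response(raw: bytes) -> etree._Element:
    """Parse a raw OAI-PMH response body and return the root element."""
    return etree.XML(raw, parser=PARSER)


# A response whose single text node is ~11 MB, i.e. above the ~10 MB default limit.
big = b"<OAI-PMH><record>" + b"x" * 11_000_000 + b"</record></OAI-PMH>"
root = parse_response(big)
print(len(root.findtext("record")))
```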

Because recover=True is also set, this failure is silent from Sickle's point of view. I only noticed it because the resumption token is lost as well, which ends the crawl; I then started to wonder why I had far fewer records than I should have had.
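Since recover=True swallows the parse error, one way to notice such failures is to inspect the parser's error_log after parsing. The sketch below assumes one wants to fail loudly on any logged error; the names SilentParseError and parse_checked are hypothetical and not part of Sickle.

```python
from lxml import etree


class SilentParseError(Exception):
    """Hypothetical exception: a recovering parse logged errors."""


def parse_checked(raw: bytes) -> etree._Element:
    # A fresh parser per call, so its error_log only reflects this parse.
    parser = etree.XMLParser(recover=True, huge_tree=True)
    root = etree.XML(raw, parser=parser)
    # With recover=True lxml does not raise; problems only show up in the error log.
    errors = [e for e in parser.error_log if e.level >= etree.ErrorLevels.ERROR]
    if errors or root is None:
        detail = str(errors[0]) if errors else "no root element recovered"
        raise SilentParseError(detail)
    return root
```

A crawler could call this instead of a plain recovering parse, so that a damaged page (and its lost resumptionToken) surfaces as an exception rather than a quietly truncated harvest.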

Alternatively, one could make the XMLParser an optional parameter passed to Sickle and from there down to OAIResponse. This would let users choose the XML parsing behaviour they want. For this PR, however, I opted for the simplest fix.
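A rough sketch of that alternative, assuming a client object that stores an optional lxml XMLParser and hands it to each response. Client and ParsedResponse are hypothetical stand-ins, not Sickle's actual Sickle or OAIResponse API.

```python
from typing import Optional

from lxml import etree


class ParsedResponse:
    """Hypothetical stand-in for OAIResponse that accepts a caller-supplied parser."""

    def __init__(self, raw: bytes, parser: Optional[etree.XMLParser] = None):
        self.raw = raw
        # Default mirrors this PR's fix: lenient parsing with libxml2 size limits lifted.
        self._parser = parser if parser is not None else etree.XMLParser(
            recover=True, huge_tree=True
        )

    @property
    def xml(self) -> etree._Element:
        # Parse with whichever parser the caller configured.
        return etree.XML(self.raw, parser=self._parser)


class Client:
    """Hypothetical stand-in for the Sickle client; stores a parser and passes it down."""

    def __init__(self, parser: Optional[etree.XMLParser] = None):
        self.parser = parser

    def make_response(self, raw: bytes) -> ParsedResponse:
        # In real code, raw would be the body of an HTTP request to the OAI-PMH endpoint.
        return ParsedResponse(raw, parser=self.parser)
```

With this design, a caller who prefers parse errors to be raised could pass etree.XMLParser(recover=False, huge_tree=True), while the default keeps the lenient behaviour.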
