Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[http- unzip_http] handle errors in retrieving URLs #2655

Open
wants to merge 2 commits into
base: develop
Choose a base branch
from

Conversation

midichef
Copy link
Contributor

This PR catches a few more error cases when retrieving URLs. Every new case handled is a situation I triggered in testing.

So for example, the stack trace on a nonexistent zipfile from a 404 with a particular server configuration:

        File "/home/midichef/.venv/lib/python3.12/site-packages/visidata/loaders/unzip_http.py", line 158, in infoiter
            self.zip_size = int(resp.headers['Content-Length'])
        File "/home/midichef/.venv/lib/python3.12/site-packages/urllib3/_collections.py", line 260, in __getitem__
            val = self._container[key.lower()]
        KeyError: 'content-length'

becomes cannot open URL: status code 404.

The web servers that host data tend to be poorly maintained, so visidata users will run into such errors unusually often.

Tested on Python 3.8 and Python 3.12 to make sure the Exception classes exist in the oldest and newest libraries.

@@ -149,7 +149,13 @@ def namelist(self):
return list(r.filename for r in self.infoiter())

def infoiter(self):
resp = self.http.request('HEAD', self.url)
urllib3 = vd.importExternal('urllib3')
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file (unzip_http.py) should be taken directly from https://github.com/saulpw/unzip-http/blob/master/unzip_http.py, with the vd.importExternal line being manually substituted. It's not a clean process but I didn't/don't expect unzip_http to be updated that often. They've already drifted a bit, but if we could bring them into sync again and make these changes in a way that keeps them as close as possible, that's my preference. Probably the easiest way is to let these errors percolate up and then handle them at a visidata layer instead of in unzip_http.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One complication is, unzip-http uses urllib3, so the exceptions would be from urllib3 and can't be caught in loader/archive.py which does not import it.

I submitted a draft patch that transforms the urllib3 errors into regular urllib errors. What do you think of that? It makes sense in the context of visidata, but it's quite odd in the context of unzip-http by itself.

@midichef midichef force-pushed the http_error_handling branch from ce2e971 to 21d078b Compare December 27, 2024 20:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants