Figure out why we can't run the netflix_to_wikidata script on all the movies #12
Comments
Looks like the network requests are just a bit flaky and eventually one gets stuck and times out. Maybe we're getting rate limited (we should look into that).
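For illustration, here is a minimal sketch of the kind of retry-with-exponential-backoff wrapper that would deal with flaky or rate-limited requests. The endpoint constant, function name, and parameters are assumptions for the example, not the script's actual code; #17 apparently adds something along these lines.

```python
import time
import requests

# Standard Wikidata SPARQL endpoint (assumption: the script queries it over HTTP).
WIKIDATA_SPARQL = "https://query.wikidata.org/sparql"

def query_with_backoff(sparql, max_retries=5, base_delay=1.0):
    """Run a SPARQL query, retrying with exponential backoff on timeouts and 429s."""
    for attempt in range(max_retries):
        try:
            resp = requests.get(
                WIKIDATA_SPARQL,
                params={"query": sparql, "format": "json"},
                timeout=30,
            )
            if resp.status_code != 429:      # 429 = rate limited; fall through and retry
                resp.raise_for_status()      # other errors (e.g. 400) surface immediately
                return resp.json()
        except requests.exceptions.Timeout:
            pass                             # flaky request; retry below
        time.sleep(base_delay * (2 ** attempt))  # wait 1s, 2s, 4s, ...
    raise RuntimeError(f"query failed after {max_retries} retries")
```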
Okay, I ran the code with the version in #17, with the exponential backoff, and with:
A 400 error means that there was something wrong with the request, with what we sent to Wikidata. My guess is that there is a movie name like "Face/Off" with a slash in it, or a title containing a quote, that isn't getting properly escaped, which would make the SPARQL invalid. Picture such a title interpolated straight into the query template: the quotes in the resulting query would be messed up. We know the error happens at or around item 4824 in the data, so we should just be able to look at the movie titles at that line and figure out what's going on. Please check out #17, run the code, and try to figure out what's going on.
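To make the escaping suggestion concrete, here is a minimal sketch, assuming the script builds its query by interpolating the title into a SPARQL template; the helper name and query shape are hypothetical, not the script's actual code.

```python
def escape_sparql_literal(title):
    """Escape characters that would break a double-quoted SPARQL string literal."""
    return title.replace("\\", "\\\\").replace('"', '\\"')

def build_movie_query(title):
    # Interpolating the raw title is what breaks on quotes; escape it first
    # so the generated query stays syntactically valid.
    safe_title = escape_sparql_literal(title)
    return f'''
        SELECT ?movie WHERE {{
          ?movie rdfs:label "{safe_title}"@en .
          ?movie wdt:P31 wd:Q11424 .        # instance of: film
        }}
    '''
```

The exact query doesn't matter here; the point is that the raw title should never go into the query string unescaped.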
Okay, I was curious. I changed the iteration, added error handling around the request, and got this:
Putting aside that this is unlikely to match any movies anyway, we should either:
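A minimal sketch of the per-movie error handling described above, building on the hypothetical `query_with_backoff` and `build_movie_query` helpers sketched earlier in this thread (again illustrative, not the script's actual code):

```python
import requests

def query_all_movies(movie_titles):
    """Query Wikidata for every title, collecting titles whose query still 400s."""
    failed = []
    for title in movie_titles:
        try:
            result = query_with_backoff(build_movie_query(title))
        except requests.exceptions.HTTPError as err:
            # A 400 here means the generated SPARQL was invalid for this title;
            # record it so the offending titles can be inspected afterwards.
            failed.append((title, err.response.status_code))
            continue
        # ... use `result` here (match QIDs, write output, etc.) ...
    return failed

# Hypothetical usage: titles like these are the kind that triggered the 400s.
bad = query_all_movies(['Face/Off', 'Movie: The "Director\'s Cut"'])
print(f"{len(bad)} titles failed:", bad)
```

Collecting the failures rather than crashing lets the full run finish and reports every bad title at the end.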
Here are a few more that didn't work, from my handling of the 400 error:
We should just run the entire thing on all movies (should take about 90 minutes) and make sure this is no longer an issue, then close. To be clear: if we can download query data for all movies, this isn't an issue and can be closed.
It seems to be crashing part of the way through. Let's post the stack trace in this bug and see if we can figure it out.