-
Hello Thomas, you could extend the protocol implementation and simply rewrite the 503 status code to 200, perhaps recording in the metadata that the original status was 503. That way you wouldn't have to modify the FetcherBolt class, and the amount of code to write would be minimal. Would that work?
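To make the idea concrete, here is a minimal, self-contained sketch of the wrapping pattern. It does not use StormCrawler's real classes: `Response`, `Fetcher`, `StatusRewritingFetcher` and the metadata key `fetch.originalStatusCode` are hypothetical stand-ins invented for illustration. In a real topology the same logic would presumably live in a subclass of whichever protocol implementation you have configured.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a fetch result (not StormCrawler's own response class).
final class Response {
    final byte[] content;
    final int statusCode;
    final Map<String, String> metadata;

    Response(byte[] content, int statusCode, Map<String, String> metadata) {
        this.content = content;
        this.statusCode = statusCode;
        this.metadata = metadata;
    }
}

// Hypothetical stand-in for a protocol implementation.
interface Fetcher {
    Response fetch(String url);
}

// Wraps another fetcher and reports 503 responses as 200,
// keeping the original status code in the metadata.
final class StatusRewritingFetcher implements Fetcher {
    // Assumed key name, chosen for this example only.
    static final String ORIGINAL_STATUS_KEY = "fetch.originalStatusCode";

    private final Fetcher delegate;

    StatusRewritingFetcher(Fetcher delegate) {
        this.delegate = delegate;
    }

    @Override
    public Response fetch(String url) {
        Response response = delegate.fetch(url);
        if (response.statusCode != 503) {
            return response; // every other status passes through unchanged
        }
        // Copy the metadata so the delegate's map is not mutated.
        Map<String, String> metadata = new HashMap<>(response.metadata);
        metadata.put(ORIGINAL_STATUS_KEY, "503");
        return new Response(response.content, 200, metadata);
    }
}
```

Because the rewritten response carries a 200, it would follow the normal path to the parser, and a later step could look up the stored original status to tell a genuine page from a captcha or "enable JavaScript" page.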
-
Hello,
I have been using StormCrawler for a long time, and I want to start by thanking you for all the work you have done!
I'm looking for a way to parse web pages fetched with a status code other than 200: we need to parse 503 pages to understand the errors behind them (captcha, JavaScript required...).
I don't see a solution other than entirely rewriting the run method of FetcherThread in com.digitalpebble.stormcrawler.bolt.FetcherBolt, which emits only 200 pages on the default stream and all others on the status stream.
Do you see any way to solve my problem?
If not, could I perhaps contribute to the project and propose something?
Thank you,
Regards,
Thomas