webmagic-0.3.0
code4craft released this 04 Sep 03:02
- Change the default XPath selector from HtmlCleaner to Xsoup.
  Xsoup is an XPath selector I wrote on top of Jsoup. It performs much better than HtmlCleaner: the time to process a page drops from 7~9 ms to 0.4 ms.
  If Xsoup is not stable for your use case, call Spider.xsoupOff() to turn it off, and please report an issue to me!
- Add cycle retry times for Site.
  When cycle retry times is set, Spider puts any URL whose download failed back into the scheduler and retries it after one cycle of the queue.
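
The cycle retry behaviour can be pictured with a small JDK-only simulation. This is a rough sketch and not WebMagic's actual code: the class `CycleRetryDemo`, the method `crawl`, and the `download` predicate are hypothetical names made up for illustration, and it assumes that "cycle retry times" means the maximum number of times a single failed URL is put back into the queue.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.function.Predicate;

// Hypothetical stand-in for the retry loop; none of these names come from WebMagic.
class CycleRetryDemo {

    /**
     * Processes the queue until it is empty and returns the URLs in the
     * order in which a download was attempted.
     * Assumption: a failed URL is re-queued at most cycleRetryTimes times.
     */
    static List<String> crawl(List<String> seeds, int cycleRetryTimes,
                              Predicate<String> download) {
        Queue<String> scheduler = new ArrayDeque<>(seeds);
        Map<String, Integer> retriesUsed = new HashMap<>();
        List<String> attempts = new ArrayList<>();

        String url;
        while ((url = scheduler.poll()) != null) {
            attempts.add(url);
            if (download.test(url)) {
                continue; // downloaded successfully, nothing to retry
            }
            int used = retriesUsed.getOrDefault(url, 0);
            if (used < cycleRetryTimes) {
                retriesUsed.put(url, used + 1);
                // Back of the queue: the retry happens only after every
                // URL that was already waiting has had its turn.
                scheduler.add(url);
            }
            // Otherwise the retry budget is spent and the URL is dropped.
        }
        return attempts;
    }

    public static void main(String[] args) {
        // "b" fails on its first attempt and succeeds on the second.
        int[] bAttempts = {0};
        List<String> order = crawl(
                Arrays.asList("a", "b", "c"), 2,
                u -> !u.equals("b") || ++bAttempts[0] > 1);
        System.out.println(order); // prints [a, b, c, b]
    }
}
```

Re-queuing at the back rather than retrying immediately gives a temporarily unavailable page some time to recover while the rest of the queue is processed.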