
webmagic-0.3.0

@code4craft code4craft released this 04 Sep 03:02
· 1025 commits to develop since this release
  • Change default XPath selector from HtmlCleaner to Xsoup.

    Xsoup is an XPath selector that I wrote on top of Jsoup. It performs much better than HtmlCleaner.

    The time to process a page drops from 7~9 ms to 0.4 ms.

    If Xsoup is not stable for your usage, call Spider.xsoupOff() to turn it off, and please report the issue to me!

  • Add cycle retry times for Site.

    When cycle retry times is set, Spider puts a URL whose download failed back into the scheduler and retries it after one cycle of the queue.
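
The cycle retry behaviour can be pictured with a plain FIFO queue. The sketch below is not webmagic's actual code: the class `CycleRetrySketch`, the method `crawl`, and the `download` predicate are hypothetical names used only for illustration, and it assumes that a failed URL goes to the tail of the queue and is dropped once its retries are used up.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.function.Predicate;

public class CycleRetrySketch {

    /**
     * Drains the queue of URLs. A URL whose download fails is appended to the
     * tail of the queue, so it is retried only after the URLs already waiting
     * have had their turn. Each URL is retried at most cycleRetryTimes times.
     * Returns the URLs in the order in which download attempts were made.
     */
    public static List<String> crawl(List<String> seeds,
                                     int cycleRetryTimes,
                                     Predicate<String> download) {
        Queue<String> scheduler = new ArrayDeque<>(seeds);
        Map<String, Integer> retriesUsed = new HashMap<>();
        List<String> attempts = new ArrayList<>();

        while (!scheduler.isEmpty()) {
            String url = scheduler.poll();
            attempts.add(url);
            if (download.test(url)) {
                continue; // download succeeded, nothing to retry
            }
            int used = retriesUsed.getOrDefault(url, 0);
            if (used < cycleRetryTimes) {
                retriesUsed.put(url, used + 1);
                scheduler.add(url); // back of the queue: retried after a full cycle
            }
            // otherwise the retries are exhausted and the URL is dropped
        }
        return attempts;
    }

    public static void main(String[] args) {
        // Hypothetical downloader that always fails for one URL.
        Predicate<String> download = url -> !url.endsWith("/broken");
        List<String> seeds = List.of(
                "http://example.com/a",
                "http://example.com/broken",
                "http://example.com/c");
        System.out.println(crawl(seeds, 2, download));
    }
}
```

Re-queueing at the tail, rather than retrying at once, gives a temporarily failing server some time to recover while the rest of the queue is processed.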