README.md (+36 -16)
````diff
@@ -11,7 +11,7 @@ Add the following dependency to your pom.xml:
 <dependency>
     <groupId>com.github.peterbencze</groupId>
     <artifactId>serritor</artifactId>
-    <version>1.1</version>
+    <version>1.2</version>
 </dependency>
 ```
 
````
````diff
@@ -26,38 +26,58 @@ See the [Wiki](https://github.com/peterbencze/serritor/wiki) page.
 BaseCrawler provides a skeletal implementation of a crawler to minimize the effort to create your own. First, create a class that extends BaseCrawler. In this class, you can customize the behavior of your crawler. There are callbacks available for every stage of crawling. Below you can find a sample implementation:
…
         System.out.println("Could not get response from: " + request.getCrawlRequest().getRequestUrl());
     }
 }
 ```
 That's it! In just a few lines you can make a crawler that extracts and crawls every URL it finds, while filtering duplicate and offsite requests. You also get access to the WebDriver, so you can use all the features that are provided by Selenium.
 
-By default, the crawler uses [HtmlUnitDriver](https://github.com/SeleniumHQ/selenium/wiki/HtmlUnitDriver) but you can also set your preferred WebDriver:
+By default, the crawler uses [HtmlUnit headless browser](http://htmlunit.sourceforge.net/):
 ```java
-config.setWebDriver(new ChromeDriver());
+public static void main(String[] args) {
+    MyCrawler myCrawler = new MyCrawler();
+
+    // Use HtmlUnit headless browser
+    myCrawler.start();
+}
 ```
+Of course, you can also use any other browser by specifying a corresponding WebDriver instance:
+```java
+public static void main(String[] args) {
+    MyCrawler myCrawler = new MyCrawler();
 
-## Support
-The developers would like to thank [Precognox](http://precognox.com/) for the support.
+    // Use Google Chrome
+    myCrawler.start(new ChromeDriver());
+}
+```
 
 ## License
 The source code of Serritor is made available under the [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0).
````
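The README paragraph beginning "That's it!" says the sample crawler filters duplicate and offsite requests. The sketch below shows one plausible way such a filter can work, using only `java.net.URI` and a `HashSet`. The class `RequestFilter` and its `shouldCrawl` method are hypothetical names for illustration, not part of Serritor's API, and Serritor's own filtering may differ (for example, in how it normalizes URLs or treats ports).

```java
import java.net.URI;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper (not part of Serritor): decides whether a newly found
// URL should be crawled, rejecting offsite URLs and URLs already seen.
final class RequestFilter {
    private final String allowedHost;
    private final Set<String> seen = new HashSet<>();

    // Assumes seedUrl is a valid absolute URL such as "https://example.com/".
    RequestFilter(String seedUrl) {
        URI seed = URI.create(seedUrl);
        this.allowedHost = seed.getHost();
        seen.add(normalize(seed)); // the seed itself counts as already seen
    }

    // Returns true at most once per normalized URL on the seed's host.
    boolean shouldCrawl(String url) {
        URI uri;
        try {
            uri = URI.create(url);
        } catch (IllegalArgumentException e) {
            return false; // malformed URL
        }
        if (uri.getScheme() == null || uri.getHost() == null) {
            return false; // relative or opaque URL; a real crawler would resolve it first
        }
        if (!uri.getHost().equalsIgnoreCase(allowedHost)) {
            return false; // offsite request
        }
        return seen.add(normalize(uri)); // add() returns false for a duplicate
    }

    // Lower-cases scheme and host, defaults an empty path to "/", and drops
    // the fragment, so "#section" variants of a page count as duplicates.
    private static String normalize(URI uri) {
        String port = uri.getPort() == -1 ? "" : ":" + uri.getPort();
        String rawPath = uri.getRawPath();
        String path = (rawPath == null || rawPath.isEmpty()) ? "/" : rawPath;
        String query = uri.getRawQuery() == null ? "" : "?" + uri.getRawQuery();
        return uri.getScheme().toLowerCase(Locale.ROOT) + "://"
                + uri.getHost().toLowerCase(Locale.ROOT) + port + path + query;
    }

    public static void main(String[] args) {
        RequestFilter filter = new RequestFilter("https://example.com/");
        System.out.println(filter.shouldCrawl("https://example.com/about"));      // true
        System.out.println(filter.shouldCrawl("https://example.com/about#team")); // false (duplicate)
        System.out.println(filter.shouldCrawl("https://other.org/"));             // false (offsite)
    }
}
```

Using `Set.add` as the duplicate check combines the lookup and the insertion in one step, and a hash set keeps that check constant-time on average no matter how many URLs the crawl has already seen.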