- Automatic restart after a crawl cycle finishes
- Configurable delay between two crawl cycles (see the configuration sketch after this list)
- Ability to resume crawling from the same state where it left off if the JVM or the server crashes or shuts down
- Configuration to run crawler processes against different domains
- Per-domain configuration of URL filters
- Per-domain configuration of parsers
- Configuration to enable/disable robots.txt rules
- Configuration to set the maximum number of URL visits per second
- Configuration to set the maximum crawl depth
- Configuration to set the maximum bytes to download per page
- Sitemap parsing support
- Retry support for parsing
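
The configuration-related features above could be bound to a typed settings class in a Spring Boot application. The sketch below is only an illustration: the `crawler.*` property names, the `CrawlerProperties` class, and the default values are assumptions for this example, not the project's actual configuration API.

```java
// Hypothetical sketch only: the "crawler.*" property names and this class are
// assumptions used for illustration, not the project's actual configuration API.
package example.crawler.config;

import java.util.List;
import java.util.Map;

import org.springframework.boot.context.properties.ConfigurationProperties;

/**
 * Binds properties such as:
 *
 *   crawler.cycle-delay-seconds=3600
 *   crawler.domains.example-com.max-depth=5
 *   crawler.domains.example-com.max-url-visits-per-second=10
 *   crawler.domains.example-com.max-bytes-per-page=1048576
 *   crawler.domains.example-com.respect-robots-txt=true
 *   crawler.domains.example-com.url-filters[0]=.*\.html$
 *
 * Register it with @EnableConfigurationProperties(CrawlerProperties.class)
 * or @ConfigurationPropertiesScan on the application class.
 */
@ConfigurationProperties(prefix = "crawler")
public class CrawlerProperties {

    /** Delay between two crawl cycles, in seconds. */
    private long cycleDelaySeconds = 3600;

    /** Per-domain settings, keyed by an arbitrary domain identifier. */
    private Map<String, DomainSettings> domains;

    public long getCycleDelaySeconds() { return cycleDelaySeconds; }
    public void setCycleDelaySeconds(long v) { this.cycleDelaySeconds = v; }
    public Map<String, DomainSettings> getDomains() { return domains; }
    public void setDomains(Map<String, DomainSettings> v) { this.domains = v; }

    public static class DomainSettings {
        private int maxDepth = 3;
        private int maxUrlVisitsPerSecond = 5;
        private long maxBytesPerPage = 1_048_576;   // 1 MiB cap per downloaded page
        private boolean respectRobotsTxt = true;
        private List<String> urlFilters;            // regex patterns applied to discovered URLs

        public int getMaxDepth() { return maxDepth; }
        public void setMaxDepth(int v) { this.maxDepth = v; }
        public int getMaxUrlVisitsPerSecond() { return maxUrlVisitsPerSecond; }
        public void setMaxUrlVisitsPerSecond(int v) { this.maxUrlVisitsPerSecond = v; }
        public long getMaxBytesPerPage() { return maxBytesPerPage; }
        public void setMaxBytesPerPage(long v) { this.maxBytesPerPage = v; }
        public boolean isRespectRobotsTxt() { return respectRobotsTxt; }
        public void setRespectRobotsTxt(boolean v) { this.respectRobotsTxt = v; }
        public List<String> getUrlFilters() { return urlFilters; }
        public void setUrlFilters(List<String> v) { this.urlFilters = v; }
    }
}
```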
- Spring Boot
- Spring Integration
- Redis
- Jsoup (see the fetch-and-parse sketch after this list)
- ActiveMQ
- ElasticSearch
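
As a small illustration of how the Jsoup dependency could fit into the crawl step, the sketch below fetches a page with a capped body size and extracts its outgoing links. The class, user agent string, and limits are assumptions made for this example, not code from this project.

```java
// Illustrative sketch only: shows how Jsoup (one of the listed dependencies)
// can fetch a page and extract outgoing links; not the project's actual code.
import java.io.IOException;
import java.util.List;
import java.util.stream.Collectors;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class JsoupFetchExample {

    /** Fetches a page, capping the download size, and returns its absolute outgoing links. */
    static List<String> fetchLinks(String url, int maxBytesPerPage) throws IOException {
        Document doc = Jsoup.connect(url)
                .userAgent("example-crawler")  // assumed user agent; configure as needed
                .timeout(10_000)               // 10 s connect/read timeout
                .maxBodySize(maxBytesPerPage)  // maps to the "maximum bytes per page" feature
                .get();
        return doc.select("a[href]").stream()
                .map(a -> a.absUrl("href"))    // resolve relative links against the base URL
                .filter(href -> !href.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        for (String link : fetchLinks("https://example.com", 1_048_576)) {
            System.out.println(link);
        }
    }
}
```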
NOTE: This is still an ongoing project and is not ready to use yet.