A Scrapy-based project for crawling data from lazada, websosanh, compare.vn, cdiscount and cungmua, with several cool wrappers:

1. a clean Scrapy project structure with items and pipelines
2. automatic proxy rotation
3. simple to run: no need to remember the Scrapy command line
4. flexible configuration: the crawler extracts data using the patterns in template/product.yml
5. saves data to a database: MongoDB or Elasticsearch (es)
6. uses pybloom to detect already-crawled data while crawling
7. stops after a set time -
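
The duplicate check in item 6 can be sketched roughly as follows. This is only an illustration of the Bloom filter idea, not the project's actual code: `SimpleBloomFilter` and `drop_duplicates` are hypothetical names, and the filter is a small standard-library stand-in for pybloom so the example runs without extra dependencies. Its `add` method returns True when the key was probably seen before, which is the behaviour pybloom's `BloomFilter.add` is documented to have.

```python
import hashlib


class SimpleBloomFilter:
    """Tiny Bloom filter (hypothetical stand-in for pybloom's BloomFilter)."""

    def __init__(self, size_bits=8192, num_hashes=4):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, key):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key):
        """Insert key; return True if it was (probably) already present."""
        seen = True
        for pos in self._positions(key):
            byte, bit = divmod(pos, 8)
            if not self.bits[byte] & (1 << bit):
                seen = False
                self.bits[byte] |= 1 << bit
        return seen

    def __contains__(self, key):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(key)
        )


def drop_duplicates(urls, seen=None):
    """Yield each URL the first time it appears, skipping probable repeats."""
    if seen is None:
        seen = SimpleBloomFilter()
    for url in urls:
        if not seen.add(url):
            yield url


if __name__ == "__main__":
    crawled = [
        "http://example.com/a",
        "http://example.com/b",
        "http://example.com/a",
    ]
    for url in drop_duplicates(crawled):
        print(url)
```

A Bloom filter never forgets a key it has stored, but it can occasionally report an unseen key as already present, so a crawler using this approach may skip a small fraction of new pages in exchange for a fixed, small memory footprint.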

Install the dependencies listed in requirements.txt, then run:

$ python app.py
