Project for crawling data from lazada, websosanh, compare.vn, cdiscount and cungmua, with a number of convenience wrappers around Scrapy:


1. Clean Scrapy project structure with items and pipelines
2. Automatic proxy rotation
3. Simple to run: no need to remember the Scrapy command-line invocation
4. Flexible configuration: the crawler extracts data using patterns defined in template/product.yml
5. Saves data to a database: MongoDB or Elasticsearch
6. Uses pybloom (a Bloom filter) to detect duplicate items while crawling
7. Stops crawling after a time limit
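Automatic proxy rotation (item 2) is usually done in Scrapy with a downloader middleware that sets `request.meta["proxy"]`, a key that Scrapy's built-in `HttpProxyMiddleware` honours. The sketch below is a hypothetical, stdlib-only illustration of that idea, not this project's actual code: the class name `RotatingProxyMiddleware`, the round-robin policy, and the proxy addresses are all assumptions, and `SimpleNamespace` stands in for `scrapy.Request` so it runs without Scrapy installed.

```python
import itertools
from types import SimpleNamespace


class RotatingProxyMiddleware:
    """Hypothetical downloader middleware that assigns a proxy to each request.

    Scrapy's HttpProxyMiddleware reads request.meta["proxy"], so setting that
    key is enough to route a request through the chosen proxy.
    """

    def __init__(self, proxies):
        if not proxies:
            raise ValueError("need at least one proxy")
        # Round-robin over the (assumed) proxy list
        self._cycle = itertools.cycle(proxies)

    def process_request(self, request, spider=None):
        # Give this request the next proxy in the rotation
        request.meta["proxy"] = next(self._cycle)
        # Returning None tells Scrapy to keep processing the request normally
        return None


if __name__ == "__main__":
    # Hypothetical proxy addresses, for illustration only
    mw = RotatingProxyMiddleware(["http://10.0.0.1:8080", "http://10.0.0.2:8080"])
    for url in ["https://example.com/a", "https://example.com/b", "https://example.com/c"]:
        req = SimpleNamespace(url=url, meta={})  # stand-in for scrapy.Request
        mw.process_request(req)
        print(req.url, "->", req.meta["proxy"])
```

In a real project, a middleware like this would be registered in `DOWNLOADER_MIDDLEWARES` with an order below 750 so it runs before `HttpProxyMiddleware`. A production version would typically also drop proxies that keep failing.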
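Duplicate checking with a Bloom filter (item 6) trades a small false-positive rate for constant memory: each key sets `k` bits in an `m`-bit array, and a key counts as "seen" only if all of its bits are already set. The project uses pybloom for this. The sketch below instead re-implements the same idea with the standard library so it runs on its own, and wraps it in a hypothetical item pipeline. The class names, the `url` field used as the dedup key, and `DuplicateItemError` (a stand-in for Scrapy's `DropItem`) are assumptions, not taken from this repository.

```python
import hashlib
import math


class BloomFilter:
    """Minimal Bloom filter: an m-bit array plus k hash positions per key."""

    def __init__(self, capacity: int, error_rate: float = 0.001) -> None:
        if capacity <= 0 or not 0 < error_rate < 1:
            raise ValueError("capacity must be > 0 and 0 < error_rate < 1")
        # Standard sizing: m = -n*ln(p) / (ln 2)^2 bits, k = (m/n) * ln 2 hashes
        self.num_bits = math.ceil(-capacity * math.log(error_rate) / (math.log(2) ** 2))
        self.num_hashes = max(1, round(self.num_bits / capacity * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, key: str):
        # Double hashing: derive k bit positions from one SHA-256 digest
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step avoids cycles of 0
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def __contains__(self, key: str) -> bool:
        # True only if every bit for this key is set (may be a false positive)
        return all((self.bits[p // 8] >> (p % 8)) & 1 for p in self._positions(key))

    def add(self, key: str) -> bool:
        """Insert key; return True if it was (probably) already present."""
        present = key in self
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)
        return present


class DuplicateItemError(Exception):
    """Stand-in for scrapy.exceptions.DropItem so the sketch runs without Scrapy."""


class BloomDedupPipeline:
    """Hypothetical item pipeline that drops items whose URL was already seen."""

    def __init__(self, capacity: int = 100_000, error_rate: float = 0.001) -> None:
        self.seen = BloomFilter(capacity, error_rate)

    def process_item(self, item: dict, spider=None) -> dict:
        # "url" is an assumed item field used as the dedup key
        if self.seen.add(item["url"]):
            raise DuplicateItemError(f"duplicate item: {item['url']}")
        return item
```

Because a Bloom filter can report false positives but never false negatives, a very small fraction of new items may be dropped as duplicates. `error_rate` controls that fraction, at the cost of more memory.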

Install the dependencies with `pip install -r requirements.txt`, then run:

    $ python app.py