
A Scrapy crawler that crawls manga from a pre-selected website.

Splash is used to render the JavaScript on the page so that Scrapy can get the image URL, which is only generated when the page loads in a browser.
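As a rough illustration of the rendering step, the sketch below builds a request to Splash's `render.html` HTTP endpoint and pulls image URLs out of the rendered HTML. It uses only the standard library rather than the Scrapy integration the spiders presumably rely on. The Splash address `http://localhost:8050`, the `wait` value, and the helper names `splash_render_url` and `extract_image_urls` are assumptions for illustration, not taken from this repository.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


def splash_render_url(page_url, splash_base="http://localhost:8050", wait=1.0):
    """Build a Splash render.html URL for page_url.

    splash_base and wait are assumed values; adjust them to your Splash setup.
    """
    query = urlencode({"url": page_url, "wait": wait})
    return f"{splash_base}/render.html?{query}"


class ImageSrcParser(HTMLParser):
    """Collect the src attribute of every <img> tag in the rendered HTML."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def extract_image_urls(rendered_html):
    """Return all image URLs found in HTML that Splash has already rendered."""
    parser = ImageSrcParser()
    parser.feed(rendered_html)
    return parser.sources
```

Fetching the URL returned by `splash_render_url` yields the page after its scripts have run, so the image tags that are absent from the raw HTML become visible to `extract_image_urls`.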

A few Bash scripts automate the process.

By default, the crawler runs three times. On each run it checks the MySQL database and the local file system to see whether all pages of a chapter have been downloaded, and the next run retries any missing pages.
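The retry behaviour can be sketched roughly as follows. This is not the code in `crawlChecker.py`; it is a minimal sketch that assumes pages are saved as numbered files (for example `001.jpg`) and that the database lookup and the download step are passed in as callables. The names `missing_pages`, `crawl_with_retries`, `fetch_recorded`, and `download` are hypothetical.

```python
import os


def missing_pages(chapter_dir, expected_pages, recorded_pages):
    """Return page numbers that are not both recorded and present on disk."""
    on_disk = set()
    if os.path.isdir(chapter_dir):
        for name in os.listdir(chapter_dir):
            stem, _ext = os.path.splitext(name)
            if stem.isdigit():  # assumed naming scheme: 001.jpg, 002.jpg, ...
                on_disk.add(int(stem))
    complete = on_disk & set(recorded_pages)
    return [p for p in range(1, expected_pages + 1) if p not in complete]


def crawl_with_retries(chapter_dir, expected_pages, fetch_recorded, download,
                       max_attempts=3):
    """Download missing pages, re-checking database and disk before each attempt.

    fetch_recorded() stands in for the MySQL query and returns the recorded
    page numbers; download(pages) stands in for one crawl of the given pages.
    Returns the pages still missing after the last attempt.
    """
    for _attempt in range(max_attempts):
        missing = missing_pages(chapter_dir, expected_pages, fetch_recorded())
        if not missing:
            return []
        download(missing)
    return missing_pages(chapter_dir, expected_pages, fetch_recorded())
```

Requiring a page to appear both in the database and on disk means that a record without a file, or a file without a record, is treated as missing and downloaded again.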

crawlManga.sh is scheduled as a cron job that runs every hour to check for new chapters. A WeChat notification is sent when new chapters are found.
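The hourly check might look something like the sketch below. It compares the chapters already known against the latest list for each watched title and calls a notifier when something new appears. The function names, the shape of `known_chapters`, and the `fetch_latest` and `notify` callables are all hypothetical; the real WeChat call in `wechat.py` is not reproduced here.

```python
# A crontab entry such as the following would run the script at minute 0 of
# every hour (the path is a placeholder, not the author's actual location):
#   0 * * * * /path/to/crawlManga.sh


def find_new_chapters(known, latest):
    """Return chapters in latest that are not in known, preserving order."""
    seen = set(known)
    return [chapter for chapter in latest if chapter not in seen]


def check_and_notify(watch_list, known_chapters, fetch_latest, notify):
    """Check each watched title and send one message per title with new chapters.

    fetch_latest(title) stands in for a crawl of the chapter list, and
    notify(message) stands in for the WeChat notification.
    """
    found = {}
    for title in watch_list:
        new = find_new_chapters(known_chapters.get(title, []), fetch_latest(title))
        if new:
            found[title] = new
            notify(f"{title}: {len(new)} new chapter(s): {', '.join(new)}")
    return found
```

Passing the notifier in as a callable keeps the comparison logic testable without sending real messages.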

pyasn1 1.4 works, while pyasn 1.6 does not, for an unknown reason.

Folders and files (47 commits):

__pycache__
spiders
README.md
__init__.py
__init__.pyc
crawlChecker.py
crawlManga.sh
crawl_log
download.sh
items.py
jsonReader.py
mangaList.py
mangaList.pyc
middlewares.py
middlewares.pyc
pipelines.py
settings.py
settings.pyc
sqlConnector.py
sqlConnector.pyc
watchList.py
watchList.pyc
wechat.py
wechat.pyc

midknight24/mangaCrawler
