This Python script crawls through all the completed textbooks at http://tbc-python.fossee.in/
It checks the uploaded Python code for errors and stores the error data in the
error_log dictionary.
- Code with errors is fetched
- Chapters with broken URLs are fetched
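The two kinds of entries described above could be collected roughly as follows. This is only a minimal sketch, not the script's actual implementation: the function name `collect_errors`, the `fetch` callable, the regular expression used to spot errors, and the layout of the `error_log` dictionary are all assumptions made for illustration. It uses only the standard library and runs under both Python 2.7 and Python 3.

```python
import re

# Assumption: a failing code cell shows a line such as
# "NameError: name 'x' is not defined" in the chapter page text.
ERROR_PATTERN = re.compile(r"(\w+Error): (.*)")


def collect_errors(chapters, fetch):
    """Build a nested error_log dictionary (hypothetical layout).

    chapters: iterable of (book, chapter, url) tuples.
    fetch:    callable taking a URL and returning the page text;
              it is expected to raise IOError for a broken URL.
    """
    error_log = {}
    for book, chapter, url in chapters:
        try:
            page = fetch(url)
        except IOError:
            # Chapter with a broken URL: record it without error details.
            error_log.setdefault(book, {})[chapter] = {
                "url": url, "broken_url": True, "errors": []}
            continue
        errors = ERROR_PATTERN.findall(page)
        if errors:
            # Chapter whose code produced errors: keep (type, message) pairs.
            error_log.setdefault(book, {})[chapter] = {
                "url": url, "broken_url": False, "errors": errors}
    return error_log
```

Passing the fetcher in as an argument keeps the sketch independent of urllib2 and makes it easy to exercise with canned pages instead of live requests.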
- Python version 2.7.6
- BeautifulSoup 4
  If you’re using a recent version of Debian or Ubuntu Linux, you can install Beautiful Soup with the system package manager:
      apt-get install python-bs4
  If unable to install with system package manager, try:
      easy_install beautifulsoup4
  or
      pip install beautifulsoup4
  If you don’t have easy_install or pip installed, you can download the Beautiful Soup source tarball and install it with setup.py
- urllib2
- Jinja2 template engine
      easy_install jinja2
  or
      pip install jinja2
Run the following command from the directory where tbc_python_web_crawler.py is present.
python tbc_python_web_crawler.py http://tbc-python.fossee.in/completed-books/
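The command above passes the start URL as the only command-line argument. A script could read and validate that argument along these lines; this is a hedged sketch, and the helper name `parse_start_url` and its error messages are hypothetical rather than taken from tbc_python_web_crawler.py.

```python
import sys


def parse_start_url(argv):
    """Return the start URL from an argv-style list, or raise ValueError.

    Hypothetical helper: expects exactly one argument after the script name.
    """
    if len(argv) != 2:
        raise ValueError(
            "usage: python tbc_python_web_crawler.py <completed-books-url>")
    url = argv[1]
    # Only accept absolute http(s) URLs so a typo fails before any crawling.
    if not url.startswith(("http://", "https://")):
        raise ValueError("expected an http(s) URL, got %r" % url)
    return url


if __name__ == "__main__":
    try:
        print(parse_start_url(sys.argv))
    except ValueError as exc:
        sys.exit(str(exc))
```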
- Run the following command from the /tbc-python-web-crawler/testing folder.
      python test_tbc_web_crawler.py
- Following tests are implemented in the test_tbc_web_crawler.py file:
  - Testing get_chapter_errors function with 2 example errors and None example errors
  - Testing get_chapter_errors function with page loading error
  - Testing get_details function with fetching chapter details
  - Testing error_log_to_html function with sample html output
  - Testing