A Python crawler that uses the Facebook Graph API to crawl a fan page's public posts, comments, and reactions.
It relies only on the Facebook Graph API.
Facebook Page Crawler is built on Python 3 and uses the requests module. After cloning this repository, use the following command to install it:
python setup.py develop
The crawler can be run from the command line as:
facebook_page_crawler $app_id $app_secret $targets $since $until
To uninstall, run the following command:
python setup.py develop --uninstall
Facebook Page Crawler requires five arguments:
- app_id: The app_id of your Facebook app, which is used to access the Facebook Graph API.
- app_secret: The app_secret of your Facebook app, which is used to access the Facebook Graph API.
- targets: The name of the page you want to crawl. Separate multiple pages with commas, e.g. 'appledaily.tw,ETtoday'.
- since: The date and time at which crawling starts, e.g. '2016-09-01 00:00:00'.
- until: The date and time at which crawling ends, e.g. '2016-09-01 23:59:59'.
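As a rough illustration of how the targets, since, and until values might be interpreted, here is a minimal sketch. The helper names `parse_targets` and `parse_window` are hypothetical, and the date format is an assumption inferred from the usage examples in this README; the crawler's own parsing may differ.

```python
from datetime import datetime

# Assumed format, inferred from the usage examples in this README.
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_targets(targets):
    """Split a comma-separated targets string into a list of page names."""
    return [name.strip() for name in targets.split(",") if name.strip()]


def parse_window(since, until):
    """Parse the since/until strings and reject a reversed time window."""
    start = datetime.strptime(since, DATE_FORMAT)
    end = datetime.strptime(until, DATE_FORMAT)
    if start > end:
        raise ValueError("since must not be later than until")
    return start, end


if __name__ == "__main__":
    print(parse_targets("appledaily.tw,ETtoday"))
    print(parse_window("2016-09-01 00:00:00", "2016-09-01 23:59:59"))
```

Validating the window up front means a swapped since/until pair fails immediately instead of producing an empty crawl.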
It also accepts the following optional arguments:
- -att, --attachments: Default is False. Set to True to collect the attachments of posts and comments.
- -r, --reactions: Default is False. Set to True to collect reactions data. The number of reactions can be very large, so use this option with care.
- -api, --api-version: Default is v2.7. Changes the version of the Facebook Graph API. The crawler has only been tested with v2.7.
- -l, --limit: Default is 100. Limits the number of feed items or comments returned by each request. A larger value reduces the number of requests.
- -d, --debug: Default is False. Enables debug mode, which prints additional information while crawling.
- -p, --process_num: Default is the number of CPU cores. The number of processes used to handle feeds in parallel.
- -w, --write: Default is True. Writes the results to JSON files under Results/.
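The effect of --limit on the number of requests can be illustrated with a small sketch. It assumes the usual Graph API paging shape (a `data` list plus an optional `paging.next` URL). The names `iter_items` and `make_fake_fetch` and the `fake://` URLs are hypothetical stand-ins for real HTTP calls, not the crawler's actual code.

```python
def iter_items(first_url, fetch_json):
    """Yield every item from a paged edge by following paging.next."""
    url = first_url
    while url:
        page = fetch_json(url)
        for item in page.get("data", []):
            yield item
        # The last page has no "next" link, which ends the loop.
        url = page.get("paging", {}).get("next")


def make_fake_fetch(items, limit):
    """Build a stand-in for an HTTP GET that serves items in pages of `limit`."""
    calls = []

    def fetch_json(url):
        calls.append(url)
        offset = int(url.rsplit("=", 1)[1])
        page = {"data": items[offset:offset + limit]}
        if offset + limit < len(items):
            page["paging"] = {"next": "fake://feed?offset=%d" % (offset + limit)}
        return page

    return fetch_json, calls


if __name__ == "__main__":
    posts = list(range(250))  # 250 pretend feed items
    for limit in (25, 100):
        fetch, calls = make_fake_fetch(posts, limit)
        collected = list(iter_items("fake://feed?offset=0", fetch))
        print(limit, len(collected), len(calls))
```

With 250 pretend items, a limit of 25 takes 10 requests while a limit of 100 takes 3, which is the trade-off the option describes.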
Use this command to show the help message:
facebook_page_crawler --help
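For readers who want to see how a command line like this could be declared, the following is a minimal sketch using argparse from the standard library, which also generates the --help output automatically. It is an assumption that the crawler is built this way; `build_parser` and `str2bool` are hypothetical names, and only the flags and defaults come from the argument lists in this README.

```python
import argparse
import os


def str2bool(value):
    """Interpret values such as 'yes' or 'True' as booleans (assumed convention)."""
    return value.lower() in ("yes", "true", "t", "y", "1")


def build_parser():
    """Declare the five positional and seven optional arguments."""
    parser = argparse.ArgumentParser(prog="facebook_page_crawler")
    for name in ("app_id", "app_secret", "targets", "since", "until"):
        parser.add_argument(name)
    parser.add_argument("-att", "--attachments", type=str2bool, default=False)
    parser.add_argument("-r", "--reactions", type=str2bool, default=False)
    parser.add_argument("-api", "--api-version", default="v2.7")
    parser.add_argument("-l", "--limit", type=int, default=100)
    parser.add_argument("-d", "--debug", type=str2bool, default=False)
    parser.add_argument("-p", "--process_num", type=int, default=os.cpu_count())
    parser.add_argument("-w", "--write", type=str2bool, default=True)
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

Taking an explicit value for the boolean flags matches invocations such as `-r yes`.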
facebook_page_crawler $APP_ID $APP_SECRET 'appledaily.tw' '2016-09-01 00:00:00' '2016-09-01 23:59:59'
facebook_page_crawler $APP_ID $APP_SECRET 'appledaily.tw' '2016-09-01 00:00:00' '2016-09-01 23:59:59' -r yes
facebook_page_crawler $APP_ID $APP_SECRET 'appledaily.tw,ETtoday' '2016-09-01 00:00:00' '2016-09-01 23:59:59'
This crawler uses the app_id and app_secret to get an access token.
Please create an app at https://developers.facebook.com/ and use its app_id and app_secret with this crawler.
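One way an app access token can be requested from the Graph API is the client-credentials call on the `oauth/access_token` endpoint. The sketch below shows that call under stated assumptions: `app_token_url` and `fetch_app_token` are hypothetical helper names, and this is not necessarily how the crawler obtains its token.

```python
from urllib.parse import urlencode

GRAPH_URL = "https://graph.facebook.com"


def app_token_url(app_id, app_secret, api_version="v2.7"):
    """Build the client-credentials URL for requesting an app access token."""
    query = urlencode({
        "client_id": app_id,
        "client_secret": app_secret,
        "grant_type": "client_credentials",
    })
    return "%s/%s/oauth/access_token?%s" % (GRAPH_URL, api_version, query)


def fetch_app_token(app_id, app_secret, api_version="v2.7"):
    """Request the token over HTTP (needs network access and a real app)."""
    # Imported here so the URL helper above works without requests installed.
    import requests

    response = requests.get(app_token_url(app_id, app_secret, api_version), timeout=30)
    response.raise_for_status()
    return response.json()["access_token"]


if __name__ == "__main__":
    # Placeholder credentials; substitute the values from your own app.
    print(app_token_url("YOUR_APP_ID", "YOUR_APP_SECRET"))
```

Keep the app_secret out of version control, since anyone who holds it can request tokens for your app.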
- Add tests
- Maybe publish to PyPI
- Counts of reactions and likes