This repository will help you hit the ground running in the DSI hackathon! Included are lists of resources for performing data analysis tasks, as well as starter scripts in several popular languages. If you have any suggestions or improvements, please open a pull request on GitHub.
See http://snap.stanford.edu/data/web-Reddit.html
- R: parse/load.r
- python: parse/load.py
- python: scrap/scrap.py (extracts comment text from reddit HTML)
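
To give a feel for what extracting comment text from HTML can look like, here is a minimal sketch using only the Python standard library. It is not the code in scrap/scrap.py, which is not shown here. It assumes that each comment body sits inside a `<div class="md">` element, which is an assumption about reddit's markup that you should check against the pages you actually download. The names `CommentTextParser` and `extract_comments` are hypothetical and chosen for illustration.

```python
from html.parser import HTMLParser


class CommentTextParser(HTMLParser):
    """Collect the text inside every <div class="md"> element.

    Assumption: comment bodies are wrapped in <div class="md">.
    """

    def __init__(self):
        super().__init__()
        self.comments = []  # finished comment strings
        self._depth = 0     # <div> nesting depth inside the current comment body
        self._chunks = []   # text fragments of the comment being read

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self._depth > 0:
            # A nested <div> inside a comment body: track it so we know
            # which closing </div> ends the comment.
            self._depth += 1
        elif "md" in (dict(attrs).get("class") or "").split():
            # Start of a new comment body.
            self._depth = 1
            self._chunks = []

    def handle_endtag(self, tag):
        if tag != "div" or self._depth == 0:
            return
        self._depth -= 1
        if self._depth == 0:
            # Comment body closed: join fragments and collapse whitespace.
            text = " ".join("".join(self._chunks).split())
            self.comments.append(text)

    def handle_data(self, data):
        if self._depth > 0:
            self._chunks.append(data)


def extract_comments(html):
    """Return a list with the text of each comment body found in html."""
    parser = CommentTextParser()
    parser.feed(html)
    parser.close()
    return parser.comments


sample = '<div class="entry"><div class="md"><p>Hello <em>world</em></p></div></div>'
print(extract_comments(sample))
```

For messy real-world pages, a dedicated HTML library may be more forgiving than the standard-library parser; the sketch above only illustrates the idea.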
Consider using UCI's own AsterixDB for exploring reddit data.
R:
- ggplot2
- ggvis
- plyr
python:
- scikit-learn
- pandas