
UCIDataScienceInitiative/redditHackathon

UCI Data Science Initiative Hackathon: reddit

This repository will help you hit the ground running in the DSI hackathon! It includes lists of resources for data analysis tasks, as well as starter scripts in several popular languages. If you have suggestions or improvements, please open a pull request on GitHub.

Dataset

The hackathon uses the Reddit dataset from the Stanford Network Analysis Project (SNAP): http://snap.stanford.edu/data/web-Reddit.html

Introduction Slides

https://docs.google.com/a/uci.edu/presentation/d/1MnqnSn9p_oYTcdobct6WyLIQz3drYjOYzlP_-Aw9Ey8/edit?usp=sharing

Starter scripts

  • R: parse/load.r
  • python: parse/load.py
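This README does not reproduce the contents of parse/load.py. To give a rough idea of what a first loading step can look like, here is a minimal Python sketch that uses only the standard library. It assumes the data is a CSV file with a header row. The helper names `load_rows` and `count_by`, and the `subreddit`/`score` columns in the sample, are hypothetical, so check the real file's header before relying on any column name.

```python
import csv
import io
from collections import Counter


def load_rows(lines):
    """Parse CSV text (any iterable of lines, e.g. an open file) into dicts keyed by the header row."""
    return list(csv.DictReader(lines))


def count_by(rows, column):
    """Count how many rows share each value of `column`."""
    return Counter(row[column] for row in rows)


if __name__ == "__main__":
    # Hypothetical sample data; the real file's name and columns may differ.
    sample = io.StringIO(
        "title,subreddit,score\n"
        "cat picture,pics,120\n"
        "another cat,pics,45\n"
        "funny gif,funny,300\n"
    )
    rows = load_rows(sample)
    print(count_by(rows, "subreddit"))  # Counter({'pics': 2, 'funny': 1})
```

To read a real file, pass an open file object in place of the sample, for example `open(path, newline="", encoding="utf-8")`, where `path` is wherever you saved the download. Note that `csv` returns every value as a string, so convert numeric columns such as scores with `int()` before doing arithmetic on them.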

Extraction scripts

  • python: scrap/scrap.py, which extracts comment text from reddit HTML pages
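This README does not describe how scrap/scrap.py works internally. The following is a minimal standard-library sketch of the same idea, not the script itself. It assumes comment bodies are wrapped in `<div class="md">` elements, which is typical of older reddit page markup, so verify this against your own files. The names `CommentTextExtractor`, `extract_comments`, and `BLOCK_TAGS` are hypothetical.

```python
from html.parser import HTMLParser

# Tags treated as word boundaries so adjacent paragraphs don't run together.
BLOCK_TAGS = {"div", "p", "br", "li", "blockquote", "pre"}


class CommentTextExtractor(HTMLParser):
    """Collect the text inside every <div> whose class list contains target_class.

    Assumption (hypothetical default): comment bodies are <div class="md"> elements.
    """

    def __init__(self, target_class="md"):
        super().__init__()  # convert_charrefs=True, so &amp; etc. arrive decoded
        self.target_class = target_class
        self.depth = 0      # >0 while inside a matching div; counts nested divs
        self.current = []   # text fragments of the comment being read
        self.comments = []  # finished, whitespace-normalized comment strings

    def handle_starttag(self, tag, attrs):
        if self.depth > 0:
            if tag == "div":
                self.depth += 1
            if tag in BLOCK_TAGS:
                self.current.append(" ")
            return
        if tag == "div":
            classes = (dict(attrs).get("class") or "").split()
            if self.target_class in classes:
                self.depth = 1
                self.current = []

    def handle_endtag(self, tag):
        if self.depth == 0:
            return
        if tag in BLOCK_TAGS:
            self.current.append(" ")
        if tag == "div":
            self.depth -= 1
            if self.depth == 0:
                # Collapse runs of whitespace into single spaces.
                self.comments.append(" ".join("".join(self.current).split()))

    def handle_data(self, data):
        if self.depth > 0:
            self.current.append(data)


def extract_comments(html, target_class="md"):
    """Return the text of each matching comment div, in document order."""
    parser = CommentTextExtractor(target_class)
    parser.feed(html)
    parser.close()
    return parser.comments
```

For a saved page, call `extract_comments(f.read())` on a file opened with `encoding="utf-8"`. Counting nested `<div>` tags keeps a comment open until its own closing tag, and inserting a space at block-element boundaries keeps separate paragraphs from merging into one word.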

Databases

Consider using UCI's own AsterixDB for exploring reddit data.

Useful libraries & tools

R:

  • ggplot2
  • ggvis
  • plyr

python:

  • scikit-learn
  • pandas

About

Script to scrape user comments from reddit HTML files.
