This project attempted to identify factors that describe prominent links between online communities on Reddit. Data was mined from Google BigQuery, a cloud data warehouse service, and then processed. From the processed dataset we built a social network graph in which nodes represent subreddits and edges represent users shared between them. Each edge was weighted: whenever a user commented in both subreddits, the weight of the edge between them increased. We then chose four centrality measures, which are widely used in social network analysis, as measures of subreddit popularity.
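The graph construction described above can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the project's actual pipeline: it assumes the BigQuery export has already been reduced to `(author, subreddit)` pairs, one per comment, and it assumes an edge's weight counts the distinct users who commented in both subreddits. The function names are hypothetical. Because the four centrality measures are not named here, weighted degree (the sum of a node's incident edge weights) is shown purely as an illustrative example of a centrality computed on this graph.

```python
from collections import defaultdict
from itertools import combinations


def build_subreddit_graph(comments):
    """Return {(sub_a, sub_b): weight} for an undirected co-commenting graph.

    Hypothetical helper. Assumption: `comments` is an iterable of
    (author, subreddit) pairs, and an edge's weight is the number of
    distinct users who commented in both subreddits.
    """
    # Collect the set of subreddits each user commented in; repeated
    # comments in the same subreddit by the same user count once.
    subs_by_user = defaultdict(set)
    for author, subreddit in comments:
        subs_by_user[author].add(subreddit)

    weights = defaultdict(int)
    for subs in subs_by_user.values():
        # Sorting gives each unordered subreddit pair one canonical key.
        for a, b in combinations(sorted(subs), 2):
            weights[(a, b)] += 1
    return dict(weights)


def weighted_degree(weights):
    """Illustrative centrality: sum of edge weights incident to each node."""
    strength = defaultdict(int)
    for (a, b), w in weights.items():
        strength[a] += w
        strength[b] += w
    return dict(strength)


comments = [
    ("alice", "python"), ("alice", "datascience"), ("alice", "python"),
    ("bob", "python"), ("bob", "datascience"), ("bob", "gaming"),
    ("carol", "gaming"), ("carol", "python"),
]
graph = build_subreddit_graph(comments)
print(graph)
print(weighted_degree(graph))
```

The per-user pairing step grows quadratically with the number of subreddits a single user has commented in, so on a full Reddit dump this aggregation would more plausibly be pushed into the warehouse query itself; the Python version only makes the weighting rule concrete.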
We had three research objectives: to replicate results from a previous study, to find additional factors that are unique to Reddit, and to use an advanced sentiment analysis tool to analyse the content of comments in context. For each objective, we analysed different types of factors that could be correlated with the results obtained earlier, and found several predictors that helped explain the variance in those results. Finally, we provide a detailed discussion and interpretation of our findings, comparing them with previous research in this field and suggesting future areas to explore.