-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathreadme
35 lines (27 loc) · 1.91 KB
/
readme
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
#The OSS startup acquisition & its effect on community dynamics.
This script serves as a tool for the master thesis of David Roegiers on Open Source Software at the chair of Strategic
Management and Innovation at the ETH Zurich. To grasp a better understanding of the context, I refer the reader to the
final report, that can be found under 'Report.pdf'.
The goal of this script is to collect organisations and repositories on GitHub and transform the development activities
to such an extent that the data can be used for statistical analysis in R.
For an understanding of how the classes and methods work, we advise the reader to go through the calls and comments
in the 'main.py'-file.
REMARK1 : The accompanied code has been written using Python 2.7.10
REMARK2 : This code uses the external package PyGitHub version 1.26.0
Please, first install the library on your device (http://pygithub.github.io/PyGithub/v1/introduction.html).
REMARK3 : This script is licensed under LGPL. I copy-lefted this from PyGithub, although it may not even be necessary.
In the script we talk about development actions/contributions. These are classified in a certain way, however we have used different terms in the code.
The mapping of these is as following.
Code Contribution - (commit)
Code Review - (merger)
Pull Request - (pullrequest)
Issue Creation - (issue)
Comments - (comment)
Technical Contributions are Code Contributions and Code Review (commit & merger)
Non-technical Contributions are Pull Request, Issue Creation, and Comments (pullrequest, issue, & comment)
Propositions for improvement:
- Redesign the class structure, make them inherited.
- Find the optimal number of dummy accounts (weight between API speed and multi-thread memory locking).
- Test the validity of the GitHub account selection heuristics.
- Look into OSS libraries for data analysis inside Python.
- See if integration with GHTorrent is interesting to speed up data collection.