This project is a means of exploring open source activity associated with the University of Wisconsin-Madison.
The project contains scripts that download information about open source projects and people from GitHub and GitLab and store it in a database for further analysis. It also includes a REST API for retrieving this information.
Results data in a variety of formats are contained in the following directories:

    data/
    │
    ├── github/
    │   ├── csv/
    │   ├── json/
    │   └── sql/
    │
    └── gitlab/
        ├── csv/
        ├── json/
        └── sql/
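As a quick way to see what each results directory holds, the following minimal sketch counts the files under every `data/<source>/<format>/` folder. The function name `count_result_files` is hypothetical and not part of the project. It assumes the script is run from the repository root, and it makes no assumption about individual file names.

```python
from pathlib import Path

# Sources and formats taken from the directory layout described above.
SOURCES = ("github", "gitlab")
FORMATS = ("csv", "json", "sql")


def count_result_files(data_root="data"):
    """Count the files under data_root/<source>/<format>/ for each pair."""
    counts = {}
    for source in SOURCES:
        for fmt in FORMATS:
            folder = Path(data_root) / source / fmt
            # A missing folder simply counts as zero files.
            if folder.is_dir():
                files = [p for p in folder.iterdir() if p.is_file()]
            else:
                files = []
            counts[f"{source}/{fmt}"] = len(files)
    return counts


if __name__ == "__main__":
    for key, n in count_result_files().items():
        print(f"{key}: {n} file(s)")
```

Returning a dictionary keyed by `source/format` keeps the result easy to print or to compare between runs.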
Below are some sample findings from GitHub about repositories related to "Wisconsin":
Description | Count | Percent |
---|---|---|
All repositories | 3028 | 100% |
Repositories that are not part of the Wisconsin Breast Cancer dataset or CS classes | 1748 | 58% |
Component | Count | Percent |
---|---|---|
Description | 2433 | 80% |
README | 2185 | 72% |
README Images | 256 | 8% |
Homepage | 151 | 5% |
License | 436 | 14% |
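The percentages in the component table appear to be each count divided by the 3028 total repositories, rounded to a whole number. The short sketch below reproduces that arithmetic. The choice of denominator is an assumption inferred from the numbers, and the helper name `percent_of_total` is illustrative only.

```python
# "All repositories" count from the first table (assumed denominator).
TOTAL_REPOSITORIES = 3028

# Counts copied from the component table.
component_counts = {
    "Description": 2433,
    "README": 2185,
    "README Images": 256,
    "Homepage": 151,
    "License": 436,
}


def percent_of_total(count, total=TOTAL_REPOSITORIES):
    """Return count as a whole-number percentage of total."""
    return round(100 * count / total)


percentages = {name: percent_of_total(c) for name, c in component_counts.items()}

for name, pct in percentages.items():
    print(f"{name}: {pct}%")
```

The same helper gives 58% for the 1748 repositories that remain after excluding the breast cancer dataset and CS class projects.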
Description | README | README Images | License | Homepage | Count | Percent |
---|---|---|---|---|---|---|
✓ | ✓ | 1111 | 37% | |||
✓ | ✓ | ✓ | 149 | 5% | ||
✓ | ✓ | ✓ | 84 | 3% | ||
✓ | ✓ | ✓ | ✓ | 59 | 1.5% | |
✓ | ✓ | ✓ | ✓ | 31 | 1% | |
✓ | ✓ | ✓ | ✓ | ✓ | 17 | 0.5% |
To run the code in this project, you will need the following:
- A SQL database (MySQL, MariaDB, etc.)
- Python 3 or PHP
Before running the scripts in this project, you will need to create a database to store the data as described here.
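To give a rough idea of what a database for this kind of data might look like, here is a minimal, self-contained sketch. It is not the project's actual schema: the table and column names are hypothetical, and it uses Python's built-in `sqlite3` module as a stand-in for MySQL or MariaDB so that it runs without a database server. Follow the linked instructions for the real setup.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only
# and are not taken from the project's own database definition.
SCHEMA = """
CREATE TABLE IF NOT EXISTS repositories (
    id          INTEGER PRIMARY KEY,
    source      TEXT NOT NULL,              -- 'github' or 'gitlab'
    full_name   TEXT NOT NULL,
    description TEXT,
    homepage    TEXT,
    license     TEXT,
    has_readme  INTEGER NOT NULL DEFAULT 0
);
"""


def create_database(path=":memory:"):
    """Open a SQLite database (in memory by default) and create the table."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


conn = create_database()
# Insert one placeholder row using a parameterized query.
conn.execute(
    "INSERT INTO repositories (source, full_name, description, has_readme) "
    "VALUES (?, ?, ?, ?)",
    ("github", "example-owner/example-repo", "A placeholder row", 1),
)
conn.commit()
row_count = conn.execute("SELECT COUNT(*) FROM repositories").fetchone()[0]
print(row_count)
```

The columns mirror the components counted in the tables above (description, README, license, homepage), which is one plausible way to support that kind of analysis.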
Before running the scripts in this project, you will need to configure your code to use GitHub / GitLab access tokens as described here.
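For orientation, this is roughly how an access token is used when calling the GitHub REST API directly. The sketch is not the project's own configuration mechanism: the environment variable name `GITHUB_TOKEN` and the helper `build_search_request` are assumptions made for illustration. The endpoint and headers follow GitHub's public repository search API.

```python
import json
import os
import urllib.parse
import urllib.request

GITHUB_SEARCH_URL = "https://api.github.com/search/repositories"


def build_search_request(query, token):
    """Build an authenticated request for GitHub's repository search endpoint."""
    url = GITHUB_SEARCH_URL + "?" + urllib.parse.urlencode({"q": query})
    return urllib.request.Request(
        url,
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
    )


if __name__ == "__main__":
    # Hypothetical variable name; the project may read its tokens differently.
    token = os.environ.get("GITHUB_TOKEN")
    if token:
        request = build_search_request("Wisconsin", token)
        with urllib.request.urlopen(request) as response:
            print(json.load(response)["total_count"])
```

Reading the token from the environment keeps it out of source control, which is a common precaution regardless of how the scripts themselves are configured.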
Once you have created a database and have configured the code with your access tokens, you are ready to run the data collection scripts as described here.
Distributed under the permissive MIT license. See the license for more information.
This software was created by the Data Science Institute at the University of Wisconsin-Madison.