GitHub Graph Data

This folder contains different graphs (raw data and/or graphical representation) extracted from the GHArchive database. The graph visualizations are made with Gephi.

Structure

The folder is organised as follows:

  • data source
    • graph name
      • period
        • data.csv
        • graph.csv
        • graph.pdf
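
As a minimal sketch, the layout above can be turned into file paths as follows. The `graph_files` helper and the example values `gharchive` and `201801` are hypothetical and chosen only for illustration; the actual folder names in this repository may differ.

```python
from pathlib import PurePosixPath

# File names taken from the structure listed above.
FILE_NAMES = ("data.csv", "graph.csv", "graph.pdf")


def graph_files(root, data_source, graph_name, period):
    """Return the expected file paths for one graph and one period.

    Hypothetical helper: it only mirrors the documented layout
    <data source>/<graph name>/<period>/<file>.
    """
    base = PurePosixPath(root) / data_source / graph_name / period
    return {name: base / name for name in FILE_NAMES}


# "gharchive" and "201801" are assumed example values, not confirmed folder names.
paths = graph_files("graph_data", "gharchive", "repo_developer", "201801")
for name, path in paths.items():
    print(name, "->", path)
```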

Graphs

To reduce the scope, the graphs only cover the top 1000 Java projects by number of stars (list available here).

repo_developer

(repo)<--[number of contributions]--(user)

SQL query:

-- Contributions per (repo, user) in January 2018, restricted to the top Java projects
SELECT repo_name, login, contributions
FROM `tx01-234015.java_projects.stars` stars
JOIN (
  SELECT p.repo.name AS repo_name, p.actor.login AS login, COUNT(p.id) AS contributions
  FROM `githubarchive.month.201801` p
  WHERE DATE(p.created_at) BETWEEN DATE('2018-01-01') AND DATE('2018-02-01')
    AND p.type IN ('PullRequestReviewCommentEvent', 'IssueCommentEvent', 'PullRequestEvent',
                   'CommitCommentEvent', 'PullRequestReviewEvent')
  GROUP BY p.repo.name, p.actor.login
) ON repo_name = stars.name
ORDER BY contributions DESC

A contribution is defined as an event of one of the following GitHub event types:

  • PullRequestReviewCommentEvent
  • IssueCommentEvent
  • PullRequestEvent
  • CommitCommentEvent
  • PullRequestReviewEvent
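
The counting rule can be sketched in Python as below. This is an illustration of the definition above, not the pipeline used to produce the data: the `count_contributions` function, the `(repo_name, login, event_type)` tuple layout, and the sample events are all assumptions.

```python
from collections import Counter

# Event types counted as contributions, as listed above.
CONTRIBUTION_TYPES = {
    "PullRequestReviewCommentEvent",
    "IssueCommentEvent",
    "PullRequestEvent",
    "CommitCommentEvent",
    "PullRequestReviewEvent",
}


def count_contributions(events, top_repos):
    """Count contribution events per (repo, user) pair.

    events: iterable of (repo_name, login, event_type) tuples (assumed layout).
    top_repos: set of repository names to keep, e.g. the top Java projects.
    Returns (repo_name, login, contributions) rows, largest count first.
    """
    counts = Counter(
        (repo, login)
        for repo, login, event_type in events
        if event_type in CONTRIBUTION_TYPES and repo in top_repos
    )
    rows = [(repo, login, n) for (repo, login), n in counts.items()]
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Made-up sample events, for illustration only.
sample_events = [
    ("a/x", "alice", "IssueCommentEvent"),
    ("a/x", "alice", "PullRequestEvent"),
    ("a/x", "bob", "PushEvent"),           # not a contribution type
    ("b/y", "alice", "IssueCommentEvent"),  # repo outside the kept set
    ("a/x", "bob", "CommitCommentEvent"),
]
print(count_contributions(sample_events, top_repos={"a/x"}))
```

Each returned row corresponds to one weighted edge (repo)<--[number of contributions]--(user).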

co_contributors

This graph is based on the repo_developer graph.

(repo1)<--[number of common contributors]-->(repo2)

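SQL query:

-- Number of contributors shared by each pair of distinct repositories.
-- Each pair is returned twice: once as (repo1, repo2) and once as (repo2, repo1).
SELECT t1.name AS repo1, t2.name2 AS repo2, COUNT(t1.login) AS contributors
FROM `tx01-234015.GHA.contributions_201801` t1
JOIN (
  SELECT name AS name2, login AS login2
  FROM `tx01-234015.GHA.contributions_201801`
) t2 ON t1.login = t2.login2
WHERE t1.name <> t2.name2
GROUP BY repo1, repo2
ORDER BY contributors DESC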
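
The same projection can be sketched in Python, which may help when working from the exported repo_developer data rather than from BigQuery. The `co_contributors` function and the sample rows are hypothetical, and the sketch assumes the input has one row per (repo, user) pair.

```python
from collections import Counter, defaultdict
from itertools import permutations


def co_contributors(contributions):
    """Count shared contributors for every ordered pair of distinct repos.

    contributions: iterable of (repo_name, login) rows, assumed to hold
    one row per (repo, user) pair.
    Returns a Counter mapping (repo1, repo2) to the number of users who
    contributed to both. Like the self-join, it yields both directions.
    """
    repos_by_login = defaultdict(set)
    for repo, login in contributions:
        repos_by_login[login].add(repo)

    pairs = Counter()
    for repos in repos_by_login.values():
        # Every ordered pair of distinct repos this user contributed to.
        pairs.update(permutations(sorted(repos), 2))
    return pairs


# Made-up sample rows, for illustration only.
sample_rows = [
    ("a/x", "alice"), ("b/y", "alice"),
    ("a/x", "bob"), ("b/y", "bob"), ("c/z", "bob"),
]
for (repo1, repo2), n in co_contributors(sample_rows).most_common():
    print(repo1, repo2, n)
```

Grouping repositories per user first avoids materialising the full self-join, although the number of pairs still grows quadratically for users active in many repositories.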