Merge pull request #321 from dpordomingo/extract-data-team-and-os
Extract data team and os
Showing 10 changed files with 291 additions and 892 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,256 +1,69 @@ | ||
# | ||
# README : | ||
# | ||
# Two different kinds of projects are accepted: | ||
# - without documentation -> the project card will point to the repository source code | ||
# - having documentation generated by [docsrv](src-d/docs) -> the project card will point to the docs | ||
# | ||
# PROJECT WITHOUT DOCS schema: | ||
# | ||
# - name: short name of the project as it will appear in the card title (i.e. kmcuda) | ||
# url: link to the repository source code (i.e. //github.com/src-d/kmcuda) | ||
# desc: project description that will appear in the project card ('/projects' page) | ||
# logo: (optional) svg name of the icon (under 'static/img/icons') without the '.svg' extension. | ||
# (i.e. projects/bblfsh for the image under static/img/icons/projects/bblfsh.svg) | ||
# If it's not provided, it will use a random codepill | ||
# | ||
# PROJECT HAVING DOCS schema: | ||
# | ||
# - name: (SAME AS ABOVE) | ||
# hostname: hostname of the documentation (without the protocol, i.e. engine.sourced.tech) | ||
# url: link to the docs root (it should have the format '//hostname', i.e. //engine.sourced.tech) | ||
# desc: (SAME AS ABOVE) plus description shown in the hero section at the project documentation site | ||
# repository: identifier of the Github project with the format 'owner/project-name' (i.e. src-d/engine) | ||
# minVersion: the minimum version of the project release containing documentation to be served. (i.e. v0.0.11) | ||
# This ensures that documentation is not generated or served for older releases | ||
# languages: list of the languages whose API documentation will be generated | ||
# (supported: python, cpp, scala, go) | ||
# logo: (SAME AS ABOVE) | ||
# logosmall: project nav icon (SAME AS DEFINED FOR $LOGO) | ||
# | ||
sections: | ||
|
||
no_data_message: We still have things to polish here, soon to be released. Join our community to keep posted! | ||
categories: | ||
order: # all groups appearing here, will appear in the Landing | ||
order: # only groups appearing here, will appear in the Landing in this order | ||
- datasets | ||
- models | ||
- retrieval | ||
- languages | ||
- science | ||
- demos | ||
contents: | ||
- applications | ||
|
||
# ########## datasets ########## | ||
collection: | ||
|
||
datasets: | ||
name: datasets | ||
colors: {left: "#003ca1", right: "#656afa"} | ||
desc: Output from our pipeline for source code analysis, our open datasets provide a ready-to-use baseline for your next machine learning and code analysis projects | ||
title: Datasets | ||
name: Datasets | ||
projects: | ||
# no datasets added so far as legacy ones didn't use the current pipeline | ||
|
||
# ########## machine learning models ########## | ||
- name: Public Git Archive | ||
url: https://github.com/src-d/datasets/tree/master/PublicGitArchive | ||
|
||
models: | ||
name: models | ||
colors: {left: "#e4415a", right: "#ff6d4c"} | ||
desc: A selection of machine learning models trained using our tools over large datasets and ready to be used in your research or project with the supporting libraries | ||
title: Models | ||
name: Models | ||
projects: | ||
- name: id2vec | ||
url: //github.com/src-d/models/blob/master/id2vec/92609e70-f79c-46b5-8419-55726e873cfc.md | ||
desc: Source code identifier embeddings, where every identifier is represented by a dense vector; no splitting or stemming, later converted with quality loss | ||
repository: src-d/models | ||
- name: nbow | ||
url: //github.com/src-d/models/blob/master/nbow/1e3da42a-28b6-4b33-94a2-a5671f4102f4.md | ||
desc: Weighted bag-of-words where every word is a dense vector; trained over the code of the 140k top starred GitHub repositories | ||
repository: src-d/models | ||
- name: docfreq | ||
url: //github.com/src-d/models/blob/master/docfreq/f64bacd4-67fb-4c64-8382-399a8e7db52a.md | ||
desc: Document frequencies of code identifiers, i.e. how many projects contain a certain identifier after splitting & stemming; trained on 10M GitHub repos after de-duplication | ||
repository: src-d/models | ||
- name: topics | ||
url: //github.com/src-d/models/blob/master/topics/c70a7514-9257-4b33-b468-27a8588d4dfa.md | ||
desc: Topic modeling of Git repositories; trained over 10M GitHub repositories after de-duplication | ||
repository: src-d/models | ||
|
||
# ########## data retrieval tools ########## | ||
- name: Topic Modeling | ||
url: https://github.com/src-d/models#topics | ||
- name: Identifier Embeddings | ||
url: https://github.com/src-d/models#id2vec | ||
- name: TF/IDF BoW | ||
url: https://github.com/src-d/models#bow | ||
|
||
retrieval: | ||
name: data retrieval tools | ||
colors: {left: "#317d19", right: "#4ecc7b"} | ||
desc: A set of tools that allow you to discover, fetch, store, access, filter and extract features from just a single source code repository to tens of millions of repositories | ||
title: Code Retrieval Tools | ||
name: Code Retrieval | ||
projects: | ||
- name: engine | ||
hostname: engine.sourced.tech | ||
url: //engine.sourced.tech | ||
desc: the source{d} engine combines data retrieval and language analysis tools for scalable pipelines that process any number of Git repositories for source code analysis | ||
repository: src-d/engine | ||
minVersion: v0.0.11 | ||
languages: | ||
- python | ||
- scala | ||
logo: | ||
logosmall: | ||
- name: go-git | ||
hostname: go-git.sourced.tech | ||
url: //github.com/src-d/go-git | ||
desc: go-git is a highly extensible Git implementation in pure Go language | ||
repository: src-d/go-git | ||
minVersion: v4.0.0 | ||
languages: | ||
- go | ||
logo: | ||
logosmall: | ||
- name: rovers | ||
url: //github.com/src-d/rovers | ||
desc: rovers is a service to retrieve repository URLs from multiple code repository hosting providers, similarly to a search engine crawler | ||
repository: src-d/rovers | ||
minVersion: v2.5.3 | ||
languages: | ||
- go | ||
- name: borges | ||
url: //github.com/src-d/borges | ||
desc: borges reads code repository URLs, then collects and stores them at large scale by using a producer-consumer architecture | ||
repository: src-d/borges | ||
minVersion: v0.7.1 | ||
languages: | ||
- go | ||
- name: śiva | ||
hostname: siva.sourced.tech | ||
url: //siva.sourced.tech | ||
desc: śiva is an archiving format similar to TAR/ZIP, focused on allowing constant-time random file access, seekable access to contained files and concatenable files | ||
repository: src-d/go-siva | ||
minVersion: v1.1.3 | ||
languages: | ||
- go | ||
|
||
# ########## language analysis tools ########## | ||
- name: go-git | ||
url: https://github.com/src-d/go-git | ||
- name: Rovers | ||
url: https://github.com/src-d/rovers | ||
- name: Borges | ||
url: https://github.com/src-d/borges | ||
|
||
languages: | ||
name: language analysis tools | ||
colors: {left: "#c2732a", right: "#f18406"} | ||
desc: A toolset that enables you to identify with speed and precision programming languages from source code files and turn them into universal abstract syntax trees (UASTs) | ||
title: Code Analysis Tools | ||
name: Code Analysis | ||
projects: | ||
- name: babelfish | ||
hostname: bblf.sh | ||
url: //bblf.sh | ||
desc: babelfish is a self-hosted server for universal source code parsing, turning code files into Universal Abstract Syntax Trees (UASTs) | ||
repository: bblfsh/bblfshd | ||
logo: projects/bblfsh | ||
minVersion: v2.1.1 | ||
languages: | ||
- go | ||
- name: enry | ||
hostname: enry.sourced.tech | ||
url: //enry.sourced.tech | ||
desc: enry is a faster source code file programming language detector based on github/linguist and toolbox that ignores binary or vendored files | ||
repository: src-d/enry | ||
minVersion: v1.5.2 | ||
languages: | ||
- go | ||
- name: babelfish tools | ||
url: //github.com/bblfsh/tools | ||
desc: babelfish tools are easy-to-use command line tools for simple code analysis, such as tokenizer, cyclomatic complexity, npath complexity, patch | ||
repository: bblfsh/tools | ||
languages: | ||
- go | ||
|
||
# ########## machine learning tools ########## | ||
- name: Babelfish | ||
url: https://doc.bblf.sh/ | ||
- name: Gitbase | ||
url: https://github.com/src-d/gitbase | ||
- name: Engine | ||
url: https://github.com/src-d/engine | ||
- name: Lookout | ||
url: https://github.com/src-d/lookout | ||
|
||
science: | ||
name: machine learning tools | ||
colors: {left: "#832fcc", right: "#c05cea"} | ||
desc: Our ML tools range from feature extraction on top of source code abstract syntax trees to lightning-fast, large scale clustering algorithms running on GPUs | ||
title: Machine Learning | ||
name: Machine Learning | ||
projects: | ||
- name: ml | ||
url: //github.com/src-d/ml | ||
desc: sourced.ml provides a framework for Machine Learning on Source Code (MLoSC) over UASTs, including identifier embeddings, document frequencies, topic modeling and more | ||
repository: src-d/ast2vec | ||
minVersion: 0.3.5-alpha | ||
languages: | ||
- python | ||
- name: modelforge | ||
url: //github.com/src-d/modelforge | ||
desc: modelforge is the foundation for storing and sharing machine learning models, with an extensible registry backend and using the ASDF storage format | ||
repository: src-d/modelforge | ||
minVersion: 0.3.1-alpha | ||
languages: | ||
- python | ||
- name: kmcuda | ||
url: //github.com/src-d/kmcuda | ||
desc: kmcuda is a large-scale K-means and K-nn implementation that supports diverse distance metrics and can be accelerated using multiple NVIDIA GPUs (CUDA) | ||
repository: src-d/kmcuda | ||
minVersion: 6.2.0 | ||
languages: | ||
- python | ||
- cpp | ||
- name: minhashcuda | ||
url: //github.com/src-d/minhashcuda | ||
desc: minhashcuda is a large-scale weighted MinHash implementation optimized for low memory and high speed by running on multiple NVIDIA GPUs (CUDA) | ||
repository: src-d/minhashcuda | ||
minVersion: 1.1.1 | ||
languages: | ||
- cpp | ||
- python | ||
- name: wmd-relax | ||
url: //github.com/src-d/wmd-relax | ||
desc: wmd-relax is a large-scale Word Mover's Distance implementation optimized for speed by using google/or-tools that is compatible with spaCy | ||
repository: src-d/wmd-relax | ||
minVersion: v1.2.6 | ||
languages: | ||
- python | ||
- cpp | ||
|
||
# ########## demos ########## | ||
|
||
demos: | ||
name: demos | ||
colors: {left: "#ffba34", right: "#fff444"} | ||
desc: Demos are use case examples based on our tech stack and which both help you to get started with them as well as provide real-world, concrete functionality | ||
projects: | ||
- name: dashboard | ||
url: //github.com/bblfsh/dashboard | ||
desc: babelfish dashboard is a visualization tool that uses the babelfish universal code parser to display UASTs and its details in a human-friendly manner | ||
repository: bblfsh/dashboard | ||
languages: | ||
- js | ||
- go | ||
- name: vecino | ||
url: //github.com/src-d/vecino | ||
desc: vecino is a CLI app to discover the most similar Git repositories to the one provided through matching or synonymical source code identifiers | ||
repository: src-d/vecino | ||
languages: | ||
- python | ||
- name: tmsc | ||
url: //github.com/src-d/tmsc | ||
desc: tmsc is a CLI tool that applies topic modeling on source code to discover the topics of a repository the user provides | ||
repository: src-d/tmsc | ||
minVersion: 0.1.1-alpha | ||
languages: | ||
- python | ||
- name: hercules | ||
url: //github.com/src-d/hercules | ||
desc: hercules (and its labours) calculates and displays various Git repository statistics as code burndown, developer ownership, file & developer copulas | ||
repository: src-d/hercules | ||
minVersion: v2 | ||
languages: | ||
- go | ||
- python | ||
# further demos are either legacy or WIP | ||
|
||
# ########## other projects not appearing in the landing ########## | ||
- name: sourced.ml | ||
url: https://github.com/src-d/ml | ||
|
||
others: | ||
name: others | ||
colors: {left: "#888088", right: "#BBBBBB"} | ||
desc: random and unrelated projects | ||
applications: | ||
title: Applications | ||
name: Applications | ||
projects: | ||
- name: landing | ||
hostname: landing.sourced.tech | ||
url: //github.com/src-d/landing | ||
desc: landing of source{d} | ||
repository: src-d/landing | ||
languages: | ||
- js | ||
- html | ||
- css | ||
- name: Gemini | ||
url: https://github.com/src-d/gemini | ||
- name: Hercules | ||
url: https://github.com/src-d/hercules |
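
The README comment removed at the top of this file describes two kinds of project entries: ones whose card points to the repository source code, and ones whose card points to generated docs. A minimal sketch of how a consumer might tell them apart is shown here. It assumes an entry has already been parsed into a plain dictionary and that the presence of `hostname` is what marks a docs-backed project; the function names `project_kind` and `missing_docs_keys` are hypothetical and are not part of the landing code.

```python
from typing import List, Mapping

# Keys that the README comment lists only under "PROJECT HAVING DOCS schema".
DOCS_ONLY_KEYS = ("hostname", "repository", "minVersion", "languages")


def project_kind(project: Mapping[str, object]) -> str:
    """Classify an entry as 'docs' or 'source'.

    Assumed rule (not confirmed by the source): an entry with a non-empty
    'hostname' is served through generated docs; anything else links
    straight to its repository.
    """
    if "name" not in project or "url" not in project:
        raise ValueError("every project entry needs 'name' and 'url'")
    return "docs" if project.get("hostname") else "source"


def missing_docs_keys(project: Mapping[str, object]) -> List[str]:
    """Return the docs-schema keys that an entry does not define."""
    return [key for key in DOCS_ONLY_KEYS if key not in project]


# An entry in the old, docs-backed shape (values copied from the diff above).
engine = {
    "name": "engine",
    "hostname": "engine.sourced.tech",
    "url": "//engine.sourced.tech",
    "repository": "src-d/engine",
    "minVersion": "v0.0.11",
    "languages": ["python", "scala"],
}

# An entry in the new, reduced shape introduced by this commit.
gemini = {"name": "Gemini", "url": "https://github.com/src-d/gemini"}

print(project_kind(engine), missing_docs_keys(engine))
print(project_kind(gemini), missing_docs_keys(gemini))
```

Under this assumed rule, every entry added by this commit classifies as a plain source link, since the new entries carry only `name` and `url`.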