New to the team? Start here.
| Name | Role | GitHub |
| --- | --- | --- |
| Dan Moran | Tech Lead | @danxmoran |
| Emily Munro-Ludders | Scrum Master | @emunrolu |
| Jeff Korte | Product Owner | @JeffKorte |
| Kathy Reinold | Data Modeler | @kreinold |
| Raaid Arshad | Software Engineer | @raaidbroad |
- DSP Monsters - Team for repositories under the `broadinstitute` org
- Emerald Writers - Team for repositories under the `DataBiosphere` org
Linked Data definitions for the DSP Core Data Model, with extensions for unmodeled datasets.
- DSP Data Models - Data Model definitions and examples
Pipelines for moving data into the Jade Data Repository.
- ClinVar - ETL pipeline for the ClinVar dataset
- ENCODE - ETL pipeline for the ENCODE dataset
- Dog Aging - ETL pipeline for the Dog Aging Project dataset
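These pipelines share a common shape: fetch raw metadata from the upstream source, reshape it into the JSON rows our ingest tooling expects, then load the results. As a rough illustration only (the field names, record shape, and use of circe as the JSON library are all assumptions, not taken from the real pipelines), the transform step looks something like:

```scala
import io.circe.Json
import io.circe.parser.parse

object TransformSketch {

  /** Reshape one raw upstream record into an ingest-ready row.
    * The upstream field names ("id", "released") and the target
    * schema ("accession", "release_date") are hypothetical.
    */
  def transform(rawRecord: String): Either[io.circe.ParsingFailure, Json] =
    parse(rawRecord).map { upstream =>
      val cursor = upstream.hcursor
      Json.obj(
        "accession" -> cursor.downField("id").focus.getOrElse(Json.Null),
        "release_date" -> cursor.downField("released").focus.getOrElse(Json.Null)
      )
    }

  def main(args: Array[String]): Unit = {
    val example = """{"id": "ENC123", "released": "2020-01-01"}"""
    println(transform(example))
  }
}
```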
Infrastructure, configuration, and shared code used to develop and deploy our services.
- sbt plugins - Common build plugins used across Monster projects (see the build sketch after this list)
- Helm charts - Custom Helm charts for pieces of Monster infrastructure
- Core infrastructure - Terraform modules and Helm release definitions for Monster's GCP environments
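To show how the shared sbt plugins are meant to be consumed, here's a minimal sketch. The organization, artifact, version, and plugin name below are placeholders, not the real published coordinates; check the sbt plugins repo for the actual ones.

```scala
// project/plugins.sbt -- pull the shared plugin into the build
// (organization, artifact, and version are hypothetical placeholders).
addSbtPlugin("org.broadinstitute.monster" % "sbt-plugin-base" % "1.0.0")
```

```scala
// build.sbt -- enable the plugin on a project so it picks up the
// common settings (the plugin object name is also a placeholder).
lazy val myPipeline = project
  .in(file("."))
  .enablePlugins(MonsterBasePlugin)
```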
The repositories in this section are still being used, but we're trying to move away from them.
Our first stabs at data ingest envisioned a framework of dataset-agnostic services. We shifted away from that pattern because it introduced significant overhead compared to custom pipelines built from common command-line tools (sketched after the list below).
- Transporter - Bulk file-transfer system
- Monster ETL - Apache Beam workflows for ingest
- Extractors - Tools / services for mechanically transforming external metadata into Beam-friendly JSON
- Ingest Deploy - Terraform and Kubernetes configuration for deploying ingest components into GCP, based on the now-abandoned dsp-k8s-deploy
- Storage Libs - Utility libraries for I/O against external storage systems
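For contrast with the framework-based repos above, here's a rough sketch of the pattern we moved toward: a small, dataset-specific pipeline that glues together common command-line tools. The URL, `jq` filter, and output path are hypothetical, and this assumes `curl` and `jq` are available on the PATH.

```scala
import scala.sys.process._
import java.io.File

object CliPipelineSketch {
  def main(args: Array[String]): Unit = {
    // Fetch a (hypothetical) dataset release and split it into
    // one JSON record per line, staged locally for ingest.
    val fetch  = Seq("curl", "-sL", "https://example.org/dataset/release.json")
    val filter = Seq("jq", "-c", ".records[]")
    val staged = new File("staged/records.json")
    staged.getParentFile.mkdirs()

    // Pipe curl's output through jq and redirect it to the staging file.
    val exitCode = (fetch #| filter #> staged).!
    require(exitCode == 0, s"pipeline failed with exit code $exitCode")
  }
}
```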