This library implements a novel method for mapping MaveDB scoreset data to GA4GH Variation Representation Specification (VRS 2.0) objects, enhancing interoperability for genomic medicine applications. See Arbesfeld et. al. (2024) for a preprint edition of the mapping manuscript, or download the resulting mappings directly.
If you make use of this software or the resultant mappings, please cite the manuscript:
Jeremy A. Arbesfeld, Estelle Y. Da, James S. Stevenson, Kori Kuzma, Anika Paul, Tierra Farris, Benjamin J. Capodanno, Sally B. Grindstaff, Kevin Riehle, Nuno Saraiva-Agostinho, Jordan F. Safer, Aleksandar Milosavljevic, Julia Foreman, Helen V. Firth, Sarah E. Hunt, Sumaiya Iqbal, Melissa S. Cline, Alan F. Rubin, Alex H. Wagner. bioRxiv 2023.06.20.545702; doi: https://doi.org/10.1101/2023.06.20.545702
- Universal Transcript Archive (UTA): see README for setup instructions. Users with access to Docker on their local devices can use the available Docker image; otherwise, start a relatively recent (version 14+) PostgreSQL instance and add data from the available database dump.
- SeqRepo: see README for setup instructions. The SeqRepo data directory must be writeable; see specific instructions here for more.
- Gene Normalizer: see documentation for data setup instructions.
- blat: Must be available on the local PATH and executable by the user. Otherwise, its location can be set manually with the
BLAT_BIN_PATH
env var. See the UCSC Genome Browser FAQ for download instructions.
Install from PyPI:
python3 -m pip install dcd-mapping
Use the dcd-map
command with a scoreset URN, eg
$ dcd-map urn:mavedb:00000083-c-1
Output is saved in the format <URN>_mapping_results_<ISO datetime>.json
in the directory specified by the environment variable MAVEDB_STORAGE_DIR
, or ~/.local/share/dcd-mapping
by default.
Use dcd-map --help
to see other available options.
Notebooks for manuscript data analysis and figure generation are provided within notebooks/analysis
. See notebooks/analysis/README.md
for more information.
Clone the repo
git clone https://github.com/ave-dcd/dcd_mapping
cd dcd_mapping
Create and activate a virtual environment
python3 -m virtualenv venv
source venv/bin/activate
Install as editable and with developer dependencies
python3 -m pip install -e '.[dev,tests]'
Add pre-commit hooks
pre-commit install
Run tests with pytest
pytest