This package implements a Structured Support Vector Machine (SSVM) model for molecular structure prediction from liquid chromatography (LC) tandem mass spectrometry (MS²) data. This work is part of the publication:
"Joint structural annotation of small molecules using liquid chromatography retention order and tandem mass spectrometry data",
Eric Bach, Emma L. Schymanski and Juho Rousu, 2022
We consider the output of an LC-MS² experiment as structured output. The structure is assumed to be imposed by the observed retention orders (RO) of the MS features, where each MS feature comprises MS¹ information, an MS² spectrum, and a retention time (RT). We assume that a set of potential molecular structures, the so-called candidate set, can be generated for each MS feature. The idea is to predict a ranking of the candidate structures associated with each feature. The SSVM framework allows us to predict rankings that are not independent of each other but take the observed ROs into account. The ROs are assumed to provide additional structure, that is, constraints that improve the ranking.
To install the package:
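The idea of ranking candidates jointly rather than independently can be illustrated with a small sketch. It is not the package's implementation: the toy data, the pairwise agreement score, and the function names are all assumptions made for illustration, and the brute-force enumeration is only feasible for tiny examples. Each candidate carries a hypothetical MS² score and a hypothetical predicted retention-order value. A candidate is then ranked by its max-marginal, that is, the best joint score over all assignments that contain it.

```python
from itertools import product

# Hypothetical toy data: each candidate is
# (label, MS² score, predicted retention-order value).
features = [
    {"rt": 2.0, "candidates": [("A", 0.5, 1.0), ("B", 0.6, 5.0)]},
    {"rt": 4.0, "candidates": [("C", 0.9, 3.0)]},
]


def joint_score(assignment, features, weight=1.0):
    """Score one candidate per feature: MS² scores plus RO agreement."""
    cands = [f["candidates"][c] for f, c in zip(features, assignment)]
    score = sum(c[1] for c in cands)
    for i in range(len(features)):
        for j in range(i + 1, len(features)):
            # Does the predicted elution order match the observed one?
            observed = features[i]["rt"] < features[j]["rt"]
            predicted = cands[i][2] < cands[j][2]
            score += weight * (observed == predicted)
    return score


def max_marginal_ranking(features, weight=1.0):
    """Rank each feature's candidates by their max-marginal score."""
    index_ranges = [range(len(f["candidates"])) for f in features]
    best = [[float("-inf")] * len(f["candidates"]) for f in features]
    # Brute force: enumerate every joint assignment of candidates.
    for assignment in product(*index_ranges):
        s = joint_score(assignment, features, weight)
        for i, c in enumerate(assignment):
            best[i][c] = max(best[i][c], s)
    rankings = []
    for f, mm in zip(features, best):
        order = sorted(range(len(mm)), key=lambda c: -mm[c])
        rankings.append([f["candidates"][c][0] for c in order])
    return rankings


ms2_only = max_marginal_ranking(features, weight=0.0)
joint = max_marginal_ranking(features, weight=1.0)
print(ms2_only)
print(joint)
```

With `weight=0.0` the ranking relies on the MS² scores alone and candidate "B" comes first for the first feature. With `weight=1.0` the observed retention order favours "A", whose predicted order value is consistent with eluting before "C".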
- Clone the package and change to the directory:
git clone https://github.com/aalto-ics-kepaco/msms_rt_ssvm
cd msms_rt_ssvm
- Create a conda environment and install dependencies:
conda env create -f environment.yml
conda activate lcms2struct
- Install the package:
pip install .
- Leave the package directory:
cd ..
- Clone the package dependency "msmsrt_scorer", which implements the max-marginal inference (see the paper), and change to its directory:
git clone https://github.com/aalto-ics-kepaco/msms_rt_score_integration
cd msms_rt_score_integration
- Install the "msmsrt_scorer" package (it is assumed that the conda environment is active):
pip install .
- (Optional) Change back to the msms_rt_ssvm directory and test the package:
cd ../msms_rt_ssvm
# Unpack test databases
gunzip --keep ssvm/tests/Bach2020_test_db.sqlite.gz
gunzip --keep ssvm/tests/Massbank_test_db.sqlite.gz
# Run the tests
python -m unittest discover -s ssvm/tests -p 'unittests*.py'
## Expected output ##
# .............s................s.....................s...................s.....s..................................s......
# ----------------------------------------------------------------------
# Ran 121 tests in 99.599s
#
# OK (skipped=6)
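As a quick sanity check after the installation steps, the following sketch tests whether both packages can be found in the active environment. The import names "ssvm" and "msmsrt_scorer" are assumptions inferred from the directory and package names used in the steps; adjust them if the installed modules are named differently.

```python
from importlib.util import find_spec


def check_installed(modules=("ssvm", "msmsrt_scorer")):
    """Return a dict mapping each module name to whether it is importable."""
    # find_spec returns None for a top-level module that cannot be located.
    return {name: find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, found in check_installed().items():
        print(f"{name}: {'found' if found else 'NOT found'}")
```

If a module is reported as not found, make sure the conda environment is active and re-run the corresponding "pip install ." step.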
All code was developed and tested in a Linux environment. Other operating systems are not supported.
Usage examples for the package can be found in the repository of the experiments done for the manuscript.
If you use this package, please cite our original publication:
@article {Bach2022,
author = {Bach, Eric and Schymanski, Emma L. and Rousu, Juho},
title = {Joint structural annotation of small molecules using liquid chromatography retention order and tandem mass spectrometry data},
elocation-id = {2022.02.11.480137},
year = {2022},
doi = {10.1101/2022.02.11.480137},
publisher = {Cold Spring Harbor Laboratory},
URL = {https://www.biorxiv.org/content/early/2022/04/27/2022.02.11.480137},
eprint = {https://www.biorxiv.org/content/early/2022/04/27/2022.02.11.480137.full.pdf},
journal = {bioRxiv}
}