Forked version of the Chemprop framework that allows the calculation of atomistic Jazzy and Kallisto features (https://www.nature.com/articles/s41598-023-30089-x) as described in Jazzy and Kallisto Features. The functionality currently only works for molecules and not for reactions.
This readme only contains information related to the extension of Chemprop with Jazzy and Kallisto. The official documentation of Chemprop can be found at https://github.com/chemprop/chemprop.
To install the library, refer to the Installation section. Bear in mind that this version of Chemprop can only be installed from source.
Special thanks go to Shih-Cheng Li from MIT for his help and support with designing the integration.
- Documentation of Chemprop is available at https://chemprop.readthedocs.io/en/latest/. Note that this site is several versions behind. An up-to-date version of Read the Docs is forthcoming with the release of Chemprop v2.0.
- This README is currently the best source for documentation on more recently-added features.
- Please also see the descriptions of all possible command line arguments in our args.py file.
- Benchmark scripts - scripts from our 2023 paper, providing examples of many features using Chemprop v1.6.1
- ACS Fall 2023 Workshop - presentation, interactive demo, exercises on Google Colab with solution key
- Google Colab notebook - several examples, intended to be run in Google Colab rather than as a Jupyter notebook on your local machine
- nanoHUB tool - a notebook of examples similar to the Colab notebook above, doesn't require any installation
- YouTube video - lecture accompanying nanoHUB tool
- These slides provide a Chemprop tutorial and highlight additions as of April 28th, 2020
For small datasets (~1000 molecules), it is possible to train models within a few minutes on a standard laptop with CPUs only. However, for larger datasets and larger Chemprop models, we recommend using a GPU for significantly faster training.
To use chemprop with GPUs, you will need:
- cuda >= 8.0
- cuDNN
Note for machines with GPUs: You may need to manually install a GPU-enabled version of PyTorch by following the instructions here. If Chemprop is not using a GPU on your system after you have followed the instructions below, check which version of PyTorch is installed in your environment using conda list | grep torch or similar. If the PyTorch line includes cpu, please uninstall it using conda remove pytorch and reinstall a GPU-enabled version using the instructions at the link above.
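The same check can also be made from Python. The sketch below is a minimal example that relies only on `torch.version.cuda`, `torch.cuda.is_available()` and `torch.cuda.device_count()`; the helper name `describe_torch_build` is illustrative and is not part of Chemprop.

```python
import importlib.util


def describe_torch_build():
    """Return a short, human-readable summary of the local PyTorch build."""
    # Avoid an ImportError when PyTorch is not installed at all.
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed in this environment"

    import torch

    # torch.version.cuda is None when the build has no CUDA support.
    if torch.version.cuda is None:
        return f"PyTorch {torch.__version__} was built without CUDA support"
    if not torch.cuda.is_available():
        return (f"PyTorch {torch.__version__} was built with CUDA "
                f"{torch.version.cuda}, but no usable GPU was detected")
    return (f"PyTorch {torch.__version__} can use "
            f"{torch.cuda.device_count()} GPU(s)")


print(describe_torch_build())
```

If the message reports a build without CUDA support, the reinstall steps described above are the likely fix.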
git clone https://github.com/ghiander/chemprop-jazzy.git
cd chemprop-jazzy
conda env create -f environment.yml
conda activate jazzprop
pip install -e .
git clone https://github.com/ghiander/chemprop-jazzy.git
cd chemprop-jazzy
python<version> -m venv .venv  # Python 3.8 (specifically 3.8.5) was used at the time of writing
source <path_prefix_to_environment>/.venv/bin/activate  # e.g. /home/ghiander/chemprop-jazzy/.venv/bin/activate
python -m pip install flake8 pytest parameterized  # optional, for developers
python -m pip install -e .  # make sure you are inside chemprop-jazzy
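After installing, it can be useful to confirm that the environment can locate the packages the extension depends on. The following is a minimal sketch; the import names `chemprop`, `jazzy` and `kallisto` are assumptions about how the packages are exposed, and `find_missing_modules` is a hypothetical helper rather than part of this repository.

```python
import importlib.util

# Assumed import names for this fork and for the two descriptor libraries.
REQUIRED_MODULES = ("chemprop", "jazzy", "kallisto")


def find_missing_modules(module_names=REQUIRED_MODULES):
    """Return the names in module_names that cannot be located for import."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


missing = find_missing_modules()
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules were found")
```

The check only looks the modules up and does not import them, so it runs quickly and does not fail when a package is absent.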
Jazzy and/or Kallisto features (https://doi.org/10.1038/s41598-023-30089-x) can be added as extra atom-level inputs to the graph convolution by passing the flag --additional_atom_descriptors with the options jazzy and/or kallisto.
chemprop_train --data_path tests/data/regression.csv --dataset_type regression --save_dir test_model_checkpoints --quiet --additional_atom_descriptors kallisto jazzy
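The same training run can probably also be started from Python. The sketch below assumes that this fork keeps the upstream Chemprop v1 entry points `chemprop.args.TrainArgs` and `chemprop.train.cross_validate`; the helpers `build_train_arguments` and `train` are hypothetical names introduced here for illustration.

```python
def build_train_arguments(descriptors,
                          data_path="tests/data/regression.csv",
                          save_dir="test_model_checkpoints"):
    """Build the chemprop_train argument list for the given atom descriptors."""
    allowed = {"jazzy", "kallisto"}
    unknown = set(descriptors) - allowed
    if unknown:
        raise ValueError(f"Unsupported descriptors: {sorted(unknown)}")
    arguments = ["--data_path", data_path,
                 "--dataset_type", "regression",
                 "--save_dir", save_dir,
                 "--quiet"]
    if descriptors:
        # The flag takes one or more space-separated options.
        arguments += ["--additional_atom_descriptors", *descriptors]
    return arguments


def train(descriptors):
    """Train through the Python API (assumes the upstream v1 entry points)."""
    import chemprop  # imported lazily so the helper above works without it

    args = chemprop.args.TrainArgs().parse_args(
        build_train_arguments(descriptors))
    return chemprop.train.cross_validate(
        args=args, train_func=chemprop.train.run_training)


print(build_train_arguments(["kallisto", "jazzy"]))
```

Calling `train(["kallisto", "jazzy"])` would then mirror the command above, provided the fork is installed and the assumed entry points are unchanged.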
Here are some examples of how to use Chemprop-Jazzy:
# Includes both Kallisto and Jazzy atomic properties
chemprop_train --data_path tests/data/regression.csv --dataset_type regression --save_dir test_model_checkpoints --quiet --additional_atom_descriptors kallisto jazzy
# Includes only Kallisto atomic properties
chemprop_train --data_path tests/data/regression.csv --dataset_type regression --save_dir test_model_checkpoints --quiet --additional_atom_descriptors kallisto
# Includes only Jazzy atomic properties
chemprop_train --data_path tests/data/regression.csv --dataset_type regression --save_dir test_model_checkpoints --quiet --additional_atom_descriptors jazzy
# Includes Jazzy atomic and free hydration energy molecular properties
chemprop_train --data_path tests/data/regression.csv --dataset_type regression --save_dir test_model_checkpoints --quiet --additional_atom_descriptors jazzy --features_generator jazzy_hyd
# Includes Jazzy atomic and hydrogen-bond strength molecular properties
chemprop_train --data_path tests/data/regression.csv --dataset_type regression --save_dir test_model_checkpoints --quiet --additional_atom_descriptors jazzy --features_generator jazzy_hbs
# Chemprop automatically detects whether Jazzy and/or Kallisto were used to train the model
chemprop_predict --test_path tests/data/regression_small.csv --checkpoint_dir test_model_checkpoints --preds_path regression_preds.csv
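The prediction command above writes its results to the file given by --preds_path. A minimal sketch for loading that file is shown below. It assumes that the first column holds the SMILES string and that every other column holds one prediction, so adjust it if your file is laid out differently; `read_predictions` is a hypothetical helper.

```python
import csv


def _to_float(value):
    """Convert a CSV cell to float, or None if it is not numeric."""
    try:
        return float(value)
    except ValueError:
        return None


def read_predictions(preds_path):
    """Read a predictions CSV into a list of (smiles, {column: value}) pairs.

    Assumes the first column is the SMILES string and all remaining
    columns are predictions.
    """
    results = []
    with open(preds_path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader)
        for row in reader:
            if not row:
                continue  # skip blank lines
            predictions = {column: _to_float(value)
                           for column, value in zip(header[1:], row[1:])}
            results.append((row[0], predictions))
    return results
```

For example, `read_predictions("regression_preds.csv")` returns one entry per molecule, with non-numeric cells mapped to `None`.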
import tarfile
import tempfile

import chemprop


def untar_file_to_folder(filepath, output_dir):
    with tarfile.open(filepath) as tf:
        tf.extractall(output_dir)


# Extract the model from an archive
model_tar = "tests/data/regression_model_jazzy_kallisto.tar.gz"
with tempfile.TemporaryDirectory() as tmp_folder:
    untar_file_to_folder(model_tar, tmp_folder)

    # Configure arguments
    args_list = ['--test_path', '/dev/null',
                 '--preds_path', '/dev/null',
                 '--checkpoint_dir', tmp_folder]
    args = chemprop.args.PredictArgs().parse_args(args_list)

    # Load model in memory
    model_objects = chemprop.train.load_model(args=args)

    # Predict values for a list of SMILES
    smiles_list = [['CCC'], ['CCCC'], ['OCC']]
    print(chemprop.train.make_predictions(args=args,
                                          smiles=smiles_list,
                                          model_objects=model_objects))

# Example output:
# [[-3.6102219301449665], [-3.5861583065877727], [-3.361401796920642]]
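The output is a list with one inner list per input molecule, in the same order as the input. As a small illustration, the sketch below pairs each SMILES string with its predicted value; `pair_predictions` is a hypothetical helper, and the numbers are copied from the example output above rather than recomputed.

```python
def pair_predictions(smiles_list, predictions):
    """Pair each input SMILES string with its list of predicted values."""
    if len(smiles_list) != len(predictions):
        raise ValueError("Expected one prediction row per input molecule")
    # Each input entry is a one-element list holding a single SMILES string.
    return [(entry[0], values)
            for entry, values in zip(smiles_list, predictions)]


smiles_list = [['CCC'], ['CCCC'], ['OCC']]
# Values copied from the example output above.
predictions = [[-3.6102219301449665],
               [-3.5861583065877727],
               [-3.361401796920642]]

for smiles, values in pair_predictions(smiles_list, predictions):
    print(f"{smiles}: {values[0]:.3f}")
```

A list of pairs is used instead of a dictionary so that repeated SMILES strings in the input are not collapsed.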
- To run the full test suite, use:
pytest -v
- To run only the Jazzy unit/integration tests, use:
pytest -v -m jazzy
- To run only the unit tests, use:
pytest tests/test_unit