ChemFold

ChemFold provides several methods for computing train-validation-test splits, designed for both ordinary and federated ML tasks involving small molecules. Details can be found in the following publication: https://doi.org/10.1186/s13321-021-00576-2. The following methods are included:

Random split
Sphere exclusion clustering based split
Locality sensitive hashing (LSH) based split
Scaffold trees

Installation of ChemFold

ChemFold is installed from a clone of its repository. It needs two dependencies, rdkit (at least 2021.03.3) and cython. Install them first, then install ChemFold itself from the root of the cloned repository:

conda install -c conda-forge rdkit ## at least 2021.03.3
pip install cython
pip install -e .

Example of running scaffold-network

python -m chemfold.scaffold_network --infolder <root folder of melloddy_tuner_output> \
                                    --out <folder to write output to> \
                                    --params_file <location of parameters.json>

The parameters.json file must contain values for the ["key"]["key"] and ["lsh"]["nfolds"] entries. This method also works in a federated setting and guarantees consistent folding results across multiple parties.
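As a rough illustration of the two required entries, the sketch below builds a minimal parameters dictionary and checks that both entries are present. Only the entry names ["key"]["key"] and ["lsh"]["nfolds"] come from the text above. The values, and the helper check_params, are hypothetical placeholders, and a real MELLODDY-TUNER parameters.json is expected to contain further settings.

```python
import json

# Placeholder values (assumptions): use the key and fold count agreed for your run.
params = {
    "key": {"key": "replace-with-your-secret-key"},
    "lsh": {"nfolds": 5},
}


def check_params(p):
    """Return the names of required entries missing from a parameters dict."""
    missing = []
    key_section = p.get("key")
    if not isinstance(key_section, dict) or "key" not in key_section:
        missing.append('["key"]["key"]')
    lsh_section = p.get("lsh")
    if not isinstance(lsh_section, dict) or "nfolds" not in lsh_section:
        missing.append('["lsh"]["nfolds"]')
    return missing


# Serialise the way the file would be stored on disk, then re-check it.
text = json.dumps(params, indent=2)
print(check_params(json.loads(text)))
```

An empty list means both entries were found.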

Example of running random split

Here is an example executing random split:

python -m chemfold.random_split --infolder <root folder of melloddy_tuner_output> \
                                --out <folder to write output to> \
                                --params_file <location of parameters.json>

This method also works in a federated setting and guarantees consistent folding results across multiple parties.
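One common way to obtain fold assignments that agree across parties is to derive the fold from a keyed hash of a compound identifier, so that anyone holding the same key computes the same fold without exchanging data. The sketch below illustrates that general idea only. It is an assumption and not necessarily how chemfold.random_split works internally, and the function name assign_fold and the example identifier are hypothetical.

```python
import hashlib
import hmac


def assign_fold(compound_id: str, key: bytes, nfolds: int) -> int:
    """Map an identifier to a fold in [0, nfolds) using a keyed hash."""
    digest = hmac.new(key, compound_id.encode("utf-8"), hashlib.sha256).digest()
    # Interpret the first 8 bytes as an unsigned integer and reduce modulo nfolds.
    return int.from_bytes(digest[:8], "big") % nfolds


shared_key = b"example-shared-key"  # placeholder secret shared by all parties
party_a = assign_fold("CCO", shared_key, 5)
party_b = assign_fold("CCO", shared_key, 5)
print(party_a == party_b)  # both parties derive the same fold
```

Because the assignment depends only on the identifier and the key, the result does not change with the order or the size of each party's dataset.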

Example of running sphere exclusion clustering

python -m chemfold.sphere_exclusion --x chembl_23mini_x.npy \
                                    --dists 0.2 0.4 0.6 \
                                    --out chembl_23mini_clusters.npy

where chembl_23mini_x.npy is a sparse matrix of descriptor values such as ECFPs, stored either as a scipy.sparse matrix saved in .npy or in Matrix Market format (file name ending in .mtx). This command runs a sequence of sphere exclusion clusterings for the distances 0.2, 0.4 and 0.6.

Note that this implementation of sphere exclusion can only be used in a non-federated setting; it does not produce consistent results when run by different parties.
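To make the idea concrete, here is a minimal pure-Python sketch of a leader-style sphere exclusion: each compound joins the first existing cluster centre within the given Tanimoto distance, otherwise it becomes a new centre. This is an illustrative variant under stated assumptions (fingerprints as sets of on-bits, toy data). ChemFold's own implementation may differ in details such as ordering and how the sequence of distances is combined.

```python
def tanimoto_distance(a, b):
    """Tanimoto (Jaccard) distance between two sets of on-bits."""
    union = len(a | b)
    if union == 0:
        return 0.0
    return 1.0 - len(a & b) / union


def sphere_exclusion(fps, radius):
    """Return one cluster label per fingerprint for a given radius."""
    centres = []  # indices of fingerprints acting as cluster centres
    labels = []
    for i, fp in enumerate(fps):
        for cluster_id, centre in enumerate(centres):
            if tanimoto_distance(fp, fps[centre]) <= radius:
                labels.append(cluster_id)
                break
        else:
            # No centre is close enough: this fingerprint starts a new cluster.
            centres.append(i)
            labels.append(len(centres) - 1)
    return labels


# Toy fingerprints (hypothetical data), clustered at the three distances used above.
fps = [frozenset({1, 2, 3}), frozenset({1, 2, 3, 4}),
       frozenset({7, 8, 9}), frozenset({7, 8})]
for radius in (0.2, 0.4, 0.6):
    print(radius, sphere_exclusion(fps, radius))
```

Because the result depends on the order in which compounds are visited and on the full set of compounds seen, two parties with different data would generally obtain different clusters, which is consistent with the non-federated restriction noted above.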

Example of getting Locality sensitive hashing

MELLODDY-TUNER (see the section "Inputs from MELLODDY-TUNER") directly offers an LSH-based fold splitting. The respective fold file can be found in files_4_ml/T11_fold_vector.npy.

Example of running analyses

After calculating the folds with one of the above methods, analyses can be run. They cover the performance difference (which requires prior SparseChem (https://github.com/melloddy/SparseChem) runs), label and data imbalance, and the similarity of chemical substances within one fold.

python -m chemfold.analyze --inp <ChemFold folder including the output files from the folding methods, e.g., sn_scaff_folds.npy> \
                           --out <folder to write output to> \
                           --psc <location of SparseChem folder including the models folder> \
                           --baseline_prefix <prefix used in SparseChem that corresponds to the baseline folding methods> \
                           --analysis <which analyses to run, choose from 'all', 'performance', 'imbalance' and 'similarity'>
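For intuition about what a data imbalance check looks at, the sketch below computes the share of compounds in each fold and the ratio between the largest and smallest fold from a plain list of fold indices. It is a simplified illustration with hypothetical function names and toy data, not the computation performed by chemfold.analyze. In practice the fold indices would come from one of the fold files produced above.

```python
from collections import Counter


def fold_fractions(fold_vector):
    """Return the fraction of items assigned to each fold, keyed by fold index."""
    counts = Counter(fold_vector)
    total = sum(counts.values())
    return {fold: n / total for fold, n in sorted(counts.items())}


def imbalance_ratio(fold_vector):
    """Ratio of the largest fold size to the smallest; 1.0 means perfectly balanced."""
    counts = Counter(fold_vector)
    return max(counts.values()) / min(counts.values())


folds = [0, 0, 1, 2, 2, 2]  # toy fold assignment for six compounds
print(fold_fractions(folds))
print(imbalance_ratio(folds))
```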

Inputs from MELLODDY-TUNER

ChemFold takes processed molecular data as input, which can be obtained from MELLODDY-TUNER.

Here we outline the basic steps to install MELLODDY-TUNER and run the data preparation. For an in-depth guide, see the MELLODDY-TUNER manual.

Clone MELLODDY-TUNER

First, clone the MELLODDY-TUNER git repository:

git clone -b release/1.0 https://github.com/melloddy/MELLODDY-TUNER.git

Create environment

Create your own environment from the provided yml file with:

cd MELLODDY-TUNER
conda env create -f melloddy_pipeline_env.yml

Activate the environment:

conda activate melloddy_pipeline

Install MELLODDY-TUNER

You have to install the melloddy-tuner package with pip:

pip install -e .

0. Prepare Input Files for MELLODDY-TUNER

See the specifications in the MELLODDY-TUNER README.md, under the section "Prepare Input Files".

1. Run Data Preparation Script

python bin/prepare_4_melloddy.py \
--structure_file {path/to/your/structure_file_T2.csv} \
--activity_file {/path/to/your/activity_data_file_T4.csv} \
--weight_table {/path/to/your/weight_table_T3.csv} \
--config_file {/path/to/the/distributed/parameters.json} \
--output_dir {path/to/the/output_directory} \
--run_name {name of your current run} \
--number_cpu {number of CPUs to use} \
--ref_hash {path/to/the/provided/ref_hash.json}

Example for ChEMBL:

python bin/prepare_4_melloddy.py \
--structure_file {path/to/your/chembl25_T2.csv} \
--activity_file {/path/to/your/chembl25_T4.csv} \
--weight_table {/path/to/your/chembl25_T3.csv} \
--config_file {/tests/structure_preparation_test/example_parameters.json} \
--output_dir {path/to/the/output_directory} \
--run_name {name of your current run} \
--number_cpu {number of CPUs to use} \
--ref_hash {/tests/structure_preparation_test/ref_hash.json}
